Emergency plan for server outages
From WormBaseWiki
This document describes emergency procedures to follow should one of the WormBase servers crash. It does not cover situations where two or more servers are down simultaneously; I leave that case as an exercise in system administration and network topology for the interested reader.
If fe.wormbase.org crashes
This is the most critical machine in the entire WormBase infrastructure. If it crashes, WormBase goes offline, because no requests can be forwarded to the backend machines.
Remedy 1 (suggested)
The quick and easy solution for a crash of fe.wormbase.org is to bring up gene or vab with the same IP address as fe.wormbase.org. Normally, fe.wormbase.org answers requests on port 80 with squid, which forwards them to the correct backend server. Since vab and gene contain complete installs of the site, either can stand in for fe.wormbase.org by answering requests on port 80 with httpd and serving WormBase pages directly. Performance will be worse than with the load-balanced infrastructure, but the site will remain available. vab is the most appropriate machine for this task.
- Reboot vab or gene with the IP address of fe.wormbase.org (143.48.220.124).
- Edit httpd.conf so that the Listen and Port directives both use port 80, and fix the virtual host as appropriate. Note: this step is necessary only as long as vab.wormbase.org listens on port 8080. In the future, we may place vab behind the firewall and set it to listen on port 80.
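The port change in the second step can be scripted. The sketch below is a hypothetical helper (not part of wormbase-admin) and assumes httpd.conf contains the Apache 1.3-style directives exactly as "Listen 8080" and "Port 8080", each on its own line. It leaves the VirtualHost section alone, so that part still has to be fixed by hand.

```sh
#!/bin/sh
# Hypothetical helper (not part of wormbase-admin): move httpd from port 8080
# to port 80 so vab or gene can stand in for fe.wormbase.org.
# Assumes the Apache 1.3-style lines "Listen 8080" and "Port 8080" appear
# exactly like that, each on its own line. The VirtualHost section is NOT
# touched and still has to be fixed by hand.
switch_httpd_to_port80() {
    conf="$1"
    # Keep the original so it can be restored once fe.wormbase.org is back.
    cp "$conf" "$conf.bak" || return 1
    # Rewrite only the two exact directive lines; every other line,
    # including the VirtualHost block, is copied through unchanged.
    sed -e 's/^Listen 8080$/Listen 80/' \
        -e 's/^Port 8080$/Port 80/' \
        "$conf.bak" > "$conf"
}
```

Run it as root against the httpd.conf of the running install, for example switch_httpd_to_port80 /path/to/httpd.conf (the path is a placeholder), then restart httpd. Copying httpd.conf.bak back over httpd.conf undoes the change once fe.wormbase.org is back.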
Remedy 2 (untested)
A second remedy is to bring up vab or gene with the same IP address as fe.wormbase.org but run it as a squid reverse proxy. This lets us keep using the pool of backend servers and so maintain acceptable performance. gene is the suggested machine.
1. Reboot gene with the IP address of fe.wormbase.org (143.48.220.124).
2. Disable httpd. We do not want httpd handling requests on port 80.
3. Uncomment the fe.wormbase.org and roundrobin.wormbase.org entries in /etc/hosts and comment out the machine's own entry.
This configuration is for gene.wormbase.org but should also work for vab. In essence, we are telling the machine to send roundrobin requests to localhost.
143.48.220.56   gene.wormbase.org gene
# roundrobin.wormbase.org is a pseudo-host where all redirection
# requests are sent. This corresponds to localhost.
#143.48.220.124 roundrobin.wormbase.org
#143.48.220.124 fe.wormbase.org fe
becomes
#143.48.220.56  gene.wormbase.org gene
# roundrobin.wormbase.org is a pseudo-host where all redirection
# requests are sent. This corresponds to localhost.
143.48.220.124  roundrobin.wormbase.org
143.48.220.124  fe.wormbase.org fe
4. Copy the squid init script (squid.initd) to the system
sudo cp /usr/local/wormbase-admin/squid/util/squid.initd /etc/rc.d/init.d/squid
5. Create the squid swap directories
sudo /usr/local/squid/sbin/squid -z -f /usr/local/wormbase-admin/squid/etc/squid.gene.wormbase.org.conf
6. Start squid
sudo /etc/rc.d/init.d/squid start
7. Test that squid is running
sudo /etc/rc.d/init.d/squid status
8. Check that the proxy is working by doing a quick once-around of the site.
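Steps 3 through 7 can be collected into one script. The sketch below assumes steps 1 and 2 (taking over 143.48.220.124 and disabling httpd) have already been done by hand. The function names and the backup file /etc/hosts.pre-fe are hypothetical; the squid commands are the ones listed in the steps above. The hosts-file edit is a sed filter keyed on the two IP addresses, so comment lines and other entries pass through untouched, and running it twice gives the same result.

```sh
#!/bin/sh
# Sketch of steps 3-7 of Remedy 2 on gene.wormbase.org. Assumes step 1
# (taking over 143.48.220.124) and step 2 (disabling httpd) are done.
# Function names and /etc/hosts.pre-fe are hypothetical, not part of wormbase-admin.

# Step 3 as a filter: read an /etc/hosts-style file on stdin, comment out
# gene's own entry, and uncomment the 143.48.220.124 entries.
swap_hosts_for_fe() {
    sed -e 's/^\(143\.48\.220\.56[[:space:]]\)/#\1/' \
        -e 's/^#\(143\.48\.220\.124[[:space:]]\)/\1/'
}

run_remedy2() {
    # Step 3: back up /etc/hosts, then install the edited copy.
    sudo cp /etc/hosts /etc/hosts.pre-fe || return 1
    swap_hosts_for_fe < /etc/hosts.pre-fe | sudo tee /etc/hosts > /dev/null || return 1
    # Step 4: install the squid init script.
    sudo cp /usr/local/wormbase-admin/squid/util/squid.initd /etc/rc.d/init.d/squid || return 1
    # Step 5: create the squid swap directories.
    sudo /usr/local/squid/sbin/squid -z -f /usr/local/wormbase-admin/squid/etc/squid.gene.wormbase.org.conf || return 1
    # Steps 6 and 7: start squid, then confirm it is running.
    sudo /etc/rc.d/init.d/squid start || return 1
    sudo /etc/rc.d/init.d/squid status
}

# Only touch the system when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    run_remedy2
fi
```

Run it as a user with sudo rights, for example sh remedy2.sh --run (the file name is arbitrary). Without --run the script only defines the functions. Step 8 stays manual; a header check such as curl -sI http://fe.wormbase.org/ should typically show squid's X-Cache or Via header when requests are going through the proxy.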
If brie6.cshl.org crashes
brie6 plays a major role at WormBase, hosting the complete site (as do all nodes) and the mailing lists. In addition, brie6 hosts www.wormbook.org and stein.cshl.edu. If brie6 fails, these sites will be offline.
If gene.wormbase.org crashes
If vab.wormbase.org crashes
If vab needs to be rebooted:
1) Go to http://blademanager2.cshl.edu (you must be at CSHL or connected via VPN).
2) Log in using username="wormbase" and password=""
3) Choose a timeout period. The default of 5 minutes should be sufficient.
4) In the left column under "Blade Tasks" click on "Power/Restart"
5) In the right pane, put a check mark in the box next to the vab blade.
6) Below, click on "restart blade"
7) You will be asked "Are you sure you want to proceed?" Click "ok".
8) You will be told "Operation may take a few moments". Click "ok".
9) In the bottom of the left column, choose "logout".
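The blade manager steps have to be done in the web interface, but confirming that vab has come back can be scripted. The sketch below is a hypothetical helper that assumes curl is installed and that vab still answers HTTP on port 8080, as noted under Remedy 1. Any HTTP response, including an error page, counts as up.

```sh
#!/bin/sh
# Hypothetical post-reboot check; not part of the blademanager procedure.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 as soon as URL gives any HTTP response, 1 if every attempt fails.
wait_for_http() {
    url="$1"
    attempts="${2:-30}"   # default: 30 probes
    delay="${3:-10}"      # default: 10 seconds between probes
    i=1
    while :; do
        # -s: no progress output; -o /dev/null: discard the body;
        # --max-time: give up on a single probe after 5 seconds.
        # Without -f, curl exits 0 for any HTTP status, so an error page counts as up.
        if curl -s -o /dev/null --max-time 5 "$url"; then
            echo "$url answered on attempt $i"
            return 0
        fi
        # Stop after the last attempt instead of sleeping one extra time.
        [ "$i" -ge "$attempts" ] && break
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$url did not answer after $attempts attempts" >&2
    return 1
}
```

For example, wait_for_http http://vab.wormbase.org:8080/ 30 10 keeps polling for roughly five minutes before giving up.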
