@renoirb
Last active August 29, 2015 14:11
WebPlatform Production guard scripts
#!/bin/bash
#
# A short version of wpd-apache-watchdog
#
# This is a watchdog script that ensures Apache runs. It can be installed under two paths:
# - /usr/local/sbin/apache-watchdog
# - /usr/local/sbin/wpd-apache-watchdog
# Salt Stack should ensure it's at `/usr/local/sbin/wpd-apache-watchdog`, but it could be installed manually too.
#
if [ -f /etc/no_monitor ]; then
  exit
fi
for attempt in 1 2 3 ; do
  ( echo 'HEAD /server-status HTTP/1.0'
    echo 'User-Agent: W3C/apache-watchdog'
    echo
  ) | nc -w 5 localhost 80 2> /dev/null | egrep -q '^HTTP/... [0-9][0-9][0-9] ' && exit 0
  sleep 10
done
# No response, so restart apache after killing any lingering processes.
( date
  /etc/init.d/apache2 stop
  sleep 5
  killall -q apache2
  sleep 15
  killall -q apache2
  sleep 5
  /etc/init.d/apache2 start
) >> /tmp/apache-watchdog.log 2>&1
#!/bin/bash
echo 'For hostnames ending with wpdn, see https://docs.webplatform.org/wiki/WPD:Infrastructure/architecture/Base_configuration_of_a_VM#Accessing_a_VM_using_SSH'
#
# Check Apache2 /server-status health through this local tunnel script.
#
# Should work on Mac OS X under a fish/zsh shell; it opens Google Chrome tabs for all VM Apache status views.
#
# docs:
# - app7: 208.113.157.134
# - app8: 208.113.157.135
# - app9: 208.113.157.136
# - app10: 208.113.157.137
# webat25:
# - webat25: 208.113.157.119
#
# An alternate app node to run tests against is:
# - app11: 208.113.157.138
#
# That one isn’t exposed to Fastly; you can override it and test locally in /etc/hosts:
#
# 208.113.157.138 docs.webplatform.org
#
ssh -T -N -L 8007:127.0.0.1:80 app7 &
ssh -T -N -L 8008:127.0.0.1:80 app8 &
ssh -T -N -L 8009:127.0.0.1:80 app9 &
ssh -T -N -L 8010:127.0.0.1:80 app10 &
ssh -T -N -L 8011:127.0.0.1:80 app11 &
ssh -T -N -L 8020:127.0.0.1:80 blog.production.wpdn &
ssh -T -N -L 8030:127.0.0.1:80 project.production.wpdn &
ssh -T -N -L 8040:127.0.0.1:80 webat25 &
ssh -T -N -L 2238:127.0.0.1:80 mail.production.wpdn &
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome 'http://127.0.0.1:8007/server-status' 'http://127.0.0.1:8008/server-status' 'http://127.0.0.1:8009/server-status' 'http://127.0.0.1:8010/server-status' 'http://127.0.0.1:8020/server-status' 'http://127.0.0.1:8030/server-status' 'http://127.0.0.1:8040/server-status' 'http://127.0.0.1:2238/cgi-bin/mailgraph.cgi' 'http://monitor.webplatform.org/ganglia'
# For new infra, see https://docs.webplatform.org/wiki/WPD:Infrastructure/architecture/Base_configuration_of_a_VM#Accessing_a_VM_using_SSH

Finding your way around

Remember that this is a brain dump of the current server setup state. It is undergoing a big refactor to have every particularity handled automatically, but that isn’t ready to be rolled out until the end of January.

Updated on Mar 20 2015

Migration status

DONE.

Except for webat25.org

Links to manually check service health

From time to time I open all those links in tabs to get a quick overview of whether or not all is fine. It’s a poor man’s uptime status check that I do until I get better metrics.

New production has more fine-grained checks overall. Refer to Reports to review system status.

Nevertheless, here are a few sanity checks:

Random notes

  1. EVERY VM (except mail, in both new and old production) runs exim4 and relays to mail.webplatform.org; see Accessing a VM through SSH in the new documentation
  2. ElasticSearch is ONLY required by Hypothesis, nothing else yet.
  3. Any VM that is non-vital, or that has been migrated to the new cluster, is stopped
  4. To see which VMs run, use nova list from salt.webplatform.org (see the sketch below)
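
For example, a minimal sketch of listing only the running VMs (assuming the OpenStack credentials are already sourced in that shell):

    # Run from salt.webplatform.org; keeps only the VMs whose status is ACTIVE.
    nova list | grep ACTIVE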

Backups

Should be handled automatically just fine

  • The backup role VM rsyncs, from a root cronjob, what is on both the salt AND masterdb hosts; a crontab sketch follows this list
    • db1-masterdb:
      • what: MySQL databases
      • Crontabs defined in: salt:/srv/salt/backup/db.sls
      • Script: /usr/local/sbin/db.sh as root cronjob
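
As a rough sketch, the root crontab entry for the database dump looks something like this; the schedule and log path shown here are assumptions, the real definition lives in salt:/srv/salt/backup/db.sls:

    # Hypothetical schedule and log path; see salt:/srv/salt/backup/db.sls for the real crontab.
    0 3 * * * /usr/local/sbin/db.sh >> /var/log/db-backup.log 2>&1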

Logs and poking around

For logs, refer to Centralized logging in the new documentation. Both new and old production receive logs through UDP, and the documentation is valid in both clusters.

Other poking in new production can be done by following what’s described in Reports to review system status.

Ganglia

htop
sar  # sysstat is only in old infrastructure
netstat -tulpn
lsof -P -i -n | cut -f 1 -d " "| uniq | tail -n +2
lsof -P -i -n
lsof -w -l
initctl list | grep running
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n

Note: those commands are my favourite "lazy" command aliases that I’m gathering. They are available as aliases such as wpd-lsof (they’ll be renamed to lazy-lsof). To get them, look in /etc/profile.d/wpd_aliases.sh in both old and new environments; a sketch of what such an alias looks like follows.
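
For illustration, a minimal sketch of what one of those aliases in /etc/profile.d/wpd_aliases.sh could look like (the alias body reuses the lsof one-liner above; the actual file may differ):

    # Hypothetical alias definition; check /etc/profile.d/wpd_aliases.sh for the real ones.
    alias wpd-lsof='lsof -P -i -n | cut -f 1 -d " " | uniq | tail -n +2'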

New production has more fine-grained metrics and uptime checks system. Refer to Reports to review system status.

As said in The Salt Master, at Centralized logging, it’s not an ideal solution. It should be fixed by webplatform/ops#117.

Apache server-status

  • appN and webat25 run it
  • To check Apache health, look at the script server_stats_tunnels.sh below and use Apache /server-status through a tunnel; see the sketch after this list
  • The script wpd-apache-watchdog uses it to see if apache2 is running, and logs restarts in /tmp/apache-watchdog.log. See the script apache-watchdog.sh below
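
For a quick one-off check without opening Chrome, something like this works once the tunnels from server_stats_tunnels.sh are up (8007 is the local port the tunnel script binds for app7):

    # Query app7's Apache scoreboard through the local ssh tunnel; ?auto gives the machine-readable view.
    curl -s 'http://127.0.0.1:8007/server-status?auto' | head -n 20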

New production has more fine-grained apache/nginx checks. Refer to Reports to review system status.

Monit

Some VMs have Monit to ensure services are UP and to restart them for us. It does basically what apache-watchdog does, but isn’t limited to checking whether an HTTP server responds on localhost port 80 and restarting the apache service. A sketch of such a check follows.

New production has Monit across the board. Refer to Reports to review system status.
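
A minimal sketch of the kind of check Monit runs for Apache; the service name, pidfile path and request path here are assumptions, the real stanzas live under /etc/monit/conf.d/ on each VM:

    # Hypothetical Monit stanza; actual checks are defined under /etc/monit/conf.d/.
    check process apache2 with pidfile /var/run/apache2/apache2.pid
      start program = "/etc/init.d/apache2 start"
      stop  program = "/etc/init.d/apache2 stop"
      if failed host 127.0.0.1 port 80 protocol http request "/server-status" then restart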

MailGraph

Moved into new production.

Refer to Reports to review system status.

VM types

Refer to Roles and environment levels page.

appN

See also new architecture documentation for app

Generic web application container. Currently runs Ubuntu 14.04, serving HTTP requests with Apache2 2.4.x and MPM Prefork.

Note: the salt commands below are examples that run against app11. To deploy to production, you will have to deploy on each of the other app nodes available. To list them, run salt-run manage.status on the salt VM. A sketch of targeting all app nodes at once follows.
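
For instance, a sketch of rolling one of the states below out to every app node instead of only app11; the app* glob and the batch size of 1 are assumptions about how the nodes are named and how cautiously you want to roll:

    # Apply the MediaWiki code state to every minion whose id starts with "app",
    # one node at a time so the remaining nodes keep serving traffic.
    salt -b 1 'app*' state.sls code.docs_nextgen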

  • Code:
    • Deploy all code new production: wpd-deploy app
    • Homepage:
      • VHost: /etc/apache2/sites-enabled/00-webplatform.conf
      • DocRoot: /var/www
      • Deploy command: salt app11 state.sls code.root
      • Salt Master code deployment sls: salt:/srv/salt/code/root.sls
      • Salt Master code clone: salt:/srv/code/www
    • MediaWiki:
      • VHost: /etc/apache2/sites-enabled/01-docs.conf
      • DocRoot: /srv/webplatform/wiki/wpwiki
      • Salt Master deploy command: salt app11 state.sls code.docs_nextgen
      • Salt Master code deployment sls: salt:/srv/salt/code/docs_nextgen.sls
      • Salt master code clone: salt:/srv/code/docs/nextgen
    • WebPlatform.com:
      • VHost: /etc/apache2/sites-enabled/05-webplatform-com.conf
      • DocRoot: /srv/webplatform/webplatform-com/out
      • Salt Master deploy command: salt app11 state.sls code.root-com
      • Salt Master code deployment sls: salt:/srv/salt/code/root-com.sls
      • Salt master code clone: None, only a static file in salt:/srv/salt/code/files/root-com/index.html
    • Dabblet:
      • VHost: /etc/apache2/sites-enabled/09-dabblet.conf
      • DocRoot: /srv/webplatform/dabblet
      • Salt Master deploy command: salt app11 state.sls code.dabblet
      • Salt Master code deployment sls: salt:/srv/salt/code/dabblet.sls
      • Salt master code clone: salt:/srv/code/dabblet/
    • LumberJack Web UI:
      • VHost: An alias in /etc/apache2/sites-enabled/00-webplatform.conf
      • DocRoot: /srv/webplatform/bots/lumberjack
      • Salt master code clone: salt:/srv/code/bots/lumberjack/
      • Note: Nothing should need to change here; it’s a sketchy zone for now.
  • Relies on VMs (service):
    • db (MySQL server, for LumberJack Web UI)
    • memcache (Memcache)
  • Health checks in MediaWiki below, only a wpd-apache-watchdog script through cron

accounts

See also new architecture documentation for accounts

It’s the upcoming accounts system we are using, currently only in use for notes.webplatform.org. The software is a fork of Mozilla Firefox Accounts (a.k.a. FxA).

  • Code (listed in preferred startup order; see the restart sketch after this list):
    • Deploy all code: salt accounts state.sls code.accounts
    • VHost: /etc/nginx/sites-enabled/accounts
    • OAuth:
      • DocRoot: /srv/webplatform/auth/fxa-oauth-server
      • Init script: /etc/init/fxa-oauth-server.conf
      • Restart command: monit restart fxa-oauth-server
    • Auth:
      • DocRoot: /srv/webplatform/auth/fxa-auth-server
      • Init script: /etc/init/fxa-auth-server.conf
      • Restart command: monit restart fxa-auth-server
    • Content:
      • DocRoot: /srv/webplatform/auth/fxa-content-server
      • Init script: /etc/init/fxa-content-server.conf
      • Restart command: monit restart fxa-content-server
    • Profile:
      • DocRoot: /srv/webplatform/auth/fxa-profile-server
      • Init script: /etc/init/fxa-profile-server.conf
      • Restart command: monit restart fxa-profile-server
  • Local services:
    • fxa-oauth-server
    • fxa-auth-server
    • fxa-content-server
    • nginx
    • monit
  • Relies on VMs (service):
    • masterdb (mysql)
  • Health checks in Accounts below
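
Since all four FxA services are restarted the same way, here is a small sketch to bounce them in the preferred startup order; the loop is only illustrative, the individual monit restart commands above are the documented way:

    # Restart every FxA service through Monit, in the preferred startup order.
    for svc in fxa-oauth-server fxa-auth-server fxa-content-server fxa-profile-server; do
      monit restart "$svc"
    done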

notesN

See also new architecture documentation for notes

  • Code:
    • Hypothesis:
      • VHost: /etc/nginx/sites-enabled/notes
      • DocRoot: /srv/webplatform/notes-server
      • Restart command: monit restart hypothesis-service
      • Salt Master code clone: None, manual clone at the moment
  • Local services:
    • hypothesis
    • nginx
    • monit
  • Relies on VMs (service):
    • accounts (fxa-content-server, fxa-auth-server, fxa-oauth-server, fxa-profile-server)
    • elastic1 (elasticsearch)
  • Health checks in Hypothesis below
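
A quick way to confirm the elasticsearch dependency is reachable from the notes VM; the elastic1 hostname comes from the list above, and the default port 9200 is an assumption:

    # Hypothetical check; assumes Elasticsearch listens on its default port on elastic1.
    curl -s 'http://elastic1:9200/_cluster/health?pretty'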

blogN

See also new architecture documentation for blog

  • Code:
    • Deploy all code: wpd-deploy blog
    • WordPress:
      • VHost: /etc/apache2/sites-enabled/blog
      • DocRoot: /srv/webplatform/blog/current

project

See also new architecture documentation for project

  • Code:
    • Deploy all code: wpd-deploy project
    • BugGenie:
      • VHost: /etc/apache2/sites-enabled/buggenie
      • DocRoot: /srv/webplatform/buggenie
      • Salt master code clone: salt:/srv/code/buggenie/

botsN

See also new architecture documentation for bots

It only runs a custom Python IRC logger that was called LumberJack, now known as Pierc, though we are using our own fork. That service will be phased out in favour of something else soon.

There are two components: a web viewer (in PHP) and a listener (in Python).

  • Code:
    • LumberJack:
      • Clone: /srv/webplatform/lumberjack
      • Init script: /etc/init/lumberjack.conf
      • Restart command: service lumberjack restart
  • Local services:
    • lumberjack

webat25

Not migrated. It won’t be; it will run as-is until the end of May.

Do not invest anything here. The full site will be exported as a static site in a few months.

  • Code:
    • ExpressionEngine:
      • VHost: /etc/apache2/sites-enabled/buggenie
      • DocRoot: /srv/webplatform/web25ee/
      • Salt master code clone: salt:/srv/code/web25ee/
  • Relies on VMs (service):
    • db4 (mysql)
    • memcacheN (Memcached) see /etc/php5/conf.d/memcached.ini
  • Health checks in ExpressionEngine below, only a wpd-apache-watchdog script through cron

Web apps

Refer to Roles and environment levels page. Concepts are the same in both old and new production.

Let’s keep those notes in case of need:

MediaWiki

  • Hosted on VMs with role app
  • Typical URLs:
  • Exposed by Fastly; to test and see associations, refer to server_stats_tunnels.sh and hosts.txt below
  • Main wiki config on Salt Master server (salt.webplatform.org) is /srv/salt/code/files/docs/wpwiki.php.jinja
    • gets renamed as /srv/webplatform/wiki/wpwiki/LocalSettings.php on appN VMs
    • Handled by salt state in /srv/salt/code/docs_nextgen.sls
  • File /srv/webplatform/wiki/Settings.php is called by both wikis (wpwiki, wptestwiki)
    • Deployment server is /srv/salt/code/files/docs/Settings.php.jinja
    • gets renamed as /srv/webplatform/wiki/Settings.php on appN VMs
    • Handled by salt state in /srv/salt/code/docs_nextgen.sls
    • Can be called like this: salt app8 state.sls code.docs_nextgen
  • Main wiki config file is in /srv/webplatform/wiki/wpwiki/LocalSettings.php which sets database config and how to get image uploads
  • To check Apache health, look at the script server_stats_tunnels.sh below and use Apache /server-status through a tunnel
  • Health checks:
    • root crontab runs /usr/local/sbin/wpd-apache-watchdog every 2 minutes; restarts are logged in /tmp/apache-watchdog.log; see apache-watchdog.sh below and the crontab sketch that follows.
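
A sketch of that crontab entry; the only documented facts are the 2-minute interval and the script path, everything else here is an assumption:

    # Hypothetical root crontab line for the watchdog; the script itself appends to /tmp/apache-watchdog.log.
    */2 * * * * /usr/local/sbin/wpd-apache-watchdog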

Hypothesis

  • Hosted on VMs with role notes

  • Typical URL is notes.webplatform.org

  • NOT exposed by Fastly; to get IPs, use nova list from salt.webplatform.org

  • Served directly from NGINX

  • Configs:

    • /srv/webplatform/notes-server/production.ini
  • Health checks through Monit:

      root@accounts:~# monit summary
      The Monit daemon 5.6 uptime: 1d 11h 44m
      System 'notes.webplatform.org'      Running
      Remote Host 'elasticsearch-remote'  Online with all services
      Remote Host 'hypothesis-service'    Online with all services
      Process 'nginx'                     Running
      File 'nginx_bin'                    Accessible
      File 'nginx_rc'                     Accessible
    

    The check configs are described in /etc/monit/conf.d/hypothesis.

Accounts (a.k.a. FxA)

  • Hosted on VMs with role accounts

  • NOT exposed by Fastly; to get IPs, use nova list

  • Served directly from NGINX

  • Typical URLs:

  • Configs:

    • fxa-content-server: /srv/webplatform/auth/fxa-content-server/server/config/production.json
    • fxa-auth-server: /srv/webplatform/auth/fxa-auth-server/config/prod.json
    • fxa-profile-server: /srv/webplatform/auth/fxa-profile-server/config/prod.json
    • fxa-oauth-server: /srv/webplatform/auth/fxa-oauth-server/config/prod.json
  • Health checks, through Monit:

      root@accounts:~# monit summary
      The Monit daemon 5.6 uptime: 4h 20m
      Remote Host 'fxa-profile-server'    Online with all services
      Program 'fxa-profile-server-check'  Status ok
      Remote Host 'fxa-oauth-server'      Online with all services
      Remote Host 'fxa-content-server'    Online with all services
      Remote Host 'fxa-auth-server'       Online with all services
      System 'accounts.webplatform.org'   Running
      Process 'nginx'                     Running
      File 'nginx_bin'                    Accessible
      File 'nginx_rc'                     Accessible
    

    The check configs are described in /etc/monit/conf.d/*.

WordPress

  • Hosted on VMs with role blog
  • Typical URL is blog.webplatform.org/docs/
  • Exposed by Fastly; to test and see associations, refer to server_stats_tunnels.sh
  • Configs:
    • Main config: /srv/webplatform/blog/current/wp-config.php
    • Code in VM: /srv/webplatform/blog/current/
    • Code in Deployment: none. It’s currently a git clone from the WordPress GitHub mirror; the theme is in Deployment:/srv/code/blog/webplatform-wordpress-theme/, deployed as /srv/webplatform/blog/current/wp-content/themes/webplatform/
  • Health checks:
    • root crontab runs /usr/local/sbin/wpd-apache-watchdog every 2 minutes; restarts are logged in /tmp/apache-watchdog.log; see apache-watchdog.sh below.

BugGenie

  • Hosted on VMs with role project
  • Typical URL is project.webplatform.org
  • Exposed by Fastly; to test and see associations, refer to server_stats_tunnels.sh
  • Configs:
    • /srv/webplatform/buggenie/core/b2db_bootstrap.inc.php
    • /srv/webplatform/buggenie/installed (if you have to reinstall, BugGenie checks this)
  • Health checks:
    • root crontab runs /usr/local/sbin/wpd-apache-watchdog every 2 minutes; restarts are logged in /tmp/apache-watchdog.log; see apache-watchdog.sh below.

LumberJack (an IRC logger and web UI)

  • Hosted on VMs with role bots (listener)
  • Hosted on VMs with role app (web viewer)
  • Typical URL is www.webplatform.org/talk/chatlogs
  • Exposed by Fastly; to test and see associations, refer to server_stats_tunnels.sh
  • Two components:
    • Web UI, hosted on appN VMs
    • Listener daemon running on bots VM, as LumberJack
  • Configs:
    • appN:/srv/webplatform/bots/lumberjack/config.php
    • bots:/srv/webplatform/lumberjack/mysql_config.txt
  • Health checks:
    • root crontab runs /usr/local/sbin/wpd-apache-watchdog every 2 minutes; restarts are logged in /tmp/apache-watchdog.log; see apache-watchdog.sh below.

ExpressionEngine

Not migrated. Won’t be.

  • Hosted on webat25 (only one; it will be replaced by a static version after the holidays)
  • Typical URLs are:
  • Exposed by Fastly; to test and see associations, refer to server_stats_tunnels.sh and hosts.txt below
  • Configs:
    • /srv/webplatform/web25ee/backoffice/expressionengine/config/database.php
    • /srv/webplatform/web25ee/backoffice/expressionengine/config/config.php
  • Health checks:
    • root crontab runs /usr/local/sbin/wpd-apache-watchdog every 2 minutes; restarts are logged in /tmp/apache-watchdog.log; see apache-watchdog.sh below.