@renoirb
Last active August 29, 2015 14:11
WebPlatform Production guard scripts
#!/bin/bash
#
# A short version of wpd-apache-watchdog
#
# This is a watchdog script that ensures Apache runs. It can be installed under two paths:
# - /usr/local/sbin/apache-watchdog
# - /usr/local/sbin/wpd-apache-watchdog
# Salt Stack should ensure it's at `/usr/local/sbin/wpd-apache-watchdog`, but it could be installed manually too.
#
if [ -f /etc/no_monitor ]; then
  exit
fi
for attempt in 1 2 3 ; do
  ( echo 'HEAD /server-status HTTP/1.0'
    echo 'User-Agent: W3C/apache-watchdog'
    echo
  ) | nc -w 5 localhost 80 2> /dev/null | egrep -q '^HTTP/... [0-9][0-9][0-9] ' && exit 0
  sleep 10
done
# No response, so restart apache after killing any lingering processes.
( date
  /etc/init.d/apache2 stop
  sleep 5
  killall -q apache2
  sleep 15
  killall -q apache2
  sleep 5
  /etc/init.d/apache2 start
) >> /tmp/apache-watchdog.log 2>&1
#!/bin/bash
echo 'For hostnames ending with wpdn, see https://docs.webplatform.org/wiki/WPD:Infrastructure/architecture/Base_configuration_of_a_VM#Accessing_a_VM_using_SSH'
#
# Check Apache2 /server-status health through this local tunnel script.
#
# Should work on Mac OS X under a fish/zsh shell; it opens Google Chrome tabs for all VM Apache status views.
#
# docs:
# - app7: 208.113.157.134
# - app8: 208.113.157.135
# - app9: 208.113.157.136
# - app10: 208.113.157.137
# webat25:
# - webat25: 208.113.157.119
#
# An alternate app node to run tests against is:
# - app11: 208.113.157.138
#
# That one isn’t exposed to Fastly; you can override it and test locally in /etc/hosts:
#
# 208.113.157.138 docs.webplatform.org
#
ssh -T -N -L 8007:127.0.0.1:80 app7 &
ssh -T -N -L 8008:127.0.0.1:80 app8 &
ssh -T -N -L 8009:127.0.0.1:80 app9 &
ssh -T -N -L 8010:127.0.0.1:80 app10 &
ssh -T -N -L 8011:127.0.0.1:80 app11 &
ssh -T -N -L 8020:127.0.0.1:80 blog.production.wpdn &
ssh -T -N -L 8030:127.0.0.1:80 project.production.wpdn &
ssh -T -N -L 8040:127.0.0.1:80 webat25 &
ssh -T -N -L 2238:127.0.0.1:80 mail.production.wpdn &
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome 'http://127.0.0.1:8007/server-status' 'http://127.0.0.1:8008/server-status' 'http://127.0.0.1:8009/server-status' 'http://127.0.0.1:8010/server-status' 'http://127.0.0.1:8020/server-status' 'http://127.0.0.1:8030/server-status' 'http://127.0.0.1:8040/server-status' 'http://127.0.0.1:2238/cgi-bin/mailgraph.cgi' 'http://monitor.webplatform.org/ganglia'
# For new infra, see https://docs.webplatform.org/wiki/WPD:Infrastructure/architecture/Base_configuration_of_a_VM#Accessing_a_VM_using_SSH

Finding your way around

Remember that this is a brain dump of the current server setup state. It is undergoing a big refactor to have every particularity handled automatically, but that isn’t ready to be rolled out until the end of January.

Updated on Mar 20 2015

Migration status

DONE.

Except for webat25.org

Links to manually check service health

From time to time I open all those links in tabs to get a quick overview of whether or not all is fine. It’s a poor man’s uptime status check that I do until I get better metrics.

New production has more fine-grained checks overall. Refer to Reports to review system status.

Nevertheless, here are a few sanity checks:

Random notes

  1. EVERY VM (except mail, in both new and old production) runs exim4 and relays to mail.webplatform.org; see Accessing a VM through SSH in the new documentation
  2. ElasticSearch is ONLY required by Hypothesis, nothing else yet.
  3. Any VM that is non-vital, or that has been migrated to the new cluster, is stopped
  4. To see which VMs run, use nova list from salt.webplatform.org (see the sketch below)
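
For example, a minimal sketch of listing only the running VMs (assuming the OpenStack credentials are already sourced in that shell):

    # Run from salt.webplatform.org; keeps only the VMs whose status is ACTIVE.
    nova list | grep ACTIVE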

Backups

Should be handled automatically just fine

  • The backup role VM rsyncs, from a root cronjob, what is on both the salt AND masterdb hosts; a crontab sketch follows this list
    • db1-masterdb:
      • what: MySQL databases
      • Crontabs defined in: salt:/srv/salt/backup/db.sls
      • Script: /usr/local/sbin/db.sh as root cronjob
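
As a rough sketch, the root crontab entry for the database dump looks something like this; the schedule and log path shown here are assumptions, the real definition lives in salt:/srv/salt/backup/db.sls:

    # Hypothetical schedule and log path; see salt:/srv/salt/backup/db.sls for the real crontab.
    0 3 * * * /usr/local/sbin/db.sh >> /var/log/db-backup.log 2>&1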

Logs and poking around

For logs, refer to Centralized logging in the new documentation. Both new and old production receive logs through UDP, and the documentation is valid in both clusters.

Other poking in new production can be done by following what’s described in Reports to review system status.

Ganglia

htop
sar  # sysstat is only in old infrastructure
netstat -tulpn
lsof -P -i -n | cut -f 1 -d " "| uniq | tail -n +2
lsof -P -i -n
lsof -w -l
initctl list | grep running
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n

Note: those commands are my favourite "lazy" command aliases that I’m gathering. They are available as aliases such as wpd-lsof (they’ll be renamed to lazy-lsof). To get them, look in /etc/profile.d/wpd_aliases.sh in both old and new environments; a sketch of what such an alias looks like follows.
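
For illustration, a minimal sketch of what one of those aliases in /etc/profile.d/wpd_aliases.sh could look like (the alias body reuses the lsof one-liner above; the actual file may differ):

    # Hypothetical alias definition; check /etc/profile.d/wpd_aliases.sh for the real ones.
    alias wpd-lsof='lsof -P -i -n | cut -f 1 -d " " | uniq | tail -n +2'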

New production has more fine-grained metrics and uptime checks system. Refer to Reports to review system status.

As said in The Salt Master, at Centralized logging, it’s not an ideal solution. It should be fixed by webplatform/ops#117.

Apache server-status

  • appN and webat25 run it
  • To check Apache health, look at the script server_stats_tunnels.sh below and use Apache /server-status through a tunnel; see the sketch after this list
  • The script wpd-apache-watchdog uses it to see if apache2 is running, and logs restarts in /tmp/apache-watchdog.log. See the script apache-watchdog.sh below
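
For a quick one-off check without opening Chrome, something like this works once the tunnels from server_stats_tunnels.sh are up (8007 is the local port the tunnel script binds for app7):

    # Query app7's Apache scoreboard through the local ssh tunnel; ?auto gives the machine-readable view.
    curl -s 'http://127.0.0.1:8007/server-status?auto' | head -n 20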

New production has more fine-grained apache/nginx checks. Refer to Reports to review system status.

Monit

Some VMs have Monit to ensure services are UP and to restart them for us. It does basically what apache-watchdog does, but isn’t limited to checking whether an HTTP server responds on localhost port 80 and restarting the apache service. A sketch of such a check follows.

New production has Monit across the board. Refer to Reports to review system status.
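
A minimal sketch of the kind of check Monit runs for Apache; the service name, pidfile path and request path here are assumptions, the real stanzas live under /etc/monit/conf.d/ on each VM:

    # Hypothetical Monit stanza; actual checks are defined under /etc/monit/conf.d/.
    check process apache2 with pidfile /var/run/apache2/apache2.pid
      start program = "/etc/init.d/apache2 start"
      stop  program = "/etc/init.d/apache2 stop"
      if failed host 127.0.0.1 port 80 protocol http request "/server-status" then restart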

MailGraph

Moved into new production.

Refer to Reports to review system status.

VM types

Refer to Roles and environment levels page.

appN

See also new architecture documentation for app

Generic web application container. Currently runs Ubuntu 14.04, serving HTTP requests with Apache2 2.4.x and MPM Prefork.

Note: the salt commands below are examples that run against app11. To deploy to production, you will have to deploy on each of the other app nodes available. To list them, run salt-run manage.status on the salt VM. A sketch of targeting all app nodes at once follows.
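
For instance, a sketch of rolling one of the states below out to every app node instead of only app11; the app* glob and the batch size of 1 are assumptions about how the nodes are named and how cautiously you want to roll:

    # Apply the MediaWiki code state to every minion whose id starts with "app",
    # one node at a time so the remaining nodes keep serving traffic.
    salt -b 1 'app*' state.sls code.docs_nextgen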

  • Code:
    • Deploy all code new production: wpd-deploy app
    • Homepage:
      • VHost: /etc/apache2/sites-enabled/00-webplatform.conf
      • DocRoot: /var/www
      • Deploy command: salt app11 state.sls code.root
      • Salt Master code deployment sls: salt:/srv/salt/code/root.sls
      • Salt Master code clone: salt:/srv/code/www
    • MediaWiki:
      • VHost: /etc/apache2/sites-enabled/01-docs.conf
      • DocRoot: /srv/webplatform/wiki/wpwiki
      • Salt Master deploy command: salt app11 state.sls code.docs_nextgen
      • Salt Master code deployment sls: salt:/srv/salt/code/docs_nextgen.sls
      • Salt master code clone: salt:/srv/code/docs/nextgen
    • WebPlatform.com:
      • VHost: /etc/apache2/sites-enabled/05-webplatform-com.conf
      • DocRoot: /srv/webplatform/webplatform-com/out
      • Salt Master deploy command: salt app11 state.sls code.root-com
      • Salt Master code deployment sls: salt:/srv/salt/code/root-com.sls
      • Salt master code clone: None, only a static file in salt:/srv/salt/code/files/root-com/index.html
    • Dabblet:
      • VHost: /etc/apache2/sites-enabled/09-dabblet.conf
      • DocRoot: /srv/webplatform/dabblet
      • Salt Master deploy command: salt app11 state.sls code.dabblet
      • Salt Master code deployment sls: salt:/srv/salt/code/dabblet.sls
      • Salt master code clone: salt:/srv/code/dabblet/
    • LumberJack Web UI:
      • VHost: An alias in /etc/apache2/sites-enabled/00-webplatform.conf
      • DocRoot: /srv/webplatform/bots/lumberjack
      • Salt master code clone: salt:/srv/code/bots/lumberjack/
      • Note: Nothing should need to change here; it’s a sketchy zone for now.
  • Relies on VMs (service):
    • db (MySQL server, for LumberJack Web UI)
    • memcache (Memcache)
  • Health checks in MediaWiki below, only a wpd-apache-watchdog script through cron

accounts

See also new architecture documentation for accounts

It’s the upcoming accounts system we are using, currently only in use for notes.webplatform.org. The software is a fork of Mozilla Firefox Accounts (a.k.a. FxA).

  • Code (listed in preferred startup order; see the restart sketch after this list):
    • Deploy all code: salt accounts state.sls code.accounts
    • VHost: /etc/nginx/sites-enabled/accounts
    • OAuth:
      • DocRoot: /srv/webplatform/auth/fxa-oauth-server
      • Init script: /etc/init/fxa-oauth-server.conf
      • Restart command: monit restart fxa-oauth-server
    • Auth:
      • DocRoot: /srv/webplatform/auth/fxa-auth-server
      • Init script: /etc/init/fxa-auth-server.conf
      • Restart command: monit restart fxa-auth-server
    • Content:
      • DocRoot: /srv/webplatform/auth/fxa-content-server
      • Init script: /etc/init/fxa-content-server.conf
      • Restart command: monit restart fxa-content-server
    • Profile:
      • DocRoot: /srv/webplatform/auth/fxa-profile-server
      • Init script: /etc/init/fxa-profile-server.conf
      • Restart command: monit restart fxa-profile-server
  • Local services:
    • fxa-oauth-server
    • fxa-auth-server
    • fxa-content-server
    • nginx
    • monit
  • Relies on VMs (service):
    • masterdb (mysql)
  • Health checks in Accounts below
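
Since all four FxA services are restarted the same way, here is a small sketch to bounce them in the preferred startup order; the loop is only illustrative, the individual monit restart commands above are the documented way:

    # Restart every FxA service through Monit, in the preferred startup order.
    for svc in fxa-oauth-server fxa-auth-server fxa-content-server fxa-profile-server; do
      monit restart "$svc"
    done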

notesN

See also new architecture documentation for notes

  • Code:
    • Hypothesis:
      • VHost: /etc/nginx/sites-enabled/notes
      • DocRoot: /srv/webplatform/notes-server
      • Restart command: monit restart hypothesis-service
      • Salt Master code clone: None, manual clone at the moment
  • Local services:
    • hypothesis
    • nginx
    • monit
  • Relies on VMs (service):
    • accounts (fxa-content-server, fxa-auth-server, fxa-oauth-server, fxa-profile-server)
    • elastic1 (elasticsearch)
  • Health checks in Hypothesis below
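
A quick way to confirm the elasticsearch dependency is reachable from the notes VM; the elastic1 hostname comes from the list above, and the default port 9200 is an assumption:

    # Hypothetical check; assumes Elasticsearch listens on its default port on elastic1.
    curl -s 'http://elastic1:9200/_cluster/health?pretty'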

blogN

See also new architecture documentation for blog

  • Code:
    • Deploy all code: wpd-deploy blog
    • WordPress:
      • VHost: /etc/apache2/sites-enabled/blog
      • DocRoot: /srv/webplatform/blog/current

project

See also new architecture documentation for project

  • Code:
    • Deploy all code: wpd-deploy project
    • BugGenie:
      • VHost: /etc/apache2/sites-enabled/buggenie
      • DocRoot: /srv/webplatform/buggenie
      • Salt master code clone: salt:/srv/code/buggenie/

botsN

See also new architecture documentation for bots

It only runs a custom Python IRC logger that was called LumberJack, now known as Pierc, though we are using our own fork. That service will be phased out in favour of something else soon.

There are two components: a web viewer (in PHP) and a listener (in Python).

  • Code:
    • LumberJack:
      • Clone: /srv/webplatform/lumberjack
      • Init script: /etc/init/lumberjack.conf
      • Restart command: service lumberjack restart
  • Local services:
    • lumberjack

webat25

Not migrated. It won’t be; it will run as-is until the end of May.

Do not invest anything here. The full site will be exported as a static site in a few months.

  • Code:
    • ExpressionEngine:
      • VHost: /etc/apache2/sites-enabled/buggenie
      • DocRoot: /srv/webplatform/web25ee/
      • Salt master code clone: salt:/srv/code/web25ee/
  • Relies on VMs (service):
    • db4 (mysql)
    • memcacheN (Memcached) see /etc/php5/conf.d/memcached.ini
  • Health checks in ExpressionEngine below, only a wpd-apache-watchdog script through cron

Web apps

Refer to Roles and environment levels page. Concepts are the same in both old and new production.

Let’s keep those notes in case of need:

MediaWiki

  • Hosted on VMs with role app
  • Typical URLs:
  • Exposed by Fastly; to test and see associations, refer to server_stats_tunnels.sh and hosts.txt below
  • Main wiki config on Salt Master server (salt.webplatform.org) is /srv/salt/code/files/docs/wpwiki.php.jinja
    • gets renamed as /srv/webplatform/wiki/wpwiki/LocalSettings.php on appN VMs
    • Handled by salt state in /srv/salt/code/docs_nextgen.sls
  • File /srv/webplatform/wiki/Settings.php is called by both wikis (wpwiki, wptestwiki)
    • Deployment server is /srv/salt/code/files/docs/Settings.php.jinja
    • gets renamed as /srv/webplatform/wiki/Settings.php on appN VMs
    • Handled by salt state in /srv/salt/code/docs_nextgen.sls
    • Can be called like this: salt app8 state.sls code.docs_nextgen
  • Main wiki config file is in /srv/webplatform/wiki/wpwiki/LocalSettings.php which sets database config and how to get image uploads
  • To check Apache health, look at the script server_stats_tunnels.sh below and use Apache /server-status through a tunnel
  • Health checks:
    • root crontab runs /usr/local/sbin/wpd-apache-watchdog every 2 minutes; restarts are logged in /tmp/apache-watchdog.log; see apache-watchdog.sh below and the crontab sketch that follows.
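
A sketch of that crontab entry; the only documented facts are the 2-minute interval and the script path, everything else here is an assumption:

    # Hypothetical root crontab line for the watchdog; the script itself appends to /tmp/apache-watchdog.log.
    */2 * * * * /usr/local/sbin/wpd-apache-watchdog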

Hypothesis

  • Hosted on VMs with role notes

  • Typical URL is notes.webplatform.org

  • NOT exposed by Fastly; to get IPs, use nova list from salt.webplatform.org

  • Served directly from NGINX

  • Configs:

    • /srv/webplatform/notes-server/production.ini
  • Health checks through Monit:

      root@accounts:~# monit summary
      The Monit daemon 5.6 uptime: 1d 11h 44m
      System 'notes.webplatform.org'      Running
      Remote Host 'elasticsearch-remote'  Online with all services
      Remote Host 'hypothesis-service'    Online with all services
      Process 'nginx'                     Running
      File 'nginx_bin'                    Accessible
      File 'nginx_rc'                     Accessible
    

    The check configs are described in /etc/monit/conf.d/hypothesis.

Accounts (a.k.a. FxA)

  • Hosted on VMs with role accounts

  • NOT exposed by Fastly; to get IPs, use nova list

  • Served directly from NGINX

  • Typical URLs:

  • Configs:

    • fxa-content-server: /srv/webplatform/auth/fxa-content-server/server/config/production.json
    • fxa-auth-server: /srv/webplatform/auth/fxa-auth-server/config/prod.json
    • fxa-profile-server: /srv/webplatform/auth/fxa-profile-server/config/prod.json
    • fxa-oauth-server: /srv/webplatform/auth/fxa-oauth-server/config/prod.json
  • Health checks, through Monit:

      root@accounts:~# monit summary
      The Monit daemon 5.6 uptime: 4h 20m
      Remote Host 'fxa-profile-server'    Online with all services
      Program 'fxa-profile-server-check'  Status ok
      Remote Host 'fxa-oauth-server'      Online with all services
      Remote Host 'fxa-content-server'    Online with all services
      Remote Host 'fxa-auth-server'       Online with all services
      System 'accounts.webplatform.org'   Running
      Process 'nginx'                     Running
      File 'nginx_bin'                    Accessible
      File 'nginx_rc'                     Accessible
    

    The check configs are described in /etc/monit/conf.d/*.

WordPress

  • Hosted on VMs with role blog
  • Typical URL is blog.webplatform.org/docs/
  • Exposed by Fastly; to test and see associations, refer to server_stats_tunnels.sh
  • Configs:
    • Main config: /srv/webplatform/blog/current/wp-config.php
    • Code in VM: /srv/webplatform/blog/current/
    • Code in Deployment: none. It’s currently a git clone from the WordPress GitHub mirror; the theme is in Deployment:/srv/code/blog/webplatform-wordpress-theme/, deployed as /srv/webplatform/blog/current/wp-content/themes/webplatform/
  • Health checks:
    • root crontab runs /usr/local/sbin/wpd-apache-watchdog every 2 minutes; restarts are logged in /tmp/apache-watchdog.log; see apache-watchdog.sh below.

BugGenie

  • Hosted on VMs with role project
  • Typical URL is project.webplatform.org
  • Exposed by Fastly; to test and see associations, refer to server_stats_tunnels.sh
  • Configs:
    • /srv/webplatform/buggenie/core/b2db_bootstrap.inc.php
    • /srv/webplatform/buggenie/installed (if you have to reinstall, BugGenie checks this)
  • Health checks:
    • root crontab runs /usr/local/sbin/wpd-apache-watchdog every 2 minutes; restarts are logged in /tmp/apache-watchdog.log; see apache-watchdog.sh below.

LumberJack (an IRC logger and web UI)

  • Hosted on VMs with role bots (listener)
  • Hosted on VMs with role app (web viewer)
  • Typical URL is www.webplatform.org/talk/chatlogs
  • Exposed by Fastly; to test and see associations, refer to server_stats_tunnels.sh
  • Two components:
    • Web UI, hosted on appN VMs
    • Listener daemon running on bots VM, as LumberJack
  • Configs:
    • appN:/srv/webplatform/bots/lumberjack/config.php
    • bots:/srv/webplatform/lumberjack/mysql_config.txt
  • Health checks:
    • root crontab runs /usr/local/sbin/wpd-apache-watchdog every 2 minutes; restarts are logged in /tmp/apache-watchdog.log; see apache-watchdog.sh below.

ExpressionEngine

Not migrated. Won’t be.

  • Hosted on webat25 (only one; it will be replaced by a static version after the holidays)
  • Typical URLs are:
  • Exposed by Fastly; to test and see associations, refer to server_stats_tunnels.sh and hosts.txt below
  • Configs:
    • /srv/webplatform/web25ee/backoffice/expressionengine/config/database.php
    • /srv/webplatform/web25ee/backoffice/expressionengine/config/config.php
  • Health checks:
    • root crontab runs /usr/local/sbin/wpd-apache-watchdog every 2 minutes; restarts are logged in /tmp/apache-watchdog.log; see apache-watchdog.sh below.