@rdev5
Last active May 8, 2018 19:28
Notes on upgrading SnaptOS and load balancer migration

Upgrading SnaptOS

Upgrade: openSUSE Leap 42.1 -> 42.3

Base Setup

  1. Provision a new instance using the latest image (https://downloads.snapt.net/) and a demo license if necessary
  2. Verify OS version and architecture of new instance (/etc/os-release, uname -a)
  3. Apply baseline server configuration and hardening:
    • Change the administrator password (Snapt UI)
    • Change shell account passwords (passwd && sudo passwd)
    • Deploy ~/.ssh/authorized_keys for public key authentication (SSH)
    • Deploy /etc/ssh/sshd_config to disable password authentication and bind to specific ListenAddress
    • Deploy /etc/lighttpd/lighttpd.conf to disable external access to insecure HTTP on port 8080
  4. Complete pending items in Status Check in Snapt UI
    • Framework Update
    • System Patches
      • All non-interactive (may require several attempts to install all patches; inspect with ps aux | grep zypper)
      • All interactive (sudo zypper patch and reboot)
    • Time Sync
  5. Install plugins to match master server, including Redundancy V2
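Step 2 above can be scripted so the version check is repeatable on each new node. The following is a minimal sketch, not part of the original notes: the function names `os_release_summary` and `os_version_is` are hypothetical, and it assumes the instance ships a standard `/etc/os-release` that defines `NAME` and `VERSION_ID`.

```shell
# Print "NAME VERSION_ID" from an os-release style file.
# The body runs in a subshell, so sourced variables do not leak to the caller.
os_release_summary() (
    file="${1:-/etc/os-release}"
    . "$file" && printf '%s %s\n' "$NAME" "$VERSION_ID"
)

# Succeed only if the file reports the expected VERSION_ID (for example 42.3).
os_version_is() (
    expected="$1"
    file="${2:-/etc/os-release}"
    . "$file" && [ "$VERSION_ID" = "$expected" ]
)
```

On a freshly provisioned node this might be used as `os_version_is 42.3 && uname -m`, which prints the architecture only when the expected release is installed.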

Configuration Export

  • Add new instance as a new slave node to the master server:
    • Redundancy Servers
    • Local Replication
  • On the master server, perform a Force Sync to export configuration to the new slave node (inspect ARP with sudo tail -f /var/log/messages)
  • On the new slave node, ensure all services are running:
    • Balancer (HAProxy)
    • Accelerator (NGINX)

Promoting an upgraded slave node to master

It is important to note that not all server settings are copied over during a Force Sync. The following areas MUST be reviewed to ensure the new master node has the appropriate configuration:

  • Ensure all slave nodes are up-to-date (some plugin updates and system patches may still be available)
  • Ensure all services on master node are properly configured, tested, and reloaded
  • Clean up extra files on slave node file systems (e.g. /etc/nginx/...)
  • Perform Force Sync from master node to ensure configuration has replicated
  • Reload all services on all slave nodes to ensure configuration has been applied
  • Compare and manually resolve any configuration discrepancies between master and slave nodes, including (but not limited to):
    • Setup -> Configuration -> Snapt Configuration
      • Remember to turn HTTP Access off once HTTPS has been successfully configured, to prevent admin logins over an insecure URL
    • Setup -> Configuration -> Email Configuration
    • Important: Verify ALL tunings for handling max. concurrent connections have been applied (see ticket #3219) in the following order and reboot if necessary
      1. /etc/sysctl.conf (reload with sysctl -p)
      2. /etc/security/limits.conf
      3. /proc/sys/net/ipv4/ip_local_port_range
    • Ensure monitor scripts have been copied
  • Take a backup of the master/active node (Utilities -> Snapt Backup)
  • Stop Redundancy on any additional slave nodes to facilitate explicit failover to upgraded node and Reload Redundancy
    • Note: Redundancy will automatically be restarted on all nodes once Start/Reload has been performed on destination master node
  • Perform initial failover to upgraded master node:
    • Ping a single VIP continuously (e.g. ping <VIP> -t) and run the following simultaneously to monitor VRRP activity and verify traffic switchover
      • tail -f /var/log/messages
      • /usr/sbin/tcpdump ip proto \\icmp
    • Verify service health
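The connection tunings listed above (sysctl.conf, limits.conf, ip_local_port_range) are easy to miss on a rebuilt node, so a quick read-back can help before failover. This is a rough sketch under stated assumptions: the helper names `ephemeral_port_count` and `nofile_limits` are hypothetical, and which values count as correct depends on the tunings recorded in the ticket, which are not reproduced here.

```shell
# Print the number of ephemeral ports described by an ip_local_port_range
# style file, which holds two whitespace-separated integers: "<low> <high>".
ephemeral_port_count() (
    file="${1:-/proc/sys/net/ipv4/ip_local_port_range}"
    read -r low high < "$file" || exit 1
    echo $(( $high - $low + 1 ))
)

# Print the uncommented nofile entries from a limits.conf style file.
nofile_limits() (
    file="${1:-/etc/security/limits.conf}"
    grep -E '^[^#]*[[:space:]]nofile[[:space:]]' "$file"
)
```

Running both helpers on the master and on the upgraded node and comparing the output is one way to spot a tuning that was not carried over by Force Sync.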

Promoting to master

  1. Ensure Redundancy has been stopped on current master and failover to new master is successful
  2. Change SLAVE01 operation mode to master
  3. Change MASTER operation mode to slave, mapped to new master
    • Be sure to visit Local Replication to obtain Slave Server Key and copy to new master
  4. Optionally map additional slave nodes to new master
  5. Reload Redundancy on the new master node. Then Start Redundancy on the former master (now slave) node
  6. Verify VIPs in Standby mode
  7. Important: Several anomalies were encountered when attempting to join new slave nodes, including the lighttpd service being unable to bind during startup and configurations not replicating properly. Before performing a Force Sync, review the items under Redundancy -> Local Replication and re-enable any configuration items that should be copied in future synchronizations initiated from the new master node.

Licensing (Notes)

  • As of 5/3/18, new licenses must be downloaded to each node individually in Snapt UI (Dashboard -> License) in order to apply any extensions, etc.

Load Balancer Migration (Notes)

The following assumes an internal DNS server was used to split traffic to the new load balancer during a production pilot while the old load balancer was being decommissioned.

Overview:

  1. Add original VIP(s) to Snapt for services ready to be activated in production, Force Sync, and Reload Redundancy
  2. Delete service in old load balancer to release IP
  3. Add second bind directive to listen/frontend groups (HAProxy) referencing original service IP. Retain existing bind addresses to allow time for DNS update
  4. Update internal DNS for services to use original service IP
  5. Remove the earlier bind address (the one in use before step 3) once it is no longer needed, from both HAProxy/NGINX and VIPs (Redundancy)
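During steps 3 to 5 it helps to confirm which addresses HAProxy is actually configured to bind. The sketch below is an illustration only: `binds_for_ip` is a hypothetical helper, the default path is taken from the hardening section of these notes, and it assumes bind lines of the form `bind <ip>:<port>[,<ip>:<port>...]`.

```shell
# List the bind lines in an HAProxy config that reference the given IP address.
binds_for_ip() (
    ip="$1"
    cfg="${2:-/etc/haproxy/haproxy.cfg}"
    awk -v ip="$ip" '
        $1 == "bind" {
            # A bind line may list several comma-separated address:port pairs.
            n = split($2, addrs, ",")
            for (i = 1; i <= n; i++) {
                # Match only when the pair starts with exactly "<ip>:".
                if (index(addrs[i], ip ":") == 1) { print; break }
            }
        }' "$cfg"
)
```

After step 3, both the original service IP and the address used during the pilot should produce output; after step 5, only the original service IP should.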

Hardening SnaptOS

File permissions

Review the following files and directories for appropriate permissions and adjust as necessary:

  • /etc/lighttpd/lighttpd.conf
  • /etc/keepalived/keepalived.conf
  • /etc/haproxy/haproxy.cfg
  • /etc/nginx
  • /var/snapt/certs
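One way to start the review is to list anything under those paths that grants permissions to "other". This is a minimal sketch assuming GNU find and GNU stat (as shipped with openSUSE); the function name `world_accessible` is hypothetical, and whether a given entry is acceptable remains a judgment call.

```shell
# Print mode, owner:group and path for every entry under the given paths
# that has any read, write or execute bit set for "other".
world_accessible() (
    for path in "$@"; do
        # -perm /007 matches if any of the "other" bits is set (GNU find).
        # Symlinks are skipped because their own mode is not meaningful.
        find "$path" ! -type l -perm /007 -exec stat -c '%a %U:%G %n' {} +
    done
)
```

For example, `world_accessible /etc/haproxy/haproxy.cfg /etc/nginx /var/snapt/certs` prints nothing when none of those entries are accessible to other users.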

Web-accessible resources

Review the url.access-deny parameter in /etc/lighttpd/lighttpd.conf to ensure access to sensitive files and directories is prohibited.

The following command may be used to periodically review and update this blacklist with any new files or directories introduced after framework updates:

find /srv/www/htdocs -type f | grep -Ev '\.(php|js|css|html?|map|gif|png|jpg|less|woff|eot|svg|ttf|scss|md|otf|woff2)$'
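When the command above returns many paths, grouping the results by extension makes it easier to decide what belongs in url.access-deny. The following sketch reuses the same find and grep filter; the function name `unexpected_extensions` is hypothetical and not part of the original notes.

```shell
# Count files under a document root by extension, skipping the extensions
# that are treated as expected web assets by the filter in the notes.
# Files without a dot in their name are listed by file name instead.
unexpected_extensions() (
    docroot="${1:-/srv/www/htdocs}"
    find "$docroot" -type f \
        | grep -Ev '\.(php|js|css|html?|map|gif|png|jpg|less|woff|eot|svg|ttf|scss|md|otf|woff2)$' \
        | sed -e 's|.*/||' -e 's|.*\.|.|' \
        | sort | uniq -c | sort -rn
)
```

Each output line is a count followed by an extension, most frequent first, so new entries introduced by a framework update tend to stand out.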

Review and deploy appropriate DH parameters

See Guide to Deploying Diffie-Hellman for TLS

Applies to: lighttpd, haproxy, and nginx

/etc/lighttpd/lighttpd.conf

#### SNAPT OPTIONS ####
# Change this port to alter the port Snapt runs on
server.port=8080
# Configure a single IP for Snapt to bind to?
server.bind = "localhost"
# IMPORTANT: Change this line to a single IP before moving to production!
$SERVER["socket"] == "0.0.0.0:8081" {
ssl.engine = "enable"
ssl.pemfile = "/etc/lighttpd/lighttpd.pem"
}
#### DO NOT CHANGE ####
var.log_root = "/var/log/lighttpd"
var.server_root = "/srv/www"
var.state_dir = "/var/run"
var.home_dir = "/var/lib/lighttpd"
var.conf_dir = "/etc/lighttpd"
include "modules.conf"
server.username = "lighttpd"
server.groupname = "lighttpd"
server.core-files = "disable"
server.document-root = server_root + "/htdocs"
server.tag = "snapt"
server.pid-file = state_dir + "/lighttpd.pid"
server.errorlog = log_root + "/error.log"
server.event-handler = "linux-sysepoll"
server.network-backend = "linux-sendfile"
server.max-fds = 2048
server.stat-cache-engine = "simple"
server.max-connections = 128
index-file.names += ( "index.php" )
url.access-deny = ( "~", ".inc" , ".ini" , ".db" , ".snp" )
$HTTP["url"] =~ "\.pdf$" { server.range-requests = "disable" }
url.rewrite-once = ( "/(.*)\.(.*)" => "$0", "^/([^?]*)(.*)" => "/index.php$2", "^/([^.]+)$" => "/index.php", "^/$" => "/index.php" )
static-file.exclude-extensions = ( ".php", ".pl", ".fcgi", ".scgi" )
include "conf.d/mime.conf"
include "conf.d/dirlisting.conf"
server.follow-symlink = "enable"
server.upload-dirs = ( "/var/tmp" )
ssl.honor-cipher-order = "enable"
ssl.cipher-list = "EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH"
ssl.use-compression = "disable"
setenv.add-response-header = ( "X-Frame-Options" => "DENY", "X-Content-Type-Options" => "nosniff" )
ssl.use-sslv2 = "disable"
ssl.use-sslv3 = "disable"
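The lighttpd configuration carries a reminder to replace the 0.0.0.0 socket with a single IP before production. A small check such as the one below can flag a socket block that still listens on all interfaces. It is a sketch only: `wildcard_listeners` is a hypothetical name, and it only recognizes `$SERVER["socket"]` conditions written on one line.

```shell
# Print (with line numbers) lighttpd socket conditions that still listen on
# all interfaces, i.e. "0.0.0.0:<port>" or ":<port>".
# Returns non-zero when no such line is found.
wildcard_listeners() (
    conf="${1:-/etc/lighttpd/lighttpd.conf}"
    grep -nE '^[[:space:]]*\$SERVER\["socket"][[:space:]]*==[[:space:]]*"(0\.0\.0\.0)?:[0-9]+"' "$conf"
)
```

This only inspects the text of the file. After editing, the syntax itself can be checked with lighttpd's own test mode (`lighttpd -t -f /etc/lighttpd/lighttpd.conf`) before reloading the service.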

/etc/ssh/sshd_config

# $OpenBSD: sshd_config,v 1.98 2016/02/17 05:29:04 djm Exp $
# This is the sshd server system-wide configuration file. See
# sshd_config(5) for more information.
# This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin
# The strategy used for options in the default sshd_config shipped with
# OpenSSH is to specify options with their default value where
# possible, but leave them commented. Uncommented options override the
# default value.
Port 22
#AddressFamily any
# IMPORTANT: Change this line to a single IP before moving to production!
#ListenAddress 0.0.0.0
#ListenAddress ::
# Disable legacy (protocol version 1) support in the server for new
# installations. In future the default will change to require explicit
# activation of protocol 1
Protocol 2
# HostKey for protocol version 1
#HostKey /etc/ssh/ssh_host_key
# HostKeys for protocol version 2
HostKey /etc/ssh/ssh_host_rsa_key
#HostKey /etc/ssh/ssh_host_dsa_key
#HostKey /etc/ssh/ssh_host_ecdsa_key
#HostKey /etc/ssh/ssh_host_ed25519_key
# Minimum accepted size of the DH parameter p. By default this is set to 1024
# to maintain compatibility with RFC4419, but should be set higher.
# Upstream default is identical to setting this to 2048.
#KexDHMin 1024
# Lifetime and size of ephemeral version 1 server key
#KeyRegenerationInterval 1h
#ServerKeyBits 1024
# Ciphers and keying
#RekeyLimit default none
# Logging
# obsoletes QuietMode and FascistLogging
SyslogFacility AUTHPRIV
LogLevel INFO
# Authentication:
LoginGraceTime 60
PermitRootLogin no
#StrictModes yes
MaxAuthTries 4
#MaxSessions 10
#RSAAuthentication yes
#PubkeyAuthentication yes
# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
# but this is overridden so installations will only check .ssh/authorized_keys
AuthorizedKeysFile .ssh/authorized_keys
#AuthorizedPrincipalsFile none
#AuthorizedKeysCommand none
#AuthorizedKeysCommandUser nobody
# For this to work you will also need host keys in /etc/ssh/ssh_known_hosts
#RhostsRSAAuthentication no
# similar for protocol version 2
HostbasedAuthentication no
# Change to yes if you don't trust ~/.ssh/known_hosts for
# RhostsRSAAuthentication and HostbasedAuthentication
#IgnoreUserKnownHosts no
# Don't read the user's ~/.rhosts and ~/.shosts files
IgnoreRhosts yes
# To disable tunneled clear text passwords, change to no here!
PasswordAuthentication no
PermitEmptyPasswords no
# Change to no to disable s/key passwords
ChallengeResponseAuthentication no
# Kerberos options
#KerberosAuthentication no
#KerberosOrLocalPasswd yes
#KerberosTicketCleanup yes
#KerberosGetAFSToken no
#KerberosUseKuserok yes
# GSSAPI options
GSSAPIAuthentication no
GSSAPICleanupCredentials yes
#GSSAPIStrictAcceptorCheck yes
#GSSAPIKeyExchange no
# Set this to 'yes' to enable PAM authentication, account processing,
# and session processing. If this is enabled, PAM authentication will
# be allowed through the ChallengeResponseAuthentication and
# PasswordAuthentication. Depending on your PAM configuration,
# PAM authentication via ChallengeResponseAuthentication may bypass
# the setting of "PermitRootLogin without-password".
# If you just want the PAM account and session checks to run without
# PAM authentication, then enable this but set PasswordAuthentication
# and ChallengeResponseAuthentication to 'no'.
UsePAM yes
#AllowAgentForwarding yes
#AllowTcpForwarding yes
#GatewayPorts no
X11Forwarding no
#X11DisplayOffset 10
#X11UseLocalhost yes
#PrintMotd yes
#PrintLastLog yes
#TCPKeepAlive yes
#UseLogin no
#UsePrivilegeSeparation yes
PermitUserEnvironment no
#Compression delayed
ClientAliveInterval 300
ClientAliveCountMax 5
#ShowPatchLevel no
UseDNS no
#PidFile /run/sshd.pid
#MaxStartups 10:30:100
#PermitTunnel no
#ChrootDirectory none
#VersionAddendum none
# no default banner path
Banner none
# override default of no subsystems
Subsystem sftp /usr/lib/ssh/sftp-server
# This enables accepting locale environment variables LC_* LANG, see sshd_config(5).
AcceptEnv LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
AcceptEnv LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT
AcceptEnv LC_IDENTIFICATION LC_ALL
# Example of overriding settings on a per-user basis
#Match User anoncvs
# X11Forwarding no
# AllowTcpForwarding no
# PermitTTY no
# ForceCommand cvs server
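
After deploying the sshd_config above, the hardening-related directives can be read back from the file to confirm they survived the copy. This is a simplified sketch: `sshd_directive` and `sshd_expect` are hypothetical helpers, they only look at global settings before the first Match block, and they do not follow Include files.

```shell
# Print the value of a directive in an sshd_config style file.
# sshd uses the first value it obtains for a keyword, and keywords are
# case-insensitive, so the first uncommented match wins here too.
sshd_directive() (
    key="$1"
    conf="${2:-/etc/ssh/sshd_config}"
    awk -v key="$key" '
        $1 ~ /^#/ { next }                      # skip comment lines
        tolower($1) == "match" { exit }         # stop at per-user overrides
        tolower($1) == tolower(key) { print $2; exit }
    ' "$conf"
)

# Succeed only if every "Keyword value" pair given matches the file.
sshd_expect() (
    conf="$1"; shift
    status=0
    while [ "$#" -ge 2 ]; do
        actual=$(sshd_directive "$1" "$conf")
        if [ "$actual" != "$2" ]; then
            echo "MISMATCH: $1 is '${actual:-unset}', expected '$2'" >&2
            status=1
        fi
        shift 2
    done
    exit "$status"
)
```

A possible invocation is `sshd_expect /etc/ssh/sshd_config PasswordAuthentication no PermitRootLogin no ChallengeResponseAuthentication no X11Forwarding no`. For an authoritative view of the effective configuration, sshd's extended test mode (`sshd -T`, run as root) is the better source.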