Once we know how many nodes of what size and type we're talking about, we'll want to roll it out in phases.
We have a few options along the way.
All hardware prices below are list prices (no tax included) as of 20100615, and better deals could undoubtedly be found.
"server" prices are estimates for Dell 2970s with DRAC boards, redundant power supplies
Costs for : server racks,
cabling,
patch panels,
power distribution units,
breaker boxes,
uninterruptible power supplies,
generators,
and uplink bandwith
have been omitted as a Colocation facility is assumed.
If self-hosting, these items will be in scope.
Parts list:
$ 4k - 24-port GigE Cisco switch (supports EtherChannel and VLANs)
$ 3k - Cyclades console server with a modem
$ 8k - ESX server, 8 cores, 64GB RAM (dual dual-port NICs, but only 2 disks)
$10k - ESX server, 8 cores, 64GB RAM (dual dual-port NICs, fully populated: 8 500GB disks)
$ 5k - OpenFiler server, 4GB RAM (dual dual-port NICs, fully populated: 8 500GB disks)
$ 9k - 2-socket ESX license
$ 8k - vCenter license
$20k - Cisco ASA 5520 firewall (optional)
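The stage price tags below can be sanity-checked against this parts list. Here's a minimal roll-up sketch in Python (the per-stage part mix is my assumption, inferred from the stage descriptions that follow; prices are the list prices above):

# Hypothetical bill-of-materials roll-up. Prices are the list prices
# above (USD, circa 20100615); the per-stage part mix is assumed from
# the stage descriptions that follow.
PARTS = {
    "gige_switch":     4_000,   # 24-port GigE Cisco switch
    "console_server":  3_000,   # Cyclades console server w/modem
    "esx_2disk":       8_000,   # ESX server, 8 cores, 64GB RAM, 2 disks
    "esx_8disk":      10_000,   # ESX server, fully populated, 8x 500GB
    "openfiler":       5_000,   # OpenFiler server, 8x 500GB
    "esx_license":     9_000,   # 2-socket ESX license
    "vcenter_license": 8_000,   # vCenter license
    "asa_5520":       20_000,   # Cisco ASA 5520 firewall (optional)
}

def total(bom):
    """Sum a bill of materials expressed as {part_name: quantity}."""
    return sum(PARTS[part] * qty for part, qty in bom.items())

# Stage 3 below: 2 ESX hosts (2-disk), 2 OpenFilers, 2 switches.
stage3 = {"esx_2disk": 2, "openfiler": 2, "gige_switch": 2}
print(total(stage3))  # 34000, matching the ~$34k figure for Stage 3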
A "virtual machine" is defined as: 1 virtual CPU, 2GB RAM, 30GB of HDD (10GB Operating System / 20GB of Applications)
If you need N CPUs or X times as much disk, then divide the number of supported VMs by N or X (whichever is greater)
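As a worked example of that sizing rule, a small sketch (the helper and its name are illustrative; the 1 vCPU / 2GB / 30GB standard and the divide-by-the-larger-multiple rule are the ones just stated):

# A worked example of the sizing rule above. The ~50-VM host rating
# is the Stage 1 figure below; the helper itself is illustrative.
def supported_vms(rated_standard_vms, vcpus=1, disk_gb=30):
    """A 'standard' VM is 1 vCPU / 2GB RAM / 30GB disk. A bigger VM
    needing N vCPUs or X times the disk consumes max(N, X) standard
    VM slots, so divide the host's rating by that factor."""
    n = vcpus            # CPU multiple vs. the 1-vCPU standard
    x = disk_gb / 30.0   # disk multiple vs. the 30GB standard
    return int(rated_standard_vms / max(n, x, 1))

print(supported_vms(50))               # 50 standard VMs
print(supported_vms(50, vcpus=2))      # 25 dual-vCPU VMs
print(supported_vms(50, disk_gb=90))   # 16 VMs with 3x the disk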
Labor costs: labor will come to about 20% of the hardware cost (just the hardware, not the hardware + licensing cost).
On average, for these designs, this works out about right. Add $5k/week if someone needs to be on site, which
shouldn't be necessary once the console server and hosts are cabled correctly (the rack, stack, cabling, and DRAC
configuration). Once that's done, the rest of the work can be done remotely (over a reasonable link), taking
anywhere from one day to two weeks depending on the design. But I've seen the cabling take months in large shops.
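The same rule of thumb as a sketch (the function is illustrative; the 20% and $5k/week figures are the ones above):

# A sketch of the labor rule of thumb above: ~20% of hardware cost
# (hardware only, not licenses), plus $5k per on-site week.
def labor_estimate(hardware_cost, onsite_weeks=0):
    return int(0.20 * hardware_cost + 5_000 * onsite_weeks)

# e.g. Stage 4 below (~$34k hardware) with one week of rack/stack/cabling:
print(labor_estimate(34_000, onsite_weeks=1))  # 11800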
################################################################################
# Phase I: Virtualization of Existing environments
################################################################################
Stage 1: (prototyping) [ ~$10k of hardware ]
Single ESXi server with internal disks.
With 12 cores and 64GB of RAM, it's capable of running about 50 virtual machines (memory will most likely be the first constraint).
--
No licenses needed, as ESXi is free.
RISK: *no redundancy of any kind*. (A hardware problem on the box takes all 50 VMs down with it.)
*only recommended for lab environments*
Stage 2: (adding ESXi redundancy or doubling capacity) [ ~$21k of hardware ]
Dual ESXi servers + 1 OpenFiler server.
With 24 cores and 128GB RAM, this will support 100 VMs max (or 50 w/failover); RAM or disk will probably be the first constraint.
--
No licenses needed, as ESXi is free.
RISK: no redundancy if running at capacity
RISK: no storage redundancy
RISK: no automatic failover
RISK: no support from vendor
(If the storage node fails but the disks are still good, another machine could be manually
re-purposed as the OpenFiler; everything would be down in the meantime.)
*only recommended for lab environments*
Stage 3: (SAN redundancy) [ ~$34k of hardware ]
2 ESXi servers + 2 OpenFiler servers w/DRBD replication, redundant Cisco switches.
With 24 cores and 128GB RAM, this will support 100 VMs max (50 max w/failover).
--
Previous stages required wiring servers directly to one another via NICs; this stage adds switches
for additional configuration options such as VLANs and EtherChanneled ports. It also adds redundant storage.
Switches may be omitted if the colocation provider will carve out VLANs for the ESX servers.
If one OpenFiler dies, the other can take its IP and recover automatically. (There may be some disk
inconsistencies that the journaling filesystems would have to recover should a hard crash occur.)
RISK: all recovery of failed nodes is manual
RISK: losing the configuration on the switches requires someone on location to re-configure them
RISK: no support contract from vendor
--
*only recommended for lab environments*
Stage 4: (adds management features that abstract the workload from the hardware) [ $34k hardware + $26k licenses ]
2 ESX servers + 2 OpenFiler servers w/DRBD, Cisco switches, vCenter managed, console server.
--
Adding the console server allows remote re-configuration of the switches and serial access to the DRACs, even by modem.
With 24 cores and 128GB RAM, this will support 100 VMs max (50 max w/failover).
This is identical to stage 3 except that VMware ESX replaces ESXi, and vCenter licenses are acquired.
Added:
VMotion: virtual machines may be migrated between ESX servers without an outage (with the VM in production)
Maintenance mode: all VMs may be evacuated from an ESX server with one command
(the hardware can be worked on without a VM outage)
Storage VMotion: virtual machine disks may be moved between OpenFiler boxes without a VM outage
VMware HA: in the event of an ESX server failure, all VMs are automatically recovered on the second ESX server
VMware DRS: load between the nodes is automatically balanced with VMotion
Snapshots, linked clones, and other features to deduplicate storage use.
RISK: split-brain cluster situation possible if networking is lost and both ESX servers go into isolation mode
RISK: OpenFiler is not on the VMware HCL, although it works with no issues
Stage 5: (reduces the split-brain risk, adds capacity) [ $42k hardware + $35k licenses ]
3 ESX servers + 2 OpenFiler servers w/DRBD, Cisco switches, vCenter managed.
With 36 cores and 192GB RAM, this will support 150 VMs max (75 max w/failover); disk may be the first constraint.
--
Split brain may be avoided by disabling HA manually before entering maintenance mode.
RISK: split-brain cluster situation possible if one system is in maintenance mode
RISK: OpenFiler is not on the VMware HCL, although it works with no issues
Stage 6: (removes the split-brain risk, adds capacity) [ $55k hardware + $44k licenses ]
4 ESX servers + 3 OpenFiler servers w/DRBD, Cisco switches, vCenter managed, console server.
--
Adds a spare node to prevent split-brain HA when a node is in maintenance mode (without disabling HA).
Adds another OpenFiler to keep disk from constraining our ~150 VMs.
With 48 cores and 256GB RAM, this will support 200 VMs max (100 w/failover); disk will be the first constraint.
RISK: the infrastructure can withstand any single component failure and some, but not all, dual component failures
RISK: OpenFiler is not on the VMware HCL, although it works with no issues
Stage 7 and beyond: at this point, adding: [ +$13k for every 50 VMs of added capacity ]
$8k in hardware for an ESX server adds 50 VMs of CPU & RAM capacity
$5k in hardware for an OpenFiler adds 50 VMs of disk capacity
(only 1/2 the storage is usable; the other half is a DRBD target for another OpenFiler, for failover)
(with the caveat that the DRBD configurations become very complex beyond 2-3 nodes [ a->b, b->c, c->a ]; see the sketch below)
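A minimal sketch of this growth math, and of the circular DRBD pairing behind the complexity caveat (node and helper names are made up for illustration):

# A sketch of the growth math above, and of the circular DRBD pairing
# ([ a->b, b->c, c->a ]) behind the complexity caveat.
ESX_NODE = 8_000        # adds ~50 VMs of CPU/RAM capacity
OPENFILER = 5_000       # adds ~50 VMs of disk (half of it a DRBD target)

def growth_cost(extra_vms):
    """Cost of growing in 50-VM increments: $13k per increment."""
    increments = -(-extra_vms // 50)   # ceiling division
    return increments * (ESX_NODE + OPENFILER)

def drbd_ring(nodes):
    """Each node replicates to the next, wrapping at the end."""
    return [(node, nodes[(i + 1) % len(nodes)])
            for i, node in enumerate(nodes)]

print(growth_cost(150))                     # 39000 for ~150 more VMs
print(drbd_ring(["of-a", "of-b", "of-c"]))  # a->b, b->c, c->a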
Adding an enterprise SAN (~$50k-$100k) and buying 3 more ESX licenses (~$27k), plus RAM for the OpenFilers, would
add 150 more VMs of capacity and make LUN management/expansion significantly easier beyond this point; but it more
than doubles the amount already spent on hardware and licensing.
--
RISK: *it is strongly recommended that the OpenFilers be replaced with enterprise storage beyond this point (see below)*
Additional switches will need to be purchased upon port depletion (4 ports used per server, plus uplink ports).
Additional console servers will be needed upon serial port depletion (1 per switch, 1 per server).
VMware support contracts will incur a yearly renewal.
# Enterprise Storage Consideration
When you get much larger than 4 ESX servers / 3 OpenFilers, replacing the OpenFilers with a Storage Area Network (SAN)
like a CLARiiON or NetApp (either Fibre Channel or iSCSI) should be given serious consideration for performance and
ease of management and expansion. An entry-level enterprise SAN can cost $50k-$300k (depending on whether
fibre switches are needed, how many shelves of what size disk, how close you are to the end of the quarter,
etc.). RAM can be added to the old OpenFiler nodes and they can be re-purposed as ESX nodes (or still used
as tier-2 storage, or lab storage).
################################################################################
# Phase II: infrastructure; some VMs to manage the cluster, user management, etc.
################################################################################
Deployment of base operating systems, vCenter servers, ESX servers, OpenFiler
servers, Cisco switches, console servers, LDAP servers, AD servers
(if desired), and AD/LDAP password synchronization (if desired).
################################################################################
# Phase III: Configuration management of applications (difficulty will depend
# on what lives in configuration files and what lives in .jar/.war files)
################################################################################
Extend the base operating system configuration rules to include customer
applications and their dependencies.
################################################################################
# Phase IV: Automated backups (to Iron Mountain and/or elsewhere, on demand
# and/or at regular intervals) [ optional phase ]
################################################################################
################################################################################
# Phase V: Automated deployments of customer stacks (Phase I is a prerequisite
# for this, as we can't provision bare hardware on demand)
################################################################################
################################################################################
# Phase VI: Monitoring for performance and fault-tolerance
################################################################################
################################################################################
# Phase VII: Elasticity: expanding and reducing tiers horizontally based on
# load (Phases II, III, V, & VI are prerequisites for this phase)
################################################################################
################################################################################
# Phase VIII: Hooking the automated deployments into a Customer Relationship
# Management (CRM) system (this will depend on which CRM, and there would
# be development hours) [ optional phase ]
################################################################################
################################################################################
# Phase IX: Migration into the cloud (move off ESX and to other cloud services)
# [ optional phase ]
################################################################################
################################################################################
# Phase X: DR components ( geographical disparity, site-to-site replication )
# [ optional phase ]
################################################################################