Once we know how many nodes of what size and type we're talking about, we'll want to roll it out in phases.
We have a few options along the way.
All hardware prices below are list prices (no tax included) as of 20100615, and better deals could undoubtedly be found.
"server" prices are estimates for Dell 2970s with DRAC boards, redundant power supplies
Costs for : server racks,
cabling,
patch panels,
power distribution units,
breaker boxes,
uninterruptible power supplies,
generators,
and uplink bandwith
have been omitted as a Colocation facility is assumed.
If self-hosting, these items will be in scope.
Parts list:
$ 4k - 24-port GigE Cisco switch (supports EtherChannel and VLANs)
$ 3k - Cyclades console server with a modem
$ 8k - ESX server, 8 cores, 64GB RAM (dual dual-port NICs, but only 2 disks)
$10k - ESX server, 8 cores, 64GB RAM (dual dual-port NICs, fully populated: 8 500GB disks)
$ 5k - OpenFiler server, 4GB RAM (dual dual-port NICs, fully populated: 8 500GB disks)
$ 9k - 2-socket ESX license
$ 8k - vCenter license
$20k - Cisco ASA 5520 firewall (optional)
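The stage price tags below can be sanity-checked against this parts list. Here's a minimal roll-up sketch in Python (the per-stage part mix is my assumption, inferred from the stage descriptions that follow; prices are the list prices above):

# Hypothetical bill-of-materials roll-up. Prices are the list prices
# above (USD, circa 20100615); the per-stage part mix is assumed from
# the stage descriptions that follow.
PARTS = {
    "gige_switch":     4_000,   # 24-port GigE Cisco switch
    "console_server":  3_000,   # Cyclades console server w/modem
    "esx_2disk":       8_000,   # ESX server, 8 cores, 64GB RAM, 2 disks
    "esx_8disk":      10_000,   # ESX server, fully populated, 8x 500GB
    "openfiler":       5_000,   # OpenFiler server, 8x 500GB
    "esx_license":     9_000,   # 2-socket ESX license
    "vcenter_license": 8_000,   # vCenter license
    "asa_5520":       20_000,   # Cisco ASA 5520 firewall (optional)
}

def total(bom):
    """Sum a bill of materials expressed as {part_name: quantity}."""
    return sum(PARTS[part] * qty for part, qty in bom.items())

# Stage 3 below: 2 ESX hosts (2-disk), 2 OpenFilers, 2 switches.
stage3 = {"esx_2disk": 2, "openfiler": 2, "gige_switch": 2}
print(total(stage3))  # 34000, matching the ~$34k figure for Stage 3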
A "virtual machine" is defined as: 1 virtual CPU, 2GB RAM, 30GB of HDD (10GB Operating System / 20GB of Applications)
If you need N CPUs or X times as much disk, then divide the number of supported VMs by N or X (whichever is greater)
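As a worked example of that sizing rule, a small sketch (the helper and its name are illustrative; the 1 vCPU / 2GB / 30GB standard and the divide-by-the-larger-multiple rule are the ones just stated):

# A worked example of the sizing rule above. The ~50-VM host rating
# is the Stage 1 figure below; the helper itself is illustrative.
def supported_vms(rated_standard_vms, vcpus=1, disk_gb=30):
    """A 'standard' VM is 1 vCPU / 2GB RAM / 30GB disk. A bigger VM
    needing N vCPUs or X times the disk consumes max(N, X) standard
    VM slots, so divide the host's rating by that factor."""
    n = vcpus            # CPU multiple vs. the 1-vCPU standard
    x = disk_gb / 30.0   # disk multiple vs. the 30GB standard
    return int(rated_standard_vms / max(n, x, 1))

print(supported_vms(50))               # 50 standard VMs
print(supported_vms(50, vcpus=2))      # 25 dual-vCPU VMs
print(supported_vms(50, disk_gb=90))   # 16 VMs with 3x the disk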
Labor costs: labor will come to about 20% of the hardware cost (just the hardware, not the hardware + licensing cost).
On average, for these designs, this works out about right. Add $5k/week if someone needs to be on site, which
shouldn't be necessary once the console server and hosts are cabled correctly (the rack, stack, cabling, and DRAC
configuration). Once that's done, the rest of the work can be done remotely (over a reasonable link), taking
anywhere from one day to two weeks depending on the design. But I've seen the cabling take months in large shops.
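The same rule of thumb as a sketch (the function is illustrative; the 20% and $5k/week figures are the ones above):

# A sketch of the labor rule of thumb above: ~20% of hardware cost
# (hardware only, not licenses), plus $5k per on-site week.
def labor_estimate(hardware_cost, onsite_weeks=0):
    return int(0.20 * hardware_cost + 5_000 * onsite_weeks)

# e.g. Stage 4 below (~$34k hardware) with one week of rack/stack/cabling:
print(labor_estimate(34_000, onsite_weeks=1))  # 11800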
################################################################################
# Phase I: Virtualization of Existing environments
################################################################################
Stage 1: (prototyping) [ ~$10k of hardware ]
Single ESXi server with internal disks.
With 12 cores and 64GB of RAM, it's capable of running about 50 virtual machines (memory will most likely be the first constraint).
--
No licenses needed, as ESXi is free.
RISK: *no redundancy of any kind*. (A hardware problem on the box takes all 50 VMs down with it.)
*only recommended for lab environments*
Stage 2: (adding ESXi redundancy or doubling capacity) [ ~$21k of hardware ]
Dual ESXi servers + 1 OpenFiler server.
With 24 cores and 128GB RAM, this will support 100 VMs max (or 50 w/failover); RAM or disk will probably be the first constraint.
--
No licenses needed, as ESXi is free.
RISK: no redundancy if running at capacity
RISK: no storage redundancy
RISK: no automatic failover
RISK: no support from vendor
(If the storage node fails but the disks are still good, another machine could be manually
re-purposed as the OpenFiler; everything would be down in the meantime.)
*only recommended for lab environments*
Stage 3: (SAN redundancy) [ ~$34k of hardware ]
2 ESXi servers + 2 OpenFiler servers w/DRBD replication, redundant Cisco switches.
With 24 cores and 128GB RAM, this will support 100 VMs max (50 max w/failover).
--
Previous stages required wiring servers directly to one another via NICs; this stage adds switches
for additional configuration options such as VLANs and EtherChanneled ports. It also adds redundant storage.
Switches may be omitted if the colocation provider will carve out VLANs for the ESX servers.
If one OpenFiler dies, the other can take its IP and recover automatically. (There may be some disk
inconsistencies that the journaling filesystems would have to recover should a hard crash occur.)
RISK: all recovery of failed nodes is manual
RISK: losing the configuration on the switches requires someone on location to re-configure them
RISK: no support contract from vendor
--
*only recommended for lab environments*
Stage 4: (adds management features that abstract the workload from the hardware) [ $34k hardware + $26k licenses ]
2 ESX servers + 2 OpenFiler servers w/DRBD, Cisco switches, vCenter managed, console server.
--
Adding the console server allows remote re-configuration of the switches and serial access to the DRACs, even by modem.
With 24 cores and 128GB RAM, this will support 100 VMs max (50 max w/failover).
This is identical to stage 3 except that VMware ESX replaces ESXi, and vCenter licenses are acquired.
Added:
VMotion: virtual machines may be migrated between ESX servers without an outage (with the VM in production)
Maintenance mode: all VMs may be evacuated from an ESX server with one command
(the hardware can be worked on without a VM outage)
Storage VMotion: virtual machine disks may be moved between OpenFiler boxes without a VM outage
VMware HA: in the event of an ESX server failure, all VMs are automatically recovered on the second ESX server
VMware DRS: load between the nodes is automatically balanced with VMotion
Snapshots, linked clones, and other features to deduplicate storage use.
RISK: split-brain cluster situation possible if networking is lost and both ESX servers go into isolation mode
RISK: OpenFiler is not on the VMware HCL, although it works with no issues
Stage 5: (reduces the split-brain risk, adds capacity) [ $42k hardware + $35k licenses ]
3 ESX servers + 2 OpenFiler servers w/DRBD, Cisco switches, vCenter managed.
With 36 cores and 192GB RAM, this will support 150 VMs max (75 max w/failover); disk may be the first constraint.
--
Split brain may be avoided by disabling HA manually before entering maintenance mode.
RISK: split-brain cluster situation possible if one system is in maintenance mode
RISK: OpenFiler is not on the VMware HCL, although it works with no issues
Stage 6: (removes the split-brain risk, adds capacity) [ $55k hardware + $44k licenses ]
4 ESX servers + 3 OpenFiler servers w/DRBD, Cisco switches, vCenter managed, console server.
--
Adds a spare node to prevent split-brain HA when a node is in maintenance mode (without disabling HA).
Adds another OpenFiler to keep disk from constraining our ~150 VMs.
With 48 cores and 256GB RAM, this will support 200 VMs max (100 w/failover); disk will be the first constraint.
RISK: the infrastructure can withstand any single component failure and some, but not all, dual component failures
RISK: OpenFiler is not on the VMware HCL, although it works with no issues
Stage 7 and beyond: at this point, adding: [ +$13k for every 50 VMs of added capacity ]
$8k in hardware for an ESX server adds 50 VMs of CPU & RAM capacity
$5k in hardware for an OpenFiler adds 50 VMs of disk capacity
(only 1/2 the storage is usable; the other half is a DRBD target for another OpenFiler, for failover)
(with the caveat that the DRBD configurations become very complex beyond 2-3 nodes [ a->b, b->c, c->a ]; see the sketch below)
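A minimal sketch of this growth math, and of the circular DRBD pairing behind the complexity caveat (node and helper names are made up for illustration):

# A sketch of the growth math above, and of the circular DRBD pairing
# ([ a->b, b->c, c->a ]) behind the complexity caveat.
ESX_NODE = 8_000        # adds ~50 VMs of CPU/RAM capacity
OPENFILER = 5_000       # adds ~50 VMs of disk (half of it a DRBD target)

def growth_cost(extra_vms):
    """Cost of growing in 50-VM increments: $13k per increment."""
    increments = -(-extra_vms // 50)   # ceiling division
    return increments * (ESX_NODE + OPENFILER)

def drbd_ring(nodes):
    """Each node replicates to the next, wrapping at the end."""
    return [(node, nodes[(i + 1) % len(nodes)])
            for i, node in enumerate(nodes)]

print(growth_cost(150))                     # 39000 for ~150 more VMs
print(drbd_ring(["of-a", "of-b", "of-c"]))  # a->b, b->c, c->a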
Adding an enterprise SAN (~$50k-$100k) and buying 3 more ESX licenses (~$27k), plus RAM for the OpenFilers, would
add 150 more VMs of capacity and make LUN management/expansion significantly easier beyond this point; but it more
than doubles the amount already spent on hardware and licensing.
--
RISK: *it is strongly recommended that the OpenFilers be replaced with enterprise storage beyond this point (see below)*
Additional switches will need to be purchased upon port depletion (4 ports used per server, plus uplink ports).
Additional console servers will be needed upon serial port depletion (1 per switch, 1 per server).
VMware support contracts will incur a yearly renewal.
# Enterprise Storage Consideration
When you get much larger than 4 ESX servers / 3 OpenFilers, replacing the OpenFilers with a Storage Area Network (SAN)
like a CLARiiON or NetApp (either Fibre Channel or iSCSI) should be given serious consideration for performance and
ease of management and expansion. An entry-level enterprise SAN can cost $50k-$300k (depending on whether
fibre switches are needed, how many shelves of what size disk, how close you are to the end of the quarter,
etc.). RAM can be added to the old OpenFiler nodes and they can be re-purposed as ESX nodes (or still used
as tier-2 storage, or lab storage).
################################################################################
# Phase II: infrastructure; some VMs to manage the cluster, user management, etc.
################################################################################
Deployment of base operating systems, vCenter servers, ESX servers, OpenFiler
servers, Cisco switches, console servers, LDAP servers, AD servers
(if desired), and AD/LDAP password synchronization (if desired).
################################################################################
# Phase III: Configuration management of applications (difficulty will depend
# on what lives in configuration files and what lives in .jar/.war files)
################################################################################
Extend the base operating system configuration rules to include customer
applications and their dependencies.
################################################################################
# Phase IV: Automated backups (to Iron Mountain and/or elsewhere, on demand
# and/or at regular intervals) [ optional phase ]
################################################################################
################################################################################
# Phase V: Automated deployments of customer stacks (Phase I is a prerequisite
# for this, as we can't provision bare hardware on demand)
################################################################################
################################################################################
# Phase VI: Monitoring for performance and fault-tolerance
################################################################################
################################################################################
# Phase VII: Elasticity: expanding and reducing tiers horizontally based on
# load (Phases II, III, V, & VI are prerequisites for this phase)
################################################################################
################################################################################
# Phase VIII: Hooking the automated deployments into a Customer Relationship
# Management (CRM) system (this will depend on which CRM, and there would
# be development hours) [ optional phase ]
################################################################################
################################################################################
# Phase IX: Migration into the cloud (move off ESX and to other cloud services)
# [ optional phase ]
################################################################################
################################################################################
# Phase X: DR components ( geographical disparity, site-to-site replication )
# [ optional phase ]
################################################################################