## Capacity Planning Notes


#### Is your target Public cloud or private cloud?
- Private Cloud
  - easier
  - you know your workload and purpose
  - Fewer variables
  - Cost / Revenue is an important consideration
  - often single tenant
  - Provisioning is normally easier / slower churn

- Public
  - Designing for a generic use case and workload.

- Important OpenStack Considerations
  - Network model
  - Neutron nodes vs. DVR
  - Assuming 10G networking?
  - Floating IPs (none now)
  - Future SDN provider?
  - Are we providing HA compute?
    - The trade-off is performance.

  - Volumes are mounted from SAN?

#### Hopes and dreams
  - We want to encourage every project to be developed from the ground up for redundancy.

#### Questions For US
  - What configuration management and provisioning tools are in use besides BOSH?
  - What other monitoring tools are in place in the organization besides what I've seen in the Grafana stack?
  - What hypervisor is being used? KVM? Hyper-V?
  - (Overcommit ratio * physical cores) / virtual cores per instance tells us how many VMs of a given flavor we can host on a compute node.
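
A rough Python sketch of the VMs-per-node formula above. The `Flavor` class, the flavor name, and the core counts are made-up examples, not values from our environment, and the calculation only looks at CPU (RAM and disk would cap the number further).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    """Hypothetical flavor description; name and sizes are examples only."""
    name: str
    vcpus: int


def vms_per_node(physical_cores: int, cpu_overcommit: float, flavor: Flavor) -> int:
    """(Overcommit ratio * physical cores) / vCPUs per instance, rounded down."""
    schedulable_vcpus = physical_cores * cpu_overcommit
    return int(schedulable_vcpus // flavor.vcpus)


# Assumed example: a 24-core node at 4:1 overcommit hosting 2-vCPU instances.
medium = Flavor(name="example.medium", vcpus=2)
print(vms_per_node(physical_cores=24, cpu_overcommit=4.0, flavor=medium))  # 48
```

Rounding down matters here: a partial instance is not schedulable, so the dashboard number should never overstate capacity.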

#### Questions for BC
- Look up performance characteristics of different vCPU-to-core ratios.

#### Cloud Controller Notes
- The box that hosts your DB, message queue, and API endpoints.
- Is our CC HA?
- How active is use of the OS API?

#### Neutron notes?
- How overloaded are the network nodes? (not very)

#### Thoughts

- Getting good, reliable data in a virtual environment can take tremendous effort.

- Can we assume cloud users will follow patterns similar to what we've already witnessed? That is not guaranteed, so we need to separate forecasts into behavior-driven and process-driven forecasts.

- CF planning and service planning, different dashboards?

- How well known are high utilization periods?
  - Make a list per app
  - per service
  - note the statistics of the app (framework, end user, etc.)
  - for network utilization concerns, we can co-locate apps with different utilization periods on the same platform.

- Load test new apps to get usage patterns. Ensure the dashboard and metrics are being collected during the test so we can get a footprint.

- Real time isn't the key; historical and aggregated data is what matters.
  - But being able to report on and spin up capacity quickly is important.
  - Our work would be the basis for autoscaling activities in the platform.

- Most infrastructures are under peak load for less than an hour or two per day. A widget that displays the peak load hour over time would be interesting.

- How about a visualization that shows peak usage windows per application?
  - over time
  - on average
  - network utilization
  - memory and cpu utilization?
  - can we even get at this reliably?
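
One possible way to find an application's peak usage window from historical samples, sketched in Python. The sample data and the `peak_hour` helper are hypothetical; where the samples would actually come from (Grafana, Ceilometer, something else) is still an open question.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def peak_hour(samples):
    """Return (hour_of_day, average_utilization) for the busiest hour.

    samples: iterable of (datetime, utilization) pairs for one application.
    """
    by_hour = defaultdict(list)
    for timestamp, utilization in samples:
        by_hour[timestamp.hour].append(utilization)
    # Average each hour of the day across all days in the sample set.
    averages = {hour: mean(values) for hour, values in by_hour.items()}
    busiest = max(averages, key=averages.get)
    return busiest, averages[busiest]


# Made-up CPU utilization samples (percent) for a hypothetical app.
samples = [
    (datetime(2015, 3, 2, 9), 35.0),
    (datetime(2015, 3, 2, 14), 80.0),
    (datetime(2015, 3, 3, 9), 45.0),
    (datetime(2015, 3, 3, 14), 90.0),
]
print(peak_hour(samples))  # (14, 85.0)
```

The same grouping would work for network or memory samples, which is one way to spot apps with non-overlapping peaks.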

- Do we care about quotas as well as capacity? Yes.

- Discriminatory pricing (look up the airline analogy).

- Difference between bare metal, VM, and container: start from the top or from the bottom?

- We need to come up with a list of forecasting models.

- Do we have any projections for future utilization? 6 months? 12 months?

- (Overcommit ratio * physical cores) / virtual cores per instance tells us how many VMs of a given flavor we can host on a compute node. We could put this in the dashboard.

- Assign a cost to a flavor and we have a value to display.

- We could theoretically show how many containers we could provide at a given point in time.

- We could show whether a full AZ is available with a red/green button.

- Make sure the ephemeral storage plan has enough capacity to support the numbers from the VM capacity calculation.
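
A minimal Python sketch of that check, assuming ephemeral disks live on the compute node's local storage. The function names and all sizes are hypothetical placeholders.

```python
def storage_bound_vms(node_local_gb: int, ephemeral_gb_per_vm: int,
                      reserved_gb: int = 0) -> int:
    """How many VMs the node's local disk can back, rounded down."""
    usable_gb = max(node_local_gb - reserved_gb, 0)
    return usable_gb // ephemeral_gb_per_vm


def effective_capacity(cpu_bound_vms: int, node_local_gb: int,
                       ephemeral_gb_per_vm: int, reserved_gb: int = 0):
    """Return (usable VM count, whether storage covers the CPU-based number)."""
    by_storage = storage_bound_vms(node_local_gb, ephemeral_gb_per_vm, reserved_gb)
    return min(cpu_bound_vms, by_storage), by_storage >= cpu_bound_vms


# Assumed example: CPU math says 48 VMs, but 1600 GB of local disk
# at 40 GB per VM only backs 40 of them.
print(effective_capacity(cpu_bound_vms=48, node_local_gb=1600,
                         ephemeral_gb_per_vm=40))  # (40, False)
```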

- If we know what flavor is used for the container host, this would be an interesting calculation with respect to CF.

- Mixing lots of flavors can hurt performance under KVM, something about time slice utilization. Maybe a flavor variance number?

- Do we have Ceilometer available in Juno?

- The number of variables in play is huge; we need to keep it simple to start.

- KVM with AMD at 2:1 or 4:1 overcommit gives great density, but you won't have the IOPS to support all of those VMs hitting local storage. This refers to the hardware from my OpenStack notes.

- Should we run UnixBench on a test stack? Has there ever been a stress test?

- Put like VMs on like compute nodes? Flavor + Nova scheduler can ensure a flavor goes to a specific host aggregate.

- How do we get meaningful metrics for network utilization from OS? Bare metal? An application that is a resource hog isn't necessarily one with a very active API. How do we segregate these?

- In what ways can we assign dollar values to the dashboard?
  - Do we want to show the total percentage of aggregate resources in use by a given app? CF? BOSH job?
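
A small Python sketch of the percentage-of-aggregate-resources idea. The app names, resource keys, and totals are invented for illustration; real numbers would have to come from whatever metrics source we settle on.

```python
def resource_share(usage_by_app, total_capacity):
    """Percentage of total capacity used by each app, per resource."""
    shares = {}
    for app, usage in usage_by_app.items():
        shares[app] = {
            resource: 100.0 * used / total_capacity[resource]
            for resource, used in usage.items()
        }
    return shares


# Hypothetical platform totals and per-app usage.
total = {"vcpus": 400, "ram_gb": 1600}
usage = {
    "app-a": {"vcpus": 40, "ram_gb": 400},
    "cf-runtime": {"vcpus": 100, "ram_gb": 200},
}
for app, share in resource_share(usage, total).items():
    print(app, share)
# app-a {'vcpus': 10.0, 'ram_gb': 25.0}
# cf-runtime {'vcpus': 25.0, 'ram_gb': 12.5}
```

Multiplying a share by an assumed cost for the underlying resource pool would be one way to turn these percentages into a dollar figure.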