@byllc
Created December 3, 2018 20:58
CapacityPlanningForOpenstack.md
## Capacity Planning Notes
#### Is your target Public cloud or private cloud?
- Private Cloud
  - easier
  - you know your workload and purpose
  - Fewer variables
  - Cost / Revenue is an important consideration
  - often single tenant
  - Provisioning normally easier / slower churn
- Public
  - Designing for a generic use case and workload.
- Important OpenStack Considerations
  - network model
    - Neutron network nodes vs. DVR
    - Assuming 10G networking?
    - Floating IPs (none now)
    - Future SDN provider?
  - Are we providing HA Compute?
    - Trade-off is performance.
  - Volumes are mounted from SAN?
#### Hopes and dreams
- We want to encourage every project being developed from the ground up for redundancy.
#### Questions For US
- What configuration management and provisioning tools are in use besides BOSH?
- What other monitoring tools are in place in the organization besides what I've seen in the Grafana stack?
- What hypervisor is being used? KVM? Hyper-V?
- (Overcommit ratio * physical cores) / virtual cores per instance tells us how many VMs of a given flavor we can host on a compute node.
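
The overcommit calculation in the last item can be sketched in a few lines. This is a rough sketch rather than a sizing tool: the function name, the 24-core node, the 4:1 ratio, and the 2-vCPU flavor are assumptions chosen for illustration. It looks at CPU only and ignores memory, disk, and whatever the hypervisor reserves for itself. The ratio corresponds to what Nova calls `cpu_allocation_ratio`.

```python
def vms_per_node(physical_cores: int, cpu_overcommit: float, vcpus_per_instance: int) -> int:
    """Estimate how many instances of one flavor fit on a compute node (CPU only)."""
    if vcpus_per_instance <= 0:
        raise ValueError("vcpus_per_instance must be positive")
    # Total vCPUs the scheduler may hand out on this node.
    schedulable_vcpus = physical_cores * cpu_overcommit
    # Round down: a partial instance cannot be scheduled.
    return int(schedulable_vcpus // vcpus_per_instance)


if __name__ == "__main__":
    # Hypothetical node: 24 physical cores, 4:1 overcommit, 2-vCPU flavor.
    print(vms_per_node(physical_cores=24, cpu_overcommit=4.0, vcpus_per_instance=2))  # 48
```

Multiplying the result by the number of compute nodes gives a first-order ceiling per flavor, which is the kind of figure a dashboard could display.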
#### Questions for BC
- Look up performance characteristics of different vCPU-to-core ratios.
#### Cloud Controller Notes
- Box that hosts your DB, message queue, and API endpoints
- Is our CC HA?
- How active is use of the OS API?
#### Neutron notes?
- How overloaded are the network nodes? (not very)
#### Thoughts
- getting good and reliable data in a virtual environment can be a tremendous effort.
- Assume cloud users will follow patterns similar to what we've witnessed already? This is not a guarantee, so we need to separate forecasts into behavior-driven and process-driven forecasts.
- CF planning and service planning, different dashboards?
- How well known are high-utilization periods?
  - Make a list per app
  - per service
  - note the statistics of the app (framework/end user/etc.)
- for network utilization concerns we can collocate apps with different utilization periods on the same platform.
- Load test new apps to get usage patterns; ensure the dashboard and metrics are being collected during the test so we can get a footprint.
- real time isn't the key, historical and aggregated data is what matters.
- but being able to report on and spin up capacity quickly is important
- our work would be the basis for autoscaling activities in the platform
- Most infrastructures are under peak load for less than an hour or two per day; a widget that displays the peak load hour over time would be interesting.
- How about a visualization that shows peak usage windows per application?
  - over time
  - on average
  - network utilization
  - memory and CPU utilization?
  - can we even get at this reliably?
- Do we care about quotas as well as capacity? Yes.
- discriminatory pricing (lookup airline analogy)
- difference between bare metal, vm, and container, start from top or start from bottom?
- we need to come up with a list of forecasting models.
- Do we have any projections for future utilization, 6 mo? 12mo?
- (Overcommit ratio * physical cores) / virtual cores per instance tells us how many VMs of a given flavor we can host on a compute node. We could put this in the dashboard.
- assign a cost to a flavor and we have value to display
- We could theoretically show how many containers we could provide for at a given point in time.
- We could show if we have a full AZ available red/green button.
- Make sure ephemeral storage plan has enough capacity to support numbers from the VM capacity number.
- If we know what flavor is used for the container host, this would be an interesting calculation WRT CF.
- Mixing lots of flavors can hurt performance under KVM, something about time slice utilization. Maybe a flavor variance number?
- Do we have Ceilometer Available in Juno?
- The number of variables in play is huge, we need to keep it simple to start.
- KVM with AMD at a 2:1 or 4:1 overcommit gives great density, but you won't have the IOPS to support all of those VMs hitting local storage. (This refers to the hardware from my OpenStack notes.)
- Should we run UnixBench on a test stack? Has there ever been a stress test?
- Putting like VMs on like compute nodes? Flavor + Nova scheduler can ensure a flavor goes to a specific host aggregate.
- How do we get meaningful metrics for network utilization from OS? Bare metal? An application that is a resource hog isn't necessarily one with a very active API. How do we segregate these?
- what ways can we assign dollar values to dashboard?
- Want to show total percentage of aggregate resources in use by a given app? CF? BOSH job?
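
The peak-usage-window idea from the notes above can be prototyped offline against historical samples, in line with the point that aggregated history matters more than real time. A minimal sketch, assuming metrics have already been exported as `(app, timestamp, utilization)` tuples; the tuple layout, the function name, and the sample values are hypothetical and not tied to any particular monitoring stack.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def peak_hour_per_app(samples):
    """Return {app: (hour_of_day, mean_utilization)} for each app's busiest hour.

    `samples` is an iterable of (app, timestamp, utilization) tuples, where
    utilization is any number (e.g. CPU percent) and timestamp is a datetime.
    """
    # Group every sample by app and by hour of day (0-23), across all days.
    by_app_hour = defaultdict(lambda: defaultdict(list))
    for app, ts, value in samples:
        by_app_hour[app][ts.hour].append(value)

    peaks = {}
    for app, hours in by_app_hour.items():
        hourly_mean = {hour: mean(values) for hour, values in hours.items()}
        busiest = max(hourly_mean, key=hourly_mean.get)
        peaks[app] = (busiest, hourly_mean[busiest])
    return peaks


if __name__ == "__main__":
    # Made-up samples for two hypothetical apps.
    demo = [
        ("web", datetime(2018, 12, 3, 9, 0), 40),
        ("web", datetime(2018, 12, 3, 14, 0), 90),
        ("web", datetime(2018, 12, 4, 14, 0), 70),
        ("batch", datetime(2018, 12, 3, 2, 0), 95),
        ("batch", datetime(2018, 12, 3, 14, 0), 10),
    ]
    for app, (hour, load) in sorted(peak_hour_per_app(demo).items()):
        print(f"{app}: peak hour {hour:02d}:00, mean utilization {load}")
```

Apps whose peak hours differ, like the two in the demo data, would be candidates for colocation on the same platform.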