Created
December 3, 2018 20:58
CapacityPlanningForOpenstack.md
## Capacity Planning Notes
#### Is your target public cloud or private cloud?
- Private cloud
  - Easier
  - You know your workload and purpose
  - Fewer variables
  - Cost / revenue is an important consideration
  - Often single tenant
  - Provisioning is normally easier, with slower churn
- Public cloud
  - Designing for a generic use case and workload
- Important OpenStack considerations
  - Network model
    - Neutron nodes vs. DVR
    - Assuming 10G networking?
    - Floating IPs (none now)
    - Future SDN provider?
  - Are we providing HA compute?
    - The trade-off is performance.
  - Are volumes mounted from a SAN?
#### Hopes and dreams
- We want to encourage every project to be developed from the ground up for redundancy.
#### Questions For US
- What configuration management and provisioning tools are in use besides BOSH?
- What other monitoring tools are in place in the organization besides what I've seen in the Grafana stack?
- What hypervisor is being used? KVM? Hyper-V?
- Overcommit ratio * physical cores / virtual cores per instance tells us how many VMs of a given flavor we can host on a compute node.
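
The overcommit arithmetic in the last bullet can be sketched as a small helper. This is only an illustration of the formula as written in these notes; the function name and the example numbers (a 32-core node, 4:1 overcommit, 2-vCPU flavor) are assumptions, and it considers CPU only, not memory or storage.

```python
import math


def vms_per_node(physical_cores: int, overcommit_ratio: float, vcpus_per_instance: int) -> int:
    """Estimate how many VMs of one flavor fit on a compute node (CPU only).

    Implements: overcommit ratio * physical cores / virtual cores per instance,
    rounded down because a fractional VM cannot be scheduled.
    """
    if physical_cores <= 0 or overcommit_ratio <= 0 or vcpus_per_instance <= 0:
        raise ValueError("all inputs must be positive")
    return math.floor(overcommit_ratio * physical_cores / vcpus_per_instance)


if __name__ == "__main__":
    # Hypothetical node: 32 physical cores, 4:1 overcommit, 2-vCPU flavor.
    print(vms_per_node(physical_cores=32, overcommit_ratio=4.0, vcpus_per_instance=2))
```

The result is an upper bound from the CPU side only; RAM and ephemeral storage would each need their own check before the number is trusted.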
#### Questions for BC
- Look up performance characteristics of different vCPU-to-core ratios.
#### Cloud Controller Notes
- The box that hosts your DB, message queue, and endpoints
- Is our CC HA?
- How active is use of the OS API?
#### Neutron notes?
- How overloaded are the network nodes? (not very)
#### Thoughts
- Getting good and reliable data in a virtual environment can be a tremendous effort.
- Assume cloud users will follow patterns similar to what we've witnessed already? This is not a guarantee, so we need to separate forecasts into behavior-driven and process-driven forecasts.
- CF planning and service planning, different dashboards?
- How well known are high utilization periods?
- Make a list per app
- per service
- note the statistics of the app (framework, end user, etc.)
- For network utilization concerns, we can colocate apps with different utilization periods on the same platform.
- Load test new apps to get usage patterns; ensure dashboards and metrics are being collected during the test so we can get a footprint.
- Real time isn't the key; historical and aggregated data is what matters.
- But being able to report on and spin up capacity quickly is important.
- Our work would be the basis for autoscaling activities in the platform.
- Most infrastructures are under peak load for under an hour or two per day; a widget that displays the peak load hour over time would be interesting.
- How about a visualization that shows peak usage windows per application
  - over time
  - on average
  - network utilization
  - memory and CPU utilization?
- Can we even get at this reliably?
- Do we care about quotas as well as capacity? Yes.
- Discriminatory pricing (look up the airline analogy)
- Difference between bare metal, VM, and container: start from the top or start from the bottom?
- We need to come up with a list of forecasting models.
- Do we have any projections for future utilization, 6 months? 12 months?
- Overcommit ratio * physical cores / virtual cores per instance tells us how many VMs of a given flavor we can host on a compute node. We could put this in the dashboard.
- Assign a cost to a flavor and we have a value to display.
- We could theoretically show how many containers we could provide for at a given point in time.
- We could show whether we have a full AZ available with a red/green button.
- Make sure the ephemeral storage plan has enough capacity to support the numbers from the VM capacity number.
- If we know what flavor is used for the container host, this would be an interesting calculation WRT CF.
- Mixing lots of flavors can hurt performance under KVM, something about time slice utilization. Maybe a flavor variance number?
- Do we have Ceilometer available in Juno?
- The number of variables in play is huge; we need to keep it simple to start.
- KVM with AMD at 2:1 or 4:1 overcommit gives great density, but you won't have the IOPS to support all of those VMs hitting local storage. References the hardware from my OpenStack notes.
- Should we run UnixBench on a test stack? Has there ever been a stress test?
- Putting like VMs on like compute nodes? Flavor + Nova scheduler can ensure a flavor goes to a specific host aggregate.
- How do we get meaningful metrics for network utilization from OS? Bare metal? An application that is a resource hog isn't necessarily one with a very active API. How do we segregate these?
- In what ways can we assign dollar values to the dashboard?
- Want to show total percentage of aggregate resources in use by a given app? CF? BOSH job?
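
As a rough sketch of the "peak usage windows per application" idea above, the following groups historical utilization samples by application and hour of day and reports the hour with the highest average. The sample layout `(app, timestamp, utilization)` and the function name are assumptions made for illustration; real data would come from whatever metrics store is in place.

```python
from collections import defaultdict
from datetime import datetime


def peak_hour_by_app(samples):
    """Return {app: (hour_of_day, mean_utilization)} for each app's busiest hour.

    `samples` is an iterable of (app, timestamp, utilization) tuples, where
    timestamp is a datetime and utilization is any comparable number.
    """
    # (app, hour) -> [running sum, sample count]
    totals = defaultdict(lambda: [0.0, 0])
    for app, ts, util in samples:
        bucket = totals[(app, ts.hour)]
        bucket[0] += util
        bucket[1] += 1

    best = {}
    for (app, hour), (total, count) in totals.items():
        mean = total / count
        # Keep the hour with the highest mean utilization per app.
        if app not in best or mean > best[app][1]:
            best[app] = (hour, mean)
    return best


if __name__ == "__main__":
    # Hypothetical samples for two made-up apps.
    demo = [
        ("billing", datetime(2018, 12, 3, 9, 0), 0.25),
        ("billing", datetime(2018, 12, 3, 14, 0), 0.75),
        ("reports", datetime(2018, 12, 3, 2, 0), 0.5),
    ]
    for app, (hour, mean) in peak_hour_by_app(demo).items():
        print(f"{app}: peak hour {hour:02d}:00, mean utilization {mean:.2f}")
```

Because it works on aggregated history rather than live data, the same grouping could feed both an "on average" view and, by bucketing by date as well, an "over time" view.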
https://www.ctl.io/blog/post/capacity-planning-cloud-platform/