@thisismana
Last active February 26, 2020 09:03

Living Docs — Self hosting considerations

Architecture

Stacks

[Diagram: Architecture]

Delivery

[Diagram: Delivery]

Editing

[Diagram: Editing]

AWS: Editing Stack

  • Container Runtime: Are there special requirements?
    • Which runtimes would work on AWS? ECS, EKS, Beanstalk, EC2. Which one is preferred?
    • Network
    • Persistent Storage (why, how much)
    • Session stickiness
    • Logging (ELK-Stack possible?)
    • Secrets/Credentials management
  • Telemetry/Metrics
    • Prometheus/Grafana required, or are other tools possible (CloudWatch, Datadog, ..)?
  • AWS Managed Elasticsearch compatibility
    • Min/Max Version
  • AWS Aurora Postgres compatibility
    • Min/Max Version
    • Supported?
  • RDS Postgres compatibility
    • Min/Max Version
  • AWS ElastiCache compatibility
    • Min/Max Version

AWS: Delivery Stack

  • Container Runtime requirements (same as above).
  • Autoscaling, load testing
    • What are the bottlenecks (CPU, RAM, Network, DB)?
  • Postgres: is a read-only replica supported for the Public API, to allow scaling read performance?

Playbook questions

  • Re-sync ES from the main data store (Postgres)
  • Restore Postgres from Snapshot
  • Prod ⇨ Dev Data Sync
  • ES minor version update
  • ES major version update
  • (Aurora) Postgres minor version update
  • (Aurora) Postgres major version update
  • LivingDocs version update during live traffic / work hours?
    • Reproducible Build Process
    • Do we always need external help when doing releases?
    • How would we allow LivingDocs to access our infrastructure (SSM/IAM or SSH)?
    • What are the boundaries for a shared operation mode? Who is allowed to do what? Who is responsible if something fails?
    • What are the SLAs for the self-hosted and SaaS variants?
    • Response times and on-call times?
    • Disaster recovery? MTTR, MTBF, MTTF?
    • Would multi-region support make the Editor and the Backend more resilient?
    • Deployment (blue/green, canary for testing)
    • How do Customizations complicate stuff (build-process, deployment, updates, upgrades, migrations, multi-stage)?
  • As Redis is an optional component, what happens once it becomes unavailable?

Costs (eu-central-1)

As provided by Gabriel Hase

Postgres

2 Postgres Hosts with Master-Slave Replication (2 x 160USD)

  • 32GB memory, 320GB ssd disk

Savings

  • 1-year-all-upfront ~ 37%
  • 3-year-all-upfront ~ 59%

Elasticsearch

  • 3 - 5 Elasticsearch Hosts for Documents, Images and Publications (number depends on the size of the indexed documents)
  • 32GB memory, 8CPUs, 640GB ssd disk (min 300GB, 1GB/s throughput)

It's recommended to use dedicated master nodes (min. 3) and data nodes. Dynamically adding/removing nodes is possible.
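The minimum of three dedicated master nodes follows from majority-based master election: a cluster needs more than half of its master-eligible nodes to agree before it can elect a master. A minimal sketch of that arithmetic (the function names are illustrative and not part of any Elasticsearch API):

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes: int) -> int:
    """How many master-eligible nodes can fail while a majority remains."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


if __name__ == "__main__":
    for nodes in (1, 2, 3, 5):
        print(nodes, "masters: quorum", quorum(nodes),
              "- tolerates", tolerated_failures(nodes), "failure(s)")
```

With one or two master-eligible nodes no failure can be tolerated; three is the smallest count that survives the loss of a single master node.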

Savings

  • 1-year-all-upfront ~ 35%
  • 3-year-all-upfront ~ 52%

EC2

  • 4 Workers for Applications (4 x 160USD)
  • 16GB memory, 320GB ssd disk

Savings

  • 1-year-all-upfront ~ 35%
  • 3-year-all-upfront ~ 54%
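To make the savings percentages concrete, the sketch below applies them to the per-host prices quoted above for Postgres (2 x 160 USD) and EC2 (4 x 160 USD). It assumes the 160 USD figures are per host per month, which the notes do not state explicitly, and it treats the all-upfront discount as an effective monthly equivalent.

```python
def discounted(monthly_cost: float, savings_percent: float) -> float:
    """Effective monthly cost after applying a savings percentage."""
    return monthly_cost * (1 - savings_percent / 100)


# Assumption: 160 USD is the on-demand price per host per month.
POSTGRES_MONTHLY = 2 * 160
EC2_MONTHLY = 4 * 160

# Savings percentages as listed in the Postgres and EC2 sections.
SAVINGS = {
    "Postgres": (POSTGRES_MONTHLY, {"1-year": 37, "3-year": 59}),
    "EC2": (EC2_MONTHLY, {"1-year": 35, "3-year": 54}),
}

if __name__ == "__main__":
    for name, (on_demand, plans) in SAVINGS.items():
        print(f"{name}: on-demand {on_demand} USD/month")
        for term, percent in plans.items():
            cost = discounted(on_demand, percent)
            print(f"  {term} all-upfront: {cost:.1f} USD/month")
```

Under these assumptions the two Postgres hosts drop from 320 USD to roughly 202 USD (1-year) or 131 USD (3-year) per month, and the four workers from 640 USD to 416 USD or roughly 294 USD.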

Redis (optional)

Hardware requirements uncertain.

Costs summary

| Instance | Count | CPU | MEM (GB) | Costs Per Hour ** | Costs Per Month ** | Total Costs Per Month |
|---|---|---|---|---|---|---|
| db.r5.xlarge | 2 | 4 | 32 | 0.70 | 504 | 1008 |
| r5.xlarge.elasticsearch | 3 | 4 | 32 | 0.448 | 322 | 967 |
| t2.small.elasticsearch | 3 | 1 | 2 | 0.448 | 30 | 90 |
| ECS/EC2 m5.xlarge | 4 | 4 | 16 | 0.230 | 165 | 662 |

** Costs per hour and per month are in US$ for a single instance.
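The monthly figures in the cost table appear to be the hourly rate multiplied by 720 hours (30 days x 24 hours); that multiplier is inferred from the numbers, not stated in the notes. The sketch below re-derives the monthly cost per instance and flags rows where the listed value does not match. Under that assumption the t2.small.elasticsearch row is inconsistent: its listed hourly rate of 0.448 does not produce a monthly cost of 30, which would instead imply roughly 0.042 per hour.

```python
HOURS_PER_MONTH = 720  # assumption: 30 days x 24 hours

# (instance, count, hourly cost in USD, monthly cost per instance as listed)
ROWS = [
    ("db.r5.xlarge", 2, 0.70, 504),
    ("r5.xlarge.elasticsearch", 3, 0.448, 322),
    ("t2.small.elasticsearch", 3, 0.448, 30),
    ("ECS/EC2 m5.xlarge", 4, 0.230, 165),
]


def monthly_from_hourly(hourly: float) -> float:
    """Monthly cost of one instance running around the clock."""
    return hourly * HOURS_PER_MONTH


def inconsistent_rows(rows, tolerance: float = 1.0):
    """Names of rows whose listed monthly cost is off by more than `tolerance` USD."""
    return [
        name
        for name, _count, hourly, listed in rows
        if abs(monthly_from_hourly(hourly) - listed) > tolerance
    ]


def total_per_month(rows) -> int:
    """Sum of count x listed monthly cost over all rows."""
    return sum(count * listed for _name, count, _hourly, listed in rows)


if __name__ == "__main__":
    print("Inconsistent rows:", inconsistent_rows(ROWS))
    print("Implied hourly rate for 30 USD/month:", round(30 / HOURS_PER_MONTH, 3))
    print("Total per month (USD):", total_per_month(ROWS))
```

The total computed here comes out a few dollars below the sum of the table's last column because the table multiplies the unrounded monthly cost by the instance count.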

Remarks (mana):

  • The database is heavily overprovisioned. At welt.de we handled around 1,000 to 1,500 write queries per second (Piwik, unsampled, AMP & WWW traffic) on a single db.r4.xlarge.
  • Elasticsearch is also overprovisioned. At welt.de we used 3 x r5.large.elasticsearch for the production API; the editors should require significantly less.
  • The worker nodes are also heavily overprovisioned. At welt.de we used an ECS cluster with 10 x m5.xlarge for everything (Backends, Frontends, Feeds, Public API, ...).

Todo (Costs):

  • LoadBalancing
  • Data Transfer
    • Images
    • Video
  • Storage
    • ES
    • RDS
    • EBS
  • MAM: Image cropping + Supported Image formats (webp, png, jpg...)
  • Personnel Costs
    • Building+Deploying Software
    • AWS Admin
    • Database Admin: Automation, Backup, Restore, Tuning, Monitoring