@aparnachaudhary
Last active November 10, 2018 08:25

MesosCon EU 2017

  • Nested Containerization (via UCR)

    • Zip/Docker/Jar

    • Jenkins

    • K8s (Kubernetes)

  • Local and External Resource Providers

  • Major installations are on-premise

  • Hierarchical fair-share, quota and reservations *

Building Production Applications for Maximum Debuggability

  • Use environment variables

  • Make application configurable

  • Mesos DNS

  • Minuteman

  • Virtual Network/Overlay with Navstar

  • ContainerLogger concept (written in C++)
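The first two tips above (use environment variables, make the application configurable) can be sketched roughly as follows. This is a minimal illustration, not something shown in the talk: the variable names `APP_PORT`, `APP_LOG_LEVEL`, and `APP_UPSTREAM_HOST` and the default host `upstream.marathon.mesos` are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    listen_port: int
    log_level: str
    upstream_host: str


def load_config(env=None):
    """Build the config from environment variables, falling back to defaults.

    All variable names here are hypothetical examples.
    """
    env = os.environ if env is None else env
    return AppConfig(
        listen_port=int(env.get("APP_PORT", "8080")),
        log_level=env.get("APP_LOG_LEVEL", "INFO").upper(),
        # Hypothetical Mesos-DNS style name used as the default upstream.
        upstream_host=env.get("APP_UPSTREAM_HOST", "upstream.marathon.mesos"),
    )


if __name__ == "__main__":
    # Printing the effective config at startup makes a misconfigured
    # task easier to debug from its logs.
    print(load_config())
```

Passing a plain dict as `env` keeps the loader easy to test without touching the real process environment.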

Health Checks:

A Mesos agent can be put into maintenance mode, but Marathon does not proactively reschedule applications. Applications go down with the node and are then restarted on another node.

One-click provisioning and VIPs are good features of DC/OS.

Marathon Performance Tips

  • Metrics

    • Increase the metrics gathering interval to 55 seconds instead of the 10-second default

    • Marathon switched to Kamon from DropWizard

  • Tune JVM

    • Marathon is written in Scala (JVM-based) and uses Akka?

    • Enable GC logging

    • Sensu provides good summary of GC

    • Marathon Thread Pool

    • IO operations set to 100

    • Maximum threads set to 64

  • Optimize ZooKeeper

  • Update to 1.3.13*

  • Health Checks

    • Use MESOS HTTP health check

    • Marathon HC could not scale beyond 2000 tasks

  • Do not use Event Bus

  • Prefer batching

  • Shard your Marathon deployment
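As a rough illustration of the "Use MESOS HTTP health check" tip above, the sketch below builds a Marathon app definition whose health check uses the `MESOS_HTTP` protocol, so the check is run on the agent rather than by Marathon itself. The app id, command, `/health` path, and timing values are placeholders chosen for the example, not recommendations from the talk.

```python
import json


def app_with_mesos_http_check(app_id, cmd, path="/health"):
    """Return a Marathon app definition dict with an agent-side HTTP health check.

    Resource sizes and timing values are illustrative placeholders.
    """
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": 0.1,
        "mem": 64,
        "instances": 1,
        "portDefinitions": [{"port": 0, "name": "http"}],
        "healthChecks": [
            {
                # MESOS_HTTP: executed by Mesos on the agent, not by Marathon.
                "protocol": "MESOS_HTTP",
                "path": path,
                "portIndex": 0,
                "gracePeriodSeconds": 30,
                "intervalSeconds": 10,
                "timeoutSeconds": 5,
                "maxConsecutiveFailures": 3,
            }
        ],
    }


if __name__ == "__main__":
    app = app_with_mesos_http_check("/demo/web", "python3 -m http.server $PORT0")
    print(json.dumps(app, indent=2))
```

The printed JSON is the kind of body one would POST to Marathon's `/v2/apps` endpoint; check the field names against the documentation for your Marathon version before relying on them.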

Keynotes

  • Use cpu-set for critical apps (CPU isolation)

  • bin-packing - understand the concept *
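To make the bin-packing idea above concrete: the goal is to place tasks on as few agents as possible so the remaining agents stay free for large tasks. The sketch below uses the generic first-fit decreasing heuristic on a single resource (CPUs). It is only an illustration of the concept, not the algorithm used by Mesos or Marathon, and the function name is made up for this example.

```python
def first_fit_decreasing(task_cpus, agent_capacity):
    """Pack tasks onto as few agents as this heuristic manages.

    Returns a list of agents, each a list of the task sizes placed on it.
    """
    agents = []
    # Placing the largest tasks first tends to leave fewer unusable gaps.
    for cpus in sorted(task_cpus, reverse=True):
        if cpus > agent_capacity:
            raise ValueError(f"task needs {cpus} CPUs, agent has {agent_capacity}")
        for agent in agents:
            # First fit: use the first agent with enough spare capacity.
            if sum(agent) + cpus <= agent_capacity:
                agent.append(cpus)
                break
        else:
            # No existing agent can take the task, so start a new one.
            agents.append([cpus])
    return agents


if __name__ == "__main__":
    print(first_fit_decreasing([2, 1.5, 1, 0.5, 2, 1], agent_capacity=4))
```

A real scheduler has to pack several resources at once (CPU, memory, disk, ports), which makes the problem considerably harder than this single-dimension sketch.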

Custom Executors

  • Types of executors

    • Command

    • Docker

    • Default

    • Custom

  • Marathon application lifecycle phases (See slides) *

VAMP

  • Breed, Blueprint, Deployment

  • Gateway

  • Metrics and Events

  • Workflows - small pieces of JavaScript (Node.js)

  • User-agent-based routing, e.g. Safari gets the new version, others get the old one

  • A similar concept could also be used for Accept-header-based routing

  • Conditions are matched first and then the weights
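The "conditions first, then weights" ordering noted above can be sketched as a two-phase route selection. This is not VAMP's implementation or configuration syntax; the route structure, the `pick_route` function, and the naive User-Agent substring check are all assumptions made for illustration.

```python
import random


def pick_route(routes, headers, rng=random):
    """Choose a route: explicit conditions win, weights decide the rest."""
    # Phase 1: the first route whose condition matches the request is used.
    for route in routes:
        condition = route.get("condition")
        if condition is not None and condition(headers):
            return route["name"]
    # Phase 2: no condition matched, so fall back to a weighted random choice.
    names = [route["name"] for route in routes]
    weights = [route["weight"] for route in routes]
    return rng.choices(names, weights=weights, k=1)[0]


def is_safari(headers):
    # Naive check for illustration only: many non-Safari browsers also
    # include "Safari" in their User-Agent string.
    return "Safari" in headers.get("User-Agent", "")


ROUTES = [
    {"name": "old-version", "weight": 100, "condition": None},
    {"name": "new-version", "weight": 0, "condition": is_safari},
]

if __name__ == "__main__":
    print(pick_route(ROUTES, {"User-Agent": "Mozilla/5.0 Safari/604.1"}))
    print(pick_route(ROUTES, {"User-Agent": "curl/7.58.0"}))
```

With a weight of 0 on the new version, only requests that match the condition ever reach it, which mirrors the Safari example in the notes. An Accept-header condition would be just another predicate over `headers`.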

Marathon Roadmap

  • MESOS_HTTP - Provides health checks from the agent instead of the scheduler

  • Marathon HTTP health checks increase network traffic; Mesos health checks run on localhost

  • More scalability

  • In case of a network partition, tasks would still remain healthy

  • In large clusters, Marathon is kept so busy performing health checks that it cannot do its core job of orchestration

  • Marathon /v2/queue provides rejectedOffers summary

  • Secrets API - read from a specific file instead of using environment variables

  • What’s Next?

    • CSI - storage interface

    • Fault domains and cloud bursting - currently done using labels; first-class primitives to enable domain awareness

    • Hierarchical role support

  • IPv6 support once Mesos itself supports it; currently Marathon performs validations for IPv4

  • Metronome

    • Uses Marathon as a library and hence always lags behind Marathon releases

  • Improving Deployments

    • Support for Canary/ Blue-Green

    • More pluggable Java API

    • Merge Metronome and Marathon to bring scheduled jobs to Marathon

  • Someone proposed introducing an ingress health check, to verify that an edge node can reach Marathon tasks.
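The `/v2/queue` rejected-offers summary mentioned above can be aggregated with a few lines of code to see why queued tasks are not launching. The sketch below works on an already-parsed response. The field names `processedOffersSummary`, `rejectSummaryLastOffers`, `reason`, and `declined` are assumptions about the response shape and should be checked against the Marathon version in use; the sample data is invented.

```python
from collections import Counter


def summarize_rejections(queue_response):
    """Total the declined-offer counts per rejection reason across the queue.

    Field names are assumed, see the note above.
    """
    totals = Counter()
    for item in queue_response.get("queue", []):
        summary = item.get("processedOffersSummary", {})
        for entry in summary.get("rejectSummaryLastOffers", []):
            totals[entry["reason"]] += entry["declined"]
    return dict(totals)


# Invented sample data in the assumed shape, for illustration only.
SAMPLE = {
    "queue": [
        {
            "processedOffersSummary": {
                "rejectSummaryLastOffers": [
                    {"reason": "InsufficientCpus", "declined": 3, "processed": 5},
                    {"reason": "UnfulfilledRole", "declined": 0, "processed": 5},
                ]
            }
        },
        {
            "processedOffersSummary": {
                "rejectSummaryLastOffers": [
                    {"reason": "InsufficientCpus", "declined": 2, "processed": 4},
                ]
            }
        },
    ]
}

if __name__ == "__main__":
    print(summarize_rejections(SAMPLE))
```

In practice the input would come from an HTTP GET against the Marathon API; the aggregation is kept separate so it can be tried without a running cluster.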

sysdig

  • sysdig falco - open source

  • sysdig secure - commercial

Multi-DC replication for ArangoDB

  • Multi-model database

    • JSON, KeyValue, Graph

  • Replication Options

    • Periodic backups

    • Active-Passive

    • Active-active across DC

DC/OS to DC/OS Replication

  • Synchronous - agents across regions; masters are difficult because of ZK

  • Asynchronous

    • Linker Module/Aware
