@kensipe
Last active July 20, 2020 17:00
DCOS Topology

Developer's View of DCOS

Mesosphere's Datacenter Operating System (DCOS) aims to provide the best possible user experience for orchestrating and managing a datacenter. If you are an Apache Mesos developer, you are already familiar with developing a framework. DCOS extends Apache Mesos with a GUI, a command-line interface, a service packaging description, and a repository that catalogs those packages. This article describes the DCOS-specific components to help a developer understand what is necessary to package and provide a service on DCOS.

DCOS Environment Overview

DCOS Node Type

As shown in the diagram, a common DCOS topology includes Mesos Masters, Private Mesos Agents, Public Mesos Agents, an Admin Router, Mesos-DNS, the DCOS UI, the DCOS Command-line (CLI), and the Universe. The components new to an Apache Mesos developer are:

  • Admin Router - The admin router provides a proxy on the master so that services deployed on a private agent can expose their GUI or RESTful endpoint for administration purposes.
  • DCOS CLI - A set of Python scripts that provide a command-line interface that runs on a local machine and controls a remote DCOS environment. DCOS services can extend the CLI with subcommands in order to provide a better user experience.
  • DCOS UI - Provides a view into the datacenter of what services are running, the health status of those services and resource utilization details of the worker nodes.
  • Universe - The universe provides a catalog of services and defines how to package a service for use by DCOS.

DCOS Integration Points

There are a number of ways to extend the capabilities of DCOS, all of which center around datacenter services. DCOS defines two types of services:

  • Native - An Apache Mesos framework which requires registration with the Mesos-Master.
  • Non-Native - A standard process that provides value to the cluster without an integration with Mesos. This resembles a standard user application that has been deployed; the distinction is that it is an infrastructure service.

All of these services are deployed by Marathon and must be packaged and added to the Universe, which serves as the service catalog, for deployment. For Native services, it is also possible to extend the CLI and UI. Non-Native service integration support is still being worked out. For the rest of this article we will assume a native service.

Universe and Packaging

The word universe often requires context. It is used generically to describe the authoritative packaging structure and a catalog repository, regardless of where that is located. It can also specifically mean The Universe, which is the GitHub repository that hosts all the certified and released DCOS services. In the generic sense, you could fork the universe to establish your own universe, or you could clone a local copy to create a local universe. Mesosphere also has a universe which we call the Multiverse. The multiverse is the location for all submissions of DCOS services for consideration as a datacenter service and is the staging location for experimental or staged services. All services in the multiverse are still required to meet a certain standard defined by the DCOS Service Specification. For more details on submitting a service, read Universe Submission.

The standard developer workflow for creating a DCOS service starts with forking the universe and using your repo to create a service. To use your universe with DCOS, assuming a forked repository at https://github.com/mesosphere/altuniverse and the version-1.x branch, add it to your package.sources config with dcos config prepend package.sources https://github.com/mesosphere/altuniverse/archive/version-1.x.zip. Follow that with dcos package update and you are ready to go. For more details, see the DCOS CLI Specification and the universe.
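
As a rough sketch of this step, the source URL can be derived from the fork's location and branch. The helper names archive_url and setup_commands are hypothetical, and the /archive/<branch>.zip pattern is simply the one the example URL above follows; the two command strings are the ones quoted in the text.

```python
from typing import List


def archive_url(repo_url: str, branch: str) -> str:
    """Build the zip archive URL for one branch of a forked universe repository."""
    return f"{repo_url.rstrip('/')}/archive/{branch}.zip"


def setup_commands(repo_url: str, branch: str) -> List[str]:
    """Return the two CLI invocations described in the text, in order."""
    return [
        f"dcos config prepend package.sources {archive_url(repo_url, branch)}",
        "dcos package update",
    ]


if __name__ == "__main__":
    # Print the commands rather than running them; running requires a DCOS CLI setup.
    for command in setup_commands("https://github.com/mesosphere/altuniverse", "version-1.x"):
        print(command)
```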

The typical flow of a service from development to general release would look like (forked universe) -> (multiverse) -> (universe).

The Universe defines the details of packaging. The high-level steps for a developer are:

  1. name your service (ex. unicorn)
  2. establish a directory under repo/packages for your service (ex. repo/packages/U/unicorn)
  3. establish a version index directory (ex. repo/packages/U/unicorn/0)
  4. at a minimum, add package.json, config.json, and marathon.json
  5. if you have a CLI subcommand add command.json
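
The steps above can be sketched as a small scaffolding script. This is only an illustration of the directory convention: the function names package_dir and scaffold are hypothetical, and the files are written as empty JSON placeholders that would still need real content.

```python
import json
import tempfile
from pathlib import Path

# Files the text lists as the minimum for every service version.
REQUIRED_FILES = ("package.json", "config.json", "marathon.json")


def package_dir(name: str, version_index: int) -> Path:
    """Relative directory for one version of a service, e.g. repo/packages/U/unicorn/0."""
    return Path("repo") / "packages" / name[0].upper() / name / str(version_index)


def scaffold(universe_root: Path, name: str, version_index: int = 0, with_cli: bool = False) -> Path:
    """Create the version directory with placeholder JSON files and return its path."""
    target = universe_root / package_dir(name, version_index)
    target.mkdir(parents=True, exist_ok=True)
    # command.json is only added when the service ships a CLI subcommand.
    files = REQUIRED_FILES + (("command.json",) if with_cli else ())
    for filename in files:
        (target / filename).write_text(json.dumps({}) + "\n")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        created = scaffold(Path(root), "unicorn", with_cli=True)
        print(sorted(p.name for p in created.iterdir()))
```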

Recommendations and Guidance

  • In naming your service, it is best to use lower case and to avoid words like apache, mesos or dcos. For instance, for the cassandra-mesos framework, the DCOS service is named cassandra.
  • The service name is the name of the directory under packages. It is indexed under a folder named after the capitalized first letter of the service name. In the example above that is U.
  • Under the service name are numbered folders, each of which represents a version; a new version goes in the next numbered folder. All files are versioned under this versioned folder.
  • After you add the json files to the index folder, run the scripts under the <universe>/scripts directory.
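
The naming guidance could be checked with a small helper before a service is submitted. This is a minimal sketch, not part of the universe tooling: name_warnings and DISCOURAGED_WORDS are hypothetical names, and the word list contains only the three examples given in the text.

```python
from typing import List

# Only the words the guidance names explicitly; extend as needed.
DISCOURAGED_WORDS = ("apache", "mesos", "dcos")


def name_warnings(name: str) -> List[str]:
    """Return human-readable warnings for a proposed service name (empty list if none)."""
    warnings = []
    if name != name.lower():
        warnings.append("use lower case")
    for word in DISCOURAGED_WORDS:
        if word in name.lower():
            warnings.append(f"avoid the word '{word}'")
    return warnings


if __name__ == "__main__":
    for candidate in ("cassandra", "cassandra-mesos", "Unicorn"):
        print(candidate, name_warnings(candidate))
```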

The universe scripts are intended to be run in order, based on the leading number in the script name, starting with 0-validate-version.sh. If a script passes, you can move on to the next script. It is necessary to run scripts 1-3 prior to trying to use the service with DCOS. The script details are:

  • 1-validate-packages.sh - validates the command, config and package json files against the schema
  • 2-build-index.sh - adds the services into the index.json file
  • 3-validate-index.sh - validates the index json file
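
Running the scripts in numeric order and stopping at the first failure can be sketched as follows. The helper names are hypothetical, and invoking each script with bash from the current working directory is an assumption; check the scripts themselves for any expectations about where they are run from.

```python
import re
import subprocess
from pathlib import Path
from typing import List


def numeric_prefix(script: Path) -> int:
    """Extract the leading number from a name such as 2-build-index.sh."""
    match = re.match(r"(\d+)-", script.name)
    if match is None:
        raise ValueError(f"{script.name} has no numeric prefix")
    return int(match.group(1))


def ordered_scripts(scripts_dir: Path) -> List[Path]:
    """List the numbered .sh scripts in a directory, sorted by their leading number."""
    candidates = [p for p in scripts_dir.glob("*.sh") if re.match(r"\d+-", p.name)]
    return sorted(candidates, key=numeric_prefix)


def run_in_order(scripts_dir: Path) -> None:
    """Run each script in order; a failing script stops the sequence."""
    for script in ordered_scripts(scripts_dir):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later scripts are not attempted after a failure.
        subprocess.run(["bash", str(script.resolve())], check=True)
```

Sorting on the parsed integer rather than on the file name keeps the order correct even if a script numbered 10 or higher is ever added.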

The details of the json files can be read on the Universe README page. Here is some additional guidance:

  • package.json
    • The name used as the service is the value of the name property.
    • The description should focus on your service. Any user of the service will know that it is for DCOS and Mesos.
    • Tags - These are used for user searches (dcos package search <criteria>). Add tags that distinguish your service in some way. Avoid Mesos, Mesosphere, DCOS, and datacenter, since these are true for all services. For unicorns you might have "rainbows" and "mythical".
    • preInstallNotes - Intended to present the user with information prior to starting the installation process. The resource requirements of the service are of great value to the user here. For example: Unicorns take 7 nodes with 1 core each and 1TB of RAM.
    • postInstallNotes - Intended to present the user with information after the installation. Focus on providing a documentation URL, a tutorial or both.
    • postUninstallNotes - If further cleanup is needed after an uninstall before the service can be reinstalled, provide a link to the details here. A common issue is cleaning up ZooKeeper entries.
  • config.json
    • The first top-level property needs to be the name of the service. (ex. unicorn)
    • A second-level (nested) property must be framework-name with a value of the service name (ex. "framework-name" : "unicorn")
    • The requirement block is for all properties that are required by the marathon.json file without a condition block (it is NOT a list of properties that have no value provided and thus must be supplied by the user)
  • marathon.json
    • The id value is encouraged to be the framework-name (ex. "{{unicorn.framework-name}}")
    • All URLs used by the service must be passed to the service via the command line or an environment variable
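
To make the guidance above concrete, here is a hedged sketch of the three files for the unicorn example, written as Python dictionaries so they can be dumped to JSON. Only the property names mentioned in the text are used; the JSON-schema style wrapper in config_json (type, properties, default, required), the boolean form of framework, and the render helper are assumptions made for illustration, and the Universe README remains the authority on which fields are actually required.

```python
import json

SERVICE = "unicorn"

# package.json: only the properties discussed in the guidance; angle-bracket values are placeholders.
package_json = {
    "name": SERVICE,
    "description": "<description focused on the service itself>",
    "tags": ["rainbows", "mythical"],
    "framework": True,  # assumption: a JSON boolean; the text writes it as "true"
    "preInstallNotes": "Unicorns take 7 nodes with 1 core each and 1TB of RAM.",
    "postInstallNotes": "<documentation URL, tutorial URL, or both>",
    "postUninstallNotes": "<link to cleanup details, e.g. ZooKeeper entries>",
}

# config.json: top-level property named after the service, nested framework-name.
# The surrounding schema keywords are an assumed layout, not confirmed by the text.
config_json = {
    "type": "object",
    "properties": {
        SERVICE: {
            "type": "object",
            "properties": {
                "framework-name": {"type": "string", "default": SERVICE},
            },
            "required": ["framework-name"],
        }
    },
}

# marathon.json is a template; the id refers to the configured framework-name.
marathon_template = json.dumps({"id": "{{unicorn.framework-name}}"})


def render(template: str, config: dict) -> str:
    """Replace {{service.property}} placeholders with schema defaults (illustrative stand-in)."""
    rendered = template
    for service, schema in config["properties"].items():
        for prop, spec in schema["properties"].items():
            rendered = rendered.replace("{{%s.%s}}" % (service, prop), str(spec["default"]))
    return rendered


if __name__ == "__main__":
    print(json.dumps(package_json, indent=2))
    print(render(marathon_template, config_json))
```

The render function only stands in for whatever template processing DCOS performs at install time; it exists here to show how the id placeholder relates to the framework-name property in config.json.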

NOTE: All services submitted to the Multiverse / Universe are required to use versioned artifacts that will not change.

Admin Router and UI integration

The default deployment of a service is on a private worker node. It is often necessary to have either configuration control or monitoring of a service. On a private node, however, the user cannot access this service endpoint directly. The purpose of the admin router is to proxy calls on the master node to the service in the cluster. This requires the HTTP service endpoint to use relative paths for artifacts and resources. The service endpoint could provide an HTML UI, a RESTful endpoint or both. When creating a DCOS CLI subcommand it is common to have a RESTful endpoint to communicate with the scheduler service.

The integration to the admin router is automatic when a framework scheduler registers a webui_url during the registration process with the Mesos master. There are currently a couple of limitations:

  1. The URL must NOT end with a trailing slash (/). (good ex. internal.dcos.host.name:10000, bad ex. internal.dcos.host.name:10000/ )
  2. DCOS currently only supports one URL and port
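
A scheduler could guard against the first limitation before it registers. The following is a minimal sketch with hypothetical helper names; it only checks and strips a trailing "/" and does not validate the host or port.

```python
def has_trailing_slash(webui_url: str) -> bool:
    """True when the URL ends with '/', which the admin router integration does not accept."""
    return webui_url.endswith("/")


def normalized_webui_url(webui_url: str) -> str:
    """Return the URL with any trailing '/' characters removed."""
    return webui_url.rstrip("/")


if __name__ == "__main__":
    for url in ("internal.dcos.host.name:10000", "internal.dcos.host.name:10000/"):
        print(url, "->", normalized_webui_url(url), "(had trailing slash)" if has_trailing_slash(url) else "")
```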

When the webui_url is provided, the service will be listed on the DCOS UI as a service with a link. That link is the admin router proxy URL, named by the convention /service/<service_name>, resulting in <dcos_host>/service/unicorn; that URL proxies the webui_url. If you provide a UI, it will be integrated with the DCOS UI such that a user can get quick access to control your service. Service health check information is provided from the DCOS service tab when:

  • There are service health checks defined in the marathon.json
  • The framework-name property is present with the value of the service
  • The package.json property of framework is "true"
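
The proxy URL convention and the three health check conditions can be expressed as a short check. This is a sketch under several assumptions: the function names are hypothetical, healthChecks is taken to be the Marathon app definition field that holds the checks, framework-name is looked up in the resolved configuration values under the service's own key, and framework is accepted either as a boolean or as the string "true".

```python
def proxy_url(dcos_host: str, service_name: str) -> str:
    """Admin router URL for a service, following the /service/<service_name> convention."""
    return f"{dcos_host.rstrip('/')}/service/{service_name}"


def health_info_expected(package: dict, options: dict, marathon_app: dict, service_name: str) -> bool:
    """Check the three conditions listed above for health information to appear."""
    # 1. At least one health check is defined in the Marathon app definition.
    has_checks = bool(marathon_app.get("healthChecks"))
    # 2. framework-name is present and equals the service name (assumed location: options[service]).
    named = options.get(service_name, {}).get("framework-name") == service_name
    # 3. package.json marks the package as a framework.
    is_framework = package.get("framework") in (True, "true")
    return has_checks and named and is_framework


if __name__ == "__main__":
    print(proxy_url("<dcos_host>", "unicorn"))
```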

There are two ways to provide public access to your services. One is via the admin router as described. The other is by deploying your own proxy or router to the public worker node. It is recommended to use the admin router for scheduler configuration and control, allowing integration with the DCOS UI. It is also recommended to provide a CLI subcommand for command-line control of a RESTful service endpoint for the scheduler.

DCOS CLI

Here are some useful tips below the detail of creating a DCOS CLI subcommand. The

command.json schema access the service endpoint

Developer notes: There currently is no support for service dependencies

Overriding the framework-name (service endpoint)?
