@zircote
Last active August 22, 2016 17:11
My early gotchas with Apache Aurora

The mustache gotcha

When using “bound” objects in an .aurora file, you must not put spaces inside the “mustaches”.

Examples:

  • Bad: {{ profile.my_var }}
  • Good: {{profile.my_var}}
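
One way to catch this before submitting a job is to scan the file for mustaches that contain whitespace. The following is only a sketch: `find_spaced_mustaches` is a hypothetical helper, not part of Aurora, and it treats any whitespace between `{{` and `}}` as suspect.

```python
import re

# Any "{{ ... }}" whose contents include whitespace, e.g. "{{ profile.my_var }}".
# This pattern is an assumption made for the sketch, not Aurora's own parser.
SPACED_MUSTACHE = re.compile(r"\{\{[^{}]*\s[^{}]*\}\}")


def find_spaced_mustaches(text):
    """Return (line_number, mustache) pairs for every mustache containing whitespace."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SPACED_MUSTACHE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = 'cmdline = "echo {{ profile.my_var }} {{profile.other}}"'
print(find_spaced_mustaches(sample))  # [(1, '{{ profile.my_var }}')]
```

Running it over the contents of an .aurora file should list each offending line so it can be fixed by hand.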

Docker Container Snafus

When running a Docker container, you must ensure that every library thermos_executor.pex depends on is present in the container itself. The thermos_executor runs inside the container, not in the Mesos slave’s environment.

If you run Debian Docker containers on CentOS hosts, you will likely need to build a thermos_executor.pex for the Debian container in addition to the one for the CentOS host. You will also need a wrapper that detects whether it is running inside a Docker container and launches the matching thermos_executor.pex. I wrote a simple bash script that did exactly that: it checked whether it was in a Docker container or on the host system and ran the correct executor. Nothing complicated, but effective. The good news is that I have seen plans to move away from libmesos.so to pesos, which should eliminate this pain.
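
The original wrapper was a bash script and is not reproduced here. As a rough illustration of the idea, here is a Python sketch. Everything specific in it is an assumption: the two executor paths are hypothetical, and the detection heuristic (a `/.dockerenv` file, or `docker` appearing in `/proc/1/cgroup`) is a common best-effort check rather than anything Aurora or Mesos provides.

```python
import os

# Hypothetical install locations; substitute the paths used on your hosts and images.
HOST_EXECUTOR = "/usr/local/bin/thermos_executor.host.pex"
DOCKER_EXECUTOR = "/usr/local/bin/thermos_executor.docker.pex"


def running_in_docker(dockerenv_path="/.dockerenv", cgroup_path="/proc/1/cgroup"):
    """Best-effort guess at whether this process is inside a Docker container."""
    if os.path.exists(dockerenv_path):
        return True
    try:
        with open(cgroup_path) as handle:
            return "docker" in handle.read()
    except OSError:
        # No readable cgroup file: assume we are on the host.
        return False


def pick_executor(in_docker):
    """Choose the executor build that matches the current environment."""
    return DOCKER_EXECUTOR if in_docker else HOST_EXECUTOR


def main(argv):
    executor = pick_executor(running_in_docker())
    # Replace this process with the chosen executor, passing arguments through.
    os.execv(executor, [executor] + argv[1:])
```

The wrapper would call `main(sys.argv)` from its entry point, and the scheduler's executor path would point at the wrapper instead of at a .pex directly.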

AWS Linux and the sasl hassle

It seems libmesos.so expects the file libsasl.so.3 to be present. The package that provides it is not available on AWS Linux, so thermos_executor.pex just dies. I managed to work around this, but I would rather not say how…

When the thermos_executor.pex fails to run

When it dies because of the MesosDriver, it won’t give you much to go on. In my opinion, when the code handles the ImportError exception it should pass that message on to the logger; the message it gives now is far too vague to figure out why it failed. I tracked down the problem by opening a Python REPL and attempting to import mesos.native, which should give you a good idea of what is really broken.
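
The same check can be scripted so that the full error is captured instead of swallowed. This is a minimal sketch; `diagnose_import` is a hypothetical helper, and it should be run with the same Python interpreter and environment the executor uses.

```python
import importlib
import traceback


def diagnose_import(module_name="mesos.native"):
    """Try to import a module; return (ok, details) with the full traceback on failure."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # The traceback usually names the missing module or shared library.
        return False, traceback.format_exc()
    return True, "imported {} successfully".format(module_name)


ok, details = diagnose_import()
print(details)
```

If the import fails, the printed traceback should show the underlying cause, such as a shared library that could not be loaded.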

What are all these .pex files

When you build all the parts of Apache Aurora, you will see all sorts of .pex files that need to be put in various places. It can be a challenge for the uninitiated to discern the purpose of them all. I will give a list and a general idea of what each one does. Keep in mind I am still learning a lot of this myself.

  • thermos_executor.pex: This is the executor; it governs all Mesos tasks that are scheduled. It resides on the Mesos slaves, at the path you configured in your aurora-scheduler arguments.
  • gc_executor.pex: This is the janitor. The aurora-scheduler schedules it to find and clean up cruft, free space, and reap dead jobs lingering on the Mesos slaves. It too resides on the Mesos slaves, at a path that you define in the aurora-scheduler arguments.
  • thermos_observer.pex: This is a service that runs on each Mesos slave. It lets consumers of Aurora look at the details of each process and the status of the jobs.
  • aurora.pex: This is the tool for scheduling jobs. It may run on any machine that can speak to the Mesos hosts using LIBPROCESS_HOST:LIBPROCESS_PORT. It has a -h flag.
  • aurora_admin.pex: This is used for administrator-level tasks in the Aurora cluster; it is where you set quotas for roles, among other features. It has a -h flag.

Diagnosing trouble

A good place to start is the logs on the Mesos slave, for things such as LOST tasks. In my experience it is generally a failure with thermos_executor.pex; beyond that it’s probably a broken .aurora file, which you can usually identify by looking at the log entries for each process in the thermos_observer.


zircote commented Apr 9, 2015

This is by no means exhaustive, and I plan to edit and update as I find them.


bhouse commented Apr 26, 2015

very helpful, thanks! 😃


SEJeff commented May 13, 2015

@zircote: See my fork for another one on tasks that fail to run due to constraints.


zircote commented Jul 30, 2015

@SEJeff: thank you and sorry for the late gratitude...
