When using “bound” objects in an .aurora file, it is an absolute rule that you do not put spaces inside the “mustaches”.
Examples:
- Bad:
{{ profile.my_var }}
- Good:
{{profile.my_var}}
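To catch this before you submit a job, a small lint pass over the file helps. The following is a minimal sketch, not part of Aurora: the function name `find_spaced_mustaches` and the regular expression are my own, and it simply flags any `{{...}}` that contains whitespace.

```python
import re

# Matches any mustache that contains whitespace between "{{" and "}}".
# This is an illustrative pattern, not something Aurora itself provides.
MUSTACHE_WITH_SPACE = re.compile(r"\{\{[^{}]*\s[^{}]*\}\}")


def find_spaced_mustaches(text):
    """Return (line_number, mustache) pairs for mustaches containing spaces."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in MUSTACHE_WITH_SPACE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

You could feed it the contents of your .aurora file and print each hit; an empty list means no spaced mustaches were found.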
When running a docker container, you must ensure that all of the libraries that thermos_executor.pex depends on are present in the docker container itself. The thermos_executor runs inside the container, not in the mesos slave's environment.
If you run Debian docker containers on CentOS hosts, you will likely need to build a thermos_executor.pex for the Debian docker container in addition to the one for the CentOS host. You will also need a wrapper that can decide whether it is running in a docker container and run the correct thermos_executor.pex. I wrote a simple bash script that decides whether it is running in a docker container or on the host system and runs the correct one. Nothing complicated, but effective.
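For illustration, here is roughly what such a wrapper could look like. This is a sketch in Python rather than the original bash script, and everything specific in it is an assumption: the two install paths are hypothetical, and the detection relies on two common heuristics (the `/.dockerenv` marker file and the word "docker" in `/proc/1/cgroup`) that may not hold on every setup.

```python
import os
import sys

# Hypothetical install locations; adjust to wherever you placed each build.
HOST_EXECUTOR = "/usr/local/bin/thermos_executor.centos.pex"
CONTAINER_EXECUTOR = "/usr/local/bin/thermos_executor.debian.pex"


def looks_like_docker(cgroup_text, dockerenv_exists):
    """Heuristic: a /.dockerenv file or 'docker' in PID 1's cgroups."""
    return dockerenv_exists or "docker" in cgroup_text


def read_cgroup(path="/proc/1/cgroup"):
    """Return the contents of the cgroup file, or '' if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except (IOError, OSError):
        return ""


def pick_executor(in_docker):
    """Choose which thermos_executor.pex build to run."""
    return CONTAINER_EXECUTOR if in_docker else HOST_EXECUTOR


def main(argv):
    in_docker = looks_like_docker(read_cgroup(), os.path.exists("/.dockerenv"))
    target = pick_executor(in_docker)
    # Replace this process with the real executor, passing arguments through.
    os.execv(target, [target] + argv[1:])
```

Calling `main(sys.argv)` at the bottom of the script turns it into the wrapper; the scheduler's executor path would then point at the wrapper rather than at either .pex directly.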
The good news is that I saw there are plans to move away from libmesos.so and to pesos, which will eliminate the pain associated with this.
It seems libmesos.so expects the file libsasl.so.3 to be present. The package that provides it is not available, and therefore the thermos_executor.pex just dies. I managed to work around this, but I would rather not say how…
When it dies due to the MesosDriver, it won't give you a whole lot to go on. In my opinion, when the code handles the ImportError exception it should pass that message along to the logger. The message it gives now is far too vague to truly figure out why it failed. I managed to track down the problem by opening a python repl and attempting to import mesos.native; this should give you a good idea of what is really broken.
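The repl check can be sketched as a tiny script. The helper name `diagnose_import` is my own; the only thing taken from the text above is the module name mesos.native. Run it with the same interpreter and environment the executor uses, otherwise the result may not reflect the real failure.

```python
import importlib
import traceback


def diagnose_import(module_name):
    """Try to import a module; return (ok, detail) with the traceback on failure."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # The full traceback usually names the missing shared library.
        return False, traceback.format_exc()
    return True, "imported %s successfully" % module_name


ok, detail = diagnose_import("mesos.native")
print(detail)
```

If the import fails, the printed traceback is the detail that the executor's own log message leaves out.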
When you build all the parts of Apache Aurora, you will see all sorts of .pex files that need to be put in various places. It can be a challenge for the uninitiated to discern the purpose of them all. I will give a list and a general idea of what each one does; keep in mind I am still learning a lot of this myself.
- thermos_executor.pex: This is the executor; it is the governor of all mesos tasks that are scheduled. It resides on the mesos slaves, at the path you configured in your aurora-scheduler arguments.
- gc_executor.pex: This is the janitor; it is scheduled by the aurora-scheduler to seek out and clean up cruft, free up space, and reap dead jobs that are lingering around the mesos slaves. It too resides on the mesos slaves, at a path that you define in the aurora-scheduler arguments.
- thermos_observer.pex: This tool is a service that runs on each mesos slave. It allows the consumers of aurora to look at the details of each process and the status of the jobs.
- aurora.pex: This is the tool used to schedule jobs. It may run on any machine that can speak to the mesos hosts using LIBPROCESS_HOST:LIBPROCESS_PORT. It has a -h flag.
- aurora_admin.pex: This is used for administrator-level tasks in the aurora cluster; it is where you set quotas for roles, as well as other features. It has a -h flag.
A good place to start is the logs on the mesos slave, for such things as LOST tasks. I have found it is generally a failure with the thermos_executor.pex; beyond that, it is probably a broken .aurora file, which you can generally discern by observing the log entries for each process in the thermos_observer.
This is by no means exhaustive, and I plan to edit and update it as I find more.