davelester/gist:b40e3b5d0347540abf36

## gistfile1.md

      
What features do you think are missing from Mesos?


Being able to run inside Docker. The problem is that you cannot define a public IP address. When Mesos starts up in a Docker container, it only sees a private IP address, but it has a different public IP address.
Stronger containerisation (Docker as first class citizen).
Documentation: HA setup; writing custom schedulers and executors; examples in mesos/frameworks are out of date; samples in mesos/src/examples are too simple. Docker integration (I know this has been merged recently); I am using a custom Docker executor and am looking forward to using Deimos. Better packaging: Mesosphere requires custom egg installation; Mesos slave and master should be separate packages; the Java dependency should be optional.
None :)
Dynamically update role weight. Task preemption. Framework resource limitation.
Mostly ease-of-use features for EC2. The Mesos master should be dead simple to point an ELB at. It is not. Another feature I found missing was in the Mesos-Spark integration. It seems there is no way to specify the port or port range used by Spark tasks on the Mesos slave to receive callbacks from the Mesos master. So you have to keep quite a wide port range open in your security groups and iptables, or muck around with sysctl, which isn't always an option. Not positive that this is a Mesos issue, but I do see that the Mesos code for UPID looks like it defaults to port=0, which does the random selection thing.
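For readers unfamiliar with the "port=0" behavior this response refers to, here is a minimal sketch of what binding to port 0 generally does at the socket level, next to the fixed-range alternative the respondent is asking for. It uses only Python's standard `socket` module and is not Mesos or Spark code; the function names `bind_ephemeral` and `bind_in_range` and the port range 31000-31010 are hypothetical, chosen only for illustration.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind to port 0 so the OS assigns a free port; return (socket, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 means "let the OS pick"
    return sock, sock.getsockname()[1]


def bind_in_range(first, last, host="127.0.0.1"):
    """Bind to the first free port in [first, last]; return (socket, port)."""
    for port in range(first, last + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock, port
        except OSError:
            sock.close()  # port is taken or not permitted; try the next one
    raise OSError(f"no free port in {first}-{last}")


if __name__ == "__main__":
    sock, port = bind_ephemeral()
    print("OS-assigned port:", port)
    sock.close()

    sock, port = bind_in_range(31000, 31010)  # hypothetical range
    print("port from fixed range:", port)
    sock.close()
```

With the first approach the port can land anywhere in the operating system's ephemeral range, which is why a wide range has to stay open in firewalls; with the second, only the chosen range needs to be opened.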
Better tooling around total cluster utilization and reporting on job runs.
Log management (we used to use log4j things), or better documentation about it.
Documentation. Up-to-date documentation. Documentation on the architecture and scheduling that's more than excerpts from an academic paper. Diagnostic information when something's gone wrong. Better APIs. More control over logging. Have I mentioned better documentation?
The web interface is awesome. I would love some kind of proxy capability in the master (or a separate process) to pass through the slave web UI. As it is today, it's difficult to expose the web interface through a proxy since every slave has to be set up independently. It's only a minor issue, but everything else works great!
Framework scheduler as a Mesos task. Windows support. Documentation on configuration regarding when offers are made to which frameworks. Support for inter-framework communication (i.e., this framework is a distributed queue, that framework produces, that other one consumes).
Better support for "long running processes"
The docs are rough. There are quite a few intro docs, but they have typos (at least on Mesosphere). Then there is a reference, but not much about why you would do things the way you should. I get the impression that much of the more advanced docs basically start with an implied "So you used to work at Google...", but I didn't. Recovering from framework/job failures; migrating frameworks/jobs.
Mostly non-technical related. 1.) Formalized devel+stable release cadence and processes. 2.) Roadmap as it relates to (1). 3.) Turn-key documentation from the code. It would be nice, but I know it's not easy. 4.) Periodic community IRC meetings would be nice.
More complete documentation. Better documentation for custom native C++ frameworks. A better app deployment process in Marathon (!). For our company to really use Mesos/Marathon in production, it really must support continuous deployment of apps in Marathon. Too bad this is still at an early stage. We are keeping an eye on it. Better support for containers (Docker).
Centralized logging. Better management. Embed Marathon.
Documentation. There is also barely any useful information to provide colleagues who need to be convinced of its usefulness.
Pretty much what's on the road map. :) - Flexibility to alter resource limits for running tasks. - Draining and "maintenance mode" (like Aurora, but for Mesos). - More ops-friendly tooling (ps, kill, top, stuff like that) would be nice. - The QoS-aware resource scheduling in the Quasar paper looks really interesting...
Better diagnosis and safety belts regarding task launching and execution. UI for managing large numbers of tasks (hundreds or thousands), with ways to: group related tasks; see e.g. "10/12 tasks in the User Service group functioning normally"; see tasks that should be launched but can't because of capacity.
Better CLI tools, and web authentication for their UI.
Advanced monitoring of tasks. Better tutorials for building frameworks.
ACL & authentication, integration with JAAS systems (example: Atlassian Crowd), integration with Kafka, amongst others.
Not sure yet.
A blueprint for how to do autoscaling with Mesos when running in a private datacenter. There are two parts to that: 1. autoscaling within a framework, which is really framework-dependent, though it would be nice to give some examples with Marathon or another custom framework; 2. autoscaling mesos-slaves when Mesos is running out of resources and offers are queueing.


Complex setup slows down adoption. Incompatibilities with Spark depending on version. - Waiting for the container support in 0.19. I think Docker can be a huge driver of Mesos adoption, as fleet is fairly limited and immature. - Production-quality frameworks for Kafka, Elasticsearch. - Easier integration with Hadoop MapReduce. It seems that YARN has most of the momentum. - Release timeline. It should be clear when new versions will come out and what features they will have. - It's not clear if you can/would run DBs like MySQL, Postgres or MongoDB on Mesos.


More stable and easy-to-deploy frameworks. Also there's a lot of space for improvement in documentation.


1. Security: enterprise authentication, restricting users. 2. Fair play between different frameworks running in the same Mesos cluster. 3. A better, more functionality-rich UI.


Binary releases for easy installation. Compare the source tarballs from https://mesos.apache.org/downloads/ to the binary packages at http://mesosphere.io/downloads/. I recently freshened the Apache Spark documentation for running Spark on Mesos, and we wanted to be able to point to binary releases straight from Apache, but they don't exist: apache/spark#756 (comment). Other than that, I think documentation for how to get up and running from scratch with things like Mesos, Spark, Marathon, Chronos, Aurora, etc. should be much more streamlined.


A simpler and more robust installation mechanism. - First class Docker support.


Not really a feature, but better marketing. Most engineering teams today do not understand what Mesos is, yet they use MRv2+YARN. Arguments need to be made as to what the positives are of choosing Mesos (and the Berkeley stack in general) over a Hadoop/YARN-centered platform. I want to see community members be a bit more brash and critical of competitors: one of the major benefits of Mesos is the simplicity in deployment and use vs. the Hadoop mess. IMO, Mesos by itself is an awesome piece of software, but it's the services that run on it out of the box that will convert people. Perhaps a pre-packaged Spark+Mesos distribution offered by supporters like Mesosphere would be helpful.
Improved documentation around the framework development.
Specifying an externally addressable IP for the masters. You can pass a hostname to use for web UI-related things, but you can't pass an IP that other nodes should use to address a given master. This prevents masters from being addressable from outside their network, and precludes running the master inside something like a docker container. See https://issues.apache.org/jira/browse/MESOS-809 for discussion
security, multi-DC, improved scheduling, preemption, service-discovery, etc.
Dead simple way to plug Mesos into external monitoring frameworks, e.g. plug Mesos into Nagios, Cacti, etc. * Official packaging distributions: RPMs, DEBs, or images. * Binary caching for executors. * Native integration with Docker. * Additional CLI commands that will let you manage the applications/frameworks deployed as well as identify potential issues (this includes killing a framework). * Scheduler that combines user/app privileges (ACL) with a lower and upper band of resources for frameworks, such that it dynamically readjusts the resources granted to the frameworks according to the available resources and application privileges.
Documentation
Hooks into Node.js.
FreeBSD integration is the biggest issue at the moment.
Easy set-up and deployment. I'd like to have a tool-of-your-choice which provisions Mesos, ZooKeeper, and HDFS, and allows easy addition of slaves. Finally, documentation could be improved: Mesosphere is very helpful here, but something similar to what Docker is doing with its docs would help.
authorization framework integration for running tasks Z X Y
Load balancing.
Lots of room for improvement in the documentation.
It would be handy to have the ability to: * specify that jobs run collocated on a node * provide tagging of nodes so that jobs can specify tags to execute with
More resource isolation: disk I/O and network I/O