@dariusk
Created February 23, 2018 17:10
Recommendations for ML5


Darius Kazemi, Feb 23 2018

After a few days of poking around at ML5 I have some recommendations for making it a healthier open source project. I am, of course, happy to sit down and talk through any of these recommendations in depth, but I thought I'd get them documented first.

Project management

These are project management, rather than technical, recommendations.

Issue and pull request templates

GitHub has a new-ish feature that lets you specify default text that appears in the textarea when a contributor opens the form to create a new issue or PR. I find that it's a great place to put a condensed questionnaire. It may overlap with your CONTRIBUTING.md, but since a lot of people won't read that, this is a great way to get right in their face.

Issue/PR templates take the form of simple markdown files placed in a .github/ directory in the root of your project. I recommend keeping the templates very simple, just to encourage complete beginners to give you the basic information that you might not otherwise get from them. More advanced users will simply ignore the guidelines and provide you even more information than you could possibly want :)

I maintain an open source project called Shortcut that uses these templates and they've worked out very well for us! Feel free to copy or modify the issues template and the PR template.
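To make the file layout concrete, here's a minimal Node sketch that scaffolds both templates. The file names `.github/ISSUE_TEMPLATE.md` and `.github/PULL_REQUEST_TEMPLATE.md` are the ones GitHub looks for; the questions themselves and the `writeTemplates` helper are made up for illustration and are not the Shortcut or ML5 templates.

```javascript
// Sketch: scaffold minimal issue and PR templates in a repo's .github/ directory.
// The questions are illustrative placeholders, not any project's real templates.
const fs = require("fs");
const path = require("path");

const ISSUE_TEMPLATE = [
  "<!-- Please answer these so we can help you faster. -->",
  "**What did you expect to happen?**",
  "",
  "**What happened instead?**",
  "",
  "**Browser and operating system:**",
  "",
].join("\n");

const PR_TEMPLATE = [
  "**Which issue does this address?**",
  "",
  "**What does this change do?**",
  "",
  "**How did you test it?**",
  "",
].join("\n");

// Writes both templates under <repoRoot>/.github and returns the paths written.
function writeTemplates(repoRoot) {
  const dir = path.join(repoRoot, ".github");
  fs.mkdirSync(dir, { recursive: true });
  const files = {
    "ISSUE_TEMPLATE.md": ISSUE_TEMPLATE,
    "PULL_REQUEST_TEMPLATE.md": PR_TEMPLATE,
  };
  return Object.entries(files).map(([name, body]) => {
    const target = path.join(dir, name);
    fs.writeFileSync(target, body);
    return target;
  });
}
```

Calling `writeTemplates(".")` from the project root would create the two files. In practice you'd probably just write them by hand, since they're plain markdown.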

Reorganize your labels

You're using the default set of GitHub labels, which are mostly good, but:

  • remove invalid because that's just aggro and when would you ever use it anyway? in practice it's mostly synonymous with wontfix
  • add a very simple prioritization system. On Shortcut we have labels for priority: low, priority: medium, and priority: high. More than anything this helps when you're looking at the sea of issues during a planning session: what do we work on next? Well, probably the high pri stuff. Also, if you use your weekly meetings as a triage session, the simple act of determining what the priority is for new issues will help you better understand where your project 'wants' to go.

Have some kind of project tracking tool

You need a tool to track your project, and GitHub Issues/PRs alone probably don't cut it. Fortunately, GitHub itself understands this and provides GitHub Projects, which is basically a Trello clone inside GitHub.

For a simple, contrived example, look at my project sandbox. There are real open source projects of considerable size that use GitHub Projects, but they can be hard to find. Here are some examples:

Looking at how other projects use these tools can be very instructive. For example, a lot of them have a column just for issues that need technical review. Some of them track milestones as individual projects. The whole thing is pretty flexible.

Another really nice thing about having a GitHub Project is that it's public-readable, so anyone who wants to keep informed of the current status of ML5 could just take a peek at it.

That said, if you want dependencies (issues that are marked as pending until another issue is solved), you cannot do that with GitHub Projects. I highly recommend a third-party app like ZenHub, especially if you want to do something closer to a traditional agile management board with estimates and burndowns and stuff. This may be overkill for your project, though. I would recommend starting with vanilla GitHub Projects and then upgrading to one of these other services if you find yourself cursing the lack of features.

Rewrite your CONTRIBUTING.md

Just kidding. You did a really good job with this. I was able to start contributing very quickly!

Documentation

Recommendations for improving documentation.

Provide a story for each example

Sometimes examples are self-explanatory, but machine learning is notoriously unintuitive. I barely understand what any of the examples are doing without reading through the code. It would be good to have a relatively verbose prose description of what each example is doing.

For example, the Simple LSTM example currently reads:

A simple LSTM text generation example using a model trained on a corpus of Ernest Hemingway. Built with p5.js.

And then is followed by the demo itself, and then the (commented) code. This is okay but could be made a lot better with a little more guidance as to the details of the demo:

A simple LSTM text generation example using a model trained on a corpus of Ernest Hemingway. The "seed text" tells the text generator where to begin its predictions, a lower "temperature" will make the model spit out something more Hemingway-like but also less surprising and original, and "length" is the number of additional characters the model will generate. Built with p5.js.

Basically, your examples could use brief tutorials.
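A tutorial could even show what "temperature" does under the hood. Here's a minimal sketch of the generic technique: the model's raw scores are divided by the temperature before being turned into probabilities, so a low temperature piles almost all the probability onto the likeliest character and a high one flattens the distribution. This is not ML5's actual code; `softmaxWithTemperature`, `sampleIndex`, and the scores are all made up for illustration.

```javascript
// Generic temperature scaling; a sketch, not ML5's internal implementation.
function softmaxWithTemperature(logits, temperature) {
  if (!(temperature > 0)) throw new RangeError("temperature must be > 0");
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Draws an index from a probability distribution.
// `random` is injectable so the function can be tested deterministically.
function sampleIndex(probs, random = Math.random) {
  const r = random();
  let cumulative = 0;
  for (let i = 0; i < probs.length; i++) {
    cumulative += probs[i];
    if (r < cumulative) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

// Made-up scores for three candidate next characters.
const logits = [2.0, 1.0, 0.1];
const cool = softmaxWithTemperature(logits, 0.2); // nearly always picks index 0
const warm = softmaxWithTemperature(logits, 2.0); // spreads probability around
const nextCharIndex = sampleIndex(warm);
```

Comparing `cool` and `warm` side by side in a demo would let a beginner see why low-temperature output reads as safer and more repetitive.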

Remove p5 from examples where possible

I noticed that most of the examples really don't benefit at all from p5 and can be refactored into basic DOM calls using getElementById(), the Image() API, etc. It may be worth having a specific "here are some p5 examples" section, but I think the basic examples should be vanilla JavaScript.
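As a rough sketch of what that refactor might look like, here's image loading and a status message done with only getElementById() and Image(). The `status` element id, the image path, and the helper names are hypothetical. The `doc` and `ImageCtor` parameters exist only so the functions can be exercised outside a browser.

```javascript
// Sketch: load an image and report on it using plain DOM APIs, no p5.
// The "status" id and the image path below are hypothetical.

// Wraps the Image() load/error callbacks in a Promise.
function loadImageElement(src, ImageCtor = Image) {
  return new Promise((resolve, reject) => {
    const img = new ImageCtor();
    img.onload = () => resolve(img);
    img.onerror = () => reject(new Error(`Could not load ${src}`));
    img.src = src; // assigning src starts the download
  });
}

// Updates a status element while the image loads, then returns the image.
async function showImageInfo(src, doc = document, ImageCtor = Image) {
  const status = doc.getElementById("status");
  status.textContent = "Loading...";
  const img = await loadImageElement(src, ImageCtor);
  status.textContent = `Loaded ${img.width}x${img.height} image`;
  return img;
}

// In a browser page containing <p id="status"></p>:
if (typeof document !== "undefined") {
  showImageInfo("images/example.jpg").catch(console.error);
}
```

The loaded image element could then be handed to whatever model the example demonstrates, with no canvas or sketch lifecycle involved.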

@shiffman

This is unbelievably useful, thank you! This could probably be generalized as a guide for maintaining open source projects (beyond just ml5)! Regarding the p5 examples, I would love to have a discussion about how to manage this. One of the reasons things skewed towards p5 is that I hope to integrate it into my beginner programming classes with p5, but this is probably not helpful for the larger web audience and adds unnecessary extra stuff where simple vanilla JS makes the most sense. I see a few options here:

  1. Maintain a separate repo with p5 + ml5 examples.
  2. Maintain two sets of examples here with ml5 itself.
  3. Use p5 only where p5 shines (i.e. if we're making heavy use of canvas drawing, webcam capture, loading a CSV, etc.) but otherwise stick with vanilla JS.

Thoughts?
