Darius Kazemi, Feb 23 2018
After a few days of poking around at ML5 I have some recommendations for making it a healthier open source project. I am, of course, happy to sit down and talk through any of these recommendations in depth, but I thought I'd get them documented first.
These are project management, rather than technical, recommendations.
Github has a new-ish feature that lets you specify default text that appears in the textarea when a contributor opens a form to create a new issue or PR. I find that it's a great place to put a condensed questionnaire. It may overlap with your `CONTRIBUTING.md`, but since a lot of people won't read that, this is a great way to get right in their face.
Issue/PR templates take the form of simple markdown files placed in a `.github/` directory in the root of your project. I recommend keeping the templates very simple, just to encourage complete beginners to give you the basic information that you might not otherwise get from them. More advanced users will simply ignore the guidelines and provide you even more information than you could possibly want :)
I maintain an open source project called Shortcut that uses these templates and they've worked out very well for us! Feel free to copy or modify the issues template and the PR template.
You're using the default set of Github labels, which are mostly good but:
- remove `invalid` because that's just aggro and when would you ever use it anyway? In practice it's mostly synonymous with `wontfix`
- add a very simple prioritization system. On Shortcut we have labels for `priority: low`, `priority: medium`, and `priority: high`. More than anything this helps when you're looking at the sea of issues during a planning session: what do we work on next? Well, probably the high pri stuff. Also, if you use your weekly meetings as a triage session, the simple act of determining what the priority is for new issues will help you better understand where your project 'wants' to go.
You need a tool to track your project and Github Issues/PRs probably don't cut it. Fortunately, Github itself understands this and has provided Github Projects, which is basically a Trello clone inside Github.
For a simple, contrived example, look at my project sandbox. There are real open source projects of considerable size that use Github Projects but they can be hard to find. Here are some examples:
Looking at how other projects use these tools can be very instructive. For example, a lot of them have a column just for issues that need technical review. Some of them track milestones as individual projects. The whole thing is pretty flexible.
Another really nice thing about having a Github Project is that it's public-readable, so anyone who wants to keep informed of the current status of ML5 could just take a peek at it.
That said, if you want dependencies (issues that are marked as pending until another issue is solved), you cannot do that with Github Projects. I highly recommend a third party app like ZenHub, especially if you want to do something closer to a traditional agile management board with estimates and burndowns and stuff. This may be overkill for your project, though. I would recommend starting with vanilla Github Projects and then upgrading to one of these other services if you find yourself cursing the lack of features.
Just kidding. You did a really good job with this. I was able to start contributing very quickly!
Recommendations for improving documentation.
Sometimes examples are self-explanatory, but machine learning is notoriously unintuitive. I barely understand what any of the examples are doing without reading through the code. It would be good to have a relatively verbose prose description of what each example is doing.
For example, the Simple LSTM example currently reads:
A simple LSTM text generation example using a model trained on a corpus of Ernest Hemingway. Built with p5.js.
And then is followed by the demo itself, and then the (commented) code. This is okay but could be made a lot better with a little more guidance as to the details of the demo:
A simple LSTM text generation example using a model trained on a corpus of Ernest Hemingway. The "seed text" tells the text generator where to begin its predictions, a higher "temperature" will make the model spit out something more surprising and original but also less Hemingway-like, and "length" is the number of additional characters the model will generate. Built with p5.js.
Basically, your examples could use brief tutorials.
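A tutorial could also make the "temperature" knob less abstract with a few lines of code. The following is a minimal sketch of temperature scaling as it is commonly done in text generators, not ml5's actual implementation. The names `applyTemperature` and `sampleIndex` and the example probabilities are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helpers for illustration; these are not part of ml5.

// Rescale a probability distribution by a temperature.
// temperature < 1 sharpens it (safer, more predictable picks);
// temperature > 1 flattens it (more surprising picks).
function applyTemperature(probs, temperature) {
  if (temperature <= 0) {
    throw new RangeError('temperature must be positive');
  }
  // Divide log-probabilities by the temperature, then renormalize.
  const scaled = probs.map((p) => Math.log(p) / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pick an index at random according to the given probabilities.
// `random` can be swapped out to make the choice deterministic.
function sampleIndex(probs, random = Math.random) {
  let r = random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r < 0) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

// Made-up probabilities for three candidate next characters.
const nextCharProbs = [0.7, 0.2, 0.1];
const cautious = applyTemperature(nextCharProbs, 0.5);
const adventurous = applyTemperature(nextCharProbs, 2.0);
```

With a temperature of 0.5 the most likely character becomes even more likely, so the output stays closer to the training text. With a temperature of 2.0 the three candidates move closer together, so unlikely characters get picked more often.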
I noticed that most of the examples really don't benefit at all from p5 and can be refactored into basic DOM calls using `getElementById()`, the `Image()` API, etc. Maybe worth having a specific "here are some p5 examples" but I think the basic examples should be vanilla JavaScript.
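As a rough idea of what a vanilla version might look like, here is a minimal sketch that uses only standard DOM calls. The element ids (`status`, `preview`), the function name `showImage`, and the image path are all hypothetical and not taken from any existing ml5 example. The `doc` parameter is an assumption added so the function can be exercised outside a browser.

```javascript
// Sketch only: the ids 'status' and 'preview' are hypothetical.
// `doc` defaults to the page's document in a browser.
function showImage(src, doc = globalThis.document) {
  // Plain DOM lookups by id, no helper library needed.
  const status = doc.getElementById('status');
  const container = doc.getElementById('preview');

  // In a browser, `new Image()` is equivalent to createElement('img').
  const img = doc.createElement('img');

  // Report progress once the browser has finished (or failed) loading.
  img.onload = () => {
    status.textContent = 'Image loaded';
  };
  img.onerror = () => {
    status.textContent = 'Could not load ' + src;
  };

  status.textContent = 'Loading...';
  img.src = src;
  container.appendChild(img);
  return img;
}
```

On a page that contains elements with those two ids, calling `showImage('images/kitten.jpg')` (a made-up path) would add the image to the page and update the status text, with no p5 involved.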
This is unbelievably useful, thank you! This could probably be generalized as a guide for maintaining open source projects (beyond just ml5)! Regarding the p5 examples, I would love to have a discussion about how to manage this. One of the reasons things skewed towards p5 is that I hope to integrate it into my beginner programming classes with p5, but this is probably not helpful for the larger web audience and adds unnecessary extra stuff where simple vanilla JS makes the most sense. I see a few options here:
Thoughts?