@diyclassics
Last active March 20, 2017 14:02
Some things to consider including in a GSoC proposal for CLTK
Several prospective CLTK Google Summer of Code applicants have written recently to ask what the proposal should include. While successful project proposals can take many different forms, here is an outline that helps address the questions likely to come up as proposals are reviewed:
- Abstract: It is helpful to distill your proposal into 100-200 words that define the problem, identify your solution, name the datasets necessary to do the work, and report the expected outcome of this project. On this last point, note that since this is a proposal, we do not expect you to report results—but you should have a clear idea of where you expect to be by the end of the summer. We will also need to use abstracts and brief descriptions of your project on the GSoC page if your proposal is selected.
- Proposal: This will be the bulk of your submission. Here you want to expand upon the points mentioned in the abstract, including:
  - Define the problem. Depending on your project, CLTK may be different from other open source projects in that you may need to identify both a technical problem (e.g. how can we use NLP to solve this language task) and an academic/linguistic problem (e.g. how will this task help us address research questions about our language). Note that the fact that a specific topic has not been addressed before is a good start for a proposal, but you also need to make the case that there is further value in investing time in a computer-assisted solution to your problem. Examples of linguistic usage relevant to the project can be helpful here.
  - Define the ‘landscape’. How is this problem being addressed currently, either through traditional methods or through computer-assisted methods? Name other projects that are doing similar work, perhaps parallel work in another language. At this point, you need to explain concisely why these projects are insufficient to address the problem you have defined and what your project will contribute to the field. You should also note here the expected audience for your work when it is complete.
  - Name the datasets that you need to do your work. Be sure to credit the necessary parties, note whether the datasets are open access, and include any rights statement.
  - Explain in brief the methods, algorithms, etc. you will use to address the problem. Examples are helpful here. Diagrams are also appreciated.
- Background: Let us know your background both in computer science/programming and in the language(s) you plan to work with. This does not need to be a complete CV, but rather should make the case as concisely as possible for why you are the right person to address this problem for CLTK/GSoC.
- Timeline: Include a detailed timeline (week-by-week is best) of your working plan for the summer. This section should also make clear which specific goals you hope to achieve by each of the midterm reviews and by the final review.
- Bibliography: This is optional, but since there may be an academic/linguistic component to CLTK work, references to relevant research and related projects should be documented. This helps enormously with understanding the context of your proposal. [In addition, we expect proposals to represent fully the work of the student and expect the work of others to be acknowledged and cited properly.]
I hope this gives a clearer idea of what we are looking for in GSoC proposals. To sum up the above in a few words (the TL;DR version, I suppose), we want to know what you are doing, why you are doing it, why *you* specifically are the right person to do it, and how you plan to get it done. And as I mentioned, it is also helpful for us to know what similar work has been done, why this project is different, and who will use it and benefit from it.