
# Core Idea

  • easy to start with and tough to master
  • brainstorming

# d11hack

We started off brainstorming almost a week prior to the event about various problems that might get people interested. The [Web Crawler](link to the problem statement) was the one that struck the right balance of complexity and fun, and adhered to our philosophy:

> A good game is easy to start with but tough to master.

  • Graph traversal problem
  • Little bit of web

Building a web crawler is essentially a graph traversal problem. You have pages that link to other pages. It's possible some pages link back, and you end up in a cycle. Thus one needs to keep a record of all the pages that have been visited and make sure they are not visited again.
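
As a rough illustration, here is a minimal cycle-safe traversal in TypeScript. The `fetchLinks` helper is hypothetical (it stands in for whatever HTTP client and HTML parsing a real solution would use); the visited set is what breaks the cycles.

```typescript
// Hypothetical helper: fetch a page and return the URLs it links to.
// A real solution would use a proper HTML parser instead of a regex.
async function fetchLinks(url: string): Promise<string[]> {
  const res = await fetch(url);
  const html = await res.text();
  return [...html.matchAll(/href="([^"]+)"/g)].map((m) => m[1]);
}

// Breadth-first traversal of the link graph with a visited set.
async function crawl(start: string): Promise<Set<string>> {
  const visited = new Set<string>([start]); // pages we have already seen
  const queue: string[] = [start];
  while (queue.length > 0) {
    const url = queue.shift()!;
    for (const link of await fetchLinks(url)) {
      if (!visited.has(link)) { // skipping visited pages breaks cycles
        visited.add(link);
        queue.push(link);
      }
    }
  }
  return visited;
}
```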

Fetching a particular page takes time, so crawling serially is not the most efficient way. Ideally you should parallelize as much as possible, i.e. since each page links to multiple other pages, it's better to request them in parallel and continue parsing as and when each response is received.
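
A sketch of the parallel variant, again assuming the hypothetical `fetchLinks` helper from above: each level of the link graph is requested concurrently with `Promise.all` rather than one page at a time.

```typescript
// Level-by-level parallel crawl: all unvisited links discovered at one
// depth are fetched concurrently instead of serially.
async function crawlParallel(start: string): Promise<Set<string>> {
  const visited = new Set<string>([start]);
  let frontier = [start];
  while (frontier.length > 0) {
    // Request every page in the current frontier in parallel.
    const results = await Promise.all(frontier.map(fetchLinks));
    const next: string[] = [];
    for (const link of results.flat()) {
      if (!visited.has(link)) {
        visited.add(link);
        next.push(link);
      }
    }
    frontier = next;
  }
  return visited;
}
```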

  • Travis Running time
  • Benchmarking set
  • Git Repos Setup
  • Plagiarism
    • Open sourced 4 days before JSFoo
    • Branches contained solutions

To make people's lives easier we created an open-source GitHub repo with the boilerplate code and a unit test. At Dream11 we take TDD to heart :P. So we added unit tests beforehand to help developers test/benchmark their code once they were done. We also integrated the project with Travis so that your code could be automatically executed on a standard machine setup. This also saved us a lot of time later when we added our hidden test cases ;)

Initially we were worried about plagiarism, since everyone had to publish their solutions by raising a pull request to an open-source project. But then we realized that most devs take days to understand what others have coded, and given the tight deadline in the middle of JSFoo, people would prefer doing it themselves rather than copying. We even went to the extent of publishing the project almost a week before JSFoo actually started :D PS: You might want to keep a watch on your GitHub repo for the next challenge

  • Lexicographically smallest
  • With some latency/IO
  • Throttled server
  • Any library
  • Any hacks
  • Hidden Test Cases
  • Last minute test cases
  • Travis back merges
  • Server throttles/delays/Connection Reset etc.
  • Failures post hidden tests disappointment.

We added some caveats to the challenge. For instance, while crawling you also needed to find the lexicographically smallest word on the website. The web server would automatically start throttling responses after some time and eventually send a 429 response. You could use any library but had to pass all our test cases. Initially we only had one test case, so that people could get started. We observed people solving it in 10 seconds, whereas per our internal benchmarks it should take at least 34 seconds to solve the problem. At the end of day one we added some more test cases to be fair to those who were taking longer to crawl the website. We made this change and back-merged it into all 21 pull requests that we had received on day one. Unsurprisingly, only two of them passed all the cases; the others passed just the case they had hard-coded the results for.
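
A hedged sketch of how a solution might handle the two caveats: retrying on a 429 with a fixed back-off (the one-second delay and the word-splitting rule are assumptions, not the contest's exact spec) and tracking the lexicographically smallest word seen so far.

```typescript
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Retry on HTTP 429: back off and try again once the server throttles.
async function fetchWithRetry(url: string): Promise<string> {
  while (true) {
    const res = await fetch(url);
    if (res.status === 429) {
      await sleep(1000); // assumed back-off; the real window is unknown
      continue;
    }
    return res.text();
  }
}

// Track the lexicographically smallest word across all crawled pages.
// JS string comparison (<) is lexicographic by UTF-16 code units.
function smallestWord(html: string, current: string | null): string | null {
  for (const word of html.split(/\W+/).filter(Boolean)) {
    if (current === null || word < current) current = word;
  }
  return current;
}
```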

Our deadline was 3:30pm on day two of JSFoo, and we had announced that we would add more test cases at 3:30, so solutions had to be generic rather than something that works for just one test case. Even though people were initially disappointed, most of them updated their code to make it work for all cases.

  • Crazy hacks!

People submitted amazing solutions, and most of them ran in under 30 seconds. This was much less than our original estimate of 34 seconds. People were trying to beat each other by throttling requests based on the size of the graph. One of the solutions exploited the hints the server returns in the special headers `X-RateLimit-Limit` and `X-RateLimit-Remaining`, which indicate when it is going to throttle next.
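
A sketch of that header-based hack: read `X-RateLimit-Remaining` after each response and pause proactively instead of reacting to 429s after the fact. The header names come from the text above; the pause length is an assumed placeholder.

```typescript
// Pace requests using the server's rate-limit hint rather than waiting
// for a 429. If the quota is nearly exhausted, sleep before continuing.
async function fetchPaced(url: string): Promise<string> {
  const res = await fetch(url);
  const remaining = Number(
    res.headers.get("X-RateLimit-Remaining") ?? Infinity
  );
  if (remaining <= 1) {
    // Almost out of quota: assumed one-second wait for the window to reset.
    await new Promise((r) => setTimeout(r, 1000));
  }
  return res.text();
}
```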

  • Benchmark cases

To benchmark, we considered only the first test case, which was given on day one and for which people had optimized the most. Though submissions were supposed to pass all the tests, only the run time on this case was used to create the leaderboard.
