@SalehDehqanpour
Last active November 18, 2023 12:06
Netbox Interview Question

Solve only one of these problems and email us the problem number along with your solution. Earlier problems are somewhat more challenging, so solving them is valued more. Try problem 1 first, and move on to the next problem only if you cannot solve it cleanly.

Beyond producing a correct solution, follow clean-code practices and refactor your code neatly. The quality of your solution is of paramount importance.

Problem #1

We need a Scrapy project that crawls filmnet.ir and collects movies from the service. Crawl only the latest 100 or so movies. At least the following fields are required:

  • title (both Persian and English)
  • summary (both Persian and English)
  • publish date
  • release year
  • rate
  • duration
  • link to the item on the platform (for this problem, its link on Filmnet)
  • a list of the item's genres

Scraping as much additional data as you can is highly encouraged. Use the Django ORM with an SQLite database to store the data. Create a Django model for movies and name it "Movie". Please include a very brief README explaining how to run your crawler.

Requirements and tips:

  • Collect movies only (no series or episodes).
  • First find the APIs you need to fetch the data: inspect the requests that the Filmnet web app sends. If you struggle with this, read through this link.
  • Downloading and saving photos is required. Read this link to learn how.
  • Saving items to the database should be done in a pipeline.
  • NEVER use selectors that look like gibberish (such as .e1eum8tf0). They are very fragile and are likely to change with the next update of the target website!
  • Clean, well-organized git commits are a plus (no git remote is needed).
  • Getting a list of artists for each movie is a plus.
  • Using Scrapy's item loader and its input/output processors is a plus.
  • Configuring the Django admin panel to view your crawled data is a plus.
  • Readability counts. Use comments or docstrings wherever necessary.
  • Do not use Selenium or any other tools; use Scrapy only.
  • Postman is a great tool for working with APIs easily.

Problem #2

Do the same as in problem #1, but for Namava. This problem is a little easier because the links are provided for you here: