@ZubairLK
Created December 6, 2015 07:39
Downloading all of sabaq.pk lectures
sabaq.pk has lectures in Urdu for high school students.
They are hosted on Dailymotion.
The easiest way would be to download them all with youtube-dl from the sabaqpk user page.
However, they are not sorted into playlists, and the video titles are not numbered.
Dailymotion also has a limitation: it only shows 1800 videos, spread across 100 user pages.
Any page number beyond that gets redirected to the 100th page.
Try http://www.dailymotion.com/user/sabaqpk/101
All the major sorting is done on the sabaq.pk website itself.
Their site-map.php lists all the links of the website as XML.
Extracting the video links is doable: use grep to keep only the lines for video pages,
then use replace-all in gedit to remove the header/footer.
Some other manual replacement was needed too, as those links had a few extra/different characters compared to the site links.
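The grep and replace-all steps above could also be scripted. This is only a rough sketch, not what was actually done here: it assumes the sitemap wraps each link in a `<loc>` tag (as standard sitemaps do) and that video pages can be recognised by a substring in the URL. The `marker` value, the function name and the example.com URLs are made up for illustration. Unescaping XML entities such as `&amp;` is a guess at what the "extra/different characters" might have been.

```python
import html
import re


def extract_video_links(sitemap_xml, marker="video"):
    """Return the URLs found in <loc> tags that contain `marker`.

    Order is preserved and duplicates are dropped. `marker` is a
    hypothetical filter; adjust it to whatever the real video URLs share.
    """
    locs = re.findall(r"<loc>\s*(.*?)\s*</loc>", sitemap_xml, flags=re.DOTALL)
    seen = set()
    links = []
    for raw in locs:
        url = html.unescape(raw)  # turn &amp; back into & and so on
        if marker in url and url not in seen:
            seen.add(url)
            links.append(url)
    return links


# Made-up sitemap fragment, only to show the shape of the input.
SAMPLE_SITEMAP = """<urlset>
  <url><loc>http://example.com/about.php</loc></url>
  <url><loc>http://example.com/video-page.php?sub=1&amp;v=abc</loc></url>
  <url><loc>http://example.com/video-page.php?sub=1&amp;v=def</loc></url>
  <url><loc>http://example.com/video-page.php?sub=1&amp;v=abc</loc></url>
</urlset>"""

if __name__ == "__main__":
    # Print one URL per line, ready to be saved as videos.txt.
    print("\n".join(extract_video_links(SAMPLE_SITEMAP)))
```

Redirecting the output to videos.txt would give a batch file with one video page per line, in sitemap order.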
Once you have a file with all the video pages, it is easy to batch download the whole file with youtube-dl like this:
youtube-dl -a videos.txt -o '%(autonumber)s/%(playlist_index)s_%(title)s_%(id)s.%(ext)s' --restrict-filenames
autonumber is necessary: each download goes into a sequentially numbered folder, in the same order as videos.txt.
Then we'll need a script to parse the URLs and rename all the folders to proper names.
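A possible starting point for that rename script is sketched below. It is untested against the real downloads and rests on a few assumptions: that the Nth URL in videos.txt ended up in the Nth autonumber folder, that the folder names are zero-padded to 5 digits (which I believe is youtube-dl's default for autonumber), and that the last part of the URL makes a usable name. The function names and the example.com URLs are made up.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def name_from_url(url):
    """Build a filesystem-safe name from the last path segment and query."""
    parsed = urlparse(url)
    last = parsed.path.rstrip("/").rsplit("/", 1)[-1]
    raw = last + ("_" + parsed.query if parsed.query else "")
    # Replace anything outside a conservative character set with "_".
    return re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_") or "unnamed"


def plan_renames(urls, width=5):
    """Pair each numbered folder with a new name, keeping the number as a prefix.

    Assumes folder N belongs to the Nth URL; a skipped or failed download
    could break that assumption, so check the plan before applying it.
    """
    plan = []
    for index, url in enumerate(urls, start=1):
        old = str(index).zfill(width)
        plan.append((old, f"{old}_{name_from_url(url)}"))
    return plan


def apply_renames(root, plan):
    """Rename the folders under `root`, skipping any that do not exist."""
    for old, new in plan:
        source = Path(root) / old
        if source.is_dir():
            source.rename(Path(root) / new)


if __name__ == "__main__":
    example_urls = [
        "http://example.com/lessons/algebra-intro",
        "http://example.com/video-page.php?v=abc",
    ]
    for old, new in plan_renames(example_urls):
        print(old, "->", new)
```

Keeping the number as a prefix preserves the videos.txt ordering after the rename. It is worth printing the plan and checking it by eye before calling `apply_renames`.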
Another issue: Dailymotion's servers restrict bandwidth, I think, because I can only seem to download from them at around 70 kbit/s,
while any YouTube downloads are super fast at 2-3 Mbps.