@tsanak
Last active January 22, 2018 10:29
Scraping Udacity lessons with Chrome's console

Returns an array of objects in the following format:

[
	{
		chapter: 0,
		html: the html of the specified container from the first lesson
	},
	//.
	//.
	//.
	{
		chapter: number of the last lesson,
		html: the html of the specified container from the last lesson
	}
]

The output is a JSON string, so you need to call JSON.parse on it to access the html of each lesson.
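
A minimal sketch of reading the copied string back, assuming it has the shape described above. The `sample` value is made up for illustration and stands in for whatever ends up on your clipboard, and `htmlForChapter` is a hypothetical helper, not part of the original snippet.

```javascript
// Hypothetical sample standing in for the copied output
const sample = JSON.stringify([
  { chapter: 0, html: '<div>First lesson</div>' },
  { chapter: 1, html: '<div>Second lesson</div>' }
]);

// Parse the string back into an array of { chapter, html } objects
const lessons = JSON.parse(sample);

// Hypothetical helper: look up a single chapter's html by its chapter number,
// returning null when no entry has that number
function htmlForChapter(list, chapter) {
  const entry = list.find(function (item) { return item.chapter === chapter; });
  return entry ? entry.html : null;
}

console.log(htmlForChapter(lessons, 1));
```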

To run this, navigate to the lesson you want to scrape and open its first chapter. chapterList is the list in the left sidebar that lets you navigate through the lesson's chapters. The hashed class name ('.index--contents-list--33vHB') will probably change, so check it and update it before you run the code.
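
Since the hashed suffix is the part most likely to change, one option is to match only the readable part of the class name with a CSS attribute substring selector. This is a sketch, not something the original snippet does: `partialClassSelector` is a hypothetical helper, and it assumes that the readable part (`contents-list`) stays the same when the hash changes and that only one element on the page matches it.

```javascript
// Hypothetical helper: builds a selector that matches any element whose
// class attribute contains the given fragment, e.g. [class*="contents-list"]
function partialClassSelector(fragment) {
  // Escape backslashes and double quotes so the fragment is safe inside the quoted value
  const escaped = fragment.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return '[class*="' + escaped + '"]';
}

const listSelector = partialClassSelector('contents-list');
console.log(listSelector);

// In the browser console (assumption: the fragment is unique on the page):
// var chapterList = document.querySelectorAll(listSelector);
```

If more than one element matches, the first match may not be the sidebar list, so it is worth inspecting the result before starting the scrape.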

var chapterList = document.querySelectorAll('.index--contents-list--33vHB');
var lessonObj = [];

//Recursive function that waits 5 seconds after each click, to let the chapter load and to avoid flooding Udacity
function clickAndGetContent(i) {
	var l = chapterList[0].children[i].children[0];
	l.click();
	setTimeout(function() {
		//Main container of the video & all the content [this hash will probably change too]
		var currentHTML = document.querySelectorAll('._main--content-container--ILkoI');
		var outer = currentHTML[0].outerHTML;
		lessonObj.push({
			'chapter': i,
			'html': outer
		});
		if(i < chapterList[0].children.length - 1) {
			clickAndGetContent(++i);
		}
		else {
			console.log('Finished scraping');
		}
	}, 5000);   
}

clickAndGetContent(0);

Once the console logs 'Finished scraping', run this command in the console to copy the stringified array to your clipboard:

copy(JSON.stringify(lessonObj));
@npanagop

Python script that parses the stringified JSON array and writes a separate html file for each chapter.

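import json

# Change 'jsonData' to the name of the file where you saved the data of lessonObj
with open('jsonData', encoding='utf-8') as f:
    data = json.load(f)

# Write each chapter's html to its own file, e.g. chapter-0.html
for dat in data:
    filename = 'chapter-' + str(dat['chapter']) + '.html'
    # Python 3: open in text mode with an explicit encoding and write the string directly;
    # writing .encode('utf-8') bytes to a text-mode file raises TypeError
    with open(filename, 'w', encoding='utf-8') as newF:
        newF.write(dat['html'])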
