@jamaher /c_get.php forked from macias/CHANGELOG
Coursera Getter [download video lecture]
<?php
/* ===================== COURSERA GETTER ============================
tags: [coursera video download] [coursera lecture download]
CHANGELOG:
---------
2012-07-05 * initial release of Coursera preview getter
2012-06-14 * added a basic check that the right number of arguments is given
2012-06-11 * UTF-8 in filenames is supported
(another module for PHP is required -- mbstring)
* replaces slash and backslash with underscore
2012-06-07 * changed this info, added another way of getting cookies
2012-06-06, 2 * extensions casing reverted -- they matter again
* directories are named according to Lectures sections
* handles multiple files for given file type
* files are counted within each directory, not within entire course
2012-06-06 * extensions can be given lower/upper-case, they do not matter
2012-06-05, 2 * creates weekly subdirectories and puts the files in there
2012-06-05 * initial release
WHAT IT DOES:
------------
* it parses given course Lectures page
* it extracts all the desired content (links for videos, slides, etc)
* it uses consistent naming of the files
* it replaces colon with period (hello Windows users)
* finally, it prints a series of wget commands ready to execute
* the wget commands skip files that already exist, so it is safe to rerun the wget script just to get missing files
(note this may no longer hold after you update this script, because the naming convention may have changed)
WHAT YOU NEED:
-------------
1. a proper shell (Windows users -- of course I recommend switching to Linux entirely, but as a workaround Cygwin should be fine -- I don't know whether the tools I mention below work there)
2. wget (in openSUSE `sudo zypper in wget`)
3. php5 (in openSUSE `sudo zypper in php5`)
4. php5-openssl (in openSUSE `sudo zypper in php5-openssl`)
5. php5-mbstring (in openSUSE `sudo zypper in php5-mbstring`)
6. and an adventurous soul -- in Firefox, go to Edit/Preferences/Privacy/Remove Individual Cookie (don't freak out!) and search for "coursera". Several items should appear -- look for the key named "session" for the site you would like to download (for example "nlp"). Copy the value (content) of that key. Close the preferences window (do **NOT** delete anything!) -- I would be grateful to hear of an easier way
Ok, so now you know the address of the site, the session, and the files you would like to download.
Jan de Vos sent another way of getting cookies (step 6):
* find the cookies directory -- in case of Linux it will be something like this `~/.mozilla/firefox/88xw1k8g.default/`
* run sqlite3 -- `sqlite3 cookies.sqlite`
* run SQL query -- `select path,value from moz_cookies where baseDomain = 'coursera.org' and name='session';`
You will get the session codes for all courses you are enrolled in.
USAGE:
-----
php c_get.php "link_to_lectures_page" "file types" "session code" > wget_script_name.sh
sh wget_script_name.sh
Example (this is one line):
php c_get.php "https://class.coursera.org/crypto-preview/lecture/index" "MP4 PDF" "HERE&IS%MY&SESSION^VALUE@WHICH*OF!COURSE*I_WONT*TELL9YOU" > wgetter.sh
The command above creates a wget script for downloading videos (MP4) and slides (PDF). Now execute:
sh wgetter.sh
Please note the file type casing (MP4 vs. mp4) must match the casing of the title (tooltip) of the given category of files
-- check the Lectures page to find it out.
SECURITY NOTE:
-------------
Do NOT share your session code with anyone, and this means -- do NOT share the wget script with anyone either.
================================================================== */
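// Fetches the page at $url, sending the Coursera session cookie; returns NULL on failure.
function get_page($url,$session)
{
    $http = array('method'=>'GET',
                  'header'=> 'Cookie: session='.$session.';');
    $context = stream_context_create(array('http'=> $http));
    $content = file_get_contents($url,false,$context);
    if ($content===FALSE)
        return NULL;
    return $content;
}

// Parses (possibly malformed) HTML into a DOMDocument, treating the input as UTF-8.
function get_dom($content)
{
    $dom = new DOMDocument();
    // collect libxml parse warnings internally instead of printing them
    $errors_mode = libxml_use_internal_errors(TRUE);
    $content = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
    $dom->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($errors_mode);
    $dom->preserveWhiteSpace = false;
    return $dom;
}

// Makes a title usable as a file name: ':' -> '.', '"' -> "'", '/' and '\' -> '_'.
function fix_filename($s)
{
    return strtr(trim($s),':"/\\','.\'__');
}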

// Prints a shell script: one mkdir per lecture section and one wget per matching file.
function print_wget($content,$session,$extensions)
{
    $dom = get_dom($content);
    $xpath = new DOMXPath($dom);
    $group_count = 0;
    // the lecture list is expected to be the next sibling of each section header link
    $group_list = $xpath->query('//a[contains(@class,"list_header_link")]');
    foreach ($group_list as $group)
    {
        $item_count = 0;
        ++$group_count;
        $dir = $xpath->query('./h3',$group)->item(0)->nodeValue;
        $dir = str_pad($group_count,2,'0',STR_PAD_LEFT).'. '.fix_filename($dir);
        echo 'mkdir "'.$dir.'"'."\n";
        $node_list = $xpath->query('.//li[contains(@class,"item_row")]',$group->nextSibling);
        foreach ($node_list as $node)
        {
            ++$item_count;
            $title = $xpath->query('.//a[@class="lecture-link"]/text()',$node)->item(0)->nodeValue;
            foreach ($extensions as $ext)
            {
                $links = $xpath->query('.//a[contains(@title,"'.$ext.'")]',$node);
                foreach ($links as $link)
                {
                    // several files of one type: tell them apart by appending the link title
                    $suffix = '';
                    if ($links->length>1)
                        $suffix = '.'.$link->attributes->getNamedItem('title')->nodeValue;
                    $href = $link->attributes->getNamedItem('href')->nodeValue;
                    echo 'wget -nc --no-cookies --header "Cookie: session='.$session.'" "'.$href.'" -O "'.$dir.'/'.str_pad($item_count,3,'0',STR_PAD_LEFT).'. '.fix_filename($title).$suffix.'.'.strtolower($ext).'"'."\n";
                }
            }
        }
    }
}
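
if ($argc!=4)
{
    // report on stderr: stdout is normally redirected into the generated wget script
    file_put_contents('php://stderr', "Error: you should input three arguments, the usage is:\n");
    file_put_contents('php://stderr', "php c_get.php \"LECTURES_URL\" \"FILE_TYPES\" \"SESSION_CODE\"\n");
}
else
{
    $url = $argv[1];
    $extensions = explode(' ',$argv[2]);
    $session = $argv[3];
    $content = get_page($url,$session);
    if ($content!==NULL)
        print_wget($content,$session,$extensions);
    else
        file_put_contents('php://stderr', "Error: could not fetch $url\n");
}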
?>
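The naming convention used by the script above (numbered section directory, numbered file, characters replaced by `fix_filename`) can be tried in isolation. This is only a minimal sketch: `sketch_fix_filename` and `sketch_target_path` are hypothetical names that do not exist in c_get.php, and the section and lecture titles are made up for illustration.

```php
<?php
// Hypothetical helper (not part of c_get.php): mirrors fix_filename().
function sketch_fix_filename($s)
{
    // ':' -> '.', '"' -> "'", '/' -> '_', '\' -> '_'
    return strtr(trim($s), ':"/\\', '.\'__');
}

// Hypothetical helper: builds "<NN>. <section>/<NNN>. <title>.<ext>",
// the same shape print_wget() passes to wget -O.
function sketch_target_path($group_no, $group_title, $item_no, $title, $ext)
{
    $dir = str_pad($group_no, 2, '0', STR_PAD_LEFT).'. '.sketch_fix_filename($group_title);
    $file = str_pad($item_no, 3, '0', STR_PAD_LEFT).'. '.sketch_fix_filename($title);
    return $dir.'/'.$file.'.'.strtolower($ext);
}

// Made-up titles, for illustration only.
echo sketch_target_path(1, 'Week 1: Introduction', 3, ' Stream ciphers / "PRGs" ', 'MP4'), "\n";
// prints: 01. Week 1. Introduction/003. Stream ciphers _ 'PRGs'.mp4
```

Because the counters are zero-padded, a plain alphabetical listing keeps sections and lectures in course order.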
<?php
/* ===================== COURSERA PREVIEW GETTER =======================
tags: [coursera video download] [coursera lecture download]
CHANGELOG:
---------
2012-07-05 * initial release
WHAT IT DOES:
------------
* it is the counterpart of the Coursera getter, but this one works only for course previews
-- the ones with an embedded video player, and nothing else
WHAT YOU NEED:
-------------
1. a proper shell (Windows users -- of course I recommend switching to Linux entirely, but as a workaround Cygwin should be fine -- I don't know whether the tools I mention below work there)
2. wget (in openSUSE `sudo zypper in wget`)
3. php5 (in openSUSE `sudo zypper in php5`)
4. php5-mbstring (in openSUSE `sudo zypper in php5-mbstring`)
USAGE:
-----
php c_preview.php "link_to_preview_page" "video_file_type" > wget_script_name.sh
sh wget_script_name.sh
Example (this is one line):
php c_preview.php "https://class.coursera.org/crypto-preview/lecture/index" "mp4" > wgetter.sh
The command above creates a wget script for downloading videos (MP4). Now execute:
sh wgetter.sh
Please note the file type is not guaranteed to exist on the server
(so far "webm" and "mp4" are supported by Coursera).
================================================================== */
// Fetches $url (optionally with a session cookie) and returns a DOMXPath over it, or NULL on failure.
function get_page_xpath($url,$session = null)
{
    $http = array('method'=>'GET');
    if ($session!==NULL)
        $http['header'] = 'Cookie: session='.$session.';';
    $context = stream_context_create(array('http'=> $http));
    $content = file_get_contents($url,false,$context);
    if ($content===FALSE)
        return NULL;
    $dom = get_dom($content);
    $xpath = new DOMXPath($dom);
    return $xpath;
}

// Parses (possibly malformed) HTML into a DOMDocument, treating the input as UTF-8.
function get_dom($content)
{
    $dom = new DOMDocument();
    // collect libxml parse warnings internally instead of printing them
    $errors_mode = libxml_use_internal_errors(TRUE);
    $content = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
    $dom->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($errors_mode);
    $dom->preserveWhiteSpace = false;
    return $dom;
}

// Makes a title usable as a file name: ':' -> '.', '"' -> "'", '/' and '\' -> '_'.
function fix_filename($s)
{
    return strtr(trim($s),':"/\\','.\'__');
}

// Prints a shell script: one mkdir per section and one wget per preview video of type $ext.
function print_wget($xpath,$ext)
{
    $group_count = 0;
    $group_list = $xpath->query('//a[contains(@class,"list_header_link")]');
    foreach ($group_list as $group)
    {
        $item_count = 0;
        ++$group_count;
        $dir = $xpath->query('./h3',$group)->item(0)->nodeValue;
        $dir = str_pad($group_count,2,'0',STR_PAD_LEFT).'. '.fix_filename($dir);
        echo 'mkdir "'.$dir.'"'."\n";
        $node_list = $xpath->query('.//li[contains(@class,"item_row")]',$group->nextSibling);
        foreach ($node_list as $node)
        {
            ++$item_count;
            $row = $xpath->query('.//a[@class="lecture-link"]',$node)->item(0);
            $title = $row->firstChild->nodeValue; // retrieving text() via firstChild --> buggy point
            $link = trim($row->attributes->getNamedItem('href')->nodeValue);
            // every lecture has its own preview page holding the embedded video player
            $preview = get_page_xpath($link);
            if ($preview===NULL)
            {
                file_put_contents('php://stderr', "Could not fetch preview page for '$title'\n");
                continue;
            }
            $video_list = $preview->query('//video[@id="QL_video_element_first"]/source[@type="video/'.$ext.'"]');
            if ($video_list->length==0)
            {
                file_put_contents('php://stderr', "Filetype $ext not found for '$title'\n");
                continue;
            }
            $video = $video_list->item(0);
            $vid_src = $video->attributes->getNamedItem('src')->nodeValue;
            echo 'wget -nc "'.$vid_src.'" -O "'.$dir.'/'.str_pad($item_count,3,'0',STR_PAD_LEFT).'. '.fix_filename($title).'.'.strtolower($ext).'"'."\n";
        }
    }
}

if ($argc!=3)
{
    file_put_contents('php://stderr', "Error: you should input two arguments, the usage is:\n");
    file_put_contents('php://stderr', "php c_preview.php \"PREVIEW_URL\" \"FILE_TYPE\"\n");
}
else
{
    $url = $argv[1];
    $filetype = $argv[2];
    $xpath = get_page_xpath($url);
    if ($xpath!==NULL)
        print_wget($xpath,$filetype);
    else
        file_put_contents('php://stderr', "Error: could not fetch $url\n");
}
?>
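Both scripts put URLs, titles and (in c_get.php) the session code between double quotes in the generated shell script, so a `$` or a backtick in any of them would still be interpreted by the shell. One possible hardening, shown here only as a sketch and not as what the scripts actually do, is to pass every variable part through PHP's `escapeshellarg()`. The helper name `sketch_wget_line`, the URL and the target path are hypothetical.

```php
<?php
// Hypothetical helper (not part of either script): builds one wget line
// with every variable part quoted by escapeshellarg().
function sketch_wget_line($url, $target, $session = null)
{
    $cmd = 'wget -nc';
    if ($session !== null)
        $cmd .= ' --no-cookies --header '.escapeshellarg('Cookie: session='.$session);
    return $cmd.' '.escapeshellarg($url).' -O '.escapeshellarg($target);
}

// Made-up URL and title; on a Unix-like system the arguments come out
// single-quoted, so the shell leaves $HOME alone.
echo sketch_wget_line('https://example.org/video.mp4', '01. Intro/001. What is $HOME.mp4'), "\n";
```

Two caveats to verify before relying on this: `escapeshellarg()` quotes differently on Windows, and its handling of multibyte characters depends on the current locale, so UTF-8 titles may need a suitable `setlocale()` call first.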