@macias
Last active February 2, 2019 18:23
Coursera Getter [download video lecture]
WHAT IT DOES:
============
c_get.php:
---------
* it parses the Lectures page of the given course
* it extracts all the desired content (links for videos, slides, etc)
* it uses consistent naming of the files
* it replaces colon with period (hello Windows users)
* it finally creates a bunch of wget commands ready to execute
* it ignores already existing files, so it is safe to rerun the wget
script just to get missing files (note this might not hold if you
update this script, because of a possible change in naming convention)
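The renaming rule from the list above (colon to period, plus a few other characters that are awkward in file names) can be tried in isolation. This is only a minimal sketch modelled on fix_filename() from c_common.php; the helper name "sanitize_title" and the sample title are made up for illustration, and the non-breaking-space handling of the real script is left out.

```php
<?php
// Hypothetical stand-alone helper, modelled on fix_filename() from c_common.php.
function sanitize_title($s)
{
    // '?' -> ' ', ':' -> '.', '"' -> "'", '/' -> '_', '\' -> '_'
    $s = strtr($s, '?:"/\\', ' .\'__');
    // trim the ends and collapse runs of whitespace left by the replacements
    return preg_replace('/\s{2,}/', ' ', trim($s));
}

echo sanitize_title('Week 1: Intro/Overview?'), "\n";   // prints: Week 1. Intro_Overview
```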
c_preview.php:
-------------
* it is the counterpart of Coursera Getter, but this one works only for course previews
-- the ones with an embedded video player, and nothing else
WHAT YOU NEED:
=============
c_get.php:
---------
1. proper shell (Windows users -- of course I recommend switching to
Linux entirely, but as a workaround Cygwin should be fine -- I
don't know whether the tools mentioned below work there)
2. wget (in openSUSE `sudo zypper in wget`)
3. php5 (in openSUSE `sudo zypper in php5`)
4. php5-openssl (in openSUSE `sudo zypper in php5-openssl`)
5. php5-mbstring (in openSUSE `sudo zypper in php5-mbstring`)
6. and an adventurous soul -- in Firefox, go to
Edit/Preferences/Privacy/Remove Individual Cookie (don't freak
out!) and search for "coursera". Several items should appear -- look
for the key CAUTH for the site you would like to download (for
example "nlp"). Copy the value (content) of that key. Close the
preferences window (do **NOT** delete anything!) -- I will be
grateful for info if there is an easier way
Ok, so now you know the address of the site, the CAUTH, and the files
you would like to download.
Jan de Vos sent another way of getting the cookies (step 6):
* find the cookies directory -- in case of Linux it will be something
like this `~/.mozilla/firefox/88xw1k8g.default/`
* run sqlite3 -- `sqlite3 cookies.sqlite`
* run SQL query -- `select path,value from moz_cookies where
baseDomain = 'coursera.org' and name='CAUTH';`
You will get the CAUTH codes for all courses you are enrolled in.
c_preview.php:
-------------
1. proper shell (Windows users -- of course I recommend switching to Linux entirely, but as a workaround Cygwin should be fine -- I don't know whether the tools mentioned below work there)
2. wget (in openSUSE `sudo zypper in wget`)
3. php5 (in openSUSE `sudo zypper in php5`)
4. php5-mbstring (in openSUSE `sudo zypper in php5-mbstring`)
USAGE:
=====
c_preview.php:
-------------
php c_preview.php "link_to_preview_page" "video_file_type" > wget_script_name.sh
sh wget_script_name.sh
Example (this is one line):
php c_preview.php "https://class.coursera.org/crypto-preview/lecture/index" "mp4" > wgetter.sh
The one above creates the wget script for downloading the videos (MP4). Now execute
sh wgetter.sh
Please note the file type is not guaranteed to exist on the server
(so far "webm" and "mp4" are supported by Coursera).
c_get.php:
---------
php c_get.php "link_to_lectures_page" "file types" "credentials_filename" > wget_script_name.sh
sh wget_script_name.sh
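Credentials file should look like this (the script also reads the
"__204u" cookie, required by Coursera since 2014-11-03; get its value
the same way as CAUTH):
[coursera]
CAUTH=HERE&IS%MY&CAUTH_COOKIE^VALUE@WHICH*OF!COURSE*I_WONT*TELL9YOU
__204u=AND_HERE_IS_MY_204U_COOKIE_VALUE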
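To see how such a credentials file ends up as a Cookie header, here is a small sketch. It parses an in-memory string with parse_ini_string() instead of a real file, and both cookie values are placeholders. Quoting the values is a precaution on my side (as far as I know PHP's INI parser gives characters such as "&", "!" or "^" a special meaning in unquoted values), not something the script itself demands.

```php
<?php
// Sample credentials text; both values are placeholders, not real cookies.
$ini = "[coursera]\nCAUTH=\"PLACEHOLDER_CAUTH\"\n__204u=\"PLACEHOLDER_204U\"\n";

// TRUE keeps the [coursera] section as a nested array.
$credentials = parse_ini_string($ini, true);

// Same "name=value;" layout as build_auth_cookie() in c_common.php.
$cookie = 'Cookie: ';
foreach ($credentials['coursera'] as $name => $value)
    $cookie .= $name . '=' . $value . ';';

echo $cookie, "\n";   // prints: Cookie: CAUTH=PLACEHOLDER_CAUTH;__204u=PLACEHOLDER_204U;
```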
Example (this is one line):
php c_get.php "https://class.coursera.org/crypto/lecture/index" "MP4 PDF" credentials > wgetter.sh
the one above creates appropriate script for wget for downloading videos
(MP4) and slides (PDF).
Please note the file type casing (MP4 vs. mp4) must match the casing of
the title (tooltip) of given category of files -- check the Lectures
page to find it out.
It is possible to pass file type in format "FileFormat=FileExtension",
so this script will look for one thing, but save as another.
For example some courses list pdf files as "Slides". In such case pass
the file format "Slides=pdf" -- this means "Slides" will be grabbed,
but saved with extension "pdf".
There are also meta file extensions and filenames:
.$ -- preserve the original filename extension
$$ -- preserve the entire original filename
^^ -- use tooltip as filename
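How the extension part (the right-hand side of "FileFormat=FileExtension", including the three meta values) could turn into a file name is sketched below. This is a simplified reading of what print_wget() does, not a copy of it; the helper name "resolve_target_name", the sample URL and the "001. Intro" title are assumptions made for the example.

```php
<?php
// Hypothetical helper: pick the saved file name for one download link.
function resolve_target_name($file_ext, $url, $tooltip, $numbered_title)
{
    $path = parse_url($url, PHP_URL_PATH);    // URL path without the query string
    if ($file_ext === '$$')                   // keep the entire original filename
        return basename($path);
    if ($file_ext === '^^')                   // use the tooltip as the filename
        return $tooltip;
    if ($file_ext === '.$')                   // keep only the original extension
        $file_ext = pathinfo($path, PATHINFO_EXTENSION);
    return $numbered_title . '.' . strtolower($file_ext);
}

$url = 'https://example.org/files/week1-code.zip?x=1';   // made-up link
echo resolve_target_name('$$',  $url, 'Code', '001. Intro'), "\n";   // week1-code.zip
echo resolve_target_name('.$',  $url, 'Code', '001. Intro'), "\n";   // 001. Intro.zip
echo resolve_target_name('PDF', $url, 'Code', '001. Intro'), "\n";   // 001. Intro.pdf
```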
Some courses do not use consistent naming of tooltips (unfortunately),
in such case you can download files directly by extension -- add a dot
(".") character in front of the file type. As previously, pay attention to
lowercase/uppercase (e.g. usually the extension is "mp4" but tooltip is
"MP4"). Example:
php c_get.php "https://class.coursera.org/scala/lecture/index" ".mp4 .pdf" credentials > wgetter.sh
You can also use class name of the icon associated with the resource:
php c_get.php "https://class.coursera.org/scala/lecture/index" "#laptop=$$" credentials > wgetter.sh
Yet another source of files are embedded frames (the ones you get when
you click to view a lecture online). One of the advantages of this is the
ability to download video in webm format. Use "~" instead of ".", for example:
php c_get.php "https://class.coursera.org/scala/lecture/index" "~webm .pdf" credentials > wgetter.sh
NOTE: the video will be downloaded from embedded player, but handouts
(pdf) will be downloaded from download (resources) section.
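The four ways of pointing at a resource (tooltip, ".", "#" and "~") differ only in the first character of the file type. A rough sketch of that dispatch follows; the function name "classify_file_type" and the kind labels are mine, the real script does this inline in print_wget().

```php
<?php
// Hypothetical helper: split one file type argument into
// (kind of lookup, what to look for, extension or meta value to save as).
function classify_file_type($spec)
{
    $parts   = explode('=', $spec);
    $key     = $parts[0];
    $save_as = $parts[count($parts) - 1];
    $kinds   = array('.' => 'extension', '#' => 'icon', '~' => 'embedded');
    $first   = substr($key, 0, 1);
    if (isset($kinds[$first]))
    {
        $kind = $kinds[$first];
        $key  = substr($key, 1);        // drop the prefix character
        if (count($parts) == 1)
            $save_as = $key;            // ".pdf" is saved as "pdf"
    }
    else
        $kind = 'tooltip';              // plain text is matched against link tooltips
    return array($kind, $key, $save_as);
}

print_r(classify_file_type('#laptop=$$'));
```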
If you would like to have videos in the "videos" subdirectory and notes
in the "texts" one, add the "--split_dirs" argument in such way:
php c_get.php "https://class.coursera.org/scala/lecture/index" "mp4 pdf" credentials --split_dirs="videos texts" > wgetter.sh
so "mp4" files will go into "videos" subdirectory and "pdf" files into
"texts" subdirectory.
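The pairing is purely positional: the first directory belongs to the first file type, the second to the second, and so on. A tiny sketch of that pairing, with a made-up helper name and an extra length check the script itself does not perform:

```php
<?php
// Hypothetical helper: pair each requested file type with its target directory.
function map_split_dirs($file_types, $split_dirs)
{
    $types = explode(' ', $file_types);
    $dirs  = explode(' ', $split_dirs);
    if (count($types) != count($dirs))
        return NULL;                 // one directory per file type is expected
    return array_combine($types, $dirs);
}

print_r(map_split_dirs('mp4 pdf', 'videos texts'));
```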
If the directories with opening "Week X." seem redundant add
"--drop_week" option:
php c_get.php "https://class.coursera.org/scala/lecture/index" "mp4 pdf" credentials --drop_week > wgetter.sh
Instead of having "02. Week 1. Functions & Evaluations" you will get
"02. Functions & Evaluations".
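The prefix removal is a single regular expression. The sketch below reuses the pattern from drop_deco() in c_common.php under a different (made-up) function name, so it can be run on its own:

```php
<?php
// Same pattern as drop_deco() in c_common.php.
function drop_week_prefix($name)
{
    $s = preg_replace('/^(Week|Lecture|Chapter)[\s]*\d+[.:\s\-]*/', '', $name);
    return $s == '' ? $name : $s;   // keep the name if nothing else is left
}

echo drop_week_prefix('Week 1: Functions & Evaluations'), "\n";   // Functions & Evaluations
echo drop_week_prefix('Week 3'), "\n";                            // Week 3
```

Note that a section called just "Week 3" keeps its name, otherwise the directory would end up with a number only.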
For courses which do not use natural order (from oldest to newest) there
is an option "--reverse":
php c_get.php "https://class.coursera.org/scala/lecture/index" "mp4 pdf" credentials --reverse > wgetter.sh
This will tell this script to number the sections in reversed order.
The courses with embedded videos are harder to process -- extraction
takes more time. If you know in advance that you don't want to extract
some portion of the lectures you can pass the "--limit" option:
php c_get.php "https://class.coursera.org/scala/lecture/index" "mp4 pdf" credentials --limit="Week 9" > wgetter.sh
This will start extraction from the section containing phrase "Week 9". In
case of reversed order -- it will stop extraction on phrase "Week 9"
(that section itself is not extracted then).
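The interplay of "--limit" and "--reverse" is easier to follow on a plain list of section titles. The sketch below mirrors the loop in print_wget() as I read it; the function name "select_sections" and the sample titles are assumptions for the example.

```php
<?php
// Hypothetical helper: which sections get processed for a given limit phrase.
// $titles are in page order; $limit is NULL when no limit was given.
function select_sections($titles, $limit, $reverse)
{
    $selected = array();
    // natural order without a limit: behave as if the limit was already hit
    $limit_hit = $reverse ? false : ($limit === null);
    foreach ($titles as $title)
    {
        if (!$limit_hit && $limit !== null)
            $limit_hit = (strpos($title, $limit) !== false);
        if ($limit !== null)
        {
            if ($reverse)
            {
                if ($limit_hit)
                    break;          // reversed order: stop at the phrase
            }
            elseif (!$limit_hit)
                continue;           // natural order: skip until the phrase
        }
        $selected[] = $title;
    }
    return $selected;
}

print_r(select_sections(array('Week 7', 'Week 8', 'Week 9', 'Week 10'), 'Week 9', false));
print_r(select_sections(array('Week 10', 'Week 9', 'Week 8'), 'Week 9', true));
```

Since the phrase is matched with strpos(), a limit of "Week 1" would also match "Week 10", so it pays to pick a phrase that is unique on the page.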
In all above examples, video lecture (mp4/webm) came first -- the
program assumes it is the main resource, and if it is missing it will
report this fact. It won't report missing resource of any other kind.
Once the actual getter script is created (here: wgetter.sh) you can pass
any extra option for "wget". For example you can run it as:
sh wgetter.sh --limit-rate=100k
This would limit speed of download to 100KB/s. See "man wget" for more
options.
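Extra options reach wget because every generated command contains the shell's "$@" placeholder. Here is a cut-down sketch of how one such block is printed; compared to wget_file_print() in c_common.php it leaves out the cookie header, the error counter and the download log, and the function name and sample arguments are made up.

```php
<?php
// Hypothetical, simplified variant of BashPrinter::wget_file_print().
function wget_block($link, $target)
{
    // single-quoted PHP strings keep $@ literally, so the shell expands it later
    return 'if [ ! -e "'.$target.'" ] ; then'."\n"
         . '  wget $@ -nc "'.$link.'" -O "'.$target.'"'."\n"
         . 'fi'."\n";
}

echo wget_block('https://example.org/v.mp4', '01. Intro/001. Hello.mp4');
```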
SECURITY NOTE:
=============
Do NOT share your CAUTH code with anyone, and this means -- do NOT
share the wget script with anyone as well!
<?php
/* ===================== COMMON CODE ============================
   file: c_common.php
   ============================================================== */

// ---- Coursera specific stuff ----------------------------------------

$split_dirs_key = '--split_dirs';
$drop_deco_key = '--drop_week';
$reverse_key = '--reverse';
$beep_key = '--beep';
$debug_key = '--debug';
$limit_key = '--limit';

// content of the most recently fetched page (dumped in debug mode)
$debug_page = NULL;

// headers of the lecture sections, hidden sections excluded
function c_query_groups($xpath)
{
    return $xpath->query('//div[contains(@class,"course-item-list-header") and not(.//script[@id="disappear"])]');
}

// removes the leading "Week X"/"Lecture X"/"Chapter X" phrase,
// unless nothing would be left of the name
function drop_deco($name)
{
    $s = $name;
    $s = preg_replace('/^(Week|Lecture|Chapter)[\s]*\d+[.:\s\-]*/','',$s);
    if ($s=='')
        return $name;
    else
        return $s;
}

function c_query_dir($xpath,$group)
{
    $dir = coursera_trim($xpath->query('./h3',$group)->item(0)->nodeValue);
    return $dir;
}

// builds the directory name, e.g. "02. Functions & Evaluations"
function c_deco_dir($dir,$group_count,$drop_deco)
{
    if ($drop_deco)
        $dir = drop_deco($dir);
    return str_pad($group_count,2,'0',STR_PAD_LEFT).'. '.fix_filename($dir);
}

function c_query_list($xpath,$group)
{
    return $xpath->query('.//li',$group->nextSibling);
}

function c_query_row($xpath,$node,$drop_deco,&$row,&$title)
{
    $row = $xpath->query('.//a[contains(@class,"lecture-link")]',$node)->item(0);
    $title = fix_filename($row->firstChild->nodeValue);
    if ($drop_deco)
        $title = drop_deco($title);
}

function c_get_embedded_links($row,$ext,$auth_token = NULL)
{
    $frame = trim($row->attributes->getNamedItem('data-modal-iframe')->nodeValue);
    // lectures in preview mode are put at external pages, so we have to download them extra
    $view = get_page_xpath($frame,$auth_token);
    if (!$view)
        return NULL;
    else
    {
        $links = $view->query('//video[@id="QL_video_element_first"]/source[@type="video/'.$ext.'"]');
        if ($links->length===0)
            $links = $view->query('//div[@id="QL_player_container_first"]//source[@type="video/'.$ext.'"]');
        return $links;
    }
}

/*function c_get_embedded_links2($row,$ext,$auth_token = NULL)
{
    $frame = trim($row->attributes->getNamedItem('data-modal-iframe')->nodeValue);
    // lectures in preview mode are put at external pages, so we have to download them extra
    $view = get_page_xpath($frame,$auth_token);
    if (!$view)
        return NULL;
    else
        return $view->query('//div[@id="QL_player_container_first"]//embed[@id="me_flash_0" and ends-with(@flashvars,".'.$ext.'")]');
}*/

function coursera_trim($s)
{
    // replace UTF-8 non-breaking spaces (bytes C2 A0) with regular spaces;
    // strtr($s,"\xa0\xc2",' ') mapped only the first byte, because strtr()
    // ignores the extra characters of the longer argument
    return trim(str_replace("\xc2\xa0",' ',$s));
}

// ---- general php code -----------------------------------------------

class BashPrinter
{
    // buffer for creating directories IF appropriate
    private $dirLines = array();
    private $splitDirs;

    public function __construct($extras)
    {
        global $split_dirs_key;

        if (array_key_exists($split_dirs_key,$extras))
            $this->splitDirs = $extras[$split_dirs_key];
        else
            $this->splitDirs = NULL;
    }

    public function wget_file_print($link,$dir,$target_filename,$log = NULL,$auth_token = NULL)
    {
        // print the pending mkdir line (once) before the first download into this directory;
        // isset() also covers a directory key which was never registered
        if (isset($this->dirLines[$dir]))
        {
            echo $this->dirLines[$dir];
            $this->dirLines[$dir] = NULL;
        }

        echo 'if [ ! -e "'.$target_filename.'" ] ; then'."\n";
        echo ' wget $@ -nc --no-cookies ';
        if ($auth_token!==NULL)
            echo ' --header "'.build_auth_cookie($auth_token).'" ';
        echo '"'.$link.'" -O "'.$target_filename.'"'."\n";
        // on failure remove the (possibly corrupted) file and count the error
        echo ' if [ $? -ne 0 ]'."\n";
        echo ' then'."\n";
        echo ' rm -f "'.$target_filename.'"; ERRORS=$((ERRORS+1))'."\n";
        if ($log!==NULL)
        {
            echo ' else'."\n";
            echo ' echo "'.$link.'" >> '.$log."\n";
        }
        echo ' fi'."\n";
        echo 'fi'."\n";
    }

    public function mkdir_print($dir)
    {
        $this->dirLines = array();

        if ($this->splitDirs !== NULL)
        {
            foreach ($this->splitDirs as $d)
                $this->dirLines[$d.'/'] = 'mkdir -p "'.$d.'/'.$dir.'"'."\n";
        }
        else
            // -p keeps reruns quiet when the directory already exists
            $this->dirLines[''] = 'mkdir -p "'.$dir.'"'."\n";
    }
}

function process_extra_arguments(&$extras)
{
    global $split_dirs_key;

    if (array_key_exists($split_dirs_key,$extras))
        $extras[$split_dirs_key] = explode(' ',$extras[$split_dirs_key]);
}

function get_dom($content)
{
    $dom = new DOMDocument();
    // silence the warnings about sloppy HTML
    $errors_mode = libxml_use_internal_errors(TRUE);
    $content = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
    $dom->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($errors_mode);
    $dom->preserveWhiteSpace = false;
    return $dom;
}

function fix_filename($s)
{
    return preg_replace('/\s{2,}/', ' ',coursera_trim(strtr($s,'?:"/\\',' .\'__')));
}

function build_auth_cookie($auth_token)
{
    $cookie = 'Cookie: ';
    foreach ($auth_token as $key => $value)
        $cookie .= $key . '=' . $value . ';';
    return $cookie;
}

function get_page_xpath($url,$auth_token = NULL)
{
    global $debug_page;

    $http = array('method'=>'GET');
    $http['header'] = array();
    if ($auth_token!==NULL)
        array_push($http['header'],build_auth_cookie($auth_token));
    //array_push($http['header'],'User-Agent:Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0');
    //array_push($http['header'],'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
    $context = stream_context_create(array('http'=> $http));
    $content = file_get_contents($url,false,$context);
    $debug_page = $content;
    if ($content===FALSE)
        return NULL;

    $dom = get_dom($content);
    $xpath = new DOMXPath($dom);
    return $xpath;
}
?>
<?php
/* ===================== COURSERA GETTER ============================
   file: c_get.php
   tags: [coursera video download] [coursera lecture download]
   ================================================================== */

require_once 'c_common.php';

function print_wget($xpath,$auth_token,$extensions,$extras)
{
    global $split_dirs_key,$drop_deco_key,$reverse_key,$beep_key,$debug_key,$limit_key;
    global $debug_page;

    process_extra_arguments($extras);
    // done with extra arguments ---------------------------------------

    $bash_printer = new BashPrinter($extras);

    // links downloaded by previous runs (guards against renamed lectures/notes)
    $downloads_filename = 'downloads.log';
    $downloads = array();
    if (file_exists($downloads_filename))
        $downloads = file($downloads_filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    echo "ERRORS=0\n";

    $group_list = c_query_groups($xpath);
    $group_count = array_key_exists($reverse_key,$extras) ? $group_list->length : 1;

    if (array_key_exists($reverse_key,$extras))
        // in reverse order pretend limit was NOT hit
        $limit_hit = false;
    else
        // if there is no limit given by user, pretend it was hit
        $limit_hit = !array_key_exists($limit_key,$extras);

    if ($group_list->length == 0)
        echo "\n# No group was extracted from lectures page -- check authentication key.\n";
    else
        foreach ($group_list as $group)
        {
            $item_count = 0;

            $dir = c_query_dir($xpath,$group);
            if (!$limit_hit && array_key_exists($limit_key,$extras))
                $limit_hit = (strpos($dir,$extras[$limit_key])!==FALSE);

            $dir = c_deco_dir($dir,$group_count,array_key_exists($drop_deco_key,$extras));
            $group_count += array_key_exists($reverse_key,$extras) ? -1 : +1;

            if (array_key_exists($limit_key,$extras))
            {
                if (array_key_exists($reverse_key,$extras))
                {
                    if ($limit_hit)
                        break;
                }
                elseif (!$limit_hit)
                    continue;
            }

            $bash_printer->mkdir_print($dir);

            $node_list = c_query_list($xpath,$group);
            foreach ($node_list as $node)
            {
                ++$item_count;
                c_query_row($xpath,$node,array_key_exists($drop_deco_key,$extras),$row,$title);

                // each $ext_combo might be in such forms: either "FileType" or "FileType=FileExtension" (e.g. "PDF", "Slides=pdf")
                for ($i_ext = 0; $i_ext < count($extensions); ++$i_ext)
                {
                    $ext_parts = explode('=',$extensions[$i_ext]);

                    if (array_key_exists($split_dirs_key,$extras))
                        $target_dir = $extras[$split_dirs_key][$i_ext].'/';
                    else
                        $target_dir = '';

                    $attr_extractor = 'href';

                    if ($ext_parts[0][0]=='.') // extract link by extension of the linked file
                    {
                        $links = $xpath->query('.//div[@class="course-lecture-item-resource"]/a[contains(@href,"'.$ext_parts[0].'")]',$node);
                        $ext_parts[0] = substr($ext_parts[0],1);
                    }
                    else if ($ext_parts[0][0]=='#') // extract link by icon class name
                    {
                        $ext_parts[0] = substr($ext_parts[0],1);
                        $links = $xpath->query('.//a/i[contains(@class,"icon-'.$ext_parts[0].'")]/..',$node);
                    }
                    else if ($ext_parts[0][0]=='~') // extract link by extension from viewer frame
                    {
                        $ext_parts[0] = substr($ext_parts[0],1);
                        $links = c_get_embedded_links($row,$ext_parts[0],$auth_token);
                        if ($links===NULL)
                        {
                            file_put_contents('php://stderr', "Loading embedded frame failed: '$dir/$title'\n");
                            continue;
                        }
                        else if ($links->length===0 && $i_ext===0)
                        {
                            file_put_contents('php://stderr', "No resources '$ext_parts[0]' found for '$dir/$title'\n");
                            if (array_key_exists($debug_key,$extras))
                                file_put_contents('DEBUG_'.$title,$debug_page);
                            continue;
                            /* $links = c_get_embedded_links2($row,$ext_parts[0],$auth_token);
                            if ($links===NULL)
                            {
                                file_put_contents('php://stderr', "Loading fallback embedded frame failed: '$dir/$title'\n");
                                continue;
                            }
                            else if ($links->length===0 && $i_ext===0)
                            {
                                file_put_contents('php://stderr', "No fallback resources '$ext_parts[0]' found for '$dir/$title'\n");
                                if (array_key_exists($debug_key,$extras))
                                    file_put_contents('DEBUG_'.$title,$debug_page);
                                continue;
                            }
                            else
                                $attr_extractor = 'flashvars';*/
                        }
                        else
                            $attr_extractor = 'src';

                        if (array_key_exists($debug_key,$extras))
                            file_put_contents('php://stderr', "For $dir/$title ".$links->length." '$ext_parts[0]' links found.\n");
                    }
                    else // extract link by tooltip of the link
                    {
                        $links = $xpath->query('.//a[contains(@title,"'.$ext_parts[0].'")]',$node);
                    }

                    $match = FALSE;

                    foreach ($links as $node_link)
                    {
                        // embedded <source> elements carry no "title" attribute,
                        // so guard against a missing one instead of raising a notice
                        $title_attr = $node_link->attributes->getNamedItem('title');
                        $tooltip = $title_attr===NULL ? '' : fix_filename($title_attr->nodeValue);
                        $suffix = '';
                        if ($links->length>1)
                            $suffix = '.'.$tooltip;

                        $link = $node_link->attributes->getNamedItem($attr_extractor)->nodeValue;
                        if ($attr_extractor=='flashvars')
                        {
                            $url_idx = strpos($link,'&file=http');
                            $link = urldecode(substr($link,$url_idx+strlen('&file=')));
                        }

                        if (!in_array($link,$downloads))
                        {
                            // the last part is the extension (or meta value) to save with;
                            // end() needs a real variable, so index the array directly
                            $file_ext = $ext_parts[count($ext_parts)-1];
                            $target_filename = $target_dir.$dir.'/';

                            if ($file_ext == '$$')
                            {
                                $url_parts = parse_url(urldecode($link));
                                $url_no_query = $url_parts['scheme'] . '://' . $url_parts['host'] . (isset($url_parts['path'])?$url_parts['path']:'');
                                $target_filename .= pathinfo($url_no_query, PATHINFO_BASENAME);
                            }
                            else if ($file_ext == '^^')
                                $target_filename .= $tooltip;
                            else
                            {
                                if ($file_ext == '.$')
                                {
                                    $url_parts = parse_url(urldecode($link));
                                    $url_no_query = $url_parts['scheme'] . '://' . $url_parts['host'] . (isset($url_parts['path'])?$url_parts['path']:'');
                                    $file_ext = pathinfo($url_no_query, PATHINFO_EXTENSION);
                                }
                                $target_filename .= str_pad($item_count,3,'0',STR_PAD_LEFT).'. '.$title.$suffix.'.'.strtolower($file_ext);
                            }

                            $bash_printer->wget_file_print($link,$target_dir,$target_filename,$downloads_filename,$auth_token);
                            $match = TRUE;
                        }
                        else if (array_key_exists($debug_key,$extras))
                            file_put_contents('php://stderr', "$dir/$title '$ext_parts[0]' already downloaded.\n");
                    }
                }
            }
        }

    echo "\n";
    echo 'if [ $ERRORS -ne 0 ] ; then echo "There were some errors while downloading. Run the script again." ; fi'."\n";
    if (array_key_exists($beep_key,$extras))
        echo "beep\n";
}

if ($argc<4)
{
    // the option keys already start with "--", so they are printed as they are
    file_put_contents('php://stderr', "Error: you should input minimum three arguments, the usage is:\n");
    file_put_contents('php://stderr', "\"LECTURES_URL\" \"FILE_TYPES\" \"CREDENTIALS_FILENAME\" [$beep_key] [$reverse_key] [$drop_deco_key] [$debug_key] [$limit_key=\"section phrase\"] [$split_dirs_key=\"directories per file type\"]\n");
    exit(1);
}
else
{
    array_shift($argv);
    $url = array_shift($argv);
    $extensions = array_shift($argv);
    $credentials = parse_ini_file(array_shift($argv),true);
    $auth_token = array('CAUTH' => $credentials['coursera']['CAUTH'],
                        '__204u' => $credentials['coursera']['__204u']);

    $extras = array();
    foreach ($argv as $a)
    {
        $parts = explode('=',$a);
        if (!in_array($parts[0],array($split_dirs_key,$drop_deco_key,$reverse_key,$beep_key,$debug_key,$limit_key)))
        {
            file_put_contents('php://stderr', 'Unknown extra argument "'.$parts[0]."\"\n");
            exit(1);
        }
        $extras[$parts[0]] = count($parts)==1 ? NULL : $parts[1];
    }

    // "*" is a shortcut for grabbing everything, each kind into its own directory
    if ($extensions == '*')
    {
        $extras[$split_dirs_key] = 'lectures subtitles subtitles slides data code';
        $extensions = array('MP4',                // video
                            '#align-justify=txt', // subtitles
                            '#list=srt',          // subtitles
                            '#file=.$',           // slides
                            '#info-sign=.$',      // data
                            '#laptop=$$');        // program code
    }
    else
        $extensions = explode(' ',$extensions);

    $xpath = get_page_xpath($url,$auth_token);
    if (array_key_exists($debug_key,$extras))
        file_put_contents('DEBUG_index.html',$debug_page);
    if ($xpath!==NULL)
        print_wget($xpath,$auth_token,$extensions,$extras);
}
?>
<?php
/* ===================== COURSERA PREVIEW GETTER =======================
   file: c_preview.php
   tags: [coursera video download] [coursera lecture download]
   ================================================================== */

require_once 'c_common.php';

// https://class.coursera.org/machlearning-001/lecture/preview/index

function print_wget($xpath,$ext,$extras)
{
    global $split_dirs_key,$drop_deco_key,$reverse_key,$beep_key;

    // BashPrinter expects the already processed extra arguments
    process_extra_arguments($extras);
    $bash_printer = new BashPrinter($extras);

    // only one file type is fetched here, so just the first split directory (if any) is used
    if (array_key_exists($split_dirs_key,$extras))
        $target_dir = $extras[$split_dirs_key][0].'/';
    else
        $target_dir = '';

    echo "ERRORS=0\n";

    $group_list = c_query_groups($xpath);
    $group_count = array_key_exists($reverse_key,$extras) ? $group_list->length : 1;

    foreach ($group_list as $group)
    {
        $item_count = 0;

        $dir = c_deco_dir(c_query_dir($xpath,$group),$group_count,array_key_exists($drop_deco_key,$extras));
        $group_count += array_key_exists($reverse_key,$extras) ? -1 : +1;

        $bash_printer->mkdir_print($dir);

        // get the list of all lectures within current group (week)
        $node_list = c_query_list($xpath,$group);
        foreach ($node_list as $node)
        {
            ++$item_count;
            c_query_row($xpath,$node,array_key_exists($drop_deco_key,$extras),$row,$title);

            $video_list = c_get_embedded_links($row,$ext);
            if ($video_list===NULL)
            {
                file_put_contents('php://stderr', "Loading embedded frame failed: '$dir/$title'\n");
                continue;
            }
            else if ($video_list->length==0)
            {
                file_put_contents('php://stderr', "Filetype '$ext' not found for '".$title."'\n");
                continue;
            }

            $video = $video_list->item(0);
            $vid_src = $video->attributes->getNamedItem('src')->nodeValue;
            // wget_file_print() takes the directory key and the target filename as separate arguments
            $target_filename = $target_dir.$dir.'/'.str_pad($item_count,3,'0',STR_PAD_LEFT).'. '.$title.'.'.strtolower($ext);
            $bash_printer->wget_file_print($vid_src,$target_dir,$target_filename);
        }
    }

    echo 'if [ $ERRORS -ne 0 ] ; then echo "There were some errors while downloading. Run the script again." ; fi'."\n";
    if (array_key_exists($beep_key,$extras))
        echo "beep\n";
}

if ($argc<3)
{
    // the option keys already start with "--", so they are printed as they are
    file_put_contents('php://stderr', "Error: you should input minimum two arguments, the usage is:\n");
    file_put_contents('php://stderr', "\"PREVIEW_URL\" \"VIDEO_FILE_TYPE\" [$beep_key] [$reverse_key] [$drop_deco_key] [$split_dirs_key=\"directory for the videos\"]\n");
    exit(1);
}
else
{
    array_shift($argv);
    $url = array_shift($argv);
    $extensions = array_shift($argv);

    $extras = array();
    foreach ($argv as $a)
    {
        $parts = explode('=',$a);
        if (!in_array($parts[0],array($split_dirs_key,$drop_deco_key,$reverse_key,$beep_key)))
        {
            file_put_contents('php://stderr', 'Unknown extra argument "'.$parts[0]."\"\n");
            exit(1);
        }
        $extras[$parts[0]] = count($parts)==1 ? NULL : $parts[1];
    }

    $xpath = get_page_xpath($url);
    if ($xpath!==NULL)
        print_wget($xpath,$extensions,$extras);
}
?>
2014-11-03 * Coursera requires yet another authentication key --
`__204u`
2013-12-01 * credentials file is used instead of passing directly
"CAUTH" value
2013-09-24 * ignoring hidden lecture sections
2013-09-06 * ability to download resources by icon class
* added new flags for setting target filename as original
one or as tooltip
2013-09-04 * Coursera dropped "session" cookie and introduced
"CAUTH" one instead as authentication token -- this
one is a global, so once you have it you can use it
for all Coursera courses
2013-08-06 * additional info from tooltip is processed to be used
as a filename
2013-07-15 * if there is nothing to download for given folder
it is not created
2013-06-29 * updated extraction for embedded videos
* added option to limit extraction from main page
2013-05-15, 2 * corrected reporting failed download or extraction
2013-05-15 * reporting failed extraction of main resource
2013-05-03 * more unification with c_preview utility -- ability to
download embedded videos as well (read: webm), add
"~" character before file extension to make Coursera
Getter fetch embedded video
* "--extension" option is no longer supported -- add
dot (".") before file extension instead
2013-04-22 * dropping a week/lecture phrase from filenames as well
2013-04-15 * refactoring
2013-04-07 * keeping log of downloads (file "downloads.log") as
countermeasure for renaming the lectures/notes
* added "beep" option to make a sound at the end of
downloading
2013-04-03 * bugfix: the title of lecture sometimes was ignored
2013-02-15 * more accurate whitespace removal
2013-02-02 * automatically removes corrupted files
2013-01-23 * new option "reverse" for the courses which put
sections in "from newest to oldest" order
2013-01-08 * Coursera changed its web format, along with structure
and CSS tags/classes -- this version hopefully
reflects all of those changes
2012-10-12 * added option --drop_week to drop "week X." part from
the directory (remind me this was supposed to be dead
simple tool ;-D)
2012-10-11 * added option --split_dirs to save files into
subdirectories according to the file extensions
2012-10-03 * added option --extension to get resources by
extension of the files, not the tooltips
2012-09-18 * added c_common.php
* you can specify via filetypes what to grab and what
extension set
2012-07-05 * initial release of Coursera preview getter
2012-06-14 * added little control if there is insufficient number
of arguments
2012-06-11 * UTF-8 in filenames is supported
(another module for PHP is required -- mbstring)
* replaces slash and backslash with underscore
2012-06-07 * changed this info, added another way of getting
cookies
2012-06-06, 2 * extensions casing reverted -- they matter again
* directories are named according to Lectures sections
* handles multiple files for given file type
* files are counted within each directory, not within
entire course
2012-06-06 * extensions can be given lower/upper-case, they do not
matter
2012-06-05, 2 * creates weekly subdirectories and puts the files in
there
2012-06-05 * initial release
@nellaivijay

Getting this error message
vijayram@ubuntu:/Downloads$ php c_get.php "https://class.coursera.org/compilers/lecture/index" "----------"
PHP Notice: Undefined offset: 3 in /home/vijayram/Downloads/c_get.php on line 160
vijayram@ubuntu:/Downloads$

@macias
Author

macias commented Jun 14, 2012

@nellaivijay, I will answer properly in a few minutes, but for now -- LOGOUT FROM THE COURSERA NOW (using your regular web browser) AND LOGIN AGAIN. As I wrote, don't share your session code, and you shared it with the entire world (7 bln people).

Ok, as for the problem itself, you forgot to add (as the middle argument) what files you want to grab -- mp4 (lectures), slides, and so on. I modified the code now, so it reminds you in a more handy manner that the number of arguments is incorrect.

So in your case, it could be:

php c_get.php "https://class.coursera.org/compilers/lecture/index" "MP4" "your session code which you should not share, really"

Hope this helps.

@macias
Author

macias commented Sep 23, 2012

For the record -- I already wrote a reply to your comment, but it was deleted. Since then the script was fixed (i.e. a help message was added). In your particular case, you forgot about the file types you would like to get from Coursera.

@hgajshb

hgajshb commented Oct 4, 2013

This is pretty awesome, I used this script to download webm files and subtitles.
And there is always "PHP Notice: Trying to get property of non-object in /path_to/c_get.php on line 148" when creating the shell script.

@macias
Author

macias commented Dec 1, 2013

@yanglifu90, sorry for late response, I didn't get any notification.

Anyway, could you please tell me for what course you see this message and how you call the script. Please omit your CAUTH.

@mccbala

mccbala commented Apr 28, 2015

Hi... I tried this and for some reason, keep getting this error.. Could you please explain what is wrong here? Also are these scripts compatible with the new coursera interface? Thanks!

php c_preview.php "https://www.coursera.org/learn/work-smarter-not-harder/outline" "mp4"
PHP Warning:  Missing argument 1 for BashPrinter::__construct(), called in /home/myuser/coursera/c_preview.php on line 14 and defined in /home/myuser/coursera/c_common.php on line 105
PHP Notice:  Undefined variable: extras in /home/myuser/coursera/c_common.php on line 109
PHP Warning:  array_key_exists() expects parameter 2 to be array, null given in /home/myuser/coursera/c_common.php on line 109
ERRORS=0
if [ $ERRORS -ne 0 ] ; then echo "There were some errors while downloading. Run the script again." ; fi
