WHAT IT DOES:
============

c_get.php:
----------

* it parses the Lectures page of the given course
* it extracts all the desired content (links to videos, slides, etc.)
* it names the files consistently
* it replaces colons with periods (hello Windows users)
* it finally creates a batch of wget commands ready to execute
* it ignores already existing files, so it is safe to rerun the wget
  script just to get missing files (note this might not hold if you
  update this script, because the naming convention may change)
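
The renaming rules can be sketched in plain shell. This is only an illustration of the idea, not the code the script runs: `fix_filename` below is a hypothetical stand-in that maps `?` to a space, `:` to `.`, `"` to `'`, `/` and `\` to `_`, then trims the name and collapses runs of whitespace.

```shell
#!/bin/sh
# fix_filename is a hypothetical shell stand-in for the renaming rules:
#   ?  -> space     :  -> .     "  -> '     /  -> _     \  -> _
# followed by trimming and collapsing runs of whitespace.
fix_filename() {
    printf '%s\n' "$1" |
        tr '?:"/\\' " .'__" |
        sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//; s/[[:space:]]{2,}/ /g'
}

fix_filename 'Week 1: Intro/Outro?'
```

The colon rule is the one that matters most on Windows, where a colon is not allowed in a filename.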

c_preview.php:
-------------

* it is the counterpart of the Coursera getter, but it works only for
  course previews -- the ones with an embedded video player and nothing else

WHAT YOU NEED:
=============

c_get.php:
----------

1. a proper shell (Windows users -- of course I recommend switching to
   Linux entirely, but as a workaround Cygwin should be fine -- I
   don't know whether the tools mentioned below work there)
2. wget (in openSUSE `sudo zypper in wget`)
3. php5 (in openSUSE `sudo zypper in php5`)
4. php5-openssl (in openSUSE `sudo zypper in php5-openssl`)
5. php5-mbstring (in openSUSE `sudo zypper in php5-mbstring`)
6. and an adventurous soul -- in Firefox, go to
   Edit/Preferences/Privacy/Remove Individual Cookie (don't freak
   out!) and search for "coursera". Several items should appear -- look
   for the key CAUTH for the site you would like to download (for
   example "nlp"). Copy the value (content) of that key. Close the
   preferences window (do **NOT** delete anything!) -- I will be
   grateful for info if there is an easier way

Ok, so now you know the address of the site, the CAUTH, and the files
you would like to download.

Jan de Vos sent another way of getting the cookies (step 6):

* find the cookies directory -- on Linux it will be something
  like `~/.mozilla/firefox/88xw1k8g.default/`
* run sqlite3 -- `sqlite3 cookies.sqlite`
* run the SQL query -- `select path,value from moz_cookies where
  baseDomain = 'coursera.org' and name='CAUTH';`

You will get the CAUTH codes for all courses you are enrolled in.

c_preview.php:
-------------

1. a proper shell (Windows users -- of course I recommend switching to
   Linux entirely, but as a workaround Cygwin should be fine -- I
   don't know whether the tools mentioned below work there)
2. wget (in openSUSE `sudo zypper in wget`)
3. php5 (in openSUSE `sudo zypper in php5`)
4. php5-mbstring (in openSUSE `sudo zypper in php5-mbstring`)

USAGE:
=====

c_preview.php:
-------------

    php c_preview.php "link_to_preview_page" "video_file_type" > wget_script_name.sh
    sh wget_script_name.sh

Example (this is one line):

    php c_preview.php "https://class.coursera.org/crypto-preview/lecture/index" "mp4" > wgetter.sh

The one above creates a wget script for downloading the videos (MP4). Now execute

    sh wgetter.sh

Please note the file type is not guaranteed to exist on the server
(so far "webm" and "mp4" are supported by Coursera).

c_get.php:
----------

    php c_get.php "link_to_lectures_page" "file types" "credentials_filename" > wget_script_name.sh
    sh wget_script_name.sh

The credentials file should look like this:

    [coursera]
    CAUTH=HERE&IS%MY&CAUTH_COOKIE^VALUE@WHICH*OF!COURSE*I_WONT*TELL9YOU

Since the 2014-11-03 change the script also reads a second cookie value,
`__204u`, from the same section; add it as another line in the same way.
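
One possible way to create such a file from the shell is sketched below. The cookie values and the temporary file location are placeholders (assumptions). Quoting the values is a precaution I am adding, not something the script asks for, because PHP's INI parser gives some punctuation a special meaning in unquoted values.

```shell
#!/bin/sh
# Placeholder values (hypothetical): paste the real cookie values here.
CAUTH_VALUE='paste-your-CAUTH-cookie-here'
U204_VALUE='paste-your-__204u-cookie-here'

# mktemp creates a file readable by the owner only, which suits a login
# token; the location is an assumption, any path you pass to c_get.php works.
cred_file=$(mktemp)

# Quoted values are a precaution against the INI parser's special characters.
cat > "$cred_file" <<EOF
[coursera]
CAUTH="$CAUTH_VALUE"
__204u="$U204_VALUE"
EOF

cat "$cred_file"
```

Pass the path of that file as the third argument of c_get.php, and treat it like a password.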

Example (this is one line):

    php c_get.php "https://class.coursera.org/crypto/lecture/index" "MP4 PDF" credentials > wgetter.sh

The one above creates a wget script for downloading the videos
(MP4) and slides (PDF).

Please note the file type casing (MP4 vs. mp4) must match the casing of
the title (tooltip) of the given category of files -- check the Lectures
page to find it out.

It is possible to pass a file type in the format "FileFormat=FileExtension",
so the script will look for one thing, but save it as another.
For example, some courses list pdf files as "Slides". In such a case pass
the file format "Slides=pdf" -- this means "Slides" will be grabbed,
but saved with the extension "pdf".
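
How such a specification splits into "what to look for" and "what extension to save with" can be sketched like this. `parse_spec` is a hypothetical helper written for illustration only. The saved extension is lowercased, which is why "MP4" ends up as ".mp4".

```shell
#!/bin/sh
# parse_spec is a hypothetical helper mirroring the rule described above.
parse_spec() {
    spec=$1
    lookup=${spec%%=*}      # text before the first "="
    ext=${spec##*=}         # text after the last "=" (the whole spec if none)
    # The saved extension is lowercased, as in "MP4" -> "mp4".
    ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
    printf '%s %s\n' "$lookup" "$ext"
}

parse_spec "Slides=pdf"
parse_spec "MP4"
```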

There are also meta file extensions and filenames:

    .$ -- preserve the original filename extension
    $$ -- preserve the entire original filename
    ^^ -- use the tooltip as the filename

Some courses do not use consistent naming of tooltips (unfortunately);
in such a case you can download files directly by extension -- add a dot
(".") character in front of the file type. As previously, pay attention to
lowercase/uppercase (e.g. usually the extension is "mp4" but the tooltip is
"MP4"). Example:

    php c_get.php "https://class.coursera.org/scala/lecture/index" ".mp4 .pdf" credentials > wgetter.sh

You can also use the class name of the icon associated with the resource
(note the single quotes: inside double quotes the shell would replace
`$$` with its process ID):

    php c_get.php "https://class.coursera.org/scala/lecture/index" '#laptop=$$' credentials > wgetter.sh

Yet another source of files is the embedded frames (the ones shown when you
click to view a lecture online). One advantage of this is the ability to
download the video in webm format. Instead of "." use "~", for example:

    php c_get.php "https://class.coursera.org/scala/lecture/index" "~webm .pdf" credentials > wgetter.sh

NOTE: the video will be downloaded from the embedded player, but the handouts
(pdf) will be downloaded from the download (resources) section.
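
To keep the four lookup modes apart, here is a small sketch that only names the mode selected by the first character of a file type. `classify_spec` is a hypothetical helper, not part of the scripts.

```shell
#!/bin/sh
# classify_spec is a hypothetical helper; it only names the lookup mode.
classify_spec() {
    case $1 in
        .*)   echo "extension" ;;  # link whose address contains the extension
        '#'*) echo "icon" ;;       # link with a matching icon class
        '~'*) echo "embedded" ;;   # video source inside the viewer frame
        *)    echo "tooltip" ;;    # default: match the tooltip of the link
    esac
}

for spec in MP4 .pdf '#laptop=$$' '~webm'; do
    printf '%s -> %s\n' "$spec" "$(classify_spec "$spec")"
done
```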

If you would like each file type in its own subdirectory (for example
videos in "videos" and handouts in "texts"), add the "--split_dirs"
argument like this:

    php c_get.php "https://class.coursera.org/scala/lecture/index" "mp4 pdf" credentials --split_dirs="videos texts" > wgetter.sh

so "mp4" files will go into the "videos" subdirectory and "pdf" files into
the "texts" subdirectory.
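
The pairing is positional: the first file type goes with the first directory, the second with the second, and so on. A tiny sketch of that pairing, with `pair_dirs` being a hypothetical helper used only for illustration:

```shell
#!/bin/sh
# pair_dirs is a hypothetical helper: it pairs the Nth file type with the
# Nth directory of the --split_dirs list.
pair_dirs() {
    types=$1
    dirs=$2
    set -- $dirs                 # word-split the directory list into $1, $2, ...
    for t in $types; do
        printf '%s -> %s/\n' "$t" "$1"
        [ $# -gt 0 ] && shift    # move on to the next directory, if any
    done
}

pair_dirs "mp4 pdf" "videos texts"
```

So it is worth giving exactly as many directories as file types, in the same order.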

If the directory names opening with "Week X." seem redundant, add the
"--drop_week" option:

    php c_get.php "https://class.coursera.org/scala/lecture/index" "mp4 pdf" credentials --drop_week > wgetter.sh

Instead of "02. Week 1: Functions & Evaluations" you will get
"02. Functions & Evaluations".
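
The stripping can be sketched with sed. `drop_deco` below is a hypothetical shell stand-in for the same idea: a leading "Week", "Lecture" or "Chapter" plus its number and the punctuation after it is removed, unless that would leave an empty name.

```shell
#!/bin/sh
# drop_deco is a hypothetical shell stand-in for the --drop_week rule.
drop_deco() {
    stripped=$(printf '%s\n' "$1" |
        sed -E 's/^(Week|Lecture|Chapter)[[:space:]]*[0-9]+[[:space:].:-]*//')
    # Keep the original name when stripping would leave nothing.
    if [ -n "$stripped" ]; then
        printf '%s\n' "$stripped"
    else
        printf '%s\n' "$1"
    fi
}

drop_deco "Week 1: Functions & Evaluations"
drop_deco "Week 3"
```

The "02." numbering is added afterwards, so it survives the stripping.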

For courses which do not use the natural order (from oldest to newest) there
is the "--reverse" option:

    php c_get.php "https://class.coursera.org/scala/lecture/index" "mp4 pdf" credentials --reverse > wgetter.sh

This tells the script to number the sections in reversed order.

The courses with embedded videos are harder to process -- extraction
takes more time. If you know in advance that you don't want to extract
some portion of the lectures, you can pass the "--limit" option:

    php c_get.php "https://class.coursera.org/scala/lecture/index" "mp4 pdf" credentials --limit="Week 9" > wgetter.sh

This will start extraction from the section containing the phrase "Week 9". In
case of reversed order -- it will stop extraction at the phrase "Week 9".

In all the examples above, the video lecture (mp4/webm) came first -- the
program assumes it is the main resource, and if it is missing it will
report this fact. It won't report a missing resource of any other kind.

Once the actual getter script is created (here: wgetter.sh) you can pass
any extra option to "wget". For example, you can run it as:

    sh wgetter.sh --limit-rate=100k

This would limit the download speed to 100KB/s. See "man wget" for more
options.
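
Why this works: every wget line in the generated script passes the script's own arguments on to wget. A sketch of that forwarding follows, with `fake_wget` and `run_generated` as hypothetical stand-ins so it can run offline, and a made-up URL and filename.

```shell
#!/bin/sh
# Stand-in for wget (hypothetical) so the sketch runs offline: it only
# reports the arguments it was given.
fake_wget() {
    echo "wget called with: $*"
}

# Stand-in for the generated script: the download line passes the
# script's own arguments on to the downloader via "$@".
run_generated() {
    fake_wget "$@" -nc "https://example.invalid/intro.mp4" -O "001. Introduction.mp4"
}

run_generated --limit-rate=100k
```

Any option given on the command line therefore applies to every download in the script.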

SECURITY NOTE:
=============

Do NOT share your CAUTH code with anyone, and this means -- do NOT
share the wget script with anyone either!

<?php
/* ===================== COMMON CODE ============================
   ============================================================== */

// ---- Coursera specific stuff ----------------------------------------

$split_dirs_key = '--split_dirs';
$drop_deco_key = '--drop_week';
$reverse_key = '--reverse';
$beep_key = '--beep';
$debug_key = '--debug';
$limit_key = '--limit';

$debug_page = NULL;

// section headers of the lectures page; hidden sections are skipped
function c_query_groups($xpath)
{
    return $xpath->query('//div[contains(@class,"course-item-list-header") and not(.//script[@id="disappear"])]');
}

// strips a leading "Week N", "Lecture N" or "Chapter N", unless nothing would remain
function drop_deco($name)
{
    $s = $name;
    $s = preg_replace('/^(Week|Lecture|Chapter)[\s]*\d+[.:\s\-]*/','',$s);
    if ($s=='')
        return $name;
    else
        return $s;
}

function c_query_dir($xpath,$group)
{
    $dir = coursera_trim($xpath->query('./h3',$group)->item(0)->nodeValue);
    return $dir;
}

// builds the numbered directory name, e.g. "02. Functions & Evaluations"
function c_deco_dir($dir,$group_count,$drop_deco)
{
    if ($drop_deco)
        $dir = drop_deco($dir);
    return str_pad($group_count,2,'0',STR_PAD_LEFT).'. '.fix_filename($dir);
}

function c_query_list($xpath,$group)
{
    return $xpath->query('.//li',$group->nextSibling);
}

function c_query_row($xpath,$node,$drop_deco,&$row,&$title)
{
    $row = $xpath->query('.//a[contains(@class,"lecture-link")]',$node)->item(0);
    $title = fix_filename($row->firstChild->nodeValue);
    if ($drop_deco)
        $title = drop_deco($title);
}

function c_get_embedded_links($row,$ext,$auth_token = NULL)
{
    $frame = trim($row->attributes->getNamedItem('data-modal-iframe')->nodeValue);
    // lectures in preview mode are put at external pages, so we have to download them extra
    $view = get_page_xpath($frame,$auth_token);
    if (!$view)
        return NULL;
    else
    {
        $links = $view->query('//video[@id="QL_video_element_first"]/source[@type="video/'.$ext.'"]');
        if ($links->length===0)
            $links = $view->query('//div[@id="QL_player_container_first"]//source[@type="video/'.$ext.'"]');
        return $links;
    }
}

/*function c_get_embedded_links2($row,$ext,$auth_token = NULL)
{
    $frame = trim($row->attributes->getNamedItem('data-modal-iframe')->nodeValue);
    // lectures in preview mode are put at external pages, so we have to download them extra
    $view = get_page_xpath($frame,$auth_token);
    if (!$view)
        return NULL;
    else
        return $view->query('//div[@id="QL_player_container_first"]//embed[@id="me_flash_0" and ends-with(@flashvars,".'.$ext.'")]');
}*/

function coursera_trim($s)
{
    // replace the UTF-8 non-breaking space (bytes C2 A0) with a regular space;
    // strtr() with arguments of unequal length would replace only the A0 byte
    return trim(str_replace("\xc2\xa0",' ',$s));
}

// ---- general php code -----------------------------------------------

class BashPrinter
{
    // buffer for creating directories IF appropriate
    private $dirLines = array();
    private $splitDirs;

    public function __construct($extras)
    {
        global $split_dirs_key;

        if (array_key_exists($split_dirs_key,$extras))
            $this->splitDirs = $extras[$split_dirs_key];
        else
            $this->splitDirs = NULL;
    }

    public function wget_file_print($link,$dir,$target_filename,$log = NULL,$auth_token = NULL)
    {
        // print the pending mkdir line once, with the first file of the directory;
        // isset() also covers a missing key, which "!== NULL" reported as a notice
        if (isset($this->dirLines[$dir]))
        {
            echo $this->dirLines[$dir];
            $this->dirLines[$dir] = NULL;
        }
        echo 'if [ ! -e "'.$target_filename.'" ] ; then'."\n";
        echo ' wget $@ -nc --no-cookies ';
        if ($auth_token!==NULL)
            echo ' --header "'.build_auth_cookie($auth_token).'" ';
        echo '"'.$link.'" -O "'.$target_filename.'"'."\n";
        echo ' if [ $? -ne 0 ]'."\n";
        echo ' then'."\n";
        echo ' rm -f "'.$target_filename.'"; ERRORS=$((ERRORS+1))'."\n";
        if ($log!==NULL)
        {
            echo ' else'."\n";
            echo ' echo "'.$link.'" >> '.$log."\n";
        }
        echo ' fi'."\n";
        echo 'fi'."\n";
    }

    public function mkdir_print($dir)
    {
        $this->dirLines = array();
        if ($this->splitDirs !== NULL)
        {
            foreach ($this->splitDirs as $d)
                $this->dirLines[$d.'/'] = 'mkdir -p "'.$d.'/'.$dir.'"'."\n";
        }
        else
            // -p keeps a rerun quiet when the directory already exists
            $this->dirLines[''] = 'mkdir -p "'.$dir.'"'."\n";
    }
}

function process_extra_arguments(&$extras)
{
    global $split_dirs_key;

    if (array_key_exists($split_dirs_key,$extras))
        $extras[$split_dirs_key] = explode(' ',$extras[$split_dirs_key]);
}

function get_dom($content)
{
    $dom = new DOMDocument();
    $errors_mode = libxml_use_internal_errors(TRUE);
    $content = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
    $dom->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($errors_mode);
    $dom->preserveWhiteSpace = false;
    return $dom;
}

function fix_filename($s)
{
    return preg_replace('/\s{2,}/', ' ',coursera_trim(strtr($s,'?:"/\\',' .\'__')));
}

function build_auth_cookie($auth_token)
{
    $cookie = 'Cookie: ';
    foreach ($auth_token as $key => $value)
        $cookie .= $key . '=' . $value . ';';
    return $cookie;
}

function get_page_xpath($url,$auth_token = NULL)
{
    global $debug_page;

    $http = array('method'=>'GET');
    $http['header'] = array();
    if ($auth_token!==NULL)
        array_push($http['header'],build_auth_cookie($auth_token));
    //array_push($http['header'],'User-Agent:Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0');
    //array_push($http['header'],'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
    $context = stream_context_create(array('http'=> $http));
    $content = file_get_contents($url,false,$context);
    $debug_page = $content;
    if ($content===FALSE)
        return NULL;
    $dom = get_dom($content);
    $xpath = new DOMXPath($dom);
    return $xpath;
}
?>
<?php
/* ===================== COURSERA GETTER ============================
   tags: [coursera video download] [coursera lecture download]
   ================================================================== */

require_once 'c_common.php';

function print_wget($xpath,$auth_token,$extensions,$extras)
{
    global $split_dirs_key,$drop_deco_key,$reverse_key,$beep_key,$debug_key,$limit_key;
    global $debug_page;

    process_extra_arguments($extras);

    // done with extra arguments ---------------------------------------

    $bash_printer = new BashPrinter($extras);

    $downloads_filename = 'downloads.log';
    $downloads = array();
    if (file_exists($downloads_filename))
        $downloads = file($downloads_filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    echo "ERRORS=0\n";

    $group_list = c_query_groups($xpath);
    $group_count = array_key_exists($reverse_key,$extras) ? $group_list->length : 1;

    if (array_key_exists($reverse_key,$extras))
        // in reverse order pretend limit was NOT hit
        $limit_hit = false;
    else
        // if there is no limit given by user, pretend it was hit
        $limit_hit = !array_key_exists($limit_key,$extras);

    if ($group_list->length == 0)
        echo "\n# No group was extracted from lectures page -- check authentication key.\n";
    else
        foreach ($group_list as $group)
        {
            $item_count = 0;
            $dir = c_query_dir($xpath,$group);
            if (!$limit_hit && array_key_exists($limit_key,$extras))
                $limit_hit = (strpos($dir,$extras[$limit_key])!==FALSE);
            $dir = c_deco_dir($dir,$group_count,array_key_exists($drop_deco_key,$extras));
            $group_count += array_key_exists($reverse_key,$extras) ? -1 : +1;

            if (array_key_exists($limit_key,$extras))
            {
                if (array_key_exists($reverse_key,$extras))
                {
                    if ($limit_hit)
                        break;
                }
                elseif (!$limit_hit)
                    continue;
            }

            $bash_printer->mkdir_print($dir);
            $node_list = c_query_list($xpath,$group);
            foreach ($node_list as $node)
            {
                ++$item_count;
                c_query_row($xpath,$node,array_key_exists($drop_deco_key,$extras),$row,$title);
                // each $ext_combo might be in such forms: either "FileType" or "FileType=FileExtension" (e.g. "PDF", "Slides=pdf")
                for ($i_ext = 0; $i_ext < count($extensions); ++$i_ext)
                {
                    $ext_parts = explode('=',$extensions[$i_ext]);
                    if (array_key_exists($split_dirs_key,$extras))
                        $target_dir = $extras[$split_dirs_key][$i_ext].'/';
                    else
                        $target_dir = '';
                    $attr_extractor = 'href';
                    if ($ext_parts[0][0]=='.') // extract link by extension of the linked file
                    {
                        $links = $xpath->query('.//div[@class="course-lecture-item-resource"]/a[contains(@href,"'.$ext_parts[0].'")]',$node);
                        $ext_parts[0] = substr($ext_parts[0],1);
                    }
                    else if ($ext_parts[0][0]=='#') // extract link by icon class name
                    {
                        $ext_parts[0] = substr($ext_parts[0],1);
                        $links = $xpath->query('.//a/i[contains(@class,"icon-'.$ext_parts[0].'")]/..',$node);
                    }
                    else if ($ext_parts[0][0]=='~') // extract link by extension from viewer frame
                    {
                        $ext_parts[0] = substr($ext_parts[0],1);
                        $links = c_get_embedded_links($row,$ext_parts[0],$auth_token);
                        if ($links===NULL)
                        {
                            file_put_contents('php://stderr', "Loading embedded frame failed: '$dir/$title'\n");
                            continue;
                        }
                        else if ($links->length===0 && $i_ext===0)
                        {
                            file_put_contents('php://stderr', "No resources '$ext_parts[0]' found for '$dir/$title'\n");
                            if (array_key_exists($debug_key,$extras))
                                file_put_contents('DEBUG_'.$title,$debug_page);
                            continue;
                            /* $links = c_get_embedded_links2($row,$ext_parts[0],$auth_token);
                            if ($links===NULL)
                            {
                                file_put_contents('php://stderr', "Loading fallback embedded frame failed: '$dir/$title'\n");
                                continue;
                            }
                            else if ($links->length===0 && $i_ext===0)
                            {
                                file_put_contents('php://stderr', "No fallback resources '$ext_parts[0]' found for '$dir/$title'\n");
                                if (array_key_exists($debug_key,$extras))
                                    file_put_contents('DEBUG_'.$title,$debug_page);
                                continue;
                            }
                            else
                                $attr_extractor = 'flashvars';*/
                        }
                        else
                            $attr_extractor = 'src';
                        if (array_key_exists($debug_key,$extras))
                            file_put_contents('php://stderr', "For $dir/$title ".$links->length." '$ext_parts[0]' links found.\n");
                    }
                    else // extract link by tooltip of the link
                    {
                        $links = $xpath->query('.//a[contains(@title,"'.$ext_parts[0].'")]',$node);
                    }
                    $match = FALSE;
                    foreach ($links as $node_link)
                    {
                        // embedded <source> elements carry no title attribute, so guard the lookup
                        $title_attr = $node_link->attributes->getNamedItem('title');
                        $tooltip = $title_attr!==NULL ? fix_filename($title_attr->nodeValue) : '';
                        $suffix = '';
                        if ($links->length>1)
                            $suffix = '.'.$tooltip;
                        $link = $node_link->attributes->getNamedItem($attr_extractor)->nodeValue;
                        if ($attr_extractor=='flashvars')
                        {
                            $url_idx = strpos($link,'&file=http');
                            $link = urldecode(substr($link,$url_idx+strlen('&file=')));
                        }
                        if (!in_array($link,$downloads))
                        {
                            // last part of "FileType=FileExtension"; end() needs a variable, not a function result
                            $file_ext = $ext_parts[count($ext_parts)-1];
                            $target_filename = $target_dir.$dir.'/';
                            if ($file_ext == '$$')
                            {
                                $url_parts = parse_url(urldecode($link));
                                $url_no_query = $url_parts['scheme'] . '://' . $url_parts['host'] . (isset($url_parts['path'])?$url_parts['path']:'');
                                $target_filename .= pathinfo($url_no_query, PATHINFO_BASENAME);
                            }
                            else if ($file_ext == '^^')
                                $target_filename .= $tooltip;
                            else
                            {
                                if ($file_ext == '.$')
                                {
                                    $url_parts = parse_url(urldecode($link));
                                    $url_no_query = $url_parts['scheme'] . '://' . $url_parts['host'] . (isset($url_parts['path'])?$url_parts['path']:'');
                                    $file_ext = pathinfo($url_no_query, PATHINFO_EXTENSION);
                                }
                                $target_filename .= str_pad($item_count,3,'0',STR_PAD_LEFT).'. '.$title.$suffix.'.'.strtolower($file_ext);
                            }
                            $bash_printer->wget_file_print($link,$target_dir,$target_filename,$downloads_filename,$auth_token);
                            $match = TRUE;
                        }
                        else if (array_key_exists($debug_key,$extras))
                            file_put_contents('php://stderr', "$dir/$title '$ext_parts[0]' already downloaded.\n");
                    }
                }
            }
        }
    echo "\n";
    echo 'if [ $ERRORS -ne 0 ] ; then echo "There were some errors while downloading. Run the script again." ; fi'."\n";
    if (array_key_exists($beep_key,$extras))
        echo "beep\n";
}

if ($argc<4)
{
    file_put_contents('php://stderr', "Error: you should input minimum three arguments, the usage is:\n");
    // the *_key variables already start with "--", so they are printed as they are
    file_put_contents('php://stderr', "\"LECTURES_URL\" \"FILE_TYPES\" \"CREDENTIALS_FILE\" [$beep_key] [$reverse_key] [$drop_deco_key] [$debug_key] [$limit_key=\"section phrase\"] [$split_dirs_key=\"directories per file type\"]\n");
}
else
{
    array_shift($argv);
    $url = array_shift($argv);
    $extensions = array_shift($argv);
    $credentials = parse_ini_file(array_shift($argv),true);
    $auth_token = array('CAUTH' => $credentials['coursera']['CAUTH']);
    // send `__204u` only when the credentials file provides it
    if (isset($credentials['coursera']['__204u']))
        $auth_token['__204u'] = $credentials['coursera']['__204u'];
    $extras = array();
    foreach ($argv as $a)
    {
        $parts = explode('=',$a);
        if (!in_array($parts[0],array($split_dirs_key,$drop_deco_key,$reverse_key,$beep_key,$debug_key,$limit_key)))
        {
            file_put_contents('php://stderr', 'Unknown extra argument "'.$parts[0]."\"\n");
            exit(1);
        }
        $extras[$parts[0]] = count($parts)==1 ? NULL : $parts[1];
    }
    if ($extensions == '*')
    {
        $extras[$split_dirs_key] = 'lectures subtitles subtitles slides data code';
        $extensions = array('MP4', // video
                            '#align-justify=txt', // subtitles
                            '#list=srt', // subtitles
                            '#file=.$', // slides
                            '#info-sign=.$', // data
                            '#laptop=$$'); // program code
    }
    else
        $extensions = explode(' ',$extensions);
    $xpath = get_page_xpath($url,$auth_token);
    if (array_key_exists($debug_key,$extras))
        file_put_contents('DEBUG_index.html',$debug_page);
    if ($xpath!==NULL)
        print_wget($xpath,$auth_token,$extensions,$extras);
}
?>
<?php
/* ===================== COURSERA PREVIEW GETTER =======================
   tags: [coursera video download] [coursera lecture download]
   ================================================================== */

require_once 'c_common.php';

// https://class.coursera.org/machlearning-001/lecture/preview/index

function print_wget($xpath,$ext,$extras)
{
    global $split_dirs_key,$drop_deco_key,$reverse_key,$beep_key;

    // process the extras first, so BashPrinter receives the split dirs as an array;
    // the constructor requires $extras (calling it without arguments caused warnings)
    process_extra_arguments($extras);
    $bash_printer = new BashPrinter($extras);

    // a preview carries a single file type, so only the first split directory applies
    $target_dir = array_key_exists($split_dirs_key,$extras) ? $extras[$split_dirs_key][0].'/' : '';

    echo "ERRORS=0\n";
    $group_list = c_query_groups($xpath);
    $group_count = array_key_exists($reverse_key,$extras) ? $group_list->length : 1;
    foreach ($group_list as $group)
    {
        $item_count = 0;
        $dir = c_deco_dir(c_query_dir($xpath,$group),$group_count,array_key_exists($drop_deco_key,$extras));
        $group_count += array_key_exists($reverse_key,$extras) ? -1 : +1;
        $bash_printer->mkdir_print($dir);
        // get the list of all lectures within current group (week)
        $node_list = c_query_list($xpath,$group);
        foreach ($node_list as $node)
        {
            ++$item_count;
            c_query_row($xpath,$node,array_key_exists($drop_deco_key,$extras),$row,$title);
            $video_list = c_get_embedded_links($row,$ext);
            if ($video_list===NULL)
            {
                file_put_contents('php://stderr', "Loading embedded frame failed: '$dir/$title'\n");
                continue;
            }
            else if ($video_list->length==0)
            {
                file_put_contents('php://stderr', "Filetype '$ext' not found for '".$title."'\n");
                continue;
            }
            $video = $video_list->item(0);
            $vid_src = $video->attributes->getNamedItem('src')->nodeValue;
            // wget_file_print() expects the link, the directory key and the target filename
            $bash_printer->wget_file_print($vid_src,$target_dir,$target_dir.$dir.'/'.str_pad($item_count,3,'0',STR_PAD_LEFT).'. '.$title.'.'.strtolower($ext));
        }
    }
    echo 'if [ $ERRORS -ne 0 ] ; then echo "There were some errors while downloading. Run the script again." ; fi'."\n";
    if (array_key_exists($beep_key,$extras))
        echo "beep\n";
}

if ($argc<3)
{
    file_put_contents('php://stderr', "Error: you should input minimum two arguments, the usage is:\n");
    // the *_key variables already start with "--", so they are printed as they are
    file_put_contents('php://stderr', "\"LECTURES_URL\" \"FILE_TYPE\" [$beep_key] [$reverse_key] [$drop_deco_key] [$split_dirs_key=\"directory for the files\"]\n");
}
else
{
    array_shift($argv);
    $url = array_shift($argv);
    $extensions = array_shift($argv);
    $extras = array();
    foreach ($argv as $a)
    {
        $parts = explode('=',$a);
        if (!in_array($parts[0],array($split_dirs_key,$drop_deco_key,$reverse_key,$beep_key)))
        {
            file_put_contents('php://stderr', 'Unknown extra argument "'.$parts[0]."\"\n");
            exit(1);
        }
        $extras[$parts[0]] = count($parts)==1 ? NULL : $parts[1];
    }
    $xpath = get_page_xpath($url);
    if ($xpath!==NULL)
        print_wget($xpath,$extensions,$extras);
}
?>
2014-11-03     * Coursera requires yet another authentication key --
                 `__204u`
2013-12-01     * a credentials file is used instead of passing the
                 "CAUTH" value directly
2013-09-24     * ignoring hidden lecture sections
2013-09-06     * ability to download resources by icon class
               * added new flags for setting the target filename to the
                 original one or to the tooltip
2013-09-04     * Coursera dropped the "session" cookie and introduced
                 the "CAUTH" one instead as the authentication token --
                 this one is global, so once you have it you can use it
                 for all Coursera courses
2013-08-06     * additional info from the tooltip is processed to be used
                 as a filename
2013-07-15     * if there is nothing to download for a given folder
                 it is not created
2013-06-29     * updated extraction for embedded videos
               * added option to limit extraction from the main page
2013-05-15, 2  * corrected reporting of failed download or extraction
2013-05-15     * reporting failed extraction of the main resource
2013-05-03     * more unification with the c_preview utility -- ability to
                 download embedded videos as well (read: webm), add the
                 "~" character before the file extension to make Coursera
                 Getter fetch the embedded video
               * the "--extension" option is no longer supported -- add a
                 dot (".") before the file extension instead
2013-04-22     * dropping a week/lecture phrase from filenames as well
2013-04-15     * refactoring
2013-04-07     * keeping a log of downloads (file "downloads.log") as a
                 countermeasure for renaming of the lectures/notes
               * added "beep" option to make a sound at the end of
                 downloading
2013-04-03     * bugfix: the title of a lecture sometimes was ignored
2013-02-15     * more accurate whitespace removal
2013-02-02     * automatically removes corrupted files
2013-01-23     * new option "reverse" for the courses which put
                 sections in "from newest to oldest" order
2013-01-08     * Coursera changed its web format, along with the structure
                 and CSS tags/classes; this version is hopefully
                 changed to reflect all of those
2012-10-12     * added option --drop_week to drop the "week X." part from
                 the directory (remind me this was supposed to be a dead
                 simple tool ;-D)
2012-10-11     * added option --split_dirs to save files into
                 subdirectories according to file extensions
2012-10-03     * added option --extension to get resources by the
                 extension of the files, not the tooltips
2012-09-18     * added c_common.php
               * you can specify via filetypes what to grab and what
                 extension to set
2012-07-05     * initial release of the Coursera preview getter
2012-06-14     * added a little check for an insufficient number
                 of arguments
2012-06-11     * UTF-8 in filenames is supported
                 (another module for PHP is required -- mbstring)
               * replaces slash and backslash with underscore
2012-06-07     * changed this info, added another way of getting
                 cookies
2012-06-06, 2  * extensions casing reverted -- it matters again
               * directories are named according to the Lectures sections
               * handles multiple files for a given file type
               * files are counted within each directory, not within
                 the entire course
2012-06-06     * extensions can be given in lower/upper case, it does not
                 matter
2012-06-05, 2  * creates weekly subdirectories and puts the files in
                 there
2012-06-05     * initial release

@nellaivijay, I will answer properly in a few minutes, but for now -- LOG OUT OF COURSERA NOW (using your regular web browser) AND LOG IN AGAIN. As I wrote, don't share your session code, and you shared it with the entire world (7 billion people).

Ok, as for the answer to the problem: you forgot to add (as the middle argument) what files you want to grab -- mp4 (lectures), slides, and so on. I have modified the code now, so it reminds you in a handier manner that the number of arguments is incorrect.

So in your case, it could be:

    php c_get.php "https://class.coursera.org/compilers/lecture/index" "MP4" "your session code which you should not share, really"

Hope this helps.

For the record -- I already wrote a reply to your comment, but it was deleted. Since then the script has been fixed (i.e. a help message was added). In your particular case, you forgot about the file types you would like to get from Coursera.

This is pretty awesome, I used this script to download webm files and subtitles.
And there is always "PHP Notice: Trying to get property of non-object in /path_to/c_get.php on line 148" when creating the shell script.

@yanglifu90, sorry for the late response, I didn't get any notification.
Anyway, could you please tell me for which course you see this message and how you call the script? Please omit your CAUTH.

Hi... I tried this and for some reason keep getting this error. Could you please explain what is wrong here? Also, are these scripts compatible with the new Coursera interface? Thanks!

    php c_preview.php "https://www.coursera.org/learn/work-smarter-not-harder/outline" "mp4"
    PHP Warning: Missing argument 1 for BashPrinter::__construct(), called in /home/myuser/coursera/c_preview.php on line 14 and defined in /home/myuser/coursera/c_common.php on line 105
    PHP Notice: Undefined variable: extras in /home/myuser/coursera/c_common.php on line 109
    PHP Warning: array_key_exists() expects parameter 2 to be array, null given in /home/myuser/coursera/c_common.php on line 109
    ERRORS=0
    if [ $ERRORS -ne 0 ] ; then echo "There were some errors while downloading. Run the script again." ; fi

Getting this error message

    vijayram@ubuntu:~/Downloads$ php c_get.php "https://class.coursera.org/compilers/lecture/index" "----------"
    PHP Notice: Undefined offset: 3 in /home/vijayram/Downloads/c_get.php on line 160
    vijayram@ubuntu:~/Downloads$