nteract doesn't support requirejs because we have the builtin require at our fingertips. Jupyter notebook, however, has long operated under the assumption that you can use the built-in requirejs to load modules asynchronously:
require(['d3'], function(d3) {...
I started exploring the idea of providing some of these modules in a "quirks" sort of mode where we provide limited access to "requirejs" while still sandboxed. To find out what modules were commonly required, I turned to Google BigQuery, the GitHub dataset, and a User Defined Function (UDF) written in JavaScript.
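To make the "quirks" idea a bit more concrete, here is a minimal sketch of what limited access could look like. Everything in it is an assumption for illustration: `makeQuirksRequire`, the registry object, and the stub `d3` are hypothetical names, not anything nteract ships. It also calls the callback synchronously for simplicity, whereas requirejs itself loads modules asynchronously.

```javascript
// Hypothetical sketch: a restricted stand-in for requirejs' require().
// Only modules listed in `registry` can be resolved; nothing is fetched.
function makeQuirksRequire(registry) {
  return function quirksRequire(deps, callback, errback) {
    var missing = deps.filter(function (name) {
      return !Object.prototype.hasOwnProperty.call(registry, name);
    });
    if (missing.length > 0) {
      var err = new Error('Modules not available: ' + missing.join(', '));
      if (typeof errback === 'function') {
        errback(err);
        return;
      }
      throw err;
    }
    var resolved = deps.map(function (name) {
      return registry[name];
    });
    // Called synchronously here; real requirejs resolves asynchronously.
    callback.apply(null, resolved);
  };
}

// Usage with a stub module (a placeholder object, not the real d3).
var quirksRequire = makeQuirksRequire({ d3: { version: 'stub' } });
quirksRequire(['d3'], function (d3) {
  console.log('loaded d3', d3.version);
});
```

Knowing which modules notebooks actually ask for is what would decide the contents of that registry, which is where the query below comes in.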
I'll flesh this out later, either in this gist or in a blog post. For now, I'll just provide my query code:
# JavaScript UDF for extracting requireJS modules from Jupyter Notebooks on GitHub
CREATE TEMPORARY FUNCTION
extractModules(notebookJSON STRING)
RETURNS Array<string>
LANGUAGE js AS """
/**
 * Grab all the modules loaded with requirejs within a jupyter notebook.
 */
function getModules(s) {
  // Note: The backslash has to be escaped for BigQuery's editor
  // Visualize half as many backslashes here ;)
  var re = new RegExp(/require\\((\\[[^\\]]+\\])/, 'gm');
  var modules = [];
  if (!s) {
    return [];
  }
  var match = re.exec(s);
  while (match !== null) {
    try {
      // Turn ['d3'] into ["d3"] so it can be parsed as JSON
      var hopefullyJSONArray = match[1].replace(/'/g, '"');
      var arr = JSON.parse(hopefullyJSONArray);
      if (Array.isArray(arr)) {
        modules = modules.concat(arr);
      }
    } catch (e) {
      // assume invalid, can't use
    }
    match = re.exec(s);
  }
  return modules;
}
function flatten(a, b) {
  return a.concat(b);
}
try {
  var notebook = JSON.parse(notebookJSON);
  if (!notebook.cells) {
    return [];
  }
  var mods = notebook.cells.map(function(cell) {
    if (!cell.outputs) {
      return [];
    }
    return cell.outputs.map(function(output) {
      var modules = [];
      var isRich = output.output_type === "display_data" || output.output_type === "execute_result";
      // Guard against outputs without a data bundle, so one odd output
      // doesn't turn the whole notebook into an ERROR row
      if (isRich && output.data) {
        var html = output.data['text/html'];
        var js = output.data['application/javascript'];
        if (html) {
          modules = modules.concat(getModules(html));
        }
        if (js) {
          modules = modules.concat(getModules(js));
        }
      }
      return modules;
    }).reduce(flatten, []);
  }).reduce(flatten, []);
  // De-duplicate module names within a single notebook
  return [...new Set(mods)];
} catch (e) {
  return ["ERROR" + e.toString()];
}
""";
SELECT
  CONCAT("https://github.com/", F.repo_name, "/blob/master/", F.path) AS URL,
  extractModules(C.content) AS modules
FROM (
  SELECT
    id,
    content
  FROM
    `bigquery-public-data.github_repos.contents`
  WHERE
    REGEXP_CONTAINS(content, "require\\(\\['") ) AS C
JOIN (
  SELECT
    repo_name,
    path,
    id
  FROM
    `bigquery-public-data.github_repos.files`
  WHERE
    path LIKE '%.ipynb' ) AS F
ON
  C.id = F.id
Excuse the poor JS above; it was a bit strange to write embedded code for a UDF.
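For trying the extraction step outside BigQuery, here is a rough standalone sketch of the same `getModules` logic for plain Node. Outside the triple-quoted UDF string the backslashes no longer need doubling. The `sample` string is made up for illustration and is not taken from any real notebook.

```javascript
// Same extraction idea as the UDF's getModules, runnable in plain Node.
function getModules(s) {
  if (!s) {
    return [];
  }
  // Capture the dependency array that follows "require(".
  var re = /require\((\[[^\]]+\])/g;
  var modules = [];
  var match;
  while ((match = re.exec(s)) !== null) {
    try {
      // Swap single quotes for double quotes so the array parses as JSON.
      var arr = JSON.parse(match[1].replace(/'/g, '"'));
      if (Array.isArray(arr)) {
        modules = modules.concat(arr);
      }
    } catch (e) {
      // Not a plain array of string literals (e.g. variables); skip it.
    }
  }
  return modules;
}

// Made-up output HTML of the kind a notebook cell might contain.
var sample = "<script>require(['d3', \"nbextensions/widgets\"], function(d3, w) {});</script>";
console.log(getModules(sample));
```

Anything that is not a literal array of strings, such as `require([somePath], ...)`, fails `JSON.parse` and is silently skipped, so the results undercount dynamic requires.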
Note that the query above is written in standard SQL for Google BigQuery, not the legacy syntax. You'll have to enable standard SQL to run it.
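The query returns one row per notebook, so getting to "commonly required" still takes a tallying step. A minimal sketch of that step follows, assuming the results were exported as JSON rows shaped like `{ URL, modules }`. The `countModules` helper and the sample rows are hypothetical, not part of the query or its real output.

```javascript
// Hypothetical post-processing: count how often each module name appears
// across exported rows shaped like { URL: '...', modules: ['d3', ...] }.
function countModules(rows) {
  var counts = new Map();
  rows.forEach(function (row) {
    (row.modules || []).forEach(function (name) {
      counts.set(name, (counts.get(name) || 0) + 1);
    });
  });
  // Most frequently required modules first.
  return Array.from(counts.entries()).sort(function (a, b) {
    return b[1] - a[1];
  });
}

// Made-up rows for illustration only.
var rows = [
  { URL: 'https://github.com/example/a/blob/master/one.ipynb', modules: ['d3', 'jquery'] },
  { URL: 'https://github.com/example/b/blob/master/two.ipynb', modules: ['d3'] }
];
console.log(countModules(rows));
```

Because the UDF already de-duplicates within a notebook, each count here is the number of notebooks that require a module, not the number of `require` calls.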