rgbkrk/notebook-require-usage.md

## notebook-require-usage.md

      
nteract doesn't support RequireJS because we have the built-in `require` at our fingertips. The Jupyter Notebook, however, has long operated under the assumption that you can use its built-in RequireJS to load modules asynchronously:

```js
require(['d3'], function(d3) { /* ... */ });
```

I started exploring the idea of providing some of these modules in a "quirks" mode, where we give limited access to RequireJS while remaining sandboxed. To find out which modules are commonly required, I turned to Google BigQuery, the public GitHub dataset, and a user-defined function (UDF) written in JavaScript.

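
What "limited access" would look like isn't spelled out here. As one possible reading, the following is a hypothetical sketch (the name `makeQuirksRequire` and the registry object are illustrative, not part of nteract) of a loader with RequireJS's `require(deps, callback, errback)` call shape that only hands out modules the host registered up front and never fetches scripts.

```javascript
// Hypothetical "quirks mode" require: same call shape as RequireJS's
// require(deps, callback, errback), but modules come only from an allowlist
// (the registry) supplied by the host; nothing is loaded from the network.
function makeQuirksRequire(registry) {
  const has = (name) => Object.prototype.hasOwnProperty.call(registry, name);

  return function quirksRequire(deps, callback, errback) {
    const missing = deps.filter((name) => !has(name));
    if (missing.length > 0) {
      const err = new Error("not available in quirks mode: " + missing.join(", "));
      if (typeof errback === "function") {
        errback(err);
        return Promise.resolve();
      }
      throw err;
    }
    // Invoke the callback asynchronously, as RequireJS does.
    return Promise.resolve().then(() => callback(...deps.map((name) => registry[name])));
  };
}

// Usage: expose a stand-in "d3" object; any other module is rejected.
const quirksRequire = makeQuirksRequire({ d3: { version: "stub" } });
quirksRequire(["d3"], (d3) => console.log("got d3", d3.version));
quirksRequire(["jquery"], () => {}, (err) => console.log(err.message));
```

Keeping the callback asynchronous is meant to preserve the ordering that notebook code already expects from RequireJS.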
I'll flesh this out in this gist or a blog post later. For now, here's my query code:

```sql
# JavaScript UDF for extracting RequireJS modules from Jupyter notebooks on GitHub

CREATE TEMPORARY FUNCTION
  extractModules(notebookJSON STRING)
  RETURNS ARRAY<STRING>
  LANGUAGE js AS """
    /**
     * Grab all the modules loaded with requirejs within a jupyter notebook.
     */
    function getModules(s) {
      if (!s) {
        return [];
      }
      // nbformat 4 may store a multi-line string as an array of lines
      if (Array.isArray(s)) {
        s = s.join('');
      }

      // Note: The backslash has to be escaped for BigQuery's editor
      // Visualize half as many backslashes here ;)
      var re = /require\\((\\[[^\\]]+\\])/g;

      var modules = [];
      var match = re.exec(s);
      while (match !== null) {
        try {
          // Swap single quotes for double quotes so JSON.parse accepts the array
          var hopefullyJSONArray = match[1].replace(/'/g, '"');
          var arr = JSON.parse(hopefullyJSONArray);
          if (Array.isArray(arr)) {
            modules = modules.concat(arr);
          }
        } catch (e) {
          // assume invalid (e.g. variables instead of string literals), can't use
        }

        match = re.exec(s);
      }

      return modules;
    }

    function flatten(a, b) {
      return a.concat(b);
    }

    try {
      var notebook = JSON.parse(notebookJSON);
      // Only nbformat 4 notebooks have a top-level cells array
      if (!notebook.cells) {
        return [];
      }

      var mods = notebook.cells.map(function(cell) {
        if (!cell.outputs) {
          return [];
        }

        return cell.outputs.map(function(output) {
          var modules = [];
          if (output.output_type === "display_data" || output.output_type === "execute_result") {
            // Guard against malformed outputs that lack a data bundle
            var data = output.data || {};
            var html = data['text/html'];
            var js = data['application/javascript'];

            if (html) {
              modules = modules.concat(getModules(html));
            }
            if (js) {
              modules = modules.concat(getModules(js));
            }
          }
          return modules;
        }).reduce(flatten, []);
      }).reduce(flatten, []);

      // Deduplicate module names within this notebook
      return [...new Set(mods)];

    } catch (e) {
      return ["ERROR" + e.toString()];
    }
  """;

SELECT
  -- Assumes the file lives on the master branch
  CONCAT("https://github.com/", F.repo_name, "/blob/master/", F.path) AS URL,
  extractModules(C.content) AS modules
FROM (
  SELECT
    id,
    content
  FROM
    `bigquery-public-data.github_repos.contents`
  WHERE
    -- Pre-filter so the UDF only runs on files that call require([...]),
    -- whether the module names are single- or double-quoted
    REGEXP_CONTAINS(content, r"require\(\[") ) AS C
JOIN (
  SELECT
    repo_name,
    path,
    id
  FROM
    `bigquery-public-data.github_repos.files`
  WHERE
    path LIKE '%.ipynb' ) AS F
ON
  C.id = F.id
```
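
To try the extraction logic without running a BigQuery job, the UDF body can be exercised locally in Node.js. The sketch below mirrors `getModules` and the walk over `display_data`/`execute_result` outputs, assuming nbformat 4 notebooks (top-level `cells`). The `tallyModules` helper and the sample notebook are hypothetical additions that show how per-notebook results turn into "commonly required" counts; unlike the UDF, invalid notebook JSON throws here instead of returning an `ERROR` entry.

```javascript
// Local dry run of the UDF's extraction logic (a sketch, not the UDF itself).
// Outside BigQuery's """...""" string the regex needs only single backslashes.

// Return every entry of every literal require([...]) dependency array in `source`.
function getModules(source) {
  if (!source) return [];
  // nbformat 4 may store multi-line strings as arrays of lines.
  const text = Array.isArray(source) ? source.join("") : String(source);
  const re = /require\((\[[^\]]+\])/g;
  const modules = [];
  let match;
  while ((match = re.exec(text)) !== null) {
    try {
      // Single-quoted JS strings become JSON strings so JSON.parse accepts them.
      const arr = JSON.parse(match[1].replace(/'/g, '"'));
      if (Array.isArray(arr)) modules.push(...arr);
    } catch (e) {
      // Not a literal array of strings (e.g. variables); skip it.
    }
  }
  return modules;
}

// Unique module names required by the HTML/JS outputs of an nbformat 4 notebook.
function extractModules(notebookJSON) {
  const notebook = JSON.parse(notebookJSON);
  const found = [];
  for (const cell of notebook.cells || []) {
    for (const output of cell.outputs || []) {
      if (output.output_type !== "display_data" && output.output_type !== "execute_result") continue;
      const data = output.data || {};
      found.push(...getModules(data["text/html"]), ...getModules(data["application/javascript"]));
    }
  }
  return [...new Set(found)];
}

// Hypothetical helper: count how many notebooks require each module, most common first.
function tallyModules(perNotebook) {
  const counts = new Map();
  for (const modules of perNotebook) {
    for (const name of modules) counts.set(name, (counts.get(name) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}

// Made-up sample notebook: a JS output stored as an array of lines, an HTML
// output with double-quoted names, and a markdown cell (no outputs, so ignored).
const sample = JSON.stringify({
  cells: [
    {
      cell_type: "code",
      outputs: [
        {
          output_type: "display_data",
          data: { "application/javascript": ["require(['d3'], function(d3) {\n", "  // ...\n", "});\n"] },
        },
        {
          output_type: "execute_result",
          data: { "text/html": "<script>require([\"d3\", \"jquery\"], f)</script>" },
        },
      ],
    },
    { cell_type: "markdown", source: "require(['ignored'])" },
  ],
});

console.log(extractModules(sample)); // [ 'd3', 'jquery' ]
console.log(tallyModules([["d3", "jquery"], ["d3"], ["base/js/namespace"]]));
```

In BigQuery itself, the same tally would be an `UNNEST` of the `modules` column followed by `GROUP BY` and `COUNT(*)`.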

Determining Notebook Usage of RequireJS on GitHub