airportyh/browserifiability.md

## browserifiability.md

      
    Raw
  

              browserifiability.md
            
          
    Browserify Search Request For Help

I have been working on making a search index for npm modules that work with browserify. In order to do this, obviously, for each module on npm, I need to determine whether or not it works with browserify. Easy! I thought. I will run the browserify cli tool on the module browserify node_modules/that_module, and if it exits normally, it works with browserify. So I did that, and implemented a search engine on this premise. Except that, in the search results, I found a lot of modules which, although it "passed the test", they still were useless because simple the act of loading the bundle in a browser would result in a runtime error - a lot of these were a result of the module testing for process.versions.node which doesn't exist in the browserify process shim. Okay, I thought, I'll run the resulting bundle in jsdom, genious! That did reject lots of modules, but it still wasn't good enough. I still got a lot of modules in the search index that don't work. Why? Because they were using certain core modules like fs, and child_process, which simply don't work in the browser. The reason that the bundling step didn't reject these modules is because browserify very leniently returns empty objects as shims for these modules. As a result, it would seem like it worked, but as soon as you try to do something interesting with the module, like reading from the file system, it would bomb with a undefined is not a function. I was like WTF? Why wouldn't you error out right away? From what I've gathered, this is done so that the modules which use these core modules only for optional parts of their functionality will still work. For example, a module that say - renders some markdown - may use fs to load in a .md file, but it may also just directly load markdown from a string or from a stream - which would work in the browser, and we don't want to disallow that necessarily. So, it was back to the drawing board.
Statistics

At this point, I have imported 71087 modules into my database, which, although not exact, is close to the current number on npm. 4739 have errored on npm install. 45314 of the remaining have passed the browserify bundle test. 34346 of that subset have passed the jsdom test.  17938 of those have no dependencies on core modules, which means that the other 16408 depend on at least one core module.
Although I was never good at it, I did take a class in probability and statistics in college, and this seems like to be a good place to apply it. Here's the new plan:

for a given module that passes the automated tests, I will determine the set of core modules it depends on, this includes any core module used by any of its dependencies as well. Turns out answering this question was easy using module-deps, thanks to browserify's modular architecture.
next, I will use the dependence of that core module as a test, i.e. I want to be able to make the statement that given that a module depends on fs, it has the probability of X of actually working in the browser.
In addition to using the set of core modules as tests, I can also use other meta data in the module, such as whether or not it has a testling configuration in its package.json or whether or not it has "browser" as one of its keywords, or if the word browser shows up in its description or readme.

At the end of it, I will use the bayes theorem to tie everything together. I am very rusty with statistics, so I would love some help from other interested parties in vetting my math.
Sampling

In order to supply the a priori probabilities needed to for these calculations, I need to have before-hand data on what those probabilities are. To answer the question of: "given that a module uses fs, how likely is it to work in the browser?" I will need the reverse information: "given a module works in the browser, how likely is it to use fs?" I also need to know: "given a random module, how likely is it to work in the browser w browserify?" as well as "given a random module, how likely is it to use fs?" To get these probabilities, we will use sampling, and then we'll actually determine for each sample (module in our case), without a shadow of doubt, whether it works with browserify in the browser. Using the sample size calculator, I have gotten the number 375 as the sample size I'd need for 95% confidence level and +/-5% confidence interval. I thought I'd round up to 400 for good measure. So, what it means at this point is that I need to randomly select 400 modules from npm, and manually determine whether each one works in the browser. After that, we'll use the sample data and test results to calculate the probabilities we need. I have started down this path, and so far I have manually tested 59 modules. I did a calculation and at my current rate it might take me 3-4 more full work days to finish. Since I also have stuff-I-do-for-money to take care of, and would only work on this for ~1 day a week, this would take 3-4 more weeks to complete. This is where you can help - it would get done faster if it were us, and not just me. If you are interested in helping, read on.
Testing Procedure

The testing procedure I have been using given a random module on npm is:

install it - I assume you know how
read the README, hopefully find a working code example
if found code example, copy it into a run.js, inject copious console.log statements where needed
if there is little or no readme, look for tests and adopt them to write the run.js script.
Failing that, read the source, and try to figure it out. Basically do anything to write a small example that tests the basic core functionality of the module
run the run.js in Node and see that it does the expected thing
if it doesn't work in Node, try to get it to work somehow, or at least make it "work enough"
once it's working in Node, try it in the browser, I have been using beefy, via beefy run.js, then go to http://localhost:9966 in the browser, and determine if it did the right thing
determining whether it works:


if it did the same thing in the browser vs Node, it worked
if some of the functionalitiy worked and some didn't, make a judgement call as to whether the stuff that didn't work is optional. If you called it optional, then it worked.
some modules are written exclusively for the browser, in that case, you will probably get an error when running in Node along the lines of document is not defined. Just skip trying to run it in Node and go straight to the browser
otherwise, it did not work
you can sometimes use some shortcuts/heuristics such as: express plugins and grunt plugins fail automatically because neither express or grunt can work in a browser

The Spreadsheet

If you want to help, work from the full list of 400 in this google spreadsheet and record the results there. To avoid conflicts: highlight a block of them which you want to work on - say 20. By the same token, if you are looking to work on some modules, choose a set which others haven't already highlighted. Please leave your name or internet handle in the tester column when you record the result. If you are uncertain about the result, or have a comment about it, leave it in the comments column.
Feedback

If you have any questions, suggestions or any other feedback, please comment on this gist.