
@dideler / automated-data-collection.md (GitHub Gist, last active Oct 13, 2015)

Scraping Tools & Tips

Python

JavaScript

Java

C++

  • QtWebKit (has bindings for other languages, e.g. PyQt and PySide for Python)

Browser plugins

Ruby

PHP

Misc

Tips

  • Check out the mobile versions of sites; they often serve lighter, simpler HTML that is easier to parse
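  • Phone number regex (10-digit North American format, optional parentheses around the area code, and -, . or space as separators): \(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})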
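  • JavaScript example that finds every phone number in a string and normalizes each one to the form (XXX) XXX-XXXX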
function extractPhoneNumbers(text) {
    var match
      , numbers = []
      , r = /\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})/g;

    /*
     * With the global flag (/g) set, each call to r.exec(text) resumes the
     * search at r.lastIndex, so calling it repeatedly on the same text steps
     * from one match to the next and returns null once there are no more.
     */
    while ((match = r.exec(text)) !== null) {
        // Give the numbers a nice uniform look.
        numbers.push('(' + match[1] + ') ' + match[2] + '-' + match[3]);
    }

    return numbers;
}
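The phone regex above also matches inside longer digit runs (an 11-digit order number such as 12345678901 comes back as "(123) 456-7890"), and the same number written in two formats is returned twice. Below is a hypothetical stricter variant, not part of the original tip: extractPhoneNumbersStrict is an illustrative name, and the sketch assumes a runtime with regex lookbehind and String.prototype.matchAll (Node.js 12+ or a current browser).

```javascript
/*
 * Hypothetical stricter variant of extractPhoneNumbers (a sketch, not the
 * original author's code). Differences from the regex above:
 *   - (?<!\d) and (?!\d) reject matches embedded in longer digit runs;
 *   - an opening "(" is only accepted together with its closing ")";
 *   - duplicates are dropped, keeping first-seen order.
 * Assumes regex lookbehind and String.prototype.matchAll (Node.js 12+).
 */
function extractPhoneNumbersStrict(text) {
  const r = /(?<!\d)(?:\((\d{3})\)|(\d{3}))[-. ]?(\d{3})[-. ]?(\d{4})(?!\d)/g;
  const seen = new Set(); // a Set preserves insertion order

  for (const m of text.matchAll(r)) {
    // Exactly one of groups 1 and 2 captures the area code.
    const areaCode = m[1] || m[2];
    seen.add('(' + areaCode + ') ' + m[3] + '-' + m[4]);
  }
  return [...seen];
}

// Usage:
console.log(extractPhoneNumbersStrict(
  'Call (555) 123-4567 or 555.123.4567; order #12345678901 is not a phone number.'
));
// → [ '(555) 123-4567' ]
```

The digit lookarounds are what make 12345678901 drop out entirely; the trade-off is that a number glued directly to other digits is skipped as well.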
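  • Email regex (also matches obfuscated addresses such as "name at example dot com"; its character classes are lowercase, so apply it case-insensitively)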
    ([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)
    
  • Scriptular: a decent online tester for JavaScript regular expressions
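The email regex from the tips can be used the same way as the phone-number example. A minimal sketch, assuming the same modern JavaScript runtime: extractEmails and EMAIL_RE are hypothetical names, the regex source is copied unchanged with the g and i flags added, and obfuscated " at " / " dot " separators are rewritten to "@" and ".".

```javascript
/*
 * Hypothetical helper built on the email regex from the tips (a sketch,
 * not the original author's code). The regex source is unchanged; only the
 * g (find all) and i (case-insensitive) flags are added.
 */
const EMAIL_RE = /([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)/gi;

function extractEmails(text) {
  const found = new Set(); // drops exact duplicates, keeps first-seen order

  for (const m of text.matchAll(EMAIL_RE)) {
    // Rewrite the " at " / " dot " obfuscation to a normal address.
    // The local part and domain labels contain no whitespace, so these
    // patterns can only hit the separators the regex itself matched.
    const email = m[1]
      .replace(/\sat\s/i, '@')
      .replace(/\sdot\s/gi, '.');
    found.add(email);
  }
  return [...found];
}

// Usage:
console.log(extractEmails('Contact jane.doe@example.com or john at example dot org.'));
// → [ 'jane.doe@example.com', 'john@example.org' ]
```

Because of the " at " alternative, ordinary prose can match too: "look at example.com" comes back as "look@example.com", so the results are best treated as candidates to review rather than confirmed addresses.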