Mauro Asprea brutuscat

## index.html
<!doctype html>
<html>
  <head>
    <script src="https://cdn.jsdelivr.net/npm/@undecaf/zbar-wasm@0.9.11/dist/index.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/@undecaf/barcode-detector-polyfill@0.9.13/dist/index.js"></script>
    <script>
        try {
            window['BarcodeDetector'].getSupportedFormats()
        } catch {
            window['BarcodeDetector'] = barcodeDetectorPolyfill.BarcodeDetectorPolyfill

## gist:1ca90fffb6a03c69faa8
    /*
     *  tests whether payment matches options
     *
     *  returns 'true' if matches, 'false' otherwise
     *
     *  $opts is an array of options to filter the payments by
     *  possible values are
     *    'withoutPrefix' => 'prefix' - only select payment without this prefix
     *    'withPrefix' => 'prefix' - only select payment with this prefix
     *  if no options are provided, always returns 'true'

## 99java
## Setup java

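if [ "$(uname -m)" = 'x86_64' ]; then
 # Prepend the JRE bin directory instead of replacing PATH outright.
 PATH="/usr/lib64/jvm/java-7-oracle/jre/bin:$PATH"
 JAVA_HOME="/usr/lib64/jvm/java-7-oracle/"
else
 PATH="/usr/lib/jvm/java-7-oracle/jre/bin:$PATH"
 JAVA_HOME="/usr/lib/jvm/java-7-oracle/"
fi
# Export so child processes see both variables (assumes this file is sourced).
export PATH JAVA_HOME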

## prompt.sh
# Configure colors, if available.
if [ -x /usr/bin/tput ] && tput setaf 1 >&/dev/null; then
    c_reset='\[\e[0m\]'
    c_user='\[\e[0;32m\]'
    c_path='\[\e[1;34m\]'
    c_git_clean='\[\e[0;37m\]'
    c_git_staged='\[\e[0;32m\]'
    c_git_unstaged='\[\e[0;31m\]'
else
    c_reset=

## gist:3893558
brutuscat / gist:3893558
Created October 15, 2012 16:49, forked from mattb/gist:3888345

Some pointers for Natural Language Processing / Machine Learning

Here are the areas I've been researching, some things I've read and some open source packages...
Nearly all text processing starts by transforming text into vectors:
http://en.wikipedia.org/wiki/Vector_space_model
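As a rough illustration of turning text into vectors, here is a minimal bag-of-words sketch. The function names (`tokenize`, `build_vocabulary`, `to_count_vector`) and the whitespace tokenizer are assumptions made for this example, not something taken from the linked article; real pipelines usually handle punctuation and stop words as well.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def to_count_vector(text, vocabulary):
    """Return a list of term counts, one entry per vocabulary term."""
    counts = Counter(tokenize(text))
    return [counts.get(tok, 0) for tok in vocabulary]


docs = ["the cat sat", "the cat saw the dog"]
vocab = build_vocabulary(docs)  # columns: cat, dog, sat, saw, the
vectors = [to_count_vector(d, vocab) for d in docs]
print(vectors)
```

Each document becomes a fixed-length list of counts, so documents of different lengths can be compared column by column.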
This step often applies a weighting such as TF-IDF to normalise the data and control for outliers (words that are too frequent or too rare confuse the algorithms):
http://en.wikipedia.org/wiki/Tf%E2%80%93idf
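The weighting can be sketched in a few lines. This version assumes the plain variant, relative term frequency multiplied by log(N / document frequency); libraries typically add smoothing and length normalisation, so their numbers will differ. The function name `tf_idf` and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Weight = (count / document length) * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog sat", "the cat purred"]
weights = tf_idf(docs)
for doc, w in zip(docs, weights):
    print(doc, w)
```

A word that appears in every document ("the" here) gets weight zero, while a word unique to one document gets the largest weight in that document.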
Collocation detection finds cases where two or more words occur together more often than they do separately (e.g. "wishy-washy" in English). I use this to group words into n-gram tokens, because many NLP techniques treat each word as independent of all the others in a document and ignore word order:
http://matpalm.com/blog/2011/10/22/collocations_1/
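One way to sketch this is to score adjacent word pairs with pointwise mutual information (PMI) and merge the high-scoring pairs into single tokens. PMI is an assumption chosen for this example; it is only one of several common scoring measures and is not necessarily the one used in the linked post. The names `bigram_pmi` and `merge_collocations`, the `min_count` cut-off and the sample sentence are likewise hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (first, second), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bigrams
        p_first = unigrams[first] / n_unigrams
        p_second = unigrams[second] / n_unigrams
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


def merge_collocations(tokens, collocations):
    """Join adjacent tokens that form a known collocation into one token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in collocations:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


text = "new york is big and new york was busy but the city is new"
tokens = text.split()
scores = bigram_pmi(tokens)
# Keep pairs that co-occur more often than independence would predict.
collocations = {pair for pair, score in scores.items() if score > 0}
merged = merge_collocations(tokens, collocations)
print(merged)
```

After merging, "new_york" is a single token, so a downstream bag-of-words model no longer treats "new" and "york" as unrelated words.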