Mauro Asprea brutuscat

@brutuscat
brutuscat / gist:3893558
Created October 15, 2012 16:49 — forked from mattb/gist:3888345
Some pointers for Natural Language Processing / Machine Learning

Here are the areas I've been researching, some things I've read and some open source packages...

Nearly all text processing starts by transforming text into vectors: http://en.wikipedia.org/wiki/Vector_space_model
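
As a rough illustration of the vector space idea, here is a minimal sketch that turns documents into term-count vectors over a shared vocabulary. The helper names `tokenize` and `count_vectors` are made up for this example, and the regex tokeniser is a deliberate simplification.

```ruby
# Hypothetical helper: split on runs of ASCII letters/digits.
# Real tokenisers handle punctuation, unicode and stemming more carefully.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Build a sorted vocabulary and one term-count vector per document.
# Position i of every vector counts occurrences of vocab[i].
def count_vectors(docs)
  tokenized = docs.map { |doc| tokenize(doc) }
  vocab = tokenized.flatten.uniq.sort
  vectors = tokenized.map do |tokens|
    counts = tokens.each_with_object(Hash.new(0)) { |t, h| h[t] += 1 }
    vocab.map { |term| counts[term] }
  end
  [vocab, vectors]
end

vocab, vectors = count_vectors(["the cat sat", "the dog sat on the cat"])
# vocab   => ["cat", "dog", "on", "sat", "the"]
# vectors => [[1, 0, 0, 1, 1], [1, 1, 1, 1, 2]]
```

Word order is lost in this representation, which is why the collocation step further down can be useful.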

These vectors are often reweighted with transforms such as TF-IDF to normalise the data and control for outliers (words that are too frequent or too rare confuse the algorithms): http://en.wikipedia.org/wiki/Tf%E2%80%93idf
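
A minimal TF-IDF sketch, assuming the plain textbook variant (term frequency normalised by document length, times the natural log of the inverse document frequency). Libraries differ in smoothing and normalisation, so treat the exact formula and the name `tf_idf` as assumptions for illustration.

```ruby
# docs: array of token arrays. Returns one {term => weight} hash per document.
def tf_idf(docs)
  n = docs.size.to_f

  # Document frequency: in how many documents each term appears.
  df = Hash.new(0)
  docs.each { |tokens| tokens.uniq.each { |t| df[t] += 1 } }

  docs.map do |tokens|
    counts = tokens.each_with_object(Hash.new(0)) { |t, h| h[t] += 1 }
    counts.each_with_object({}) do |(term, count), weights|
      tf  = count / tokens.size.to_f   # share of this document taken by the term
      idf = Math.log(n / df[term])     # 0.0 when the term is in every document
      weights[term] = tf * idf
    end
  end
end

weights = tf_idf([%w[the cat sat], %w[the dog sat], %w[the cat ran]])
# "the" appears in every document, so its weight is 0.0;
# "dog" appears in only one, so it outweighs "cat" and "sat".
```

Note that this variant only pushes down very frequent terms. Very rare terms get a high weight, so they are usually handled separately, for example with a minimum document-frequency cutoff.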

Collocation detection finds two or more words that occur together more often than they would by chance (e.g. "wishy-washy" in English). I use it to group words into n-gram tokens, because many NLP techniques treat each word as independent of all the others in a document, ignoring order: http://matpalm.com/blog/2011/10/22/collocations_1/
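
One common way to score candidate collocations is pointwise mutual information (PMI) over adjacent word pairs. The sketch below assumes that scoring choice, which is not necessarily the one used in the linked post; `bigram_pmi` and its `min_count` parameter are hypothetical names.

```ruby
# tokens: flat array of words in document order.
# Returns [[w1, w2], pmi] pairs sorted by descending PMI.
# min_count drops rare pairs, whose PMI estimates are unreliable.
def bigram_pmi(tokens, min_count: 2)
  unigrams = tokens.each_with_object(Hash.new(0)) { |t, h| h[t] += 1 }
  bigrams  = tokens.each_cons(2).each_with_object(Hash.new(0)) { |pair, h| h[pair] += 1 }
  n_uni = tokens.size.to_f
  n_bi  = [tokens.size - 1, 1].max.to_f

  scored = bigrams.select { |_pair, count| count >= min_count }.map do |(w1, w2), count|
    p_pair = count / n_bi
    p_w1   = unigrams[w1] / n_uni
    p_w2   = unigrams[w2] / n_uni
    # PMI > 0 means the pair co-occurs more often than independence predicts.
    [[w1, w2], Math.log2(p_pair / (p_w1 * p_w2))]
  end
  scored.sort_by { |_pair, pmi| -pmi }
end

text = "wishy washy talk is wishy washy and talk is cheap and talk is easy"
bigram_pmi(text.split).first(1).map(&:first)
# => [["wishy", "washy"]]
```

High-scoring pairs can then be joined into a single token (for instance `wishy_washy`) before building the vectors.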

brutuscat / helper.rb
Created August 31, 2012 17:23
helpers in console
>> helper.truncate("Testing", length: 4)
=> "T..."
>> helper.link_to "Home", app.root_path
=> "<a href=\"/\">Home</a>"
brutuscat / interpolation.rb
Created August 31, 2012 16:39
Fills in the missing consecutive numbers of a sorted integer array, a kind of linear interpolation. More background: http://stackoverflow.com/questions/4570755/calculate-missing-values-in-an-array-from-adjacent-values
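list = [2, 3, 4, 6, 7, 9, 12]
[].tap { |array|
  # Pad with zeros so every element is the middle of a triple.
  [0, *list, 0].each_cons(3) do |_prev, x, nxt|
    array << x
    # Fill the gap up to the next value; a negative count (last triple) runs zero times.
    (nxt - x).times { |i| array << x + i if i > 0 }
  end
} # => [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]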
brutuscat / gist:2974188
Created June 22, 2012 17:49
Facebook Graph api places geo-query for autocomplete
https://graph.facebook.com/search?q=coffee&type=place&center=-34.58,-58.427&distance=1000&fields=location,name,id,checkins
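
For an autocomplete backend, the same query string can be assembled from parts. This is a sketch only: `places_search_url` is a hypothetical helper, the parameter names are copied from the URL above, and nothing here checks that the endpoint still accepts them.

```ruby
require "uri"

# Build the place-search URL from a query, a centre point and a radius.
# Parameter names (q, type, center, distance, fields) come from the URL above.
def places_search_url(query:, lat:, lng:, distance:, fields:)
  params = {
    q: query,
    type: "place",
    center: "#{lat},#{lng}",
    distance: distance,
    fields: fields.join(",")
  }
  URI::HTTPS.build(
    host: "graph.facebook.com",
    path: "/search",
    query: URI.encode_www_form(params)
  ).to_s
end

url = places_search_url(
  query: "coffee", lat: -34.58, lng: -58.427,
  distance: 1000, fields: %w[location name id checkins]
)
```

`URI.encode_www_form` percent-encodes the commas, so the result is equivalent to the hand-written URL without being byte-identical to it.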
brutuscat / gist:2944180
Created June 17, 2012 10:46
Steps for POW to reload its config
sudo pow --install-system
sudo launchctl unload /Library/LaunchDaemons/cx.pow.firewall.plist
sudo launchctl load /Library/LaunchDaemons/cx.pow.firewall.plist
# Quit the pow process via Activity Monitor on Mac OS X (it will restart itself)
pow --print-config
# Your new config will be there and picked up by POW
May 31, 2012 3:30:22 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/home/entretenerse/src/apache-solr-3.6.0/example/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1098)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:84)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:101)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:171)
at org.apache.solr.update.DirectUpdateHandler2.delete(DirectUpdateHandler2.java:286)
at org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:68)
brutuscat / weak_assign_attributes.rb
Created May 8, 2012 15:42
A weak attribute-assignment method for ActiveRecord models that merges in a new value only when the current attribute is blank?
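def weak_assign_attributes(new_attributes)
  new_attributes = new_attributes.stringify_keys
  # Keep the current value when it is present; otherwise take the incoming one.
  # Hash#diff (an ActiveSupport core extension in Rails 3.x) then keeps only the entries that changed.
  new_attributes = self.attributes.merge(new_attributes) { |key, old_val, new_val| old_val.present? ? old_val : new_val }.
    diff(self.attributes)
  self.attributes = new_attributes
end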
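
The merge rule can be tried outside Rails on plain hashes. The sketch below is an approximation under stated assumptions: `blank_value?` is a hypothetical stand-in for ActiveSupport's `blank?` (it does not treat `false` as blank), and `weak_merge` returns the merged hash instead of assigning to a model.

```ruby
# Hypothetical stand-in for ActiveSupport's blank?: nil, whitespace-only
# strings and empty collections count as blank.
def blank_value?(value)
  return true if value.nil?
  return value.strip.empty? if value.is_a?(String)
  value.respond_to?(:empty?) && value.empty?
end

# Keep each current value unless it is blank; incoming keys are stringified
# so symbol and string keys refer to the same attribute.
def weak_merge(current, incoming)
  incoming = incoming.transform_keys(&:to_s)
  current.merge(incoming) { |_key, old_val, new_val| blank_value?(old_val) ? new_val : old_val }
end

weak_merge(
  { "name" => "", "email" => "a@b.c", "age" => nil },
  { name: "Mauro", email: "x@y.z", age: 30 }
)
# => {"name"=>"Mauro", "email"=>"a@b.c", "age"=>30}
```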
brutuscat / README
Last active October 26, 2022 06:22
Anonymous Rotating Proxies with Monit, Tor, Haproxy and Delegated. Idea by http://blog.databigbang.com/running-your-own-anonymous-rotating-proxies/
0 - Read http://blog.databigbang.com/running-your-own-anonymous-rotating-proxies/
1 - Install monit, haproxy, tor and delegated.
2 - Set up your environment in the setup.rb file
3 - Just run > ruby setup.rb
4 - ...........
5 - PROFIT! > http://www.southparkstudios.com/clips/151040/the-underpants-business
brutuscat / monit
Created April 25, 2012 15:31
Chkservd (cPanel) entry in /etc/chkserv.d/monit for monitoring monit
service[monit]=x,x,x,/etc/init.d/monit restart,/usr/bin/monit,root
brutuscat / deploy.rb
Created February 21, 2012 15:36
Simple Sunspot Solr Capistrano Tasks
# Simple Start and Stop tasks for your Solr server.
# 1 - Download your Solr from http://www.apache.org/dyn/closer.cgi/lucene/solr/ and extract it somewhere
# 2 - Set the :solr_path with the full path to the Solr "example" dir
#
# NOTES:
# - sunspot_solr gem must be in your Gemfile
# - It will run the "example" Jetty-embedded Solr server (start.jar)
# - It will automatically pick up your project's Solr config from "project_path/solr", with the Sunspot schema and files.