Jérémy Lecour jlecour

@jlecour
jlecour / gist:1139737
Created August 11, 2011 14:09
bi-grams comparison algorithm
def string_similarity(str1, str2)
  # Work on lowercase copies so the caller's strings are not mutated.
  str1 = str1.downcase
  pairs1 = (0..str1.length-2).collect { |i| str1[i,2] }.reject { |pair| pair.include? " " }
  str2 = str2.downcase
  pairs2 = (0..str2.length-2).collect { |i| str2[i,2] }.reject { |pair| pair.include? " " }
  union = pairs1.size + pairs2.size
  intersection = 0
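The preview above is cut off before the comparison itself. As a minimal sketch, assuming the intent is a Dice coefficient over letter pairs (twice the number of shared pairs divided by the total number of pairs), the idea could be completed as follows. The names `bigrams` and `bigram_similarity` are hypothetical and are not taken from the gist.

```ruby
# Hypothetical helper: letter pairs (bigrams) of a string, lowercased,
# skipping any pair that contains a space.
def bigrams(str)
  s = str.downcase # non-mutating copy
  return [] if s.length < 2
  (0..s.length - 2).map { |i| s[i, 2] }.reject { |pair| pair.include?(" ") }
end

# Assumed scoring: Dice coefficient, 2 * shared / (pairs1.size + pairs2.size).
def bigram_similarity(str1, str2)
  pairs1 = bigrams(str1)
  pairs2 = bigrams(str2)
  total = pairs1.size + pairs2.size
  return 0.0 if total.zero?

  shared = 0
  remaining = pairs2.dup
  pairs1.each do |pair|
    idx = remaining.index(pair)
    next unless idx
    shared += 1
    remaining.delete_at(idx) # a pair from str2 can be matched only once
  end
  (2.0 * shared) / total
end

puts bigram_similarity("night", "nacht")
```

Under this assumption, two strings with identical pairs score 1.0 and strings with no pair in common score 0.0.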
jlecour / gist:1270400
Created October 7, 2011 14:36
awk is a cave full of mysteries to me
Since plenty of voices have spoken up to help me with awk, here is my "problem".
I am trying to extract some information from the output of a command:
$ gem build project.gemspec
WARNING: no homepage specified
Successfully built RubyGem
Name: project
Version: 2.7.5
File: project-2.7.5.gem
#!/bin/sh
set -e
# git name-rev is fail
CURRENT=`git branch | grep '\*' | awk '{print $2}'`
if [ "master" = "${CURRENT}" ]
then
OUT='gem.out'
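For comparison, here is a minimal sketch of the same extraction written in Ruby rather than awk. It assumes the `gem build` output has the "Key: value" shape shown above; `parse_gem_build` is a hypothetical helper name, not part of the gist.

```ruby
# Hypothetical helper: collects the Name, Version and File fields
# from text shaped like the `gem build` output shown above.
def parse_gem_build(output)
  output.each_line.each_with_object({}) do |line, fields|
    match = line.match(/^\s*(Name|Version|File):\s*(\S+)/)
    fields[match[1].downcase.to_sym] = match[2] if match
  end
end

# Sample input copied from the command output above.
sample = <<~OUT
  WARNING: no homepage specified
  Successfully built RubyGem
  Name: project
  Version: 2.7.5
  File: project-2.7.5.gem
OUT

info = parse_gem_build(sample)
puts info[:file] # prints "project-2.7.5.gem"
```

Lines that do not start with one of the three expected keys, such as the warning, are ignored.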
jlecour / README.md
Created October 10, 2011 20:33
Identify paperclip attachment files that are not attached to any record

Let's say you have a model with files attached using Paperclip. You have a couple of million of those files and you're not sure that every one of them (and all its thumbnails) is still used by a database record.

You could use this rake task to recursively scan all the directories and check if the files need to be kept or destroyed.

In this example, the model is called Picture, the attachment is image and the path is partitioned like images/001/412/497/actual_file.jpg

The task walks down the directory tree. Each time a path ends with three triplets of digits ("001/412/497", for example), it looks for a record with the ID 1412497. If no such record exists, the whole directory is moved to a parallel images_deleted directory. At the end you can delete the files if you like, or move them to an archive location.

You can use the "dry run" mode to print which files would be removed.
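The path-to-ID step described above can be sketched in a few lines. This is an illustration only, not the rake task itself: `id_from_partitioned_path` and `orphaned_directories` are hypothetical names, and a plain `Set` of IDs stands in for the database lookup on the Picture model.

```ruby
require "set"

# Hypothetical helper: returns the record ID encoded in a path that ends
# with three triplets of digits, or nil if the path has another shape.
def id_from_partitioned_path(path)
  match = path.match(%r{(?:\A|/)(\d{3})/(\d{3})/(\d{3})/?\z})
  return nil unless match
  match.captures.join.to_i
end

# Hypothetical helper: keeps the directories whose ID is not in existing_ids.
# existing_ids is an assumption standing in for a lookup on the Picture model.
def orphaned_directories(paths, existing_ids)
  paths.select do |path|
    id = id_from_partitioned_path(path)
    id && !existing_ids.include?(id)
  end
end

puts id_from_partitioned_path("images/001/412/497") # prints 1412497
```

A path that stops short of the third triplet, or that points at a file inside the directory, yields nil and is skipped.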

jlecour / s3nukem.rb
Created January 10, 2012 14:21
s3nukem mirror
#!/usr/bin/env ruby
# s3nukem
#
# original by: Stephen Eley (sfeley@gmail.com)
# improved by: Robert LaThanh
# improved again by: Ben Hathaway
#
# A script to delete Amazon S3 buckets (or folders within buckets) with many objects (millions) quickly by
# using multiple threads to retrieve and delete the individual objects.
jlecour / gist:1770127
Created February 8, 2012 14:37
Stress test for Amatch
require 'faker'
require 'amatch'
include Amatch
puts "Start"
str_ref = Faker::Address.city
1_000_000.times do |i|
  str_comp = Faker::Address.city
  str_ref.pair_distance_similar(str_comp)
  str_ref.jarowinkler_similar(str_comp)
# Be sure to restart your server when you modify this file.
# Add new inflection rules using the following format
# (all these examples are active by default):
ActiveSupport::Inflector.inflections do |inflect|
  # inflect.plural /^(ox)$/i, '\1en'
  # inflect.singular /^(ox)en/i, '\1'
  # inflect.irregular 'person', 'people'
  # inflect.uncountable %w( fish sheep )
  inflect.uncountable %w(
namespace :log_resque do
  desc "Print out (every 2 seconds) the number of busy and total workers for Resque"
  task :working do
    # In your Rails app directory :
    # bundle exec rake log_resque:working --silent >> log/resque_working.log &
    interval = 2.0
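The rest of the task is not visible in this preview. As a rough sketch of what such a polling loop might look like, the code below takes the counts from a callable so it can run without Resque. `format_worker_line`, `log_worker_counts`, the `iterations` parameter and the line format are all assumptions, not the gist's code.

```ruby
# Hypothetical helper: one log line with a timestamp and the two counts.
def format_worker_line(busy, total, now = Time.now)
  "#{now.strftime('%Y-%m-%d %H:%M:%S')} busy=#{busy} total=#{total}"
end

# Hypothetical polling loop: counter is any callable returning [busy, total].
# iterations is an assumption added so the loop can be stopped; nil runs forever.
def log_worker_counts(counter, interval: 2.0, iterations: nil, out: $stdout)
  count = 0
  loop do
    busy, total = counter.call
    out.puts format_worker_line(busy, total)
    out.flush # keep the log file current when output is redirected
    count += 1
    break if iterations && count >= iterations
    sleep interval
  end
end

# Demo with fixed numbers instead of a live Resque installation.
log_worker_counts(-> { [3, 8] }, interval: 0, iterations: 2)
```

With Resque loaded, the callable could plausibly be `-> { [Resque.working.size, Resque.workers.size] }`.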
jlecour / explicit_env
Created March 7, 2012 16:44
Monit config for Resque workers.
check process resque_api_0
  with pidfile /home/deploy/apps/api/current/tmp/pids/resque_worker_0.pid
  start program = "/bin/sh -c 'cd /home/deploy/apps/api/current; GEM_HOME=/home/deploy/.gem/ruby/1.8 GEM_PATH=/home/deploy/.gem/ruby/1.8 PATH=$PATH:/home/deploy/.gem/ruby/1.8/bin:./bin nohup bundle exec rake environment resque:work RAILS_ENV=production QUEUE=* PIDFILE=tmp/pids/resque_worker_0.pid INTERVAL=2 >> log/resque_worker_0.log'" as uid deploy and gid deploy
  stop program = "/bin/sh -c 'cd /home/deploy/apps/api/current && kill -s QUIT `cat tmp/pids/resque_worker_0.pid` && rm -f tmp/pids/resque_worker_0.pid; exit 0;'"
  if totalmem is greater than 350 MB for 10 cycles then restart # eating up memory?
  GROUP resque_api
jlecour / _no_commit_on_master
Created April 3, 2012 10:10
Any (executable) "pre-commit" script in .git/hooks is always executed before committing, so if it returns a non-zero value, it halts the commit. I call some external scripts from there, like this one, which stops a commit made directly on master.
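#!/bin/sh
# Use the POSIX [ ... ] test: [[ ... ]] is a bash extension and fails under a plain /bin/sh.
if [ "$(git symbolic-ref HEAD 2>/dev/null)" = "refs/heads/master" ]
then
  echo "You cannot commit in master!"
  exit 1
fi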