@rtanglao
Created March 2, 2011 17:31
How to find spam images in tiki wiki before upgrading to tiki
  • cd to the directory with the articles from :gozer bug 634657
  • assuming none of the images referenced in the articles are spam, find all the unique images in the articles:
    • egrep -h -o '{img src="img/wiki_up/[^-]*-[^-]*-[^-]*-[^\.]*\.(jpeg|jpg|png|PNG|JPG)[^&]*"[a-zA-Z0-9 &;=}]*' * | sort | uniq > unique_sumomo_images_1march_2011.txt [file0]
    • egrep -o '[0-9a-f]{32}-[0-9]{10}-[0-9]{1,3}-[0-9]\.(png|PNG|jpg|JPG|jpeg|JPEG|gif|GIF)' unique_sumomo_images_1march_2011.txt > just.unique.image.filenames.1march2011.txt [file1]
  • open the images listed in [file1] (the image files come from :sancus bug 634667) and check whether any are spam; if so, remove them from [file1]
  • find the spam images by removing the known good images listed in [file1] from all_images.txt:
    • var=`cat /Users/rolandtanglao/Documents/MOZILLA_MESSAGING/KITSUNE/TIKI_WIKI_ARTICLES/sumo/just.unique.image.filenames.1march2011.txt`
    • grep -v "$var" all_images.txt > all_spam_images.txt [file2]
    • cat all_spam_images.txt | xargs -n 1 open (OS X only; opens each image in Preview.app for manual inspection)
  • manually inspect the images, then remove any non-spam images from [file2]
  • port the non-spam images to kitsune with the filenames:
    • var=`cat all_spam_with_good_images_taken_out1Mar2011.txt`
    • grep -v "$var" all_images.txt > all_non_spam_images.txt
    • open all the images, confirm they are not spam, and update the file manually:
      • cat all_non_spam_images.txt | xargs -n 1 open
      • output file is: non_spam_images.1march2011.txt
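
The extract-then-subtract steps above can be sketched end to end on made-up data. This is a minimal sketch, not the exact commands used on the real wiki: the article text, the three image filenames, and the file names known_good.txt and spam_candidates.txt are hypothetical, and all_images.txt is assumed to hold one image filename per line. It uses grep -E (equivalent to egrep) and reads the known good list with -F -f instead of a shell variable, so each filename is matched as a literal string rather than as a regex.

```shell
#!/bin/sh
# Sketch only: all sample data below is invented for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical article containing two tiki-style image references.
cat > article1.txt <<'EOF'
Intro {img src="img/wiki_up/0123456789abcdef0123456789abcdef-1234567890-12-1.png"} text
More {img src="img/wiki_up/fedcba9876543210fedcba9876543210-1234567891-3-2.jpg"} text
EOF

# Hypothetical listing of every uploaded image, one filename per line.
cat > all_images.txt <<'EOF'
0123456789abcdef0123456789abcdef-1234567890-12-1.png
fedcba9876543210fedcba9876543210-1234567891-3-2.jpg
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-1234567892-7-1.gif
EOF

# Step 1: pull the unique image filenames out of the articles.
grep -E -h -o '[0-9a-f]{32}-[0-9]{10}-[0-9]{1,3}-[0-9]\.(png|PNG|jpg|JPG|jpeg|JPEG|gif|GIF)' article*.txt \
  | sort -u > known_good.txt

# Step 2: anything not referenced by an article is a spam candidate.
# -F treats each pattern line as a literal string,
# -f reads the patterns from a file, -v keeps the non-matching lines.
grep -v -F -f known_good.txt all_images.txt > spam_candidates.txt

cat spam_candidates.txt
```

On this sample data only the .gif filename ends up in spam_candidates.txt, because it is the one image no article references. The candidates still need the manual inspection described above before anything is treated as spam.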