- cd to the directory with the articles from :gozer bug 634657
- assuming none of the images in the articles are spam, find all the unique images referenced in the articles:
grep -E -h -o '\{img src="img/wiki_up/[^-]*-[^-]*-[^-]*-[^\.]*\.(jpeg|jpg|png|PNG|JPG)[^&]*"[a-zA-Z0-9 &;=}]*' * | sort -u > unique_sumomo_images_1march_2011.txt
(output: [file0])
grep -E -o '[0-9a-f]{32}-[0-9]{10}-[0-9]{1,3}-[0-9]\.(png|PNG|jpg|JPG|jpeg|JPEG|gif|GIF)' unique_sumomo_images_1march_2011.txt | sort -u > just.unique.image.filenames.1march2011.txt
(output: [file1]; the second sort -u is needed because two different tags can reference the same file)
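A self-contained sketch of the two extraction steps above, collapsed into a single grep: it writes two made-up articles into a temporary directory and pulls out the unique hash-style filenames. The article text and filenames are invented; the pattern is the filename regex from the second command, with the dot before the extension escaped so it only matches a literal ".".

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Two hypothetical articles (invented sample data); the first image appears in both.
printf '%s\n' '{img src="img/wiki_up/0123456789abcdef0123456789abcdef-1298937600-12-0.png"}' > article1
printf '%s\n' '{img src="img/wiki_up/0123456789abcdef0123456789abcdef-1298937600-12-0.png"} {img src="img/wiki_up/fedcba9876543210fedcba9876543210-1298937601-7-1.jpg"}' > article2

# 32 hex digits, a 10-digit timestamp, two small numbers, a literal "."
# and an image extension; -h drops filename prefixes, sort -u drops the duplicate.
grep -E -h -o '[0-9a-f]{32}-[0-9]{10}-[0-9]{1,3}-[0-9]\.(png|PNG|jpg|JPG|jpeg|JPEG|gif|GIF)' article1 article2 \
  | sort -u > filenames.txt
cat filenames.txt
```

The shared image is listed once, so the output has two lines.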
- open the images from :sancus bug 634667 that are listed in [file1] and check whether any are spam; if so, remove them from [file1]
- find the spam images by removing the known-good images in [file1] from the full image list, all_images.txt:
grep -v -F -f /Users/rolandtanglao/Documents/MOZILLA_MESSAGING/KITSUNE/TIKI_WIKI_ARTICLES/sumo/just.unique.image.filenames.1march2011.txt all_images.txt > all_spam_images.txt
(-F matches each filename as a fixed string, so "." is not a wildcard; -f reads the filenames straight from the file instead of from a shell variable)
(output: [file2])
xargs -n 1 open < all_spam_images.txt
(OS X only: open launches each image in Preview.app for manual inspection)
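If all_images.txt and [file1] both hold one bare filename per line (an assumption; these notes do not show the format of all_images.txt), comm gives an exact, whole-line set difference instead of grep's substring match. A sketch with made-up filenames, where known_good.txt is a hypothetical stand-in for [file1]:

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Invented sample lists, one bare filename per line.
# known_good.txt is a hypothetical stand-in for [file1].
printf '%s\n' a-1.png b-2.jpg c-3.gif > all_images.txt
printf '%s\n' b-2.jpg > known_good.txt

# comm needs both inputs sorted the same way; -23 keeps only lines
# unique to the first file, i.e. images not known to be good.
sort -u all_images.txt > all.sorted
sort -u known_good.txt > good.sorted
comm -23 all.sorted good.sorted > all_spam_images.txt
cat all_spam_images.txt
```

Because comm compares whole lines, a short filename can never knock out a longer one that merely contains it, which grep -v -F can do.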
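- manually inspect the images, then remove the non-spam images from [file2]
- to port the non-spam images to Kitsune with their filenames, remove the remaining spam images (the edited [file2], saved as all_spam_with_good_images_taken_out1Mar2011.txt) from all_images.txt: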
grep -v -F -f all_spam_with_good_images_taken_out1Mar2011.txt all_images.txt > all_non_spam_images.txt
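Before porting, it may be worth checking that the spam and non-spam lists split all_images.txt cleanly: no filename in both lists, and nothing missing. A sketch on invented sample lists that reuse the filenames from these notes:

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Invented stand-ins for the real lists.
printf '%s\n' a-1.png b-2.jpg c-3.gif > all_images.txt
printf '%s\n' c-3.gif > all_spam_with_good_images_taken_out1Mar2011.txt
grep -v -F -f all_spam_with_good_images_taken_out1Mar2011.txt all_images.txt > all_non_spam_images.txt

sort -u all_images.txt > all.sorted
sort -u all_spam_with_good_images_taken_out1Mar2011.txt > spam.sorted
sort -u all_non_spam_images.txt > good.sorted

# 1. No filename may appear in both lists (grep -c exits 1 on zero, hence || true).
overlap=$(comm -12 spam.sorted good.sorted | grep -c . || true)
# 2. Together the two lists must reproduce all_images.txt exactly.
if sort -u spam.sorted good.sorted | cmp -s - all.sorted; then covered=yes; else covered=no; fi
echo "overlap=$overlap covered=$covered"
```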
- open all of these images to make sure none are spam, and remove any that are from the file by hand:
xargs -n 1 open < all_non_spam_images.txt
- output file is: non_spam_images.1march2011.txt
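These notes don't say how the files were imported into Kitsune. As a hypothetical first step, the sketch below copies each file listed in the output file from the Tiki upload directory into a staging directory, keeping the original filenames. Using img/wiki_up/ as the source (taken from the image tags above) and kitsune_staging/ as the destination are assumptions, and the sample files are invented.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical layout: Tiki uploads under img/wiki_up/, and a staging
# directory for the Kitsune import (both names are assumptions).
mkdir -p img/wiki_up kitsune_staging
for f in a-1.png b-2.jpg c-3.gif; do printf 'x' > "img/wiki_up/$f"; done
printf '%s\n' a-1.png b-2.jpg > non_spam_images.1march2011.txt

# Copy every listed image, keeping its filename; report any that are missing.
while IFS= read -r name; do
  [ -n "$name" ] || continue
  if [ -f "img/wiki_up/$name" ]; then
    cp "img/wiki_up/$name" kitsune_staging/
  else
    echo "missing: $name" >&2
  fi
done < non_spam_images.1march2011.txt
ls kitsune_staging
```

Only the two listed images end up in kitsune_staging/; the unlisted one is left behind.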
Created March 2, 2011 17:31 (gist rtanglao/851325)
How to find spam images in tiki wiki before upgrading to tiki