Upgrade script from ReAD commit f5bd3f4 to ad87e7b. See
<?php
require_once "deps/meekrodb.2.3.class.php";
require_once "TextExtractor.class.php";
$allArticles = DB::query("SELECT * FROM `read` ORDER BY `time_added` ASC");
$N = count($allArticles);
echo "\tid\twordcount\n";
echo "-------------------------\n";
foreach ($allArticles as $n => $article) {
    $id = $article["id"];

    // get source (use a placeholder instead of interpolating $id into the SQL)
    $source = DB::queryFirstField("SELECT `source` FROM `read_sources` WHERE `id` = %s", $id);

    // extract text and compute word count
    $text = TextExtractor::extractText($source);
    $wordcount = TextExtractor::countWords($text);

    // print progress as a percentage with two decimal places
    echo round(100 * $n / $N, 2) . "%\t";
    echo "$id\t";
    echo "$wordcount\n";

    // save to database
    DB::query("INSERT INTO `read_texts` ( `id`, `text` ) VALUES (%s, %s)", $id, $text);
    DB::query("UPDATE `read` SET `wordcount` = %i WHERE `id` = %i", $wordcount, $id);
}