Skip to content

Instantly share code, notes, and snippets.

@tosiara
Last active June 15, 2021 15:00
Show Gist options
  • Save tosiara/a0c4ebfb2e0707fc1920cf204f6cde0f to your computer and use it in GitHub Desktop.
Save tosiara/a0c4ebfb2e0707fc1920cf204f6cde0f to your computer and use it in GitHub Desktop.
Hashtopolis hack script to workaround DB resource limitation
<?php
/*
This script is a temporary hack to overcome Hashtopolis DB resource limitation.
Original thread: https://github.com/hashtopolis/server/issues/644
Script location: https://gist.github.com/tosiara/a0c4ebfb2e0707fc1920cf204f6cde0f
tl;dr: when a task has too many chunks in the table, apache will fetch them all
and process in a loop in order to calculate task progress - this will cause
out of memory at some point, depending on the server's RAM and CPU.
This script will query the DB, find a continuous chunk sequence (chain) and merge
them into one big (lengthy) chunk.
*/
require_once("inc/conf.php");
$conn = new mysqli($CONN["server"], $CONN["user"], $CONN["pass"], $CONN["db"]);
if ($conn->connect_error)
die ("Could not connect to DB: " . $conn->connect_error . "\n");
if (!isset($argv[1]))
die("Usage: php " . $argv[0] . " taskId\n");
// taskId
$id = intval($argv[1]);
//state 4 means the chunk has been processed
$stmt = $conn->prepare("SELECT * FROM Chunk WHERE taskId = ? AND state = 4 ORDER BY skip");
if ($stmt == FALSE)
die ("Could not prepare statement. Bad SQL?\n");
$stmt->bind_param("i", $id);
$stmt->execute();
$result = $stmt->get_result();
if ($result == NULL)
die("Could not query, result is NULL\n");
if ($result->num_rows < 1)
{
die ("Unexpected empty result set. Bad task id?\n");
}
// we start from the first chunk, which has "skip" (or offset) 0
$chain = 0;
$count = 0;
while ($row = $result->fetch_assoc())
{
$chunkId = $row["chunkId"];
$skip = $row["skip"];
$length = $row["length"];
$progress = $row["progress"];
if ($progress != 10000)
{
echo ($chunkId . "\n");
echo ("WARN: the chunk is done, but the progress is not 100% (" . $progress . ")\n");
}
if ($length < 1)
{
echo ($chunkId . "\n");
echo ("WARN: the length is invalid (" . $length . ")\n");
}
// looks like we have got a chunk that is out of continuous sequence, stop here
if ($skip != $chain)
break;
// all good, calc this chunk and move to the next
$chain = $skip + $length;
$count++;
}
// we did not find any chunk expect the first one, nothing to do
if ($chain == 0)
die("Could not find a continuous batch of chunks to optimize\n");
echo ("Checked ". $count . " records. Found last chunk offset: " . $chain . "\n");
echo ("DELETE FROM Chunk WHERE taskId = " . $id . " AND state = 4 AND skip < " . $chain . " AND skip <> 0\n");
echo ("UPDATE Chunk SET length = " . $chain . ", checkpoint = " . $chain . " WHERE taskId = " . $id . " AND state = 4 AND skip = 0\n");
$result->close();
$stmt->close();
$conn->close();
?>
@tosiara
Copy link
Author

tosiara commented Jun 14, 2021

Sample output:

Checked 295950 records. Found last chunk offset: 5545364397099
DELETE FROM Chunk WHERE taskId = 20 AND state = 4 AND skip < 5545364397099 AND skip <> 0
UPDATE Chunk SET length = 5545364397099, checkpoint = 5545364397099 WHERE taskId = 20 AND state = 4 AND skip = 0

This would drop close to 300k records from the database.

@tosiara
Copy link
Author

tosiara commented Jun 14, 2021

Hashtopolis won't allow to merge chunks that are assigned to cracked hashes.
To do so, we need first to migrate hashes to the chunk with skip=0:

mysql> select chunkId FROM Chunk WHERE taskId = 20 AND state = 4 AND skip = 0;
+---------+
| chunkId |
+---------+
|  117863 |
+---------+
1 row in set (0.54 sec)

mysql> UPDATE Hash set chunkId = 117863 where chunkId in (select chunkId FROM Chunk WHERE taskId = 20 AND state = 4 AND skip < 5545364397099 AND skip <> 0);
Query OK, 1317 rows affected (0.46 sec)
Rows matched: 1317  Changed: 1317  Warnings: 0

After that you will be able to merge chunks:

mysql> DELETE FROM Chunk WHERE taskId = 20 AND state = 4 AND skip < 5545364397099 AND skip <> 0;
Query OK, 295950 rows affected (29.54 sec)

mysql> UPDATE Chunk SET length = 5545364397099 WHERE taskId = 20 AND state = 4 AND skip = 0;
Query OK, 1 row affected (0.04 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment