Let's say I've downloaded a big file using a torrent. Then I add a very small file, like a subtitle, and create a new torrent file.
Now the two torrent files are completely different files as far as the machine is concerned. The tracker and torrent clients treat them as different torrents. Of course, we don't need to duplicate the original data file to seed both. But the seeders and leechers are split between the two torrent files. They don't know that they have exactly the same file. Torrent clients and trackers cannot connect people who hold identical data. We end up with a split share pool for the exact same file, which is inefficient: more seeders means more speed.
Let's say the original torrent file is 1.torrent:

[ file1 ]
Now I add a file and make a new torrent file, 2.torrent, that looks like:

[
  [ file1 ]  => This is 1.torrent.
  + file2
]            => This is 2.torrent.
Another person gets 2.torrent and thinks, "Hey, maybe I'll create a new torrent file based on 2.torrent." So we get 3.torrent:

[
  [
    [ file1 ]  => This is 1.torrent.
    + file2
  ]            => This is 2.torrent.
  + file3
]              => This is 3.torrent.
So if you get 3.torrent, you are in the same share pool as the 1.torrent and 2.torrent people.
What if there is a 4.torrent, a 5.torrent, or more in the near future?
We could query a torrent search engine, or the DHT or PEX: "Please give me the list of torrents based on 3.torrent."
If there is a new, interesting torrent, we can upgrade 3.torrent -> X.torrent. No interaction with our local files is needed; only the added files will be downloaded.
If you know source code management tools like git, this idea is basically 'a git repository in one torrent file':
git init
make 1.torrent
git commit
make 2.torrent
git commit
...
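As a sketch of that analogy: each new torrent could embed the infohash of the torrent it was built from, the way a git commit records its parent commit. The `parent` key is a hypothetical extension, and `repr()` stands in for real bencoding to keep this short:

```python
import hashlib

def infohash(info):
    """Hash a torrent's info dict. Real clients bencode the dict first;
    repr() of the sorted items is a deterministic stand-in here."""
    return hashlib.sha1(repr(sorted(info.items())).encode()).hexdigest()

def make_torrent(files, parent=None):
    """Build a torrent whose info dict records its parent torrent,
    like a git commit recording its parent commit. The 'parent' key
    is an assumption, not part of the BitTorrent spec."""
    info = {"files": tuple(files)}
    if parent is not None:
        info["parent"] = infohash(parent["info"])
    return {"info": info}

t1 = make_torrent(["file1"])                                # git init   -> 1.torrent
t2 = make_torrent(["file1", "file2"], parent=t1)            # git commit -> 2.torrent
t3 = make_torrent(["file1", "file2", "file3"], parent=t2)   # git commit -> 3.torrent

# Walking the 'parent' chain from 3.torrent leads back to 1.torrent's swarm.
print(t3["info"]["parent"] == infohash(t2["info"]))
```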
- A torrent file can contain another torrent file.
- We can keep the seeder/leecher pool as big as possible. Don't split us up if we have exactly the same contents.
- If there are other torrents based on a particular torrent, we can discover them.
Those are the key points.
How could this idea become real? Is it possible?
"How can a BT client find an earlier version of this torrent?" is probably not the question you're actually trying to solve. "How can a BT client discover more sources for the data represented by this torrent, given that a subset of that data is also present in other torrents?" is probably the problem you're trying to solve.
Torrent file modifications aren't a branchless phenomenon... A may give rise to B, but then one person might modify B to get C while someone else modifies B to get D, and a third person modifies A to get E. All of these could benefit by knowing about each other.
Advertising a cryptographic Merkle tree root (or other cryptographic hash, though Merkle trees have several advantages) for each file in the DHT would allow the downloaders of these files to find seeders or peers from other torrent swarms, if the Merkle tree roots are added to the per-file descriptions in the "files" section of the torrent.
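For example, the per-file entries might look like this. The "merkle root" key is hypothetical (no such key is standardized in the classic .torrent format), and a plain SHA-256 of the whole file stands in for a real Merkle root to keep the sketch short:

```python
import hashlib

def file_entry(path, data):
    """Sketch of one entry in a multi-file torrent's "files" section,
    extended with a hypothetical per-file "merkle root" key."""
    return {
        "path": path.split("/"),
        "length": len(data),
        # Stand-in for a real Merkle root: a plain SHA-256 of the file.
        "merkle root": hashlib.sha256(data).hexdigest(),
    }

files = [file_entry("file1", b"original big file"),
         file_entry("file2", b"subtitle")]

# Announcing each root on the DHT would let a client find peers from
# any other swarm carrying the same file, whatever torrent it sits in.
for f in files:
    print("/".join(f["path"]), f["length"], f["merkle root"])
```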
Normal SHA-256, SHA3-256, or SHA-1 could be used, but the advantage of Merkle trees is that the 64-MB-granularity row of the tree (or any granularity that is a power-of-two number of kilobytes) can be requested from a peer and cryptographically verified without trusting that peer at all. This also allows backwards-compatible tweaking of the granularity without changing the torrent file and, more importantly, without changing the Merkle tree root.
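A minimal sketch of that verification property: any complete row of the tree hashes back up to the same root, so a coarse row received from an untrusted peer can be checked directly against the root from the torrent. The padding convention for odd rows is an arbitrary choice here; a real format would have to fix one:

```python
import hashlib

def H(b):
    return hashlib.sha256(b).digest()

def merkle_rows(leaves):
    """Build every row of a Merkle tree, leaves first, root last.
    An odd node is paired with a zero hash (one of several possible
    padding conventions)."""
    rows = [list(leaves)]
    while len(rows[-1]) > 1:
        row = rows[-1]
        if len(row) % 2:
            row = row + [bytes(32)]
        rows.append([H(row[i] + row[i + 1]) for i in range(0, len(row), 2)])
    return rows

# Leaves at some fine granularity (64 KB or 1 MB in reality; tiny here).
data_blocks = [b"block%d" % i for i in range(8)]
rows = merkle_rows([H(b) for b in data_blocks])
root = rows[-1][0]

# A peer can hand us a coarser row -- rows[2] here, each node covering
# 4 leaves -- and we verify it against the root without trusting them:
claimed_row = rows[2]
print(merkle_rows(claimed_row)[-1][0] == root)
```

Because verification only needs the root, the granularity of the row exchanged can change later without touching the torrent file, which is the backwards-compatibility point made above.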
Using Merkle trees would also open the door to easily and verifiably advertising the availability of sub-ranges of files on the DHT, say at 64 MB granularity. Disk corruption and network corruption happen, and for large, rare files there may well be another copy out there that has only a few bits corrupted and still holds usable data for other downloaders... or a copy where one person changed some header metadata (maybe correcting the date of an MLK speech or something) while the rest of the data remains identical and identically aligned. If the 64 MB granularity row of the Merkle tree for that file is advertised via DHT and present in the torrent, then these "large and nearly identical" files can also be used as sources of data.

On a side note, anyone creating a file format should place user-editable metadata at the end of the file whenever possible, and the same goes for creators of metadata-editing tools. That way, metadata differences don't affect the alignment of the data and won't prevent partial cross-sharing of files that differ only in their user-edited metadata. Yes, one can deal with the misalignment issue by using a rolling hash and breaking the file into blocks of bounded-but-irregular lengths, but that is much more complicated than just putting the user-editable metadata at the end of the file.
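That rolling-hash alternative can be sketched as follows. This is a toy content-defined chunker: the byte-wise rolling sum stands in for the stronger rolling hashes (Rabin, buzhash) real tools use, and the window size and mask are arbitrary illustrative choices. Because a cut point depends only on the last WINDOW bytes, prepending metadata only disturbs the chunks near the start; everything after the first surviving cut realigns:

```python
import random

WINDOW = 16
MASK = 0x3F  # cut where the low 6 bits are zero -> ~64-byte average chunks

def chunks(data):
    """Split data at content-defined boundaries: cut whenever the
    rolling sum of the last WINDOW bytes matches the mask."""
    out, start, rolling = [], 0, 0
    for i, b in enumerate(data):
        rolling += b
        if i >= WINDOW:
            rolling -= data[i - WINDOW]
        if i >= WINDOW - 1 and (rolling & MASK) == 0:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out

random.seed(1)
payload = bytes(random.randrange(256) for _ in range(4096))
shifted = b"METADATA" + payload  # same payload, misaligned by 8 bytes

# Most chunks survive the prefix insertion, so most data stays shareable:
shared = set(chunks(payload)) & set(chunks(shifted))
print(f"{len(shared)} of {len(set(chunks(payload)))} chunks still shared")
```

Compare this machinery with simply appending metadata at the end of the file, which keeps the data aligned with zero extra complexity.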