Let's say I've downloaded a big file using a torrent. Then I add a very small file, such as a subtitle, and create a new torrent file.
Now the two torrent files are completely different files to the machine. The tracker and torrent client will treat them as different torrents. Of course, we don't need to duplicate the original data file to seed both. But the seeders and leechers are split between the two torrent files. They don't know that they hold exactly the same file. The torrent client and tracker cannot connect people who share exactly the same data. We end up with a split share pool for identical content, which is inefficient: more seeders means more speed.
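To make the "completely different to the machine" point concrete: a torrent's identity is the SHA-1 hash of its bencoded info dictionary (per BEP 3), so adding even a tiny subtitle file yields a brand-new infohash. A minimal sketch, using a hand-rolled bencoder and placeholder piece hashes:

```python
# Why the tracker sees 1.torrent and 2.torrent as unrelated: the swarm
# identity is SHA-1 of the bencoded "info" dict (BEP 3). The file lists
# and piece hashes below are placeholders, not real torrent data.
import hashlib

def bencode(obj):
    """Minimal bencoder for ints, bytes, lists and dicts."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        items = sorted(obj.items())  # keys must be sorted per the spec
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(obj)

def infohash(info):
    return hashlib.sha1(bencode(info)).hexdigest()

info_1 = {b"name": b"release", b"piece length": 262144,
          b"pieces": b"\x00" * 20,  # placeholder piece hashes
          b"files": [{b"path": [b"file1"], b"length": 1000}]}

# Same content plus one tiny subtitle file.
info_2 = dict(info_1)
info_2[b"files"] = info_1[b"files"] + [{b"path": [b"file2.srt"], b"length": 10}]

print(infohash(info_1) == infohash(info_2))  # False: two separate swarms
```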
Let's say the original torrent file is 1.torrent:
[ file1 ]
Now I add some file and make a new torrent file, 2.torrent, that looks like:
[
[ file1 ] => This is 1.torrent.
+ file2
] => This is 2.torrent
Another person receives 2.torrent and thinks, "Hey, maybe I'll create a new torrent file based on 2.torrent." So we get 3.torrent:
[
[
[ file1 ] => This is 1.torrent.
+ file2
] => This is 2.torrent
+ file3
] => This is 3.torrent
So if you get 3.torrent, you are in the same share pool as the 1.torrent and 2.torrent people.
What if there are 4.torrent, 5.torrent, and so on in the near future? We could query a torrent search engine, or the DHT or PEX: "Please give me the list of torrents based on 3.torrent."
If an interesting new torrent appears, we can upgrade from 3.torrent to X.torrent. We don't need to touch the local files at all; only the newly added files will be downloaded.
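The "only added files will be downloaded" step can be sketched as a simple diff of the two file lists. The paths and sizes here are hypothetical, and a real client would also verify piece hashes (or use BEP 38-style hints) before trusting local data:

```python
# Sketch: upgrading 3.torrent -> X.torrent should fetch only new files.
# Each entry maps a file path tuple to its length; values are made up.
old_files = {("file1",): 1000, ("file2.srt",): 10, ("file3",): 500}
new_files = {("file1",): 1000, ("file2.srt",): 10, ("file3",): 500,
             ("file4",): 2048}

# Anything absent from the old torrent (or changed in size) is new.
to_download = {path: size for path, size in new_files.items()
               if old_files.get(path) != size}
print(to_download)  # {('file4',): 2048}
```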
If you know source code management tools like git, this idea is basically 'a git repository in one torrent file':
git init
make 1.torrent
git commit
make 2.torrent
git commit
...
- A torrent file can contain another torrent file.
- We can keep the seeder/leecher pool as big as possible. Don't split us up if we have exactly the same contents.
- If there are other torrents based on a particular torrent, we can discover them.
Those are the key points.
How can this idea become real? Is it possible?
It's worth noting that many of your stated goals can be achieved using existing BitTorrent extensions - to wit, BEP0038 provides for finding already-downloaded data so that it only needs to be rehashed.
That on its own doesn't provide for combined pools of peers, but in combination with BEP0039 you can ensure that, for any regularly updated content (think, for instance, of a torrent containing all the hotfixes for a piece of software), peers on the n-th torrent will automatically connect to the (n+1)-th torrent.
I believe that using these approaches in concert can make for a much improved torrent ecosystem, particularly for periodic content. It also accounts for security through BEP0039's application of BEP0035. As such, I would love to see effort go into implementing these mechanisms in various open source clients, rather than duplicating the work.
That said, there is a certain amount of value in bridging the distinct swarms, and to that end I would suggest a much, much simpler approach.
First, forget about updating the torrents; there's already a standard for that. Second, don't use git or anything anywhere near as complicated; the beauty of the .torrent is that it's a simple, well-defined and extensible structure with no external dependencies. Finally, don't weigh the torrent down by including the full .torrent - your updated .torrent already has the hash of the old files, by necessity.
To that end, I would suggest a structure as follows:
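Such a structure might look like this, shown as a Python dictionary for readability (a real .torrent would be bencoded, and all hashes and URIs below are placeholders):

```python
# Hypothetical example of the proposed format. Everything is standard
# BEP 3 metadata except the top-level "replaces" key.
proposed_torrent = {
    "announce": "http://tracker.example.com/announce",
    "info": {
        "name": "release",
        "piece length": 262144,
        "pieces": "<concatenated SHA-1 piece hashes>",
        "files": [
            {"path": ["file1"], "length": 1000},
            {"path": ["file2.srt"], "length": 10},
        ],
    },
    # The single addition: the info hash of each superseded torrent,
    # mapped to a (possibly empty) list of URIs to acquire it from.
    "replaces": {
        "aa13c574a6e1e4fc347a4f23c46b1fa9e1d87a1f": [
            "magnet:?xt=urn:btih:aa13c574a6e1e4fc347a4f23c46b1fa9e1d87a1f",
        ],
        "<infohash of another old torrent>": [],
    },
}
```

One design note on this sketch: keeping "replaces" at the top level, outside the info dictionary, means it does not affect the new torrent's own info hash and can itself be edited in later revisions.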
As you can see, this format differs from accepted standards in only one way - the addition of the "replaces" key.
The value of this "replaces" key would be a dictionary of torrents which the current torrent supersedes - each key of the dictionary is the info hash of an old torrent, and its value is a list (optionally empty) of URIs from which to acquire that torrent.
The torrent client would use this key in two ways - first, if it loads a torrent which replaces a torrent already loaded in the client, that would be stopped/deleted in favour of the new torrent. This would help to stop the splitting of resources, unnecessary announces (to each previous swarm), and could significantly reduce load on the tracker. Second, if the torrent client is unable to reach seeds (or is simply searching for more), it knows that it can use data from the replaced torrents, if any seeds are still active on them.
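Those two behaviours could be sketched like this (the class and method names are illustrative, not any real client's API):

```python
# Sketch of the two uses of the hypothetical "replaces" key:
# retiring superseded torrents and falling back to their swarms.
class TinyClient:
    def __init__(self):
        self.loaded = {}  # infohash -> torrent metadata dict

    def add(self, infohash, torrent):
        # 1. Retire any torrent this one supersedes, stopping the
        #    unnecessary announces to the old swarm.
        for old_hash in torrent.get("replaces", {}):
            self.loaded.pop(old_hash, None)
        self.loaded[infohash] = torrent

    def fallback_swarms(self, infohash):
        # 2. Old swarms that may still hold (most of) the same data,
        #    usable when the current swarm is short of seeds.
        return list(self.loaded[infohash].get("replaces", {}))

client = TinyClient()
client.add("hash1", {})
client.add("hash2", {"replaces": {"hash1": []}})
print(sorted(client.loaded))            # ['hash2'] - old torrent retired
print(client.fallback_swarms("hash2"))  # ['hash1']
```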
By simply providing URIs to the replaced torrents, instead of embedding the entire torrent, we achieve several purposes - we stop the actual torrent file from growing exponentially (or even linearly) as the chain grows longer, we enable the use of alternate protocols such as magnet links (thereby providing support for any future addressing protocols), and we allow older torrents to be updated to include new trackers, etc. Any key outside of the info hash on the older torrents can be updated and still be used by the current torrent.
There are still some flaws with this method, but in the end it comes down to these pros and cons in my view:
Pros:
Cons:
This is a problem that I've been considering for some time and am intending to lobby the various open source clients to improve their support for the mentioned BEPs, as I see it as being important to the ecosystem - I'm glad that other people see the problem and want to fix it too!
EDIT: Added another con, updated the first con.