The pages on this website document the process of implementing a general-purpose, web-based DHT search engine by using and extending existing open-source tools rather than reinventing the wheel. The search engine must be flexible enough to meet the following four basic criteria:
1. The search engine must be able to access established DHT networks to respond quickly to a web search request without having to search through any particular website. A distributed hash table (DHT) is a class of decentralized distributed system that provides a lookup service similar to a hash table, using (key, value) pairs. Participants in a BitTorrent DHT network use a unique 20-byte key to identify and retrieve the information stored in the associated value, which locates the desired content. A DHT forms the basic infrastructure on which more complex services, such as peer-to-peer file sharing and content distribution systems, are built.
Many large BitTorrent DHT networks, such as the Vuze DHT and the Mainline DHT, are based on the popular Kademlia DHT design. At any moment there can be as many as 50 million users participating in a particular network and sharing digital content. It is therefore possible to build a search engine that simply looks at what is currently being shared on the DHT and reports a query result without having to search through any particular website. The search result can vary between consecutive searches with the same query keywords, because the search engine simply reports what is currently available on the DHT networks without actually storing any content.
As designed, the search engine is not a tracker, because it does not share content and does not participate in, or even coordinate with, any BitTorrent swarm. It is not a BitTorrent index either, because it does not store or maintain a static list of torrents. It is simply a reporting tool and has no way of knowing whether the description of a torrent means that the digital content is copyrighted or illegal.
2. The search engine must be able to suggest content related to the current search term. This rests on a statistical assumption: if users A and B are sharing content matched by your search term and are also sharing “other contents”, then the user who issued the search term might be interested in those “other contents” as well. The “other contents” may help visitors discover new and useful content without extensive searching or lookup.
3. The search engine must be able to identify and monitor the sharing activity of a particular peer and the number of peers downloading a particular piece of content.
4. The search engine can interface with a relational database to categorize the search results. Over time, the accumulated data can be used to reconstruct a private torrent index site with little additional effort.
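The statistical assumption behind the second criterion can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the site's actual implementation: the function name `suggest_related`, the (peer, info hash) observation format, and the ranking by number of co-sharing peers are all hypothetical choices.

```python
from collections import Counter


def suggest_related(observations, matched_hashes, top_n=5):
    """Rank "other contents" shared by peers that also share a matched torrent.

    observations   -- iterable of (peer_ip, info_hash) pairs seen on the DHT
                      (hypothetical input format)
    matched_hashes -- set of info hashes that matched the search term
    Returns a list of (info_hash, peer_count) tuples, most co-shared first.
    """
    # Group everything each peer is sharing.
    shared_by_peer = {}
    for peer, info_hash in observations:
        shared_by_peer.setdefault(peer, set()).add(info_hash)

    # Count the non-matching torrents of every peer that shares a match.
    counts = Counter()
    for hashes in shared_by_peer.values():
        if hashes & matched_hashes:
            counts.update(hashes - matched_hashes)

    # Sort by count (descending), then by hash, so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```

A torrent shared by many of the peers that matched the search term ranks higher than one shared by a single peer; torrents from peers that share nothing matching the term are ignored.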
Of all open-source BitTorrent clients, Vuze (formerly Azureus) offers the largest number of usable options
and is at least extensible without a lot of digging beneath the surface.
Search current peer activity on the DHT, not on websites
Complete privacy and anonymity using a VPN
Extensibility does not mean that extending it is easy or even straightforward.
Since Vuze is written in Java, the structure of the code is fairly
straightforward. Even though the documentation is poor and incomplete, most of the features
can be reused or extended by guessing the purpose and function of each major module.
To update the source code and recompile it, the Eclipse IDE is the recommended tool.
That would be a nightmare if it meant going through so many installation steps
for just a minute change. Fortunately, command-line javac is sufficient.
Java is cross-platform, so the rebuild can also be done on an ARM-based platform rather than only on Intel/MS Windows.
Another factor that greatly influenced the decision to pick this engine is that most of
the code can be tested or investigated interactively using Jython.
Depending on the search term, the number of initial primary contacts can be between 500 and 1,000 peers (unique IPs).
Each peer can have one or more torrents whose description matches the pattern of the search term.
The peers most likely belong to the same country.
The Vuze protocol probably limits the number of torrents found per peer to 30.
However, each primary peer can have up to 50 secondary contacts that have torrents matching the same search pattern.
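The figures above imply a rough upper bound on how many titles a single search term could return. The helper below is a back-of-the-envelope sketch: the function name is hypothetical, and it assumes (this is not stated above) that the 30-torrent limit also applies to secondary contacts and that no peer or title is counted twice.

```python
def search_upper_bounds(primary_peers, torrents_per_peer=30,
                        secondary_per_primary=50):
    """Upper bounds for one search term, before any duplicates are removed.

    Assumes every peer, primary or secondary, reports the full
    torrents_per_peer limit, which real searches will rarely reach.
    """
    secondary_peers = primary_peers * secondary_per_primary
    total_peers = primary_peers + secondary_peers
    return {
        "primary_titles": primary_peers * torrents_per_peer,
        "secondary_peers": secondary_peers,
        "total_titles": total_peers * torrents_per_peer,
    }


# 500 primary peers:   15,000 primary titles,  25,000 secondary peers
# 1,000 primary peers: 30,000 primary titles,  50,000 secondary peers
for primaries in (500, 1000):
    print(primaries, search_upper_bounds(primaries))
```

In practice the real numbers will be far lower, because peers overlap and many titles are duplicates.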
This website is an experiment in progress to build and document the most appropriate and efficient open-source tools for finding new content of interest on various BitTorrent DHT (Distributed Hash Table) networks. These tools can function as a pseudo general-purpose BitTorrent search engine that relies neither on users uploading torrents nor on scanning other torrent index sites (1337x, The Pirate Bay, rarbg, idope.se, etc.) or traditional search engine sites (Yahoo, Google, Bing, etc.).
Visitors may search the BitTorrent DHT network in real time or search the regularly updated database of more than 2 million torrents collected from the DHT over the most recent 30 days.
The BitTorrent DHT network contains millions of active users, known as peers, who share millions of data files by transferring them from one computer to another. The data files are mostly digital content such as motion pictures, TV broadcasts, recorded music, application software, games, and published books. Some of it is in the public domain, but most of it is copyrighted. Depending on your locale, participating in or assisting the copying of copyrighted content is illegal and can result in fines and criminal charges.
Most BitTorrent clients (BitComet, rTorrent, Transmission, Vuze, etc.) concentrate on finding peers to connect to, in order to transfer the desired pieces of digital content in the shortest time possible. Most users do not realize how vast the DHT network is. It is not unusual to find tens of millions of users, from almost every country on the planet, connected at a given time of day.
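How a client decides which peers are "closest" to a piece of content can be illustrated with the XOR metric used by Kademlia-style DHTs such as the Mainline DHT. The sketch below is a minimal illustration, not Vuze's implementation: the helper names are hypothetical, SHA-1 is used here only to produce 20-byte demo identifiers, and the default of 8 results is an assumed parameter.

```python
import hashlib


def demo_id(data: bytes) -> bytes:
    """Return a 20-byte (160-bit) identifier for demonstration purposes."""
    return hashlib.sha1(data).digest()


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: the two IDs XORed and read as a big integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_nodes(target: bytes, nodes, k: int = 8):
    """Return the k node IDs with the smallest XOR distance to target."""
    return sorted(nodes, key=lambda node: xor_distance(node, target))[:k]


if __name__ == "__main__":
    nodes = [demo_id(bytes([i])) for i in range(50)]
    target = demo_id(b"example content key")
    for node in closest_nodes(target, nodes, k=3):
        print(node.hex())
```

A lookup repeatedly asks the closest known nodes for nodes that are closer still, which is why a search can reach many peers without contacting any central website.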
The tools on this website use Python scripts, in the form of plugins, to interact with the BitTorrent client Vuze and a relational database. These specialized tools collect only the titles of content potentially being transferred by peers currently active on the DHT network, to demonstrate that it is possible to build a private torrent index site within 30 days from a single computer. A title discovered on the DHT network does not necessarily mean that the digital content is available to download, even if the title reports a high seed count.
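As a rough illustration of the relational side, the sketch below stores collected titles and peer sightings in SQLite using Python's standard `sqlite3` module. The table layout, column names, and function names are assumptions made for this example. The site's real schema and database engine are not documented here, and the Vuze plugin side is omitted entirely.

```python
import sqlite3

# Hypothetical schema: one row per torrent title, one row per (torrent, peer).
SCHEMA = """
CREATE TABLE IF NOT EXISTS torrents (
    info_hash TEXT PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS sightings (
    info_hash TEXT NOT NULL REFERENCES torrents(info_hash),
    peer_ip   TEXT NOT NULL,
    seen_at   TEXT NOT NULL,          -- ISO 8601 timestamp
    UNIQUE (info_hash, peer_ip)
);
"""


def open_index(path=":memory:"):
    """Open (or create) the index database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_sighting(conn, info_hash, title, peer_ip, seen_at):
    """Store a title once and keep only the latest sighting per peer."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO torrents (info_hash, title) VALUES (?, ?)",
            (info_hash, title),
        )
        conn.execute(
            "INSERT OR REPLACE INTO sightings (info_hash, peer_ip, seen_at) "
            "VALUES (?, ?, ?)",
            (info_hash, peer_ip, seen_at),
        )


def titles_for_peer(conn, peer_ip):
    """Titles seen at one peer, most recently seen first."""
    rows = conn.execute(
        "SELECT t.title FROM sightings s "
        "JOIN torrents t ON t.info_hash = s.info_hash "
        "WHERE s.peer_ip = ? ORDER BY s.seen_at DESC, t.title",
        (peer_ip,),
    )
    return [row[0] for row in rows]
```

Keying torrents by info hash means a title reported by many peers is stored only once, while the sightings table still records which peers were seen with it.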
1. There are about 2 million active torrents at any time, shared from over 200 countries. The top 50 countries (those with the most active users) are listed in this post.
2. The current approximate number of active users on the Mainline DHT and the Azureus DHT network closest to this website is listed on the following page. To conserve processor time and transfer bandwidth, the report is updated every hour.
3. The search capability provided on this website demonstrates that a specific peer can be monitored for transfer activity over a specified period of time (sorted by date, by content, and by completed downloads). For 100 popular search terms (file extension, format type, studio name, topic, genre, actors and actresses), there are at least 100 primary users per term, organized by country, so results can be collected quickly from the resulting 10,000 peers. Each peer can have up to 30 titles per term, shown on a single web page (one page per peer). After duplicates are removed, the search result can approach 100k torrents.
4. Search by term for new peers (the number of closest peers found depends on the term and can reach 2k-3k peers per term). Each peer may suggest up to 50 additional peers participating in the transfer of the same digital content (usually from the same region or country).
5. Search by specific peer for related content and related peers in addition to the specified term.
6. Search the existing 30-day database. The database is automatically updated:
-from terms, to discover new content and new peers
-from existing peers, for new content
7. Monitor a specific torrent to collect information on transfer activity among peers and regions of interest.
8. Daily report of suggested content based on specific search criteria.
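The 30-day database in item 6 can be pictured as a rolling window: each new observation refreshes a torrent's last-seen time, and anything not seen within the window is dropped. The class below is a minimal in-memory sketch of that idea. The name `RollingIndex` and its methods are hypothetical, and the real site keeps this data in a relational database rather than a Python dictionary.

```python
from datetime import datetime, timedelta


class RollingIndex:
    """Keep only the torrents observed within the last `window_days` days."""

    def __init__(self, window_days=30):
        self.window = timedelta(days=window_days)
        self.last_seen = {}  # info_hash -> (title, datetime last observed)

    def observe(self, info_hash, title, seen_at):
        """Record an observation; duplicates only refresh the timestamp."""
        previous = self.last_seen.get(info_hash)
        if previous is None or seen_at > previous[1]:
            self.last_seen[info_hash] = (title, seen_at)

    def expire(self, now):
        """Drop entries older than the window; return how many were removed."""
        cutoff = now - self.window
        stale = [h for h, (_, seen) in self.last_seen.items() if seen < cutoff]
        for info_hash in stale:
            del self.last_seen[info_hash]
        return len(stale)


if __name__ == "__main__":
    index = RollingIndex()
    start = datetime(2024, 1, 1)
    index.observe("aa", "old title", start)
    index.observe("bb", "fresh title", start + timedelta(days=20))
    removed = index.expire(start + timedelta(days=31))
    print(removed, sorted(index.last_seen))
```

Both update paths listed in item 6 (from terms and from existing peers) would feed the same `observe` call, so a torrent stays in the index for as long as at least one of them keeps seeing it.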