Distributed Data

This is still quite incomplete.

Following the Unix philosophy that "everything is a file", every data object is a file, uniquely referenced by its hash. Further metadata, called "tags", is organized in a distributed prefix hash tree (DPHT). There are also "subjects" (persons, computers), which are referenced by their public keys; the metadata needed for these subjects is stored in the DPHT as well.
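A minimal Python sketch of the two ideas above, content addressing and a prefix hash tree holding tags, might look like this. SHA-256 is an assumed hash function (the text only says "its hash"), the names `object_id` and `PrefixHashTree` are hypothetical, and the whole tree lives in one process: in the actual DPHT each leaf prefix would be held by whichever peer is responsible for it, and subjects keyed by public keys are left out.

```python
from __future__ import annotations

import hashlib

HEX_DIGITS = "0123456789abcdef"
KEY_LEN = 64  # length of a SHA-256 hex digest


def object_id(data: bytes) -> str:
    """Content address: an object is named by the hash of its bytes.
    SHA-256 is an assumption; the text only says 'its hash'."""
    return hashlib.sha256(data).hexdigest()


class PrefixHashTree:
    """Single-process stand-in for a distributed prefix hash tree.

    Each leaf covers one hex prefix of the key space and holds at most
    `leaf_capacity` keys; an overflowing leaf splits into 16 children.
    In a real DPHT each leaf would live on the peer responsible for it.
    """

    def __init__(self, leaf_capacity: int = 4) -> None:
        self.leaf_capacity = leaf_capacity
        # prefix -> {key: [tags]} for leaves, None for interior nodes
        self.nodes: dict[str, dict[str, list[str]] | None] = {"": {}}

    def _leaf_for(self, key: str) -> str:
        """Walk down from the root until reaching the leaf covering `key`."""
        prefix = ""
        while self.nodes[prefix] is None:
            prefix = key[: len(prefix) + 1]
        return prefix

    def _split(self, leaf: str) -> None:
        """Turn an overflowing leaf into an interior node with 16 children."""
        bucket = self.nodes[leaf]
        self.nodes[leaf] = None
        for digit in HEX_DIGITS:
            self.nodes[leaf + digit] = {}
        for key, tags in bucket.items():
            self.nodes[key[: len(leaf) + 1]][key] = tags
        # A child can still overflow if many keys share the next digit.
        for digit in HEX_DIGITS:
            child = leaf + digit
            if len(self.nodes[child]) > self.leaf_capacity and len(child) < KEY_LEN:
                self._split(child)

    def add_tag(self, key: str, tag: str) -> None:
        """Attach a tag to the object named `key`, splitting leaves as needed."""
        leaf = self._leaf_for(key)
        bucket = self.nodes[leaf]
        bucket.setdefault(key, []).append(tag)
        if len(bucket) > self.leaf_capacity and len(leaf) < KEY_LEN:
            self._split(leaf)

    def tags(self, key: str) -> list[str]:
        """Return the tags stored for `key` (empty if none)."""
        return list(self.nodes[self._leaf_for(key)].get(key, []))
```

Because the keys are hashes, they are spread roughly uniformly over the key space, so splitting an overflowing leaf into 16 children keeps the tree reasonably balanced without a separate rebalancing step.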

Since the DPHT contains all the metadata, objects that are not shared publicly should not be visible in a public hash tree either; therefore, there are also private or group-specific hash trees.

Efficient distribution of data to large numbers of peers

To distribute data to many peers, the peers are arranged in a colored tree. The data (e.g. a video stream) is divided into chunks, and chunks of different colors are sent down differently colored branches of the tree. The leaf nodes of each colored branch then distribute their chunks to the nodes of the other branches. It can be shown that, when the tree is balanced, each node receives as much data as it sends. For N peers the latency of the tree is O(log N); the base n of the logarithm, i.e. the fanout of the tree, is a tradeoff between bandwidth and sending latency. As a rule of thumb, a node should spend about one hop-to-hop latency sending out its packets, so a higher latency means a higher fanout.
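The balance claim can be checked with a small simulation. This Python sketch assumes one possible layout that the text does not spell out: a source sends one chunk per color to the top of each branch, peer p belongs to branch p mod n, each branch is a heap-ordered n-ary tree, and leaves hand their color to the peers of all other branches in round-robin order. `simulate_round` is a hypothetical name; it counts the chunks each peer sends and receives in one round.

```python
from collections import Counter


def simulate_round(num_peers, fanout):
    """Count chunks sent/received by each peer for one round of `fanout` chunks.

    Assumed layout (hypothetical): peer p is in branch p % fanout; each branch
    is a heap-ordered fanout-ary tree; the source feeds the top of each branch;
    leaves forward their branch's color to all peers of the other branches.
    """
    sent, received = Counter(), Counter()
    for color in range(fanout):
        members = list(range(color, num_peers, fanout))
        received[members[0]] += 1          # source -> top of this branch
        leaves = []
        for i, peer in enumerate(members):
            kids = members[i * fanout + 1 : i * fanout + fanout + 1]
            for kid in kids:               # forward the chunk down the branch
                sent[peer] += 1
                received[kid] += 1
            if not kids:
                leaves.append(peer)
        # Leaves hand this color to every peer outside the branch, round-robin.
        outsiders = [p for p in range(num_peers) if p % fanout != color]
        for j, peer in enumerate(outsiders):
            leaf = leaves[j % len(leaves)]
            sent[leaf] += 1
            received[peer] += 1
    return sent, received


sent, received = simulate_round(num_peers=340, fanout=4)
print(sorted(set(received.values())), sorted(set(sent.values())))  # prints [4] [3, 4]
```

With 340 peers and fanout 4, every branch is a complete 4-ary tree of 85 peers (21 interior nodes, 64 leaves). Every peer receives exactly 4 chunks, interior nodes send 4, and leaves send 3 or 4, so the sending and receiving load of each node match to within one chunk.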

Trees are formed ad hoc, and since nodes can join and leave at will, the trees need to be self-healing. In a tree of base n, each node knows 2n neighbors. These trees are used for file sharing, for group message delivery, and to keep the DPHT in sync.
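The text does not say which 2n neighbors a node keeps. One plausible reading, sketched here purely as an assumption, is that a node of a base-n tree tracks n active children plus n standby peers, and repairs the tree locally by promoting a standby when a child leaves. `TreeNode`, `learn` and `peer_left` are hypothetical names.

```python
class TreeNode:
    """Hypothetical node of a base-`fanout` tree that knows up to 2*fanout
    neighbors: `fanout` active children plus `fanout` standby peers."""

    def __init__(self, node_id, fanout):
        self.node_id = node_id
        self.fanout = fanout
        self.children = []   # peers this node currently forwards data to
        self.standby = []    # known spares used to repair the tree

    def neighbors(self):
        return self.children + self.standby

    def learn(self, peer):
        """Remember a newly discovered peer if the 2n table has room."""
        if peer == self.node_id or peer in self.neighbors():
            return False
        if len(self.children) < self.fanout:
            self.children.append(peer)
        elif len(self.standby) < self.fanout:
            self.standby.append(peer)
        else:
            return False
        return True

    def peer_left(self, peer):
        """Self-healing: drop a departed peer, promoting a standby if needed."""
        if peer in self.standby:
            self.standby.remove(peer)
        elif peer in self.children:
            self.children.remove(peer)
            if self.standby:
                self.children.append(self.standby.pop(0))


node = TreeNode(node_id=0, fanout=2)
accepted = [node.learn(p) for p in [1, 2, 3, 4, 5]]
node.peer_left(1)
print(accepted, node.children, node.standby)  # prints [True, True, True, True, False] [2, 3] [4]
```

Keeping spares next to the active children lets a departure be repaired with local knowledge only; a fuller version would also have to refill the standby list over time and cope with a departing parent.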