Tentris Cluster - A Distributed Tensor-Based Triple Store

Master Thesis

In the DICE group, we develop Tentris (GitHub, paper), one of the fastest triple stores currently available.

Tentris is a triple store that is conceptually based on tensors and tensor algebra. Tensors are implemented by a condensed, monolithic indexing data structure dubbed the hypertrie. SPARQL queries are evaluated as Einstein summations (einsum). The einsum implementation of Tentris is based on a state-of-the-art worst-case optimal join (WCOJ) algorithm.
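To make the einsum view concrete, here is a minimal sketch. It is not Tentris code and does not use a hypertrie. It assumes a toy dictionary that maps RDF terms to small integer ids and stores the graph as a dense order-3 tensor `T[s][p][o]`. All names (`Term`, `Tensor3`, `make_graph`, `two_hop`) are hypothetical. Under these assumptions, the pattern `?x :knows ?y . ?y :knows ?z` corresponds to the summation `R[x][z] = Σ_y T[x][knows][y] · T[y][knows][z]`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy dictionary (assumed for illustration): every RDF term gets a small integer id.
constexpr std::size_t N = 4;  // number of distinct terms
enum Term : std::size_t { alice = 0, bob = 1, carol = 2, knows = 3 };

// Dense order-3 tensor: T[s][p][o] == 1 iff the triple (s, p, o) is in the graph.
// A real store would use a sparse index instead of a dense array.
using Tensor3 = std::array<std::array<std::array<int, N>, N>, N>;
using Matrix = std::array<std::array<int, N>, N>;

// Build a small example graph with three ":knows" triples.
Tensor3 make_graph() {
    Tensor3 t{};  // all entries start at zero
    t[alice][knows][bob] = 1;
    t[bob][knows][carol] = 1;
    t[alice][knows][carol] = 1;
    return t;
}

// Einstein summation for the pattern { ?x p ?y . ?y p ?z }:
//   R[x][z] = sum over y of T[x][p][y] * T[y][p][z]
// Each entry counts the bindings of ?y that connect x to z.
Matrix two_hop(const Tensor3& t, std::size_t p) {
    assert(p < N);
    Matrix r{};
    for (std::size_t x = 0; x < N; ++x)
        for (std::size_t y = 0; y < N; ++y)
            for (std::size_t z = 0; z < N; ++z)
                r[x][z] += t[x][p][y] * t[y][p][z];
    return r;
}
```

Calling `two_hop(make_graph(), knows)` yields a matrix whose only non-zero entry is `R[alice][carol]`, because alice reaches carol via bob. The nested loops enumerate every combination of ids; a worst-case optimal join avoids that by only visiting candidate bindings that can still satisfy all patterns.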

So far, so awesome. But there are still things that need to be worked on. Currently, Tentris can only run on a single machine. To scale out to more machines, whether to process even larger datasets or to serve more parallel requests, Tentris needs to be distributed. So your master thesis task will be to sketch and implement Tentris Cluster.

The master thesis includes:

  • Survey related work on triple store distribution strategies
  • Implement at least one existing distribution strategy or come up with your own
  • Benchmark the implementation with regard to loading time, query processing performance and scalability
  • Compare it with at least one other distributed triple store
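
As an illustration of what a distribution strategy can look like, the sketch below shows hash partitioning by subject, a common baseline in distributed triple stores. It is one possible starting point, not the strategy prescribed for the thesis. The names `Triple`, `node_for` and `partition` are hypothetical and not part of Tentris.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A triple with terms kept as plain strings (assumption: no dictionary encoding).
struct Triple {
    std::string s, p, o;
};

// Pick the node that stores a triple by hashing its subject.
// All triples with the same subject end up on the same node.
std::size_t node_for(const Triple& t, std::size_t num_nodes) {
    assert(num_nodes > 0);
    return std::hash<std::string>{}(t.s) % num_nodes;
}

// Split a dataset into one bucket of triples per node.
std::vector<std::vector<Triple>> partition(const std::vector<Triple>& triples,
                                           std::size_t num_nodes) {
    std::vector<std::vector<Triple>> parts(num_nodes);
    for (const auto& t : triples) {
        parts[node_for(t, num_nodes)].push_back(t);
    }
    return parts;
}
```

With this scheme, star-shaped patterns around one subject can be answered on a single node, while patterns that chain from an object to another subject may need data from several nodes. That trade-off is one of the things the benchmarking step would measure.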

Required skills:

  • Knowledge of Semantic Web standards such as SPARQL and RDF
  • Good modern C++ coding skills (C++17/20)
  • Experience with C++ template programming
  • Some prior knowledge of distributed databases might be helpful