ORCA | DICE Research Group

Project Group Master

ORCA is a system for benchmarking Linked Data crawlers. More information about the benchmark can be found in the IEEE ICSC paper. The code base can be found on GitHub.

The goal of the project group (PG) is to extend the benchmark by adding more features. Some of these features are (in no particular order):

Add checks for embedded RDF data (Microformats, Microdata, JSON-LD).
Add support for more compression formats (especially HDT).
Incorporate existing RDF generators.

Project presentation

The topic is briefly explained in the slides that were used during the presentation of the project groups on February 15th.

Question & Answer Session

Q: Will there be prior assignments? What would these assignments look like?
A: Yes, there will be assignments to ensure that participants are able to program in Java. To this end, the assignments will be simple Java programming tasks.

Q: Is there a seminar connected to this PG?
A: No.

Q: What are the prerequisites for this PG?
A: You should be able to program in Java. Knowledge of RDF, knowledge graphs and/or graph theory is beneficial. If you are not familiar with these topics, you will have to study them at the beginning of the PG. Unfortunately, there won't be time to give an introduction to these topics.

In case you have further questions, feel free to contact Michael Röder.

Course in PAUL

L.079.07007 Project Group: ORCA: a crawler analysis benchmark (in English)