Production-ready project (January 2019 - Ongoing)

ORCA is a Crawler Analysis Benchmark for Data Web Crawlers

About the project

ORCA is a benchmark for Data Web crawlers, i.e., crawlers that focus on gathering structured data. The main idea of ORCA is to generate a synthetic Data Web for which the ground truth is known, so that the data a crawler collects can be compared against what was actually published. It is based on the HOBBIT benchmarking platform and supports distributed crawler implementations.
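To illustrate why a known ground truth is useful, here is a minimal Python sketch of the idea. It is not ORCA's actual implementation: the function names, the example.org URIs, the linksTo predicate, and the choice of recall as the metric are all assumptions made for this example. It builds a small synthetic web of triples spread across several hosts and measures how much of it a crawler retrieved.

```python
import random
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def generate_synthetic_web(num_nodes: int, triples_per_node: int,
                           seed: int = 0) -> Dict[str, Set[Triple]]:
    """Build a toy synthetic web: each host publishes triples linking to other hosts.

    Hypothetical helper; host names and the predicate URI are made up.
    """
    rng = random.Random(seed)  # fixed seed keeps the generated web reproducible
    web: Dict[str, Set[Triple]] = {}
    for n in range(num_nodes):
        host = f"http://node{n}.example.org"
        triples: Set[Triple] = set()
        for t in range(triples_per_node):
            target = rng.randrange(num_nodes)
            triples.add((
                f"{host}/resource/{t}",                      # unique subject per host
                "http://example.org/linksTo",                # assumed predicate
                f"http://node{target}.example.org/resource/0",
            ))
        web[host] = triples
    return web


def ground_truth(web: Dict[str, Set[Triple]]) -> Set[Triple]:
    """The ground truth is simply every triple published anywhere in the web."""
    return set().union(*web.values())


def recall(crawled: Set[Triple], truth: Set[Triple]) -> float:
    """Fraction of ground-truth triples that the crawler actually collected."""
    if not truth:
        return 1.0
    return len(crawled & truth) / len(truth)


if __name__ == "__main__":
    web = generate_synthetic_web(num_nodes=4, triples_per_node=5)
    truth = ground_truth(web)
    # Pretend the crawler only reached the first host.
    crawled = web["http://node0.example.org"]
    print(f"recall = {recall(crawled, truth):.2f}")
```

Because the generator controls every triple that is published, the comparison needs no manual labelling. A real benchmark run would replace the `crawled` set with the data the crawler under test stored.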

Available Adapters for Crawlers

Server Types

  • Dump file
  • Dereferencing Server
  • HTML Webserver with embedded RDFa
  • SPARQL endpoint
  • CKAN

All servers communicate over HTTP.


