Production-ready project (January 2019 - Ongoing)
ORCA is a Crawler Analysis Benchmark for Data Web Crawlers
ORCA is a benchmark for Data Web crawlers, i.e., crawlers that focus on gathering structured data. The main idea of ORCA is to generate a synthetic Data Web for which the ground truth is known. It is based on the HOBBIT benchmarking platform and supports distributed crawler implementations.
All servers communicate over HTTP.
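Because the ground truth of the synthetic Data Web is known, a crawler's output can be scored against it. The following is a minimal sketch of that idea, not ORCA's actual implementation or API: the `recall` function, the `Triple` alias, and the `example.org` URIs are all hypothetical and chosen only for illustration. It assumes both the ground truth and the crawled data are available as sets of (subject, predicate, object) triples.

```python
from typing import Set, Tuple

# Hypothetical representation: one RDF-like triple as three strings.
Triple = Tuple[str, str, str]


def recall(ground_truth: Set[Triple], crawled: Set[Triple]) -> float:
    """Return the fraction of ground-truth triples found by the crawler."""
    if not ground_truth:
        # Nothing to find; report 0.0 instead of dividing by zero.
        return 0.0
    return len(ground_truth & crawled) / len(ground_truth)


# Illustrative ground truth for a tiny synthetic web (made-up URIs).
ground_truth: Set[Triple] = {
    ("http://example.org/a", "http://example.org/linksTo", "http://example.org/b"),
    ("http://example.org/b", "http://example.org/linksTo", "http://example.org/c"),
    ("http://example.org/c", "http://example.org/linksTo", "http://example.org/d"),
    ("http://example.org/d", "http://example.org/linksTo", "http://example.org/a"),
}

# Illustrative crawler output: three correct triples and one that is
# not part of the ground truth (it does not affect recall).
crawled: Set[Triple] = {
    ("http://example.org/a", "http://example.org/linksTo", "http://example.org/b"),
    ("http://example.org/b", "http://example.org/linksTo", "http://example.org/c"),
    ("http://example.org/c", "http://example.org/linksTo", "http://example.org/d"),
    ("http://example.org/x", "http://example.org/linksTo", "http://example.org/y"),
}

score = recall(ground_truth, crawled)
print(f"recall = {score:.2f}")
```

Set intersection keeps the comparison independent of the order in which a crawler, distributed or not, delivers its results.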