Apache Spark and Apache Kafka at the rescue of distributed RDF Stream Processing engines

Abstract: Due to the growing need to process data produced on the Semantic Web in a timely manner, and to derive valuable information and knowledge from it, RDF stream processing (RSP) has emerged as an important research domain. In this paper, we describe the design of an RSP engine built upon state-of-the-art Big Data frameworks, namely Apache Kafka and Apache Spark. Together, they support the implementation of a production-ready RSP engine that guarantees scalability, fault tolerance, high availability, low latency and high throughput. Moreover, we highlight that the Spark framework considerably eases the implementation of complex applications requiring libraries as diverse as machine learning, graph processing, query processing and stream processing.
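To make "RDF stream processing" concrete, the sketch below shows the kind of windowed continuous query an RSP engine evaluates: timestamped triples arrive as a stream, are cut into fixed-width (tumbling) time windows, and a triple pattern is matched inside each window. It is a framework-free illustration only. It does not use Kafka or Spark and is not the authors' engine. In the architecture the abstract describes, the stream would presumably come from a Kafka topic and the per-window work would be distributed by Spark; here a plain Python list stands in for both. All names (`TimedTriple`, `windowed_counts`, the `ex:` IRIs) are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Optional, Tuple


class TimedTriple(NamedTuple):
    """An RDF triple tagged with an event timestamp (hypothetical stream element)."""
    s: str
    p: str
    o: str
    ts: int  # non-negative event time, e.g. seconds since stream start


Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(t: TimedTriple, pattern: Pattern) -> bool:
    # None in a pattern position acts as an unbound variable (wildcard).
    return all(q is None or q == v for q, v in zip(pattern, (t.s, t.p, t.o)))


def windowed_counts(stream: Iterable[TimedTriple], pattern: Pattern, width: int) -> dict:
    """Count matching triples per subject in tumbling windows of `width` time units.

    Returns {window_start: {subject: count}}, ordered by window start.
    """
    result = defaultdict(lambda: defaultdict(int))
    for t in stream:
        if matches(t, pattern):
            start = (t.ts // width) * width  # start time of the window containing t
            result[start][t.s] += 1
    return {w: dict(c) for w, c in sorted(result.items())}


# Usage: a small in-memory stream standing in for a Kafka topic (assumption).
stream = [
    TimedTriple("ex:sensor1", "ex:hasReading", "21.5", 0),
    TimedTriple("ex:sensor2", "ex:hasReading", "19.0", 3),
    TimedTriple("ex:sensor1", "ex:hasReading", "22.0", 7),
    TimedTriple("ex:sensor1", "ex:locatedIn", "ex:room1", 8),
    TimedTriple("ex:sensor2", "ex:hasReading", "18.5", 12),
]

if __name__ == "__main__":
    # Readings per sensor in 10-second windows.
    print(windowed_counts(stream, (None, "ex:hasReading", None), 10))
```

Tumbling windows are used here only because they are the simplest case; RSP query languages typically also support sliding windows, where a triple can fall into several overlapping windows.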

https://hal-upec-upem.archives-ouvertes.fr/hal-01740515
Contributor: Olivier Curé
Submitted on: Thursday, March 22, 2018 - 9:34:06 AM
Last modification on: Thursday, February 7, 2019 - 5:55:11 PM
Long-term archiving on: Thursday, September 13, 2018 - 1:10:41 AM

Identifiers

  • HAL Id: hal-01740515, version 1

Citation

Xiangnan Ren, Olivier Curé, Houda Khrouf, Zakia Kazi-Aoul, Yousra Chabchoub. Apache Spark and Apache Kafka at the rescue of distributed RDF Stream Processing engines. 15th International Semantic Web Conference ISWC 2016, 2016, Kobe, Japan. ⟨hal-01740515⟩
