SPARQL is a W3C standard for querying data stored in the Resource Description Framework (RDF). SPARQL queries are expressed as triple patterns, and query evaluation searches for matches to these patterns in a given RDF graph. Most existing SPARQL evaluators are centralized, DBMS-inspired solutions that consume substantial resources and offer limited flexibility. To cope with the growing size of RDF data, it is important to develop scalable and efficient solutions for distributed SPARQL query evaluation. In this paper, we present DISE, an open-source implementation of a distributed in-memory SPARQL engine that can scale out to a cluster of machines. DISE represents the RDF graph as a three-way distributed tensor for querying large-scale RDF datasets. This distributed tensor representation also offers opportunities for novel distributed applications. DISE translates SPARQL queries into Spark tensor operations, using information about query complexity to create a dynamic execution plan. We have tested the scalability and efficiency of DISE on different datasets. The results show that querying over this representation is scalable and efficient, with performance comparable to a related approach.
DISE: A Distributed in-Memory SPARQL Processing Engine over Tensor Data
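To make the idea of an RDF graph as a three-way tensor concrete, the following is a minimal single-machine sketch, not DISE's actual implementation. It assumes a sparse boolean tensor stored as a set of (subject, predicate, object) integer coordinates, and a triple pattern whose variables are written with a leading "?". The function names `encode` and `match_pattern` are hypothetical and introduced here only for illustration; a distributed engine would partition the coordinates across a cluster rather than scan them in one process.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Coord = Tuple[int, int, int]


def encode(triples: Iterable[Triple]) -> Tuple[Dict[str, int], Set[Coord]]:
    """Dictionary-encode RDF terms (hypothetical helper).

    Returns the term-to-id map and the nonzero coordinates of a sparse
    boolean tensor T, where T[s, p, o] = 1 iff the triple (s, p, o) exists.
    """
    ids: Dict[str, int] = {}
    coords: Set[Coord] = set()
    for s, p, o in triples:
        # Assign the next free integer id to each unseen term.
        coord = tuple(ids.setdefault(term, len(ids)) for term in (s, p, o))
        coords.add(coord)
    return ids, coords


def match_pattern(
    ids: Dict[str, int], coords: Set[Coord], pattern: Triple
) -> List[Dict[str, str]]:
    """Match one triple pattern against the tensor coordinates.

    Slots starting with "?" are variables; all other slots are constants
    that must equal the term at that position.
    """
    terms = {idx: term for term, idx in ids.items()}
    results: List[Dict[str, str]] = []
    for coord in sorted(coords):  # sorted only to make the output order stable
        binding: Dict[str, str] = {}
        for slot, idx in zip(pattern, coord):
            if slot.startswith("?"):
                # A repeated variable must bind to the same term each time.
                if binding.setdefault(slot, terms[idx]) != terms[idx]:
                    break
            elif ids.get(slot) != idx:
                break
        else:
            results.append(binding)
    return results
```

For example, encoding the triples `("alice", "knows", "bob")` and `("bob", "knows", "carol")` and matching the pattern `("?x", "knows", "?y")` yields one binding per stored triple. Fixing a constant in a slot corresponds to selecting a slice of the tensor along that axis.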