Top-Down Evaluation of Context-Free Path Queries in Graph Databases
rdf; graph queries; LL grammars.
The internet has enabled the creation of an immense global data space, that can be accessed in the form of web pages.
However, web pages are ideal for presenting content to human beings, but not to be interpreted by machines.
In addition, it becomes difficult to relate the information stored in the databases behind these pages.
From this came the Linked Data, a set of good practices for relating and publishing data data.
The standard format recommended by Linked Data for storing and publishing related data is RDF.
This format uses triples in the form (subject, predicate, object) to stabilish relationships between the data.
A triplestore can be easily visualized as a graph, so queries are made by defining paths in the graph.
SPARQL, the standard query language for RDF graphs, supports the definition of paths using regular expressions.
However, regular expressions have reduced expressiveness, insufficient for some desirable queries.
In order to overcome this problem, some studies have proposed the use of context-free grammars to define the paths.
We present an algorithm for evaluating context-free path queries in graphs inspired by top-down parsing techniques.
Given a graph and a query defined over a context-free grammar, our algorithm identifies pairs of vertices linked by paths that form words of the language generated by the grammar.
We show that our algorithm is correct and demonstrate other important properties of it.
It presents cubic worst-case runtime complexity in terms of the number of vertices in the graph.
We implemented the proposed algorithm and evaluated its performance with RDF databases and synthetic graphs to confirm its efficiency.