Nowadays, a leading instance of big data is represented byWeb data that lead to the definition of so-called bigWeb data. Indeed, extending beyond to a large number of critical applications (e.g.,Web advertisement), these data expose several characteristics that clearly adhere to the well-known 3V properties (i.e., volume, velocity, variety). Resource Description Framework (RDF) is a significant formalism and language for the so-called Semantic Web, due to the fact that a very wide family of Web entities can be naturally modeled in a graph-shaped manner. In this context, RDF graphs play a first-class role, because they are widely used in the context of modern Web applications and systems, including the emerging context of social networks. When RDF graphs are defined on top of big (Web) data, they lead to the so-called large-scale RDF graphs, which reasonably populate the next-generation SemanticWeb. In order to process such kind of big data, MapReduce, an open source computational framework specifically tailored to big data processing, has emerged during the last years as the reference implementation for this critical setting. In line with this trend, in this paper, we present an approach for efficiently implementing traversals of large-scale RDF graphs over MapReduce that is based on the Breadth First Search (BFS) strategy for visiting (RDF) graphs to be decomposed and processed according to the MapReduce framework. We demonstrate how such implementation speeds-up the analysis of RDF graphs with respect to competitor approaches. Experimental results clearly support our contributions.

An effective and efficient mapreduce algorithm for computing BFS-based traversals of large-scale RDF graphs

CUZZOCREA, Alfredo Massimiliano;
2016

Abstract

Nowadays, a leading instance of big data is represented byWeb data that lead to the definition of so-called bigWeb data. Indeed, extending beyond to a large number of critical applications (e.g.,Web advertisement), these data expose several characteristics that clearly adhere to the well-known 3V properties (i.e., volume, velocity, variety). Resource Description Framework (RDF) is a significant formalism and language for the so-called Semantic Web, due to the fact that a very wide family of Web entities can be naturally modeled in a graph-shaped manner. In this context, RDF graphs play a first-class role, because they are widely used in the context of modern Web applications and systems, including the emerging context of social networks. When RDF graphs are defined on top of big (Web) data, they lead to the so-called large-scale RDF graphs, which reasonably populate the next-generation SemanticWeb. In order to process such kind of big data, MapReduce, an open source computational framework specifically tailored to big data processing, has emerged during the last years as the reference implementation for this critical setting. In line with this trend, in this paper, we present an approach for efficiently implementing traversals of large-scale RDF graphs over MapReduce that is based on the Breadth First Search (BFS) strategy for visiting (RDF) graphs to be decomposed and processed according to the MapReduce framework. We demonstrate how such implementation speeds-up the analysis of RDF graphs with respect to competitor approaches. Experimental results clearly support our contributions.
http://www.mdpi.com/1999-4893/9/1/7/pdf
File in questo prodotto:
File Dimensione Formato  
algorithms-09-00007-v2.pdf

accesso aperto

Tipologia: Documento in Versione Editoriale
Licenza: Creative commons
Dimensione 2.17 MB
Formato Adobe PDF
2.17 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11368/2897888
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 3
social impact