Automatic Big Data Provenance Capture at Middleware Level in Advanced Big Data Frameworks

CUZZOCREA, Alfredo Massimiliano
2017

Abstract

IoT devices generate huge amounts of data, commonly termed ‘Big Data’, which must be reliably stored and analyzed. Capturing the provenance of such data provides a mechanism to explain the results of data analytics and lends greater trustworthiness to the insights derived from them. Capturing the provenance of data stored in NoSQL databases helps explain how the data reached its current state. Combining the provenance information of the data with the results of analytics yields a holistic explanation of those results. This chapter explores the challenges of automatic provenance capture at the middleware level in three contexts: in an analytics framework such as MapReduce, in NoSQL data stores analyzed using the MapReduce framework, and in NoSQL stores with SQL front ends. The chapter also shows how provenance captured in the MapReduce framework is useful, beyond debugging, for improving future re-runs of jobs and for anomaly detection.
ISBN: 978-3-319-70102-8

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11368/2898002