Automatic Big Data Provenance Capture at Middleware Level in Advanced Big Data Frameworks
CUZZOCREA, Alfredo Massimiliano
2017-01-01
Abstract
Huge amounts of data are being generated by IoT devices; such data are termed ‘Big Data’. Big Data needs to be reliably stored and analyzed. Capturing the provenance of such data provides a mechanism to explain the results of data analytics and gives greater trustworthiness to the insights gathered from them. Capturing the provenance of data stored in NoSQL databases can help to explain how the data reached its current state. A holistic explanation of the results of data analytics can be achieved by combining the provenance information of the data with the results of the analytics. This chapter explores the challenges of automatic provenance capture at the middleware level in three different contexts: in an analytics framework such as MapReduce, in NoSQL data stores analyzed using the MapReduce framework, and in NoSQL stores with SQL front ends. The chapter also shows how the provenance captured in the MapReduce framework is useful, beyond debugging, for improving future re-runs of jobs and for anomaly detection.
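To make the idea of "automatic provenance capture at the middleware level" concrete, here is a minimal, hypothetical sketch; it is not the chapter's implementation. It assumes a toy in-memory MapReduce runner (the names `run_job`, `word_map`, `sum_reduce` and the record ids `r1`, `r2` are all illustrative) in which the framework itself records which input records contributed to each output key, so the user-supplied map and reduce functions need no changes.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical toy "middleware": a MapReduce runner that captures lineage
# automatically. User map/reduce functions are unaware of provenance.
def run_job(records: dict[str, str],
            map_fn: Callable[[str], Iterable[tuple[str, int]]],
            reduce_fn: Callable[[str, list[int]], int]):
    groups = defaultdict(list)   # intermediate key -> emitted values
    lineage = defaultdict(set)   # intermediate key -> contributing input ids
    for rec_id, value in records.items():
        for key, v in map_fn(value):
            groups[key].append(v)
            lineage[key].add(rec_id)  # recorded by the framework, not the user
    results, provenance = {}, {}
    for key in sorted(groups):
        results[key] = reduce_fn(key, groups[key])
        provenance[key] = sorted(lineage[key])  # backward lineage per output
    return results, provenance

# Ordinary user code: a word count with no provenance logic at all.
def word_map(line: str):
    for word in line.split():
        yield word, 1

def sum_reduce(key: str, values: list[int]) -> int:
    return sum(values)

docs = {"r1": "iot data", "r2": "big data"}
results, provenance = run_job(docs, word_map, sum_reduce)
print(results)     # {'big': 1, 'data': 2, 'iot': 1}
print(provenance)  # {'big': ['r2'], 'data': ['r1', 'r2'], 'iot': ['r1']}
```

Because lineage is collected inside the runner, a suspicious output such as `data` can be traced back to its inputs (`r1`, `r2`) for debugging. A real framework would need equivalent hooks in its own execution engine, and the chapter's approach may differ from this sketch.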