Representing Interoperable Provenance Descriptions for ETL Workflows (bibtex)
by André Freitas, Benedikt Kämpgen, João Gabriel Oliveira, Seán O'Riain, Edward Curry
Abstract:
The increasing availability of data on the Web provided by the emergence of Web 2.0 applications and, more recently, by Linked Data, brought additional complexity to data management tasks, where the number of available data sources and their associated heterogeneity drastically increases. In this scenario, where data is reused and repurposed on a new scale, the pattern expressed as Extract-Transform-Load (ETL) emerges as a fundamental and recurrent process for both producers and consumers of data on the Web. In addition to ETL, provenance, the representation of source artifacts, processes and agents behind data, becomes another cornerstone element for Web data management, playing a fundamental role in data quality assessment, data semantics and facilitating the reproducibility of data transformation processes. This paper proposes the convergence of these two Web data management concerns, introducing a principled provenance model for ETL processes in the form of a vocabulary based on the Open Provenance Model (OPM) standard and focusing on the provision of an interoperable provenance model for Web-based ETL environments. The proposed ETL provenance model is instantiated in a real-world sustainability reporting scenario.
Reference:
André Freitas, Benedikt Kämpgen, João Gabriel Oliveira, Seán O'Riain, Edward Curry, "Representing Interoperable Provenance Descriptions for ETL Workflows", Chapter in 3rd International Workshop on Role of Semantic Web in Provenance Management (SWPM 2012), pp. 43-57, 2012. [slides]
Bibtex Entry:
@incollection{Freitas2012c,
abstract = {The increasing availability of data on the Web provided by the emergence of Web 2.0 applications and, more recently, by Linked Data, brought additional complexity to data management tasks, where the number of available data sources and their associated heterogeneity drastically increases. In this scenario, where data is reused and repurposed on a new scale, the pattern expressed as Extract-Transform-Load (ETL) emerges as a fundamental and recurrent process for both producers and consumers of data on the Web. In addition to ETL, provenance, the representation of source artifacts, processes and agents behind data, becomes another cornerstone element for Web data management, playing a fundamental role in data quality assessment, data semantics and facilitating the reproducibility of data transformation processes. This paper proposes the convergence of these two Web data management concerns, introducing a principled provenance model for ETL processes in the form of a vocabulary based on the Open Provenance Model (OPM) standard and focusing on the provision of an interoperable provenance model for Web-based ETL environments. The proposed ETL provenance model is instantiated in a real-world sustainability reporting scenario.},
annote = {<a href="http://www.slideshare.net/andrenfreitas/representing-interoperable-provenance-descriptions-for-etl-workflows">[slides]</a>},
author = {Freitas, Andr{\'{e}} and K{\"{a}}mpgen, Benedikt and Oliveira, Jo{\~{a}}o Gabriel and O'Riain, Se{\'{a}}n and Curry, Edward},
booktitle = {3rd International Workshop on Role of Semantic Web in Provenance Management (SWPM 2012)},
doi = {10.1007/978-3-662-46641-4_4},
keywords = {Data Transformation,ETL,LEIdataspace,Linked Data,Provenance,Web},
mendeley-tags = {LEIdataspace},
pages = {43--57},
title = {{Representing Interoperable Provenance Descriptions for ETL Workflows}},
url = {http://www.edwardcurry.org/publications/preprint_provenance_ETL_workflow.pdf},
year = {2012}
}