Representing Texts as Contextualized Entity-Centric Linked Data Graphs

Freitas, Andre; O'Riain, Sean; Curry, Edward; da Silva, Joao C.P.; Carvalho, Danilo S.

doi:10.1109/DEXA.2013.21

Abstract:

The integration of a small fraction of the information present in the Web of Documents into the Linked Data Web can provide a significant shift in the amount of information available to data consumers. However, information extracted from text does not easily fit into the usually highly normalized structure of ontology-based datasets. While the representation of structured data assumes a high level of regularity and relatively simple, consistent conceptual models, the representation of information extracted from texts needs to take into account large terminological variation, complex contextual/dependency patterns, and fuzzy or conflicting semantics. This work focuses on bridging the gap between structured and unstructured data, proposing the representation of text as structured discourse graphs (SDGs), targeting an RDF representation of unstructured data. The representation focuses on a semantic best-effort information extraction scenario, where information from text is extracted under a pay-as-you-go data quality perspective, trading terminological normalization for domain independence, context capture, wider representation scope and maximization of textual information capture.

Reference:

Andre Freitas, Sean O'Riain, Edward Curry, Joao C.P. da Silva, Danilo S. Carvalho, "Representing Texts as Contextualized Entity-Centric Linked Data Graphs", In 2013 24th International Workshop on Database and Expert Systems Applications, IEEE, Prague, pp. 133-137, 2013.

Bibtex Entry:

@inproceedings{Freitas2013d,
abstract = {The integration of a small fraction of the information present in the Web of Documents into the Linked Data Web can provide a significant shift in the amount of information available to data consumers. However, information extracted from text does not easily fit into the usually highly normalized structure of ontology-based datasets. While the representation of structured data assumes a high level of regularity and relatively simple, consistent conceptual models, the representation of information extracted from texts needs to take into account large terminological variation, complex contextual/dependency patterns, and fuzzy or conflicting semantics. This work focuses on bridging the gap between structured and unstructured data, proposing the representation of text as structured discourse graphs (SDGs), targeting an RDF representation of unstructured data. The representation focuses on a semantic best-effort information extraction scenario, where information from text is extracted under a pay-as-you-go data quality perspective, trading terminological normalization for domain independence, context capture, wider representation scope and maximization of textual information capture.},
address = {Prague},
author = {Freitas, Andre and O'Riain, Sean and Curry, Edward and da Silva, Joao C.P. and Carvalho, Danilo S.},
booktitle = {2013 24th International Workshop on Database and Expert Systems Applications},
doi = {10.1109/DEXA.2013.21},
isbn = {978-0-7695-5070-1},
month = {aug},
pages = {133--137},
publisher = {IEEE},
title = {{Representing Texts as Contextualized Entity-Centric Linked Data Graphs}},
url = {http://www.edwardcurry.org/publications/Freitas_WebS13.pdf},
year = {2013}
}