Representing Texts as Contextualized Entity-Centric Linked Data Graphs
by André Freitas, João C. Pereira da Silva, Danilo S. Carvalho, Sean O'Riain, Edward Curry
Abstract:
The integration of a small fraction of the information present in the Web of Documents into the Linked Data Web can provide a significant shift in the amount of information available to data consumers. However, information extracted from text does not easily fit into the usually highly normalized structure of ontology-based datasets. While the representation of structured data assumes a high level of regularity and relatively simple, consistent conceptual models, the representation of information extracted from text needs to take into account large terminological variation, complex contextual/dependency patterns, and fuzzy or conflicting semantics. This work focuses on bridging the gap between structured and unstructured data by proposing the representation of text as structured discourse graphs (SDGs), targeting an RDF representation of unstructured data. The representation focuses on a semantic best-effort information extraction scenario, where information from text is extracted under a pay-as-you-go data quality perspective, trading terminological normalization for domain independence, context capture, wider representation scope, and maximal capture of textual information.
Reference:
André Freitas, João C. Pereira da Silva, Danilo S. Carvalho, Sean O'Riain, Edward Curry, "Representing Texts as Contextualized Entity-Centric Linked Data Graphs", In 12th International Workshop on Web Semantics and Web Intelligence (WebS 2013), 24th International Conference on Database and Expert Systems Applications (DEXA), Prague, 2013.
Bibtex Entry:
@inproceedings{Freitas2013d,
abstract = {The integration of a small fraction of the information present in the Web of Documents into the Linked Data Web can provide a significant shift in the amount of information available to data consumers. However, information extracted from text does not easily fit into the usually highly normalized structure of ontology-based datasets. While the representation of structured data assumes a high level of regularity and relatively simple, consistent conceptual models, the representation of information extracted from text needs to take into account large terminological variation, complex contextual/dependency patterns, and fuzzy or conflicting semantics. This work focuses on bridging the gap between structured and unstructured data by proposing the representation of text as structured discourse graphs (SDGs), targeting an RDF representation of unstructured data. The representation focuses on a semantic best-effort information extraction scenario, where information from text is extracted under a pay-as-you-go data quality perspective, trading terminological normalization for domain independence, context capture, wider representation scope, and maximal capture of textual information.},
address = {Prague},
author = {Freitas, Andr{\'{e}} and da Silva, Jo{\~{a}}o C. Pereira and Carvalho, Danilo S. and O'Riain, Sean and Curry, Edward},
booktitle = {12th International Workshop on Web Semantics and Web Intelligence (WebS 2013), 24th International Conference on Database and Expert Systems Applications (DEXA)},
title = {{Representing Texts as Contextualized Entity-Centric Linked Data Graphs}},
url = {http://www.edwardcurry.org/publications/Freitas_WebS13.pdf},
year = {2013}
}