With increased dependence on the efficient use and inclusion of diverse corporate and Web-based data sources for business information analysis, financial information providers will increasingly need agile information integration capabilities. Linked Data is a set of technologies and best practices that provides this level of agility for information integration, access, and use. Current approaches struggle to cope with the inclusion of multiple data sources in near real-time, and have looked to Semantic Web technologies for assistance with infrastructure access and with handling multiple data formats and their vocabularies. This chapter discusses the challenges of financial data integration, presents a component architecture for Web-enabled financial data integration, and outlines the emergence of a financial ecosystem based on existing Web standards. An introduction to Semantic Web technologies is given, supported by insight and discussion gathered from multiple financial services use case implementations. Finally, best practices for integrating Web data based on the Linked Data principles are described, along with emergent areas.
Information professionals performing business-activity-related investigative analysis must routinely associate data from a diverse range of Web-based general-interest business and financial information sources. XBRL has become an integral part of the financial data landscape. At the same time, Open Data initiatives have contributed relevant financial, economic, and business data to the pool of publicly available information on the Web, but the use of XBRL in combination with Open Data remains at an early stage of realisation. In this paper we argue that Linked Data technology, created for Web-scale information integration, can accommodate XBRL data and make it easier to combine with open datasets. This can provide the foundations for a global data ecosystem of interlinked and interoperable financial and business information, with the potential to leverage XBRL beyond its current regulatory and disclosure role. We outline the uses of Linked Data technologies to facilitate XBRL consumption in conjunction with non-XBRL Open Data, report on current activities, and highlight the remaining challenges in information consolidation faced by both XBRL and Web technologies.
Cloud computing promises significant benefits, including reduced costs, improved service provisioning, and a move to a pay-per-use model. However, there are also many challenges to successfully delivering cloud-based services, including security, data ownership, interoperability, service maturity, and return on investment. These challenges need to be understood and managed before attempting to take advantage of what the cloud has to offer. In this paper we introduce a nine-step cloud life cycle that can be used for both the migration and the ongoing management of public, cloud-based services. The life cycle was developed by a consortium of organizations using an open-innovation approach. This paper describes each step of the life cycle in terms of the key challenges faced and the recommended activities, with resultant outputs, needed to overcome them.
The growing number of datasets published on the Web as linked data brings both opportunities for high data availability and challenges inherent to querying data in a semantically heterogeneous and distributed environment. Approaches used for querying siloed databases fail at Web scale because users do not have an a priori understanding of all the available datasets. This article investigates the main challenges in constructing a query and search solution for linked data and analyzes existing approaches and trends.
Understanding the Maturity of Sustainable ICT. Curry, E., and Donnellan, B. 2012. Green Business Process Management – Towards the Sustainable Enterprise, Springer.
Green and Sustainable Informatics. Curry, E., and Donnellan, B. 2012. Harnessing Green IT: Principles and Practices (in press), John Wiley & Sons, Inc.
Data centres are complex ecosystems that interconnect elements of the ICT, electrical, and mechanical fields of engineering; hence, the efficient operation of a data centre requires a diverse range of knowledge and skills from each of these fields. The Innovation Value Institute (IVI), a consortium of leading organizations from industry, the not-for-profit sector, and academia, has developed a maturity model that offers a comprehensive, value-based method for organizing, evaluating, planning, and improving the energy efficiency of mature data centres. The development process for the maturity model is discussed, detailing the role of design science in its definition.
A Distributional Approach for Terminology-Level Semantic Search on the Linked Data Web. Freitas, A.; Curry, E.; and O'Riain, S. 2012. In 27th ACM Symposium On Applied Computing (SAC 2012), Riva del Garda (Trento), Italy.
Over the past few years there has been a proliferation in the use of sensors within different applications. The increase in the quantity of sensor data makes it difficult for end users to understand situations within the environments where the sensors are deployed. Thus, there is a need for situation assessment mechanisms on top of the sensor networks to assist users in interpreting sensor data when making decisions. However, one of the challenges in realizing such a mechanism is the need to integrate real-time sensor readings with contextual data sources from legacy systems. This paper tackles the data enrichment problem for sensor data. It builds upon Linked Data principles as a valid basis for a unified enrichment infrastructure and proposes a dynamic enrichment approach that treats enrichment as a process driven by situations of interest. The approach is demonstrated through examples and a proof-of-concept prototype based on an energy management use case.
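To make the situation-driven enrichment idea concrete, the following minimal sketch joins a raw sensor reading with contextual data and does so only when a situation of interest fires. All class, field, and threshold names here are illustrative assumptions, not the paper's actual implementation, which builds on Linked Data principles rather than in-memory maps.

```java
import java.util.HashMap;
import java.util.Map;

/** A minimal sketch of situation-driven sensor data enrichment.
 *  All names, values, and the threshold are illustrative assumptions. */
public class EnrichmentSketch {

    /** A raw sensor reading: just an identifier and a value. */
    record Reading(String sensorId, double kilowatts) {}

    /** Contextual data that would come from legacy systems (here, in-memory). */
    static final Map<String, String> SENSOR_LOCATION = Map.of(
            "sensor-42", "Building A / Server Room");
    static final Map<String, String> ROOM_OWNER = Map.of(
            "Building A / Server Room", "IT Operations");

    /** A situation of interest: consumption above a threshold. */
    static boolean situationOfInterest(Reading r) {
        return r.kilowatts() > 5.0;
    }

    /** Enrich only readings that match a situation, linking in context. */
    static Map<String, Object> enrich(Reading r) {
        Map<String, Object> enriched = new HashMap<>();
        enriched.put("sensorId", r.sensorId());
        enriched.put("kilowatts", r.kilowatts());
        String location = SENSOR_LOCATION.get(r.sensorId());
        enriched.put("location", location);              // link to building data
        enriched.put("owner", ROOM_OWNER.get(location)); // link to org data
        return enriched;
    }

    public static void main(String[] args) {
        Reading r = new Reading("sensor-42", 7.3);
        if (situationOfInterest(r)) {
            System.out.println(enrich(r));
        }
    }
}
```

In the Linked Data setting, the two lookup maps would be replaced by dereferenceable URIs linking the sensor to building and organisational datasets.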
The vision of creating a Linked Data Web brings with it the challenge of allowing queries across highly heterogeneous and distributed datasets. In order to query Linked Data on the Web today, end-users need to be aware of which datasets potentially contain the data and also which data model describes these datasets. Allowing users to expressively query relationships in RDF while abstracting them from the underlying data model represents a fundamental problem for Web-scale Linked Data consumption. This article introduces a multidimensional semantic space model which enables data-model-independent natural language queries over RDF data. The core of the approach is the use of a distributional semantic model to provide the level of semantic interpretation demanded by the data-model-independent approach. The final multidimensional semantic space proved to be flexible and precise under real-world query conditions, achieving a mean reciprocal rank of 0.516, average precision of 0.482, and average recall of 0.491.
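The distributional model underlying the semantic space can be illustrated with a small sketch: terms are represented by their co-occurrence patterns and compared geometrically. The vectors and numbers below are invented for illustration; the actual model is built from a large corpus and embedded in the multidimensional space described above.

```java
import java.util.Map;

/** A minimal sketch of distributional term similarity: terms are
 *  represented by co-occurrence counts and compared by cosine.
 *  The toy vectors below are invented for illustration only. */
public class DistributionalSketch {

    /** Cosine similarity between two sparse term vectors. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (var e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) normB += v * v;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Context co-occurrence counts for three terms (illustrative numbers).
        var spouse  = Map.of("marriage", 9.0, "person", 7.0, "law", 2.0);
        var wife    = Map.of("marriage", 8.0, "person", 6.0, "house", 1.0);
        var reactor = Map.of("energy", 9.0, "physics", 5.0);

        System.out.printf("spouse~wife    %.3f%n", cosine(spouse, wife));
        System.out.printf("spouse~reactor %.3f%n", cosine(spouse, reactor));
    }
}
```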
Linked Data brings the promise of incorporating a new dimension to the Web where the availability of Web-scale data can determine a paradigmatic transformation of the Web and its applications. However, together with its opportunities, Linked Data brings inherent challenges in the way users and applications consume the available data. Users consuming Linked Data on the Web, or on corporate intranets, should be able to search and query data spread over a potentially large number of heterogeneous, complex, and distributed datasets. Ideally, a query mechanism for Linked Data should abstract users from the representation of data. This work investigates a vocabulary-independent natural language query mechanism for Linked Data, using an approach based on the combination of entity search, a Wikipedia-based semantic relatedness measure, and spreading activation. The combination of these three elements in a query mechanism for Linked Data is a new contribution in the space. The Wikipedia-based relatedness measure addresses the limitations of existing works that rely on WordNet-based similarity measures and term expansion. Experimental results using the query mechanism to answer 50 natural language queries over DBPedia achieved a mean reciprocal rank of 61.4%, an average precision of 48.7%, and an average recall of 57.2%, answering 70% of the queries.
This paper describes Treo, a natural language query mechanism for Linked Data which focuses on the provision of a precise and scalable semantic matching approach between natural language queries and distributed heterogeneous Linked Datasets. Treo's semantic matching approach combines three key elements: entity search, a Wikipedia-based semantic relatedness measure, and spreading activation search. While entity search allows Treo to cope with queries over high-volume and distributed data, the combination of entity search and spreading activation search using a Wikipedia-based semantic relatedness measure provides a flexible approach for handling the semantic match between natural language queries and Linked Data. Experimental results using the DBPedia QALD training query set showed that this combination represents a promising line of investigation, achieving a mean reciprocal rank of 0.489, precision of 0.395 and recall of 0.451.
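A minimal sketch of the spreading activation element may help. Starting from a pivot entity found by entity search, activation propagates over the data graph, weighted by a relatedness score between query terms and graph labels; paths whose activation falls below a threshold are pruned. The graph, scores, and threshold below are invented for illustration and stand in for Treo's Wikipedia-based measure.

```java
import java.util.*;

/** A minimal sketch of spreading activation over a labelled graph.
 *  Edge weights stand in for semantic relatedness scores; all names
 *  and numbers are illustrative assumptions. */
public class SpreadingActivationSketch {

    record Edge(String target, double relatedness) {}

    public static void main(String[] args) {
        // Toy graph: node -> outgoing edges (relatedness in [0,1]).
        Map<String, List<Edge>> graph = Map.of(
            "Barack_Obama", List.of(new Edge("Michelle_Obama", 0.9),
                                    new Edge("United_States", 0.4)),
            "Michelle_Obama", List.of(new Edge("Princeton_University", 0.7)),
            "United_States", List.of(),
            "Princeton_University", List.of());

        Map<String, Double> activation = new HashMap<>();
        activation.put("Barack_Obama", 1.0);   // pivot entity from entity search

        double threshold = 0.3;                // prune weakly related paths
        Deque<String> frontier = new ArrayDeque<>(List.of("Barack_Obama"));
        while (!frontier.isEmpty()) {
            String node = frontier.poll();
            double a = activation.get(node);
            for (Edge e : graph.getOrDefault(node, List.of())) {
                double propagated = a * e.relatedness();
                if (propagated > threshold
                        && propagated > activation.getOrDefault(e.target(), 0.0)) {
                    activation.put(e.target(), propagated);
                    frontier.add(e.target());
                }
            }
        }
        System.out.println(activation); // nodes ranked by activation level
    }
}
```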
Linked Data promises an unprecedented availability of data on the Web. However, this vision comes together with the associated challenges of querying highly heterogeneous and distributed data. In order to query Linked Data on the Web today, end-users need to be aware of which datasets potentially contain the data and the data model behind these datasets. This query paradigm, deeply attached to the traditional perspective of structured queries over databases, does not suit the heterogeneity and scale of the Web, where it is impractical for data consumers to have an a priori understanding of the structure and location of available datasets. This work describes Treo, a best-effort natural language query mechanism for Linked Data, which focuses on the problem of bridging the semantic gap between end-user natural language queries and Linked Datasets.
The vision of creating a Linked Data Web brings with it the challenge of allowing queries across highly heterogeneous and distributed datasets. In order to query Linked Data on the Web today, end users need to be aware of which datasets potentially contain the data and also which data model describes these datasets. Allowing users to expressively query relationships in RDF while abstracting them from the underlying data model represents a fundamental problem for Web-scale Linked Data consumption. This article introduces a distributional structured semantic space which enables data-model-independent natural language queries over RDF data. The core of the approach is the use of a distributional semantic model to provide the level of semantic interpretation demanded by the data-model-independent approach. The article analyzes the geometric aspects of the proposed space, describing it as a distributional structured vector space built upon the Generalized Vector Space Model (GVSM). The final semantic space proved to be flexible and precise under real-world query conditions, achieving a mean reciprocal rank of 0.516, average precision of 0.482, and average recall of 0.491.
Cloud computing offers competitive advantages to companies through flexible and scalable access to computing resources. More and more companies are moving to cloud environments; therefore, understanding the requirements for this process is both important and beneficial. This paper discusses the requirements for migrating from a traditional computing environment to a cloud hosting environment, examining the cloud supply chain from a lifecycle perspective for the management of the migration project. The paper illustrates the requirements that need to be considered when adopting a cloud migration strategy and the steps to take in order to manage this process.
The integration of sustainable thinking and performance within day-to-day business activities has become an important business need. Sustainable business requires information on the use, flows, and destinies of energy, water, and materials, including waste, along with monetary information on environment-related costs, earnings, and savings. Creating this holistic view of economic, social, and environmental information is not a straightforward task from an IT perspective, and implies tackling several challenges such as information granularity and overload, the different projections of the same factual information, and the heterogeneity of information systems. In this paper, we propose an entity-centric approach to Green Information Systems to assist organisations in forming a cohesive representation of the environmental impact of their business operations at both micro- and macro-levels. Initial results from a Small and Medium-sized Enterprise (SME) case study are discussed along with future research directions.
Researchers estimate that information and communication technology (ICT) is responsible for at least 2 percent of global greenhouse gas (GHG) emissions. Furthermore, in any individual business, ICT is responsible for a much higher percentage of that business's GHG footprint. Yet researchers also estimate that ICT can provide business solutions to reduce its GHG footprint fivefold. However, because the field of sustainable ICT (SICT) is new and evolving, few guidelines and best practices are available. To address this issue, a consortium of leading organizations from industry, the nonprofit sector, and academia has developed and tested a framework for systematically assessing and improving SICT capabilities. The Innovation Value Institute (IVI; http://ivi.nuim.ie) consortium used an open-innovation model of collaboration, engaging academia and industry in scholarly work to create the SICT-Capability Maturity Framework (SICT-CMF), which is discussed in this paper.
The Web is evolving into a complex information space where the unprecedented volume of documents and data will offer information consumers a level of information integration and aggregation that has not been possible until now. Indiscriminate addition of information can, however, come with inherent problems, such as the provision of poor-quality or fraudulent information. Provenance is the cornerstone element that will enable information consumers to assess information quality, and it will play a fundamental role in the continued evolution of the Web. This paper investigates the characteristics and requirements of provenance on the Web, describing how the Open Provenance Model (OPM) can be used as the foundation for W3P, a provenance model and ontology designed to meet the core requirements for the Web.
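To ground the discussion, the sketch below models OPM's core vocabulary: artifact, process, and agent nodes connected by causal edges such as used, wasGeneratedBy, and wasControlledBy. The scenario and identifiers are invented; this illustrates OPM concepts only and is not the W3P ontology itself.

```java
import java.util.ArrayList;
import java.util.List;

/** A minimal sketch of the Open Provenance Model's core vocabulary:
 *  artifacts, processes, and agents connected by causal edges.
 *  Identifiers are invented; this is not the W3P ontology. */
public class OpmSketch {

    enum NodeType { ARTIFACT, PROCESS, AGENT }
    record Node(String id, NodeType type) {}
    /** OPM causal dependencies, e.g. used, wasGeneratedBy, wasControlledBy. */
    record Edge(Node from, String relation, Node to) {}

    public static void main(String[] args) {
        Node report  = new Node("quarterly-report", NodeType.ARTIFACT);
        Node figures = new Node("raw-figures", NodeType.ARTIFACT);
        Node compile = new Node("compile-report", NodeType.PROCESS);
        Node analyst = new Node("analyst-7", NodeType.AGENT);

        List<Edge> graph = new ArrayList<>();
        graph.add(new Edge(compile, "used", figures));
        graph.add(new Edge(report, "wasGeneratedBy", compile));
        graph.add(new Edge(compile, "wasControlledBy", analyst));

        // A consumer can now trace the report back to its sources.
        graph.forEach(e -> System.out.println(
                e.from().id() + " --" + e.relation() + "--> " + e.to().id()));
    }
}
```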
With the increased utilization of data within their operational and strategic processes, enterprises need to ensure data quality and accuracy. Data curation is a process that can ensure the quality of data and its fitness for use. Traditional approaches to curation are struggling with increased data volumes and near real-time demands for curated data. In response, curation teams have turned to community crowd-sourcing and semi-automated metadata tools for assistance. This chapter provides an overview of data curation, discusses the business motivations for curating data, and investigates the role of community-based data curation, focusing on internal communities and pre-competitive data collaborations. The chapter is supported by case studies from Wikipedia, The New York Times, Thomson Reuters, the Protein Data Bank, and ChemSpider, from which best practices for both the social and technical aspects of community-driven data curation are drawn.
Provenance is a cornerstone element in the process of enabling quality assessment for the Web of Data. Applications consuming or generating Linked Data will need to become provenance-aware, i.e., able to capture and consume provenance information associated with the data. This makes provenance a key requirement for a wide spectrum of applications. This work describes Prov4J, a framework which uses Semantic Web tools and standards to address the core challenges in the construction of a generic provenance management system. The work discusses key software engineering aspects of provenance capture and consumption and analyzes the suitability of the framework in a real-world deployment scenario.
Consumers of financial information come in many guises: from personal investors looking for a value-for-money share, to government regulators investigating corporate fraud, to business executives seeking competitive advantage over their competition. While the particular analysis performed by each of these information consumers will vary, they all have to deal with the explosion of information available from multiple sources, including SEC filings, corporate press releases, market press coverage, and expert commentary. Recent economic events have begun to bring sharp focus on the activities and actions of financial markets, institutions, and not least regulatory authorities; calls for enhanced scrutiny will bring increased regulation and information transparency. While extracting information from individual filings is relatively easy to perform when a machine-readable format is utilized (for example, XBRL, the eXtensible Business Reporting Language), cross-comparison of extracted financial information can be problematic, as descriptions and accounting terms vary across companies and jurisdictions. Across multiple sources the problem becomes the classical data integration problem, where a common data abstraction is necessary before functional data use can begin. Within this paper we discuss the challenges of converging financial data from multiple sources. We concentrate on the abstraction, linking, and consolidation activities needed to consolidate data before more sophisticated analysis algorithms can examine it for the objectives of particular information consumers (e.g., competitive analysis, regulatory compliance, or investor analysis). We base our discussion on several years of researching and deploying data integration systems in both Web and enterprise environments.
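As a concrete illustration of the linking and consolidation activities, the sketch below merges facts about the same company from two sources after normalising the company identifier. The normalisation rule, source names, fields, and values are all invented; real systems would use far more robust entity matching.

```java
import java.util.HashMap;
import java.util.Map;

/** A minimal sketch of the linking and consolidation step: records from
 *  two sources are keyed by a normalised identifier and merged.
 *  Names, fields, and values are invented for illustration. */
public class ConsolidationSketch {

    /** Naive normalisation standing in for real entity linking. */
    static String normalise(String name) {
        return name.toLowerCase().replaceAll("[.,]", "")
                   .replaceAll("\\b(inc|corp|ltd)\\b", "").trim();
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> consolidated = new HashMap<>();

        // Source 1: a regulatory filing. Source 2: press coverage.
        record Fact(String company, String field, String value) {}
        Fact[] facts = {
            new Fact("ACME Corp.", "revenue", "12.5M USD"),
            new Fact("Acme, Inc.", "headquarters", "Springfield"),
        };

        for (Fact f : facts) {
            consolidated
                .computeIfAbsent(normalise(f.company()), k -> new HashMap<>())
                .put(f.field(), f.value());
        }
        // Both facts now sit under the same consolidated entity.
        System.out.println(consolidated);
    }
}
```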
Content integration is a key challenge within an organization's Enterprise Content Management (ECM) strategy. In this paper we present the challenges associated with the integration of information sources within an ECM strategy. Content analytics is a viable approach to the integration of structured and unstructured sources. This paper provides an overview of a semantically powered integration approach that discovers relationships between structured and unstructured content. This is achieved through information association using ontology-based entity detection and disambiguation. The commercial potential of the technology is discussed in terms of its business value proposition and the market positioning of potential products and services.
Future self-managing software systems will need to operate in diverse environments with changing requirements. This necessitates flexible system implementations, easily customizable to target domains and associated requirements. An important part of a self-management infrastructure is the self-representation, which models system functionality concerns, allowing their inspection and adaptation. As the range of self-management capabilities expands, the task of creating appropriate self-representations becomes ever more complex. Future self-representations will need to track greater amounts of system information than ever before, and in a way that is flexible, customizable, and portable between system implementations. Meeting these requirements will require a maturing of the design and construction practices for self-representations. The Model-View-Controller design pattern can improve the separation of concerns in a self-representation: it helps encapsulate state, analysis, and realization operations, improving a self-representation's flexibility, customization, and portability.
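A minimal sketch of the idea follows, with invented state and adaptation logic: the model holds observed system state, the view performs analysis over it, and the controller realises adaptations.

```java
/** A minimal sketch of an MVC-structured self-representation:
 *  the model holds system state, the view analyses it, and the
 *  controller realises adaptations. The queue-depth example and
 *  all names are illustrative assumptions. */
public class SelfRepresentationMvc {

    /** Model: the managed system's observed state. */
    static class Model {
        int queueDepth;
    }

    /** View: analysis over the state, e.g. overload detection. */
    static class View {
        boolean overloaded(Model m) { return m.queueDepth > 1000; }
    }

    /** Controller: realisation operations that adapt the system. */
    static class Controller {
        void adapt(Model m, View v) {
            if (v.overloaded(m)) {
                System.out.println("Adapting: adding a consumer.");
                m.queueDepth /= 2; // stand-in for a real reconfiguration
            }
        }
    }

    public static void main(String[] args) {
        Model model = new Model();
        model.queueDepth = 2400;
        new Controller().adapt(model, new View());
        System.out.println("Queue depth now: " + model.queueDepth);
    }
}
```

Because the three concerns are encapsulated separately, a view or controller can be swapped to suit a new target domain without touching the state model.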
Nature-inspired algorithms such as genetic algorithms, particle swarm optimisation and ant colony algorithms have successfully solved computer science problems of search and optimisation. The initial implementations of these techniques focused on static problems solved on single machines. These have been extended by adding parallelisation capabilities in the vein of distributed computing with a centralised master/slave approach. However, the natural systems on which nature-inspired algorithms are based possess many additional characteristics that are of potential benefit within computing environments. In this paper, we discuss the benefits of nature-inspired techniques within modern and emerging computing environments. Software entities within these environments execute and interact in a fashion that is parallel, asynchronous, and decentralised. Given that the natural environment is in itself parallel, asynchronous and decentralised, nature-inspired techniques are an excellent fit for computing environments that exhibit these characteristics. Future research challenges for nature-inspired techniques within emerging computing environments are also discussed.
This dissertation investigates the evolution of coordination techniques between self-managed systems within the problem domain of Message-Oriented Middleware (MOM). The basic goal of autonomic computing is to simplify and automate the management of computing systems, both hardware and software, allowing them to self-manage without the need for human intervention. Within the software domain, self-management techniques have been utilised to empower a system to automatically self-alter (adapt) to meet its environmental and user needs. Current self-managed middleware platforms service their environment in an isolated and introverted manner. As they progress towards autonomic middleware, one of the most interesting research challenges facing self-managed middleware platforms is their lack of cooperation and coordination to achieve mutually beneficial outcomes. The primary hypothesis of this work is that, within dynamic operating environments, coordinated interaction between self-managed systems can improve the ability of the individual and collective systems to fulfil the performance and autonomy requirements of the environment. Coordination between next-generation middleware systems will be a vital mechanism for meeting the challenges of future computing environments. As a step toward this goal, this thesis investigates the benefits of coordination between self-managed middleware platforms. This work explores coordination within the realm of Message-Oriented Middleware. MOM is an ideal candidate for the study of cooperation as it is an interaction-oriented middleware. In addition, self-management techniques have yet to be applied within the MOM domain, providing an opportunity to investigate their application within it. The main findings of this research are as follows. First, the coordination of self-managed systems can improve the ability of the individual and collective systems to fulfil the performance and autonomy requirements of the environment. Second, the introduction of self-management techniques within MOM systems increases their performance within dynamic operating environments.
Message-oriented middleware (MOM) provides an effective integration mechanism for distributed systems, but it must change frequently to adapt to evolving business demands. Content-based routing (CBR) can increase the flexibility of MOM-based deployments. Although centralized CBR improves a messaging solution's maintainability, it limits scalability and robustness. This article proposes an alternative, decentralized approach to CBR that uses a portable rule base to maximize MOM-based deployments' maintainability, scalability, and robustness.
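The sketch below illustrates the portable rule base idea: routing rules are plain data mapping a predicate over message content to a destination, so every node in the deployment can evaluate the same rules locally instead of relying on a central router. Header names and destinations are invented; a genuinely portable rule base would express predicates declaratively so they can be shipped between nodes, rather than as in-process lambdas.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** A minimal sketch of content-based routing with a portable rule base:
 *  rules are data (predicate -> destination) that any node can evaluate
 *  locally. Header names and destinations are illustrative. */
public class CbrSketch {

    record Message(Map<String, String> headers, String body) {}

    public static void main(String[] args) {
        // The rule base: insertion order gives rule priority.
        Map<Predicate<Message>, String> rules = new LinkedHashMap<>();
        rules.put(m -> "order".equals(m.headers().get("type")), "orders.queue");
        rules.put(m -> "invoice".equals(m.headers().get("type")), "billing.queue");

        Message msg = new Message(Map.of("type", "order"), "{...}");

        // Each node runs the same evaluation locally: no central router.
        String destination = rules.entrySet().stream()
                .filter(e -> e.getKey().test(msg))
                .map(Map.Entry::getValue)
                .findFirst()
                .orElse("dead-letter.queue");

        System.out.println("Route to: " + destination);
    }
}
```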
This paper describes a multi-agent system architecture that would permit implementing an established and successful nature-inspired algorithm, Ant Colony System (ACS), in a parallel, asynchronous and decentralised environment. We review ACS, highlighting the obstacles to its implementation in this sort of environment. It is suggested how these obstacles may be overcome using a pheromone infrastructure and some modifications to the original algorithm. The possibilities opened up by this implementation are discussed with reference to an elitist ant strategy. Some related exploratory work is reported.
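For readers unfamiliar with ACS, the sketch below shows its two signature mechanisms on a toy four-city travelling salesman instance: the pseudo-random proportional rule for choosing the next city, and the local pheromone update applied to each traversed edge. All parameter values are illustrative; in the decentralised architecture the paper proposes, the pheromone matrix would live in a shared pheromone infrastructure rather than in a single process.

```java
import java.util.Random;

/** A minimal sketch of Ant Colony System's edge choice and local
 *  pheromone update on a tiny 4-city TSP. Parameters and distances
 *  are illustrative assumptions. */
public class AcsSketch {
    static final double[][] DIST = {
        {0, 2, 9, 10}, {2, 0, 6, 4}, {9, 6, 0, 8}, {10, 4, 8, 0}};
    static final int N = 4;
    static final double BETA = 2, RHO = 0.1, TAU0 = 0.1, Q0 = 0.9;
    static final double[][] tau = new double[N][N];
    static final Random rng = new Random(42);

    /** ACS pseudo-random proportional rule for the next city. */
    static int nextCity(int current, boolean[] visited) {
        int best = -1; double bestScore = -1, total = 0;
        double[] score = new double[N];
        for (int j = 0; j < N; j++) {
            if (visited[j]) continue;
            score[j] = tau[current][j] * Math.pow(1.0 / DIST[current][j], BETA);
            total += score[j];
            if (score[j] > bestScore) { bestScore = score[j]; best = j; }
        }
        if (rng.nextDouble() < Q0) return best;   // exploit the best edge
        double r = rng.nextDouble() * total;      // else explore (roulette)
        for (int j = 0; j < N; j++) {
            if (visited[j]) continue;
            r -= score[j];
            if (r <= 0) return j;
        }
        return best;
    }

    public static void main(String[] args) {
        for (double[] row : tau) java.util.Arrays.fill(row, TAU0);
        boolean[] visited = new boolean[N];
        int current = 0; visited[0] = true;
        for (int step = 1; step < N; step++) {
            int next = nextCity(current, visited);
            // ACS local update: evaporate the used edge towards tau0.
            tau[current][next] = (1 - RHO) * tau[current][next] + RHO * TAU0;
            visited[next] = true;
            System.out.println("Move " + current + " -> " + next);
            current = next;
        }
    }
}
```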
This paper motivates research into implementing nature-inspired algorithms in decentralised, asynchronous and parallel environments. These characteristics typify environments such as Peer-To-Peer systems, the Grid and autonomic computing, which demand robustness, decentralisation, parallelism, asynchronicity and self-organisation. Nature-inspired systems promise these properties. However, current implementations of nature-inspired systems are only loosely based on their natural counterparts. They are generally implemented as synchronous, sequential, centralised algorithms that loop through passive data structures. For their successes to be relevant to the aforementioned new computing environments, variants of these algorithms must work in truly decentralised, parallel and asynchronous Multi-Agent System (MAS) environments. A general methodology is presented for engineering the transfer of nature-inspired algorithms to such a MAS framework. The concept of pheromone infrastructures is reviewed in light of emerging standards for agent platform architecture and interoperability. These ideas are illustrated using a particularly successful nature-inspired algorithm, Ant Colony System for the Travelling Salesman Problem.
As the deployment of self-managed reflective middleware platforms increases, the process of collecting and examining information used within the reflective process becomes ever more complex. The quality of such information is vital to ensure the successful outcome of the self-management process. However, the cost associated with the collection of this information plays a major role in influencing the success of a self-managed system. Within typical deployment environments it is not uncommon for multiple self-managed systems to be deployed, each collecting information for use within their respective reflective computations. In many cases, these systems will collect the same information, replicating the effort required to retrieve the information. Such replication could be avoided by sharing information between systems to reduce the overall cost of collection within the deployment environments. Current self-managed systems lack adequate support for information collection and sharing. This work proposes the use of an independent information service to assist in the collection and management of information within self-managed middleware systems.
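The sketch below illustrates the proposed direction under invented names: an information service caches collected values so that the first requester pays the collection cost and subsequent self-managed systems reuse the sample while it remains fresh. The freshness policy is an assumption for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** A minimal sketch of a shared information service: the first system
 *  to request a metric pays the collection cost; later requests within
 *  the freshness window are served from the shared cache. All names
 *  and the freshness policy are illustrative assumptions. */
public class InfoServiceSketch {

    record Sample(Object value, long collectedAt) {}

    private final Map<String, Sample> cache = new ConcurrentHashMap<>();
    private final long freshnessMillis;

    InfoServiceSketch(long freshnessMillis) { this.freshnessMillis = freshnessMillis; }

    /** Return a cached value if fresh, otherwise collect and share it. */
    Object get(String metric, Supplier<Object> collector) {
        Sample s = cache.get(metric);
        long now = System.currentTimeMillis();
        if (s != null && now - s.collectedAt() < freshnessMillis) {
            return s.value();                   // shared, no re-collection
        }
        Object value = collector.get();         // expensive collection
        cache.put(metric, new Sample(value, now));
        return value;
    }

    public static void main(String[] args) {
        InfoServiceSketch service = new InfoServiceSketch(5_000);
        // Two self-managed systems asking for the same metric:
        System.out.println(service.get("cpu.load", () -> {
            System.out.println("collecting..."); return 0.72; }));
        System.out.println(service.get("cpu.load", () -> {
            System.out.println("collecting..."); return 0.72; }));
        // "collecting..." prints once: the second call reuses the sample.
    }
}
```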
Varieties of Message-Oriented Middleware (MOM) platforms are available, each with their own proprietary functionality to solve specific messaging challenges. At present, it is not possible to mix and match these proprietary features into a customized MOM solution. A number of patterns have been identified that allow a software system's implementation to be more flexible and extensible. This work investigates the use of one such pattern, the POSA Interceptor pattern, in the construction of a MOM framework that is easily customised and extended in a structured way. This framework, Chameleon, is designed to support the use of interceptors (message handlers) with a MOM platform to facilitate dynamic changes to the behaviour of the deployed platform. The framework also allows interceptors to be used on both the client side and the server side, permitting advanced functionality to be deployed to the client, and enabling co-operation between client-side and server-side interceptors.
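The following sketch conveys the interceptor idea in the spirit of the POSA pattern: handlers registered along the message path are invoked in order and may observe or transform each message. This is not the Chameleon API; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** A minimal sketch of message interceptors in the spirit of the POSA
 *  Interceptor pattern: registered handlers are invoked in order along
 *  the message path and may transform the message. All names here are
 *  illustrative assumptions. */
public class InterceptorSketch {

    interface MessageInterceptor {
        String onMessage(String message);   // may observe or rewrite
    }

    static class Dispatcher {
        private final List<MessageInterceptor> chain = new ArrayList<>();
        void register(MessageInterceptor i) { chain.add(i); }
        String dispatch(String message) {
            for (MessageInterceptor i : chain) message = i.onMessage(message);
            return message;
        }
    }

    public static void main(String[] args) {
        Dispatcher clientSide = new Dispatcher();
        clientSide.register(m -> { System.out.println("log: " + m); return m; });
        clientSide.register(m -> m.toUpperCase()); // stand-in for encryption etc.

        // Interceptors can be registered or removed at runtime to change
        // the deployed platform's behaviour without redeployment.
        String wireMessage = clientSide.dispatch("stock.tick acme 12.5");
        System.out.println("sent: " + wireMessage);
    }
}
```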
A prerequisite of participating in an enterprise system is the ability to cope with the rigorous demands experienced within the system. In order to cope with these demands, a number of infrastructure support services are available to assist developers. A key obstacle to the widespread deployment of agent technology is its relative immaturity with regard to such infrastructure. This paper presents a solution to the problem by offering enterprise-level infrastructure services to agent platforms in an agent-friendly manner. The proposed solution uses Service-Agent Gateways (SAGs) to offer these services within an agent environment. This paper describes the SAG design pattern and presents an implementation of the pattern that offers the functionality of Enterprise Message Services (EMS) to an agent environment. The Java Message Service (JMS)-Agent Gateway enhances the acceptability of agent platforms within business environments, moving them a step closer to full-scale participation in the digital enterprise.
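A minimal sketch of the gateway idea, with both interfaces invented for illustration: the gateway adapts an enterprise messaging interface to an agent-oriented one, here producing a FIPA-ACL-style inform string.

```java
/** A minimal sketch of the Service-Agent Gateway idea: the gateway
 *  adapts an enterprise service's interface to a message style agents
 *  understand. Both interfaces below are invented for illustration;
 *  in practice the enterprise side would wrap JMS. */
public class SagSketch {

    /** The enterprise-side service. */
    interface EnterpriseMessageService {
        void publish(String topic, String payload);
    }

    /** The agent-side view: FIPA-ACL-style inform messages. */
    interface AgentMessenger {
        void inform(String agentTopic, String content);
    }

    /** The gateway translates between the two worlds. */
    static AgentMessenger gateway(EnterpriseMessageService ems) {
        return (topic, content) ->
            ems.publish("agents." + topic, "(inform :content " + content + ")");
    }

    public static void main(String[] args) {
        EnterpriseMessageService ems =
            (topic, payload) -> System.out.println(topic + " <- " + payload);
        AgentMessenger messenger = gateway(ems);
        messenger.inform("market.updates", "\"price rose\"");
    }
}
```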
With the development of numerous adaptive and reflective middleware platforms, inter-platform interoperability is a desirable next step. At present, little or no interoperability is possible at the meta-layer of reflective middleware. The emergence of an open standard for meta-layer interaction is imperative to support the development of next-generation middleware platforms that can express their needs and capabilities to the platforms with which they interact. In this paper, we describe the foundations of the ARMAdA interaction standard for adaptive and reflective middleware platforms.
This chapter provides an introduction to Message-Oriented Middleware (MOM), covering topics including message queues, messaging models, common MOM services, the Java Message Service (JMS), and Service-Oriented Architectures (SOA).
Introducing Reflective Techniques to Message Hierarchies. Curry, E. 2003. In Doctoral Symposium at 17th European Conference on Object-Oriented Programming (ECOOP 2003), Darmstadt, Germany.
Could Message Hierarchies Contemplate? Curry, E., and Lyons, G. 2003. In 17th European Conference on Object-Oriented Programming (ECOOP 2003) (Poster), Darmstadt, Germany.
A prerequisite of joining an enterprise system is the ability to cope with the rigorous demands experienced within such systems. One of the most fundamental of these demands is the requirement for enterprise-level systems to have guaranteed reliable messaging between the participants of the system. Our research involves integrating an agent platform with an enterprise messaging service. This first step in combining agent technology with a mainstream messaging service is vital to the participation of agent systems within the digital enterprise. This paper introduces a new Message Transport Protocol (MTP) for the Java Agent DEvelopment (JADE) platform. The new protocol uses the Java Message Service (JMS) to deliver inter-platform communication between agent platforms. The paper provides a brief overview of the design of this new MTP, evaluates its performance, and examines its benefits in comparison to the other available MTPs, before concluding with plans for the further development of the MTP.
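The sketch below shows the JMS primitives such a transport builds on: a queue per destination platform, a producer for sending, and an asynchronous listener for delivery. This is not JADE's MTP code; it is shown as a method parameterised by the provider's ConnectionFactory (typically obtained via JNDI) since running it requires a concrete JMS implementation, and the queue name and payload are invented.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

/** A minimal sketch of the JMS primitives a JMS-based MTP builds on.
 *  Queue name and message payload are illustrative assumptions. */
public class JmsTransportSketch {

    static void run(ConnectionFactory factory) throws JMSException {
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue inbox = session.createQueue("jade.platform.B.inbox");

        // Receiving side: deliver incoming envelopes to the local platform.
        MessageConsumer consumer = session.createConsumer(inbox);
        consumer.setMessageListener(message -> {
            try {
                System.out.println("received: " + ((TextMessage) message).getText());
            } catch (JMSException e) {
                e.printStackTrace();
            }
        });
        connection.start();

        // Sending side: wrap an ACL message and send it to the remote inbox.
        MessageProducer producer = session.createProducer(inbox);
        producer.send(session.createTextMessage("(inform :content \"hello\")"));
    }
}
```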
Hierarchical channel structures are used to create granular sub-channels from a parent channel. Utilized in routing situations that are more or less static, they require that the channel namespace schema be both well defined and universally understood. The publish/subscribe messaging model currently requires a message publisher to place messages into a specific channel within the hierarchy. Relocating responsibility for channel selection logic from the publishing client to the middleware service would open up static channel hierarchies to the application of reflective techniques. This shift in responsibilities gives the service more control over the definition, creation, and maintenance of the channel hierarchy. The service is then able to apply reflective and adaptive techniques to dynamically adapt, grow, and improve the hierarchy to better meet the needs of its changing environment and operating conditions. This paper describes work-in-progress on the definition of reflective channel hierarchies.
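A minimal sketch of the relocated responsibility: the publisher describes a message with properties and never names a channel; the middleware maps the properties onto the hierarchy and, because the mapping lives in the service, can change it at runtime. The hierarchy scheme and property names are invented examples.

```java
import java.util.Map;

/** A minimal sketch of middleware-side channel selection: the publisher
 *  supplies message properties, and the service, not the client, maps
 *  them onto a channel in the hierarchy. The scheme is illustrative. */
public class ChannelSelectionSketch {

    /** The service's current mapping from properties to a hierarchical
     *  channel. Because this logic lives in the middleware, it can be
     *  changed at runtime to grow or restructure the hierarchy. */
    static String selectChannel(Map<String, String> properties) {
        return "market/" + properties.getOrDefault("region", "global")
             + "/" + properties.getOrDefault("instrument", "misc");
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("region", "eu", "instrument", "equities");
        // The publisher never names a channel; it only describes the message.
        System.out.println("published to: " + selectChannel(props));
        // -> published to: market/eu/equities
    }
}
```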