Scalable Data Integration

wais-seminar-20171030.mp4
Video
Download (1GB)

Information and data integration aims to provide an integrated view of multiple distributed and heterogeneous sources of information (such as web sites, databases, and peer or sensor data). Through information integration, this scattered data can be combined and queried. This talk addresses the problems of data integration, data exchange/warehousing, and query answering with or without ontologies. We present an algorithm for virtual data integration, where data sources are queried in a distributed way and no centralized repository is materialized. Our algorithm processes queries in the presence of thousands of data sources in under a second. We extend this solution to virtual integration settings where domain knowledge is represented using constraints/ontologies (e.g. OWL2-QL). Subsequently, we examine the Chase algorithm, the main tool for reasoning with constraints in data warehousing, and develop an optimization that runs orders of magnitude faster. We also examine hybrid solutions to data integration, in which materialization/warehousing and virtual data integration are combined to optimize query answering. Finally, we discuss how these approaches can help set up future research directions and outline important applications to data management and analysis over integrated data.
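For readers unfamiliar with the Chase algorithm mentioned in the abstract, the following is a minimal sketch of a naive restricted chase in Python. It is a textbook-style illustration only, not the optimized variant presented in the talk. The fact and rule encoding, the `?`-prefixed variable convention, the `_n` labeled nulls, and the names `match`, `chase`, and `max_rounds` are all assumptions made for this example.

```python
from itertools import count


def is_var(term):
    # Assumed convention: variables are strings starting with "?".
    return isinstance(term, str) and term.startswith("?")


def match(atoms, facts, subst=None):
    """Yield every substitution that maps all atoms onto facts."""
    subst = subst or {}
    if not atoms:
        yield dict(subst)
        return
    (pred, *args), rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != pred or len(fact) != len(args) + 1:
            continue
        s = dict(subst)
        ok = True
        for arg, value in zip(args, fact[1:]):
            if is_var(arg):
                # Bind the variable, or check an existing binding.
                if s.setdefault(arg, value) != value:
                    ok = False
                    break
            elif arg != value:
                ok = False
                break
        if ok:
            yield from match(rest, facts, s)


def chase(facts, rules, max_rounds=100):
    """Apply rules (body, head) until no new facts can be derived."""
    facts = set(facts)
    fresh = count()
    for _ in range(max_rounds):
        new_facts = set()
        for body, head in rules:
            for s in match(body, facts):
                # Restricted chase: skip triggers whose head already holds.
                if next(match(head, facts | new_facts, s), None) is not None:
                    continue
                ext = dict(s)
                for atom in head:
                    for term in atom[1:]:
                        if is_var(term) and term not in ext:
                            # Existential variable: invent a labeled null.
                            ext[term] = f"_n{next(fresh)}"
                for atom in head:
                    new_facts.add((atom[0], *(ext.get(t, t) for t in atom[1:])))
        if not new_facts:
            return facts
        facts |= new_facts
    # The chase need not terminate in general, so the sketch gives up.
    raise RuntimeError("chase did not terminate within max_rounds")


# Hypothetical example: every employee works in some department,
# and anything that is worked in is a department.
facts = {("Employee", "alice"), ("Employee", "bob"), ("WorksIn", "bob", "sales")}
rules = [
    ([("Employee", "?x")], [("WorksIn", "?x", "?d")]),
    ([("WorksIn", "?x", "?d")], [("Dept", "?d")]),
]
result = chase(facts, rules)
```

In this example the first rule invents a placeholder department for `alice` because none is known, while `bob` is left alone since `sales` already satisfies the rule. The round-by-round rematching against all facts is the naive part that makes this sketch slow on large inputs.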
