DefiningImageAccess/Resource/SchemaAlignment
Schema Alignment Work
Work on schema and data alignment has been under way for many years in the relational database community (cf. Doan and Halevy, below). More recently, similar needs have appeared in a slightly different form: reconciling disparate Semantic Web schemas and ontologies.
Papers
- http://pages.cs.wisc.edu/~anhai/papers/si-survey-db-community.pdf - Doan and Halevy, Semantic Integration Research in the Database Community: A Brief Survey. I found this to be a useful background paper, covering some basic problems and techniques of semantic schema alignment.
- http://www.ics.forth.gr/isl/publications/paperlink/Mapping_TR385_December06.pdf - Haridimos Kondylakis, Martin Doerr and Dimitris Plexousakis, Mapping Language for Information Integration. Describes a mapping language for schema alignment that captures a number of commonly needed alignment patterns.
- http://drops.dagstuhl.de/opus/volltexte/2005/48/pdf/04391.StuckenschmidtHeiner1.Paper.48.pdf - Noy and Stuckenschmidt, Ontology Alignment: An annotated Bibliography. This survey appears to have greater emphasis on schema alignment work for the Semantic Web.
- http://portal.acm.org/citation.cfm?id=1188913.1188916 - Huimin Zhao, Semantic Matching Across Heterogeneous Data Sources, CACM January 2007. This article is oriented toward databases rather than the Semantic Web, and emphasizes the approach of dealing with schema alignment separately from instance alignment.
Links
- http://pages.cs.wisc.edu/~anhai/projects/schema-matching.html - Schema & Ontology Matching page by AnHai Doan.
- http://www.ics.forth.gr/isl/publications/by_year.jsp?Year_of_publication=2006 - references a number of other papers by Martin Doerr that are not online, and which I have not been able to review.
- Meetings/20070426/MartinDoerr - notes from a telephone conversation with Martin Doerr, with allusions to unpublished materials.
- DefiningImageAccess/Standard/CIDOC - notes about CIDOC CRM Core, which seems to have promise as a basis for aligning a range of observation- and media-related metadata standards.
Commentary
We make much of separating schema alignment from coreference without being entirely clear about what we mean by this. The distinction is not always sharp but, roughly, it is the difference between aligning ontologies or schemas and detecting references to the same object or instance in different data sources.
Work on relational database alignment has an easier time of it: the distinction between schema and table data is clear. This is discussed in a recent CACM article: Semantic Matching Across Heterogeneous Data Sources, Huimin Zhao, January 2007 (http://portal.acm.org/citation.cfm?id=1188913.1188916).
For Semantic Web data, the distinction is sometimes less clear, because it can be uncertain whether a term denotes a class (an ontological or schema element) or an instance; see, e.g., http://www.w3.org/TR/swbp-classes-as-values/. Some examples:
- animals vs plants - these would usually be recognized as classes of objects
- Dolly the sheep - an instance of a sheep
- Hydrogen - the class of hydrogen atoms, but statements about Hydrogen might treat it as an instance.
- A strain, or genetic line, of Drosophila - a subclass of all Drosophila, but gene expression observations would likely refer to it as an instance.
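The class-or-instance ambiguity in these examples can be made concrete with a small sketch. It assumes data held as plain (subject, predicate, object) tuples; the `ex:` terms and the `dual_use_terms` helper are hypothetical names invented for illustration, and the test it applies (a term is used both as the object of `rdf:type` and in some other statement) is only a rough heuristic, not an established method.

```python
# A rough heuristic, not a definitive test: flag terms that are used both as a
# class (object of rdf:type) and as an ordinary subject or object elsewhere.
# All ex: identifiers below are made up for illustration.

RDF_TYPE = "rdf:type"

triples = [
    ("ex:Dolly", RDF_TYPE, "ex:Sheep"),         # Dolly is an instance of Sheep
    ("ex:atom42", RDF_TYPE, "ex:Hydrogen"),     # Hydrogen used as a class
    ("ex:Hydrogen", "ex:atomicNumber", "1"),    # Hydrogen used as an instance
    ("ex:fly7", RDF_TYPE, "ex:StrainX"),        # the strain used as a class
    ("ex:obs1", "ex:observedIn", "ex:StrainX"), # the strain used as an instance
]


def dual_use_terms(triples):
    """Return terms that appear both as a class and in instance position."""
    used_as_class = {o for s, p, o in triples if p == RDF_TYPE}
    used_as_instance = set()
    for s, p, o in triples:
        if p != RDF_TYPE:
            used_as_instance.update((s, o))
    return used_as_class & used_as_instance


ambiguous = dual_use_terms(triples)
print(sorted(ambiguous))
```

In this sample, Hydrogen and the strain are flagged, while Sheep appears only as a class and Dolly only as an instance.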
On reflection, I think the initial distinction may be fairly easy: if a term appears in instance data, then recognizing that different terms in different instance data mean the same thing is a coreference problem. If different instance stores describe the same attributes using different schematic structures, then schema alignment is needed. This distinction is not unambiguous, but I think it serves as a starting point.
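As a minimal sketch of this starting-point distinction, consider two instance stores describing the same image. Everything here is an assumption made for illustration: the record layouts, the field names, the `person:darwin` identifier, and the hand-written mapping tables. In practice, discovering such mappings is the hard part; the sketch only shows that the two problems are handled by different kinds of mapping.

```python
# Two hypothetical instance stores describing the same image.
record_a = {"creator": "Charles Darwin", "title": "Finch sketch"}
record_b = {"author": {"name": "Darwin, Charles"}, "caption": "Finch sketch"}

# Schema alignment: the same attributes live in different structures, so each
# store gets a map from a common attribute name to a path into its records.
SCHEMA_MAP_A = {"creator": ("creator",), "title": ("title",)}
SCHEMA_MAP_B = {"creator": ("author", "name"), "title": ("caption",)}

# Coreference: different terms in the instance data denote the same individual,
# so instance terms are mapped to one shared (made-up) identifier.
COREF = {
    "Charles Darwin": "person:darwin",
    "Darwin, Charles": "person:darwin",
}


def get_path(record, path):
    """Follow a sequence of keys into a nested dictionary."""
    value = record
    for key in path:
        value = value[key]
    return value


def align_schema(record, schema_map):
    """Rewrite a record into the common attribute names."""
    return {attr: get_path(record, path) for attr, path in schema_map.items()}


def resolve_coreference(record, attr, coref):
    """Replace an instance term with its shared identifier, if one is known."""
    resolved = dict(record)
    resolved[attr] = coref.get(record[attr], record[attr])
    return resolved


aligned_a = resolve_coreference(align_schema(record_a, SCHEMA_MAP_A), "creator", COREF)
aligned_b = resolve_coreference(align_schema(record_b, SCHEMA_MAP_B), "creator", COREF)
print(aligned_a == aligned_b)
```

Schema alignment alone leaves the two creator values as different strings; only after the coreference step do the two records compare equal.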

