DefiningImageAccess/OAIMore
Harvesting OAI-PMH metadata beyond Dublin Core
Motivation
To a large extent, the metadata exposed through the OAI-PMH protocol is descriptive metadata, expressed in metadata formats of varying complexity such as Dublin Core. Applications, however, also need more domain-specific metadata in order to describe their digital objects more richly. For example, as shown in Figure 1, in the SERPENT project (http://serpent.eprints.org/) an image object (such as a piglet squid) can be described by descriptive metadata such as its title, item type and copyright. The same object can also be described with domain-specific metadata, for example the site where the image was taken or the habitat of the organism shown in the image.
It is possible to expose metadata richer than Dublin Core through the OAI-PMH thanks to its notion of parallel metadata formats, which enables repositories to expose metadata about the same resource in multiple metadata formats [1]. In fact, a metadata record in the OAI-PMH can be any data that can be validated against a W3C XML Schema. The OAI-PMH can therefore serve as a medium for incremental, date-sensitive exchange of any form of semi-structured data.
Possible solutions
Given the above example from the Serpent project, there are two possible ways to expose domain-specific metadata through the OAI-PMH protocol: 1) use a (DC) property in Serpent’s current DC metadata records to point to the endpoint of the extra domain-specific metadata; or 2) extend the set of metadata formats supported by the current Serpent OAI implementation.
The first solution is straightforward and easy to adopt, but it has two potential problems:
- It leaves open which property should be used to define the endpoint so that the semantics of the endpoint are expressed. By semantics we mean, for example, that the property links to domain-specific metadata that uses a schema in a particular namespace and was created at a particular time.
- If two sets of domain-specific metadata are to be linked from the descriptive DC metadata, further questions arise, e.g., how the chosen property can distinguish the two sets and how a harvester can choose between them.
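To make the first solution concrete, here is a minimal sketch of how a harvester might follow such a link. It assumes, purely for illustration, that dc:relation is the property chosen to carry the endpoint. The sample record, the example.org URL and the function name linked_endpoints are hypothetical and do not come from the Serpent repository.

```python
import xml.etree.ElementTree as ET

# Namespace of the simple Dublin Core elements used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical oai_dc record; the dc:relation value is an invented endpoint.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Piglet squid</dc:title>
  <dc:type>Image</dc:type>
  <dc:relation>http://example.org/serpent/domain-metadata/123.xml</dc:relation>
</oai_dc:dc>
"""


def linked_endpoints(record_xml, link_property="relation"):
    """Return the text of every DC element assumed to hold an endpoint link."""
    root = ET.fromstring(record_xml)
    tag = "{%s}%s" % (DC_NS, link_property)
    return [el.text.strip() for el in root.findall(tag) if el.text]


if __name__ == "__main__":
    for url in linked_endpoints(SAMPLE_RECORD):
        print(url)
```

The sketch also illustrates the two problems listed above: the returned URL says nothing about the schema or namespace of the metadata behind it, and if a record carried two such links the harvester would have no way to tell them apart.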
The second solution adheres to the OAI-PMH specification by extending the range of metadata formats supported by the current Serpent implementation. Serpent would then expose metadata records not only in the “oai_dc” format but also in any format that can be validated against a W3C XML Schema, for example “soton_serpent”. Based on the practical experience reported in [2, 3], this extension requires the following:
- deposit the domain-specific metadata in the OAI-PMH repository and ensure that it is expressed in XML;
- ensure that all metadata formats available for harvest are included in the response to a ListMetadataFormats request.
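The second requirement can be sketched on the repository side as follows. This is an illustration only, not how the Serpent (EPrints) software actually produces its response: the SUPPORTED_FORMATS list and the function name are assumptions, and the soton_serpent schema and namespace URLs are the ones given on this page.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Assumed registry of every format the repository can disseminate.
SUPPORTED_FORMATS = [
    {
        "prefix": "oai_dc",
        "schema": "http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
        "namespace": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    },
    {
        "prefix": "soton_serpent",
        "schema": "http://archive.serpentproject.com/OAI/2.0/soton_serpent.xsd",
        "namespace": "http://archive.serpentproject.com/OAI/2.0/soton_serpent/",
    },
]


def list_metadata_formats_element(formats):
    """Build the <ListMetadataFormats> element from the format registry."""
    parent = ET.Element("{%s}ListMetadataFormats" % OAI_NS)
    for fmt in formats:
        node = ET.SubElement(parent, "{%s}metadataFormat" % OAI_NS)
        ET.SubElement(node, "{%s}metadataPrefix" % OAI_NS).text = fmt["prefix"]
        ET.SubElement(node, "{%s}schema" % OAI_NS).text = fmt["schema"]
        ET.SubElement(node, "{%s}metadataNamespace" % OAI_NS).text = fmt["namespace"]
    return parent


if __name__ == "__main__":
    element = list_metadata_formats_element(SUPPORTED_FORMATS)
    # ElementTree chooses its own namespace prefix when serialising.
    print(ET.tostring(element, encoding="unicode"))
```

Driving the response from a single registry means that a format added for harvest is advertised automatically, which is the property the second requirement asks for.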
Figure 2 shows how a metadata harvester or aggregator could harvest the domain-specific OAI-PMH records from the Serpent repository once the second solution has been implemented:
- The Harvester sends a ListMetadataFormats request to the repository to obtain the list of supported metadata formats.
- The Harvester then uses one of the returned metadata prefixes, for example soton_serpent, to harvest metadata records from the Serpent repository.
An example response from the repository to the ListMetadataFormats request:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
         http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2007-04-20T12:58:22Z</responseDate>
  <request verb="ListMetadataFormats">http://archive.serpentproject.com/perl/oai2/</request>
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>soton_serpent</metadataPrefix>
      <schema>http://archive.serpentproject.com/OAI/2.0/soton_serpent.xsd</schema>
      <metadataNamespace>http://archive.serpentproject.com/OAI/2.0/soton_serpent/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
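The harvester's side of this exchange can be sketched as follows. The sketch assumes that the response declares the OAI-PMH 2.0 namespace, as the protocol requires, and that records are then fetched with the standard ListRecords verb and its metadataPrefix argument. The function names are hypothetical, and the base URL is the one shown in the request element of the response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def parse_metadata_formats(response_xml):
    """Map each advertised metadataPrefix to its schema and namespace."""
    root = ET.fromstring(response_xml)
    formats = {}
    for fmt in root.iter(OAI + "metadataFormat"):
        prefix = fmt.findtext(OAI + "metadataPrefix", default="").strip()
        formats[prefix] = {
            "schema": fmt.findtext(OAI + "schema", default="").strip(),
            "namespace": fmt.findtext(OAI + "metadataNamespace", default="").strip(),
        }
    return formats


def list_records_url(base_url, metadata_prefix, from_date=None):
    """Build a ListRecords request for one metadata format.

    from_date is optional and enables incremental, date-selective harvesting.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://archive.serpentproject.com/perl/oai2/"
    print(list_records_url(base, "soton_serpent"))
```

A harvester would typically check that the prefix it wants appears in the result of parse_metadata_formats before issuing the ListRecords request, since a repository reports an error for a prefix it does not support.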
A further advantage of this approach is that it helps the Schema Registry Service to map between metadata from different repositories that use different metadata schemas. As shown in Figure 3, the second approach makes the Aggregator service aware of the namespace of the metadata schema used to describe the harvested metadata records. With this information about metadata schemas, the Schema Registry service can perform the mapping for the Aggregator service.
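As a rough illustration of why the namespace matters, the sketch below keys a registry of field mappings on the metadata namespace. Everything in it is hypothetical: the registry structure, the field names (site, habitat) and their target fields are invented for this example and do not describe the actual Schema Registry service.

```python
# Hypothetical registry: metadata namespace -> {source field: common field}.
SCHEMA_REGISTRY = {
    "http://archive.serpentproject.com/OAI/2.0/soton_serpent/": {
        "site": "coverage",
        "habitat": "description",
    },
    "http://www.openarchives.org/OAI/2.0/oai_dc/": {
        "title": "title",
        "rights": "rights",
    },
}


def map_record(namespace, record, registry=SCHEMA_REGISTRY):
    """Translate a harvested record into common fields via its namespace.

    Fields without a mapping are dropped; an unknown namespace is an error.
    """
    if namespace not in registry:
        raise LookupError("no mapping registered for namespace %s" % namespace)
    mapping = registry[namespace]
    mapped = {}
    for field, value in record.items():
        if field in mapping:
            mapped.setdefault(mapping[field], []).append(value)
    return mapped


if __name__ == "__main__":
    ns = "http://archive.serpentproject.com/OAI/2.0/soton_serpent/"
    print(map_record(ns, {"site": "example site", "habitat": "deep sea"}))
```

Without the namespace reported by ListMetadataFormats, the aggregator would have no key with which to look up the right mapping, which is the point made above.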
Related discussion
Problems with using OAI-PMH to serve domain-specific metadata have been raised by Chris Gutteridge at Southampton. See: Meetings/20070416/DefiningImageAccess-ECS-Southampton#ePrints_and_domain_specific_metadata.
We believe, but are not yet certain, that similar concerns will affect DSpace. Question posed at http://lists.ontonet.org/mailman/private/bioimage/2007q2/000664.html.
Discussion with OULS about Fedora strongly suggests that, for normal access, similar concerns will arise with Fedora. See: Meetings/20070503/DefiningImageAccess-SERS-Oxford#Domain_specific_metadata.
References
[1] Herbert Van de Sompel, Michael L. Nelson, Carl Lagoze and Simeon Warner. "Resource Harvesting within the OAI-PMH Framework," D-Lib Magazine, 10(12), December 2004.
[2] Herbert Van de Sompel, Jeff Young and Thom Hickey. "Using the OAI-PMH ... Differently," D-Lib Magazine, 9(7/8), July/August 2003. doi:10.1045/july2003-young
[3] Caroline Arms, Naomi Dushay, Muriel Foulonneau, Kat Hagedorn, Arwen Hutt, Diane Hillmann, Ann Lally, Bill Landis, Clay Redding, Jenn Riley, Sarah Shreeves, Jewel Ward and Simeon Warner. "Best Practices for Shareable Metadata," August 2005.



