DefiningImageAccess/RepositorySurvey

From ImageWeb

Repository data to survey

Cambridge DSpace

Meetings/20070122/Dspace@Cambridge - meeting notes

Collections

Survey Summary

  • Software system: DSpace
  • Repository URL: http://www.dspace.cam.ac.uk/
  • Image collections surveyed:
    • Videos of anthropologist interviews
    • Photographs of Royal Commonwealth Society Photograph Project
    • Images of Kilise Tepe Project

These three collections were surveyed because the repository provider recommended them as well-annotated image collections. All three are annotated using only Dublin Core (DC) metadata, although the DC elements used vary slightly from collection to collection, as the following table shows.

Dublin Core element | Anthropologist videos | Commonwealth photographs | Kilise Tepe images (open archive) | Kilise Tepe images (dark archive)
dc:creator | “Lastname, Firstname”; more than one creator is possible | “Lastname, Firstname, birthdate – deathdate” | “Initial Lastname” | “Initial Lastname”
dc:contributor | N/A | Yes | N/A | N/A
dc:coverage | N/A | expressed using the Getty vocabulary | describes both spatial information (by the name of the place) and temporal information | Same as the open archive
dc:date | expressed in the ISO 8601 date/time format, as YYYY-MM-DDThh:mm:ssTZD; more than one dc:date value is associated with each image object | Same as anthropologist videos | each image object is associated with date information, possibly in more than one format and with distinct values | Same as the open archive
dc:description | more than one dc:description value is associated with the same image object, each with different content | Yes | Yes | Yes
dc:identifier | N/A | Yes | N/A | N/A
dc:format | both the size and the physical medium of the image objects are described (e.g. 421699095 bytes or application/octet-stream) | the format of the image, the camera used to take the image, and the conditions under which the image was taken | both the size and the physical medium of the image objects are described | Same as the open archive
dc:language | e.g. en_GB or en_US, en, es | Yes | Yes | Yes
dc:publisher | N/A | Yes | Yes | Yes
dc:relation | N/A | the related resource | Yes | Yes
dc:rights | N/A | e.g. “Copyright Cambridge University Library” | both the owner of the image object and the status of the image (open or closed) | Same as the open archive
dc:source | N/A | the resource from which the described resource is derived | N/A | N/A
dc:subject | each image has multiple subject descriptions, e.g. Himalaya, interview, Nepal, Napa, and fieldwork, but the expressions are inconsistent | Yes | N/A | N/A
dc:title | Free text description | Yes | Yes | Yes
dc:type | e.g. “Video”, “Working Paper”, “Recording,oral”, etc. | Yes | e.g. Table, Data, BW Image, Charts, Chart Data, Tables, Map, Drawn Image, Data Sheet | BW image, Colour image, Drawn image, data sheet

Observations

  • The coverage of the metadata provided by each collection varies: The Commonwealth photographs appear to have the richest metadata of the three surveyed collections, and the anthropologist videos the least.
  • The quality of the metadata provided by each collection varies: For example, the metadata for the anthropologist videos are mostly expressed in a consistent format, while the dc:date values provided by the Kilise Tepe project are expressed in varied formats, with nothing to distinguish the multiple values associated with the same image.
  • A controlled vocabulary is not always used: In the Commonwealth photographs collection, dc:coverage is expressed using Getty terms. This not only keeps the metadata consistent but also makes it possible to integrate this dataset with the many others that are described using the standard Getty terms. However, this practice is not applied to most of the metadata published by these three collections.
  • Inconsistent metadata representation formats are used within one institutional repository: For example, the dc:creator element is expressed differently by each image collection, even though all of them are provided by the same institution.
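
The last observation can be made concrete with a small sketch. The Python below is only an illustration, assuming the three dc:creator layouts listed in the table (“Lastname, Firstname”, “Lastname, Firstname, birthdate – deathdate” and “Initial Lastname”); the names `Creator` and `parse_creator` and the sample names are hypothetical and are not taken from the surveyed repository.

```python
from typing import NamedTuple, Optional


class Creator(NamedTuple):
    family: str              # family name
    given: str               # given name or initial(s), as found in the record
    lifespan: Optional[str]  # "birthdate – deathdate" text, if present


def parse_creator(value: str) -> Creator:
    """Parse one dc:creator string in any of the three surveyed layouts."""
    # Some values carry a trailing comma; drop it before splitting.
    text = value.strip().rstrip(",").strip()
    parts = [p.strip() for p in text.split(",")]
    if len(parts) >= 2:
        # "Lastname, Firstname", optionally followed by "birthdate – deathdate"
        lifespan = ", ".join(parts[2:]) or None
        return Creator(parts[0], parts[1], lifespan)
    # "Initial Lastname": the last word is taken as the family name
    tokens = text.split()
    if len(tokens) >= 2:
        return Creator(tokens[-1], " ".join(tokens[:-1]), None)
    return Creator(text, "", None)


if __name__ == "__main__":
    # Hypothetical sample values, one per layout
    for raw in ["Smith, Jane", "Smith, Jane, 1850 – 1920", "J. Smith,"]:
        print(parse_creator(raw))
```

A consumer that wants to search all three collections by creator would need a normalisation step of this kind, because the repository itself does not enforce one layout.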

Southampton

  • Meeting notes
    • The Serpent repository is currently undergoing curation, and not all of its metadata are available through the OAI-PMH protocol.
    • Tim has agreed to help us export the metadata from the Access database in XML format and expose them through the OAI-PMH protocol.
    • Other images held at Southampton, and their metadata, can be found through the ROAR registry (see http://roar.eprints.org/?action=search&query=soton&sa=Search)
    • More than one image can be uploaded into one OAI record; for example, the four piglet squid images share one OAI identifier as a single record (http://archive.serpentproject.com/160/)
    • The domain-specific metadata schema for describing the images that are accessible through the Serpent web site has to be obtained from the Access database directly.

Survey Summary

Observation of Southampton EPrints

The image collections cannot easily be retrieved through the official Southampton EPrints web site (http://eprints.soton.ac.uk/), but they can be found through the ROAR project (http://roar.eprints.org/?action=search&query=soton&sa=Search), which provides a profile for each of its registered repositories. A small collection of image and video files, including JPEG, TIFF, EIFF, etc., was identified through ROAR’s repository profile web page:

  • ~37 JPEG images
  • ~3 TIFF images
  • ~2 MPEG videos

Most of these images are provided by the School of Art and are described using DC metadata, including dc:title, dc:creator, dc:subject, dc:description, dc:publisher, dc:date, dc:type, dc:identifier, dc:format, and dc:relation.

Image collections from the Serpent project

Because only domain-independent DC metadata have been published by both the Cambridge and Southampton institutional repositories, we wanted to learn what domain-specific metadata are published for image collections. The image metadata from the Southampton Serpent project were studied for this purpose. Another reason for studying the Serpent repository is that it is set up using the EPrints software system, the same as the Southampton institutional repository. Face-to-face discussions with the Southampton EPrints developers revealed that it is possible to publish images using EPrints and to publish domain-specific metadata along with them, although the latter requires some extra effort from the repository administrator. We therefore reviewed the metadata published in the Serpent EPrints repository in order to understand to what extent domain-specific metadata can be provided using an existing repository software system.

This review showed that both DC metadata and rich domain-specific metadata are published for the image collections. The DC metadata published in Serpent include: dc:title, dc:coverage, dc:rights, dc:subject, dc:description, dc:publisher, dc:date, dc:type, dc:identifier, dc:format, and dc:relation. The domain-specific metadata in Serpent describe the following aspects: the item type, classification, behaviour, site, site description, depth, latitude, longitude, countries, habitat, etc.
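
One way to compare which DC elements a repository actually publishes is to tally the element names in its oai_dc records. The following Python is a minimal sketch under that assumption; the function name `tally_dc_elements` is hypothetical, and the sample record and its values are invented for illustration rather than copied from Serpent.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of the Dublin Core element set used inside oai_dc records
DC_NS = "http://purl.org/dc/elements/1.1/"

# Invented sample record; real records would come from the repository
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example image title</dc:title>
  <dc:subject>example subject one</dc:subject>
  <dc:subject>example subject two</dc:subject>
  <dc:coverage>example place name</dc:coverage>
</oai_dc:dc>
"""


def tally_dc_elements(xml_text: str) -> Counter:
    """Count how often each dc:* element occurs in an XML document."""
    counts = Counter()
    prefix = "{" + DC_NS + "}"
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.startswith(prefix):
            counts["dc:" + elem.tag[len(prefix):]] += 1
    return counts


if __name__ == "__main__":
    for name, count in sorted(tally_dc_elements(SAMPLE_RECORD).items()):
        print(name, count)
```

Because the function walks the whole tree, it can also be given a complete ListRecords response, in which case the counts are totals over all records on that page.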


OAI-PMH session notes

C:\Documents and Settings\Graham>curl "http://archive.serpentproject.com/perl/oai2?verb=ListRecords\&metadataPrefix=oai_dc"  >a1.tmp
  % Total    % Received % Xferd  Average Speed          Time             Curr.
                                 Dload  Upload Total    Current  Left    Speed
100   520    0   520    0     0  16774      0 --:--:--  0:00:00 --:--:--     0

C:\Documents and Settings\Graham>curl "http://archive.serpentproject.com/perl/oai2?verb=ListRecords&metadataPrefix=oai_dc"  >a1.tmp
  % Total    % Received % Xferd  Average Speed          Time             Curr.
                                 Dload  Upload Total    Current  Left    Speed
100 99420    0 99420    0     0  36972      0 --:--:--  0:00:02 --:--:-- 59963

C:\Documents and Settings\Graham>curl "http://archive.serpentproject.com/perl/oai2?verb=ListRecords&resumptionToken=archive/100/10791589/oai_dc"  >a2.tmp
  % Total    % Received % Xferd  Average Speed          Time             Curr.
                                 Dload  Upload Total    Current  Left    Speed
100   98k    0   98k    0     0  37666      0 --:--:--  0:00:02 --:--:-- 61330

C:\Documents and Settings\Graham>curl "http://archive.serpentproject.com/perl/oai2?verb=ListRecords&resumptionToken=archive/200/10791589/oai_dc"  >a3.tmp
  % Total    % Received % Xferd  Average Speed          Time             Curr.
                                 Dload  Upload Total    Current  Left    Speed
100   99k    0   99k    0     0  36178      0 --:--:--  0:00:02 --:--:-- 56923
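
The session above pages through the ListRecords response by hand, passing the resumptionToken from each response into the next request. The same loop can be sketched in Python. This is an illustration only: the function names `fetch_url` and `list_records` are hypothetical, and the sketch assumes the repository returns standard OAI-PMH 2.0 XML.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

# Namespace of OAI-PMH 2.0 response elements, in ElementTree notation
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_url(url: str) -> bytes:
    """Default fetcher: plain HTTP GET."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def list_records(base_url: str, metadata_prefix: str = "oai_dc",
                 fetch: Callable[[str], bytes] = fetch_url) -> Iterator[ET.Element]:
    """Yield every <record> element, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        page = ET.fromstring(fetch(url))
        error = page.find(OAI_NS + "error")
        if error is not None:
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        yield from page.iter(OAI_NS + "record")
        token = page.find(".//" + OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no token, or an empty one, marks the last page
        # A follow-up request carries only the verb and the token
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


# Usage (performs network requests, so it is left commented out):
# for record in list_records("http://archive.serpentproject.com/perl/oai2"):
#     print(record.find(OAI_NS + "header/" + OAI_NS + "identifier").text)
```

The fetcher is passed in as a parameter so that the paging logic can be exercised against canned responses without contacting the repository.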

Imperial College

Repositories, repository projects and imaging projects:

Microscopy and fluorescence imaging - http://www3.imperial.ac.uk/bioengineering/research/biologicalandmedicalimaging/microscopyandfluorescenceimaging

It is notable that nothing here really constitutes what one might consider to be an image repository, despite there being a lot of research that is fundamentally dependent on capturing and analyzing images.

From the project kick-off meeting (notes here), we heard that IC has sizeable image holdings in:

  • Neuroscience - NeuroGrid project
  • Medical imaging
  • Environmental science
  • Insect database and trees
  • Geo and earth sciences
  • Plate tectonics and cracks for oil drilling
  • Astronomy

Oxford

Others
