DefiningImageAccess/RepositorySurvey
Repository data to survey
- A summary of the progress of the current survey can be seen at DefiningImageAccess/Repository
- http://www.openarchives.org/OAI/openarchivesprotocol.html - OAI-PMH spec
Cambridge DSpace
Meetings/20070122/Dspace@Cambridge - meeting notes
- http://www.lib.cam.ac.uk/dspace/
- http://www.dspace.cam.ac.uk/
- http://www.dspace.cam.ac.uk/handle/1810/23
Collections
- OAI-PMH URIs (found by Googling for "cambridge dspace oai"):
- Anthropological ancestors set -- videos of anthropologist interviews (rich metadata, videos) hdl_1810_25:
- DefiningImageAccess/Repository/Cambridge/Anthropological ancestors set
- http://www.dspace.cam.ac.uk/dspace-oai/request?verb=ListIdentifiers&metadataPrefix=oai_dc&set=hdl_1810_25
- http://www.dspace.cam.ac.uk/dspace-oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.dspace.cam.ac.uk:1810/28
- http://www.dspace.cam.ac.uk/handle/1810/28
- Note the relationship between the OAI identifier in a specific record's metadata and the .../handle/... URI used to retrieve the content.
- Royal Commonwealth Society Photograph Project hdl_1810_752:
- DefiningImageAccess/Repository/Cambridge/Royal Commonwealth Society Photograph Project
- with modified DC
- Names conform to the National Council on Archives' Rules for the construction of personal, place and corporate names.
- Location names have been taken from the Getty Thesaurus of Geographic Names.
- Kilise Tepe Project images
- DefiningImageAccess/Repository/Cambridge/Kilise Tepe Project images
- hdl_1810_31289 --- Kilise Tepe
- hdl_1810_33157 --- Kilise Tepe (dark archive)
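The relationship noted above between an OAI identifier and its .../handle/... URI can be sketched as follows. This is a minimal illustration based only on the two example URIs listed for record 1810/28; the function names are hypothetical, and it is an assumption that every record in the repository follows the same pattern.

```python
from urllib.parse import urlencode

# Base URLs taken from the example links above.
REPOSITORY_URL = "http://www.dspace.cam.ac.uk"
OAI_ENDPOINT = REPOSITORY_URL + "/dspace-oai/request"


def oai_identifier_to_handle_url(oai_identifier, base_url=REPOSITORY_URL):
    """Map 'oai:<host>:<handle>' to '<base_url>/handle/<handle>'.

    Assumption: the local part of the OAI identifier is the handle itself,
    as in the single example observed on this page.
    """
    scheme, _host, handle = oai_identifier.split(":", 2)
    if scheme != "oai":
        raise ValueError("not an OAI identifier: %r" % oai_identifier)
    return "%s/handle/%s" % (base_url, handle)


def get_record_url(oai_identifier, metadata_prefix="oai_dc"):
    """Build the OAI-PMH GetRecord request for one identifier."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": oai_identifier,
    }
    # Keep ':' and '/' unescaped so the URL matches the form used above.
    return OAI_ENDPOINT + "?" + urlencode(params, safe=":/")


if __name__ == "__main__":
    identifier = "oai:www.dspace.cam.ac.uk:1810/28"
    print(oai_identifier_to_handle_url(identifier))
    print(get_record_url(identifier))
```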
Survey Summary
- Software system: DSpace
- Repository URL: http://www.dspace.cam.ac.uk/
- Image collection surveyed:
- Videos of anthropologist interviews
- Photographs from the Royal Commonwealth Society Photograph Project
- Images from the Kilise Tepe Project
These three collections were surveyed because the repository provider recommended them as well-annotated image collections. All three are annotated using only Dublin Core (DC) metadata, although the DC elements used vary slightly from one collection to another, as the following table shows.
| Dublin Core element | Anthropologist videos | Commonwealth photographs | Kilise Tepe images (open archive) | Kilise Tepe images (dark archive) |
|---|---|---|---|---|
| dc:creator | “Lastname, Firstname”; more than one creator is possible | “Lastname, Firstname, birthdate – deathdate” | “Initial Lastname” | “Initial Lastname” |
| dc:contributor | N/A | Yes | N/A | N/A |
| dc:coverage | N/A | Expressed using the Getty vocabulary | Both spatial information (the name of the place) and temporal information | Same as the open archive |
| dc:date | Expressed in ISO 8601 date/time format (YYYY-MM-DDThh:mm:ssTZD); each image object has more than one dc:date value | Same as anthropologist videos | Each image object has date information, possibly in more than one format and with distinct values | Same as the open archive |
| dc:description | Each image object has more than one dc:description value, each with different content | Yes | Yes | Yes |
| dc:identifier | N/A | Yes | N/A | N/A |
| dc:format | Both the size and the physical medium of the image objects (e.g. 421699095 bytes or application/octet-stream) | The format of the image, the camera used to take it, and the conditions under which it was taken | Both the size and the physical medium of the image objects | Same as the open archive |
| dc:language | e.g. en_GB or en_US, en, es | Yes | Yes | Yes |
| dc:publisher | N/A | Yes | Yes | Yes |
| dc:relation | N/A | the related resource | Yes | Yes |
| dc:rights | N/A | e.g. “Copyright Cambridge University Library” | Both the owner of the image object and the status of the image (open or closed) | Same as the open archive |
| dc:source | N/A | the resource from which the described resource is derived | N/A | N/A |
| dc:subject | Multiple subject descriptions per image, e.g. Himalaya, interview, Nepal, Napa, and fieldwork; the expressions used are not consistent | Yes | N/A | N/A |
| dc:title | Free text description | Yes | Yes | Yes |
| dc:type | e.g. “Video”, “Working Paper”, “Recording,oral”, etc | Yes | e.g. Table, Data, BW Image, Charts, Chart Data, Tables, Map, Drawn Image, Data Sheet | BW image, Colour image, Drawn image, data sheet |
Observations
- The coverage of the metadata provided by each collection varies: The Commonwealth photographs seem to be provided with the richest metadata information among the three surveyed collections and the anthropologist videos are provided with the least rich metadata.
- The quality of the metadata provided by each collection varies: For example, the metadata for the anthropologist videos are mostly expressed in a consistent format, while the dc:date values in the Kilise Tepe project are expressed in varied formats, with nothing to distinguish the multiple values associated with the same image.
- A controlled vocabulary is not always used: In the Commonwealth photographs collection, dc:coverage is expressed using Getty terms. This not only keeps the metadata consistent but also makes it possible to integrate this dataset with the many others that are described using the standard Getty terms. However, this practice is not applied to most of the metadata published by these three collections.
- Inconsistent metadata representation formats are used within one institutional repository: For example, the dc:creator element is expressed differently by each image collection, even though all of them are provided by the same institution.
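To make the last observation concrete, the following sketch shows one way the three dc:creator layouts listed in the table could be reduced to a common structure. It is only an illustration: the function name, the returned keys, and the example names are hypothetical, and real records may contain layouts that this simple split does not handle.

```python
def parse_creator(value):
    """Split a dc:creator string into family name, given name and dates.

    Handles the three layouts seen in the survey table:
      "Lastname, Firstname"
      "Lastname, Firstname, birthdate - deathdate"
      "Initial Lastname"
    """
    parts = [part.strip() for part in value.split(",")]
    if len(parts) >= 2:
        # Comma-separated layouts: family name first, optional dates last.
        family, given = parts[0], parts[1]
        dates = ", ".join(parts[2:]) or None
    else:
        # "Initial Lastname": treat the last token as the family name.
        tokens = value.split()
        family = tokens[-1] if tokens else ""
        given = " ".join(tokens[:-1])
        dates = None
    return {"family": family, "given": given, "dates": dates}


if __name__ == "__main__":
    # Hypothetical names, not taken from the surveyed records.
    for example in ("Smith, Jane", "Smith, Jane, 1850-1920", "J. Smith"):
        print(parse_creator(example))
```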
Southampton
- Links:
- SERPENT (images of deep-sea organisms; rich metadata, attractive, unusual):
- DefiningImageAccess/Repository/SERPENT
- http://archive.serpentproject.com/
- http://serpent.eprints.org/
- http://serpent.eprints.org/284/ - piglet squid image
- http://serpent.eprints.org/perl/oai2?verb=Identify
- http://serpent.eprints.org/perl/oai2?verb=ListRecords
- http://serpent.eprints.org/perl/oai2?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:GenericEPrints:284 - this last one, GetRecord with an identifier, returns an error (something about a regexp not being matched). The date-range selection on ListRecords doesn't seem to work consistently. ListRecords returns 100 entries at a time. Selection sets are not given meaningful names. For the time being, it seems we're stuck with ListRecords.
- It appears that the Serpent data has rich metadata other than DC (see an example description about a Piglet Squid), but it's not clear what prefixes we have to use to get at that data.
- OAI-PMH URIs:
- Meeting notes
- The Serpent repository is currently undergoing curation, and not all the metadata are available via OAI-PMH.
- Tim has agreed to help us export the metadata from the Access database in XML format and expose them via OAI-PMH.
- Other images and their metadata that are available at Southampton can be found through the ROAR registry (see http://roar.eprints.org/?action=search&query=soton&sa=Search)
- More than one image can be uploaded into one OAI record; for example, the four piglet squid images share one OAI identifier as a single record (http://archive.serpentproject.com/160/)
- The domain-specific metadata schema for describing the images that are accessible through the Serpent web site has to be obtained from the Access database directly.
Survey Summary
- Software system: EPrints
- Repository URL: http://eprints.soton.ac.uk/; http://archive.serpentproject.com/; http://roar.eprints.org/?action=search&query=soton&sa=Search
- Image collection surveyed:
- A small collection from the EPrints institutional repository
- Serpent project
Observation of Southampton EPrints
The image collections cannot easily be retrieved through the official Southampton EPrints web site (http://eprints.soton.ac.uk/), but they can be found through the ROAR project (http://roar.eprints.org/?action=search&query=soton&sa=Search), which provides a profile for each of its registered repositories. A small collection of images, including JPEG, TIFF, EIFF, etc., was identified through ROAR’s repository profile web page:
- ~37 JPEG images
- ~3 TIFF images
- ~2 MPEG videos
Most of these images are provided by the School of Art and described using DC metadata, including dc:title, dc:creator, dc:subject, dc:description, dc:publisher, dc:date, dc:type, dc:identifier, dc:format, and dc:relation.
Image collections from the Serpent project
Because both the Cambridge and Southampton institutional repositories publish only domain-independent DC metadata, we wanted to learn what domain-specific metadata are published for image collections. The image metadata from the Southampton Serpent project were studied for this purpose. Another reason for studying the Serpent repository is that it is set up using the EPrints software system, the same as the Southampton institutional repository. Face-to-face discussions with the Southampton EPrints developers revealed that it is possible to use EPrints to publish both images and domain-specific metadata alongside them, although the latter requires some extra effort from the repository administrator. We therefore reviewed the metadata published in the Serpent EPrints repository in order to understand to what extent domain-specific metadata can be provided using an existing repository software system.
This review showed that both DC metadata and rich domain-specific metadata are published for image collections. The DC metadata published in Serpent include: dc:title, dc:coverage, dc:rights, dc:subject, dc:description, dc:publisher, dc:date, dc:type, dc:identifier, dc:format, and dc:relation. The domain-specific metadata in Serpent describe the following aspects: the item type, classification, behaviour, site, site description, depth, latitude, longitude, countries, habitat, etc.
OAI-PMH session notes
C:\Documents and Settings\Graham>curl "http://archive.serpentproject.com/perl/oai2?verb=ListRecords\&metadataPrefix=oai_dc" >a1.tmp
% Total % Received % Xferd Average Speed Time Curr.
Dload Upload Total Current Left Speed
100 520 0 520 0 0 16774 0 --:--:-- 0:00:00 --:--:-- 0
C:\Documents and Settings\Graham>curl "http://archive.serpentproject.com/perl/oai2?verb=ListRecords&metadataPrefix=oai_dc" >a1.tmp
% Total % Received % Xferd Average Speed Time Curr.
Dload Upload Total Current Left Speed
100 99420 0 99420 0 0 36972 0 --:--:-- 0:00:02 --:--:-- 59963
C:\Documents and Settings\Graham>curl "http://archive.serpentproject.com/perl/oai2?verb=ListRecords&resumptionToken=archive/100/10791589/oai_dc" >a2.tmp
% Total % Received % Xferd Average Speed Time Curr.
Dload Upload Total Current Left Speed
100 98k 0 98k 0 0 37666 0 --:--:-- 0:00:02 --:--:-- 61330
C:\Documents and Settings\Graham>curl "http://archive.serpentproject.com/perl/oai2?verb=ListRecords&resumptionToken=archive/200/10791589/oai_dc" >a3.tmp
% Total % Received % Xferd Average Speed Time Curr.
Dload Upload Total Current Left Speed
100 99k 0 99k 0 0 36178 0 --:--:-- 0:00:02 --:--:-- 56923
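The paging pattern in the session above (a first ListRecords request with metadataPrefix, then follow-up requests carrying only the resumptionToken) can be sketched in Python as below. The first attempt in the log, which contains `\&`, returned only 520 bytes, presumably an error response. This is a rough sketch rather than a tested harvester: the function names are hypothetical, and the `fetch` parameter is an assumption introduced so that the loop can be exercised without network access.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords URL; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # urlencode percent-encodes '/' in the token; the server decodes it.
    return base_url + "?" + urllib.parse.urlencode(params)


def next_resumption_token(response_xml):
    """Return the token for the next page, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(".//" + OAI_NS + "resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


def harvest(base_url, metadata_prefix="oai_dc", fetch=None):
    """Yield each ListRecords response page until no token is returned."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as response:
                return response.read()
    token = None
    while True:
        page = fetch(list_records_url(base_url, metadata_prefix, token))
        yield page
        token = next_resumption_token(page)
        if token is None:
            break


if __name__ == "__main__":
    base = "http://archive.serpentproject.com/perl/oai2"
    print(list_records_url(base))
    print(list_records_url(base, resumption_token="archive/100/10791589/oai_dc"))
```

Calling `harvest(base)` would replace the manual sequence of curl commands, one page of up to 100 records at a time, provided the endpoint is still reachable.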
Imperial College
- Meeting held on 15 June 2007.
- Meeting notes at Meetings/20070615/DefiningImageAccess-ImperialCollege.
Repositories, repository projects and imaging projects:
- spir@l (http://www3.imperial.ac.uk/library/digitallibrary/digitalrepository) - A College-wide Digital Repository, known as spir@l, using DSpace software, is currently being designed to hold the College's research output. The project, led by the Library, working in close collaboration with ICT, has been funded centrally until July 2008 to carry out the preliminary tasks of set up, design and configuration of the system to hold electronic copies of academics' publications.
- Links to subject areas: http://www3.imperial.ac.uk/library/digitallibrary/weblinks - these appear to be library-curated links to external web resources. The following examples have been highlighted to us:
- Biosciences – example, Molecular Biology and Genetics Database (RCSB PDB)
- Chemistry – example, Chemical Data, eMolecules
- Earth Sciences – example, BUBL links
- Physics – labs (CERN, ESO) and preprints (arXiv.org)
- Using Oracle to create an institutional repository (for images, or papers, or both?)
- Centre for Integrative Systems Biology at Imperial College - CISBIC
- Imaging and Medical Robotics, http://www3.imperial.ac.uk/biomedeng/research/medicalimaging
- FILM, the Facility for Imaging by Light Microscopy - not including coordinated facilities for archival or republication. The facility has a file server to copy image files between microscopes, personal computers and analysis workstations: the server is only to be used as a scratch disk to transfer data, not for data storage.
- Earth/Geosciences imaging:
- High-resolution sonar for shallow water imaging - http://www3.imperial.ac.uk/earthscienceandengineering/research/geophysics/seafloorimaging/shallowwaterimaging/
- Using seabed bathymetry to image inversion structures on the English Channel shelf - http://www3.imperial.ac.uk/earthscienceandengineering/research/geophysics/seafloorimaging/englishchannel/
- Time lapse side-scan sonar imaging of bleached coral reefs (Seychelles) - http://www3.imperial.ac.uk/earthscienceandengineering/research/geophysics/seafloorimaging/tropicalhabitats/
- Flexure of the Canary Islands - http://www3.imperial.ac.uk/earthscienceandengineering/research/geophysics/earthstructure/canaryislands/
- Medical Imaging:
- Prostate ultrasound 3D imaging - http://www3.imperial.ac.uk/mechatronicsinmedicine/projects/theprobot/operationalstrategy/imaging/
- Microscopy and fluorescence imaging - http://www3.imperial.ac.uk/bioengineering/research/biologicalandmedicalimaging/microscopyandfluorescenceimaging
- Computer and image aided surgery - http://www3.imperial.ac.uk/bioengineering/research/biologicalandmedicalimaging/computerandimagingaidedsurgery
- Image Analysis:
- http://www3.imperial.ac.uk/people/a.bharath
- Bioengineering Vision Research Group - http://www.bg.ic.ac.uk/Research/Vision
It is notable that nothing here really constitutes what one might consider to be an image repository, despite there being a lot of research that is fundamentally dependent on capturing and analyzing images.
From the project kick-off meeting (notes here), we heard that IC have sizeable image holdings in:
- Neuroscience - NeuroGrid project
- Medical imaging
- Environmental science
- Insect database and trees
- Geo and earth sciences
- Plate tectonics and cracks for oil drilling
- Astronomy
Oxford
- Meeting held on 3 May 2007 - see http://antiparos.zoo.ox.ac.uk/calendar/view_entry.php?id=878&date=20070503
- Meeting notes at Meetings/20070503/DefiningImageAccess-SERS-Oxford.
- Fallback for Oxford image collections, or perhaps an addition to them: the digital images that are part of what was (is?) the Oxford Digital Archive of Flora Graecae
- http://www.ouls.ox.ac.uk/isbes/taxonomic_collections/flora_graeca_in_the_21st_century (Talk to Neil Jeffreys, esp about metadata access.)
Others
- http://wiki.dspace.org/index.php/OaiInstallations --- a list of OAI access points. In particular, the University of Wales Aberystwyth publishes domain-specific metadata using prefixes other than "oai_dc"; see http://cadair.aber.ac.uk/dspace-oai/request?verb=ListMetadataFormats.
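A ListMetadataFormats response is the standard way to find out which metadata prefixes a repository offers beyond "oai_dc". The sketch below extracts the prefixes from such a response. The function name is hypothetical, and the sample XML (including the prefix "example_prefix") is invented for illustration; it is not the actual Aberystwyth response.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def metadata_prefixes(response_xml):
    """Return the metadataPrefix values listed in a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    elements = root.findall(".//oai:metadataFormat/oai:metadataPrefix", OAI_NS)
    return [el.text.strip() for el in elements if el.text]


# Invented sample response, for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>example_prefix</metadataPrefix>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""

if __name__ == "__main__":
    prefixes = metadata_prefixes(SAMPLE_RESPONSE)
    # Any prefix other than oai_dc is a candidate for richer metadata.
    print([p for p in prefixes if p != "oai_dc"])
```

Each prefix found this way can then be passed as the metadataPrefix argument of a ListRecords or GetRecord request.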

