Defining Image Access Project Plan
Defining Image Access: Requirements for interoperable discovery and delivery of image data stored in DSpace, EPrints and Fedora-based institutional repositories using a data web approach.
Project duration: 1 Jan 2007 to 30 June 2007.
JISC project page: http://www.jisc.ac.uk/whatwedo/programmes/programme_rep_pres/defining_image_access.aspx
This document: http://imageweb.zoo.ox.ac.uk/wiki/index.php/DefiningImageAccess/ProjectPlan
Project plan submitted to the JISC on 29 January 2007: http://imageweb.zoo.ox.ac.uk/wiki/index.php?title=DefiningImageAccess/ProjectPlan&oldid=2202. (See page history tab for subsequent changes.)
The Defining Image Access Project is a short six-month requirements analysis project to investigate what is required to develop and provide discovery and delivery interoperability for image data held in DSpace, EPrints and Fedora-based institutional repositories, the three main open-source software systems used within the UK HE/FE sector, using a data web approach.
Funded from the Discovery to Delivery strand of the JISC Repositories and Preservation Programme
Overview of project
Background
Data webs are a new concept in storage and integration of digital information recently proposed by the Applicant and his colleague Graham Klyne. They involve the distributed web publication of data and accompanying metadata, together with lightweight harvesting and aggregation of the metadata describing these data into central searchable RDF metadata registries that permit discovery, and provide direct links back to the original data sources to allow delivery. Data webs provide focused domain- or subject-specific implementations to meet defined data publication, access, integration and meta-research needs. As such, they offer a step towards Tim Berners-Lee’s vision of the World Wide Web as a global ‘web of data’, the Semantic Web.
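The harvest-and-link-back pattern described above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the project's design: an in-memory list of (subject, predicate, object) triples stands in for an RDF metadata registry, and the repository URLs, the `organism` predicate and the functions `harvest` and `discover` are all hypothetical.

```python
from typing import Iterable

# A triple is (subject, predicate, object). The subject is the URL of the
# original image, so every search hit leads back to the source repository.
Triple = tuple[str, str, str]


def harvest(registry: list[Triple], records: Iterable[dict[str, str]]) -> None:
    """Copy lightweight metadata (not the image data itself) into the registry."""
    for record in records:
        source_url = record["url"]
        for field, value in record.items():
            if field != "url":
                registry.append((source_url, field, value))


def discover(registry: list[Triple], predicate: str, value: str) -> list[str]:
    """Return the source URLs whose harvested metadata matches exactly."""
    return sorted({s for s, p, o in registry if p == predicate and o == value})


# Hypothetical records, as two independent publishers might expose them.
repository_a = [
    {"url": "http://repo-a.example.org/image/1", "organism": "Drosophila melanogaster"},
    {"url": "http://repo-a.example.org/image/2", "organism": "Mus musculus"},
]
repository_b = [
    {"url": "http://repo-b.example.org/item/77", "organism": "Drosophila melanogaster"},
]

registry: list[Triple] = []
harvest(registry, repository_a)
harvest(registry, repository_b)

hits = discover(registry, "organism", "Drosophila melanogaster")
print(hits)
```

The registry holds only descriptive metadata and source URLs; the images themselves stay with the publishing repository, which is the separation between discovery and delivery that the data web idea relies on.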
Aims and objectives
We aim to develop strategies and software designs for an image web to discover published images of all types and subjects across heterogeneous repositories, based on examination of institutional repositories at Oxford, Southampton, Cambridge and Imperial College (based variously on Fedora, EPrints and DSpace).
We will study the existing repository software and metadata schemata to learn how they handle images and image metadata, with particular reference to granularity and detail of available information and how they are exposed for potential harvesting. We anticipate that our design will use Semantic Web techniques to uncover co-references in heterogeneous repositories, and use these as a basis for building cross-references between them.
Guiding principles:
- To work with existing repository and metadata formats as we find them.
- To design for use of existing software tools wherever possible.
- To design for use of existing metadata standards wherever possible.
- To avoid creating new information resources when adequate resources already exist.
- To leave control of repository content and access firmly with the publishing institution. We aim to work with whatever information they choose to make publicly available, allowing each publisher to balance their costs and benefits of making metadata freely available.
- To maintain full visibility of the existing repositories, leading users back to the sources rather than acting as a viewport through which they may be accessed.
- To minimize the amount of metadata that is harvested and aggregated, consistent with achieving our goals.
- To work as far as possible within the World Wide Web architectural framework.
- To design around the use of lightweight web application technologies, with loose coupling between existing systems and maximum opportunity to replace or update any element of the technology used.
- To look toward further developments (not within the scope of this project) to create data webs by means of which images held within institutional repositories can be made cross-searchable with locally published research images and with images held by journal publishers, museums, other institutions and public databases (e.g. bioinformatic databases).
Motivation
Our image web work is motivated by problems in post-genomic life science research. However, this JISC-funded project is not so limited, and is intended in part to explore what kinds of image resources may be available from institutional repositories. Through separate projects, we are also pursuing data web activities related to online journals and other digital collections, laboratory research user requirements, capturing image metadata as an integral part of research workflows and cooperative construction of metadata schemas (ontologies); these other activities are outside the scope of the Defining Image Access project.
Gene expression experiments involve creating microscopic images of parts of organisms to which genetic techniques have been applied to make visible those regions in which a chosen gene is active (expressed). This information, in turn, helps researchers to understand how the genetic mechanisms interact with other life processes to guide the development of an organism, for example by showing how gene expression patterns vary between different mutations of a given reference organism. Producing these in situ gene expression images can be very time consuming, and a single image might be used in different lines of research. We anticipate that in the near future, such images generated in the course of a research project will be published online with sufficient metadata to interpret the image (e.g. organism, genetic strain, observed gene, developmental stage, etc.).
A researcher exploring factors causing sterility in Drosophila may create a number of in situ images of gene expression in the testes, with a primary goal of studying sperm development. Another researcher may be interested in distribution patterns of gene expression products within a cell for study of internal cell transport mechanisms, for which some of the same images might contain useful information. But how is such a researcher to discover that the image even exists? Our Image Data Web aims to support using a single search operation to find, say, images of expression patterns for aly genes in Drosophila melanogaster that may be stored in various repositories with different but overlapping descriptive metadata. To do this requires some mechanisms to access and partially match metadata from different image sources, and locate those images that have associated metadata meeting some given criteria.
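One way to picture such partial matching is sketched below. It assumes, purely for illustration, that each repository's own field names can be mapped by hand onto a few shared terms; the repository names, field names, URLs and records are hypothetical, and a real design would more likely express the mapping with Semantic Web vocabularies than with a Python dictionary.

```python
# Hypothetical mappings from each repository's own field names to shared terms.
FIELD_MAP = {
    "repo_a": {"species": "organism", "gene_symbol": "gene"},
    "repo_b": {"dc:subject": "organism", "expressed_gene": "gene", "stage": "stage"},
}


def to_core(repo: str, record: dict) -> dict:
    """Translate one record into shared terms, dropping unmapped fields."""
    mapping = FIELD_MAP[repo]
    core = {"source": record["url"]}
    for field, value in record.items():
        if field in mapping:
            # Normalise case and whitespace so overlapping values compare equal.
            core[mapping[field]] = value.strip().lower()
    return core


def matches(core: dict, criteria: dict) -> bool:
    """True only if the record carries every requested term with that value."""
    return all(core.get(term) == wanted.lower() for term, wanted in criteria.items())


# Invented sample records with different but overlapping metadata.
records = [
    ("repo_a", {"url": "http://repo-a.example.org/img/12",
                "species": "Drosophila melanogaster", "gene_symbol": "aly"}),
    ("repo_a", {"url": "http://repo-a.example.org/img/13",
                "species": "Drosophila melanogaster", "gene_symbol": "bam"}),
    ("repo_b", {"url": "http://repo-b.example.org/item/5",
                "dc:subject": "drosophila melanogaster ",
                "expressed_gene": "Aly", "stage": "adult"}),
]

criteria = {"organism": "Drosophila melanogaster", "gene": "aly"}
found = [core["source"]
         for core in (to_core(repo, rec) for repo, rec in records)
         if matches(core, criteria)]
print(found)
```

A single set of criteria is applied to both repositories even though neither uses the shared term names natively, and the extra `stage` field in the second repository is simply ignored when it is not part of the query.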
There are many unknowns about the ways in which institutional repositories deal with image collections and associated metadata; our work in this project is to survey a sampling of such image collections and associated software systems, and draw some conclusions about what can be achieved now and in the foreseeable future to find useful research images from available repositories.
Overall approach
Our project is predicated on the notion that by using widely deployed web software components and ideas, we can short-circuit many of the development complexities that raise the cost and hinder the deployment of many information-sharing systems. We are entering a time of availability of several highly developed and work-hardened Semantic Web software tools, in addition to the many widely used and scalable web server and content management systems. We aim to focus our design efforts on information design and software selection rather than software design.
To achieve our goals, we need to build a solid understanding of existing open source repository systems (which are themselves examples of established web-based data management systems), with particular attention paid to their data handling capabilities, and also to the specific metadata schemas that are used by deployed systems. We will also conduct a detailed (though not necessarily exhaustive) survey of software tools that might be used in the construction of an image data web. When we have gathered and analyzed this information, which we hope will be generally useful to the repository community, we will propose a design and implementation plan primarily geared to informing a follow-on development and deployment project. Extensive involvement with the project's consultant partners is a key element of our plan.
Alongside the primary means noted, we will also experiment with a selection of collaborative web-based tools for managing the project and gathering the information it generates. This will give us insights into the possibilities of new web-based styles of collaborative working that can inform our application of web ideas to repository access. (The tools selected to date include: Semantic MediaWiki, the Drupal web content management system and WebCalendar. These are all sufficiently stable that we can depend on their basic operations for conducting the project, while experimenting with the possibilities they offer.)
Important issues to be addressed include:
- Minimizing the technical impact that a data web system will have on the systems currently used by repository providers or their users (though we do, of course, aim to influence their work patterns for the better)
- Understanding the metadata available from the repositories surveyed
- Identifying technical mechanisms for working with established repositories
- Selecting or designing components that are easily adapted to specific information requirements
- Adopting an approach that is compatible with widely deployed web software systems
- Designing a core metadata schema (ontology) that can be used to plan and answer queries that operate across the repositories considered.
The current project is scoped to explore discovery of and access to images in repositories operated by the project's repository partners, by surveying software systems, available access/query mechanisms and metadata schemas used. We do not aim to conduct an exhaustive survey of software tools, the goal being to identify some that are suitable, and why, rather than declaring an overall winner. While we may undertake some limited piloting as part of our evaluation work, a demonstrable pilot system is not a goal for this project.
Critical success factors include:
- Understanding the capabilities of the designated repository software systems.
- Understanding the metadata schemas used and presented by the selected repositories.
- Understanding the specific mechanisms available for querying and accessing content in the selected repositories.
- Identifying actual and potential use of metadata standards.
- Identifying open standards and open source tools that can handle the "heavy lifting" for an image data web implementation.
- Devising a core metadata schema that can support a range of identified queries across multiple repositories.
- Devising a costable plan for implementing an image data web.
- Ensuring consistency of our recommendations with the JISC's strategy for repository development.
Project outputs
The Defining Image Access Project’s deliverable will be a Project Report that will (a) detail the findings and conclusions from our investigations, (b) recommend best practices that should be supported by the JISC and adopted to enhance image interoperability between institutional repositories, (c) provide implementation guidelines for the creation of data webs, for use by those running institutional repositories, and (d) identify existing open source software systems that can provide elements of the desired data web functionality.
The final report will include recommendations and observations concerning:
- Repository software systems.
- Metadata standards.
- Metadata publication for image discovery.
- Web standards.
- What to harvest versus what to access on-demand from source repositories.
- An outline proposal for the instantiation and ongoing support of our design for such an image web, to be made freely available for the benefit of the UK academic community.
We will also create a project web site, a wiki and an e-mail discussion list, to keep all interested parties fully informed of the project’s work and progress as it develops. As part of the project, we plan to hold workshops at which we can bring experts together for seminars and more informal exchanges, and we anticipate that such seminar presentations will, with the authors' permissions, be made available on the wiki.
Project outcomes
We will determine what is required to expose appropriate image metadata in such a manner that these can be harvested, to marshal these into a data web metadata registry where they can be indexed and made cross-searchable, and to provide links back to the repositories for retrieval/delivery of the original images.
This project will lay foundations both for an anticipated subsequent JISC follow-up project in which the vision expressed here for image interoperability between institutional repositories can be realized, and also for continuing work on our broader vision of data webs that will, for the first time ever, render published research images held in institutional repositories interoperable and cross-searchable with those held in on-line journals, museum collections and elsewhere.
Stakeholder analysis
| Stakeholder | Interest / stake | Importance |
|---|---|---|
| Researchers using images | Ability to easily locate images stored in a range of institutional repositories, with sufficient supporting information to properly interpret their content | Allowing new lines of research based on existing published observations; reducing future research costs by re-using hard-won observational data |
| Researchers publishing images | Online publication of images | Facilitating re-use of observations for the independent verification of conclusions drawn, and as a basis for additional lines of research, enhancing the value and reputation of the original work |
| Institutional repository managers and operators | Understanding better how to serve the needs of research users effectively and economically; creating a framework for long-term preservation of research image data | Improving the cost/benefit in repository provision; enhancing institutional visibility by providing additional access to valued resources |
With reference to stakeholders, it was observed during the kick-off meeting: not everything happens in a silo. Value may come by looking across interests. More generally, the ability to create new facilities (e.g. mashups) may be of value to a wider audience - as-yet unidentified stakeholders developing new and emerging applications.
Risk analysis
Identified risk factors include the following:
| Risk | Probability | Severity | Description and mitigation |
|---|---|---|---|
| Staffing - loss of key staff | Low | High | The project goals are well understood by all members of the core team, and we believe that all have sufficient grasp of the key elements to make continued progress if one project member should become unavailable. For DMS, this project is the starting point for realizing the data web vision. He is committed to it and is relatively free of other academic (teaching) commitments during its timespan. For GK, this project is very close to his personal ambitions to gain experience of building real applications using Semantic Web ideas. JZ has indicated that the project topic and location is well suited to her professional and personal aspirations. Using the web facilities to lock in knowledge gained as we conduct the project reduces the risk of losing all associated knowledge in the event of departure of a team member. By working with a number of partner institutions, and maintaining frequent contacts, some of the knowledge gained should be dispersed beyond the core team. |
| Organisational | Low | Low | The core team is small and co-located, so we don't anticipate organizational problems there. We are dependent on input from our project partners, and to gain this we are working to involve them early in the project and to be responsive to their interests and concerns; in most cases, we have more than one contact in each partner institute. We already control most of the technical resources we need to conduct this project, other than those (e.g. network, power, etc.) whose failure would have implications far more serious than for just our project. |
| Technical | Low | Low | In some sense, this requirements and evaluation project is a risk mitigation exercise for a larger development project that we hope will follow. Our purpose is to build a sufficient understanding of the disparate technologies we propose to employ before we are committed to their use in a more substantial project. |
| External suppliers | Low | Low | We are dependent on the project partners to provide information about their repositories and associated technologies and practices. We will also depend on being able to identify suitable open source software with which to build our envisioned image data web. The unlikely event of failure on either score might be viewed as preventing a more costly failure of a larger development project. |
| Legal | Low | - | We don't anticipate any legally related risks. We will, for the most part, be working with information that our partners choose to make publicly available, we are using open source software for most of our project activities, and it is our goal that the results of our work will not be subject to any confidentiality agreements. There is no foreseeable risk to life, limb or well-being of any person in the work we are proposing to undertake. |
| Work permit | Med | High | Jun Zhao needs a work permit, whose issue might be delayed by bureaucratic processes. Of all of those who applied to work on this project, Jun clearly had the most relevant background and skills for this work, so we don't anticipate problems on that score. The work permit application is well under way, and we have good grounds to expect minimum impact on this project due to delays in processing it. [2007-01-05: The Home Office has approved our application for Jun's work permit.] |
| Office move | Med | Med | There is a foreseeable possibility that we will be required to move our office during the course of this project, which could cost us 2-4 weeks of lost productivity. |
| Equipment failure | Med | Med | We are planning to use available equipment (mainly computers and network connections) for this project. Failure of any of the key systems could expose us to delays or worse. The main web server is running on a virtual machine, which we believe could be transferred to another available host with minimal loss or disruption to information, though there would be some loss of productivity through the transition. The server is automatically backed up daily (though we have not yet attempted a restoration from these backups). |
Standards
It is our firm intention that image data webs will be based to the maximum extent possible on open standards, and in particular we aim to maximize our use of widely deployed web standards.
One of the purposes of this project is to identify the full range of appropriate standards to use in a future development. A full analysis of standards to be used will be an output from, rather than an input to, this project. We trust our partners will help us in identifying and evaluating any JISC-recommended standards that we should be considering.
Some specific standards that we expect to figure in our analysis include:
| Specification | Version | Notes |
|---|---|---|
| HTTP | 1.1 | http://www.ietf.org/rfc/rfc2616.txt |
| XML | 1.1 | http://www.w3.org/TR/REC-xml/, http://www.w3.org/TR/xml11/ |
| RDF | 1.0 | http://www.w3.org/2001/sw/RDFCore/ (links to specification set) |
| SPARQL | (in draft) | http://www.w3.org/TR/rdf-sparql-query/ |
| Z39.50 | (2003?) | http://www.loc.gov/z3950/agency/Z39-50-2003.pdf |
| SRU/SRW | 1.1 | http://www.loc.gov/standards/sru/sru-spec.html, http://www.loc.gov/standards/sru/srw/index.html |
| OAI-PMH | 2.0 | http://www.openarchives.org/OAI/openarchivesprotocol.html |
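
As an illustration of how lightweight metadata harvesting over one of these standards might look, the following sketch builds an OAI-PMH 2.0 `ListRecords` request and extracts Dublin Core fields from a response. The base URL and the sample response are invented for the example, the helper names `list_records_url` and `parse_records` are hypothetical, and a real harvester would also need to handle `resumptionToken` paging and error responses, which are omitted here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for the given repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text: str) -> list[dict]:
    """Extract identifier, title and link-back URLs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.findall("./oai:ListRecords/oai:record", NS):
        identifier = record.findtext(
            "./oai:header/oai:identifier", default="", namespaces=NS)
        dc = record.find("./oai:metadata/oai_dc:dc", NS)
        if dc is None:  # e.g. a deleted record carries no metadata
            continue
        results.append({
            "identifier": identifier,
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "links": [e.text for e in dc.findall("dc:identifier", NS)],
        })
    return results


# Hand-written sample response for a hypothetical repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.org:123</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>In situ expression image</dc:title>
          <dc:identifier>http://repo.example.org/image/123</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

request_url = list_records_url("http://repo.example.org/oai")
harvested = parse_records(SAMPLE)
print(request_url)
print(harvested)
```

Only the record identifier, a title and the link back to the source item are kept, in line with the principle of harvesting as little metadata as is needed for discovery.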
Technical development
This is a requirements gathering, scoping and evaluation project, rather than a development project.
We will be guided by an eventual intention to use an agile development approach based on Extreme Programming, some basic tenets of which are (a) incremental development in small demonstrable steps, and (b) use of automated unit tests to avoid regression when code is changed.
Intellectual property rights
We intend that this project should create a collation of non-confidential information that will be freely available for the benefit of the UK academic community. Our purpose is to facilitate sharing of academic research data, a goal that is likely to be compromised if the tools to achieve this are not themselves open and freely sharable. We intend to focus our software evaluation on packages that are open source.
We also intend that any subsequent software development will be open-sourced.
An as-yet unanswered question is whether we should prefer a "viral" open source licence like the GNU General Public Licence, or a more flexible form of licence that permits closed-source derivative works. Fundamentally, we don't want licensing concerns to get in the way of our goals of improving sharing of research images. The form of licence may be determined by the choice of software used to build an image data web. In considering the possible licences, we will consult with JISC's OSS-Watch advisory service, with whom we have good contacts. The conclusions of such consideration will be included in our final report.
Project Resources
Project Staff
David Shotton (Principal Investigator), mailto:david.shotton@zoo.ox.ac.uk.
Graham Klyne (Project Manager), mailto:graham.klyne@zoo.ox.ac.uk.
Jun Zhao (Research Officer, from March 2007), mailto:jun.zhao@zoo.ox.ac.uk (until 1st March mailto:zhaoj@cs.man.ac.uk).
All members of the Image Bioinformatics Research Group, Department of Zoology, University of Oxford.
Postal address: Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, United Kingdom.
Project Consultant Partners
UKOLN, University of Bath
Julie Allinson (Project Consultant), mailto:j.allinson@ukoln.ac.uk; Digital Repositories Programme Support Team, UKOLN, University of Bath
Independent consultant
Dan Brickley (Project Consultant), mailto:danbri@danbri.org; Independent Semantic Web Consultant
CCLRC e-Science Centre, Rutherford Laboratory
Brian Matthews, mailto:B.M.Matthews@rl.ac.uk; CCLRC e-Science Centre
Alistair Miles, mailto:A.J.Miles@rl.ac.uk; CCLRC e-Science Centre
Repository and library services consultant partners
Cambridge University
Patricia Killiard, mailto:pk219@cam.ac.uk; Director of DSpace@Cambridge; Head of Electronic Services and Systems, Cambridge University Library
Tom De Mulder, mailto:tdm27@cam.ac.uk; Systems Administrator for DSpace@Cambridge; Cambridge University Computing Service
Imperial College
Clare Jenkins, mailto:c.jenkins@imperial.ac.uk; Director of Library Services, Imperial College Library
Yiota Polydoratou, mailto:p.polydoratou@imperial.ac.uk; Researcher and StORe Project Officer, Imperial College Library
John Darlington, mailto:j.darlington@imperial.ac.uk; Director, Imperial College Internet Centre
Dolores Iorizzo, mailto:d.iorizzo@imperial.ac.uk; Arts, Humanities and Cultural Heritage Co-ordinator, Imperial College Internet Centre
Oxford University
Sally Rumsey, mailto:sally.rumsey@ouls.ox.ac.uk; Project Manager of the Institutional Repositories, Oxford University Library Services (OULS)
Neil Jefferies, mailto:neil.jefferies@sers.ox.ac.uk; Acting Development Manager & IT Strategy Coordinator, Oxford University Library Services (OULS)
Southampton University
Jessie Hey, mailto:jmnh@ecs.soton.ac.uk; EPrints Repository, University of Southampton Libraries; and School of Electronics and Computer Science
Les Carr, mailto:lac@ecs.soton.ac.uk; EPrints Repository software developer; and Senior Lecturer, School of Electronics and Computer Science
Adam Field, mailto:af@ecs.soton.ac.uk; EPrints Repository Liaison/Systems Officer, University of Southampton Libraries
Other consultant partners
Oxford University Computing Services (OUCS)
Michael Fraser, mailto:mike.fraser@oucs.ox.ac.uk; Co-ordinator, Research Technologies Service and Director of Intute Arts and Humanities, Oxford University Computing Services
Howard Noble, mailto:howard.noble@oucs.ox.ac.uk; Project Manager of JISC ASK Project, Oxford University Computing Services
Peter Robinson, mailto:peter.robinson@oucs.ox.ac.uk; Director of OxCLIC, Research Technologies Group, Oxford University Computing Services
Oxford e-Research Centre (OeRC)
Anne Trefethen, mailto:anne.trefethen@oerc.ox.ac.uk; Director, Oxford e-Research Centre
David Wallom, mailto:david.wallom@oerc.ox.ac.uk; Technical Manager, Oxford e-Research Centre
Project management
This is a purely investigative project by a small core team, so formalization of reporting and decisions is unlikely to be helpful. Where progress is not achieved by team consensus, definitive direction will come from the PI; all core team members report contractually to the PI.
The framework for conduct and management of the project is intended to facilitate free and open communication between the core project team and the consultant partners using Web-based tools. To this end, we start with a dedicated system (http://imageweb.zoo.ox.ac.uk/) whose role is to capture and communicate information about the project itself and the topics of investigation. Initial facilities provided for this purpose include a semantic wiki, a group blog system and a mailing list system with web-accessible message archives. These facilities may be expanded as needs arise.
Training needs for this project will be handled internally, and consist mainly of instruction in using the project management and co-ordination systems.
Programme support
JISC Programme Manager: Balviar Notay (mailto:b.notay@jisc.ac.uk)
This project has been conceived from a position of exploiting common and widely deployed Web technologies to improve access to research image data. We are aware that there is much work by members of the JISC community that addresses access to image data, emphasizing database and library system technologies, with which we are becoming familiar. We aim to use Web techniques to improve access to these more traditional repository systems, and we will look to our consulting partners to help us deepen our understanding of those systems, so we can better devise approaches for exposing them to a wider web audience.
No areas of formal support from the JISC Programme Manager have been identified, but she has been requested to convene a meeting with the leaders of other relevant JISC-funded projects concerned with image data.
Budget
Start date: 01/01/2007, Duration: Six months
| Direct project costs | Full economic costs | 80% contribution from JISC |
|---|---|---|
| Staff | £69,314 | £55,451 |
| Consumables | £500 | £400 |
| Travel | £3,600 | £2,880 |
| Meetings | £1,800 | £1,440 |
| External consultant fees | £3,525 | £2,820 |
| Sub-total | £78,739 | £62,991 |
| Donations in kind from consultant partners | £37,653 | |
| TOTAL | £116,392 | |
Detailed project planning
Workpackages
- Project plan outline in HTML format - use this link to view a Gantt chart.
- Project plan in 'planner' format - this is an XML format used by the 'Planner' software (http://live.gnome.org/Planner, http://winplanner.sourceforge.net/).
(The project plan has been converted to use GanttProject (http://ganttproject.biz/) for resource analysis purposes; unfortunately, the conversion program has retained the task start/end dates while ignoring the project non-working days, so actual effort estimates on the GanttProject plan are incorrect. Hence the Planner version has been left here, even though the day-to-day working copy is using GanttProject.)
Notes:
- A particularly important activity, identifying search themes, is somewhat buried in WP3. At the project kick-off meeting, it was recognized that some documented use cases would be generally helpful to underpin meaningful dialogue between project partners.
- The core metadata schema work is distributed across three different work packages: related work survey (WP1.d) to identify available schemas, institutional repository surveys (WP3) to collect and analyze information about actual repository metadata schemas used, and core ontology design (WP5) to design a schema based on these analyses.
WP1: Project management and administration
Infrastructure for project activities, communication between project partners and visibility of progress with respect to intended goals.
| Activity | Start date | End date | Responsibility | Outputs |
|---|---|---|---|---|
| (a) Project plan | 2007-01-01 | 2007-01-03 | GK | Prepare project plan |
| (*) Project plan to JISC | 2007-01-31 | 2007-03-01 | DS | Project plan in wiki and submitted to JISC |
| (b) Project management | 2007-01-08 | 2007-06-22 | GK | Ongoing at 0.5 day/week. |
| (c) Project infrastructure (extras) | 2007-04-02 | 2007-04-06 | GK | Virtual host system, Linux base system, automatic security patching, security review, HTTP server, mail server, mailing list server, web site (Semantic MediaWiki), CMS for group blog (Drupal). Also, if needed, version management (SVN), project tracking/ticketing, group calendar. |
| (d) Related work survey | 2007-01-15 | 2007-02-07 | GK | Survey related projects, papers and proposals suggested by project partners; summarize on web site. (The summary is not intended to be a detailed technical analysis of each, but an overview sufficient to indicate possible relationship to this project.) |
| (e) Kick-off meeting | 2007-01-05 | 2007-01-05 | DS, GK, JZ | Meeting of project participants to discuss plans and goals, and set the scene for on-site visits to gather more detailed requirements and technical details. |
| (f) Tools and technologies workshop | 2007-02-09 | 2007-02-09 | DS, GK, JZ | Workshop for technical discussion and assessment of potential tools and technologies of relevance to data webs. |
| (g) JISC interactions meeting | 2007-03-09 | 2007-03-09 | DS, GK, JZ | Consider interaction of data webs with Intute and other JISC initiatives; establish requirements for data webs to be effective in augmenting the achievements of other JISC repository initiatives. Application profile work. |
| (h) Prepare interim progress report | 2007-03-26 | 2007-03-30 | DS, GK | Preparation of progress report for the JISC. |
| (*) Interim progress report to the JISC | 2007-03-31 | 2007-03-31 | DS | Submission of progress report to JISC |
| (i) Evaluation | 2007-06-18 | 2007-06-22 | GK, DS | Evaluate results according to selected criteria |
| (j) Draft report | 2007-06-04 | 2007-06-29 | DS, GK, RH, DB | Prepare initial draft of final report. Circulate for review by project partners. Collect feedback and update draft report as appropriate. |
| (*) Submit draft report to the JISC | 2007-06-30 | 2007-06-30 | DS | Submit draft of final report to JISC |
| (k) Draft completion report | 2007-06-25 | 2007-06-29 | DS, GK, JZ | Draft completion report for the JISC |
| (*) Submit completion report | 2007-06-30 | 2007-06-30 | DS | Submit completion report to the JISC |
| (l) Final report | 2007-07-09 | 2007-07-13 | DS, GK, JZ | Apply final updates to project report |
| (*) Submit final report | 2007-07-31 | 2007-07-31 | DS | Submit final report to the JISC |
WP2: Repository software evaluation
Determine capabilities of the various repository software packages with respect to media handling and metadata publication, and gather information about metadata schemas used by different systems. We intend to perform a trial installation of each of the repository systems, but we may be able to gather much background information from previous work, and trim the actual evaluation process.
| Activity | Start date | End date | Responsibility | Outputs |
|---|---|---|---|---|
| (a) Design initial schema for evaluation | 2007-02-19 | 2007-02-23 | GK, JZ | Create a schema (ontology) for repository evaluation criteria, which can be used as the basis of a formal or semi-formal structure for recording evaluation observations, alongside informal and unstructured observations. Ideally, this will codify some basic information to be recorded for each of the repositories examined, thus providing some objective basis for comparison. |
| (b) Repository partner visits | 2007-02-26 | 2007-04-20 | DS, GK, (JZ) | Project team visits repository partners to gather more detailed information about the repository systems and metadata schemas used. Record notes about each (formal and informal), and create a draft report on the web site. |
| (c) Locate repository software | 2007-02-19 | 2007-02-23 | GK | Locate EPrints, DSpace and Fedora open source software packages. Select suitable versions and variants (if any) for evaluation. |
| (d) Evaluate EPrints software | 2007-03-05 | 2007-03-21 | GK | Obtain, install and experiment with EPrints repository software; gather initial information into wiki about ease of installation and configuration, general capabilities, media handling, metadata collection and storage, metadata granularity, content and metadata-based queries, metadata export. (Allow 1 week for initial evaluation) |
| (e) Evaluate DSpace software | 2007-03-21 | 2007-04-06 | GK | Obtain, install and experiment with DSpace repository software - similar coverage to item (d). (Allow 1 week for initial evaluation) |
| (f) Evaluate Fedora software | 2007-04-09 | 2007-04-25 | GK | Obtain, install and experiment with Fedora repository software - similar coverage to item (d). NOTE: Oxford are using a commercial variant of Fedora; this initial investigation will focus on the open source software, but in discussion with our Oxford repository partner, we will aim to uncover any differences between the open and commercial versions. (Allow 1 week for initial evaluation) |
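As an illustration of the semi-formal recording structure described in item (a), the following minimal Python sketch pairs structured observations with free-text notes. The criterion names are taken from item (d); the class, method and function names are hypothetical, and the project's actual schema is expected to be an ontology rather than Python code.

```python
from dataclasses import dataclass, field

# Criterion names come from the WP2(d) activity description. Everything else
# in this sketch (names, structure) is a hypothetical illustration.
CRITERIA = (
    "ease of installation and configuration",
    "general capabilities",
    "media handling",
    "metadata collection and storage",
    "metadata granularity",
    "content and metadata-based queries",
    "metadata export",
)


@dataclass
class RepositoryEvaluation:
    """Observations about one repository software package."""
    software: str
    observations: dict = field(default_factory=dict)  # criterion -> observation
    notes: list = field(default_factory=list)         # informal, unstructured notes

    def record(self, criterion, observation):
        """Store a structured observation against a known criterion."""
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion!r}")
        self.observations[criterion] = observation

    def unrecorded(self):
        """List the criteria that still have no structured observation."""
        return [c for c in CRITERIA if c not in self.observations]


def comparison_table(evaluations):
    """One row per criterion, one column per package (None where unrecorded)."""
    return {
        criterion: {e.software: e.observations.get(criterion) for e in evaluations}
        for criterion in CRITERIA
    }
```

Recording against a fixed list of criteria is what makes the later side-by-side comparison possible, while the `notes` list leaves room for observations that do not fit the schema.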
WP3: Repository metadata characterization
Analyze the metadata content of the partner repositories, identify common and differing approaches to metadata provision.
| Activity | Start date | End date | Responsibility | Outputs |
|---|---|---|---|---|
| (a) Design initial schema for evaluation | 2007-02-19 | 2007-03-02 | GK, JZ | Create a schema (ontology) for metadata evaluation criteria, which can be used as a basis of creating a formal or semi-formal structure for recording evaluation observations, alongside informal and unstructured observations. |
| (b) Enumerate repository schemas | 2007-03-05 | 2007-03-13 | JZ | Gather and organize information about metadata schemas used natively by each of the repository partners. Explore what (if any) additional metadata could be easily served if it was available |
| (c) Compare repository schemas | 2007-03-13 | 2007-03-21 | JZ | Analyze the repository schemas and identify key similarities and differences. Organize the available metadata into categories including (but not limited to): repository description, image description, domain-independent subject description; domain-specific subject description. Look for recurring patterns across repository schemas |
| (d) Enumerate standard schemas | 2007-03-21 | 2007-03-29 | JZ | Gather and organize information about public standard metadata schemas that are used by or relate to the metadata used natively by the repositories. |
| (e) Identify search themes | 2007-01-15 | 2007-02-23 | DS, GK, JZ | Identify or propose a number of possible search themes, in discussion with operators (and preferably also some users) of the various repositories. This should include a number of typical or exemplar queries that repository users could wish to perform. Note: this activity has been moved forward, ahead of the rest of the work package, following discussion at the project kick-off meeting. |
WP4: Aggregation tool evaluation
Select and evaluate a range of tools that might be used to provide some or all of the metadata collection/aggregation functionality of the data web.
| Activity | Start date | End date | Responsibility | Outputs |
|---|---|---|---|---|
| (a) Design initial schema for evaluation | 2007-04-02 | 2007-04-04 | GK, JZ | Create a schema (ontology) for tool evaluation criteria, which can be used as a basis of creating a formal or semi-formal structure for recording evaluation observations, alongside informal and unstructured observations. |
| (b) Draw up list of potential tools | 2007-04-05 | 2007-04-10 | GK, JZ | Looking for tools to cover: access to repository metadata; conversion of metadata to a common format; reconciliation of metadata vocabularies; storage of metadata; simple inference on metadata to find potential coreferences; planning and construction of metadata queries; basic web browser interface to data web resources; programmatic access to data web resources |
| (c) Shortlist tools for evaluation | 2007-04-10 | 2007-04-13 | GK, JZ | Survey tools and select those that appear to be most suitable based on available information about the technology and user experiences. Record reasons for selection or non-selection. The long-term goal of a lightweight system of loosely coupled pieces will influence the choice of potential components, as that is central to our vision of a data web. |
| (d) Evaluate shortlisted tools | 2007-04-13 | 2007-05-04 | GK, JZ | Install and explore the capabilities of the selected tools. Draft conclusions. Consider moving additional tools to the shortlist for evaluation if any of those originally selected are quickly seen to be unsuitable. Record observations in the wiki using the schema and additional free text. |
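One of the functions listed in item (b), simple inference on metadata to find potential coreferences, can be illustrated with a minimal Python sketch. It assumes that metadata has already been converted to (subject, property, value) triples; the function name, the choice of key properties and the normalization rule are all illustrative assumptions, not features of any particular tool.

```python
from collections import defaultdict


def potential_coreferences(triples, key_properties):
    """Group subjects that share a normalized value for a key property.

    triples: iterable of (subject, property, value) string tuples.
    key_properties: properties whose matching values hint at coreference.
    Returns a sorted list of sorted subject groups with two or more members.
    """
    index = defaultdict(set)
    for subject, prop, value in triples:
        if prop in key_properties:
            # Assumed normalization: ignore case and collapse whitespace.
            normalized = " ".join(value.lower().split())
            index[(prop, normalized)].add(subject)
    return sorted(sorted(group) for group in index.values() if len(group) > 1)
```

The groups returned are only candidates: a shared title or identifier suggests, but does not prove, that two records describe the same image, so any real tool would need a confirmation step.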
An initial list of tools that have been identified for possible evaluation includes the following. The list is far from definitive, and is intended as a pump primer rather than an indication that the tools mentioned are necessarily relevant to our goals. Drawing up and selecting from a fuller list is an important facet of this project.
- Semantic data management and access tools
- Semantic portal - http://www.swed.org.uk/swed/swed_technical_resources.htm
- Joseki - http://www.joseki.org/
- Sesame - http://www.openrdf.org/about.jsp, http://www.openrdf.org/
- HeuristScholar.org, a Collaborative Knowledge Space (CKS) for the Humanities and Social Sciences, (SHSSERI) - http://acl.arts.usyd.edu.au/shsseri/index.php
- Query tools
- D2R server - http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/
- DARQ federated query - http://darq.sourceforge.net/
- JAFER - http://www.jafer.org/
- (other SRU/SRW software?)
- Tagging tools
- Connotea - http://www.connotea.org/
- Dictate - http://www.jisc.ac.uk/whatwedo/programmes/programme_pals2/project_dictate.aspx
- ALIPR - http://www.alipr.com/
- Drupal - http://drupal.org/, http://www.imageweb.org/drupal
- Rich Tags JISC project (just starting)
- Others
- VITAL (Fedora front-end used by Oxford - ask Sally Rumsey for details)
WP5: Core ontology proposal
Design for a core ontology for the specific repositories and search themes surveyed, and articulation of selection criteria for inclusion of terms in that core.
| Activity | Start date | End date | Responsibility | Outputs |
|---|---|---|---|---|
| (a) Design syntactic mappings | 2007-05-07 | 2007-05-10 | GK, JZ | Design mappings from the formats of the various repositories into RDF abstract syntax. At this stage, we are not concerned with semantic reconciliation, and no attempt will be made to match vocabulary terms |
| (b) Design semantic mappings | 2007-05-10 | 2007-05-15 | GK, JZ | Based on the common RDF abstract syntax from item (a), design semantic (vocabulary) mappings (inferences) that can make explicit certain detectable overlaps of information in the various repositories |
| (c) Analyze search themes | 2007-05-15 | 2007-05-18 | GK, JZ | Analyze the search themes from WP3(e) and identify minimal metadata that would be required to support those themes. |
| (d) Design a core ontology | 2007-05-21 | 2007-05-23 | GK, JZ | Design for a core ontology covering the repositories surveyed; articulate criteria for inclusion of terms in the core ontology; as far as possible, the core ontology proposal should use existing ontology standards identified in the project survey phase WP1(d); suggest ways that users could build upon the selected core ontology. The last point is included to keep in view the idea that the core ontology is not required to immediately solve all search problems; by planning for extensibility, the core can be kept smaller and cleaner. |
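The distinction between the syntactic mappings of item (a) and the semantic mappings of item (b) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: triples are modelled as plain tuples rather than with an RDF library, the `REPO_A` namespace and the `caption` field are hypothetical, and the choice of Dublin Core `title` as the common term is an example only, not a project decision.

```python
DC = "http://purl.org/dc/elements/1.1/"       # Dublin Core element namespace
REPO_A = "http://example.org/repo-a/terms/"   # hypothetical native vocabulary


def syntactic_mapping(record_uri, record, namespace):
    """Item (a): express each native field/value pair as a triple.

    No vocabulary matching is attempted; property names are simply placed
    in the repository's own namespace.
    """
    return {(record_uri, namespace + name, value) for name, value in record.items()}


def semantic_mapping(triples, equivalences):
    """Item (b): add a triple in the common vocabulary wherever a native
    property has a declared equivalent. Original triples are retained."""
    inferred = {
        (s, equivalences[p], o) for (s, p, o) in triples if p in equivalences
    }
    return set(triples) | inferred
```

For example, a record `{"caption": "Example image"}` first becomes a triple using `REPO_A + "caption"`; applying the equivalence `{REPO_A + "caption": DC + "title"}` then adds a second triple using `DC + "title"`, so a query phrased in the common vocabulary can find the record without the native data being altered.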
WP6: Synthesis
Bringing together the results of various consultations and evaluations into a combined set of proposals and recommendations. This may involve partial piloting of some of the ideas, if time allows.
See also preparation of the final report in WP1.
| Activity | Start date | End date | Responsibility | Outputs |
|---|---|---|---|---|
| (a) Data web software design | 2007-05-23 | 2007-06-04 | GK, JZ | Based on evaluations of repository software, metadata and available tools, propose a design and plan for implementing an experimental image data web based, as much as reasonably possible, on existing software tools and components. The "experimental" qualification here indicates that functionality, flexibility and ease of implementation will be favoured over absolute performance or capacity of the implemented system; stability and reliability will not be compromised. What we want to achieve is a system that will allow us to expose an image data web to a community of users, in order to refine our designs based on feedback from deployment in support of real research problems. |
| (b) Repository metadata recommendations | 2007-06-04 | 2007-06-08 | GK, JZ | "Preferences" may be a better word here than "recommendations", as our goal is, as far as possible, to work with existing repositories as we find them. But we would be remiss if we didn't document adjustments to the design or operation of existing repository systems that would make our task easier, or improve the quality of information that a data web might offer. |
Evaluation plan
The main output from this project is a report surveying existing software and proposing a design and plan for creating a data web. Elements to evaluate include (a) do we cover the ground we intended to cover, (b) are our proposals for creating a data web supported by our survey work, available software and/or current practice, and (c) are our proposals consistent with JISC strategic directions.
| Factor to Evaluate | Questions to Address | Method(s) | Measure of Success |
|---|---|---|---|
| Repository survey coverage | For each repository, have we evaluated the software, the metadata schema and the available content? | Examination of draft final report | Record of coverage; omissions noted |
| Tool software survey coverage | Have we surveyed an adequate range of tools, and did we sufficiently evaluate those we surveyed? | Consultants' reviews of draft final report | Note additional tools we might have surveyed, and un-reported issues that might impact creation of a data web using tools we did survey |
| Metadata standard coverage | Did we identify relevant metadata standards for the core ontology design? | Consultants' reviews of draft final report | Omissions noted |
| Supportability of the implementation plan | Is the suitability of components proposed for data web implementation supported by our survey work? For components that are not commonly used, have we sufficiently demonstrated functionality and stability? Does the design hang together and cover all key goals? | Review of technical design | For each element of the design, point to supporting survey work or common practice. Enumerate unevaluated technical risks. |
| Consistency with JISC strategies | Do our proposals advance JISC strategies for repositories? | Programme Manager's and consultants' reviews of draft final report | Note mismatches or disregard of existing JISC work |
Quality plan
For a survey, requirements gathering and scoping project of this nature, quality of outcomes is covered by the evaluation plan (above).
Dissemination plan
Project outcomes will be disseminated by the following means:
| Timing | Dissemination activity | Audience | Purpose | Key message |
|---|---|---|---|---|
| Project lifetime | Workshops and meetings of project partners | Project partners, invited experts, JISC | Project communication and collaborative development of recommendations | Ideas for and state of work in progress |
| Project lifetime, and beyond | Web site | Project partners, anyone else interested | Project communication and collaborative development of recommendations | Record of meetings, survey results, draft and final reports |
| End of project | Final report | JISC, developers, operators, users | Foundation for future development work | Survey results, recommendations and proposals for deployment of an image data web |
| Future | Developed software | Repository operators and users | Delivery of working data webs | Working software based on the results of this project's work |
Exit and sustainability plans
Take-up and embedding of project outputs:
| Project outputs | Action for take-up & embedding | Action for exit |
|---|---|---|
| (a) Final report and supporting materials | We aim to use the final report and supporting materials as the basis for an implementation of image data webs to support specific research areas. We also hope that any lessons coming out of our survey of repository systems and schemas will be taken up by our project repository partners; improvements employed by established repositories have a basis for becoming part of more widely accepted best practice | Publication of the final report and supporting materials |
Project outputs that may have potential to live on after the project ends:
| Project outputs | Why sustainable | Scenarios for taking forward | Issues to address |
|---|---|---|---|
| (a) Web site containing resources and information pertaining to institutional repositories, repository access and metadata aggregation tools, and proposals and plans for implementing an image data web. | We have built the web site to serve not just the current project, but to also provide a space for collaborating on future development - as such, we expect active use and maintenance of the site to survive this project. The tools chosen are all designed to support user contribution of content, with relatively little input required from a supervising "webmaster". The site itself is hosted on a VMWare virtual machine, allowing it to be moved relatively easily to a new hosting environment. | Continued use of the same web site by a continuation project or, failing that, transfer of web site hosting to new ownership | (No additional issues noted) |
| (b) Image data web implementation | A software implementation is not part of this project, but we intend that the resources we do create will provide a higher point of departure for such a development in the future. Deployed systems that improve the capabilities and productivity of researchers are the ultimate goal of our work. | Following this project, assuming that no insurmountable obstacles are uncovered, we plan to apply for further funding to implement and deploy image data webs to support specific areas of research. In connection with this, we have already obtained BBSRC funding to create a complementary laboratory data management system that integrates semantic data management into laboratory research, both by facilitating access to semantic data on the web (such as would be provided by a data web), and by capturing research images and supporting metadata in a form suitable for publication to an image data web. | (No additional issues currently noted) |
Additional material - input from other image-related projects
People it would be good to contact or visit in the course of the project, in addition to consultant partners, include:
- Sheila Anderson, Director of AHDS, who headed the recently completed JISC-funded Digital Images Archiving Study and Moving Pictures and Sound Archiving Study (mailto:sheila.anderson@ahds.ac.uk; tel: 020 7848 1981).
- Andrew Wilson, Director of the current JISC-funded AHDS Project Metadata Generation for Resource Discovery (mailto:andrew.wilson@ahds.ac.uk; tel: 020 7848 1982).
- Both at 26-29 Drury Lane, 3rd Floor, King's College London, LONDON, WC2B 5RL
- Karla Youngs, TASI Director at ILRT, Bristol TASI and ILRT's Imaging Group co-ordinator, (mailto:karla.youngs@bristol.ac.uk).
- Grant Young, TASI Technical Research Officer, (mailto:grant.young@bristol.ac.uk).
- Both at Institute for Learning and Research Technology, University of Bristol, 8-10 Berkeley Square, Bristol BS8 1HH. Tel: 0117 928 7091.
- Paola Hobson, Motorola (mailto:paola.hobson@motorola.com), aceMedia Project, re multimedia metadata/ontology work.

