Meetings/20070622/DefiningImageAccess-FinalMeeting

From ImageWeb

Defining Image Access - final meeting

22 June 2007, Wolfson College

Present

  • Jun Zhao
  • Peter Robinson
  • Julie Allinson
  • Brian Matthews
  • Alistair Miles
  • Dolores Iorizzo
  • David Shotton
  • Graham Klyne


David Shotton - initial presentation

Recap data web ideas. Ideas have evolved.

Dolores raises question of annotation schema: original sources or presented common schema? Don't know yet.

Also brief mention of aggregation to common point - engineering choice, don't know yet.

Description of image webs.

Stakeholders: "everybody has a different take" - slide shows a wide diversity of motivations for participation in an image web.

Progress towards making image webs: early meetings leading to the current scoping project.

Scope: requirements for interoperable discovery and delivery of image data in institutional repositories using DSpace, EPrints and Fedora.

  • how to expose metadata
  • marshall
  • links back to original sources

Analysis of current university repositories practice.

Capabilities of repository software; access mechanisms.

Evaluate technical approaches to building a data web.

Primary deliverable is a report. Plus the wiki.

Peter notes distinction between wiki as train of thought vs final presentation.

Wiki front page should invite self-registration for commenting?

Achievements:

  • Established body of knowledge and experience
  • productive workshop meetings
  • making contacts in the community
  • refined our own ideas
  • seen interest from other parties

Conclusions:

  • not many image collections in repos
  • available collections lack adequate metadata
  • metadata access mechanisms are limited

Fruitful discussion with Chris Gutteridge on how to augment ePrints to address these problems.

Fedora - need to build interface and content model

DSpace (current generation) is very restricted.

Unavailability of metadata limits our ability to pursue original goals directly.

Schema registry hand crafted

Useful tools (e.g. Southampton)


Peter - suggests report should recommend work on bulk metadata ingress for images.

Graham Klyne - technical synthesis

Pick up on hand-crafting of schemas; subscription model

Peter - need to explicitly state the problems of metadata acquisition - provide tooling to get it in a transmittable form.

ACTION: GK, add this to report

Peter - CLiC has list of packages used by image collection managers; Luna insight

ACTION: GK, add this to report

ACTION: GK, add note to report about visual nature of images and its impact on user interface issues (cf. DanBri's comment)

Jun Zhao

Talking about use of ePrints as a repository and publication mechanism for a local research group generating images and annotations of gene expression in drosophila spermatogenesis. Describes background requirements, a look to future goals, current researcher practice (image files+Excel), problems of standard user interface for bulk ingest of images, inspiration from SERPENT and eBank projects, development of special tools for bulk ingest, initial work to tailor the display. Much work remains to be done, but ePrints has got us started very quickly (about 2 weeks effort so far).

Peter's comments:

  • images coming in collections means that tools for bulk upload are important
  • control fields presented during search/browse? Jun: yes.
  • alternatives to web-form upload
  • alternative views? e.g. zoom. Jun: yes, can use plugins (cf. ecrystals 3D rotating models)
  • annotating regions of images? David: Dictate, can this be subverted to do region annotation?

David Shotton - future plans

  • Building a demonstrator image web
  • live journal content
  • other JISC projects

Phase 4: creating a real demonstrator, starting in Autumn?

DW-40: frictionless interface between papers in repositories and research datasets.

  • local hosting for images
  • papers in repositories
  • building a test data web around these

Talking to publishers: Wiley-Blackwell, OUP, ...

PDF as the embodiment of a static printed page. Publishers discard lots of available data to create a PDF. More web-sympathetic publications would have, e.g., live links to data sources.

Examples

  • pre/post tsunami aerial images
  • lens to move over graphic to extract information (gatech.edu)
  • click on enzyme ribbon diagram to view PDB page; from there to 3D visualization
  • all images as clickable videos

These suggest ways that publishers can enhance papers in the publication process.

Also for repository articles, with links to live datasets. CLADDIER "ping". Brian M: Recreate network of backward and forward citations (the goal), like blog trackbacks - adapting existing blog protocols; also Rich Tags trying something similar. Adding metadata to notification with backlink. Complementary: relabelling parts of graphs: data publication/citation (Sam Pepper?). (Citation with content-based fragment addressing?)
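
As a rough illustration of the trackback-style "ping" with added metadata and a backlink, the sketch below builds the form-encoded body of such a notification. The four field names follow the blog TrackBack convention; reusing them for paper-to-dataset citations, and all the example values, are assumptions of this sketch rather than anything CLADDIER is recorded as doing.

```python
from urllib.parse import urlencode, parse_qs


def build_citation_ping(citing_url, title, excerpt, source_name):
    """Return the form-encoded body of a TrackBack-style ping.

    Field names follow the blog TrackBack convention; applying them to
    citations of datasets is an assumption of this sketch.
    """
    fields = {
        "url": citing_url,         # backlink to the citing paper
        "title": title,            # title of the citing paper
        "excerpt": excerpt,        # short context for the citation
        "blog_name": source_name,  # here: the repository holding the paper
    }
    return urlencode(fields)


# Hypothetical example values.
body = build_citation_ping(
    citing_url="http://repository.example.org/papers/123",
    title="An example paper citing a dataset",
    excerpt="Uses the dataset as its primary source.",
    source_name="Example Institutional Repository",
)
decoded = parse_qs(body)
print(decoded["url"][0])
```

The body would be POSTed to a ping URL advertised by the cited dataset; how the receiving repository stores the backlink is the open problem noted later in the CLADDIER demo.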

(Some links added to report outline)

Common repository interface

IEMSR

Alistair: NSDL metadata registry (uses SKOS)

Intute repository search

eBank, R4L, SPECTRa

  • Dolores mentions role of PMR and CML in accelerating data publication for chemistry.
  • data entry forms for crystallography data, apply checks to data submitted

StORe and CLADDIER

SCARP, Image Store

CoKE - collaborative knowledge extraction - currently at pre-application stage

CLAROS - data webs for classical art

Alistair Miles - CLADDIER demo

JOAI harvester, harvest from three specified locations. Also indexes content using Lucene, and provides lightweight protocol for querying index via OAI-PMH request. Ajax front-end to construct Lucene query and display results with both papers and data.

Unsolved problem is getting "ping" metadata into the repositories.
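
For orientation, a harvest from several locations amounts to issuing an OAI-PMH ListRecords request against each base URL. The sketch below only builds the request URLs; the three base URLs are hypothetical (the notes do not name the locations), and `oai_dc` is assumed as the metadata format.

```python
from urllib.parse import urlencode

# Hypothetical OAI-PMH base URLs; the notes do not name the three locations.
BASE_URLS = [
    "http://papers.example.org/oai",
    "http://data.example.org/oai",
    "http://images.example.org/oai",
]


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


harvest_urls = [list_records_url(base) for base in BASE_URLS]
for url in harvest_urls:
    print(url)
```

A real harvester would fetch each URL, parse the XML response, and repeat with the returned resumption token until the list is complete.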


Julie Allinson - ORE and SWORD

ORE

Resource focus as opposed to OAI-PMH metadata focus.

Recent white paper - web-centric OAI-ORE perspective.

Also, focus on compound digital objects.

Institutional repositories and others. Links to supporting tools.

Still discussing what a compound information object is. Scoping problem.

For us: image plus metadata, esp. domain-specific

Underpinned by Web architecture. Don't want to reinvent stuff.

Boundary of compound object, and relationships between parts, are lost when moved onto the web. Machine-readable "splash pages".

Relationship between named graph and ReM is unclear - muddle over representation.

Issues:

  • containment node - separate from named graph.
  • containment vs reference.
  • named graph discovery
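
To make the containment vs reference distinction concrete, here is a toy model of a resource map for an "image plus metadata" compound object. It is only a sketch: the predicate names `contains` and `references`, the class, and all URIs are invented for illustration and are not taken from any ORE draft.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceMap:
    """Toy description of one compound object; predicate names are invented."""
    map_uri: str      # where the description itself lives
    object_uri: str   # the compound object being described
    triples: list = field(default_factory=list)

    def contain(self, part_uri):
        # The part lies inside the compound object's boundary.
        self.triples.append((self.object_uri, "contains", part_uri))

    def reference(self, other_uri):
        # The other resource is linked to but lies outside the boundary.
        self.triples.append((self.object_uri, "references", other_uri))

    def parts(self):
        """Return only the resources inside the boundary."""
        return [o for s, p, o in self.triples if p == "contains"]


# Hypothetical URIs throughout.
rem = ResourceMap(
    map_uri="http://repo.example.org/rem/42",
    object_uri="http://repo.example.org/object/42",
)
rem.contain("http://repo.example.org/object/42/image.tiff")
rem.contain("http://repo.example.org/object/42/annotations.rdf")
rem.reference("http://other.example.org/related-paper")
print(rem.parts())
```

Keeping `map_uri` separate from `object_uri` mirrors the first issue in the list: the node that stands for the compound object is distinct from the graph that describes it.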

Important, but out of scope?:

  • authority and ownership
  • vocabularies for link types
  • vocabularies for properties of resources

Next steps:

  • ReM serialization: RDF/XML, TriX, ATOM, (MPEG DIDL, YADS)

SWORD

Aims: improve efficiency of ingest into repository, multiple deposits, common interface, interoperability.

Based on standard specification.

Builds on Deposit API work. (That work linked from SWORD wiki.)

R4L work at Southampton - data services pick up files and import to repo - see Simon Coles' scenario on SWORD wiki.

(cf. MURDER)

(Intralibrary from Intralect, underpins JORUM - learning object repository.)

Pain points:

  • no standard way currently
  • no standard interface for tagging/authoring
  • no standard interface for transfer between repos
  • no standard interface for externally initiated workflow
  • no standard interface for deposit in SOA

Deposit: two components:

  • Explain (about the repo)
  • Deposit
    • -> receipt + id
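
The two-component shape (ask the repository about itself, then deposit and get back a receipt with an id) can be sketched with an in-memory stand-in. This is a toy model under stated assumptions, not the SWORD interface: the class, method names, and field names are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Receipt:
    identifier: str   # id assigned by the repository
    collection: str   # collection the item was deposited into


@dataclass
class ToyRepository:
    """In-memory stand-in for a repository; not a SWORD implementation."""
    name: str
    collections: list
    accepted_types: list
    items: dict = field(default_factory=dict)

    def explain(self):
        """Describe the repository: what it holds and what it accepts."""
        return {
            "name": self.name,
            "collections": list(self.collections),
            "accepts": list(self.accepted_types),
        }

    def deposit(self, collection, content_type, payload):
        """Store a payload and return a receipt carrying the new id."""
        if collection not in self.collections:
            raise ValueError(f"unknown collection: {collection}")
        if content_type not in self.accepted_types:
            raise ValueError(f"content type not accepted: {content_type}")
        identifier = f"{self.name}:{uuid.uuid4().hex}"
        self.items[identifier] = (collection, content_type, payload)
        return Receipt(identifier=identifier, collection=collection)


repo = ToyRepository("demo", ["images"], ["application/zip"])
service = repo.explain()
receipt = repo.deposit("images", service["accepts"][0], b"packaged image + metadata")
print(receipt.identifier in repo.items)
```

The client reads the accepted packaging from the explain step before depositing, which is where the choice of packaging standards discussed below would surface.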

APP drawbacks:

  • pushing design boundaries?
  • requires 'significant interpretation' to deposit compound object

Dolores: CASPAR project insights - need to think about presentation of ingest?

Choice of packaging standards - problem area? For now, use accept element, but still need community agreement.

When might we see a prototype for EPrints? Project runs until end August, hopefully by then.

Extensibility to arbitrary metadata schemas (optional elements?) Ducking issue: metadata is in the package.

Stuart Lewis (Aberystwyth), repository bridge project - demonstrated deposit from DSpace to Fedora.

Alistair Miles - Image Store

Digital Curation - maintaining and adding value ... current and future use

Concludes: enabling re-use

  • Short term: publication/sharing
  • Long-term: preservation

(Can these be separated?)

(Dependence on a highly informed designated community?)

Methodology: different disciplines - immersive case studies - gather best practices and find opportunities to apply.

Deliverables:

  • Case study reports
  • Synthesis report

Image Store: Images or video (plus associated data) are key elements of the scientific research.

  • Wildlife CRU - badger videos in Wytham woods, behavioural observations
  • Images of gene expression in Drosophila
  • Behavioural ecology research videos - animals in controlled environment
  • Electron microscopy of Trypanosomes - large archive of electron micrographs on film + small number of electron tomography. Use film as controls for hypotheses based on tomographic data.

Goals:

  • What would it take to curate assets? Propose feasible strategies, short and long term views.
  • Looking at a small defined subset of the scientific legacy

Deriving requirements for applications to do real work on scientific preservation

Case study methods:

  • attitudes, requirements, risks and constraints
  • Producer PoV: assets, context, risks, attitudes
  • Consumer PoV: re-use cases; requirements

Strategies: feasibility and recommendations

Foundations: OAIS, CCSDS/PAIMAS (http://public.ccsds.org/default.aspx, http://public.ccsds.org/publications/archive/651x0b1.pdf). Challenge: OAIS is highly centralized, no real appreciation of decentralized data.

GK: Preservation antithetical to web? Brian picks up on notion of trust in preservation: if not a single authority then use trusted services.

Post-hoc vs "Sheer" (virtually transparent) curation. Get as much as possible out of what the scientists are already creating.

Hypotheses:

  • scientists have no time or money for curation
  • no funding precedent
  • heavyweight post-hoc approaches unlikely to be feasible in Image Store contexts
  • look for ways to integrate curation activities into working practices.

Open question: to what extent focus on specific cases, as opposed to using these to inform proposals for general practices.

  • require significant change -> high risk rejection
  • good data management practice => good curation practice
  • bridge local-level and higher-level activities

Open issues: (still being determined)

David Shotton

Video of Betty the toolmaking crow.

Brief presentation of SABRE research before&after workflow diagrams.
