Meetings/20070104/BioImageWeb Consortium meeting
Present
- David Shotton (Oxford)
- Richard O'Beirne (Oxford Journals)
- Claire Bird (Oxford Journals, molecular biology)
- Robert Kiley (Wellcome, Head of e-strategy)
- Elizabeth Newbold (British Library, UK Pubmed Central)
- Michael Selway (System Simulation, JISC PIXUS)
- Dolores Iorizzo (Imperial College)
- Will Wilcox (Blackwells, journal production manager)
- Matthew Day (Nature, database publisher, web pub group)
- Rachel Kotarski (BioMed Central)
- Ed Pentz (CrossRef, exec director)
- Edward Wates (Blackwell, UK publishing ...)
- Jun Zhao (Oxford)
- Graham Klyne (Oxford)
David's Intro
Background
(ask for slides)
Epochs:
- integration of data - databases, data warehouses
- integration of process - distributed queries, workflow
- integration of knowledge - semantic unification
What next?
- managing new bursts of data
- putting data in context - knowledge for interpretation and re-interpretation
Need:
- discovery
- ... (missed)
- ... (missed)
Universals and particulars:
- Universals - public knowledge
- Particulars - unbounded, incomplete, not widely available, raw data subject to IPR restraint
Imaging central to post-genomic biological research
Need for descriptive metadata to describe images, interpret what is shown
Publication of research images -- THE WEB MAKES THIS POSSIBLE
Bioinformatics research increasingly in-silico
Goal is to facilitate interoperable access to research images, in an age of "personal" data publication.
What is a data web?
- publish data somewhere, anywhere, on the web
- lightweight tools to harvest, index, and link back to sources. Use a light touch: enough to find the data (but what is enough? we don't entirely know yet.)
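The "light touch" idea above (harvest a little descriptive metadata, index it, and always link back to the source) can be sketched roughly as follows. This is only an illustrative sketch, not anything presented at the meeting: the record fields, the example.org URLs and the function names are all assumptions, and a real harvester would pull records from repositories rather than from a hard-coded list.

```python
from collections import defaultdict

# Hypothetical harvested records: just enough metadata to find the data,
# plus a link back to wherever the image was actually published.
RECORDS = [
    {"source": "https://repo-a.example.org/images/1",
     "description": "GFP labelled stem cells"},
    {"source": "https://repo-b.example.org/img/42",
     "description": "Stem cells in mouse liver"},
    {"source": "https://journal.example.org/fig/7",
     "description": "Mitochondria electron micrograph"},
]


def build_index(records):
    """Map each lower-cased description word to the set of source URLs."""
    index = defaultdict(set)
    for record in records:
        for word in record["description"].lower().split():
            index[word].add(record["source"])
    return index


def find(index, query):
    """Return the sorted source URLs whose description has every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(RECORDS)
print(find(index, "stem cells"))
```

The index stores only links, never the images themselves, which is the point of the light touch: the data stays wherever it was published.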
Why not just Google? (see slides)
What have we done (see slides - 5 phases)
Stakeholder analysis -- get a copy for the wiki (some comes from DIA project plan)
BioImageWeb Consortium
- an informal group, with some focus on Bio images, even though longer term goals are generic
- no formal agreement; open meetings (don't reveal secrets)
- future funded alliances may need formal agreements
Name? ImageWeb? BioImageWeb?
David has done lots of talks
Roles for Publishers
- Matthew is leaving
- Edward Wates question about DEFImage being specific to repositories
- David has a proposal...
The agenda is re-jigged
Moving from data web as a hub to data web as a service (e.g. click on image in one paper and see semantically related images/notes from other repositories). Cf. ImageBlast - locating semantically related images. Link paper-to-paper through images. Sorta like CrossRef linking via citations to articles.
Example: transdifferentiation of stem cells. Images of Green Fluorescent Protein (GFP) labelled cells appear in unexpected places.
Dolores: what here is not possible with text analysis? David: the idea is based on using text analysis, but is focused on finding the related images, rather than just related articles. Example of value-add service.
Dolores: we badly need an annotation system (need to dig here what Dolores envisages).
DEFImage: doesn't address annotation. Only looks at repository software and metadata. For publishers: we also need to look at your software, and your user requirements. Need to study what kind of secondary services would be desirable. Then we can look to building prototype image webs. Then on to production service. CrossRef already has buy-in from publishers, once concept is shown to work.
Funding for requirements analysis?:
- DTI sensors and imaging for medical applications. Instrument focused, and very commercially focused.
- Wellcome: Clinical Research Infrastructure Initiative -- not really a fit, very medically oriented
- Research councils: don't like funding infrastructure
Normal funding sources not likely to fund. Propose consortium-funded requirements gathering exercise. "ImageWeb Scoping" - to complement the JISC project, with focus on publisher requirements, to pool results. £10K commercial, £5K non-commercial.
Richard said at 1st meeting: publishers need to work harder to expose what they already have. E.g. metadata present in publishing process. Anita De Waard is developing system to allow authors to mark up metadata, which can be collected early and propagated through the publishing chain.
Dolores: funding opportunity, FP6 program for digital libraries - EU "desperate for" business models to bring in commercial partners/involvement.
Ed: JISC group PALS funding stream to support publisher/library cooperation. New idea: PALS3(?) call for metadata interoperability. Ed Pentz is on the committee.
Matt: (Nature's contributions?) V interested in what's going on here, Nature is interested, we want to contribute, can contribute content. Nature wants to make more use of images - now is the time to do something. This year, will start to do something. Technological challenges to sort out (behind the scenes handling of images). Also, have open text mining interface. All offered as evidence that Nature are doing real work in this area. Web Publishing Team have developers, maybe help with dev time (maybe, not sure); technical advice is OK. Connotea. Nature proceedings (upload/document sharing/no peer review, just moderation); Connotea + resources? Regarding money; can't respond to that - will take up with seniors. Other help: meeting rooms. David: possible implementation of Connotea into ImageWeb registry - maybe option for development contribution?
Start with low ambitions. Don't worry too much about higher applications - yet. Really happy to see that things really are still moving. DS: need to do something or disband.
(David mentions PictureAustralia)
Robert Kiley: would like to see things; can harvest that this afternoon. OAI-PMH/Z39.50 mentioned.
Dolores: videos within published papers. Jane Hunter, Carl Lagoze collaborations?
Ed: PLoS is registering DOIs for all images, but not much metadata yet.
Aside - Q for Jun: could we create a web interface to do schema-driven survey; any useful tools? Look into survey tools for Drupal/Joomla/etc.
Jun: very happy to see that publishers really want to get involved; past work was concerned about problems of building links with publishers.
Dolores: involvement with W3C multimedia interoperability group (which? nail down)
Afternoon session
Seeking feedback: what else do we need to know? where are the faults in our vision?
Edward Wates, Blackwell
800000 articles currently, up to 140000 or so. Large critical mass. DS slide on publisher interest pretty accurate. But will need ... founder members put up cash for CrossRef that was subsequently repaid. Driving traffic is not enough to raise this kind of money. Requirements gathering exercise is a valid exercise -- what would be ... (the payback?) ... why should we do this? And, what is the sustainability model for the Data Web? CrossRef model?
Ed Pentz: not creating some new central organization. Funding model is different, not ongoing. CrossRef mission to have publishers work together to do new things they won't do individually.
Sustainability model part of initial feasibility study (Ed Pentz, Edward Wates).
DI: bring in users as well as publishers as part of the requirements gathering. ("It would be silly to do a requirements model just based on the publishers") [PURL.org?] Richard B..., Claire Jenkins, (ANO)
Richard O'Beirne, OUP
"from a technical point of view, publishers need to learn how to be invisible" (conflict with branding?). Currently, moving to XHTML, embedding RDF for CC licences, OAI-PMH and extensions. (something about NIH syndrome) Publishers aren't good at doing R&D - this is a way to change that position/view. Happy to propose funding to people here. Claire Bird: happy to support Richard with management at OUP. Keep it simple at the start to get something in place. Making articles "more intelligent" - how can all this free open access "stuff" actually be used? ImageBlast is a compelling vision for the future. Richard: current production processes strip out value -- standardization of content discards metadata. Flickr extracts metadata and ...
GK: can use author-original DOI? Yes. There's an argument here for using a non-HTTP URI.
DI: intelligent articles *and* books. TEI facilities at Oxford, and others.... Need: coreference, disambiguation, annotation.
DS: also, users can annotate. Esp for images. DI: but some annotations are very specialized, can't be outsourced. DS: Must be benefits for authors. DI: therefore, must work top-down (to seed process and value).
EW: images are downsampled.
DS: what do publishers think of supplementary material? Blackwells: >35000 accesses/year.
Claire: gut feeling is that this is a useful direction to pursue. Would like to do straw poll of journal (editors?).
DS: it is maybe time to publish something -- ideas for when and where? Richard: D-Lib? BioMed Central? Ed W: what are we hoping to gain by publication? DS: not entirely sure, a sense of reporting what we're doing. GK: CACM forum?
Rachel Kotarski, BioMed Central
Rachel Kotarski, BioMed Central: definitely want to be involved in BioImageWeb. Benefits for authors are good. Already using RDF. Access no problem -- all CC-licensed. Also in the process of creating a peer-reviewed image library - can ask for metadata. May be some licensing issues here - not all CC-licensed. Funding of requirements gathering - can't say; need to go back to management.
Richard: are authors willing to submit images? Rachel: it depends on the author. Aspects of trust come into play.
Michael Selway, System Simulation
Michael Selway, System Simulation: it's all very interesting. Intelligent articles and re-use of images singled out. What are user requirements? Issues of scalability and preservation. Start-small approach is vital. Question about funding: what is minimum requirement? DS: basis of full-time researcher+50% project manager for 6 months. Maybe more in light of new areas to examine raised today. Grosses out at about £100K.
What should/could be the role of BL? DS: take a lead and change the world (!).
DI: Bill Olivier (?) matching funds?
Not a publisher or an author. But it touches on what SS do. We can do all of this already (almost) -- don't need new standard; a number of products already exist. What is different? The semantic web angle. Vocabulary "coherence", semantic coherence is a hard problem, doing just a bit would be a big win.
Raw commercial interest: standards are good for selling software. Image library software / digital asset management (DAM) software is a key business area for them. Need to embed fruits of DAM into other applications. Good to be there in the early days.
Specific expertise in a small number of areas: scoping and standardization, tools in the area of thesaurus development and controlled vocabularies, user interface design (HCI) "interaction design", prototyping and building demonstrators, tools and libraries built up over the years, "the bridge" multiprotocol mapper, protocol gateways. Finally, contact with image libraries (Wellcome, stationery office / brit pharm, RTE image library, british red cross soc., Getty images). Experience of working with academics; usually 1-2 Euro projects - FP6 preservation in broadcast media (w/Southampton). Links with Imperial.
DI: someone to involve: Stephan Rutger - multimedia ... technology enhanced learning. Senior prof at OU media lab. DI can demo tool showing innovative search results and visualization. Salmon maps(?). Dendrogram trees. Concepts and clusters of words.
These might be some useful tools to build near-term on a data web
Importance of tools that can present results in interesting ways
Robert Kiley, Wellcome
Have lots of data in PubMedCentral, in XML. Can do this now.
Also have an image repository; 15000-20000 biomedical; have an OAI repository. Don't use DOIs.
Funding: Wellcome library would be able to find £5K, if it wanted to.
The main thing Wellcome can offer immediately is access to data.
DI: evidence that this is a problem for which there is a desperate need -- talk to any (...) researcher. Example: medical science can non-invasively diagnose intestinal diseases; video camera in a pill can record c. 15 hours of video; want to automatically select interesting bits of image - is a hard problem. Need something similar for isolating interesting data from the "firehose" of available data.
DI: Success breeds success... if we can get the masses of data organized, we'll get much more value from it. As has happened for physics data at (??).
Future planning
DI: prototyping as part of requirements gathering
Additional partners:
- intellectual content - urgent topics
- production issues
- metadata standards
- semantic technology alternatives
- identifiers
- publisher metadata
Discussion about the role and extent of prototyping in a scoping study. Differing views.
Next meeting: demo meeting in London, to stretch the imagination.
- eChase project (6th framework) access to multimedia assets, largely cultural heritage.
- GridEcon - economic models for the grid
Michael and Dolores to organize a demo meeting in London
Edward Wates: call a meeting of people with financial interests; would need to be carefully planned.
Ed Pentz: prepare and circulate a draft project proposal.
Publishers to hold a separate conference call.
All participants: email David with list of priorities.
DI: talks of an "industrial forum" to set an agenda for future developments; discussions under Chatham House rules. Suggests to develop and sell BioImageWeb along similar lines.
Richard: suggests survey of what publishers are doing now, what they use, what metadata. GK: I'd be interested to collect and coordinate this information.
Ed Pentz: suggests a separate technically focused group to discuss architectural issues, say regarding use of DOIs.
Name: BioImageWeb seems fine for everyone.
MyImageWeb
InsideOut.org
OurImageWeb.org
DI: tell others what we're doing. If we don't, someone else will. Establish ourselves as thought leaders.
(Phew.... out of steam.)
Action:
GK - create a mailing list for the consortium.

