DefiningImageAccess/Progress/20070801
Defining Image Access, progress tracking, 1 August 2007
Present:
- Graham Klyne (GK)
- Jun Zhao (JZ)
Agenda:
- Review actions from DefiningImageAccess/Progress/20070716 (5 mins)
- Summarize activity since last progress review (5 mins)
- Review progress on ePrints evaluation (10 mins)
- Progress on preparation of final report (5 mins)
- Review state of wiki (5 mins)
- Planning future activities
- Any more comments for post project review? (cf. DefiningImageAccess/PostProjectReview) (5 mins)
Summary:
- Activity since last review:
- Work on the EPrints-based Fly-TED repository: it is nearly in publishable form
- Continuing addition of survey pages to wiki
- Some difficulties with using EPrints have been identified
- Work on final report is well under way: David should complete an initial draft for JISC within the next day or so.
- Starting to look at technologies to explore in further projects
Review actions from last meeting
Actions from today's meeting (DefiningImageAccess/Progress/20070801) are:
- ACTION 20070801.1
- JZ, Add links to EPrints training materials to the EPrints wiki page
- ACTION 20070801.2
- JZ, Include text description field in EPrints database for Fly-TED
- ACTION 20070801.3
- JZ, Complete logic for images with two strains
Actions outstanding from DefiningImageAccess/Progress/20070716 are:
- ACTION 20070716.1
- GK, provide Jun with pointers to PyParsing library and examples. Done.
Actions outstanding from DefiningImageAccess/Progress/20070607 are:
- ACTION 20070607.1
- JZ evaluate mSpace when available - still awaiting software
- ACTION 20070607.2
- GK chase availability of mSpace - Telephoned Max Wilson 20070716 - he's away at conferences for next two weeks - try again later.
Actions outstanding from DefiningImageAccess/Progress/20070511 are:
- ACTION 20070511.2
- JZ to document findings so far on metadata content in the repository collections surveyed (previously: aim to do this in the next week or so, now that the initial ePrints evaluation is done). Done. See DefiningImageAccess/RepositorySurvey, "Survey summary" sections.
- ACTION 20070511.6
- JZ, GK to review and refine software evaluation notes - some work by GK on EPrints page - not likely to be completed until after report is drafted
Summarize activity since last progress review
Note, activity is now moving beyond the original Defining Image Access project plan, informed by findings from the project.
Jun:
- EPrints Fly-TED development:
- Updated image descriptions from Liz, created a new EPrints subject tree to match the contents, and learned PyParsing to parse content out of the free-text data in the spreadsheet. Some information in the textual descriptions (signal strength) is not captured in the subject tree, and the text description itself is not in the database. We now feel that the signal-strength problem is less acute if the free-text annotation is included in the database alongside the subject-tree details extracted from it.
- Some images show two strains (e.g. for comparison). The subject tree mechanism has no way to distinguish subject tags associated with the different strains. Our feeling is that it's not worth trying to capture this level of detail using the subject-tree mechanism. Rather, if all applicable expression patterns are captured, a human observer can see which patterns apply to which strain by reading the descriptive text (see above). (cf. Alan Rector N-ary pattern?)
- Southampton EPrints surgery visit: (see below)
- Working with local researchers to refine Fly-TED database presentation and interface.
Graham:
- Preparation of content for final report
- Continuing addition of survey pages to wiki, covering metadata standards and software tools
- Preparing ideas for follow-on demonstrator project proposal
- Attended "Trends and Transients" session of CSW XML Summer School. Interesting information about Microformats, Web 2.0 developments and XML processing. See Meetings/20070725/XMLTrendsAndTransients.
- Returned to exploring FlyBase, with a view to submitting queries and getting XML results back. There seems to be a discrepancy between the documentation and the live site.
- Exploring technologies for possible incorporation into the FlyData project, including Monad Comprehensions for data queries, Continuation Passing for web browser/server interactions, and metaprogramming approaches to unifying server/client code.
Progress on ePrints evaluation
Southampton EPrints surgery visit
(See DefiningImageAccess/Tool/Eprints#EPrints_Surgery_in_Southampton_on_25_July.2C_2007).
This was a valuable meeting, in which many things were discussed. Notably:
- Setup of the SERPENT repository, which in some respects we are trying to emulate. For SERPENT, all uploads/deposits are managed by a central administrative team, so we are facing a rather different set of problems.
- Discussed customization of EPrints' built-in process for image upload. Two choices were offered, per-image annotation or bulk upload of data:
- Scientists upload an image (or small set?) and then enter annotations via a Web form. We don't think this would work for our researchers, who have already captured most of their metadata in a spreadsheet.
- Scientists upload one or more images and a spreadsheet containing associated metadata. This seems closer to our requirements, but there seem to be some unresolved problems here.
- The EPrints approach to customized metadata is very focused on developing plug-ins for metadata import, display and querying. Our approach of bulk import is a more complex process, but in the long run is arguably easier in our circumstances.
- EPrints can hide images from public access, but cannot hide metadata. If researchers have information they are not ready to publish, the proposal is to use two repositories: one internal-facing, and one public-facing. The presumption here is that data (images and metadata) can easily be migrated from the private area to the public repository.
- Jun collected many references to training materials that are far better than those mentioned on the main EPrints site, and has permission to post links to these from our wiki.
- ACTION 20070801.1
- JZ, Add links to EPrints training materials to the EPrints wiki page
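The bulk-upload option (images plus a spreadsheet of metadata) implies a step that pairs each uploaded file with its spreadsheet row and reports anything left over. The following is a minimal Python sketch of that pairing step only, assuming a hypothetical `filename` column holding each image's base file name; it is not EPrints import-plugin code.

```python
from pathlib import PurePosixPath


def pair_images_with_metadata(image_paths, rows, key="filename"):
    """Match uploaded images to spreadsheet rows by base file name.

    Returns (pairs, images_without_row, rows_without_image). If two rows
    share a file name, the later row wins; a real import should flag that.
    """
    # Index the spreadsheet rows by the (hypothetical) file-name column.
    rows_by_name = {row[key]: row for row in rows}
    pairs = []
    images_without_row = []
    for path in image_paths:
        name = PurePosixPath(path).name  # ignore the upload directory
        if name in rows_by_name:
            pairs.append((name, rows_by_name[name]))
        else:
            images_without_row.append(name)
    matched = {name for name, _ in pairs}
    rows_without_image = sorted(set(rows_by_name) - matched)
    return pairs, images_without_row, rows_without_image
```

Reporting both kinds of mismatch lets a depositor fix the spreadsheet or the upload before anything reaches the repository.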
State of our Fly-TED repository
All of the available data has been loaded, though the XML spreadsheet metadata is still evolving to better capture the required image content semantics. Jun is conducting regular reviews with Liz and Elin, who are doing most of the image acquisition and annotation work.
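If "XML spreadsheet" here means the Excel 2003 XML Spreadsheet (SpreadsheetML) format, which is an assumption, the rows can be read with Python's standard ElementTree before any further processing. This sketch reads only the first worksheet, treats the first row as column headers, and handles the `ss:Index` attribute that Excel uses to skip empty cells; it ignores skipped rows and merged cells.

```python
import xml.etree.ElementTree as ET

# Namespace of Excel 2003 "XML Spreadsheet" (SpreadsheetML) documents.
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def read_spreadsheetml_rows(xml_text):
    """Read the first worksheet as a list of dicts keyed by the header row."""
    root = ET.fromstring(xml_text)
    table = root.find("ss:Worksheet/ss:Table", NS)
    if table is None:
        return []
    rows = []
    for row in table.findall("ss:Row", NS):
        values = []
        for cell in row.findall("ss:Cell", NS):
            # ss:Index (1-based) means earlier columns were left empty.
            index = cell.get(f"{{{SS}}}Index")
            if index is not None:
                values.extend([""] * (int(index) - 1 - len(values)))
            data = cell.find("ss:Data", NS)
            values.append((data.text or "") if data is not None else "")
        rows.append(values)
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    # Pad short rows so every header column gets a value.
    return [dict(zip(header, r + [""] * (len(header) - len(r)))) for r in body]
```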
PyParsing has proved to be a useful tool for extracting metadata field values from a structured English description in the spreadsheet, but not all information is easily captured. We now feel that the full text description should be included in the EPrints metadata, as well as the extracted formally-defined information. This way, additional information entered by researchers will not be lost.
- ACTION 20070801.2
- JZ, Include text description field in EPrints database for Fly-TED
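The "extract, but keep the full text" approach can be illustrated with a small Python sketch. It uses the standard `re` module rather than PyParsing, and assumes a hypothetical description format of clauses such as "strong expression in primary spermatocytes" separated by semicolons or full stops. Recognised clauses become (strength, stage) pairs; anything else is reported as unparsed and survives only in the full description, which is stored alongside the extracted values.

```python
import re

# Hypothetical clause pattern: "<strength> expression in <stage>".
CLAUSE = re.compile(
    r"(?P<strength>strong|moderate|weak)\s+expression\s+in\s+(?P<stage>[a-z ]+)",
    re.IGNORECASE,
)


def extract_metadata(description):
    """Return extracted (strength, stage) pairs plus the untouched full text."""
    expression = []
    unparsed = []
    # Split the free text into clauses on semicolons and full stops.
    for clause in re.split(r"[;.]", description):
        clause = clause.strip()
        if not clause:
            continue
        match = CLAUSE.fullmatch(clause)
        if match:
            expression.append(
                (match.group("strength").lower(), match.group("stage").strip().lower())
            )
        else:
            unparsed.append(clause)  # not recognised, but still in the full text
    return {
        "description": description,  # full text kept so nothing is lost
        "expression": expression,
        "unparsed": unparsed,
    }
```

A grammar-based parser such as PyParsing becomes more attractive as the description language grows beyond one clause shape.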
We also need to add logic to handle images that show two strains of Drosophila, e.g. for comparison. This is in hand.
- ACTION 20070801.3
- JZ, Complete logic for images with two strains
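One possible shape for the two-strain logic, following the approach described under Jun's activity above, is to record the union of the subject tags found for every strain in an image and leave per-strain attribution to the stored description text. This Python sketch assumes hypothetical `stage/strength` tag strings and strain labels. The second function is shown for contrast only: an n-ary style record that keeps the strain as an explicit participant, which is the level of detail we decided not to pursue with subject trees.

```python
def merge_strain_tags(per_strain):
    """Union of subject tags across all strains shown in one image.

    per_strain maps a (hypothetical) strain label to an iterable of tags.
    Which strain each tag came from is intentionally discarded; readers
    recover that from the free-text description stored with the image.
    """
    tags = set()
    for strain_tags in per_strain.values():
        tags.update(strain_tags)
    return sorted(tags)


def as_nary_records(per_strain):
    """Alternative (not adopted): keep the strain as a third participant,
    in the spirit of an n-ary relation (strain, stage, strength)."""
    records = []
    for strain, strain_tags in per_strain.items():
        for tag in strain_tags:
            stage, strength = tag.split("/", 1)
            records.append((strain, stage, strength))
    return sorted(records)


if __name__ == "__main__":
    image = {
        "wild type": ["spermatocytes/strong"],
        "mutant": ["spermatocytes/weak", "spermatids/weak"],
    }
    print(merge_strain_tags(image))
    print(as_nary_records(image))
```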
The work on this phase is nearly done, and Jun looks forward to the opportunity to explore wider issues. (Maybe: creation of a SPARQL endpoint using Joseki and Jena model loader? - #g.)
Comments on EPrints suitability as a research group image repository
Looking forward, we see the following areas of difficulty using EPrints as an image repository:
- New subject trees require creation of a new database table (but they are handled by the existing user interface code).
- New attributes require the creation of new database fields and/or tables.
- New attributes require new user interface elements for deposit, display and search.
We see that in its standard form an EPrints repository does not easily handle evolving user data and metadata requirements, which we think are also inevitable in a research setting like ours. However, we do have some ideas for creating a set of plugins that will alleviate some of these problems.
Imperial College meeting update
See: Meetings/20070615/DefiningImageAccess-ImperialCollege and DefiningImageAccess/RepositorySurvey#Imperial_College
The meeting notes and repository page have been updated with further details provided by Dolores.
Preparation of final report
Initial notes by GK at DefiningImageAccess/Report; drafting is being continued by DS.
We held a meeting to try to refine the thrust of the report. Some "Key Messages" notes have been added to the report notes page. Drafting of the report continues in a Word document, which David is currently revising.
Review state of wiki
Several areas have been added and restructured to accommodate new information. Information about Microformats and related topics has been added.
Areas that need some reworking are:
- software evaluations: separate conclusions from procedures. Make the conclusions easier to find and access.
- indexing: review to see if key information is easy to find.
- use of additional SMW attributes (Schema, Idea, Note) to highlight remarks so they can be rediscovered later.
- Schemas: http://imageweb.zoo.ox.ac.uk/wiki/index.php/Special:SearchTriple?attribute=Schema&do=Search+Attributes
- Ideas: http://imageweb.zoo.ox.ac.uk/wiki/index.php/Special:SearchTriple?attribute=Idea&do=Search+Attributes
- Notes: http://imageweb.zoo.ox.ac.uk/wiki/index.php/Special:SearchTriple?attribute=Note&do=Search+Attributes
We expect to tackle this after the final report has been drafted.
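The three attribute-search links above share one pattern and differ only in the `attribute` query parameter, so links for further SMW attributes could be built the same way. A minimal Python sketch, assuming the wiki keeps the Special:SearchTriple page at the address used above:

```python
from urllib.parse import urlencode

# Address taken from the links above; assumed to remain valid.
SEARCH_TRIPLE = "http://imageweb.zoo.ox.ac.uk/wiki/index.php/Special:SearchTriple"


def attribute_search_url(attribute):
    """Build a Special:SearchTriple link listing uses of one SMW attribute."""
    # urlencode encodes the space in "Search Attributes" as "+".
    query = urlencode({"attribute": attribute, "do": "Search Attributes"})
    return f"{SEARCH_TRIPLE}?{query}"
```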
Dissemination activities
David spoke about Data Webs at a Collaborative Knowledge Extraction Symposium in Edinburgh.
Comments for post project review
See: DefiningImageAccess/PostProjectReview
No new comments.
| Task | Planned effort | Planned start | Planned finish | Status |
|---|---|---|---|---|
| WP1(b) Project management | 12d | 2007-01-08 | 2007-06-22 | Ongoing. |
| WP1(d) Related work survey | 10d | 2007-01-15 | 2007-02-16 | Essentially done, but entries are being added and updated as work proceeds. See DefiningImageAccess/RelatedWork. |
| WP1(i) Draft final report and presentation | 15d | 2007-06-08 | 2007-06-28 | Initial outline and content of report and accompanying presentation have been created in the wiki (DefiningImageAccess/Report, DefiningImageAccess/Presentation). Work to draft the final report document based on this material is under way. |
| WP2(d) ePrints software evaluation | 5d | 2007-03-21 | 2007-04-26 | Our initial evaluation is done, and is being followed by a deployment for publishing gene expression images with annotations. |
| WP3(b) Enumerate repository schemas | 5d | 2007-03-05 | 2007-03-13 | No formal schema information is available from Imperial College. This activity is now complete. |
| WP4(all) Software tool evaluation | 24d | 2007-04-02 | 2007-05-04 | Current work is focused on ePrints; Joseki/Jena will soon be used to provide a SPARQL endpoint. Awaiting access to mSpace software from Southampton. |
| WP6(b) recommendations | 5d | 2007-05-23 | 2007-06-08 | Started thinking about problem areas. Isolation of some key issues is part of the technical outline at http://imageweb.zoo.ox.ac.uk/drupal/files/20070530-ResearchDataPublication.ppt. Need to pick up on comments made at the final project meetings, and make sure these are represented in the final report. |
| Dissemination | | | | David talked about Data Webs at a meeting in Edinburgh about high-throughput video analysis for animal behaviour research. |

