FlyWeb: linking laboratory image data with public databases and publication repositories for Drosophila functional genomics
The FlyWeb Project is linking research image data from FlyTED with related data from the Berkeley Drosophila Genome Project, FlyBase, FlyAtlas and other sources. This project builds on the findings of the data web requirements analysis project Defining Image Access.
Duration: Oct 2007 to May 2009.
Visit openflydata.org, a cross-database search service for Drosophila gene expression data.
- Latest News: openflydata.org has two new search applications, search by gene batch and search by tissue expression profile (2009-04-14)
- FlyWeb/MilestoneThree -- all search tools, data, services and software released in milestone 3 (FM3)
- FlyWeb/MilestoneTwo -- all search tools, data, services and software released in milestone 2 (FM2)
- FlyWeb/MilestoneOne -- all search tools, data, services and software released in milestone 1 (FM1)
- We've deployed two new gene expression data search applications to openflydata.org: search by gene batch and search by tissue expression profile.
- We've been working on a relatively complete mapping of the FlyBase Chado relational database to RDF. This includes an expanded version of the Chado-in-OWL ontology and a relatively complete set of D2RQ mappings (see the http://openflydata.googlecode.com site for more information).
- We've expanded our survey of Related Work in preparation for a paper on the FlyWeb project.
- Announcement of second milestone release to relevant collaborators and mailing lists
- Announcement of first milestone release to relevant collaborators and mailing lists
- Presentation of FlyWeb at UK SWIG meeting
- Presentation of FlyWeb at the W3C Health Care and Life Science Interest Group Telecom
- We've launched openflydata.org, a web site offering search applications (just one at the moment), data and web services for the Drosophila research community.
- We've migrated our SPARQL endpoints to Jena TDB triple stores running in a small Amazon EC2 instance. EC2 and TDB are giving us good load and query performance (see some benchmarking results), certainly good enough for SPARQL/AJAX mashup applications. We did this in response to poor loading performance on our in-house virtual server platform running SDB/postgres.
- We've updated the image mashup search application, incorporating a number of usability improvements suggested by our collaborating Drosophila researchers.
- We've revisited the mappings from FlyBase and BDGP relational schemas to RDF, and tried to develop a principled approach to translating the data, adding as little interpretation as possible and remaining as faithful as possible to the source data. We used these mappings to generate new RDF dumps from FlyBase and BDGP, and developed ontologies to support them.
- Jun represented FlyWeb at the W3C Semantic Web Health Care and Life Sciences Interest Group face-to-face (at the W3C Tech Plenary).
- Al went to the London Fly Meeting, to draw inspiration for future developments.
- Al attended the Dublin Core Metadata Conference, and shared insights with an emerging community of linked data developers.
- Jun spent some time adding new data to the fly-ted database.
- We've migrated our code to three projects hosted at Google Code: sparqlite, flyui and openflydata.
- We've spent some time reviewing the design of the user-interface applications, and done some refactoring to support more fine-grained testing and extensibility/customisation.
- We've implemented case-insensitive search in the image mashup search application.
- We all visited Helen White-Cooper to demo the new image mashup application and discuss future developments [Minutes].
- We've developed a first search application integrating a genefinder feature (using data from FlyBase) with image data from both FlyTED and BDGP.
- We've developed a genefinder widget to disambiguate queries to gene identifiers, using data from FlyBase.
- We've begun an investigation of the alignment between gene names used by FlyTED, BDGP and FlyBase. There are some gaps and some ambiguities which we'll have to resolve if we want to provide perfect precision and recall in the FlyTED-BDGP Image Mashup.
- We've migrated our triple stores to use Jena's SDB storage platform, backed by Postgres databases. We've also developed a cut-down implementation of the SPARQL protocol specifically for use with SDB, designed to be robust and scalable. We've used this software to deploy new SPARQL endpoints for FlyTED, BDGP and FlyBase gene names.
- We've generated an initial RDF representation of gene name data from FlyBase. These data are downloadable as compressed files or can be queried via a SPARQL endpoint. This is early work, so the schema and/or URIs may change. We used the dump-rdf utility in D2R Server; see also the D2R mapping file we used. The resulting RDF data were then loaded into a Joseki store for serving via SPARQL (see rodos SPARQL endpoint above, and also rodos setup notes). Serifos setup notes has info about running D2R dump-rdf, which assumes a standard install of D2R Server.
- We've been exploring FlyBase and working out options for accessing gene name and synonym data, to support synonym lookup and disambiguation of gene names within our research tools. See also the FlyBase entity-relationship diagrams we reverse engineered from the FlyBase database, to help understand the schema.
- We've added a new feature to the FlyTED-BDGP mashup to present a list of all gene names used by each data resource, as a prelude to better support for gene name synonyms and disambiguation.
- We spent some time with Helen White-Cooper, our collaborating Drosophila researcher, evaluating the FlyTED-BDGP mashup. We're using her feedback to drive our next development cycle. One important feature we're working on is building in an alignment of gene names across the two data resources, so that important results aren't missed in the mashup.
- Jun Zhao presented a paper on Building a Semantic Web Image Repository for Biological Research Images at ESWC2008. The paper is mostly on her work on the FlyTED database, but includes a discussion of FlyWeb also. See also her slides.
- We've deployed a first version of the mashup of images from FlyTED and BDGP at http://rodos.zoo.ox.ac.uk/flyweb/search/flyted-bdgp/. This is an alpha release, demonstrating simple search by gene name across two different data resources.
- Jun Zhao presented a paper on Provenance and Linked Data in Biological Data Webs at the WWW2008 Linked Data workshop. See also her slides.
- We started work on a mashup of images from FlyTED and BDGP. We're using a SPARQL/AJAX design pattern, and documentation we've generated so far includes basic and extended diagrams of client-side MVC architecture, a diagram capturing a strategy for test-driven development of an asynchronous MVC UI, a UI state diagram, and error-handling within an asynchronous MVC controller.
- We discussed a SPARQL/AJAX design pattern for building a mashup of images from FlyTED and BDGP.
- We sketched five research tools & discussed features with our researchers (sketch 1, sketch 2, sketch 3, sketch 4, sketch 5).
- We asked our researchers to sort data sources, by importance to their work, and to tell us about any pain points they had in using those sources.
- We produced a list of features to be implemented for the FlyWeb project, based on our user studies. These features will be selected according to their priority and updated continuously throughout the project.
- We collected a set of high-level use cases, based on previous interviews with our researchers.
- We came up with the Data Web Layer Cake, a new architecture for building data webs from the ground up.
- The Data Access Survey summarises existing access mechanisms to data and metadata resources of BDGP, FlyBase, PubMed and PubMed Central. See also FlyWeb/BDGP BDGP and its SPARQL endpoint. (Ref WP3 (b).)
- Gold standards for publishing Linked Data.
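
Several of the items above refer to a SPARQL/AJAX mashup pattern: the same gene-name query is sent to more than one SPARQL endpoint and the results are combined on the client side. The following is only a minimal sketch of that idea, not the FlyWeb code itself. The `DEPICTS` predicate, the `example.org` endpoint URL, the function names and the query shape are all assumptions made for illustration. The case-insensitive match uses `lcase`, which is a SPARQL 1.1 function and may not reflect how the original applications implemented it. The only standard pieces relied on are the `query` request parameter of the SPARQL protocol and the layout of the SPARQL JSON results format.

```python
from urllib.parse import urlencode

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical predicate linking an image to the gene it depicts (an assumption).
DEPICTS = "http://example.org/vocab#depicts"


def build_gene_query(gene_name, depicts=DEPICTS, label=RDFS_LABEL):
    """Build a SELECT query that matches a gene label case-insensitively."""
    # Lower-case the name, then escape backslashes and double quotes so the
    # value is safe inside a SPARQL string literal.
    literal = gene_name.lower().replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?image ?label WHERE { "
        f"?image <{depicts}> ?gene . "
        f"?gene <{label}> ?label . "
        f'FILTER (lcase(str(?label)) = "{literal}") '
        "}"
    )


def build_request_url(endpoint, query):
    """Encode the query as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


def merge_results(results_by_source):
    """Group image URLs by lower-cased gene label, then by data source.

    Each value in results_by_source is a parsed SPARQL JSON result document.
    """
    merged = {}
    for source, result in results_by_source.items():
        for row in result["results"]["bindings"]:
            gene = row["label"]["value"].lower()
            images = merged.setdefault(gene, {}).setdefault(source, [])
            images.append(row["image"]["value"])
    return merged


if __name__ == "__main__":
    query = build_gene_query("aly")
    # Placeholder endpoint; a real client would fetch and parse the JSON response.
    print(build_request_url("http://example.org/sparql", query))
```

Grouping by lower-cased label only lines up names that differ in case. It does not address the synonym gaps and ambiguities between FlyTED, BDGP and FlyBase gene names noted above, which would need a separate alignment step.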