Flyweb/Gene query strategy

Gene name query implementation strategy

Notes from a meeting held on 2 July 2008 to explore options for handling gene names.

We decided early on that the FlyBase Chado database is an appropriate resource for gene synonym and identifier information. (See Gene Name Synonyms investigation of Chado FlyBase database.)

Options

  • REST web service performing SQL to FlyBase.org
    • Code to perform queries
    • Code to handle REST interface
    • Code to format results
  • REST web service performing SQL to local copy of FlyBase
    • As above, plus deployment of local FlyBase copy
  • SPARQL query to local extract from FlyBase
    • Design RDF structures and URIs
    • Code to create local data from FlyBase
    • Deploy and load SPARQL endpoint
    • Regular update/reload (e.g. monthly, matching FlyBase reloads)
  • SPARQL query to local extract from FlyBase, using D2R to perform conversion
    • Design RDF structures and URIs
    • D2R mapping to create local data from FlyBase
    • Deploy and load SPARQL endpoint
  • SPARQL query to FlyBase.org, using D2R server
    • Design RDF structures and URIs
    • D2R mapping to extract data from FlyBase
    • Deploy and load D2R server
  • SPARQL query to FlyBase local copy, using D2R server
    • As above, plus deployment of local FlyBase copy
  • REST web service accessing precomputed synonym lookup
    • Code to read local data, find FlyData matches and build synonym tables
    • Code to handle REST interface
    • Code to format results
      • Lucene as an option
  • Other options
    • OpenLink Virtuoso
    • Topbraid
    • RDQuery
    • ...
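
The precomputed synonym lookup option can be illustrated with a minimal sketch. It assumes the synonym data has already been extracted as (gene identifier, name) pairs; the function names and the sample rows are hypothetical placeholders, not FlyBase data. Names are case-folded when the table is built, so lookups are case insensitive.

```python
from collections import defaultdict


def build_synonym_index(rows):
    """Map each case-folded name to the set of gene identifiers it refers to."""
    index = defaultdict(set)
    for gene_id, name in rows:
        index[name.casefold()].add(gene_id)
    return index


def lookup(index, query):
    """Return the sorted gene identifiers matching a name, ignoring case."""
    # .get() avoids creating empty entries in the defaultdict for misses.
    return sorted(index.get(query.strip().casefold(), set()))


# Placeholder rows for illustration only; real rows would come from FlyBase.
SAMPLE_ROWS = [
    ("FBgn_example_1", "Abc"),
    ("FBgn_example_1", "alpha-beta"),
    ("FBgn_example_2", "ABC"),
]

index = build_synonym_index(SAMPLE_ROWS)
print(lookup(index, "abc"))
```

One name can map to several genes, so the lookup returns a list rather than a single identifier.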


Browser API options

  • RESTful full SPARQL
  • RESTful restricted SPARQL (just enough SPARQL)
  • Custom RESTful API

It is presumed that returned results will be in a JSON format.

Responses to all queries must be received in under 2 seconds.
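
As a rough illustration of the presumed JSON result format, the sketch below serialises a query and its matching gene identifiers. The field names "query" and "matches" and the function name are assumptions made for this example; the meeting did not fix a response layout.

```python
import json


def format_results(query, gene_ids):
    """Serialise a gene-name query and its matches as a JSON string."""
    # Field names are hypothetical; no response layout was agreed.
    payload = {"query": query, "matches": list(gene_ids)}
    return json.dumps(payload, sort_keys=True)


print(format_results("abc", ["FBgn_example_1", "FBgn_example_2"]))
```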


Considerations

  • Case insensitivity
  • Simplest/easiest to implement
  • Lightest/least resource intensive in use
  • Performance
  • Least intrusive to FlyBase
  • Research contribution
  • Risk/likely robustness


Pros/Cons

Options compared:

  A. SQL to FlyBase.org
  B. SQL to local FlyBase copy
  C. D2R server to FlyBase.org
  D. D2R server to FlyBase copy
  E. SPARQL to FlyBase extract, via D2R
  F. SPARQL to FlyBase extract
  G. Precomputed lookup

Consideration                   A   B   C   D   E   F   G
Regular update vs live data?    +   -   +   -   -   -   -
Effort to implement             +   ~   ~   -   +   +   -
Resources to deploy             +   -   ~   -   +   +   +
Generic value?                  -   -   +   +   ~   -   -
Case insensitivity              ?   ?   ?   ?   ?   +   +
Performance                     ~   +   -   ~   +   +   +
Intrusive to FlyBase            ~   +   -   +   +   +   -
Research contribution           -   -   +   +   ~   -   -
Risk/likely robustness          ~   +   -   ~   ~   +   +


Conclusion from meeting

Initially, the leading contenders were A and F, as these were seen as getting us to interesting user-visible results most quickly.

On further discussion, two points weighed in favour of E: the generic/research value of working with a D2R mapping, and the fact that we could use a D2R component (RDFdump?) to extract RDF from FlyBase. Option E also allows a fallback to option F if the D2R mapping proves inadequate for our purposes, and provides a potential route to the more interesting case of running a D2R server against the live FlyBase.org database.

The approach finally selected was to spend a day or two (at most) exploring D2R mapping from the FlyBase Chado database to create RDF for a Joseki SPARQL endpoint, with a fallback plan of writing a custom Java/Python script to create the RDF from Chado if D2R proves inappropriate for this task.
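
The fallback plan (a custom script to create RDF from Chado) might look roughly like the sketch below. It assumes the script has already read (gene identifier, synonym) rows from the database, and it writes one N-Triples statement per row. The base URI, the property URI and the function names are hypothetical, since the RDF structures and URIs have not yet been designed.

```python
from urllib.parse import quote

# Hypothetical URIs; the real RDF design is still to be decided.
GENE_BASE = "http://example.org/flyweb/gene/"
SYNONYM_PROPERTY = "http://example.org/flyweb/vocab#synonym"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def rows_to_ntriples(rows):
    """Yield one N-Triples line per (gene identifier, synonym) row."""
    for gene_id, synonym in rows:
        subject = GENE_BASE + quote(gene_id, safe="")
        yield '<%s> <%s> "%s" .' % (
            subject, SYNONYM_PROPERTY, escape_literal(synonym))


# Placeholder rows for illustration only.
for line in rows_to_ntriples([("FBgn_example_1", "Abc")]):
    print(line)
```

The resulting file could then be loaded into the SPARQL endpoint in the same way as output from a D2R dump.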

GK Note: reviewing this, after the meeting, I think we should start by constructing some SPARQL test queries that exemplify the kinds of information retrieval we need to perform.
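
A test query of the kind suggested in the note could be sketched as follows. The vocabulary prefix, the endpoint address and the gene name "abc" are placeholders assumed for this example. The query does a case-insensitive match using the SPARQL regex filter with the "i" flag, and the helper builds a GET request URL carrying the query in the standard `query` parameter of the SPARQL protocol.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint address for a locally deployed SPARQL service.
ENDPOINT = "http://localhost:2020/sparql"

# Hypothetical vocabulary; finds genes with a name matching "abc" in any case.
SYNONYM_QUERY = """
PREFIX fw: <http://example.org/flyweb/vocab#>
SELECT ?gene ?name
WHERE {
  ?gene fw:synonym ?name .
  FILTER regex(str(?name), "^abc$", "i")
}
"""


def query_url(endpoint, query):
    """Build a SPARQL protocol GET URL for the given query text."""
    return endpoint + "?" + urlencode({"query": query})


url = query_url(ENDPOINT, SYNONYM_QUERY)
# Decoding the URL should give back the original query text unchanged.
print(parse_qs(urlparse(url).query)["query"][0] == SYNONYM_QUERY)
```

A small set of such queries, agreed before any mapping work starts, would give a concrete check on whether the chosen RDF structures support the retrievals we need.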