FlyWeb/LinkedData

From ImageWeb

Principles for publishing a linked FlyWeb

The following principles, based on the LinkedData documents, are worth keeping in mind when publishing our FlyWeb.

For choosing the URIs

Do we need multiple URIs related to a single non-information resource (say a gene)?

A typical example of the usage of such multiple URIs could be:

  • use an identifier for the resource
  • use an identifier for a related information resource suitable for HTML browsers (with a web page representation)
  • use an identifier for a related information resource suitable for RDF browsers (with an RDF/XML representation)

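For the gene resources published by our FlyTED project, we could have:

  • http://www.fly-ted.org/resource/gene/Act5C (the gene itself)
  • http://www.fly-ted.org/page/gene/Act5C (the HTML page about the gene)
  • http://www.fly-ted.org/data/gene/Act5C (the RDF/XML description of the gene)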

Given an identifier for the gene, a browser can then display either an HTML page or an RDF file that describes the resource.

I do not think we need to worry yet about what should be presented to a client, but this should be kept in mind if we want to be part of the LinkedData community. Note that a LinkedData browser (such as Disco) would automatically be directed from a resource identifier to a URL such as http://www.fly-ted.org/page/gene/Adh, which gives a nice HTML presentation of the RDF data about the gene.
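As a rough illustration of how one resource identifier could lead to two representations, the sketch below picks a target document from the client's Accept header. It assumes the /resource/, /page/ and /data/ path segments used by the FlyTED URIs on this page; the function name is hypothetical, and the header check is deliberately naive (it ignores quality values). A real server would typically answer with a redirect to the chosen URI.

```python
# Sketch only: assumes the /resource/, /page/ and /data/ URI layout used
# on this page. choose_representation is a hypothetical helper.

def choose_representation(resource_uri: str, accept_header: str) -> str:
    """Return the URI of the document a server could redirect the client to."""
    if "/resource/" not in resource_uri:
        raise ValueError("not a resource URI: " + resource_uri)
    # Clients asking for RDF/XML get the data document; all others get the HTML page.
    wants_rdf = "application/rdf+xml" in accept_header.lower()
    target = "data" if wants_rdf else "page"
    return resource_uri.replace("/resource/", "/" + target + "/", 1)


gene = "http://www.fly-ted.org/resource/gene/Act5C"
html_uri = choose_representation(gene, "text/html")
rdf_uri = choose_representation(gene, "application/rdf+xml")
print(html_uri)
print(rdf_uri)
```

Keeping the three URIs parallel means the mapping is a plain string substitution, so no lookup table is needed.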

For reusing terms from well-known vocabularies

At the moment, I can only think of the following two schemas that might be useful for our project:

There are some recommendations for defining your own terms. I think these are not relevant for our group at the moment, but the information can be found in the section "Defining your own terms".

What should I return as RDF description for a URI

What triples should be returned in RDF representation in response to dereferencing a URI identifying a non-information resource?

  • The description: all triples from your dataset that have the resource's URI as the subject.
  • Backlinks: all the triples from your dataset that have the resource's URI as the object.
  • Related descriptions: any additional information about related resources, e.g. some information about the author returned along with the description of a book.
  • Metadata: any metadata you want to attach to your published data, e.g. a URI identifying the author, or the licensing information.
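The first two items of the list above can be sketched as a simple filter over a dataset. This is only an illustration: triples are plain (subject, predicate, object) tuples rather than objects from an RDF library, describe is a hypothetical helper, and the example.org triple is invented solely to show a backlink.

```python
# Sketch only: triples are plain tuples; describe() is a hypothetical helper.

ACT5C = "http://www.fly-ted.org/resource/gene/Act5C"

DATASET = [
    (ACT5C, "rdf:type", "flyted:Gene"),
    (ACT5C, "owl:sameAs", "http://www.fruitfly.org/gene_product/RE02927"),
    # Hypothetical triple, included only so that Act5C appears as an object.
    ("http://example.org/experiment/1", "ex:studies", ACT5C),
    ("http://www.fly-ted.org/resource/gene/alh", "rdf:type", "flyted:Gene"),
]


def describe(dataset, uri):
    """Collect the description and the backlinks for one resource URI."""
    return {
        # The description: triples with the URI as subject.
        "description": [t for t in dataset if t[0] == uri],
        # Backlinks: triples with the URI as object.
        "backlinks": [t for t in dataset if t[2] == uri],
    }


result = describe(DATASET, ACT5C)
for section, triples in result.items():
    print(section, len(triples))
```

Related descriptions and metadata would be appended to the same response, but which triples belong there is a publishing decision rather than a mechanical filter.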

Provenance metadata about each release of FlyWeb

This proposal adopts the concept of Named Graphs.

This means that for each release of FlyWeb we have a Named Graph, say http://flyweb.zoo.ox.ac.uk/map/1. This named graph contains, at minimum, all the statements asserting that a resource from one data source is equivalent to resources from other data sources on the FlyWeb. Biological knowledge is frequently updated, and the biological databases that hold this knowledge are updated correspondingly. A mapping between two resources may therefore hold at one time but not at another.

The metadata about each named mapping graph would state:

  • the time the mapping occurred
  • who performed the mapping
  • the version of the databases that were used during the mapping

If such provenance metadata is kept for each release of FlyWeb, we would have the following descriptions for a given gene resource:

http://www.fly-ted.org/resource/gene/Act5C
    rdf:type    flyted:Gene ;
    foaf:page   http://www.fly-ted.org/page/gene/Act5C ;
    rdfs:isDefinedBy http://www.fly-ted.org/data/gene/Act5C ;
    owl:sameAs http://www.fruitfly.org/gene_product/RE02927 .

<graph>
    <uri>http://flyweb.zoo.ox.ac.uk/map/1</uri>
    <triple>
        <uri>http://www.fly-ted.org/resource/gene/Act5C</uri>
        <uri>owl:sameAs</uri>
        <uri>http://www.fruitfly.org/gene_product/RE02927</uri>
     </triple>
    <triple>
        <uri>http://www.fly-ted.org/resource/gene/alh</uri>
        <uri>owl:sameAs</uri>
        <uri>http://www.fruitfly.org/gene_product/LD39491</uri>
     </triple>
</graph>

http://flyweb.zoo.ox.ac.uk/map/1
   dc:date    SS:MM:HH:DD:MM:YYYY ;
   dc:creator XXXYYY ;
   dc:source  http://flybase.org/v5 ;
   dc:source  http://www.fly-ted.org/v1 ;
   dc:source  http://www.fruitfly.org/20070309 .

Using such metadata, we can see all the mappings that have ever been asserted between resources, whether or not a mapping still holds. For example, we can imagine having the following statements for the resource http://www.fly-ted.org/resource/gene/alh:

http://flyweb.zoo.ox.ac.uk/map/1 http://www.fly-ted.org/resource/gene/alh owl:sameAs http://www.fruitfly.org/gene_product/LD39491
http://flyweb.zoo.ox.ac.uk/map/2 http://www.fly-ted.org/resource/gene/alh owl:sameAs http://www.fruitfly.org/gene_product/SD04152
http://flyweb.zoo.ox.ac.uk/map/2 http://www.fly-ted.org/resource/gene/alh owl:differentFrom http://www.fruitfly.org/gene_product/LD39491

This means that in FlyWeb's mapping release v1, gene alh is the same as gene LD39491 from BDGP. After the underlying knowledge was updated, this mapping no longer holds: in the next release alh is the same as SD04152 and is not the same as LD39491. Providing such extra metadata about each mapping:

  • lets biologists navigate FlyWeb using the version of the data resources they work with, rather than forcing them always to use the most recent version of a database
  • enables scientists to trace why a new mapping was proposed and why an existing mapping disappeared
  • increases trust in FlyWeb
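To make the first two benefits concrete, here is a minimal sketch that stores the alh statements as (graph, subject, predicate, object) tuples and looks them up per release. The helper names are hypothetical, the gene_product URI for SD04152 is assumed by analogy with the one for LD39491, and owl:differentFrom is used as the standard OWL property for the "not the same as" statement.

```python
# Sketch only: quads are plain tuples; both helpers are hypothetical.

MAP1 = "http://flyweb.zoo.ox.ac.uk/map/1"
MAP2 = "http://flyweb.zoo.ox.ac.uk/map/2"
ALH = "http://www.fly-ted.org/resource/gene/alh"
LD39491 = "http://www.fruitfly.org/gene_product/LD39491"
SD04152 = "http://www.fruitfly.org/gene_product/SD04152"  # assumed URI pattern

QUADS = [
    (MAP1, ALH, "owl:sameAs", LD39491),
    (MAP2, ALH, "owl:sameAs", SD04152),
    (MAP2, ALH, "owl:differentFrom", LD39491),
]


def mappings_in_release(quads, graph, resource):
    """Statements about a resource as asserted in one mapping release."""
    return [(p, o) for g, s, p, o in quads if g == graph and s == resource]


def mapping_history(quads, resource):
    """All statements ever asserted about a resource, grouped by release."""
    history = {}
    for g, s, p, o in quads:
        if s == resource:
            history.setdefault(g, []).append((p, o))
    return history


release_1 = mappings_in_release(QUADS, MAP1, ALH)
history = mapping_history(QUADS, ALH)
print(release_1)
for graph, statements in history.items():
    print(graph, statements)
```

A biologist working against the older databases would query only map/1, while comparing the entries of the history shows when the LD39491 mapping was withdrawn.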