FlyWeb/LinkedData
Principles for publishing a linked FlyWeb
The following principles, based on the LinkedData documents, are worth keeping in mind when publishing our FlyWeb.
For choosing the URIs
- Use HTTP URIs for everything
- Define URIs in an HTTP namespace under our control
- Use mnemonic names for URIs
- Keep the URIs persistent and stable
- Follow existing conventions if multiple URIs are needed for a single non-information resource
- Use some kind of primary keys (e.g. Gene identifiers, ISSNs) inside your URIs to guarantee their uniqueness
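As a minimal sketch of the last point, a URI can be minted by percent-encoding a primary key and appending it to a namespace under our control. The `mint_uri` helper and the `NAMESPACE` constant are hypothetical names; the namespace is assumed from the FlyTED examples on this page, and the gene symbol l(2)gl is used only to illustrate a key containing characters that need encoding.

```python
from urllib.parse import quote

# Assumed namespace, taken from the FlyTED examples on this page.
NAMESPACE = "http://www.fly-ted.org/resource"


def mint_uri(kind: str, key: str) -> str:
    """Build an HTTP URI from a resource kind and its primary key."""
    if not kind or not key:
        raise ValueError("both a resource kind and a primary key are required")
    # safe="" percent-encodes every reserved character, including "/",
    # so two different keys can never collapse into the same path.
    return f"{NAMESPACE}/{quote(kind, safe='')}/{quote(key, safe='')}"


print(mint_uri("gene", "Adh"))
print(mint_uri("gene", "l(2)gl"))
```

Because the URI is derived only from the kind and the key, it stays stable as long as the key itself is stable.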
Do we need multiple URIs related to a single non-information resource (say a gene)?
A typical example of the usage of such multiple URIs could be:
- use an identifier for the resource
- use an identifier for a related information resource suitable to HTML browsers (with a web page representation)
- use an identifier for a related information resource suitable to RDF browsers (with an RDF/XML representation)
For the gene resources published by our FlyTED project, we could have:
- an identifier for the gene, e.g. http://www.fly-ted.org/resource/gene/Adh
- an identifier for a related information resource suitable to HTML browsers, e.g. http://www.fly-ted.org/page/gene/Adh (which should be redirected to http://www.fly-ted.org/view/geneid/Adh.html)
- an identifier for a related information resource suitable to RDF browsers, e.g. http://www.fly-ted.org/data/gene/Adh, which should be redirected to http://www.fly-ted.org/joseki/gene/Adh.
Given an identifier for the gene, a browser can then display either an HTML page or an RDF file that describes the resource.
I do not think we need to worry yet about what should be presented to a client, but this should be kept in mind if we want to be part of the LinkedData community. Note that a LinkedData browser (such as Disco) would automatically resolve a resource identifier to the following URL: http://www.fly-ted.org/page/gene/Adh, which gives a nice HTML presentation of the RDF data about the gene.
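The redirection scheme above could be sketched as a small routing function. This is only an illustration under stated assumptions: the function name `redirect_for` is hypothetical, the Accept header is checked with a simple substring test rather than full content negotiation, and the status codes are assumptions (303 See Other is the usual Linked Data convention for a non-information resource; the page does not say which code the second redirect should use, so 302 is a guess).

```python
from urllib.parse import urlparse

BASE = "http://www.fly-ted.org"
RDF_XML = "application/rdf+xml"


def redirect_for(uri: str, accept: str = "text/html"):
    """Return (status, location) for a request to a FlyTED gene URI."""
    parts = urlparse(uri).path.strip("/").split("/")
    if len(parts) != 3 or parts[1] != "gene":
        return (404, None)
    kind, _, gene = parts
    if kind == "resource":
        # The gene itself cannot be sent over HTTP, so point the client
        # at a document about it, chosen from the Accept header.
        target = "data" if RDF_XML in accept.lower() else "page"
        return (303, f"{BASE}/{target}/gene/{gene}")
    if kind == "page":
        return (302, f"{BASE}/view/geneid/{gene}.html")  # assumed status code
    if kind == "data":
        return (302, f"{BASE}/joseki/gene/{gene}")  # assumed status code
    return (404, None)


print(redirect_for(BASE + "/resource/gene/Adh"))
print(redirect_for(BASE + "/resource/gene/Adh", accept=RDF_XML))
```

A real deployment would more likely express these rules in the web server configuration; the function only makes the mapping explicit.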
For reusing terms from well-known vocabularies
At the moment, I can only think of the following three vocabularies that might be useful for our project:
- Friend-of-a-Friend (FOAF), vocabulary for describing people.
- Dublin Core (DC), defines general metadata attributes.
- Simple Knowledge Organization System (SKOS), vocabulary for representing taxonomies and loosely structured knowledge.
There are also some recommendations for defining your own terms. I do not think this is relevant for our group, but the information can be found in the section "Defining your own terms".
What should I return as RDF description for a URI
What triples should be returned in RDF representation in response to dereferencing a URI identifying a non-information resource?
- The description: all triples from your dataset that have the resource's URI as the subject.
- Backlinks: all the triples from your dataset that have the resource's URI as the object.
- Related descriptions: any additional information about related resources, e.g. say something about the author along with a book's URI.
- Metadata: any metadata you want to attach to your published data, e.g. a URI identifying the author, or the licensing information.
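The first two items can be sketched over an in-memory list of triples. This is a simplification, assuming the dataset fits in a Python list of (subject, predicate, object) tuples; the `describe` function, the image URI and the `foaf:depicts` backlink are hypothetical and only serve the illustration. Related descriptions and metadata are left out because they depend on editorial choices.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

GENE = "http://www.fly-ted.org/resource/gene/Act5C"

DATASET: List[Triple] = [
    (GENE, "rdf:type", "flyted:Gene"),
    (GENE, "owl:sameAs", "http://www.fruitfly.org/gene_product/RE02927"),
    # Hypothetical backlink: some other resource pointing at the gene.
    ("http://www.fly-ted.org/resource/image/1", "foaf:depicts", GENE),
    ("http://www.fly-ted.org/resource/gene/alh", "rdf:type", "flyted:Gene"),
]


def describe(dataset: List[Triple], uri: str) -> List[Triple]:
    """Return the description of `uri` followed by its backlinks."""
    description = [t for t in dataset if t[0] == uri]
    # Skip triples already in the description (subject and object both `uri`).
    backlinks = [t for t in dataset if t[2] == uri and t[0] != uri]
    return description + backlinks


for triple in describe(DATASET, GENE):
    print(triple)
```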
Provenance metadata about each release of FlyWeb
This proposal adopts the concept of Named Graphs.
This means that for each release of FlyWeb, we have a Named Graph, say, http://flyweb.zoo.ox.ac.uk/map/1. This named graph contains at minimum all the statements asserting that a resource from one data source is equivalent to resources from other data sources on the FlyWeb. Biological knowledge is frequently updated, and the biological databases that preserve this knowledge are updated correspondingly. This means that a mapping between two resources may hold at one time but not at another.
The metadata about each named mapping graph would state:
- the time the mapping occurred
- who performed the mapping
- the version of the databases that were used during the mapping
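As a rough sketch, these three items could be held in one record per named graph. The `MappingRelease` class and its field names are hypothetical, the timestamp is an invented placeholder, and the creator value and the `dc:creation`, `dc:createBy` and `dc:derived` property names simply follow the placeholder names used in the example on this page rather than any checked vocabulary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class MappingRelease:
    graph_uri: str            # the named graph holding the mappings
    created: datetime         # the time the mapping occurred
    creator: str              # who performed the mapping
    sources: Tuple[str, ...]  # database versions used during the mapping

    def to_triples(self) -> List[Tuple[str, str, str]]:
        """Express the metadata as triples about the named graph."""
        triples = [
            (self.graph_uri, "dc:creation", self.created.isoformat()),
            (self.graph_uri, "dc:createBy", self.creator),
        ]
        triples += [(self.graph_uri, "dc:derived", s) for s in self.sources]
        return triples


release_1 = MappingRelease(
    graph_uri="http://flyweb.zoo.ox.ac.uk/map/1",
    created=datetime(2007, 1, 1, tzinfo=timezone.utc),  # placeholder value
    creator="XXXYYY",  # placeholder, as in the example on this page
    sources=(
        "http://flybase.org/v5",
        "http://www.fly-ted.org/v1",
        "http://www.fruit-fly.org/20070309",
    ),
)

for triple in release_1.to_triples():
    print(triple)
```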
If such provenance metadata is kept for each release of FlyWeb, we would have the following descriptions for a given gene resource:
<http://www.fly-ted.org/resource/gene/Act5C>
    rdf:type flyted:Gene ;
    foaf:page <http://www.fly-ted.org/page/gene/Act5C> ;
    rdfs:isDefinedBy <http://www.fly-ted.org/data/gene/Act5C> ;
    owl:sameAs <http://www.fruitfly.org/gene_product/RE02927> .
<graph>
<uri>http://flyweb.zoo.ox.ac.uk/map/1</uri>
<triple>
<uri>http://www.fly-ted.org/resource/gene/Act5C</uri>
<uri>owl:sameAs</uri>
<uri>http://www.fruitfly.org/gene_product/RE02927</uri>
</triple>
<triple>
<uri>http://www.fly-ted.org/resource/gene/alh</uri>
<uri>owl:sameAs</uri>
<uri>http://www.fruitfly.org/gene_product/LD39491</uri>
</triple>
</graph>
http://flyweb.zoo.ox.ac.uk/map/1
dc:creation SS:MM:HH:DD:MM:YYYY
dc:createBy XXXYYY
dc:derived http://flybase.org/v5
dc:derived http://www.fly-ted.org/v1
dc:derived http://www.fruit-fly.org/20070309
Using such metadata, we can see all the mappings that have ever been asserted between resources, whether or not a mapping still holds. For example, we can imagine having the following statements for the resource http://www.fly-ted.org/resource/gene/alh:
http://flyweb.zoo.ox.ac.uk/map/1 http://www.fly-ted.org/resource/gene/alh owl:sameAs http://www.fly-ted.org/resource/gene/LD39491
http://flyweb.zoo.ox.ac.uk/map/2 http://www.fly-ted.org/resource/gene/alh owl:sameAs http://www.fly-ted.org/resource/gene/SD04152
http://flyweb.zoo.ox.ac.uk/map/2 http://www.fly-ted.org/resource/gene/alh owl:notSameAs http://www.fly-ted.org/resource/gene/LD39491
This means that in FlyWeb's mapping release v1, gene alh is the same as gene LD39491 from BDGP. As knowledge is updated, this assertion no longer holds: it is replaced by the new assertion that alh is the same as SD04152 and is not the same as LD39491. Providing such metadata about each mapping:
- gives biologists the option of navigating FlyWeb with the version of the data resources they work with, rather than forcing them to always use the most recent version of each database
- enables scientists to work out why a new mapping was proposed and why an existing mapping disappeared
- increases trust in FlyWeb
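To illustrate how per-release named graphs support this kind of debugging, here is a minimal sketch that compares the owl:sameAs mappings of one gene across two releases. It assumes the statements are available as plain (graph, subject, predicate, object) tuples mirroring the alh example above; the function names `mappings_in` and `diff_releases` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Quad = Tuple[str, str, str, str]  # (graph, subject, predicate, object)

MAP = "http://flyweb.zoo.ox.ac.uk/map/"
GENE = "http://www.fly-ted.org/resource/gene/"

QUADS: List[Quad] = [
    (MAP + "1", GENE + "alh", "owl:sameAs", GENE + "LD39491"),
    (MAP + "2", GENE + "alh", "owl:sameAs", GENE + "SD04152"),
    (MAP + "2", GENE + "alh", "owl:notSameAs", GENE + "LD39491"),
]


def mappings_in(quads: List[Quad], graph: str, subject: str) -> Set[str]:
    """Resources asserted to be owl:sameAs `subject` within one release."""
    return {
        o for g, s, p, o in quads
        if g == graph and s == subject and p == "owl:sameAs"
    }


def diff_releases(quads: List[Quad], old: str, new: str,
                  subject: str) -> Dict[str, Set[str]]:
    """Report which mappings appeared and disappeared between two releases."""
    before = mappings_in(quads, old, subject)
    after = mappings_in(quads, new, subject)
    return {"added": after - before, "removed": before - after}


print(mappings_in(QUADS, MAP + "1", GENE + "alh"))
print(diff_releases(QUADS, MAP + "1", MAP + "2", GENE + "alh"))
```

Querying a single graph gives the view of one release, while the difference between two graphs shows what changed between them.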

