A general expectation is that web applications may access provenance information in the same way as any web resource, by dereferencing its URI. Typically, this will be by performing an HTTP GET operation. Thus, any provenance information may be associated with a URI, and may be accessed by dereferencing that URI using normal web mechanisms.
The problem of accessing some required provenance information then reduces to the problem of finding its URI, which is dealt with separately in section .
This specification thus RECOMMENDS that if a publisher wishes to make provenance information available, it is published as a normal web resource, and provision is made for the URI of the provenance to be discoverable.
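As an illustration of the dereferencing step, the following is a minimal Python sketch and not a normative part of this specification. The function names and the `Accept` header value are assumptions introduced for the example; this document does not fix a representation format for provenance information.

```python
from urllib.request import Request, urlopen


def build_provenance_request(provenance_uri: str) -> Request:
    """Prepare an HTTP GET request for a provenance URI (nothing is sent yet)."""
    # Assumption: the media types below are illustrative only.
    headers = {"Accept": "application/rdf+xml, text/turtle;q=0.9, */*;q=0.1"}
    return Request(provenance_uri, headers=headers, method="GET")


def fetch_provenance(provenance_uri: str, timeout: float = 10.0) -> bytes:
    """Dereference the provenance URI and return the raw response body."""
    with urlopen(build_provenance_request(provenance_uri), timeout=timeout) as response:
        return response.read()
```

A caller that already knows the provenance URI would simply call `fetch_provenance(uri)` and hand the returned bytes to whatever parser suits the returned media type.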
This presumption of using web retrieval to access provenance does not preclude use of other mechanisms. In particular, alternative mechanisms may be needed if there is no URI associated with some particular provenance data. One such mechanism is suggested in .
On the presumption that provenance data is a resource that can be accessed using normal web retrieval, one needs to know a URI to dereference. The provenance URI may be known in advance, in which case there is nothing more to specify. If the provenance URI is not known, then a mechanism to discover a provenance URI must be based on some information that is available to the would-be accessor. We also wish to allow that provenance information could be provided by parties other than the provider of the original resource. Indeed, provenance data for a resource may be provided by several different parties, each with different concerns.
We start by considering mechanisms for the resource provider to also indicate a provenance URI. Because the resource provider controls the response when the resource is accessed, this allows for direct indication of a provenance URI. Three mechanisms are described here:
For a document accessible using HTTP, [[POWDER-DR]] describes a mechanism for associating metadata with a resource by adding an HTTP Link header field to the HTTP response to a GET or HEAD operation (other HTTP operations are not excluded, but are not considered here). Since the POWDER specification was published, the HTTP linking draft has been approved by the IETF as [[LINK-REL]] (http://tools.ietf.org/html/rfc5988).
The same basic mechanism can be used for referencing provenance data, for which a new link relation type is registered according to the template in :
Link: <provenance-URI>; rel="provenance"
provenance-URI is the URI of a provenance resource for which information is returned. At this time, the meaning of provenance links returned with other HTTP response codes is not defined: future revisions of this specification may define interpretations for these.
An HTTP response MAY include multiple provenance link headers, indicating a number of different resources that are known to the responding server, each providing provenance about the accessed resource.
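A client-side reading of these headers might be sketched as follows. This is a simplified, non-normative example that assumes the [[LINK-REL]] syntax in which each link target is enclosed in angle brackets; the function name `provenance_links` is hypothetical, and the regular expressions do not handle every corner of the header grammar (for example, a `<` inside a quoted parameter value).

```python
import re

# One link-value: <target> followed by its parameters, up to the next "<".
_LINK_VALUE = re.compile(r'<([^>]*)>([^<]*)')
# The rel parameter, either quoted (possibly several tokens) or a bare token.
_REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE)


def provenance_links(link_headers):
    """Return the targets of all rel="provenance" links.

    link_headers is a list of raw Link header field values, one entry per
    header line; a single value may itself carry several comma-separated links.
    """
    uris = []
    for header in link_headers:
        for target, params in _LINK_VALUE.findall(header):
            match = _REL_PARAM.search(params)
            if match is None:
                continue
            rel_value = match.group(1) if match.group(1) is not None else match.group(2)
            # rel may list several relation types separated by spaces.
            if "provenance" in rel_value.lower().split():
                uris.append(target)
    return uris
```

Because multiple provenance links are permitted, the function returns a list rather than a single URI; deciding which of the listed resources to retrieve or trust is left to the caller.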
Are the provenance resources indicated in this way to be considered authoritative? I.e. if the client trusts information returned by the server (e.g. is prepared to act on inferences based on the returned data), should it also trust the provenance data, or should trust in the linked provenance data be determined separately? If the linked data is to be trusted, then the data from multiple linked provenance resources MUST be consistent if it is to be meaningful. I favour an approach whereby trust in the provenance resources is established independently, which is similar to the situation for any other resource; e.g. based on the domain that serves it, or an associated digital signature.
For a document presented as HTML or XHTML, without regard for how it has been obtained, [[POWDER-DR]] describes a mechanism for associating metadata with a resource by adding a <link> element to the HTML <head> section.
The same basic mechanism can be used for referencing provenance data, for which a new link relation type is registered according to the template in @@CHECK USE OF LINK RELATION REGISTRY IS OK HERE@@:
<head>
  <meta name="wdr.issuedby" content="http://authority.example.org/company.rdf#me"/>
  <link rel="provenance" href="provenance-URI"/>
  <title>Welcome to example.com</title>
</head>
provenance-URI is the URI of a provenance resource for the containing document.
An HTML document header MAY include multiple provenance link elements, indicating a number of different resources that are known to the creator of the document, each providing provenance about the document resource.
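Extracting these link elements from a retrieved document could look like the following Python sketch, which uses only the standard library HTML parser. The class and function names are hypothetical, and for brevity the sketch accepts a matching link element anywhere in the document rather than checking that it appears in the document header.

```python
from html.parser import HTMLParser


class ProvenanceLinkParser(HTMLParser):
    """Collect href values of <link> elements whose rel includes "provenance"."""

    def __init__(self):
        super().__init__()
        self.provenance_uris = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "provenance" in rel_tokens and href:
            self.provenance_uris.append(href)


def provenance_links_from_html(html_text):
    """Return all provenance URIs declared by link elements in html_text."""
    parser = ProvenanceLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.provenance_uris
```

A relative `href` would still need to be resolved against the document's base URI before being dereferenced; that step is omitted here.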
@@TODO - use of link relation registry with HTML <link> elements
@@TODO - The POWDER specification also adds: Documents MAY also include any of the attribution data from the POWDER document in meta tags. In particular, the issuedby field is likely to be useful to user agents deciding whether or not to fetch the full POWDER document. Any attribution data encoded in meta tags within an HTML document should be the same as that in the POWDER document. In case of discrepancy, the POWDER document should be taken as more authoritative. Is there a parallel we should add here for provenance?
If a resource is presented as RDF (in any of its recognized syntaxes, including RDFa), it may contain references to its own provenance using additional RDF statements.
For this purpose a new RDF property, prov:hasProvenance, is defined as a relation between two resources, where the object of the property is a resource that provides provenance data about the subject resource. Multiple prov:hasProvenance assertions may be made about a subject resource.
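To make the shape of the relation concrete, the following non-normative Python sketch models RDF statements as plain (subject, predicate, object) tuples and selects the provenance resources asserted for a given subject. The namespace URI is a placeholder invented for this example, because the namespace for the property has not yet been assigned; a real application would use an RDF library and the final namespace.

```python
# Placeholder only: the namespace for prov:hasProvenance is not yet defined.
PROV_NS = "http://example.org/prov-placeholder#"
HAS_PROVENANCE = PROV_NS + "hasProvenance"


def provenance_uris(triples, subject_uri):
    """Return every object of a hasProvenance statement about subject_uri.

    triples is an iterable of (subject, predicate, object) URI strings.
    """
    return [o for s, p, o in triples if s == subject_uri and p == HAS_PROVENANCE]
```

Since several assertions may be made about one subject, the result is a list that may contain zero, one, or many provenance URIs.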
@@TODO: document namespace. Check naming style. Use provenance model namespace? Define as part of model?
The mechanisms for provenance discovery described above have all assumed the provenance URI is being supplied by the provider of the original resource. Where provenance information is provided by a third party without any collaboration from the original resource provider, the provenance link cannot be provided directly and a different approach must be considered.
We assume that the application or person requesting provenance information has the URI or other unique identification of the resource for which provenance is required, and also has a URI for a third-party service that provides a provenance information service. Specifically, the third party service URI is the URI of a SPARQL endpoint which is queried for the desired provenance information.
If the requester has a URI for the original resource, they simply issue a SPARQL query for the URI(s) of any associated provenance data; e.g., if the original resource has URI http://example.org/resource:
PREFIX prov: <@@TBD>
SELECT ?provenance_uri WHERE
{
  <http://example.org/resource> prov:hasProvenance ?provenance_uri
}
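A requester might assemble such a query and the corresponding request URI as in the Python sketch below. It is illustrative only: the function names are hypothetical, the namespace is passed in as a parameter because it is not yet defined, and the request form assumed is the SPARQL protocol convention of sending the query text in a `query` parameter of an HTTP GET.

```python
from urllib.parse import urlencode


def provenance_query(resource_uri, prov_namespace):
    """Build a SPARQL SELECT query for the provenance URI(s) of resource_uri."""
    # Reject characters that may not appear inside a SPARQL <...> IRI reference.
    forbidden = '<>"{}|^`\\'
    if any(ch in forbidden or ord(ch) <= 0x20 for ch in resource_uri):
        raise ValueError("resource URI contains characters not allowed in an IRI reference")
    return (
        f"PREFIX prov: <{prov_namespace}>\n"
        "SELECT ?provenance_uri WHERE {\n"
        f"  <{resource_uri}> prov:hasProvenance ?provenance_uri\n"
        "}"
    )


def sparql_get_url(endpoint_uri, query):
    """Return the URI for sending query to a SPARQL endpoint via HTTP GET."""
    separator = "&" if "?" in endpoint_uri else "?"
    return endpoint_uri + separator + urlencode({"query": query})
```

Validating the resource URI before interpolating it keeps a malformed or hostile identifier from altering the structure of the query.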
If the requester has identifying information that is not the URI of the original resource, then they will need to construct a more elaborate query to locate the target resource and obtain its provenance URI(s). The nature of identifying information that can be used in this way will depend upon the third party service used, further definition of which is out of scope for this specification. For example, a query for a document identified by a DOI, say 1234.5678, might look like this:
PREFIX prov: <@@TBD>
PREFIX idservice: <@@TBD>
SELECT ?provenance_uri WHERE
{
  [ idservice:hasDOI "1234.5678" ] prov:hasProvenance ?provenance_uri
}
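When the identifying value comes from user input, it needs to be written as a properly escaped SPARQL string literal before being placed in the query. The following Python sketch shows one way to do this for the DOI pattern above; the function names are hypothetical, and the `idservice:hasDOI` property is the illustrative one used in the example query, not a defined vocabulary term.

```python
def sparql_string_literal(value):
    """Return value as a double-quoted SPARQL string literal."""
    escaped = (
        value.replace("\\", "\\\\")  # backslash first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def doi_provenance_pattern(doi):
    """Build the graph pattern matching provenance of the document with this DOI."""
    literal = sparql_string_literal(doi)
    return f"[ idservice:hasDOI {literal} ] prov:hasProvenance ?provenance_uri"
```

The returned pattern is intended to be placed inside the braces of a `SELECT ... WHERE { ... }` query that declares the `prov:` and `idservice:` prefixes.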
The mechanisms described here focus on finding the URI(s) for provenance information. Below, we consider access to provenance information for which there is no separate URI.
(This section will describe the use of a SPARQL endpoint service to obtain provenance information directly from a service provider. No new protocol or vocabulary elements are defined: the mechanisms used are those described above, coupled with possible use of provenance vocabulary terms in a SPARQL query.)
(How to discover provenance services. There is nothing particular about provenance in this respect, and this section will discuss some of the available options without adding any new normative specification.)
@@TODO - registration of "provenance" link relation, per http://tools.ietf.org/html/rfc5988#section-6.2.1.
Provenance is central to establishing trust in data. If provenance information is corrupted, it may lead agents (human or software) to draw inappropriate and possibly harmful conclusions. Therefore, care is needed to ensure that the integrity of provenance data is maintained.
When using HTTP to access provenance information, or to determine a provenance URI, secure HTTP (https) SHOULD be used.
When retrieving a provenance URI from a document, steps SHOULD be taken to ensure the document itself is an accurate copy of the original whose author is being trusted (e.g. signature checking, or verifying its checksum against an author-provided secure web service).
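The two checks above might be approximated in client code as in the following Python sketch. It is illustrative only: the function names are hypothetical, and SHA-256 is an assumption made for the example, since this document does not specify a checksum algorithm or how the author-provided checksum is obtained.

```python
import hashlib
import hmac
from urllib.parse import urlsplit


def uses_secure_http(uri):
    """Return True if the URI would be dereferenced over secure HTTP (https)."""
    return urlsplit(uri).scheme.lower() == "https"


def matches_checksum(document_bytes, expected_sha256_hex):
    """Return True if the document's SHA-256 digest equals the expected value.

    expected_sha256_hex is the hexadecimal digest published by the author,
    assumed here to have been obtained over a trusted channel.
    """
    actual = hashlib.sha256(document_bytes).hexdigest()
    # compare_digest avoids leaking the position of the first mismatch.
    return hmac.compare_digest(actual, expected_sha256_hex.strip().lower())
```

A client could refuse to extract a provenance URI from a document unless `matches_checksum` succeeds, and could decline or warn before dereferencing any provenance URI for which `uses_secure_http` returns False.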
@@TODO ... privacy, access control to provenance (from Edinburgh meeting). In particular, note that the fact that a resource is openly accessible does not mean that its provenance information should also be.
@@TODO ... more, probably
Many thanks to Robin Berjon for making our lives so much easier with his cool ReSpec tool.