Databank-specific metadata properties
From ImageWeb
Databank-specific metadata properties
NOTE: this is currently an informal draft copied from a private email from Anusha Ranganathan, which I'm posting here so that it doesn't get lost. Expect details to change. In due course, this content should be replaced with a link to an official definition.
Namespaces used:
xmlns:oxds='http://vocab.ox.ac.uk/dataset/schema#'
xmlns:foaf='http://xmlns.com/foaf/0.1/'
xmlns:dcterms='http://purl.org/dc/terms/'
xmlns:bibo='http://purl.org/ontology/bibo/'
xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:ore='http://www.openarchives.org/ore/terms/'
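As a rough illustration of how these prefixes resolve, the sketch below expands a prefixed name such as oxds:isEmbargoed into its full URI. The function name expand is hypothetical and is not part of Databank; only the namespace URIs come from the list above.

```python
# Prefix-to-namespace map copied from the declarations above.
NAMESPACES = {
    "oxds": "http://vocab.ox.ac.uk/dataset/schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "bibo": "http://purl.org/ontology/bibo/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ore": "http://www.openarchives.org/ore/terms/",
}


def expand(qname: str) -> str:
    """Expand a prefixed name such as 'oxds:isEmbargoed' to a full URI.

    Hypothetical helper for illustration only.
    """
    prefix, _, local = qname.partition(":")
    if prefix not in NAMESPACES or not local:
        raise ValueError(f"unknown prefix or malformed name: {qname!r}")
    return NAMESPACES[prefix] + local
```

Prefixes that are discussed on this page but not declared above (for example prism or fabio) would raise ValueError until they are added to the map.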
oxds:currentVersion
This is used to record the latest version of the dataset.
The publishing status ontology has the term version-of-record (http://purl.org/spar/pso/version-of-record), but this seems to refer to a document that has been formally published as a peer-reviewed journal article. So it will not suit us.
Maybe we should just use dcterms:version. I don't think saying 'latest / current version' gives any more information than just 'version'.
We could also use prism:versionIdentifier as David had mentioned in one of his earlier emails.
oxds:isEmbargoed
Used to indicate that the dataset is under embargo.
The publishing status ontology has the term embargoed (http://purl.org/spar/pso/embargoed) defined as "The status of material that is subjected to a publication embargo, which means that the material should not be published, or in the case of a press release that it should not be reported on, until a particular date known as the embargo date." This suits our purposes. Can I go ahead and use this instead?
oxds:embargoedUntil
Used to indicate the date until which a dataset is under embargo.
Fabio has a term 'has embargo date' (http://purl.org/spar/fabio/hasEmbargoDate) defined as "The date before which an entity should not be published, or before which a press release should not be reported on." Again, this suits our purposes. Can I use fabio:hasEmbargoDate in place of oxds:embargoedUntil?
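To make the interplay between oxds:isEmbargoed and oxds:embargoedUntil concrete, here is a minimal sketch of an embargo check. It assumes the metadata has already been flattened into a dictionary of property URI to string value, that the flag is serialised as "true"/"false" (case-insensitive), and that the date is in ISO 8601 form; none of these are confirmed details of Databank's manifest format, and the function name is hypothetical.

```python
from datetime import datetime, timezone

OXDS = "http://vocab.ox.ac.uk/dataset/schema#"


def is_under_embargo(metadata: dict, now: datetime) -> bool:
    """Return True if the dataset should still be treated as embargoed.

    `metadata` maps full property URIs to string values (a hypothetical,
    simplified stand-in for the dataset's RDF manifest). `now` must be
    timezone-aware.
    """
    flag = metadata.get(OXDS + "isEmbargoed", "false").strip().lower()
    if flag != "true":
        return False
    until = metadata.get(OXDS + "embargoedUntil")
    if not until:
        # Flagged as embargoed with no end date: treat as open-ended.
        return True
    end = datetime.fromisoformat(until)
    if end.tzinfo is None:
        # Assumption: dates without a timezone are interpreted as UTC.
        end = end.replace(tzinfo=timezone.utc)
    return now < end
```

If fabio:hasEmbargoDate were adopted instead, only the property URI looked up here would need to change; the comparison logic would stay the same.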
Are there any advantages to using fabio:hasCreationDate and fabio:hasModificationDate in place of dcterms:created and dcterms:modified?
Also, I do not record the date when a data package was deposited / submitted to Databank, as this can be deduced from the modified date and version history. It would be very little work to record this information for files added to or updated in Databank if needed.
Other terms
The other two oxds terms used are oxds:DataSet and oxds:Grouping. Both of these terms are used to describe the type of a dataset. The distinction Databank makes between DataSet and Grouping is:
- DataSet: An object containing one or many files (including zip files)
- Grouping: An object containing one or many files and folders and derived from a DataSet
Also, when unpacking a dataset in Databank I specifically look to see if the object is of type 'oxds:Grouping' to determine any identifiers supplied by the user and add them to sameAs. In hindsight, this is bad design. I should be able to work without this information and instead just check for blank nodes.
There was a reason for this design at the time: submitters may need to allocate a URI prior to submission that they can use to create cross-references, etc. The "default" case would be no URI provided, hence no owl:sameAs. I'm actually discussing this model in another forum. #g.
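The blank-node check suggested above could look roughly like the following. This is only a sketch: it assumes subjects are given in N-Triples notation ('<uri>' for a URI, '_:label' for a blank node), and both function names and the example dataset URI are hypothetical rather than taken from Databank.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def user_supplied_identifiers(subjects):
    """Return the URIs among `subjects`, skipping blank nodes.

    Subjects are strings in N-Triples notation: '<uri>' or '_:label'.
    """
    return [s[1:-1] for s in subjects if s.startswith("<") and s.endswith(">")]


def same_as_triples(dataset_uri, subjects):
    """Build owl:sameAs triples linking the dataset to user-supplied URIs.

    The "default" case (only blank nodes) yields no triples at all.
    """
    return [
        (dataset_uri, OWL_SAME_AS, uri)
        for uri in user_supplied_identifiers(subjects)
        if uri != dataset_uri
    ]
```

With this approach the type of the object no longer matters: a submitter who allocated a URI in advance gets an owl:sameAs link, and everyone else gets none.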
- So, just based on the composite parts of the object, are these two types so significantly different that they have to be classified differently? I do record the relationship between the two using dcterms:hasVersion and dcterms:isVersionOf, so we do not need to capture this information in the type definition.
- David suggested calling a dataset a data package, while Graham thought calling it a data package is misleading (I have put together the emails sent on this subject in the Databank wiki).
- On to the question of how oxds:Grouping is different from ore:Aggregation: my earlier explanation does not hold water. I realize there is no need to separately distinguish an aggregation from a grouping just because an object contains folders. From all the reading I have been doing on ore:Aggregation, there is no mention of what an aggregated resource should be. Again, this was my doing in the first place, and I would like to go back and clean this up and have just one type to describe all datasets or data packages and call them all of type ore:Aggregation. Does this sound sensible or have I missed something?
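The clean-up proposed in the last point can be sketched as a simple rewrite over triples: every rdf:type statement that uses oxds:DataSet or oxds:Grouping is replaced with ore:Aggregation, while the dcterms:hasVersion / dcterms:isVersionOf links are left untouched. Representing triples as plain Python tuples and the function name are assumptions made for illustration; this is not how Databank stores its metadata.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OXDS = "http://vocab.ox.ac.uk/dataset/schema#"
ORE_AGGREGATION = "http://www.openarchives.org/ore/terms/Aggregation"

# The two types that would be retired in favour of ore:Aggregation.
LEGACY_TYPES = {OXDS + "DataSet", OXDS + "Grouping"}


def retype_as_aggregation(triples):
    """Return a copy of `triples` with legacy oxds types replaced.

    Each triple is a (subject, predicate, object) tuple of URI strings.
    Non-type triples pass through unchanged; duplicates created by the
    rewrite are dropped, and the original order is preserved.
    """
    result = []
    for s, p, o in triples:
        if p == RDF_TYPE and o in LEGACY_TYPES:
            o = ORE_AGGREGATION
        if (s, p, o) not in result:
            result.append((s, p, o))
    return result
```

Because the relationship between the original object and the one derived from it is carried by the dcterms version properties, nothing is lost by collapsing the two types into one.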
Do you have a link to the Databank wiki with the email links? #g.

