DefiningImageAccess/Tool/Semantic Portal

From ImageWeb


ToolName: Semantic portal
Link: http://www.swed.org.uk/swed/
Status: Proposed
JISCTool: False
Focus: Semantic Web

Semantic portal

The semantic portal software provides a mechanism to browse through data from multiple data sources. It consists essentially of an RDF data harvester (from predefined locations), a faceted browser and a thesaurus browser. Simple rules may be defined to infer facets that are not explicit in the source data.

There is a harvesting component that can detect changes and reload source data as needed.

The software has since been repurposed to implement a conference information system (IUGO).

Observations

Source data is required to be published as RDF, and is collected to a central model store for browsing.

Customisation is flexible, in particular through the template files used by the software. However, this flexibility makes configuration complex, which can be quite hard work; the error messages caused by mistakes in customisation files can be unhelpful.

An ontology is required before data can be browsed (compare with mSpace, which we believe is more flexible in this respect).

Facet inference is mainly used with RDFS and SKOS vocabularies to provide thesaurus-derived facets (broader, narrower, etc.)
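
To illustrate what a thesaurus-derived facet amounts to, the following is a minimal Python sketch, not the portal's own rule mechanism. It assumes the thesaurus is available as a plain mapping from each concept to its directly broader concepts; the concept and resource names are hypothetical.

```python
def broader_closure(concept, broader):
    """Return the concept together with all of its transitive broader concepts."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # also guards against cycles in the thesaurus
        seen.add(current)
        stack.extend(broader.get(current, ()))
    return seen


def infer_facet_values(resource_concepts, broader):
    """Map each resource to every concept it can be browsed under."""
    facets = {}
    for resource, concepts in resource_concepts.items():
        values = set()
        for concept in concepts:
            values |= broader_closure(concept, broader)
        facets[resource] = values
    return facets


# Hypothetical thesaurus: concept -> directly broader concepts.
BROADER = {"oak": ["tree"], "tree": ["plant"]}
# Hypothetical source data: resource -> explicitly assigned concepts.
RESOURCES = {"doc1": ["oak"], "doc2": ["plant"]}

facets = infer_facet_values(RESOURCES, BROADER)
```

A resource tagged only with a narrow concept then also appears under each broader concept, which is the effect the inferred facets are meant to achieve.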

Uses the Lucene text search engine.

Despite early promise, we feel this is not the best available option upon which to base our data webs design:

  • reads only RDF source files; does not work with SPARQL sources
  • there is no SPARQL support for programmatic access to the raw data (?)
  • we don't want to make complete copies of all the data sources
  • the software uses an old version of Jena
  • we understand the software is not being actively developed

Installation

Package Building

  • download the pre-compiled portal.zip and unzip it
  • run ant war in the root directory of PORTAL_HOME
  • a few lines in the build.xml file may need tweaking first:
    • change the core.dir and core.zipname to the right directory according to your installation
    • download the swed_data_entry.zip and swed_data_entry.war files from http://www.swed.org.uk/swed/swed_technical_resources.htm and put them into the dcdist.dir, as given in the build.xml
    • by default, the portal loads data from the local cache, rather than a database.

Package Deploy

  • a Java servlet container is required, such as Tomcat 4.* or 5.*
  • configure a Tomcat user for deploying the portal by adding the following lines to Tomcat's conf/tomcat-users.xml file:

<role rolename="portalAdmin"/>
<user username="user" password="password" roles="portalAdmin"/>

  • deploy the portal.war file to TOMCAT_HOME/webapps
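
Before deploying, it can be useful to confirm that the role and user entries are actually present in tomcat-users.xml. The following Python sketch does this with the standard library only. It assumes the un-namespaced file layout used by Tomcat 4 and 5; the sample content and the function name are illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative file content; a real file lives at conf/tomcat-users.xml.
SAMPLE = """<tomcat-users>
  <role rolename="portalAdmin"/>
  <user username="user" password="password" roles="portalAdmin"/>
</tomcat-users>"""


def users_with_role(xml_text, role):
    """Return usernames holding `role`, or [] if the role is not declared."""
    root = ET.fromstring(xml_text)
    declared = {r.get("rolename") for r in root.iter("role")}
    if role not in declared:
        return []
    users = []
    for user in root.iter("user"):
        # The roles attribute is a comma-separated list.
        roles = [r.strip() for r in (user.get("roles") or "").split(",")]
        if role in roles:
            users.append(user.get("username"))
    return users
```

For example, users_with_role(SAMPLE, "portalAdmin") lists the accounts that should be able to log in to the administration page.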

Administration tasks

Administration tasks are carried out from the portal's administration page, logging in with the username and password defined for the portalAdmin role. The available tasks are:

  • reload data
  • rebuild the Lucene text index
  • manage the harvester

Harvester

The harvester is the part of the portal application that periodically scans a list of known RDF sources and uploads any changed data. The following actions are available for managing the behaviour of the harvester:

  • start a harvester
  • view existing harvested sites
  • add a new site or dataset for harvesting
  • configure or delete a harvested site

Customization

According to the customisation documentation distributed with the Semantic Portal project, the following levels of customisation are possible:

  • DataSource Definition
  • Styles
  • Templates
  • Extensions
  • Web.xml
  • Logging

For the Defining Image Access project, the first two levels of customisation (DataSource Definition and Styles) were carried out during the evaluation. Two sets of test data were used:

  • the test data distributed with the release of portal.zip, located in the data/test/ folder
  • some RDF files of metadata retrieved from the Cambridge DSpace, converted using the Simile OAI to RDF converter (DefiningImageAccess/Tool/OAI-PMH RDFizer).

The first section below describes how the customisation was carried out with the first set of test data, and the second section describes how it was carried out with the second set.

Tests with supplied data source

  1. Set up DataSource: The main configuration file that defines the DataSources to be viewed by the portal is located at WEB-INF/config/sources.n3, which is an RDF file written in N3 syntax. Semantic Portal supports loading RDF data sources of either N3 format or RDF/XML format.
  2. Facet: The tool supports facet-based navigation of resources, i.e. entities in a datasource. Each facet specifies some properties of the resources being navigated or searched over. Facets are specified using the pcv:facet property.

There are two types of facets:

  • pcv:HierarchicalFacet: indexes resources that are classified by concepts arranged in a hierarchical structure
  • pcv:AlphaRangeFacet: which indexes resources by the first letter of a literal-valued property
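
As a rough illustration of the alphabetical case, the Python sketch below groups resources by the first letter of a literal-valued property. It is not the portal's implementation; the resource layout, the property name and the "#" bucket for non-alphabetic values are assumptions.

```python
from collections import defaultdict


def alpha_range_index(resources, prop):
    """Group resource ids by the first letter of the literal value of `prop`."""
    index = defaultdict(list)
    for res_id, props in resources.items():
        value = props.get(prop, "").strip()
        if not value:
            continue  # resources without the property are not indexed
        first = value[0].upper()
        key = first if first.isalpha() else "#"
        index[key].append(res_id)
    return {key: sorted(ids) for key, ids in sorted(index.items())}


# Hypothetical resources with a literal "title" property.
RESOURCES = {
    "r1": {"title": "apple"},
    "r2": {"title": "Avocado"},
    "r3": {"title": "3D scan"},
    "r4": {},
}
index = alpha_range_index(RESOURCES, "title")
```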

If a resource is associated with a pcv:HierarchicalFacet, it also requires:

  • an ontology to describe the concepts that are used to describe the resource
  • a SKOS file to give the thesaurus structure
  • a set of rules in order to perform some inference

Tests with Cambridge DSpace repository data

The RDF metadata from the Cambridge DSpace repository contains only datatype (literal-valued) properties. Consequently, only the pcv:AlphaRangeFacet can be used to present our test data, and no classification based on SKOS or customised rules is available to help users browse the repository metadata.
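
The reasoning can be made concrete with a small Python sketch that inspects a data set and reports, per property, whether all of its values are literals. The triple representation (with an explicit flag marking literal objects) and the sample data are hypothetical; the portal itself performs no such check as far as we know.

```python
def literal_only_properties(triples):
    """Map each predicate to True if every one of its objects is a literal.

    triples -- iterable of (subject, predicate, obj, obj_is_literal)
    """
    all_literal = {}
    for _subject, predicate, _obj, is_literal in triples:
        all_literal[predicate] = all_literal.get(predicate, True) and is_literal
    return all_literal


# Hypothetical metadata resembling flat Dublin Core records.
TRIPLES = [
    ("item1", "dc:title", "A study of oaks", True),
    ("item1", "dc:subject", "Botany", True),
    ("item2", "dc:title", "Glass plate negatives", True),
]
report = literal_only_properties(TRIPLES)
```

If every property comes back as literal-only, as in this sample, none of them links to a concept that a hierarchical facet could be built on, leaving the alphabetical facet as the only option.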
