DefiningImageAccess/Tool/Joseki

ToolName: Joseki
Link: http://www.joseki.org/
Status: Active
JISCTool: False
Focus: DataEngine, SPARQL Query

Joseki

Joseki is an HTTP engine that supports the SPARQL Protocol and the SPARQL RDF Query language. The query engine is based on ARQ.


ARQ 2.0

(From an announcement by Andy Seaborne on 8 May 2007)

ARQ 2.0 is a complete implementation of SPARQL including support for custom filter functions, property functions and free text search.

This release also includes a first implementation of the SPARQL/Update language. The API for this is not yet stable; suggestions and feedback on its utility most welcome.

ARQ 2.0 provides the same external API as ARQ 1.5. Internally, query execution follows the SPARQL algebra, in development by the RDF Data Access Working Group. There has been significant reorganisation and renaming of implementation code to align with terminology of the SPARQL specification.

This version also makes it easier to reuse and extend ARQ. ARQ can also be used just as a SPARQL parser, as a SPARQL algebra generator or as a basis for specialised query engines.

As well as custom functions and custom property functions, the SPARQL algebra can be extended with new operators. There is a new internal language for reading and writing algebra expressions directly, allowing experimentation in query optimization or query features not available in SPARQL currently.

Download:

Home page:

Other links

Observations

Very easy to install.

SPARQL support, based on the Jena ARQ query engine. Also has a simple update mechanism (this is non-standard, as there is currently no standard update mechanism).

Also includes SPARQLer, a basic web interface for Joseki, or any SPARQL endpoint.

Looks like a very promising candidate local RDF store for the aggregator and other components.

Notes of experiment with publishing SPARQL endpoints with Joseki

This experiment publishes a SPARQL endpoint that uses a Jena database as the backend. The dataset is the RDF metadata extracted from the Cambridge DSpace using the OAIRDFizer. The following configuration is given in joseki-config.ttl. The documentation on the Joseki web site is incomplete.

First, define a new service that handles a given dataset, i.e. _:images. Note that this service (image) must also be declared in the web.xml file so that a query sent to the endpoint (http://hostname:port/image?query...) is mapped to the service.

# Service 3 - SPARQL processor only handling a given dataset
[]
    rdf:type            joseki:Service ;
    rdfs:label          "SPARQL on the image metadata" ;
    joseki:serviceRef   "image" ;
    # dataset part
    joseki:dataset      _:images ;
    # Service part.
    # This processor will not allow either the protocol,
    # nor the query, to specify the dataset.
    joseki:processor    joseki:ProcessorSPARQL_FixedDS ;
.
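
As a rough illustration of how a client would address this service, the sketch below builds a SPARQL Protocol GET URL of the form http://hostname:port/image?query=... by URL-encoding the query text. The class name SparqlQueryUrl and the sample query are made up for this example; "http://hostname:port/image" is the same placeholder used above, not a real address.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds a SPARQL Protocol GET URL for a service (illustrative sketch only). */
final class SparqlQueryUrl {

    /**
     * @param serviceUrl endpoint URL, e.g. the placeholder http://hostname:port/image
     * @param query      SPARQL query text
     */
    static String build(String serviceUrl, String query) {
        try {
            // The SPARQL Protocol carries the query text in the "query" parameter.
            return serviceUrl + "?query=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical query: list ten arbitrary triples from the default graph.
        System.out.println(build("http://hostname:port/image",
                "SELECT * WHERE { ?s ?p ?o } LIMIT 10"));
    }
}
```

Because the service uses joseki:ProcessorSPARQL_FixedDS, the request carries only the query; the dataset is fixed by the configuration.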

Then define a dataset, named _:images, which contains a graph stored in an RDBModel. The model name is set to DEFAULT, as the Jena database does not contain a named graph.

_:images rdf:type ja:RDFDataset ;
        rdfs:label "DSpaceImage" ;
        ja:defaultGraph _:myDatabase 
        .

_:myDatabase   rdf:type    ja:RDBModel ;
    rdfs:label "Images" ;
    ja:connection
    [
        ja:dbType "MySQL" ;
        ja:dbURL          <jdbc:mysql://hostname:port/image> ;
        ja:dbUser         "user" ;
        ja:dbPassword     "pwd" ;
        ja:dbClass        "com.mysql.jdbc.Driver" ;
    ] ;
    ja:reificationMode    ja:minimal ;
    ja:modelName "DEFAULT"
    .

Install Joseki under Tomcat on Chios

Install Tomcat

Kept in my local computer

Install Joseki under Tomcat

The procedure below does not work for Joseki 3.1, only for the previous release. For 3.1 it is better to copy things directly into the webapps/ folder of Tomcat.

After configuring the Joseki server:

  1. copy the configuration files, including "joseki-config.ttl", "log4j.properties", and "server-config.wsdd", to the /webapp/WEB-INF/ directory.
  2. run 'ant war-file' in Joseki root directory
  3. copy the war file to TOMCAT_HOME/webapps
  4. copy the "Data" folder to TOMCAT_HOME
  5. restart Tomcat server

The current endpoints include:

Run Tomcat behind Apache

  • unpack the tar file into a working directory: /opt/tomcat-connectors-1.2.25-src
  • missing apxs

To check whether apxs is installed, run: locate apxs (run updatedb first if necessary)

If apxs is missing from the installed Apache server, it can be installed manually with: yum install httpd-devel. It should then be available as /usr/sbin/apxs.

  • build the source
    • go into directory native under the working directory
    • run ./configure --with-apxs=/usr/sbin/apxs
  • Run make under directory native
    • make
    • su -c 'make install'
  • Configure the JK connector
    • in /etc/httpd/conf.d/, create the file jk_mod.conf and add the following lines:
<IfModule !mod_jk.c> 
   LoadModule jk_module modules/mod_jk.so
</IfModule>

JkWorkersFile "/etc/httpd/conf.d/workers.properties"
JkLogFile "/var/log/httpd/mod_jk.log"

# JkLogLevel emerg
JkLogLevel debug

JkMount /joseki/* ajp13w

/joseki/* is the context path under Tomcat.

    • in /etc/httpd/conf.d/, create the file workers.properties and add the following lines:
# workers.properties.minimal - 
#
# This file provides minimal jk configuration properties needed to
# connect to Tomcat.
#
# The workers that jk should create and work with
#

worker.list=wlb,jkstatus,ajp13w

#
# Defining a worker named ajp13w and of type ajp13
# Note that the name and the type do not have to match.
#
worker.ajp13w.type=ajp13
worker.ajp13w.host=chios.zoo.ox.ac.uk
worker.ajp13w.port=8009

#
# Defining a load balancer
# 

worker.wlb.type=lb
worker.wlb.balance_workers=ajp13w

#
# Define status worker
#

worker.jkstatus.type=status
  • Configure Tomcat to use the JK Connector

Edit the file /var/lib/tomcat/conf/server.xml so that it defines the AJP connector:

<!-- Define an AJP 1.3 Connector on port 8009 --> 
<Connector port="8009" redirectPort="8443" protocol="AJP/1.3"/>

Configure database for endpoint

In the first step, I am still using MySQL. In the next step, I will migrate to PostgreSQL/SRB, which is recommended by the Joseki team.

Configure data into Jena using model loader

This command-line tool is used to load RDF N3 image metadata into the Jena RDF database, for building the Joseki SPARQL endpoint. It is part of the script for

  • harvesting metadata from EPrints using OAI-PMH protocol;
  • translating it into N3 format and writing it to local disk;
  • and then loading this metadata file into Jena database using the cmd-line Jena model loader.

1. Initialise the database:

java -cp lib\jena.jar;lib\mysql-connector-java-3.0.10-stable-bin.jar;lib\commons-logging-1.0.jar;lib\xercesImpl.jar jena.dbcreate --db jdbc:mysql://localhost:3306/flyted --dbUser root --dbPassword ***** --dbType mysql

2. Load the data:

java -cp lib\jena.jar;lib\mysql-connector-java-3.0.10-stable-bin.jar;lib\commons-logging-1.0.jar;lib\xercesImpl.jar;lib\iri.jar;lib\icu4j_3_4.jar;lib\antlr-2.7.5.jar jena.dbload --db jdbc:mysql://localhost:3306/flyted --dbUser root --dbPassword **** --dbType mysql file:translator_flyted_big.n3
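
The commands above use the Windows classpath separator (;), whereas Linux uses (:). Since the endpoint was installed on both platforms, a harvesting script could assemble the jena.dbload invocation in a platform-neutral way. The following is only a sketch: the class JenaDbLoadCommand and its parameters are hypothetical, it uses the double-dash spelling for every option, and it only builds the argument list rather than running anything.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Assembles a jena.dbload command line (illustrative sketch, not part of Jena). */
final class JenaDbLoadCommand {

    static List<String> build(List<String> jars, String pathSeparator,
                              String dbUrl, String user, String password, String file) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        // Join the jar paths with the platform's classpath separator.
        cmd.add(String.join(pathSeparator, jars));
        cmd.add("jena.dbload");
        cmd.addAll(Arrays.asList("--db", dbUrl, "--dbUser", user,
                "--dbPassword", password, "--dbType", "mysql", file));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList(
                "lib/jena.jar", "lib/mysql-connector-java-3.0.10-stable-bin.jar",
                "lib/commons-logging-1.0.jar", "lib/xercesImpl.jar",
                "lib/iri.jar", "lib/icu4j_3_4.jar", "lib/antlr-2.7.5.jar");
        // File.pathSeparator is ";" on Windows and ":" on Linux.
        List<String> cmd = build(jars, File.pathSeparator,
                "jdbc:mysql://localhost:3306/flyted", "root", "****",
                "file:translator_flyted_big.n3");
        System.out.println(String.join(" ", cmd));
        // To actually run it: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```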

Browse the endpoint using jSpace 0.40

  • Environment
    • SPARQL endpoint published using Joseki 3.1, installed on both a Linux and a Windows computer
    • jSpace 0.40
  • Dataset
    • RDF/XML metadata about Drosophila images, which are kept in a Jena/MySQL store and accessed through the Joseki SPARQL endpoint: http://chios.zoo.ox.ac.uk/joseki/flyted
    • N3 jSpace model file to present the RDF metadata
  • To run the experiment
    • load the metadata file by "Add Source" -> "SPARQL", and give the name, and URL of the endpoint;
    • load the model file by "Add Model", and then give the URL of the model file.
  • Problems
    • cannot load the Joseki endpoint without setting the "default graph" parameter: this turns out to be a bug in jSpace 0.40, which is fixed in v0.41. The error message can simply be ignored; the data source still loads afterwards.
    • reading jSpace's logs: the logs can be found in "C:\Documents and Settings\user name\Local Settings\Temp". Logs from Tomcat ($TOMCAT_HOME/logs) and Joseki (enabled by putting the file log4j.properties under "joseki\WEB-INF\classes") can also be used for debugging.

Browse the endpoint using jSpace 0.41

The goal of using v0.41 is to allow us, within jSpace, to browse web pages about each of the images.

Michael's email suggests that there are three possibilities:

  • using the default behaviour of v0.41, which takes the string value of the currently selected resource, googles for that term restricted to site:wikipedia.org, and then navigates to the first hit.
  • the second choice is to use the jSpace URL builder, which uses the label of the column containing the selection to retrieve web pages, rather than the selected column itself.
  • the third option is to do a regular google "I'm feeling lucky" search with the text of the current selection. "This is "dangerous" as its the slowest option, most likely to turn up something un-renderable, and is not always relevant to the search domain."

What jSpace has done for the baseball demo on their web site is based on custom URL builders that either go to baseball-specific sites, such as retrosheet.org where they got the data from, or restrict the google/wikipedia web queries to baseball-related material, i.e. include the string "baseball" in the search query.

For our experiment, we created a customised Web view builder, which takes the URL of a currently selected image and then navigates to the FlyTED site using that image URL.

Create our customised jSpace Web view builder

This is basically one class that implements jspace.impl.webview.ViewURLBuilder. The code of this class is as follows:

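 public URL buildViewURL(swapi.Node arg0) throws JSpaceException {
     try {
         // Use the third ':'-separated part of the node's string form
         // as the path on the FlyTED site.
         return new URL("http://www.fly-ted.org/" + arg0.toString().split(":")[2]);
     } catch (Exception e) {
         // Covers a malformed URL as well as a node string with fewer than three parts.
         throw new JSpaceException(e);
     }
 }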

This requires us to run jSpace locally, including our extension jar file in the classpath, using the following command:

java -cp "myjspace.jar;jspace.jar" jspace.JSpace
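
The string handling in the builder can be tried out without jSpace on the classpath. The sketch below reproduces only the split-and-concatenate step using plain Java strings; the class name FlyTedUrlDemo and the sample identifier "oai:fly-ted.org:1234" are invented for illustration, since these notes do not show what the node strings actually look like.

```java
/** Standalone version of the URL-building step (illustrative sketch only). */
final class FlyTedUrlDemo {

    /**
     * Returns the FlyTED page address for a node string, using the third
     * ':'-separated part as the path, as the builder above does.
     */
    static String buildViewUrl(String nodeText) {
        String[] parts = nodeText.split(":");
        if (parts.length < 3) {
            // The jSpace builder would surface this case as a JSpaceException.
            throw new IllegalArgumentException(
                    "expected at least three ':'-separated parts: " + nodeText);
        }
        return "http://www.fly-ted.org/" + parts[2];
    }

    public static void main(String[] args) {
        // Hypothetical identifier; real node strings may look different.
        System.out.println(buildViewUrl("oai:fly-ted.org:1234"));
    }
}
```

Note that a node string with fewer than two colons has no third part, which is why the real builder's catch block wraps every exception and not only a malformed URL.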

Summary of Joseki 3.1/Postgresql/FlyTED configuration steps

These notes are rather sketchy, having been made after the event from memory - they need tidying up and refinement.

These notes are taken from the procedure for reconfiguring an installed Joseki system to create a new FlyTED endpoint service using a Jena Model loaded into a PostgreSQL database. As such, they briefly touch on many of the configuration aspects of Joseki.

See also: http://milos2.zoo.ox.ac.uk/ibrgtech/index.php/Rodos_setup

  • Check that the PostgreSQL database is installed and running: use psql to log in and check the tables.
  • Check that the JDBC connector for PostgreSQL is installed and available to Joseki. Joseki uses the JDBC3 module.
  • Check that the FlyTED data (or other required RDF data) has been loaded into PostgreSQL (how?)
  • Extend the Joseki 3.1 configuration with new service, dataset and model definitions. Note that a Joseki service connects a dataset to a processor. The processor is typically shared by a number of services. The RDF model name (ja:modelName) is "DEFAULT" unless the model is loaded using named graphs (?).
    • Configuration files:
 /var/lib/tomcat/webapps/joseki3-1

or

 $CATALINA_HOME/webapps/joseki3-1
 $CATALINA_HOME/webapps/joseki3-1/WEB-INF/joseki31-config.ttl
 $CATALINA_HOME/webapps/joseki3-1/WEB-INF/web.xml
    • Joseki config notes:
 ja:RDFDataset flyted-pgsql-rdb
    • web.xml notes:
 <param-name>org.joseki.rdfserver.config</param-name>
 <param-value>/var/lib/tomcat/webapps/joseki3-1/WEB-INF/joseki31-config.ttl</param-value>
  • Put the PostgreSQL JDBC connector in the Joseki webapp lib area:
 $CATALINA_HOME/webapps/joseki3-1/WEB-INF/lib/
  • Test the SPARQL endpoint
    • For logs:
 tail -f $CATALINA_HOME/logs/catalina.out

The seven requirements in jSpace

1. (YES) R1: Browse images by the gene names.

2. (YES) R2: Find all the images of one gene that contain the same expression pattern, regardless of their strains.

3. (Yes) R3: Find all the images of one gene that contain the same set of expression patterns, regardless of their strains.

When more than one condition is selected in a column, jSpace allows users to set the relationship between these conditions as either "And" or "Or".

4. (Yes) R4: Classify the images of one gene by their strains and then expression patterns.

5. (Yes) R5: Find all the images showing a certain set of patterns of a particular strain.

6. (TODO) R6: Find all the images containing both somatic cells and gene expressions in the germ line.

7. (NO) R7: Find all the images NOT showing a certain set of patterns of a particular strain.
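
One way to read the "And", "Or" and missing "NOT" conditions is as set operations over the patterns shown in an image. The sketch below is not jSpace code; the class PatternConditions and the pattern names are made up, and it assumes that R7 means "shows none of the selected patterns".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Set-based reading of the And / Or / NOT conditions (illustrative sketch only). */
final class PatternConditions {

    /** "And": the image shows every selected pattern (as in R3 and R5). */
    static boolean matchesAll(Set<String> shown, Set<String> selected) {
        return shown.containsAll(selected);
    }

    /** "Or": the image shows at least one selected pattern. */
    static boolean matchesAny(Set<String> shown, Set<String> selected) {
        return !Collections.disjoint(shown, selected);
    }

    /** "NOT" (assumed reading of R7): the image shows none of the selected patterns. */
    static boolean matchesNone(Set<String> shown, Set<String> selected) {
        return Collections.disjoint(shown, selected);
    }

    public static void main(String[] args) {
        // Hypothetical pattern names, for illustration only.
        Set<String> shown = new HashSet<>(Arrays.asList("patternA", "patternB"));
        Set<String> selected = new HashSet<>(Arrays.asList("patternB", "patternC"));
        System.out.println("And: " + matchesAll(shown, selected));
        System.out.println("Or:  " + matchesAny(shown, selected));
        System.out.println("NOT: " + matchesNone(shown, selected));
    }
}
```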
