Meetings/20070209/DefiningImageAccess-ToolsAndTechnologies
Tools and Technologies for Semantic Interoperability Across Scholarly Repositories
9 Feb 2007
David and Graham's combined notes from the meeting are given here.
PROLOGUE
Attendees
Speakers
- David Shotton (Oxford University, Zoology)
- Brian Fuchs (Imperial)
- Julie Allinson (UKOLN, Bath)
- Max Wilson (ECS, Southampton)
- Euan Adie (Nature Publishing Group, London) (arriving lunchtime)
- Dan Smith (ECS, Southampton)
Dolores Iorizzo (Imperial College, London) was indisposed and unable to attend. Nikki Rogers (ILRT, Bristol) was prevented from attending because of an illness in her family. Both sent their apologies.
Others who attended
- Jessie Hey (Southampton)
- Emma Tonkin (UKOLN)
- Stephen Andrews (Science Technology Medicine, British Library)
- Ben O'Steen (Oxford Research Archive software engineer), for Sally Rumsey (OULS)
- Michael Fraser (OUCS)
- Brian Matthews (RAL)
- Geoffrey Bilder (CrossRef, Director of Strategic Initiatives)
- Neil Thomson (Head of Data & Digital Systems, The Natural History Museum)
- Graham Klyne (Oxford University, Zoology)
- David Wallom (Oxford University, OeRC)
- Anne Trefethen (Oxford University, OeRC)
Apologies for absence
- Leigh Dodds (Ingenta) - unfortunately, had to withdraw
- Alistair Miles (RAL) - unavailable
- Dan Brickley (Bristol) - unavailable
- Greg Parker (Beazley Archive) - unavailable
- Neil Caithness (OeRC) - unavailable
- Balviar Notay (JISC) - unavailable
- Panayiota Polydoratou (Imperial) - unfortunately, had to withdraw
- Jun Zhao (Oxford University) - unfortunately, had to withdraw
People Introductions
Brian Fuchs - Coordinator of Internet Centre at Imperial College (rebranding of eScience centre). HP computing with grids. Previously Max Planck Institute for the History of Science.
Ben O'Steen, Oxford Repository, working with Sally Rumsey, previously software developer.
Neil Jeffreys, Oxford University libraries, working on repository backend; varied background: parallel algorithms for transputers, database technology, ...
Mike Fraser - coordinator of the Research Technology Service at OUCS; interest at strategic level in repository(?) interoperability frameworks.
Neil Thomson, Libraries at Natural History Museum.
Brian Matthews, CCLRC, eScience, Semantic Web, SKOS, digital curation, science data, etc.
Geoffrey Bilder, Director of Strategic Initiatives at CrossRef; was previously a scholarly technology consultant and Chief Technology Officer at Ingenta; co-founded Brown University's Scholarly Technology Group in 1993 to provide advanced technology consulting on issues related to academic research, teaching and scholarly communication.
Stephen Andrews, Science Technology Medicine team at British Library. Working on UK PubMedCentral, and other small projects (VREs, etc).
Julie Allinson, UKOLN, Repositories Research Officer, supporting JISC programs, OAI-ORE.
Max Wilson, Southampton University researcher, information retrieval & Semantic Web.
Dan Smith, Southampton University researcher, Rich Tags.
Jessie Hey, Southampton University, part of broad group, library and computer science background, working with institutional repository, eprints, (and more...).
Graham Klyne, after a long career as software developer, became involved in Internet, Web and Semantic Web standards development. Now working with David Shotton to build applications using Semantic Web ideas.
David Shotton, Biologist who has become interested in information management.
Asked, with relevance to Data Protection Act, for any objections to storage of personal details (name, affiliation, email address). None heard.
Meeting Introduction
David Shotton (Oxford University)
The data web concept
- Lightweight, adopting Semantic Web and Web 2.0 concepts
- Conforming to standards
- Reusing existing tools and components
This meeting part of JISC Defining Image Access Project, but also relevant to our wider vision embodied in the BioImageWeb Consortium and in the CLAROS Project.
Characteristics of this field:
- Sharing
- Inventing truly novel methods of doing things
- Changing the world
- Bright young things, such as Max and Dan from Southampton, Julie from UKOLN and Nikki from ILRT
Purpose of this meeting – to learn from others, to generate discussion, and to INFORM us of relevant developments in three areas:
- Metadata standards and harvesting systems
- Semantic web tools for using and exposing metadata
- Systems for annotation and tagging
Particularly interested in links between tagging and formal ontologies.
METADATA STANDARDS AND HARVESTING
CIDOC and the Web
Common sense: CIDOC, interoperability and the Web - Brian Fuchs (Imperial College, London)
Presentation here.
(Also involved via Virtual Lightbox project to provide cross-repository access to cultural heritage.)
What is CIDOC CRM: a metadata model derived from data modelling technique that looks for common structure. Integration through relationships via shared concepts wrt objects in the real world: things, people, places, time and relational concepts. (This discussion of CIDOC CRM assumes some prior knowledge.)
CIDOC CRM Core is like Dublin Core, but with relations, and revolves around events. E.g. monument to Balzac - metadata for this with defined relationships and dates: Balzac (subject), Rodin (sculptor, made plaster model), bronze casting team (created bronze from model after Rodin's death). Can express all these relationships in CRM in a way that would be impossible/ambiguous with DC. Very helpful and common sense.
CIDOC CRM data modelling technique. Common background; integration through relationships with shared concepts in the real world. Hence grounding in common perception. - Things, People, Places, ...
CIDOC CRM captures the detail of all the different relationships:
Analyze by events:
- Production event - modelling
- Continuation -> another production event - casting
- Occurs after -> Rodin's death
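The event-centred reading of the Balzac example can be sketched in code. This is a minimal illustration in Python, not CIDOC CRM itself: the class and field names (Event, carried_out_by, produced, continues, occurs_after) are hypothetical stand-ins for CRM properties, chosen only to show how events carry the relationships that a flat Dublin Core record would blur.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """A hypothetical, much simplified stand-in for a CRM-style event."""
    label: str
    carried_out_by: str
    produced: str
    continues: Optional["Event"] = None   # earlier event this one builds on
    occurs_after: Optional[str] = None    # a named reference point in time


# Production event 1: Rodin makes the plaster model.
modelling = Event(
    label="modelling",
    carried_out_by="Rodin",
    produced="plaster model of the monument to Balzac",
)

# Production event 2: the casting team makes the bronze from the model,
# after Rodin's death.
casting = Event(
    label="casting",
    carried_out_by="bronze casting team",
    produced="bronze monument to Balzac",
    continues=modelling,
    occurs_after="Rodin's death",
)


def chain(event: Event) -> list[str]:
    """Walk back through 'continues' links and list steps, earliest first."""
    steps = []
    current: Optional[Event] = event
    while current is not None:
        steps.append(f"{current.carried_out_by}: {current.label}")
        current = current.continues
    return list(reversed(steps))


print(chain(casting))
```

The point of the sketch is that "who did what, and in which order" is held by the events and their links, rather than by a single creator field on the object.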
Pros and cons
Pros:
- Likely to work as integration standard
- Makes implicit relations explicit
- Enables structured record of research steps
Cons:
- Judgement call required to determine explicit relationship; disputable? What are the relations?
- Labour intensive (esp. figuring out relations, but in later discussion using a common super-relationship to facilitate this phase was an approach to consider).
Question: How to leverage searching? How do you create a search engine that takes CIDOC concepts into account?
Problem: The status of the CIDOC-CRM curation event itself is unclear: Is it a publication, an assertion? It is usually in the background, but one really needs self-referential metadata for provenance information. (Later discussion of coding this in RDF, and oblique references to reification.)
CIDOC in CultureGrid
FP6 Application entitled CultureGrid made with Dolores and Martin Doerr: Wanted to turn CIDOC into a curation model. Goal: integrate people/place/events in distributed diverse data. FP6 infrastructure proposal - 35 institutions, €18m, with cross linguistic support, and data mining techniques for named entity detection and event detection.
Structure: Would have involved a central server, authority server, and subject specific server. Real World Item > local index. Very complex, with 20 different services. Now glad it was not funded!
Strategy to produce a central authority list with links to distributed repository and metadata, using...
- CIDOC CRM
- Cross-language support
- Data mining techniques
High-level architecture looks very data-web like, but at the next level down, the block diagram starts to look very complex.
Application rejected! Reasons - Unsustainable business model:
- Virtual organization model - participants to pay subscription - no revenue stream except membership fee.
- Very high integration costs - no spontaneity.
- Portal approach - very tight integration needed - not scalable (contrast: our data webs approach). A push model.
Consider this evolution of thinking from that proposal to now (See slides for details):
- Then: (CultureGrid proposal): integrate via network then publish to the Internet.
- Now: (reality since that proposal): publish straight to the Internet in machine readable form, skipping the integration system. Users take information direct from the internet, without a separate integration network. Instead, have integration services within the internet (commonly via tagging) as self-standing services, with a pay-per-use integration model.
- Next: (possible shape of future proposal): add more integration services that operate as part of the Internet, combining resources from the Internet, publishing new results back to the Internet.
Regard all parties as both producers and consumers of data - this data about how users behave returned to repositories (at present, museums don't collect this sort of data, and are poorer for it).
Challenge: How to hook CIDOC CRM data model with tag clouds, (e.g. from Flickr) and create an "authority list".
Pipeline: discovery -> composition -> visualization services.
Example: Mashup toolkit: take available sources; visual rendition of composition (functional program?) get more information -- see Imperial College Internet Centre web site - http://www.internetcentre.ic.ac.uk/web_lab.php ("data mashing", to be published soon?)
Service sources of stuff for this are already on the net: Blogging, geocoding, Del.icio.us, Flickr, etc.
E.g. visual rendition of time, place and event data for NYC into Google Maps. (Presently using Eclipse; later will use Flex (http://en.wikipedia.org/wiki/Adobe_Flex?), which is rather easier to program and use.)
How?
- Flatten hierarchy of suppliers and users. All users are equal; no distinction between curator/user
- Move services to Internet space
- Enable users to discover and compose services
- Eliminate institutional barriers (e.g. passwords) to use of data
- Turn CIDOC-CRM from a data model into a service (We hope Martin will do it for CLAROS)
Then could publish a curation event as a service composition, and hook it to data mining services.
Data mining services create "authority list"; also, introduces concept of "CIDOC trustedness rating" (some detail missed here). Metadata has to be coded somehow. Choose Dublin Core or CIDOC CRM.
Discussion
Business model: Open service market (for integration services) - pay-per-use, but it may not be the ordinary user who pays. Integration services will be "just out there". Research at Imperial Internet Centre into open service economic models concludes that a pay-per-use element should be assumed.
Mapping CIDOC CRM requires some form of reification/self reference (to deal with curation events?).
Few resources are actually using CIDOC today - curators think it too complex.
CIDOC may be expressed as XML and RDF representations.
Geoffrey Bilder mentioned Yahoo Pipes (just released: http://pipes.yahoo.com/; http://radar.oreilly.com/archives/2007/02/pipes_and_filte.html; http://news.zdnet.co.uk/internet/0,1000000097,39285863,00.htm), which looks very much like the service composition framework Brian proposed.
Other examples: Apple Dashboard - mashup within the desktop - Google has similar.
Categorical vs conceptual metadata models: the former stays at the physical level, hence cannot express abstract relationships (DC vs CIDOC-CRM).
David Shotton mentioned Helen White-Cooper's involvement in the Drosophila GeneTrap project; contributing images with annotation using new tool - FlAnnotator - good user take-up despite clunky interface, since users want their images to be well annotated. Immediate benefit: instant searchability by concepts used. [This requirement for immediate benefit seen as key to metadata submission schemes.]
Neil Jeffreys: Ideally, don't want to insist on metadata as part of deposition, but collect metadata as natural part of original data collection/acquisition process.
Brian Matthews: The PREMIS metadata model (http://www.oclc.org/research/projects/pmwg/) is good for recording curation events. Perhaps this can overcome problems of CIDOC-CRM in recording curation events.
Brian Fuchs: CIDOC does duplicate functions in OWL, etc. Conceivable strategy is CIDOC CRM module for, say, Protégé.
David Shotton mentioned similarity between CIDOC-CRM and the <INDECS> data model of relationship between people and objects through events (http://www.indecs.org/; http://www.indecs.org/pdf/framework.pdf). The INDECS work has been used as a basis for the DRM part of MPEG21.
Q: Where are the dragons in CIDOC? A: Roughly, getting the relationship structures right.
Comment: Visual programming for service composition matches a functional programming model.
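The link between service composition and functional programming can be made concrete with a small sketch. Assuming each service in the discovery -> composition -> visualization pipeline is modelled as a plain function, a mashup is just function composition. The three service functions and their records below are made-up placeholders, not real services or data.

```python
from functools import reduce
from typing import Callable


def compose(*steps: Callable) -> Callable:
    """Return a function that applies the given steps from left to right."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


# Hypothetical stand-ins for network services; real ones would make requests.
def discover(place: str) -> list[dict]:
    return [
        {"place": place, "year": 1900, "event": "event A"},
        {"place": place, "year": 1850, "event": "event B"},
    ]


def sort_by_year(records: list[dict]) -> list[dict]:
    return sorted(records, key=lambda r: r["year"])


def render(records: list[dict]) -> str:
    return "\n".join(f"{r['year']}: {r['event']} ({r['place']})" for r in records)


# The "mashup" is the composition of the three services.
timeline = compose(discover, sort_by_year, render)
print(timeline("NYC"))
```

Swapping one step for another (a different renderer, say) leaves the rest of the pipeline untouched, which is the property a visual composition tool would rely on.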
OAI-ORE and the Pathways Project
Julie Allinson (UKOLN, Bath)
http://www.openarchives.org/ore - 2 full reports on this web site give details. Presentation available here.
ORE: Object Reuse and Exchange: This is a new project with potential for wide impact, funded by Mellon and NSF, but with an international focus. 2 year project, that started October 2006. Grew out of the Augmenting Interoperability meeting in New York last March.
Not a replacement for OAI-PMH, which is metadata centric, and will continue to exist as one approach to interoperability. ORE Project will complement that and give richer functionality - it is resource centric.
PATHWAYS is another project upon which ORE builds: "Rethinking scholarly communications". This project proposed an interoperability infrastructure, a shared data model, use of a surrogate format, and 3 shared services. ORE is evolving its own models from this.
Development headed by Carl Lagoze (Cornell) and Herbert van de Sompel (LANL). UKOLN has a small involvement:
- Advisory Committee includes Liz Lyon.
- Technical Committee includes Andy Powell and Les Carr.
- Liaison Group includes Rachel Heery.
Objectives
1. Develop, identify and profile extensible standards and protocols to allow interoperability in the use of digital objects.
2. Provide effective and consistent ways to discover objects, to refer to them, to disseminate them, and to aggregate and disaggregate them.
3. Establish basis for digital scholarly communication systems, including both systems that manage repositories and systems that leverage that content (e.g. data webs).
4. Based on concept of a compound digital object: might include text, image, audio, dataset, simulation, software, etc., all semantically typed.
Will allow for:
- differences in network locations - repositories, web sites, social networking sites.
- different relationships - e.g. lineage, provenance.
Examples:
- An arXiv paper with different disseminations.
- eScience publication that combines text, data, simulations.
Next step: develop use cases of workflows supporting research and learning ("Pathways").
Key concepts
- Compound digital objects - e.g. alternative views or presentations of the same thing, or text plus image components.
- Can use such structures recursively, but each must be unambiguously and uniquely identified.
- Objects and views must be modelled as resources in Web architecture.
See presentation slides for example of complex objects, and for an ORE Resource modelled as the first-class identifiable object, an ORE aggregation, and an ORE representation.
Two types of relationship:
- intra-aggregation e.g. hasPart, hasView.
- inter-aggregation hasRelationshipTo.
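A compound object with these two kinds of relationship might be pictured as follows. This is only a sketch under stated assumptions: the relation names (hasPart, hasView, hasRelationshipTo) come from the notes above, but the Python classes, the example.org URIs and the all_uris helper are hypothetical, and ORE had not fixed a representation format at the time of the meeting.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    uri: str            # every resource must be uniquely identified
    media_type: str


@dataclass
class Aggregation:
    uri: str
    # intra-aggregation links: (relation, Resource or nested Aggregation)
    parts: list = field(default_factory=list)
    # inter-aggregation links: (relation, URI of another aggregation)
    related: list = field(default_factory=list)

    def add_part(self, relation, item):
        self.parts.append((relation, item))

    def all_uris(self):
        """Collect URIs recursively, since aggregations can nest."""
        uris = [self.uri]
        for _, item in self.parts:
            if isinstance(item, Aggregation):
                uris.extend(item.all_uris())
            else:
                uris.append(item.uri)
        return uris


# A paper with two views and a nested data aggregation.
paper = Aggregation("http://example.org/paper/1")
paper.add_part("hasView", Resource("http://example.org/paper/1.pdf", "application/pdf"))
paper.add_part("hasView", Resource("http://example.org/paper/1.html", "text/html"))

data = Aggregation("http://example.org/paper/1/data")
data.add_part("hasPart", Resource("http://example.org/paper/1/data.csv", "text/csv"))
paper.add_part("hasPart", data)

# A link from this aggregation to a different one.
paper.related.append(("hasRelationshipTo", "http://example.org/paper/2"))

print(paper.all_uris())
```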
Resources and views of the resources (e.g. xml, html): Web does not at present support aggregation of resources and views of resources. ORE services are transactions that exchange instances of the ORE model.
3 classes of services:
- Harvest - collect batches of resources
- Obtain - get a single resource
- Register - deposit, put, post
Where data sets are large, surrogates may be moved, rather than the full assets (i.e. move metadata rather than the whole digital object, which might be Gbytes in size), although this aspect is not being emphasized at present in the project.
To do: Develop Canonical Representation Format (http://www.openarchives.org/ore/documents/ORE-CNI-2006.pdf). Several candidates including MPEG-21, Pathways.
OAI-ORE is very new, but people would like it to have a wide impact, as did OAI-PMH.
Discussion
Relationship with MPEG-21?? Yes, some common structures - developers in common. MPEG-21 will be a strong contender for serialization of this abstract model.
Brian Fuchs: It looks like an extraction from Fedora model. A: Yes, it has links. But we are also engaging ePrints and DSpace.
How do you deal with synchronization - timing of object delivery and updating?
Graham Klyne: What sort of considerations should we take on board, so as to be ORE-compliant? A: Reports from the technical meeting are freely available. The basic principles (Web and RDF, and the notion of complex object) will not change.
Graham mentioned a reference to limited Web support of typed relationships. Julie: There is no standard mechanism for having typed relationships, e.g. there are a variety of extensions of DC. Graham: If it is about anything, RDF is about typed relationships. Also note the move to resurrect the link header of HTTP - the GRDDL community is pushing this.
SEMANTIC WEB TOOLS
mSpace
Max Wilson (ECS, Southampton)
Context - advanced search interface, web browser based and server agnostic. Plug-in design. Open Source (soon), and particularly good for multimedia and multidimensional data and RDF. It is a Web2 application using live Javascript, and acts as a focus-less content browser. Developers: Alistair Russell, Max Wilson, Dan Smith and mc schraefel.
Beta test version at http://beta.mspace.fm. Classical music demo - multimedia and multidimensional. Slogan: "I don't know much about classical music, but I know what I like"
This field good because it is multimedia, has lots of metadata, and has technical content (e.g. the term "allegro", that the user or developer may not understand).
It is a column browser visualized as a flat set of level columns. Different attributes are shown in different columns, e.g. Era : Composer : Piece : Orchestra.
Click selection of an instance in a column shows only relevant terms in other columns, with constraints propagated in a progressive way from left to right. One can drag columns to left or right to change the hierarchy, e.g. with "Composer : Piece" can show the types of compositions written by a particular composer, or with "Piece : Composer" can show all people who composed piano sonatas. Users can choose their own path through the data.
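The left-to-right propagation of constraints can be sketched in a few lines. This is an illustration of the idea only, not mSpace code: the records are made up, and column_values is a hypothetical helper name.

```python
# Made-up records; a real mSpace installation draws these from RDF.
RECORDS = [
    {"Era": "Baroque", "Composer": "Composer A", "Piece": "Concerto"},
    {"Era": "Baroque", "Composer": "Composer A", "Piece": "Sonata"},
    {"Era": "Baroque", "Composer": "Composer B", "Piece": "Sonata"},
    {"Era": "Romantic", "Composer": "Composer C", "Piece": "Sonata"},
]


def column_values(records, columns, selections):
    """Return the values shown in each column, in column order.

    A selection in a column constrains every column to its right.
    """
    shown = {}
    remaining = records
    for column in columns:
        shown[column] = sorted({r[column] for r in remaining})
        if column in selections:
            remaining = [r for r in remaining if r[column] == selections[column]]
    return shown


# "Composer : Piece" - what kinds of piece did Composer A write?
print(column_values(RECORDS, ["Composer", "Piece"], {"Composer": "Composer A"}))

# "Piece : Composer" - who wrote sonatas?
print(column_values(RECORDS, ["Piece", "Composer"], {"Piece": "Sonata"}))
```

Reordering the columns changes the question being asked without touching the data, which is what dragging a column does in the interface.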
Interface also has an Information Panel, a Collection Space in which a user may keep items of interest, and a Multimedia Preview space. It has zooming panels to permit more detailed views.
Preview cues: If one doesn't know what baroque music sounds like, one can listen to a preview (strictly "prelisten"!).
Items of interest can simply be dragged into the Collection Space to save them. If later one drags an item back over the column panels, all relevant information will be displayed, so you will know the context of where it came from.
Development versions:
- v0.5 was open source
- v0.6 was not open source
- v0.7 has some new features and is currently being demo-ed
- v0.8 is an optimized version of v0.7 and will be open sourced in ~2 months
New features:
- Ability to embed components into 3rd party web site
- All columns start fully populated, so can immediately constrain by any facet.
- Backward highlighting - e.g. if choose data in column 3, it works backwards to provide appropriate data in columns 2 and 1
- Added numbers, e.g. Baroque (87 clips), indicates how populated each concept space is
- Use of persistent URLs, so can refer to one's view of the data: "here is link to the whole interface as I was viewing it at a particular date"
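Two of the features above, the per-value counts and backward highlighting, can be pictured with a short sketch. The clips and the function names facet_counts and backward_highlight are hypothetical; this is a guess at the behaviour described in the talk, not the mSpace implementation.

```python
from collections import Counter

# Made-up clips standing in for a real collection.
CLIPS = [
    {"Era": "Baroque", "Composer": "Composer A", "Piece": "Sonata 1"},
    {"Era": "Baroque", "Composer": "Composer A", "Piece": "Sonata 2"},
    {"Era": "Baroque", "Composer": "Composer B", "Piece": "Concerto 1"},
    {"Era": "Romantic", "Composer": "Composer C", "Piece": "Sonata 3"},
]


def facet_counts(clips, column):
    """Number of clips under each value of a column."""
    return Counter(clip[column] for clip in clips)


def backward_highlight(clips, columns, selected_column, selected_value):
    """Values in the columns to the left that are consistent with a selection."""
    matching = [c for c in clips if c[selected_column] == selected_value]
    left_columns = columns[: columns.index(selected_column)]
    return {col: sorted({c[col] for c in matching}) for col in left_columns}


columns = ["Era", "Composer", "Piece"]
print(dict(facet_counts(CLIPS, "Era")))
print(backward_highlight(CLIPS, columns, "Piece", "Concerto 1"))
```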
Installations:
- BBC News Archive and related articles
- Newsfilm Online - ITN and Reuters newsreels from 1920s
- Louvre Museum, to explore art [Relevant to CLAROS Project]
Future features:
- Annotation tags (Rich tags), popularity counts, time-line tool, graphing tool, etc. will be added.
How does it work?
Multi-tiered architecture: mSpace Server (SPARQL and SQL implementations) > Web server (using php script and http post) > Client side (javascript and http requests). Responses delivered in xml.
RDF used at the back end to express relationships - the mSpace model converts RDF into a faceted browser.
Discussion
Can you browse images as well as sounds? Yes, they are just displayed in the Preview Window.
How to install mSpace? Instructions and template documents are available, CSS, XSLT, Server model, etc., and all the code, and an API describing what user interface will ask for and what server will reply back.
Can you browse across multiple repositories? Yes, if you give it the information correctly: Either (ideally) you need a registry to aggregate content from all sources, then provide links to the individual sources to get data (Data Web model); or alternatively one could develop an mSpace Server that queries multiple resources simultaneously, which is more difficult because of the different metadata representations at different sites.
Is it related to zigzag structures? Response: Supplementary slide shows details, including:
- mSpace is Euclidean, zz is non-Euclidean
- mSpace connected by attributes of data, zz is independent of content
How to ensure data browsing is up to date, and how to add new data? Not relevant to mSpace, which is an access tool, not a data management tool.
Working to join mSpace up to ePrints. In principle, could use it to explore any repository.
Graham: Can you support multiple mSpace models (e.g. with different arrangements of columns) through a single server? The mSpace model defines columns, and different purposes would require different columns. So, yes.
Also working on personalization - of interface appearance and user interests [of relevance to CLAROS]. Works well on big screen, but also developing mSpace Mobile, a version for PDAs [also of relevance to CLAROS].
Graham: Is it possible to change between mSpace models? One can start with simple browsing, then delve more deeply and bring more columns into play.
An administration tool is being developed to assist in setting up the interface [Graham: Relevance to FlyMine]
David Wallom: Could it jump from displaying data in Southampton to displaying data in Oxford? Yes, since it is web based. (This has already been done between different data servers in Southampton)
Authentication and authorization? Done on a per service basis.
There is also a dumb version that can work without javascript.
Louvre Museum data: The Louvre would not give away their metadata, but would only allow SQL access to their database, so mSpace used D2R to capture RDF and put the user interface on top!
What do ontologies do? They carefully inform the data being displayed, by controlling terms. But they are not part of mSpace.
Semantic Portals, SWED and IUGO
Nikki Rogers (ILRT, Bristol)
Nikki sent apologies that illness in her family prevented her from attending. Her presentation is available here.
TAGGING AND ANNOTATION
Connotea as a database
Euan Adie (Nature Publishing Group, London)
What is Connotea? A social bookmarking service for scientists, inspired by del.icio.us.
Started in late 2004 and still under development - agile programming philosophy of starting small and simple, then progressively improving
Browse web for paper, then click bookmarklet in browser. Connotea harvests URL and passes it to the citation module, which harvests PubMed ID, title, etc.
Connotea allows user to tag freely. Their tags are public-facing by default, but can be made private or embargoed (e.g. for journalists researching stories).
Can use Connotea to supplement or replace a conventional desktop bibliographic information manager (e.g. Endnote).
Connotea tags are the primary means of navigating bookmarked sites - all bookmarks must have at least one tag. Some tag names are reserved. Machine tags can be used, distinguished by namespace prefixes, e.g. geo:lat=2, geo:long=y (not a standard, but stolen from Flickr), as used in Declan Butler's mashup of Avian flu papers with Google Earth (http://declanbutler.info/blog/?p=58; http://www.nature.com/nature/journal/v439/n7072/full/439006a.html).
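Since machine tags are ordinary strings that follow a naming convention, any structure has to be recovered by the application reading them. A minimal sketch of such a reader follows, assuming the shape namespace:predicate=value seen in the geo: examples; the regular expression and the parse_tag name are illustrative assumptions, not part of Connotea.

```python
import re

# namespace:predicate=value, as in the geo: examples; anything else is a plain tag.
MACHINE_TAG = re.compile(
    r"^(?P<namespace>[A-Za-z][\w-]*):(?P<predicate>[\w-]+)=(?P<value>.+)$"
)


def parse_tag(tag):
    """Split a machine tag into its parts, or return None for a plain tag."""
    match = MACHINE_TAG.match(tag)
    return match.groupdict() if match else None


print(parse_tag("geo:lat=51.75"))
print(parse_tag("bioinformatics"))
```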
Users can be members of groups, and can copy items from other users. In theory, Connotea can say to a user "Users who bookmarked X also liked Y" (like Amazon), but this has yet to be implemented.
Similar systems: CiteULike, Bibsonomy, FURL etc. These are not designed for academic use. Euan's presentation has a table of relative advantages and disadvantages (q.v.).
The following text taken from e-mail on this subject from Euan to David 29th January 2007:
"Connotea is open source (but run commercially, at the end of the day – we’ve tried bringing ads in, for example). One of the ideas behind being open source is that if NPG ever decides to stop supporting Connotea, the codebase and data can all be transferred elsewhere. Bibsonomy isn’t open source. Possibly the biggest single difference - Connotea has an API to allow 3rd party applications read/write access to tags & bookmarks – this is how the Eprints integration and various Greasemonkey plugins work. Bibsonomy doesn’t. Connotea has a small but active group of people using the API in cool ways – there are PubMed mashups, library visualizations etc. Connotea uses citation modules that, given an URL, return a citation. The user doesn’t need to look around for the ‘save to citation manager’ link or similar. To add citations to Bibsonomy, you need to find the relevant BibTeX (in most cases, I think that they’re now bringing in scrapers/citation modules too). Groups and friends features are better on Bibsonomy (Connotea is currently lacking in this regard)."
~30,000 active users of Connotea.
Advantages: open source and API - can run your own version of Connotea.
Connotea not under pressure to make money, although Google adverts are present on other pages in the site (not one's personal page).
Has human readable/hackable URLs: e.g. connotea.org/tag/bioinformatics, connotea.org/rss/euanadie (more - see presentation).
Connotea has useful spam filter and other niceties - 5000 spammers deleted so far!
REST-based API, open to all users. Full site functionality exposed. Can view, add, remove one's own posts, and can view posts by others. API returns XML.
Other services using API: Mashups: Interface to Pubmed, Geotags on map, can display tag cloud as a "tree map".
Extending functionality:
Using Connotea as a back-end to other systems : i.e. as a database with each item having URL, user and tags. One group picked up the W3C Annotea and created Annozilla (http://annotation.semanticweb.org/Members/lago/AnnotationTool.2003-08-25.2532) with Connotea as the back end tag store engine. This shows comments on the web page in a browser.
Robert Muetzelfeldt's MultiGuise: Stores environmental models in XML, and output styles in which these are to be displayed (tables, code, etc.) in XSLT. Retrieves data from Connotea, splits into models and style sheets, then use style sheet to apply to document and get display for user. Advantages: all data storage is on Connotea, which also undertakes functions of user authentication and spam filtering.
DICTATE Project = distributed content tagging tool for Eprints (Southampton) - bookmark Eprints documents, and can also see users who have tagged the same document. Not yet being used outside Southampton, which is running it.
Discussion
David: Ease of installation? Connotea has well-written documentation and good mail list support (since traffic is low).
Why don't you create a VMWare image of it, to permit easy installation? Good idea, but not yet done.
Neil Thomson: Natural History Museum is using Connotea. Can it be used for material that is not on web? No, not yet - we know that this is a limitation, but bookmarks are all URIs.
Graham: Machine tags - is this an easy way of mixing informal tags and formally constrained information? How do machine tags work? They are just strings, and Connotea does not treat them any differently from other tags - all are in the same database table, so one can't do a search of the kind "Find me longitude greater than 70 degrees".
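The limitation is easy to demonstrate. In the sketch below the bookmarks, URLs and the longitude helper are all made up for illustration: comparing the stored strings gives a wrong answer to "longitude greater than 70", while a client that parses the value first can answer it. Nothing here reflects how Connotea itself stores or queries tags beyond their being strings.

```python
# Made-up bookmarks: URL -> list of tags, all stored as plain strings.
bookmarks = {
    "http://example.org/paper/a": ["avianflu", "geo:long=9"],
    "http://example.org/paper/b": ["avianflu", "geo:long=105.5"],
    "http://example.org/paper/c": ["geo:long=-3.2", "review"],
}

PREFIX = "geo:long="


def longitude(tags):
    """Return the first geo:long value as a float, or None if absent or invalid."""
    for tag in tags:
        if tag.startswith(PREFIX):
            try:
                return float(tag[len(PREFIX):])
            except ValueError:
                return None
    return None


# Comparing raw strings is lexicographic: "9" sorts after "70", "105.5" before it.
wrong = [url for url, tags in bookmarks.items()
         if any(t.startswith(PREFIX) and t[len(PREFIX):] > "70" for t in tags)]

# Parsing the value first allows the numeric query.
right = [url for url, tags in bookmarks.items()
         if (value := longitude(tags)) is not None and value > 70]

print(wrong)
print(right)
```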
Robert Muetzelfeldt has hacked the recognition of URIs into something special.
Rich Tags
Dan Smith (ECS, Southampton)
http://mspace.fm/projects/richtags. Presentation here.
Better exploration of digital repositories with semantic social tagging. Project will last 12 months from next Monday. Aims to provide cross-repository exploration using tags in more meaningful way than currently achieved.
Finding: people prefer to use Google rather than a local keyword search. Know that people want to explore concepts: how one thing here relates to that over there - hence recognize the need to relate different tags for same content.
Deliverable: a test framework and user evaluations.
Because tags have no context, they are ambiguous. Thus there are no semantics attached to a geolocation reference. What does it mean - was something written there, or is the paper about that place?
Examples of ambiguity (many meanings for a tag): Orange: fruit, colour, phone. Diversity of terms for the same thing (many tags for a meaning): telephone, phone, mobile, cellphone; film, video, movie. Other tags have only personal individualistic meaning, unguessable to outsiders.
Machine tags e.g. "foaf:depicts=DanielSmith" can be mapped into RDF, but as the tag is just a string, not a URI, it is intrinsically ambiguous.
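A rough sketch of such a mapping shows where the ambiguity survives. It turns a machine tag into one N-Triples line using only the standard library. The prefix table, the example.org resource URI and the function name are assumptions made for illustration; only the FOAF namespace URI is real.

```python
# Prefix table is an assumption; a tagging system would have to agree on one.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}


def machine_tag_to_ntriple(resource_uri, tag):
    """Turn 'prefix:property=value' into one N-Triples line.

    The value is emitted as a plain literal because the tag supplies no URI
    for it, which is where the ambiguity remains.
    """
    name, _, value = tag.partition("=")
    prefix, _, local = name.partition(":")
    if not value or prefix not in PREFIXES or not local:
        raise ValueError(f"not a recognised machine tag: {tag!r}")
    predicate = PREFIXES[prefix] + local
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{resource_uri}> <{predicate}> "{escaped}" .'


print(machine_tag_to_ntriple("http://example.org/photo/42", "foaf:depicts=DanielSmith"))
```

The predicate becomes a proper URI, but "DanielSmith" stays a string: nothing says which Daniel Smith is meant.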
Rich tags aim to keep this simplicity, but to back these tags with semantics, creating easy-to-use tag hierarchies to aid disambiguation. RichTags will permit exploration through these tag hierarchies (today's conventional tags are 'flat', not hierarchical).
Propose user interface with auto-completion of possible tags, showing the tag trees of possible tags to permit the user to disambiguate the possibilities. Project will determine the best way to create these displays.
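How auto-completion over tag trees might disambiguate "orange" can be sketched as below. The hierarchy is invented for the example and suggest is a hypothetical function; the RichTags project had not yet started, so this is a guess at the general shape of the idea rather than its design.

```python
# A made-up tag hierarchy; each path runs from a broad term to a narrow one.
TAG_PATHS = [
    ("food", "fruit", "orange"),
    ("appearance", "colour", "orange"),
    ("company", "telecoms", "orange"),
    ("food", "fruit", "apple"),
]


def suggest(prefix):
    """Return every hierarchy path whose final tag starts with the prefix."""
    prefix = prefix.lower()
    matches = [path for path in TAG_PATHS if path[-1].startswith(prefix)]
    return [" > ".join(path) for path in sorted(matches)]


# Typing "or" offers three different oranges, each shown with its context.
for line in suggest("or"):
    print(line)
```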
Users can drill down to find meaning - associative exploration: see things with the same tags, and find related work through a hierarchy matching algorithm.
Problem: How do we seed or bootstrap the system - otherwise the first users will have nothing to select.
Annotations will be exposed through web services and RDF import/export, SOAP and AJAX.
Good tag-to-ontology mapping is crucial for cross-repository sharing and reuse, and the user interface will be key here, so the project will involve extensive iterative user evaluations.
More details of the project are on the web site, with references.
Related projects:
Tag Network Narrative (AHRC) is exploring tags in digital narrative research project.
Connotea and DICTATE are related.
Ontowiki (http://3ba.se/; http://wiki.ontoworld.org/index.php/OntoWiki) has different classes of users - some will have sufficient knowledge to express links as ontologies, while others just use them.
Aquabrowser (http://www.medialab.nl/) is a browser plugin that allows you to see related terms in searches.
Discussion
Graham: Hierarchies: RDF doesn't break down neatly into hierarchies, so needs to make hard choices if doing so. If RichTags requires tag terms to have single parents, does this limit the 'open world' view? If you try to tag something that does not exist, the system will ask for context: From this a classification can evolve, through methods of usage. Danger that early entries to the system will limit future lines of enquiry?
Could you seed the system using Wordnet or OpenCyc?
Queries - will these be possible using SPARQL too? Yes.
Relationship with CIDOC: If we have DC metadata, can we use RichTags to map it to a more coherent model such as CIDOC-CRM?
Private thought: Machine Tags - are we in danger of losing the advantages of namespaces, if we just use a prefix instead of a proper URI?
IUGO - Nikki Rogers
- Slides: http://imageweb.zoo.ox.ac.uk/drupal/files/Rogers%20IUGO%20presentation%20Oxford%2009-02-2007.ppt
Nikki was unable to attend, but sent her slides about IUGO - link above.
END OF MEETING
Next meeting
9 March at UKOLN, Bath. Theme: JISC interactions, especially INTUTE SEARCH.

