Meetings/20070309/JISC-Interactions
From ImageWeb
JISC Interactions meeting
Present
- Dan Brickley
- Mike Fraser
- Phil Cross
- Balviar Notay
- Jun Zhao
- David Shotton
- Emma Tonkin
- Julie Allinson
- Graham Klyne
Introduction and background
David Shotton
About data webs - metadata harvesting
Community specific metadata aggregation
Inside out database
Registry for marshalling, indexing and integration; RDF for syntactic integration.
Note: use of SPARQL or native queries to access existing data sources.
Open source: preferred, not mandatory.
Advantages over Google: deep web access; purpose-specific improves s/n ratio; better semantic coherence; focus on programmatic access for secondary services.
Requirements gathering - what is there; how to marshal it; how to link back.
Examine range of repository data, not just biological, but focus on images.
Of all the myriad JISC image projects, one we clearly need to work with is Intute.
Metadata about repositories: OpenDOAR
Content metadata for image - JISC application profile for images
Eprints Application Profile - inspiring (but how well does it work with "simple" DC?)
Balviar has the draft final report of the Metadata Generation for Resource Discovery project and can let us have a copy.
Intute search
Phil Cross
Involved with resource gateways (which Intute is) for 8 years.
Intute new service - July 2006(?) - for postgrads & researchers. Used to be hubs (SOSIG, EEVL, SCIGATE, ...) funded by the RDN (Resource Discovery Network, which has effectively become Intute). Intute is a single point of access to the service.
Intute 101
Provides access to resources that have been catalogued by subject experts.
Subject experts provide: titles, descriptions, subject classification - enhances keyword searching and browsing.
- A&H - arts humanities
- H&LS - health, life Sci
- SER - Sci Eng tech
- SS - Social Sci
Approach is to provide the "best of the web" - they feel that the majority of useful resources have already been covered.
Intute has "collection level" metadata records - collections can vary widely w.r.t. amount of content. Unlikely to catalogue individual images.
113900 records
Useful element is classification. Pull this out into the "Repository search project"?
Link checking (weekly) - notify subject expert.
£50,000 a year for subject experts in MF's area (piecework), on top of core staff (management, technical, etc.) of about ... . Total Intute budget £1.5M/year. Longevity: declining funding until 2012. Business plan in development.
MF mentions CLIC project survey of image collections. Draft consortium agreement for Intute partnership intends to release metadata under CC licence.
Also training suite to help students recognize quality materials.
Intute: factual metadata plus a description contributed by an expert, which contains their judgement.
Also run a harvester seeded by the Intute resource catalogue; it harvests links within the target site and indexes the full text of the web site. Keyword search can be bounded by each of the 4 subject areas, or cover the entire corpus. This is a second-level search facility beyond the initial Intute catalogue.
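A minimal sketch of the "harvest links within the target site" step, assuming the harvester only follows links on the same host as the catalogued seed URL. The helper name `same_site_links` is hypothetical; this is not Intute's actual harvester, just an illustration using the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urldefrag


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(seed_url, html):
    """Return absolute http(s) links in `html` that stay on the seed URL's host.

    Hypothetical helper: relative links are resolved against the seed,
    #fragments are dropped, and duplicates are removed (first order kept).
    """
    parser = _LinkCollector()
    parser.feed(html)
    host = urlparse(seed_url).netloc
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(seed_url, href))
        parts = urlparse(absolute)
        # Skip mailto:, off-site links and anything already seen.
        if parts.scheme in ("http", "https") and parts.netloc == host and absolute not in links:
            links.append(absolute)
    return links
```

Each returned URL would then be fetched and its full text indexed; restricting by host is one simple way to keep the crawl "within the target site", though a real harvester would also need depth limits and politeness rules.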
Personalization: MyIntute. Users can save records and searches, and tag them, e.g. to create reading lists. Also, JavaScript on the Intute site can insert links into a third-party page (a <script> element adds content at the point where that element appears). Local tag clouds.
"Internet repository search" is separate software, but want to make it available through intute.
Metadata is made available through OAI-PMH, Z39.50 and SRU/SRW. OAI-PMH is the better route as it gives fully structured records. But the public OAI-PMH interface doesn't expose links to the target web sites, only links to the Intute descriptions. (Like Google, for usage tracking?)
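To illustrate what "fully structured" means in practice, here is a minimal sketch that builds an OAI-PMH `ListRecords` request with the standard `oai_dc` metadata prefix and extracts Dublin Core titles and identifiers from the response. The base URL and sample response are invented for illustration (the sample's `dc:identifier` deliberately points at a gateway record page rather than the described site, mirroring the limitation noted above); resumption tokens are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and oai_dc specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL (first page only; no resumptionToken)."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})


def parse_dc_records(xml_text):
    """Return (title, [identifiers]) pairs from an oai_dc ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:ListRecords/oai:record", NS):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        title = dc.findtext("dc:title", default="", namespaces=NS)
        identifiers = [e.text for e in dc.findall("dc:identifier", NS)]
        records.append((title, identifiers))
    return records


# Invented sample response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example collection</dc:title>
          <dc:identifier>http://gateway.example/record/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

For example, `parse_dc_records(SAMPLE)` yields one record whose only identifier is the gateway page, which is exactly the case where a consumer cannot reach the described resource directly.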
DS: How many Intute users? What % of UK HE uses Intute? MF: data not available at that level. MF: still much work to be done - service aimed at teachers, librarians, and other mediators. Weakest in addressing academic researchers directly. Oxford recently conducted requirements gathering from researchers, complementing RIN activity.
Results of the requirements gathering suggest less emphasis on databases, more on "soft" resources like people, events, etc.
DS: how many registered MyIntute users? MF: not sure - 2000-10000 maybe?
Intute Include: provide JavaScript code that people can paste into their web sites to access Intute searches with local branding. Also, software can be installed on a local server. Thus institutions need not maintain their own resource gateways but can use Intute instead, while still keeping users within their own site.
"Intute" the name: Latin vocative singular, roughly "be aware".
"Intute" (core service) subsumes searches over the Intute metadata repository. (Run by MIMAS in Manchester.) Never launches searches across the web.
Intute Repository Search
- Slides: http://imageweb.zoo.ox.ac.uk/drupal/files/Presentation%20for%20Defining%20Image%20Access.ppt
"Intute Repository Search" is a new project: a harvester of institutional repositories. Initial scoping ran May-Aug last year; now building a demonstrator. Like OAIster, but selecting metadata from UK institutional repositories and adding value through good metadata, by developing a new metadata profile ("Scholarly works metadata profile"). Pushing back on source repositories to improve their metadata. Partners: Intute (management - CW, VL) + Intute institutions, UKOLN (technical), SHERPA (stakeholder requirements, advocacy, dissemination).
Demonstrator - start with standard eprints type of objects; then looking to other areas (learning objects, geospatial data, research data). Keyword search with links back to repositories. Use metadata to find full text documents, then download and index. Intending to use Cheshire 3.
Julie: ePrints captures most metadata anyway, but can't be exposed using DC.
MF: Impact of ORE. Links between images and datasets and articles...
Subject search.
Text mining tie in with ????
Harvesting from 36-ish UK repositories. A keyword search result gives:
- authors
- description
- subject keywords
- Content (or "Scholarly work"?) type
- link
At the moment, metadata is poor so need to glean other information.
Data webs interactions
Asked Phil for reactions to data webs ideas. Difficult to see an obvious interaction, but there are interesting ways in which they interrelate: establishing connections between images and collections; semantic information.
Balviar: not just about discovery; also disclosure.
DRIVER project - Europe-wide?
Prefers a distributed approach.
Balviar: from the JISC PoV, Scholarly Works is the focus, but other types of digital object are coming online. Raises user interface issues, e.g. concerning what kind of response to give to a keyword search.
@@@ My notes ...
DS: Intute stronger on Intute interface design (my point 1)
Data webs as a model for including third party holdings in the Intute web interface. Is granularity of resource a problem?
@@@ my point (3) ?
Balviar: focus project outcomes on repository-related developments, less on original Intute.
Intute programmatic interface has Z39.50, SRU, OAI (limited metadata, not resource URIs)
I think there are many difficult problems to be addressed.
Comments
So we have a landscape of improving metadata for institutional repos, and quality monitored access -- how can we make this a part of the wider emerging web, especially the Web of Data?
"Scholarly Works Application Profile" was "ePrints Application Profile"
....
Maybe test "use case" against Intute framework capabilities?
Julie on Scholarly Work application profile
(also ePrints application profile)
Issues with using simple DC
E.g. can't say what's full text and what's metadata.
For images:
- what functionality is to be supported?
- what entities are there?
- ...
Intended for data exchange, not database design.
Simple DC can be generated from full profile. (But what about other direction?)
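To make "simple DC can be generated from the full profile" concrete, here is a minimal sketch assuming a hypothetical nested record loosely modelled on FRBR-style entities (work, expression, manifestation, copy). The field names are illustrative, not the actual profile schema. It also hints at why the other direction is hard: once flattened, simple DC no longer records which identifier is the full text and which is a landing page.

```python
def to_simple_dc(work):
    """Flatten a hypothetical entity-based record into simple DC.

    Returns a dict mapping DC element names to lists of values. The pairing
    between each manifestation's format and its copy URLs is lost here,
    which is why simple DC cannot be mapped back to the full profile.
    """
    dc = {
        "title": [work["title"]],
        "creator": list(work.get("creators", [])),
        "format": [],
        "identifier": [],
    }
    for expression in work.get("expressions", []):
        for manifestation in expression.get("manifestations", []):
            fmt = manifestation.get("format")
            if fmt and fmt not in dc["format"]:
                dc["format"].append(fmt)
            for copy in manifestation.get("copies", []):
                dc["identifier"].append(copy["url"])
    return dc


# Hypothetical record: one PDF full text and one HTML landing page.
RECORD = {
    "title": "A paper",
    "creators": ["Smith, J."],
    "expressions": [{
        "manifestations": [
            {"format": "application/pdf", "copies": [{"url": "http://repo.example/42/paper.pdf"}]},
            {"format": "text/html", "copies": [{"url": "http://repo.example/42/"}]},
        ],
    }],
}
```

After flattening `RECORD`, the two identifiers and two formats sit in separate lists, so a consumer of the simple DC record cannot tell which URL is the full text.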
Balviar on related JISC image DC profile
Image profile - same organization as Scholarly Works profile; based at TASI. (Starting to recruit for that.) Plus feedback group (DC community, repository developers, UK repository managers with images, repositories research team at UKOLN, Scholarly Works app profile, Intute repository search project, AHDS, DEFImage, scoping study for app profile for learning materials (e.g. LOM)).
Two profiles: one for images, and one for time-based media (both TASI, latter with BFCS?). DS: points out biological images are up to 5 dimensional. Also remote sensing, satellite imaging, astronomy.
There is also a geospatial application profile. AgMap is very rich - JISC is planning a DC-lite application profile version of it.
Metadata schema registry
IEMSR
Emma Tonkin
Phase 3.
Registry of concepts, vocabularies and profiles. RDF-based data model.
Back-end database "manipulated by" SPARQL.
(Damian Steer did a Java client for it)
DC registries group: describing everything in terms of RDF.
Adoption into the registry is a way of giving these some authority (as opposed to just publishing on a web site). The first part allows people to find what's out there. The second part, an application to create a new description of a vocabulary or profile (Damian's bit), is currently being updated. A machine-readable form also exists, but human-readable re-use is not yet fully done.
Means to describe metadata vocabularies *other* than in RDF, even though RDF is used.
Almost any human-applied classification is, in effect, informal - even when a formal controlled vocabulary is being used.
Schema development as a Computer Supported Collaborative Working problem.
Thinks: agile, test-led s/w development
"Cappuccino gap" - consensus about the meaning of terms. Also Karger / Schraefel "pathetic fallacy" of data being presented to users as represented in computers.
Someone should write the iterative model for metadata development.
Dan: plans to hook registry to harvested instance data? A: yes, not sure of status. Also, automated analysis of real use.
Balviar - JISC images working group
Don't worry too much about them. Remit is under JISC collections, a JISC content company.
License large amounts of content on behalf of the UK FE/HE community. Negotiate better rates in bulk.
Have various working groups: geospatial, moving images, images, learning materials, ebooks. Community members, content developers, development representation.
Interested in community-created content as an alternative source of content against a background of static or shrinking budgets. Maybe fund digitization in return for the right to use the results.
Occurs to me that this might be a source of useful test data for us
ITN material - ITN newsreel material - newsfeeds, unused material - rights cleared for UK HE/FE community. Athens accessible.
Some image material from Getty.
Can be seen at EDINA - EMOL ...
Balviar suggests project should present to images working group, towards end of period.
Project planning meeting
DS, JZ, DanBri, GK
Next steps:
Get stuck into repository metadata review.
Defer evaluation of repository software -- may not be needed due to ubiquity of OAI-PMH
Schedule meeting to discuss Imperial repository content
Aim for Dan to help us in the following ways:
- review and comment on repository metadata findings
- review and comment on software tool findings
- help with drafting final report
- join us on Imperial College visit
ACTION: GK, DS - send Danbri email at each of 3 email addresses. danbri will respond.
ACTION: GK, sort out ImageWeb mailing list (MX DNS).
ACTION: DS, ontogenesis meeting program to DanBri
ACTION: DB, talk to TASI about the Image Application profile work in progress - get their side of the story.

