Meetings/20070220/NEBC-Workshop

From ImageWeb


NEBC – EBI Workshop

Monday 19 – Tuesday 20 February 2007

Venue

  • Monday 19th - Seminar Room, CEH Oxford
  • Tuesday 20th - Gridroom, CEH Oxford

Attendees

  • Dawn Field - NEBC, CEH Oxford
  • Bela Tiwari - NEBC, CEH Oxford
  • Tim Booth - NEBC, CEH Oxford
  • Stewart Houten - NEBC, CEH Oxford
  • Ashley Morley - NEBC, CEH Oxford
  • Susanna-Assunta Sansone - EBI, Cambridge
  • Chris Taylor - EBI, Cambridge
  • Philippe Rocca-Serra - EBI, Cambridge
  • Norman Morrison - NEBC, Manchester University
  • Dave Hancock - NEBC, Manchester University
  • Allyson Lister - CISBAN, Newcastle University
  • Andy Jones (Tuesday only) - FuGE, Manchester University
  • Giles Velarde - O’mixed.org Developer, Manchester University
  • David Shotton - Zoology Dept, Oxford University
  • Graham Klyne - Zoology Dept, Oxford University
  • Tanya Gray - GSC Programmer, CEH Oxford


Agenda

Monday 19th
10.00 – 10.30 Arrive. Book in at reception. Coffee.
10.30 – 12.45 Session one
Welcome and introductions Dawn
Introductions All
Review of minutes from last meeting and brief updates Dawn
Updates
Logistics (Integration of EBI and NEBC) Dawn
NERC Omic Strategy Meeting Dawn/Bela
NEBC Website / Wiki Dawn
Update on MSI paper Norman
Update on San Diego OBI meeting (NM, SS, DF, PR-S)
Environmental Ontology Norman
MIBBI (towards “MIBBI-compliance”) Chris
Update on MIBBI / Envbase websites using GCat Tim
Update on GCat Tanya
MAGE-Tab and NEBC MIAME Excel workbook Bela/Susanna
Shared interests with NEBC David Shotton and Graham Klyne
EnvBase and Handlebar Tim
12.45 – 1.30 Lunch
1.30 – 4.30 Session two
Handling multi-omic data
Repositories
The CISBAN Data Portal Allyson
Background to omixed.org a multi-omics community hub Giles
BIOMAP Bio-Tab Susanna
Investigation/Study/Assay (I/S/A) All
Use of I/S/A in MIGS Dawn
Review of Investigation / Study / Assay as previously presented All
4.30 Close
Tuesday 20th
10.00 – 10.30 Arrive. Book in at reception. Coffee.
10.30 – 12.45 Session one
Summary of NEBC strategy for next 2 years Dawn
Towards the description of multi-omic studies Andy
Discussion
12.45 – 1.30 Lunch
1.30 – 3.00 Session two
Bio-Linux development and distribution Stewart
Discussion
Next meeting, AOB wrap up
3.00 Close


Monday 19th - morning session

Welcome and introductions

Dawn

  • Dawn Field - NEBC, CEH Oxford
  • Susanna-Assunta Sansone - EBI, Cambridge
  • Philippe Rocca-Serra - EBI, Cambridge
  • Dave Hancock - NEBC, Manchester University; (developer?)
  • Norman Morrison - NEBC, Manchester University
  • Tanya Gray - GSC Programmer, CEH Oxford; (developer?)
  • Andy Jones (Tuesday only) - FuGE, Manchester University
  • Allyson Lister - CISBAN, Newcastle University; (developer?)
  • Chris Taylor - EBI, Cambridge, proteomics data standards/MIBBI
  • Giles Velarde - O’mixed.org Developer, Manchester University
  • Ashley Morley - NEBC, CEH Oxford; tech admin
  • Tim Booth - NEBC, CEH Oxford; data manager
  • David Shotton - Zoology Dept, Oxford University
  • Graham Klyne - Zoology Dept, Oxford University
  • Bela Tiwari - NEBC, CEH Oxford

Not here yet

  • Stewart Houten - NEBC, CEH Oxford

Review of minutes from last meeting and brief updates

  • "We" is NEBC + EBI + Manchester
  • Last meeting was 1st meeting of group
  • Review of checklists - Chris Taylor
  • Handlebar = sample inventory manager (Handles barcodes)


Updates

Logistics (Integration of EBI and NEBC) - Dawn

NERC Omic Strategy Meeting - Dawn/Bela

NEBC Website / Wiki - Dawn http://nebc.nox.ac.uk

ACTION: review web site and offer comments from perspective


Update on MSI paper

Norman

MSI paper - environmental metabolomics community http://msi-workgroups.sourceforge.net/

Has been submitted - relevance, has ??? the environmental metabolomics community

Plant / in-vivo mammalian / ... / ...


Update on San Diego OBI meeting

(NM, SS, DF, PR-S)

San Diego OBI meeting Review of progress http://obi.sf.net

New OWL file is coming out.

OBI milestones available as Google calendar (Allyson).

People are exchanging skype names for teleconferences.

Susanna says something about a policy for crediting people ... (what's the context of this?)

Discussion veers off into classification within an ontology - something about in vitro - suggests a rule of thumb: if something requires more than 30 seconds discussion, don't commit the ontology either way.

Some of the discussion sounds very like elements of the ontology I developed for BioImage. Maybe there's something here we can use.

Discussion leads to community-specific tags - maybe here is a role for Alistair Miles' "vine" analogy.

Allyson: ID vs Accession (AC) - "back to (her) background". ID here is like RDFS:Label. Seem to be missing point of human interface vs machine interface?

Dawn: triumvirate - FuGE / OBI / MIBBI(?)

Chris: "the bit biologists can access directly" ... Makes the point of biologists being interested in their tool rather than the ontology, or whatever.


MIBBI

Chris

(towards “MIBBI-compliance”)

Re-submissions of paper, recent reviews, scope, plans http://mibbi.sf.net

MIBBI is "Minimum Information for Biological and Biomedical Investigations"

Requirement to state how a trial is conducted. MIAME introduced for Microarray experiments. 2002-2004 Proteomics joins in, modular approach (mass spec, etc) for different technologies. Subset of available information. What is the minimum required? What is practical and useful to collect? Impacts experimental design.

Communities start out with local guidelines, iterate until a sensible consensus is achieved. Guidelines issued for creating new such checklists. Goal was to "raise the bar" for experimental metadata capture. Full range of checklists can be difficult to discover, as each community has its own. But they also overlap, as many experimental technologies are shared. MIBBI is a portal to such checklists.

W.r.t. MIAME, raises the problem of consensus vs vagueness of specifications.

Hence MIBBI web site: http://mibbi.sf.net

Note MISFISHIE - ref. in situ hybridization.

Where there are overlapping items, different communities may still have different requirements regarding level of detail. Hence MIAME and MIAME/Env?

Currently, submission via Excel-based forms

Aiming to automate submission process via GCAT

First function of web site is to have a list of checklists

("Get the google ranking up" - suggest getting linkage from W3C LS group; advantages of linkage from W3C site because of validation links)

Lack of evidence for value of MIAME.

Medical / clinical studies accept value. Funders recognize the value, writing into funding requirements - like improved data sharing. Concern about diversity of checklists. Looking to loose concept mapping. Users may be scared off by full-on ontologies (think: Protege).

??? Rich tags?

Mention involvement in IETF with msg headers/URI schemes? Addressing consensus vs diversity.

Composition of guidelines - little scope for machine handling. But existing guidelines may offer structure to work with, to make composition easier.

GCAT to turn simple XML (schema?) into a web form. Chris mentions Pedro from Manchester. GCAT trying to offer an alternative to spreadsheet for data entry.

Investigation -> Study -> Assay (has-a) relations. Then subclass these (is-a) for details.

(Think Alan Rector normalization of ontologies - relate top of each tree to each other using non-isa relations, then do everything else by subclassing?) Suggests defining workflow at highest level only.

The high level structure Chris describes is not about modelling - but just a high-level organization for guidelines.
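
A minimal sketch of the has-a / is-a split noted above, in Python. Only Investigation, Study and Assay come from the discussion; MicroarrayAssay, its array_design field and the all_assays helper are hypothetical names added for illustration, and this is not the MIBBI or FuGE model itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """Generic assay; technology-specific detail goes into subclasses (is-a)."""
    name: str


@dataclass
class MicroarrayAssay(Assay):
    """Hypothetical subclass: an assay that is-a microarray measurement."""
    array_design: str = ""


@dataclass
class Study:
    """A study has-a list of assays."""
    name: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """An investigation has-a list of studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_assays(self) -> List[Assay]:
        # Walk the has-a chain: Investigation -> Study -> Assay.
        return [assay for study in self.studies for assay in study.assays]
```

The top-level chain uses containment only; detail is added by subclassing below each of the three levels, which matches the "organization for guidelines" reading rather than a full data model.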


Update on MIBBI / Envbase websites using GCat

Tim

Making MIMI searchable…sooner than later? http://darwin.nox.ac.uk/mibbi/gcat

Tanya has written GenCat.

http://darwin.nerc-oxford.ac.uk:8080/mibbi/

Orbeon implementation of XForms, implemented in Javascript. Open source from commercial company. XML / XSLT pipelining engine backend. Impression is that it is flaky.

Discussion of offline/batch utility for Excel->CSR conversion. Also, Access can front-end (say) Postgres via ODBC (no surprise).

QUESTION: is there an online XML schema / XSLT transformation builder? Form designer to drop out XForm source?

Chris: won't create an XML schema, because are pushing users to use FuGE and OBI.

PEDRO is offline tool to create interfaces from XML schema / description?

Talk about OLS (?)


Update on GCat

Tanya

Search, browse, batch upload, etc http://darwin.nox.ac.uk/sandbox/gcat

GCAT is a generic tool to convert XML schema to XForm. Built on eXist XML database, uses Orbeon XForms, programmed in Java.

Genomes and metagenomes - but much data doesn't have common requirements. Put together a community ... wanted to be able to work from community's own schema, published as XSD, rather than having to manually re-work and track as it evolves.

XML pipeline / XML pipeline language used to compose required function.

Dawn: talks about problems of mapping to (say) EMBL genome database (?) -- sounds like ripe territory for a data web?

xid value + schema version can be resolved to a definition of that element.

UI term capture - all entries are CV constrained, but new terms can be added on-the-fly (as proposed).



Monday 19th - afternoon session

Environmental Ontology

Norman

Who, where, what, when to meet – NIEeS working group? (URL to be registered)


Shared interests with NEBC

David Shotton and Graham Klyne: http://www.bioimage.org/team.do

  • Bioimage database
  • Ontogenesis network
  • Central database model not scaling for our work - universals and particulars; images are particulars; non-exhaustible supply
  • Working with Journal publishers
  • Flydata - gene expression in Drosophila. Array determination of pan-genome gene expression; study 10% by in situ hybridization. Workflow from microarray to gene choice, primer design, make in situs and make images. Reasons for gene selection impact annotation of images. We will build a bespoke decision support system. Also simplify ...


Shared interests

  • Linking experimental data acquisition to ontologies and data storage
  • Use of ontologies
  • Ontologies for information about experimental procedures
  • Links to genomic and other 'omics databases
  • Web user interfaces for researchers' data acquisition
  • Rich tags? User-assigned labels and formal identifiers

Dawn mentions mashups...

  • Mashups; providing data sources in a way that can be mashed.
  • Third party annotation.

Allyson mentions the mouse atlas; gene expression maps. David mentions Edinburgh developmental gene analysis atlas

IDEA: we could mashup with an ontologically-labelled anatomical atlas

IDEA: SciFlick - to Flickr as Connotea is to del.icio.us?

David mentions SABO work.

Norman mentions "psych" ontology (PATO - Phenotypes And Traits Ontology - see OBO).

Norman mentions the "ESP" study. Used as basis of game to get people tagging.


The CISBAN Data Portal

Allyson

(ex-EBI) (Works with Phil Lord?)

Archive front-end for experimental data; lightweight toolkit.

Centre for Integrated Systems Biology of Aging and Nutrition.

A multi-omic FuGE-based repository and web portal that supports LSIDs http://www.cisban.ac.uk/cisbanDPI.html

Implementing FuGE milestone 3 - Object Model

Functional Genomics Experiment - Object model.

http://fuge.sourceforge.net

Changes to FuGE: uses LSIDs; enhanced versioning. Only additions to UML: no deletions or changes; more queries; simple front-end web view for experimentalist users; pre-loading users and workflows.

Got users to sign up to a data policy at outset, requiring all primary data to be captured in the archive. AndroMDA tool used - generates XSD but no marshalling/unmarshalling - use JAXB/JAXB2 to generate this from XSD. http://www.andromda.org/

LSID authority; LSID->object, LSID+time->object

Describable associated with Endurant: no change to intrinsic value over time. May be associated with any number of "snapshot" instances that do change over time. Not directly resolvable. Like concept of web resource vs representation. Fundamental changes create a new endurant.

Purpose here is to serve as an archive of the raw data set, which may not change; but metadata may change, such changes may need to be applied. Endurants are to provide a framework for editing and change-tracking.
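
A rough sketch of the LSID->object and LSID+time->object idea, assuming an in-memory registry. The Endurant and Registry classes, the numeric timestamps and the example LSID are all hypothetical; the CISBAN implementation was not shown in enough detail to reproduce.

```python
import bisect
from typing import Any, Dict, List, Optional


class Endurant:
    """Stable identity with time-stamped metadata snapshots (hypothetical)."""

    def __init__(self) -> None:
        self._times: List[float] = []  # snapshot timestamps, ascending
        self._snapshots: List[Dict[str, Any]] = []

    def add_snapshot(self, time: float, metadata: Dict[str, Any]) -> None:
        # Keep snapshots ordered by time so lookups can use bisection.
        i = bisect.bisect_right(self._times, time)
        self._times.insert(i, time)
        self._snapshots.insert(i, dict(metadata))

    def at(self, time: Optional[float] = None) -> Optional[Dict[str, Any]]:
        """Latest snapshot if time is None, else the one in effect at time."""
        if not self._times:
            return None
        if time is None:
            return self._snapshots[-1]
        i = bisect.bisect_right(self._times, time)
        return self._snapshots[i - 1] if i > 0 else None


class Registry:
    """Maps LSID strings to endurants (stands in for an LSID authority)."""

    def __init__(self) -> None:
        self._endurants: Dict[str, Endurant] = {}

    def register(self, lsid: str) -> Endurant:
        return self._endurants.setdefault(lsid, Endurant())

    def resolve(self, lsid: str, time: Optional[float] = None):
        endurant = self._endurants.get(lsid)
        return endurant.at(time) if endurant is not None else None
```

Calling resolve with only an LSID gives the current metadata (an unversioned query); passing a time as well gives the metadata as it stood at that point (a versioned query).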

The principle of changing endurant here seems to be related to interpretation of experimental results. Not entirely clear, but nobody actually disputes this.

Andy Jones will talk about FuGE tomorrow.

Discussion of queries where part of the data is missing. Versioned or unversioned queries are possible (i.e. run against earlier versions of metadata associated with an entity).

On track for "sandbox version" by end of month.


Background to omixed.org a multi-omics community hub

Giles

Introduction: User community for omixed.org at Manchester (Introduction, related projects: MeMO etc) http://buggyvelarde.org/?page_id=2

Biological community - Orig. Streptobase Bioinformatic - MeMo(?)

Login, get connection parameters, cut&paste to maxdLoad2 (to upload?)- could use StreptoBase to browse data.

Command line interface: so 80s!

SOAP interface: seems to be popular.

UIs get URIs from SOAP service, then pulled entire dataset.

Taverna: pulling data by query to ...?

Browse database by following links.

Link mode - can link items across several tables ****(!)

Enumerate mode

(More)

There are powerful search/browse/query options here.

All very well for those familiar with the schema; but most users want easier basic queries not requiring knowledge of the schema.

Now working with Taverna; work-in-progress to incorporate the Manchester workflow ontology (OWL-S?)

Flex implementation of UI; written in Flash (could use JS).

UI driven off two web services: maxdBrowse and DOT to generate SVG, then rendered by the client. Simple search is autocomplete; advanced search is hidden from casual users.

Giles mentions Javascript in SVG

Biologists don't like maxdLoad schema. Better to start simple and add features as needed.

Taverna helps to scale applications.

??? what about simple HTTP (REST) for browsing?


Omixed

Dave Hancock

(???)

One experiment, multiple databases; proteomics, genomics, transcriptomics. Leads to discussion of "hacking the maxdLoad2 model", then to loadable schema file.

  • maxdLoad reads this to build database, etc.
  • maxdBrowse reads same file to build web interface.

Recycle lots of code from maxdLoad

Separate platform-specific measuring technology from the science. Everything up to the 'omics is independent

Moving toward single database containing investigation/study details...

See "Generic Experimental Model" slide: Everything below the green dotted line is measurement technology independent, and can be shared.

This model seems to disallow assays feeding back into technology-independent protocol elements -- but see AssayAnalysis

"It's basically FuGE with different labels"

So they are developing omixed:

  • everything web accessible - reading and writing
  • full programmatic access: web services drive all operations, including admin
  • zero configuration for user
  • can use Taverna to drive procedures

Web 2.0 stuff: tagging, wikis - easy interaction.

  • User generated content (enough monkeys...)
  • Tagging as ad hoc ontology development
  • Tagging enables persistent searches
  • Informal peer review (reputation systems)

(Nice ideas, but lack the coherence of Alistair Miles' approach.)

Norman: Carole Goble et al, myExperiment has just been funded, lots of hype; repository of workflows. Collaborative filtering for workflows!

PG&P is funding consortium (Post Genomics and Proteomics).

Omixed server is currently in development. Related systems (backend): OMERO, Intermine?

But the really interesting work will be at the front end.

ActinoGEN consortium; development of MEMO

Start with simple structure based on FuGE; whatever internal structure, send it that way. (Not clear to me if this is translation for redistribution, or on-the-fly.)

XML file to describe database schema with additional validation rules; generates SQL statements to create the database for maxdLoad2. The same format is being adopted for omixed.
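
A minimal sketch of the "XML schema description in, SQL CREATE statements out" idea, assuming a made-up XML layout. The element and attribute names (schema, table, column, required) are hypothetical and are not the maxdLoad2 file format, which was not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema description; not the real maxdLoad2 format.
SCHEMA_XML = """
<schema>
  <table name="sample">
    <column name="id" type="INTEGER" required="true"/>
    <column name="label" type="TEXT" required="true"/>
    <column name="notes" type="TEXT"/>
  </table>
</schema>
"""


def create_statements(xml_text):
    """Turn each <table> element into one CREATE TABLE statement."""
    root = ET.fromstring(xml_text.strip())
    statements = []
    for table in root.findall("table"):
        columns = []
        for col in table.findall("column"):
            spec = f'{col.get("name")} {col.get("type")}'
            # A simple validation rule: required columns become NOT NULL.
            if col.get("required") == "true":
                spec += " NOT NULL"
            columns.append(spec)
        statements.append(
            f'CREATE TABLE {table.get("name")} ({", ".join(columns)});'
        )
    return statements
```

The same parsed description could also drive a web form or browse interface, which is the point of maxdLoad and maxdBrowse reading one shared file.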


Tuesday 20th - morning session

MAGE-Tab and NEBC MIAME Excel workbook

Bela/Susanna: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=17087822

Bela describes how fairly ad-hoc use of Excel worksheets better suits bioinformaticians' needs for capturing experimental data.

Spreadsheets allow users to capture annotations that can subsequently be uploaded via maxdLoad.

Chris Taylor notes Pedro users similarly found Excel was easier for data capture.

David: facility in maxd web service to build a spreadsheet

Philippe talks about MAGE-tab work at EBI which is a similar approach.

I wonder if any thought has been given to using Google spreadsheet (http://docs.google.com) - it is claimed to handle Excel spreadsheets! Do Ashley's NEBC spreadsheets load in OpenOffice? Philippe shows the EBI spreadsheet running in OpenOffice. Bela: yes, NEBC runs in OO too.

GK: Why spreadsheets? Advantages of going online (e.g. via Google)

  • Dawn: Familiarity.
  • David: High data density on screen.
  • David: Allowing users to exploit patterns in data to ease work (my interpretation).
  • Bela: Biologists resist learning, even to save 1-3 days of their time doing boring work.
  • Philippe: repeats point about exploiting repetitive patterns in data

Philippe: talks about using background well known web services to enhance data in spreadsheets

Giles: mention of mono project to patch open office (to access background services?)

Bela: asks about joining forces

Susanna notes that MAGE-tab is for simple use cases

Dawn: biologists need well-structured XML in the background, but dropping out tabular form may be useful.

(Thinks: data model, storage representation, user interface - the boundaries are being blurred here.)

(Thinks: Mulla Nasrudin's key ... http://storytelling.temi.co.uk/2006/11/03/mullah-nasrudin-stories/)

EnvBase and Handlebar

Tim

EnvBase and Handlebar Role in standards compliance / MetaBar http://envgen.nox.ac.uk/projects/handlebar.html

(Dawn: picking up on "spreadsheets are what biologists like")

(Initial slide: overview of old view of environmental genomics data)

Shows EnvBase - http://nebc.nox.ac.uk/public_catalogue.php

Edited by Pedro -- allows us to pop-up drop-down boxes

Dawn: (re. Pedro and spreadsheets) - want a pantheon of options to suit different working styles.

Capturing information about samples... creating an inventory of samples based on unique codes.

Lots of redundancy - highly denormalized. Web interface for creating barcodes; print locally or batch print at NEBC and mail to researchers. Handlebar knows about a number of sample types, each corresponding to an SQL table. Pre-generates an empty Excel spreadsheet(?) with barcodes filled in.

8-digit barcodes (project number, sample number)
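
A small sketch of composing and parsing such a barcode. The notes do not say how the 8 digits are divided, so the 4 + 4 split below is purely an assumption, and make_barcode / parse_barcode are hypothetical helper names, not Handlebar functions.

```python
# Assumption: 4 digits of project number followed by 4 digits of sample
# number. The actual Handlebar split was not stated in the meeting.
PROJECT_DIGITS = 4
SAMPLE_DIGITS = 4
TOTAL_DIGITS = PROJECT_DIGITS + SAMPLE_DIGITS


def make_barcode(project: int, sample: int) -> str:
    """Zero-pad both numbers and join them into one 8-digit code."""
    if not 0 <= project < 10 ** PROJECT_DIGITS:
        raise ValueError(f"project number out of range: {project}")
    if not 0 <= sample < 10 ** SAMPLE_DIGITS:
        raise ValueError(f"sample number out of range: {sample}")
    return f"{project:0{PROJECT_DIGITS}d}{sample:0{SAMPLE_DIGITS}d}"


def parse_barcode(code: str):
    """Split an 8-digit code back into (project, sample)."""
    if len(code) != TOTAL_DIGITS or not (code.isascii() and code.isdigit()):
        raise ValueError(f"not an {TOTAL_DIGITS}-digit barcode: {code!r}")
    return int(code[:PROJECT_DIGITS]), int(code[PROJECT_DIGITS:])
```

Under this assumed split there is room for 10,000 projects of 10,000 samples each, which makes the limit of a fixed-width code easy to see.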

BioTechniques article next month.

System used by Bremen (Max Planck Inst?) on two projects; feedback:

  • security/permissions weak; current system uses audit trail rather than permissions (wiki model?); not good enough for actual researchers; researchers want personalization, not to see other people's stuff.
  • possible issues with barcoding address space for large multi-project institution
  • can type invalid data into spreadsheet; SQL validation errors on upload, hard to interpret
  • want consistent interface to capture sample data and metadata - extend handlebar for submitting environmental data through same interface as entering bar code data. (e.g. add unit conversion?)
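
On the "SQL validation errors on upload, hard to interpret" point, one possible approach is to check rows before they reach the database and report problems per row and column. This is only a sketch: validate_rows, the rule set and the column names (barcode, depth_m) are hypothetical, not part of Handlebar.

```python
def is_number(text):
    """True if text parses as a floating point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Hypothetical per-column checks; a real sample type would define its own.
RULES = {
    "barcode": lambda v: len(v) == 8 and v.isdigit(),
    "depth_m": is_number,
}


def validate_rows(rows, rules):
    """Return readable error strings; an empty list means no problems found."""
    errors = []
    for n, row in enumerate(rows, start=1):
        for column, check in rules.items():
            value = row.get(column, "")
            if not check(value):
                errors.append(
                    f"row {n}: bad value {value!r} in column '{column}'"
                )
    return errors
```

Rows are assumed to arrive as dictionaries of strings, as they would from a spreadsheet exported to a tabular text file.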

Megex (?) megx - http://www.megx.net/

Metabar re-implements large part of Handlebar in Java, + security features and environmental data entry, labelled by (position,time).

??? spreadsheet validation: does open office have an easily generated programming interface?


Towards the description of multi-omic studies

Andy

Towards the description of multi-omic studies: the past, present and future of FuGE

  • Paper in Nat Biotech consultation: http://www.nature.com/nbt/consult/index.html
  • FuGE: http://fuge.sourceforge.net/

Challenges of data sharing - MAGE, PEDRO, "standard" models for microarrays and proteomics

Common parts are modelled differently, duplicate software, data integration difficult.

Efforts to merge MAGE and PEDRO - "super-models" were even more complex, and still limited. Monster models are not the way forward. But significant advantages to only-once handling of "upstream data".

So what's common? FuGE attempts to capture this.

Three uses:

  • data format for simple lab workflows
  • supplement existing formats
  • framework for new formats

Object model in UML + XML schema. Java mapping, database schema, persistence layer: work in progress.

Auto-generation by AndroMDA from UML; http://www.andromda.org/

(See slides for details...)

  • FuGE
    • Common
      • Audit
      • ...
    • Bio
      • Data
      • Investigation
      • ...

Packages: (Abstract classes / templates) (Separately, there are generic versions that can be instantiated.)

  • Protocol: lab book method, sop, etc
    • Software
    • Equipment
    • Ordered sequence of Actions: can reference other protocols
  • ProtocolApplication
  • Investigation package

Experimental workflow: materials (inputs, outputs), treatments (ProtocolApplications)

Ties together external formats: give existing data some metadata to connect with rest of workflow. How is a "foreign" data file fitted into a workflow? What is the experimental context? Ontology point to describe file format and associated information. No intent to pull formats apart.

Framework for new formats: subclass from basic FuGE classes.

FuGE v1 due in April 2007 - candidate in review by PSI (Proteomics Standards Initiative)

Object model drives:

  • XSD
  • Database schema
  • Java code
  • ...

Formats extending FuGE

  • MAGE v2 (MGED)
  • GelML
  • analysisXML
  • spML
  • NMR ?
  • migration for mzData ?
  • upstream workflow description for all groups
  • MIARE (RNAi), flow cytometry, immunohistochemistry - ad hoc rather than formalized process

FuGE as a wrapper: FuGE investigation package "most of the work"

Data management issues

  • Automated approach - e.g., CISBAN
  • Manual coding - e.g. CPAS
  • Systems supporting single formats: ArrayExpress (will support MAGE2)...

Open issues

  • how to use in house (focus so far on sharing)
  • how to avoid divergence of data standards (e.g. best practices for extending FuGE? Best practice for ontology usage?)

FuGE accepted by MGED, PSI, MSI

GK: revisit the issue from yesterday about workflows assuming array data as final output rather than part of an ongoing investigative process. Andy: FuGE makes no such assumption (the model presented seems fairly clear on that), but model "investigation structures" based on FuGE may.


BIOMAP Bio-Tab

Susanna Generic portions of I/S/A http://www.ebi.ac.uk/net-project/projects.html

Current situation at EBI:

  • ArrayExpress (MicroArray group)
    • MageTAB
    • MiameExpress
    • Future: one schema based on MageTAB (assumes MAGE2 support) (puzzled - mageTAB is a spreadsheet format?)
  • Pride (Proteomics team)
    • Pride harvest (Excel based)
    • Repository and warehouse (BioMart) http://www.biomart.org/
    • Future: leverage with FuGE
  • What about larger domain
    • Collaborators spanning multiple communities

Net project: http://www.ebi.ac.uk/net-project

  • BioMAP - investigations covering multiple technologies and data types
  • Ontology
  • NutriBase
  • (ANO)


BioMAP - investigations covering multiple technologies and data types

  • Several assays share the same "study" (i.e. upstream treatment prior to assays?)
  • Study should annotate more consistently for use with/by different assays
  • Many communities producing investigations
  • Many bioinformatics centres facing data management challenge
  • Commitments to place data in the public domain

Plan to centralize storage of study data for multi-assay based investigations:

BioInv index (study); MeDa (metabolomics); ArrayExpress (Transcriptomics); Pride (proteomics)

Also: ChEBI - chemicals and metabolomics names and annotation - links to BioInv and MeDa

Batch submission using tabular format - other tab formats alongside MageTAB

Use side files: investigation file (contacts, data; S/A: type, name, relations; protocols: type, name), study file (reference: investigation and protocols; samples: source, treatment; factors, values, ...), assay files. All these converted to TAB formats for submission to EBI.
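
To make the side-file idea concrete, here is a sketch of writing an investigation file and a study file as tab-delimited text, linked by a shared identifier. The column names, identifiers and file names are invented for illustration; the actual BioMAP tab layout was not specified in the talk.

```python
import csv
import io


def to_tab(header, rows):
    """Serialise a header plus rows as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical investigation file: contact details and a pointer to the study.
investigation_tab = to_tab(
    ["investigation_id", "contact", "study_file"],
    [["INV1", "A. Researcher", "study1.txt"]],
)

# Hypothetical study file: refers back to the investigation and on to assays.
study_tab = to_tab(
    ["investigation_id", "sample", "source", "treatment", "assay_file"],
    [
        ["INV1", "S1", "soil", "control", "assay1.txt"],
        ["INV1", "S2", "soil", "heated", "assay2.txt"],
    ],
)
```

Each assay file would then carry only technology-specific columns, with the shared study details stored once.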

So, the whole framework looks like a base schema for combining diverse assay data with a common format for investigation/study details.

FuGE-ML (XML?); FuGE-TAB (I find it difficult to see how the latter might usefully work.)

Discussion of possibility/desirability of FuGE-TAB; Andy: no obvious way to have a schema for TAB files. Chris Taylor: JCount-? - older format, no spec, just worked from instances.

Back to turning everything into tables; I'm reminded again of Mulla Nasrudin's key

Andy says something similar: EBI had problems with XML in the past, so doesn't want to touch XML again, even when it might be genuinely appropriate.

Discussion moves to granularity of metadata - ease of capture vs ability to refine data analysis; Bela notes differing requirements for intra-lab (more likely to use fine-grained metadata to refine analysis) vs public use. I wonder if there's a role here for 3rd-party or separately stored annotation of public data, so new metadata can be published subsequent to original data.

Philippe says use XML for data, but still wants a presentation layer that's just a table.

I wander off and take a look at the current state of Vital (http://www.cs.kent.ac.uk/projects/vital/overview/index.html) - I note that since I last looked, an IO() capability has been added - e.g. for file input. Probably the tabular data handling isn't good enough.

Andy: all biologists can use XML; most can't use random software that doesn't already exist. (Nasrudin's key again?)

Discussion continues, more about Excel and XML (say the latter quickly?). (I think Wadler's law of language design comes into play http://www.informatik.uni-kiel.de/~curry/listarchive/0017.html)


Tuesday 20th - afternoon session

Bio-Linux development and distribution

Dawn (for Stewart)

Bio-Linux development and distribution New collaboration with National Grid Service http://nebc.nox.ac.uk/biolinux.html

Developing BioLinux as a platform to access NGS - National Grid Service. This is an early work in progress.

Minutes and slides will be posted publicly.

NEBC wiki (http://darwin.nerc-oxford.ac.uk/pgp-wiki/index.php/Main_Page ?)


Investigation/Study/Assay (I/S/A)

All RSBI: http://www.mged.org/Workgroups/rsbi/rsbi.html (click on CMAP image)

Reporting Structure for Biological Investigations Working Groups (RSBI WGs)

Toxicogenomics (TWG), Environmental Genomics (EGWG), Nutrigenomics (NWG)

STANDARDIZATION OF MULTI-OMICS INVESTIGATIONS

RSBI represents communities where efforts are already underway to promote reporting standards and to develop databases for storing biological investigations employing multiple OMICS technologies.

These communities have joined forces to support and contribute to several projects and initiatives developing standards for the annotation and the exchange of experimental data and metadata.


Use of I/S/A in MIGS

(recent discussions with CAMERA)

Dawn http://gensc.sf.net CAMERA: http://camera.calit2.net/

http://gensc.sf.net or http://darwin.nox.ac.uk/gsc/gcat

Towards a richer set of information describing our complete genome collection

The aim of the Genomic Standards Consortium (GSC) is to support the community-based development of a genomic standard that captures a richer set of information about complete genomes and metagenomic datasets.

The GSC is currently working together towards the "Minimal Information about a Genome Sequence" specification.

To promote discussion and support the capture of preliminary data an XML schema has been built from the checklist and implemented as the Genome Catalogue database.

The GSC is also working towards the development of controlled vocabularies for describing genomes and this work feeds into the OBI project (An Ontology for Biomedical Investigations).


http://camera.calit2.net/

We are proud to announce that the CAMERA (Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis) website is now available to the public for a Beta testing period. CAMERA is a user-driven site dedicated to providing the scientific community with metagenomics data and software analysis tools. This public release of the CAMERA system marks the beginning of a two-month Beta test period for CAMERA software capabilities, data, and website.

These are people who want to comply, having lots of legacy (meta?)data. Undergoing review with NEBC.


Review of Investigation / Study / Assay as previously presented

All by o'mixed (DH, SS, CT and all) and integration with thinking of RSBI / MIBBI See slides from previous presentation on omixed.org

David: Omixed happy with this hierarchy; users can apply their own labels.

David: what comes after Assay? Where do samples fit in?


NEBC strategy for next 2 years

We were unable to stay on beyond this stage of the meeting, so any remaining notes are garnered from web sites, etc., rather than a record of the meeting.

Dawn


Discussion

Wrap-up

Related topics

Pedro - XSD-driven form engine; saves XML data

Pierre - Database-agnostic query engine (for Pedro data?) - generates command line, web service, web-CGI, curses(?) query interfaces, with Java stub implementation for accessing the actual data wherever it may be found.
