FlyWeb/Coreference

FlyTED Gene Name Mapping Analysis

This analysis covers the automatically generated mapping files at: http://milos2.zoo.ox.ac.uk/trac/FlyWeb/browser/FlyWeb/Trunk/apps/A/mappings

Source code of the mapping generator and analysis programs is at: http://milos2.zoo.ox.ac.uk/trac/FlyWeb/browser/FlyWeb/trunk/GeneNameMapping/src/uk/ac/ox/zoo/flyweb/scripts

Analysis run on Wed Jul 23 19:30:57 BST 2008.

Are there any FlyTED names with no mapping to FlyBase?

  • Mfn/Marf
  • scpr-b
  • Acp63.F
  • Cct-2
  • prx2540
  • Nrx4
  • so-c
  • sapr
  • mst35ba
  • alh
  • CG32394N
  • prosalpha1
  • tfb1
  • jupiter
  • CG7570/hale
  • CG17736/schuy
  • cirl
  • mst87f
  • Mst-35Ba
  • mi-2
  • CG12819/sle
  • CG14016/tomb
  • Rno
  • est-2
  • rpn6
  • aurora b
  • Hex T2
  • so-n
  • Pros28.1b
  • mat1
  • swapsi
  • CG10998/r-cup
  • SlgA
  • CG10034/tj
  • TektinA
  • acer
  • des/ifc

Found 37 missing mappings from flyted to flybase names.

Are there any FlyTED names with more than one mapping to FlyBase?

  • swa -> FBal0001840 FBgn0000376 FBgn0003655
  • sd -> FBgn0003489 FBgn0003345
  • CG11765 -> FBgn0033520 FBgn0033518
  • mle -> FBal0012332 FBgn0002774
  • sdt -> FBgn0002638 FBgn0243505
  • fdl -> FBal0125063 FBgn0045063
  • rbf -> FBgn0004903 FBgn0015799
  • dsh -> FBal0003138 FBgn0000499
  • bam -> FBgn0000158 FBgn0000235
  • mei-S332 -> FBal0012188 FBgn0002715
  • twe -> FBgn0002673 FBgn0015558
  • l(1)G0255 -> FBal0100328 FBgn0028336
  • pip -> FBgn0004399 FBgn0003089
  • And -> FBal0003276 FBgn0004511 FBgn0000094 FBgn0011273 FBgn0000259
  • spin -> FBal0150807 FBgn0086676
  • sip2 -> FBgn0031878 FBgn0029113
  • l(2)02045 -> FBal0007990 FBgn0010504

Found 17 ambiguous mappings from flyted to flybase names.

Are there any FlyBase IDs with more than one mapping to FlyTED?

  • FBgn0042189 -> CG17376 CG18461
  • FBgn0000405 -> cycB CycB
  • FBgn0002838 -> ms(3)K81 CG14251
  • FBgn0038390 -> Rbf2 rbf2
  • FBgn0011708 -> Syx5 syx5
  • FBgn0001234 -> Hsr-omega hsr-omega
  • FBgn0052548 -> CG32548 CG6306

Found 7 ambiguous mappings from flybase IDs to flyted names.

All done.
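
The three checks reported above can be sketched as follows. This is only an illustration, not the project's analysis program: it assumes the mapping is available as (FlyTED name, FlyBase ID) pairs, and the function name `analyse_mappings` is hypothetical. The sample pairs are taken from the output above.

```python
from collections import defaultdict


def analyse_mappings(flyted_names, pairs):
    """Return (missing, ambiguous_forward, ambiguous_reverse).

    flyted_names: all FlyTED gene names that should have a mapping.
    pairs: iterable of (flyted_name, flybase_id) tuples (assumed input shape).
    """
    forward = defaultdict(set)  # FlyTED name -> FlyBase IDs
    reverse = defaultdict(set)  # FlyBase ID -> FlyTED names
    for name, fb_id in pairs:
        forward[name].add(fb_id)
        reverse[fb_id].add(name)

    # Names that never appear on the left-hand side of a mapping.
    missing = [n for n in flyted_names if n not in forward]
    # Names mapped to more than one FlyBase ID.
    ambiguous_forward = {n: sorted(ids) for n, ids in forward.items() if len(ids) > 1}
    # FlyBase IDs reached from more than one FlyTED name.
    ambiguous_reverse = {i: sorted(ns) for i, ns in reverse.items() if len(ns) > 1}
    return missing, ambiguous_forward, ambiguous_reverse


names = ["Mfn/Marf", "sd", "cycB", "CycB"]
pairs = [
    ("sd", "FBgn0003489"), ("sd", "FBgn0003345"),
    ("cycB", "FBgn0000405"), ("CycB", "FBgn0000405"),
]
missing, amb_fwd, amb_rev = analyse_mappings(names, pairs)
```

Note that names differing only in case, such as cycB and CycB, are treated as distinct here, which is why they show up as an ambiguous reverse mapping.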

FlyTED Further Investigation

The FlyTED gene names not found in FlyBase were investigated further by manual search through the FlyBase web interface, which appears to do case-insensitive, substring matching:

  • Mfn/Marf

"Mfn/Marf" No matches; "Mfn" no matches; "Marf" matches FBgn0029870

  • scpr-b

"scpr-b" matches FBgn0037888

  • Acp63.F

"Acp63.F" no matches; "Acp63" matches FBgn0015585

  • Cct-2

"Cct-2" no matches; "Cct2" matches FBgn0035231

  • prx2540

"prx2540" matches both FBgn0033520 (prx2540-1) and FBgn0033518 (prx2540-2)

  • Nrx4

"Nrx4" no matches; "Nrx" matches both FBgn0038975 (Nrx-1) and FBgn0013997 (Nrx-IV)

  • so-c
  • sapr
  • mst35ba
  • alh
  • CG32394N
  • prosalpha1
  • tfb1
  • jupiter
  • CG7570/hale
  • CG17736/schuy
  • cirl
  • mst87f
  • Mst-35Ba
  • mi-2
  • CG12819/sle
  • CG14016/tomb
  • Rno
  • est-2
  • rpn6
  • aurora b
  • Hex T2
  • so-n
  • Pros28.1b
  • mat1
  • swapsi
  • CG10998/r-cup
  • SlgA
  • CG10034/tj
  • TektinA
  • acer
  • des/ifc
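
The manual lookups above follow a few recurring patterns: splitting compound names on "/", dropping hyphens, dropping a trailing ".F"-style suffix, and dropping trailing digits. A minimal sketch of generating such candidate search strings is given below. The function `candidate_queries` and its rules are assumptions drawn from the examples investigated so far, not part of the mapping generator, and every candidate still needs manual review because shorter strings match more FlyBase entries.

```python
import re


def candidate_queries(name):
    """Return search strings to try, most specific first, without duplicates."""
    seen = []

    def add(candidate):
        if candidate and candidate not in seen:
            seen.append(candidate)

    add(name)
    for part in name.split("/"):
        add(part)                                # "Mfn/Marf" -> "Mfn", "Marf"
        add(part.replace("-", ""))               # "Cct-2"    -> "Cct2"
        add(re.sub(r"\.[A-Za-z]$", "", part))    # "Acp63.F"  -> "Acp63"
        add(re.sub(r"[-\d]+$", "", part))        # "Nrx4"     -> "Nrx"
    return seen
```

For example, `candidate_queries("Mfn/Marf")` gives "Mfn/Marf", "Mfn" and "Marf", the three strings tried by hand above.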

FlyBase Endpoint Investigation

Find FlyBase Genes without a symbol

The SPARQL query:

Select ?gene where { ?gene rdf:type flybase:Gene . OPTIONAL {?gene flybase:symbol ?symbol }. FILTER ( !bound(?symbol)) }

We ran this query on the FlyBase endpoint, which yielded this result:

{
  "head": {
    "vars": [ "gene" ]
  } ,
  "results": {
    "bindings": [
      {
        "gene": { "type": "uri" , "value": "http://rodos.zoo.ox.ac.uk/2008/flyweb/id/flybase/CR41445" }
      } ,
      {
        "gene": { "type": "uri" , "value": "http://rodos.zoo.ox.ac.uk/2008/flyweb/id/flybase/CR33987" }
      }
    ]
  }
}
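
For further processing, the gene identifiers can be pulled out of the JSON result above with a few lines of Python. This is a sketch only: the helper name `gene_ids` is hypothetical, and it assumes the identifier is the last path segment of each URI, as in the two bindings shown.

```python
import json

# The result shown above, copied verbatim.
result_json = """
{
  "head": { "vars": [ "gene" ] },
  "results": {
    "bindings": [
      { "gene": { "type": "uri",
                  "value": "http://rodos.zoo.ox.ac.uk/2008/flyweb/id/flybase/CR41445" } },
      { "gene": { "type": "uri",
                  "value": "http://rodos.zoo.ox.ac.uk/2008/flyweb/id/flybase/CR33987" } }
    ]
  }
}
"""


def gene_ids(sparql_json, var="gene"):
    """Return the last URI path segment of each binding for the given variable."""
    doc = json.loads(sparql_json)
    return [
        binding[var]["value"].rsplit("/", 1)[-1]
        for binding in doc["results"]["bindings"]
        if var in binding  # a variable may be unbound in some solutions
    ]


ids = gene_ids(result_json)
```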

This query URI was also used, copied directly into a Firefox browser address bar:

http://rodos.zoo.ox.ac.uk/sparqlite/sparql/flybase-genenames-20080711-hash?query=prefix+rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns%23>
+prefix+flybase:<http://rodos.zoo.ox.ac.uk/2008/flyweb/ontologies/flybase/>
Select ?gene where{?gene rdf:type flybase:Gene. OPTIONAL {?gene flybase:symbol ?symbol }. FILTER ( !bound(?symbol)) }&output=json
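
The URI above relies on the browser to escape spaces and other reserved characters; only the "#" in the rdf prefix is escaped by hand (as %23). When the query is issued from a script, the query string has to be percent-encoded explicitly. The sketch below uses the endpoint address and the `query` and `output` parameters exactly as they appear in the URI above; the helper name `query_uri` is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://rodos.zoo.ox.ac.uk/sparqlite/sparql/flybase-genenames-20080711-hash"

QUERY = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX flybase: <http://rodos.zoo.ox.ac.uk/2008/flyweb/ontologies/flybase/>
SELECT ?gene WHERE {
  ?gene rdf:type flybase:Gene .
  OPTIONAL { ?gene flybase:symbol ?symbol } .
  FILTER ( !bound(?symbol) )
}"""


def query_uri(endpoint, query, output="json"):
    """Build a GET URI with the query and output format percent-encoded."""
    return endpoint + "?" + urlencode({"query": query, "output": output})


uri = query_uri(ENDPOINT, QUERY)
# Round trip: decoding the query string gives back the original query text.
decoded = parse_qs(urlsplit(uri).query)
```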

The same data can be retrieved directly from the FlyBase database with the following query, which indicates that the unexpected result is not an artefact of the D2R mapping used:

select feature_id,name,uniquename,type_id from feature as f 
where f.organism_id = 1 and f.type_id=219 and f.is_obsolete = false 
order by name desc limit 10;

(See Flyweb/Gene_Name_Synonyms for details of exposing FlyBase)

Find FlyBase Genes without annotation symbols

We ran the test on the FlyBase endpoint.

This returned a long list of genes that are not associated with an annotation symbol. Spot checks of randomly chosen genes on the FlyBase web site suggest that this result is consistent with the public FlyBase database. The results can be found at milos2.zoo.ox.ac.uk/storage1/data/FlyWeb/

Find FlyBase Genes with multiple annotation symbols

We ran the test on the FlyBase gene name SPARQL endpoint.

We then looked up 'FBgn0085444' and 'FBgn0031971' on the FlyBase web site, which returned more than one annotation symbol for each gene, for different reasons.

We ran the following query on the FlyBase database:

select feature.feature_id, feature.name, dbxref.accession, feature.uniquename from feature 
 join feature_dbxref on feature.feature_id=feature_dbxref.feature_id 
 join dbxref on dbxref.dbxref_id=feature_dbxref.dbxref_id 
 where dbxref.db_id = '1' and feature.uniquename = 'FBgn0085444' limit 10;

This also returned more than one annotation symbol for each of the gene names.

We also noticed that only one 'main' annotation symbol is associated with each gene in the returned results. Running the following SQL query on the Postgres database showed that the 'main' annotation symbol is the one with feature_dbxref.is_current = 't'; the others have the value 'f'.

select feature.feature_id, feature.name, dbxref.accession, feature.uniquename, feature_dbxref.is_current from feature 
 join feature_dbxref on feature.feature_id=feature_dbxref.feature_id 
 join dbxref on dbxref.dbxref_id=feature_dbxref.dbxref_id 
 where dbxref.db_id = '1' and feature.uniquename = 'FBgn0085444';
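
To return only the 'main' annotation symbol, the same join can be restricted to rows where is_current is true. The sketch below shows the idea on an in-memory SQLite database rather than the real Postgres instance. The three tables are cut down to the columns used in the queries above, and all rows are placeholders, NOT real FlyBase data. The function name `current_annotation_symbols` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Minimal subset of the tables used in the queries above.
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT, uniquename TEXT);
CREATE TABLE dbxref (dbxref_id INTEGER PRIMARY KEY, db_id INTEGER, accession TEXT);
CREATE TABLE feature_dbxref (feature_id INTEGER, dbxref_id INTEGER, is_current BOOLEAN);

-- Placeholder rows, NOT real FlyBase data.
INSERT INTO feature VALUES (1, 'geneA', 'FBgnPLACEHOLDER');
INSERT INTO dbxref VALUES (10, 1, 'ANNOT_CURRENT'), (11, 1, 'ANNOT_OLD');
INSERT INTO feature_dbxref VALUES (1, 10, 1), (1, 11, 0);
""")


def current_annotation_symbols(conn, uniquename):
    """Return only the accessions flagged is_current for the given gene."""
    rows = conn.execute(
        """
        SELECT dbxref.accession
        FROM feature
        JOIN feature_dbxref ON feature.feature_id = feature_dbxref.feature_id
        JOIN dbxref ON dbxref.dbxref_id = feature_dbxref.dbxref_id
        WHERE dbxref.db_id = 1
          AND feature.uniquename = ?
          AND feature_dbxref.is_current
        """,
        (uniquename,),
    )
    return [row[0] for row in rows]


symbols = current_annotation_symbols(conn, "FBgnPLACEHOLDER")
```

SQLite stores booleans as 0 and 1, whereas Postgres displays them as 't' and 'f'; the WHERE clause itself should carry over unchanged.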

Genes without full name

SPARQL query URI

Multiple full names for gene

Sparql query URI

FlyBase lists multiple name synonyms, which correspond to the results returned by our endpoint.

Query to the FlyBase database:

select feature.feature_id, feature.name, feature.uniquename, synonym.name, feature_synonym.is_current from synonym 
 join feature_synonym on synonym.synonym_id = feature_synonym.synonym_id 
 join feature on feature.feature_id=feature_synonym.feature_id 
 join cvterm on synonym.type_id = cvterm.cvterm_id
 where cvterm.name = 'fullname' 
 and   feature.uniquename = 'FBgn0000449'
 limit 10 ;

Exactly one value of 'synonym.name' has 'feature_synonym.is_current' = 't', and this value is also the main name displayed by the FlyBase web site.


Find genes without synonyms

Sparql query URL

Two genes, 'CR41445' and 'CR33987', were returned by this query. Further investigation showed that a query on the FlyBase database returned results consistent with the web site, but our SPARQL endpoint did not. This needs further investigation.

select feature.feature_id, synonym.name, feature.name, feature.uniquename from synonym 
 join feature_synonym on synonym.synonym_id = feature_synonym.synonym_id 
 join feature on feature.feature_id=feature_synonym.feature_id 
 where feature.uniquename = 'FBgn0053987';