Text mining experiments

From ImageWeb

Open Calais

http://www.opencalais.com/calaisAPI

Initial experiments (PubMed)

Experiment with Our Paper

Actions

Copied the full text of the original paper and submitted it to OpenCalais using our OpenCalais test form.

Results

Relations: 
 PersonProfessional
Organization: 
 MRC Biostatistics Unit, 
 Brazil5 Division of International Medicine and Infectious Diseases, 
 General Assembly, 
 Brazilian Ministry of Health, 
 United Nations Human Settlements Programme, 
 Instituto Brasileiro de Geografia,
 Brazilian National Research Council,
 Oswaldo Cruz Foundation,
 MC MR,
 Oxford University, 
 World Health Organization, 
 Urban Health Council of Pau, 
 Brazilian National Commission for Ethics, 
 Weill Medical College, 
 Foundation for Statistical Computing, 
 United Nations, Rosan Barbosa, 
 Universidad de Buenos Aires, 
 The Johns Hopkins University, 
 Cornell University, 
 New York, 
 Environmental Systems Research Institute,
 Royal Tropical Institute
MedicalCondition: 
 Weil's disease, 
 visceral leishmaniasis, 
 infectious disease, 
 Leptospirosis, 
 Visceral leishmaniasis, 
 Leptospira infection, 
 leptospirosis, 
 Leptospira Infection, 
 infectious diseases, 
 Zoonoses, 
 Meningococcal disease, 
 Dengue, 
 dengue hemorrhagic fever, 
 dengue, 
 Infectious Diseases
Country: 
 United States, 
 The Netherlands, 
 Greenland, 
 Ecuador, 
 Argentina, 
 Thailand, 
 Brazil, 
 Barbados, 
 United States of America, 
 Norway, 
 Malta, 
 Bangladesh
City: 
 Mumbai, 
 Sao Paulo, 
 Nairobi, 
 Rio de Janeiro, 
 Washington, D.C., 
 Lima, 
 Florianopolis, 
 London, 
 Baltimore, 
 New York, 
 Population, 
 Universidad de Buenos Aires, 
 Guayaquil
ProvinceOrState: 
 New York, 
 Alaska, 
 Bahia
RadioStation: 
 Chang AM, 
 Assis AM, 
 GR RF FS AM, 
 FS SM AM
EmailAddress: 
 aik2001@med.cornell.edu
Continent: 
 North America
IndustryTerm: 
 software system, 
 study site, 
 basic services, 
 Slum community site, 
 reagents/materials/analysis tools,
 polymerase chain, 
 public health tool, 
 community study site,
 sewage drainage systems, 
 adequate sewage systems, 
 important site, 
 refuse collection services, 
 food, 
 drainage systems, 
 slum community site, 
 rainwater drainage systems, 
 sanitation infrastructure, 
 community site, 
 open drainage systems
URL: 
 http://portal.saude.gov.br/portal/arquivos/pdf/tabela_obitos_dm_brasil.pdf, 
 http://portal.saude.gov.br/portal/arquivos/pdf/tabela_meningites_brasil.pdf, 
 http://www.un.org/millennium, 
 http://portal.saude.gov.br/portal/arquivos/pdf/boletim_dengue_010208.pdf, 
 http://portal.saude.gov.br/portal/arquivos/pdf/tabela_lv_letalidade.pdf
Company: 
 Vasconcelos SA, 
 WHO Collaborative Laboratory, 
 ArcGIS 3D, 
 Johns Hopkins University Press, 
 John Hopkins University Press, 
 Everard CO, 
 Oxford University Press, 
 Earthscan Publications Ltd., 
 Company for Urban Development
Currency: USD
Person: 
 Guilherme Ribeiro, 
 Érika Sousa, 
 Leila Gouveia, 
 Earl Francis Cook Jr., 
 Lee Riley, 
 Analéa Lima, 
 Amaro Silva, 
 Souza Philippi, 
 Elves Maciel, 
 Panamericana de la Salud, 
 Salvador Leptospirosis, 
 Maurício Barreto, 
 Reinaldo Barreto, 
 Ana Carla Duarte, 
 Osmar Paixão, 
 Art Reingold, 
 Jorge Costa, 
 Simone Nascimento, 
 Santos Faversani, 
 Alicia Chang, 
 Ricardo E. Gurtler
Technology: 
 artificial intelligence, 
 Cad, 
 antibodies

Compare

- with manual extraction: http://imageweb.zoo.ox.ac.uk/pub/2008/plospaper/latest/documentinfo.html

Places

False negatives: Salvador, Pau de Lima
False positives: Lima (possibly picked up from "Pau de Lima")


People

False positives: "Salvador Leptospirosis" is not a person.
False negatives: TODO, several
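
The comparison against the manual extraction can be made mechanical with set differences. The following is a minimal sketch, not the procedure actually used for this page. The `manual` list is a small illustrative subset assumed for the example (the full manual list is at the URL above), and the function name `compare` is hypothetical.

```python
def compare(extracted, manual):
    """Return (false_positives, false_negatives) for two collections of names.

    Names are compared case-insensitively after trimming whitespace.
    """
    def norm(name):
        return name.strip().lower()

    extracted_set = {norm(n) for n in extracted}
    manual_set = {norm(n) for n in manual}
    # Reported by the tool but absent from the manual list.
    false_positives = sorted(extracted_set - manual_set)
    # Present in the manual list but missed by the tool.
    false_negatives = sorted(manual_set - extracted_set)
    return false_positives, false_negatives


# Illustrative subset only: three cities reported by OpenCalais and
# an assumed manual list for the same paper.
extracted = ["Lima", "Rio de Janeiro", "Sao Paulo"]
manual = ["Salvador", "Pau de Lima", "Rio de Janeiro", "Sao Paulo"]

fp, fn = compare(extracted, manual)
print("False positives:", fp)
print("False negatives:", fn)
```

Exact matching after lowercasing is deliberately strict: "Lima" and "Pau de Lima" count as different names, which is what surfaces the partial match as one false positive plus one false negative.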

U-Bio

http://www.ubio.org/index.php?pagename=xml_services


Actions

Submitted the URL of the PLoS paper to FindIT and TaxonFinder using our U-Bio test form.

Results

TaxonFinder

Leptospira,
Leptospira interrogans,
Leptospira kirschneri,
Rattus norvegicus,
Strina

It looks like "Strina" is a false positive: it is an author name in one of the references. We could eliminate this by submitting only the article content (e.g. from the article XML) rather than the whole page.
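
One way to submit only the article content is to pull the body text out of the article XML before sending it to TaxonFinder. The sketch below assumes the XML follows the NLM/JATS layout, with the main text in `<body>` and the references in `<back>`. The sample document and the function name `body_text` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def body_text(article_xml):
    """Return the text inside <body>, ignoring <front> and <back>."""
    root = ET.fromstring(article_xml)
    body = root.find("body")
    if body is None:
        raise ValueError("no <body> element found")
    # itertext() walks all nested elements. Joining with a space keeps
    # adjacent paragraphs apart; split/join then collapses whitespace.
    return " ".join(" ".join(body.itertext()).split())


# Tiny made-up document standing in for the real article XML.
sample = """<article>
  <front><article-meta><title-group><article-title>Example</article-title></title-group></article-meta></front>
  <body><sec><p>Rats (<italic>Rattus norvegicus</italic>) carry <italic>Leptospira</italic>.</p></sec></body>
  <back><ref-list><ref><mixed-citation>Strina A, et al.</mixed-citation></ref></ref-list></back>
</article>"""

text = body_text(sample)
print(text)
```

Because the reference list sits outside `<body>`, author names such as "Strina" never reach the name finder.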

FindIT

Rattus norvegicus, 
Leptospira interrogans sensu stricto,
Leptospira sp,
R[attus] norvegicus,
Leptospira,
Érika<,
Strina,
Marília<,
Leptospira serogroups<,
L[eptospira] kirschneri serovar Grippotyphosa,
L[eptospira] interrogans serovar Copenhageni,
L[eptospira] interrogans serovars Autumnalis,
Leptospira interrogans serovar copenhageni,
L[eptospira] borgspetersenii serovar Ballum,
Pereira,
Barros,
Souza,
Horta,
Bahia,
Barbosa,
Poisson,
Leila,
Pereira da Sá,
Censo demográfico 2000,
Sera,
Carolini,
Pellegrini

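NB: FindIT expands abbreviated genus names, e.g. "L interrogans serovars Autumnalis" is reported as "L[eptospira] interrogans serovars Autumnalis". There are also some false positives, e.g. Bahia, Poisson and Censo demográfico 2000.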
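
The genus expansion seen in the FindIT output can be imitated in a few lines: given the genera already found in the text, replace a single-letter abbreviation with the bracketed full name. This is only an illustration of the idea and not FindIT's actual algorithm, which is not documented on this page. The function name `expand_genus` and the regular expression are assumptions.

```python
import re

# A single capital letter, an optional full stop, whitespace, then a
# lowercase word of at least three letters, e.g. "L. interrogans".
ABBREVIATED = re.compile(r"\b([A-Z])\.?\s+([a-z]{3,})\b")


def expand_genus(text, genera):
    """Expand abbreviated genus names to the form "L[eptospira] interrogans".

    `genera` is a list of full genus names. If two genera share an
    initial, the later one in the list wins.
    """
    by_initial = {genus[0]: genus for genus in genera}

    def replace(match):
        initial, epithet = match.group(1), match.group(2)
        genus = by_initial.get(initial)
        if genus is None:
            # Unknown initial: leave the text untouched.
            return match.group(0)
        return f"{initial}[{genus[1:]}] {epithet}"

    return ABBREVIATED.sub(replace, text)


expanded = expand_genus(
    "L. interrogans serovar Copenhageni and R. norvegicus",
    ["Leptospira", "Rattus"],
)
print(expanded)
```

Keeping the expansion in brackets, as FindIT does, makes it clear which part of the name was present in the source text and which part was inferred.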
