Text mining experiments
From ImageWeb
Open Calais
http://www.opencalais.com/calaisAPI
Initial experiments (PubMed)
- The focus of these experiments was on extracting places (Geocoding / geomashup)
- The sample data was a selection of field studies from PubMed.
- Results - see Geo_Mashup#Extracting_Place_names_and_dates
Experiment with Our Paper
Actions
Copied all text from the original paper and submitted it to OpenCalais using our OpenCalais test form.
Results
Relations: PersonProfessional
Organization: MRC Biostatistics Unit, Brazil5 Division of International Medicine and Infectious Diseases, General Assembly, Brazilian Ministry of Health, United Nations Human Settlements Programme, Instituto Brasileiro de Geografia, Brazilian National Research Council, Oswaldo Cruz Foundation, MC MR, Oxford University, World Health Organization, Urban Health Council of Pau, Brazilian National Commission for Ethics, Weill Medical College, Foundation for Statistical Computing, United Nations, Rosan Barbosa, Universidad de Buenos Aires, The Johns Hopkins University, Cornell University, New York, Environmental Systems Research Institute, Royal Tropical Institute
MedicalCondition: Weil's disease, visceral leishmaniasis, infectious disease, Leptospirosis, Visceral leishmaniasis, Leptospira infection, leptospirosis, Leptospira Infection, infectious diseases, Zoonoses, Meningococcal disease, Dengue, dengue hemorrhagic fever, dengue, Infectious Diseases
Country: United States, The Netherlands, Greenland, Ecuador, Argentina, Thailand, Brazil, Barbados, United States of America, Norway, Malta, Bangladesh
City: Mumbai, Sao Paulo, Nairobi, Rio de Janeiro, Washington, D.C., Lima, Florianopolis, London, Baltimore, New York, Population, Universidad de Buenos Aires, Guayaquil
ProvinceOrState: New York, Alaska, Bahia
RadioStation: Chang AM, Assis AM, GR RF FS AM, FS SM AM
EmailAddress: aik2001@med.cornell.edu
Continent: North America
IndustryTerm: software system, study site, basic services, Slum community site, reagents/materials/analysis tools, polymerase chain, public health tool, community study site, sewage drainage systems, adequate sewage systems, important site, refuse collection services, food, drainage systems, slum community site, rainwater drainage systems, sanitation infrastructure, community site, open drainage systems
URL: http://portal.saude.gov.br/portal/arquivos/pdf/tabela_obitos_dm_brasil.pdf, http://portal.saude.gov.br/portal/arquivos/pdf/tabela_meningites_brasil.pdf, http://www.un.org/millennium, http://portal.saude.gov.br/portal/arquivos/pdf/boletim_dengue_010208.pdf, http://portal.saude.gov.br/portal/arquivos/pdf/tabela_lv_letalidade.pdf
Company: Vasconcelos SA, WHO Collaborative Laboratory, ArcGIS 3D, Johns Hopkins University Press, John Hopkins University Press, Everard CO, Oxford University Press, Earthscan Publications Ltd., Company for Urban Development
Currency: USD
Person: Guilherme Ribeiro, Érika Sousa, Leila Gouveia, Earl Francis Cook Jr., Lee Riley, Analéa Lima, Amaro Silva, Souza Philippi, Elves Maciel, Panamericana de la Salud, Salvador Leptospirosis, Maurício Barreto, Reinaldo Barreto, Ana Carla Duarte, Osmar Paixão, Art Reingold, Jorge Costa, Simone Nascimento, Santos Faversani, Alicia Chang, Ricardo E. Gurtler
Technology: artificial intelligence, Cad, antibodies
Compare
- with manual extraction: http://imageweb.zoo.ox.ac.uk/pub/2008/plospaper/latest/documentinfo.html
Places
- False negatives: Salvador, Pau de Lima
- False positives: Lima (from Pau de Lima?)
People
- False positives: "Salvador Leptospirosis" is not a person!
- False negatives: TODO, several
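The comparison against the manual extraction can be sketched as a pair of set differences. This is only an illustration, not the procedure actually used here: the function name `compare_entities` and the case-insensitive matching are assumptions, and the `manual_places` list is a placeholder built from the place names mentioned above, not the real manual extraction.

```python
def compare_entities(extracted, manual):
    """Return (false_positives, false_negatives) as sorted lists.

    Matching is case-insensitive on the whole string. This is an
    assumption made for the sketch; the comparison above was done by eye.
    """
    extracted_by_key = {name.strip().lower(): name.strip() for name in extracted}
    manual_by_key = {name.strip().lower(): name.strip() for name in manual}
    # Extracted by the tool but absent from the manual list.
    false_positives = sorted(
        name for key, name in extracted_by_key.items() if key not in manual_by_key
    )
    # Present in the manual list but missed by the tool.
    false_negatives = sorted(
        name for key, name in manual_by_key.items() if key not in extracted_by_key
    )
    return false_positives, false_negatives


# A subset of the OpenCalais "City" results listed above.
calais_cities = ["Rio de Janeiro", "Lima", "Sao Paulo"]
# Placeholder manual list (hypothetical, for illustration only).
manual_places = ["Salvador", "Pau de Lima", "Rio de Janeiro", "Sao Paulo"]

false_pos, false_neg = compare_entities(calais_cities, manual_places)
print("False positives:", false_pos)
print("False negatives:", false_neg)
```

Whole-string matching is deliberate: it reports "Lima" as a false positive even though it is a substring of "Pau de Lima", which is the kind of partial match seen in the Places comparison.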
U-Bio
http://www.ubio.org/index.php?pagename=xml_services
Actions
Submitted the URL of the PLoS paper to FindIT and TaxonFinder using our U-Bio test form.
Results
TaxonFinder
Leptospira, Leptospira interrogans, Leptospira kirschneri, Rattus norvegicus, Strina
"Strina" looks like a false positive: it is an author name in one of the references. We could eliminate this kind of error by submitting only the article content (e.g. from the article XML) rather than the whole page.
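Submitting only the article content could look roughly like the sketch below. It assumes the article XML follows an NLM/JATS-like layout, with the main text in `<body>` and the reference list in `<back>`; that layout, the function name `article_body_text`, and the sample document are assumptions for illustration, not taken from the real article XML.

```python
import xml.etree.ElementTree as ET


def article_body_text(xml_string):
    """Return the text of titles and paragraphs inside <body> only.

    Assumes an NLM/JATS-like layout in which the reference list lives
    in <back>, so author names in citations are never submitted.
    """
    root = ET.fromstring(xml_string)
    body = root.find("body")
    if body is None:
        return ""
    blocks = []
    for element in body.iter():
        if element.tag in ("title", "p"):
            # Join nested text (e.g. <italic>) and collapse whitespace.
            blocks.append(" ".join("".join(element.itertext()).split()))
    return "\n".join(blocks)


# Tiny made-up document for illustration (not the real article XML).
sample = """<article>
  <front><article-title>Example</article-title></front>
  <body><p>Rats (<italic>Rattus norvegicus</italic>) carry Leptospira.</p></body>
  <back><ref-list><ref>Strina A, et al.</ref></ref-list></back>
</article>"""

text = article_body_text(sample)
print(text)
```

The resulting text would then be submitted in place of the page URL, provided the service accepts raw text as well as URLs, which has not been checked here.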
FindIT
Rattus norvegicus, Leptospira interrogans sensu stricto, Leptospira sp, R[attus] norvegicus, Leptospira, Érika<, Strina, Marília<, Leptospira serogroups<, L[eptospira] kirschneri serovar Grippotyphosa, L[eptospira] interrogans serovar Copenhageni, L[eptospira] interrogans serovars Autumnalis, Leptospira interrogans serovar copenhageni, L[eptospira] borgspetersenii serovar Ballum, Pereira, Barros, Souza, Horta, Bahia, Barbosa, Poisson, Leila, Pereira da Sá, Censo demográfico 2000, Sera, Carolini, Pellegrini
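NB: FindIT expands abbreviated genus names, e.g. "L interrogans serovars Autumnalis" is returned as "L[eptospira] interrogans serovars Autumnalis". The list also contains a number of false positives, e.g. Bahia (a state), Poisson, and what appear to be personal names from the author list and references.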
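For reference, the bracketed expansion can be imitated with a small regular-expression pass. This is not FindIT's algorithm, which is not documented here; the function name `expand_genus`, the rule that an initial is expanded only when exactly one known genus matches it, and the sample sentence are all assumptions made for the sketch.

```python
import re


def expand_genus(text, genera):
    """Expand "L. interrogans" style abbreviations to "L[eptospira] interrogans".

    `genera` holds full genus names already seen in the document. An
    initial is expanded only when exactly one known genus starts with it.
    Limitation: any capital letter followed by a lowercase word matches,
    so ordinary prose such as "A rat" could be expanded by mistake.
    """
    by_initial = {}
    for genus in genera:
        by_initial.setdefault(genus[0], set()).add(genus)

    def replace(match):
        initial = match.group(1)
        candidates = by_initial.get(initial, set())
        if len(candidates) != 1:
            return match.group(0)  # unknown or ambiguous: leave untouched
        genus = next(iter(candidates))
        return f"{initial}[{genus[1:]}] {match.group(2)}"

    # A capital initial, an optional full stop, then a lowercase epithet.
    pattern = re.compile(r"\b([A-Z])\.?\s+([a-z]{3,})\b")
    return pattern.sub(replace, text)


sample = "L. interrogans serovar Copenhageni was isolated from R. norvegicus."
expanded = expand_genus(sample, ["Leptospira", "Rattus"])
print(expanded)
```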

