DefiningImageAccess/Standard/Microformats
From ImageWeb
Microformats
Microformats for embedding semantic data in HTML.
Description and comments
Microformats attempt to exploit existing constructs in HTML to encode semantic information with minimal additions.
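As a rough illustration of what "existing constructs" means in practice, the sketch below reads a fragment marked up in the style of hCard (the microformat for contact details), where ordinary class attributes carry the semantics. The sample markup, the short property list and the function name extract_hcards are assumptions made for this example; the real hCard parsing rules are more involved (for instance, an email value would normally be taken from the href attribute rather than the link text).

```python
from html.parser import HTMLParser

# Hypothetical page fragment: plain HTML, with class names carrying the semantics.
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Example</span>,
  <span class="org">Example University</span>
  <a class="email" href="mailto:jane@example.org">jane@example.org</a>
</div>
"""

# A small, illustrative subset of hCard property names.
HCARD_PROPERTIES = {"fn", "org", "email", "tel", "url"}

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class HCardExtractor(HTMLParser):
    """Collect the text of elements whose class names are hCard properties."""

    def __init__(self):
        super().__init__()
        self.cards = []        # one dict per class="vcard" element found
        self._stack = []       # one (is_vcard_root, property_names) per open element
        self._open_cards = 0   # how many vcard elements are currently open

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        is_root = "vcard" in classes
        if is_root:
            self.cards.append({})
            self._open_cards += 1
        # Property class names only count when they occur inside a vcard.
        props = classes & HCARD_PROPERTIES if self._open_cards else set()
        self._stack.append((is_root, props))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self._stack:
            return
        is_root, _ = self._stack.pop()
        if is_root:
            self._open_cards -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._open_cards:
            return
        card = self.cards[-1]  # simplification: nested vcards are not handled
        for _, props in self._stack:
            for prop in props:
                card[prop] = (card.get(prop, "") + " " + text).strip()


def extract_hcards(html_text):
    """Return a list of {property: text} dicts, one per vcard in the markup."""
    parser = HCardExtractor()
    parser.feed(html_text)
    return parser.cards


if __name__ == "__main__":
    print(extract_hcards(SAMPLE))
```

The point of the example is that the page stays ordinary, human-readable HTML; only the agreed class names turn it into something a program can pick apart.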
At first glance, Microformats and RDFa are very similar, and they do share some characteristics, but they derive from nearly opposite viewpoints about how to introduce semantic markup. Microformats is based on a number of articulated principles:
- solve a specific problem (that is: address common, well understood requirements, not generalities)
- start as simple as possible
- design for humans first, machines second
- reuse building blocks from widely adopted standards
- modularity / embeddability
- enable and encourage decentralized development, content, services
(from: http://microformats.org/about/)
Not so widely articulated is the notion that other, more mainstream approaches to semantic markup, which typically use namespaces identified by URIs, tend to promote "Tower of Babel" effects by encouraging people to make up new vocabularies rather than re-use those that already exist. There is a tension here between decentralization and ease of extension on the one hand and commonality of vocabularies on the other, and Microformats takes a position towards the "commonality" end of the spectrum. By comparison, XML, RDF and RDFa take a position towards the "decentralization" end; indeed, Dan Brickley once described RDF as "a strategy for principled decentralisation" (http://danbri.org/words/2005/07/30/114).
(I tend to the view that a technical specification should not bake in social policy issues, which is what I think is at the heart of the decentralization/commonality tension. #g.)
A new acronym associated with, but not specific to, microformats is POSH, which stands for "Plain Old Semantic HTML" (http://microformats.org/wiki/posh; the POSH checklist on that page has some good links). This amounts to an exhortation to use HTML the way it was always meant to be used, keeping structural markup separate from presentation.
Related information:
- http://tantek.com/presentations/2007/04/microformats/
- http://tantek.com/presentations/2005/03/elementsofxhtml/
- http://microformats.org/wiki/elemental-microformat, http://microformats.org/wiki/compound-microformat - a distinction is made between "elemental" and "compound" microformats.
- http://www.semantic-conference.com/2up_BW/Ogbuji-Uche-bw.pdf - another introduction, from a non microformat cheerleader; has some interesting perspectives on "semantic transparency" that I'm not sure I entirely understand.
- http://evan.prodromou.name/RDFa_vs_microformats - a comparison of Microformats and RDFa, which appears to me to be very fair and thoughtful.
Software:
- https://addons.mozilla.org/en-US/firefox/addon/4106 - a Firefox plugin called "operator", which can be used to view and debug embedded microformat information.
- http://blog.codeeg.com/tails-firefox-extension-03/ - a Firefox extension called "tails", which can recognize and display microformats in a web page.
Relevance to data webs
A driving motivation for data webs is that they aim to use data that is "out there" on the web, rather than requiring wholesale changes to the way that data is captured and published. Microformats (using the term generically to also include RDFa) have a similar philosophy in their approach to making HTML pages machine accessible: with a small(ish) amount of additional markup, a human-readable page can also contain machine readable information. They, too, try to build on what's already "out there" in the web.
One area in which microformats are finding widespread adoption is in web pages that are machine-generated from a database (there are apparently vast numbers of hCard records published on the web). So, if we are working with people who make web sites that primarily offer a user interface for (say) searching and displaying bioinformatics information, then there might be possibilities to augment the web page templates so that applications can pick up the machine-processable elements of information. This should be reasonably easy for existing web sites (e.g. FlyBase, FlyMine). We might then implement a harvester and subscribe that machine-readable data into a data web.
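To make the "augment the web page templates" idea concrete, here is a minimal sketch of a database-driven page fragment that emits hCard-style class names alongside the visible text. The record fields, the template and the function name render_hcard are hypothetical; a real site such as FlyBase or FlyMine would use its own templating system and a vocabulary suited to its data.

```python
from html import escape
from string import Template

# Hypothetical template: the same markup a site might already generate,
# with microformat class names added to the existing elements.
HCARD_TEMPLATE = Template(
    '<div class="vcard">'
    '<span class="fn">$fn</span>, '
    '<span class="org">$org</span> '
    '<a class="email" href="mailto:$email">$email</a>'
    '</div>'
)


def render_hcard(record):
    """Render one database record (a dict) as an hCard-style HTML fragment."""
    # Escape every value so that database content cannot break the markup.
    safe = {key: escape(str(value), quote=True) for key, value in record.items()}
    return HCARD_TEMPLATE.substitute(safe)


if __name__ == "__main__":
    row = {"fn": "Jane Example", "org": "Example University", "email": "jane@example.org"}
    print(render_hcard(row))
```

The change to the generated page is confined to class attributes, which is why this kind of retrofit looks cheap for sites that already render pages from a database.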
Where to start? I see some easy possibilities for building demonstrators: (a) adapt the EPrints page templates so that displayed metadata includes microformat (including RDFa) markup. (b) create a Semantic MediaWiki template for bibliography entries that exports the bibliography as (say) RDFa using some existing bibliographic RDF schema (e.g. there is an OWL ontology for BibTeX). We'd need to check that MediaWiki allows the additional XML markup to be generated, but otherwise it should be easy enough to do. (c) develop a harvester that collects RDF from microformats (including RDFa) and re-presents that as a SPARQL endpoint. (d) implement a GRDDL transformation that extracts RDF data from FlyBase XML reports.
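For possibility (c), the core of a harvester is the step that turns harvested properties into RDF. The sketch below assumes the properties have already been extracted from a page and are held in a plain dictionary; it writes them out as N-Triples, which a triple store with a SPARQL endpoint could then load. Mapping hCard property names directly onto the W3C vCard namespace, the subject URI scheme and the function names are all assumptions made for illustration, not an agreed mapping.

```python
# Assumption: hCard property names are mapped one-to-one onto this namespace.
VCARD_NS = "http://www.w3.org/2006/vcard/ns#"


def escape_literal(text):
    """Escape a string for use inside a double-quoted N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def hcard_to_ntriples(page_url, index, card):
    """Turn one harvested card into a list of N-Triples statements.

    page_url: the page the card was found on
    index:    position of the card on that page, used to mint a subject URI
              (a hypothetical naming scheme)
    card:     dict mapping property name to its text value
    """
    subject = f"<{page_url}#hcard-{index}>"
    lines = []
    for prop in sorted(card):
        predicate = f"<{VCARD_NS}{prop}>"
        lines.append(f'{subject} {predicate} "{escape_literal(card[prop])}" .')
    return lines


if __name__ == "__main__":
    harvested = {"fn": "Jane Example", "org": "Example University"}
    for line in hcard_to_ntriples("http://example.org/people", 0, harvested):
        print(line)
```

Serving the result through SPARQL is left out here; the sketch only covers the conversion from harvested name/value pairs to triples.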
See also
- DefiningImageAccess/Standard/RDFa - RDFa for semantic data in XHTML
- DefiningImageAccess/Standard/GRDDL - GRDDL for semantic resources in arbitrary XML
Attributes
- Idea: Idea:=Microformat export from Semantic media Wiki
- Idea: Idea:=Microformat export from EPrints
- Idea: Idea:=Microformat for bibliographies
- Idea: Idea:=Microformat tool to convert from BIBTex and similar formats (hmmm... at this level, I'd really prefer RDFa)
- Idea: Idea:=Microformat harvester and SPARQL endpoint

