Brian McConnell’s latest coding effort, the World Wide Lexicon Toolbar,
meets my criterion for a piece of critical infrastructure: after two
days with it I can’t get along without it, and I plan to avoid any
browser that doesn’t have it installed.
Brian is a highly adaptive programmer. With roots in the telecom
industry and several start-ups on his resume, he also wrote
Beyond Contact: A Guide to SETI and Communicating with Alien Civilizations
for O’Reilly. The
World Wide Lexicon project
he’s been working on for the past several years is again something
totally different.
Install the add-on
(currently experimental) in Firefox 3.5 or higher
and visit a page in some language other than your default. Before your
eyes, headings and text change into your native language. You can get
similar effects by submitting the page to a popular translator such as
Google (which is one of the tools used behind the scenes by the WWL
toolbar), but the instantaneous effect of the toolbar makes you feel
closer to the people whose sites you visit around the world.
I know several languages well enough to get the gist of a page, but I
miss some of the details and get frustrated by gaps in my
vocabulary. Therefore, I set the WWL toolbar to “Bilingual
view,” so each block element of the original text is shown together
with its translation. The bilingual view is considerably less
attractive, because it swells the size of each block element, but I
can tell already that it will improve my language skills quickly.
WWL is designed for volunteer translations. If it becomes more
popular, people will submit translations that are much more accurate
than the machine-generated ones that WWL must currently fall back on.
What’s the process behind this new dimension to web browsing?
McConnell let me in on some of the magic.
Volunteer translations
McConnell invented WWL several years ago with the core notion of
encouraging people to translate web pages they thought should get a
wider audience. When he first told me about the idea, I was skeptical
that he would get many volunteers. But then I heard of other volunteer
translation efforts. For instance, there’s a whole subculture of
people who write subtitles for popular Hollywood films. These subtitles
run afoul of copyright law, of course (and so, probably, do the copies
of the movies they’re attached to), but they show the lengths to which
crowdsourcing has progressed in the translation area.
FLOSS Manuals,
a project I do volunteer work for, also finds dozens of people willing
to translate its open source documentation.
McConnell’s first set of tools was designed to facilitate on-the-fly
translations. Web designers could enhance their web sites by
downloading from the WWL site some JavaScript that made each text
element on the page editable. (I
blogged
about this in December 2007.) The paste-in displayed a little pencil
icon, signaling to viewers that they could do instant
translations. All they would have to do was click on an element, and a
text box would pop up where they could enter their translation. The
web site would then register the translation with the central WWL
site.
World Wide Lexicon API
The WWL API covers the entire life cycle of a translation: registering
a translation, rating translations for quality, searching for a
translation of a particular page into a particular language, and
retrieving a translation. Queries can specify a minimum rating.
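To make that life cycle concrete, here is a minimal Python sketch of the four operations as an in-memory store. It is not the real WWL API: the class, method, and field names are all hypothetical, and averaging the votes is just one plausible way to turn ratings into a score.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Translation:
    url: str    # page the translation belongs to
    lang: str   # target language code, e.g. "en"
    text: str   # the translated text
    ratings: list = field(default_factory=list)  # quality votes

    @property
    def rating(self) -> float:
        # Assumption: unrated translations score 0.0, so any positive
        # minimum rating filters them out.
        return mean(self.ratings) if self.ratings else 0.0


class TranslationStore:
    """Hypothetical in-memory stand-in for the central WWL service."""

    def __init__(self):
        self._items = []

    def register(self, url, lang, text):
        """Record a new translation and return it."""
        item = Translation(url, lang, text)
        self._items.append(item)
        return item

    def rate(self, item, score):
        """Add one quality vote to a translation."""
        item.ratings.append(score)

    def search(self, url, lang, min_rating=0.0):
        """Return matching translations, best rated first."""
        hits = [t for t in self._items
                if t.url == url and t.lang == lang and t.rating >= min_rating]
        return sorted(hits, key=lambda t: t.rating, reverse=True)
```

A client would call `register` when a volunteer submits text, `rate` when a reader votes, and `search` with a `min_rating` when it wants only translations that other readers have vouched for.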
Toolbar
The latest achievement of the WWL project is the toolbar officially
released yesterday. It determines the user’s native language through
settings in the browser. When each page is visited, the toolbar uses
the domain name and various tests on the text to make a guess about
its language.
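The toolbar’s actual tests aren’t spelled out here, but the general idea of combining a domain-name hint with tests on the text can be sketched in a few lines of Python. Everything below is an assumption for illustration: the tiny word lists, the weight given to the domain, and the function name.

```python
from urllib.parse import urlparse

# Hypothetical hints; a real detector would use far larger tables.
TLD_HINTS = {"fr": "fr", "de": "de", "es": "es"}
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is"},
    "fr": {"le", "la", "et", "les", "des"},
    "de": {"der", "die", "und", "das", "ist"},
    "es": {"el", "los", "y", "que", "es"},
}


def guess_language(url, text):
    """Guess a page's language from its domain and its most common words."""
    words = text.lower().split()
    # Text test: count how many words are common words of each language.
    scores = {lang: sum(1 for w in words if w in stops)
              for lang, stops in STOPWORDS.items()}
    # Domain test: a country-code domain nudges the guess (weight is arbitrary).
    tld = (urlparse(url).hostname or "").rsplit(".", 1)[-1]
    hint = TLD_HINTS.get(tld)
    if hint:
        scores[hint] += 2
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None means "no idea"
```

Under these assumptions, a `.fr` page whose text is full of “le” and “et” comes back as `"fr"`, while a page with no recognizable words on a generic domain yields `None`.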
The toolbar then issues an API query to see whether any human
translations exist. If so, it displays the translations with a light
yellow or green background.
If no one has made a human translation (which is usually the case so
far) the toolbar resorts to well-known machine translation
services. It can make use of
Google Translate,
Apertium, and
Moses,
each of which offers an API, and will also query Babelfish when its
API is ready. Machine translations are displayed with a light blue
or grey background.
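The order of preference described above (human translations first, then machine services in turn) is a classic fallback chain. Here is a rough Python sketch of it. The function and parameter names are hypothetical, the engines are plain callables rather than real Google, Apertium, or Moses clients, and the specific colors in the table are my own picks from the ranges mentioned.

```python
def translate_block(text, human_lookup, machine_engines):
    """Return (translation, source), preferring human work over machines."""
    human = human_lookup(text)
    if human is not None:
        return human, "human"
    for engine in machine_engines:
        try:
            result = engine(text)
        except Exception:
            continue  # this engine is unavailable; try the next one
        if result:
            return result, "machine"
    return text, "original"  # nothing worked: leave the text untouched


# Assumed mapping from source to a CSS background color.
BACKGROUNDS = {
    "human": "lightyellow",
    "machine": "lightblue",
    "original": "transparent",
}
```

Returning the source alongside the text is what lets the display layer choose a background, so the reader can always tell who (or what) produced the words.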
The progressive translation used by the toolbar is also interesting.
It starts with the first 10 or 20 elements, then translates heading
tags (<H1>, etc.), then the larger texts, and ultimately every element
on a page. (I displayed one page that embedded a Google ad, and the
translator recognized and translated that text too.) McConnell is
working on making the various translations run in parallel. Because
translation changes the sizes of elements, the toolbar makes various
accommodations to display the page as attractively as it can.
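The progressive ordering can be sketched as a function that decides which elements to translate first. This is only my reading of the behavior described above, not the toolbar’s code: the function name, the `(tag, text)` representation, and the 200-character threshold for “larger texts” are assumptions.

```python
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def translation_order(elements, first_n=10, long_text=200):
    """Return element indices in the order they should be translated.

    elements is a list of (tag, text) pairs in document order.
    """
    order, seen = [], set()

    def take(indices):
        # Append each index once, preserving the order it is offered in.
        for i in indices:
            if i not in seen:
                seen.add(i)
                order.append(i)

    take(range(min(first_n, len(elements))))            # top of the page
    take(i for i, (tag, _) in enumerate(elements)
         if tag.lower() in HEADINGS)                    # then headings
    take(i for i, (_, text) in enumerate(elements)
         if len(text) >= long_text)                     # then larger texts
    take(range(len(elements)))                          # finally the rest
    return order
```

Each pass skips anything an earlier pass already claimed, so every element is translated exactly once while the parts a reader sees first are handled first.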
In short, WWL is a cool combination of mash-ups, existing services,
crowdsourcing, and Ajax. I’m sure that in a year’s time I’ll think
back to its appearance today and be shocked at how primitive it was.
But it will remain a transformative tool for me.