International Federation of Library Associations and Institutions
diglib@infoserv.inist.fr
Digital Libraries Research mailing list
   
 

Making of America 7th anniversary Lars Aronsson
  • From: Lars Aronsson <lars@aronsson.se>
  • To: diglib@infoserv.inist.fr
  • Subject: Making of America 7th anniversary
  • Date: Mon, 22 Mar 2004 23:58:18 +0100 (CET)
On March 21, 1997, I read on this list the first announcement of the
"Making of America" digital library at the University of Michigan,
http://infoserv.inist.fr/wwsympa.fcgi/arc/diglib/1997-03/msg00054.html

At that time, MoA contained 200,000 pages in digital facsimile and was
aiming for 650,000 pages by mid-year. As of May 2003, MoA claims 3.2
million pages from 11,063 volumes of books and magazines,
http://www.hti.umich.edu/m/moagrp/

Ever since I started to "put old books on the Internet", I had been
looking for a more scalable solution than typing and proofreading, and
MoA showed that digital facsimile was the way to go. As far as I know
(and knew), MoA was the first project to demonstrate the feasibility
of this method on a large scale. I made sure Peter Suber added the
date to his Timeline of the Open Access Movement,
http://www.earlham.edu/~peters/fos/timeline.htm

My rule of thumb is to equate 20,000 pages with one linear meter of
shelving. This puts the originally announced MoA at 10 meters, which
is not much compared to any "real" library. Still, in the age before
Google (launched 1998), this was an enormous event on the Internet.
Today's MoA is 160 meters of shelving and has been joined by a large
number of other digital libraries in different countries. For
example, the German GDZ (http://gdz.sub.uni-goettingen.de/en/) has 1.8
million pages (90 meters) from 4,670 volumes online. Even small
countries like Denmark, Norway, and Sweden each have some 200,000
pages (10 meters) online.
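
The rule of thumb above is easy to check with a few lines of Python. This is only a sketch: the constant comes from the text (20,000 pages per linear meter), while the function name and the dictionary of collections are illustrative choices, not part of any project's tooling.

```python
# Rule of thumb from the text: 20,000 pages fill one linear meter of shelving.
PAGES_PER_METER = 20_000


def shelf_meters(pages: int) -> float:
    """Convert a page count to an approximate length of shelving in meters."""
    return pages / PAGES_PER_METER


# Page counts quoted in the message.
collections = {
    "MoA as announced in 1997": 200_000,
    "MoA as of May 2003": 3_200_000,
    "GDZ": 1_800_000,
}

for name, pages in collections.items():
    print(f"{name}: {shelf_meters(pages):g} meters")
```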

When MoA and its digital facsimiles entered the scene, there was an
ongoing war between two kinds of e-text projects: on one side the
enthusiastic volunteer teams such as Project Gutenberg, and on the
other side the academically serious initiatives such as the University
of Virginia Electronic Text Center. The main difference was their view
on how time was best spent: On typing another book or on proofreading
the first book one more time. Either way, you just cannot know
whether you have got everything right. Digital facsimiles provided a
solution: Whenever you are in doubt about the correctness of an
e-text, you can consult the image of the printed page.

Those who thought that volunteer projects were dead once "digital
libraries" had entered this new, industrial phase, were so wrong.
Since December 2000, Project Gutenberg's Distributed Proofreaders
(www.pgdp.net) has used this method to produce e-texts for Project
Gutenberg, currently at a rate of 4 or 5 thousand pages per day
(scanned and fully proofread) or 200-300 books per month. In the last
week, 1129 volunteers helped proofread books at PGDP using the web
browser as their only tool. Since PGDP started, 2.5 million pages
(125 meters) have been fully proofread in this way.

The Christian Classics Ethereal Library (www.ccel.org) was earlier to
pick up the digital facsimile technology. They haven't reached the
same production rate as PGDP, but use the technology a little
differently. Whether a text has been proofread or not, the facsimile
images are always available for checking the text again.

The same combination of online proofreading and digital facsimile
images is used by the German volunteer initiative to digitize the 4th
edition of Meyers Konversations-Lexikon, a 16-volume encyclopedia from
1888 (www.meyers-konversationslexikon.de), 16,000 pages (0.8 meters),
where the last volume was scanned just the other day.

Project Runeberg (www.runeberg.org) was my four-year-old Scandinavian
e-text project when I learned about the Making of America website.
It then took more than a year before I quit my day job, bought a new
scanner, and started to find out how to use it. By January 1999 I had
digitized 20,000 pages (1 meter) and had a working set of data
structures, scripts and routines for the whole chain from buying old
books, chopping off the spine, feeding the pages through the scanner,
to producing static frames-free HTML pages, each containing one
facsimile image and its raw OCR text. This is an improvement because
Altavista and Google index the flat HTML pages and drive lots of
visitors to the website. By June 1999 this was 35,000 pages (1.7
meters), but then things slowed down until after the dotcom bubble
burst. The last couple of years have provided more time for
volunteer projects, and the collection reached 50,000 pages (2.5
meters) in March 2002 and 100,000 pages (5 meters) in June 2003. The
latter includes the complete first two editions of the biggest Swedish
encyclopedia "Nordisk familjebok", 20 + 38 volumes. The original idea
was to let volunteers submit proofread chapters by e-mail, but in
April 2002 a web form for online proofreading (wiki-style) was
introduced, and was an instant success. As of March 2004, the
collection contains 160,000 pages (8 meters) and the biggest Danish
encyclopedia is on its way in.
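
The last step of the chain described above (static, frames-free HTML pages, each with one facsimile image and its raw OCR text) might look roughly like the following. This is a minimal sketch and not Project Runeberg's actual scripts: the function name, the file names, and the page layout are all assumptions made for illustration.

```python
from html import escape


def render_page(title, page_no, image_file, ocr_text,
                prev_href=None, next_href=None):
    """Return one static, frames-free HTML page: facsimile image plus raw OCR text."""
    links = []
    if prev_href:
        links.append(f'<a href="{escape(prev_href)}">previous</a>')
    if next_href:
        links.append(f'<a href="{escape(next_href)}">next</a>')
    heading = f"{escape(title)}, page {page_no}"
    return "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        f"<head><title>{heading}</title></head>",
        "<body>",
        f"<h1>{heading}</h1>",
        f"<p>{' | '.join(links)}</p>",
        # The scanned page image (hypothetical file name).
        f'<img src="{escape(image_file)}" alt="Facsimile of page {page_no}">',
        # The raw OCR text sits in the page as plain text, where search
        # engines can index it and proofreaders can compare it to the image.
        f"<pre>{escape(ocr_text)}</pre>",
        "</body>",
        "</html>",
    ])


print(render_page("Nordisk familjebok", 12, "0012.png",
                  "Raw OCR text of the page goes here.",
                  prev_href="0011.html", next_href="0013.html"))
```

Because each page is a flat file with the OCR text inline, it needs no server-side logic and can be crawled like any other web page.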

The biggest difference between e-text and digital facsimile is that
every book page takes 100 kilobytes instead of 2 kilobytes. Some sort
of broadband connection is necessary for this to be enjoyable. The
main benefit is the photographically exact image of the printed page
that removes all uncertainty that proofreading errors can bring. A
high quality e-text is necessary for some situations (blind readers,
modem download, PDA storage, searching) and careful proofreading is
the most time-consuming (and thus expensive) part of producing the
e-text. Since digital facsimile technology allows distributed online
proofreading using volunteers, it provides the best method for
producing e-texts.
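
To put the size difference in numbers, here is a small sketch using the per-page sizes from the text (100 kilobytes for a facsimile, 2 kilobytes for e-text). The link speeds (a 56 kbit/s modem and a 1 Mbit/s broadband line) and the function names are assumptions chosen for illustration, and a kilobyte is taken as 1000 bytes.

```python
FACSIMILE_KB = 100  # per page, from the text
ETEXT_KB = 2        # per page, from the text


def total_gigabytes(pages: int, kb_per_page: float) -> float:
    """Storage for a whole collection, taking 1 GB = 1,000,000 kB."""
    return pages * kb_per_page / 1_000_000


def transfer_seconds(kilobytes: float, kbit_per_s: float) -> float:
    """Ideal transfer time for one page, ignoring protocol overhead."""
    return kilobytes * 8 / kbit_per_s


pages = 160_000  # Project Runeberg's collection as of March 2004
print(f"facsimile: {total_gigabytes(pages, FACSIMILE_KB):g} GB")
print(f"e-text:    {total_gigabytes(pages, ETEXT_KB):g} GB")

# Assumed link speeds, not figures from the text.
for label, speed in [("56 kbit/s modem", 56), ("1 Mbit/s broadband", 1000)]:
    print(f"{label}: {transfer_seconds(FACSIMILE_KB, speed):.1f} s per facsimile page, "
          f"{transfer_seconds(ETEXT_KB, speed):.2f} s per e-text page")
```

Under these assumptions a single facsimile page takes about a quarter of a minute over a modem but under a second over broadband.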

With scanners such as the Canon DR-2080C and the ABBYY FineReader OCR
software, projects starting at a few thousand dollars can produce
digital facsimile editions at costs below $1 per page, ranging to
$0.25 per page for very large projects such as the Making of America.

My congratulations to the people in Ann Arbor and best wishes for the
next 7 years.


--
Lars Aronsson (lars@aronsson.se)
Classic Nordic Literature online since 1992 - http://runeberg.org/
