Project Gutenberg (PG) began in 1971 when Michael Stern Hart [1] was
given one hundred million dollars' worth of computer time by the operators
of the Xerox Sigma V mainframe at the Materials Research Laboratory at the
University of Illinois. Mr Hart suggests that he happened to be in the
right place at the right time as there was more computer time than people
knew what to do with and the operators were encouraged to do whatever they
wanted with that fortune in “spare time” in the hope that they would
become more proficient at their jobs. After due reflection, Mr Hart
decided that one of the most effective uses of computers would be the
storage, retrieval, searching and reading of material stored in computer
libraries. He then proceeded to key in the American Declaration of
Independence and produced the first electronic text (etext) in the PG
library. The rest, as they say, is history.
Creation
of an etext of the Declaration of Independence was
followed by the American Bill of Rights, the US
Constitution, the Bible, Shakespeare (a play at
a time), and then by general work in the areas of
literature and reference. From December 1971 to December
1993 one hundred etexts were produced. This was no mean
feat when one considers that the list includes
Shakespeare, the Bible and other considerable
works. All had to be keyed in and then checked by proof
reading and comparison with the printed work.
Appropriately, and not coincidentally, etext one hundred
was The Complete Works of William Shakespeare.
Now, with
the advent of computer scanners (which enable one to “read in” printed
pages and convert them to editable electronic text) and the increase in
popularity of the Internet, there are over three thousand six hundred
etexts available in the Project Gutenberg library and Mr Hart recently
announced that, for the first time, more than twenty new etexts were
posted to the library in one week. This represents a prodigious effort by the
many volunteers involved in converting printed works into etext. The aim is to
reach ten thousand texts.
One might
think that the pool of printed works will run dry; however, this can never
happen because every year new works become available as the copyright on
them runs out. Furthermore, volunteers have begun the work of converting
to etexts the literary gems of other languages, thus opening further rich
veins of literary ore for plundering.
Electronic Data
The
premise on which Michael Hart based the Project Gutenberg concept was that
electronic data stored in a computer can be reproduced indefinitely by
passing it from computer to computer. Once a book or any other item
(including pictures and sounds) has been stored in a computer then any
number of copies can be made. Everyone in the world, or even not in this
world (given satellite transmission), can have a copy of a book that has
been entered into a computer. When people holiday on Mars, later this
century, they might have a copy of Homer's Iliad beamed up to them.
The book that they always meant to read. They would only need to specify
the required language.
It was
decided to store etexts in the simplest, easiest to use form available:
the “plain vanilla” or ASCII [2] format, the basic characters one
reads on a normal printed page. Italics, underlines, and bolds would be
capitalized as they are not supported by many basic text readers. This
decision was made because 99% of the hardware and software in use all over
the world can read and search these files. Any other system of etext
storage will fall short of an audience of 99%. Furthermore, etexts stored
in this format are easily converted to many other formats, such as that
used in word processing and that used to represent text on Internet web
pages (i.e. HTML [3]).
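The capitalisation convention described above can be sketched in a few lines of Python. This is only an illustration and not the procedure PG volunteers actually follow. It assumes the source text marks emphasis with underscores or asterisks (a hypothetical convention that the article does not specify), and the function name `to_plain_vanilla` is invented for the example. Characters outside ASCII are crudely replaced with a question mark.

```python
import re

# Hypothetical emphasis markers: _italic_ and *bold*. These are assumptions
# made for illustration; the article does not name any input markup.
_EMPHASIS = re.compile(r"(?<!\w)([_*])(\S(?:.*?\S)?)\1(?!\w)")


def to_plain_vanilla(text: str) -> str:
    """Capitalise marked emphasis and reduce the text to 7-bit ASCII."""
    capitalised = _EMPHASIS.sub(lambda m: m.group(2).upper(), text)
    # Any character outside ASCII becomes "?" (a crude, lossy fallback).
    return capitalised.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    print(to_plain_vanilla("Alice was _very_ tired of *sitting* by her sister."))
```

A real conversion would need a gentler treatment of accented letters than a question mark, and emphasis that runs across a line break is not handled here.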
Michael
Hart has said that he wants people to be able to use PG etexts to look up
quotations they have heard in conversation or in movies, or which they
have read in other books. He envisages a compact disc (CD) containing all
PG titles, which will constitute a library containing all these quotations
within the individual etexts. One could easily search the entire library
without any program more sophisticated than a plain search program found
on every personal computer.
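To show how little is needed for such a search, here is a minimal sketch in Python. It assumes the etexts are stored as `.txt` files somewhere under a single directory; the function name `find_quotation` and the directory layout are hypothetical and are not part of any PG tool.

```python
from pathlib import Path


def find_quotation(library_dir, phrase):
    """Yield (file name, line number, line) for each line containing phrase."""
    needle = phrase.casefold()
    for path in sorted(Path(library_dir).rglob("*.txt")):
        # errors="replace" keeps the search going if a file is not clean ASCII.
        with path.open(encoding="ascii", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                # Case-insensitive match, one line at a time.
                if needle in line.casefold():
                    yield path.name, number, line.strip()
```

Calling `find_quotation("etexts", "to be, or not to be")` and printing each result would list every file and line in which the phrase appears. Because the search works line by line, a quotation that straddles a line break in the etext would be missed; a fuller version would need to join lines before matching.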
The text
of an average book will fit on a standard 3.5-inch floppy disk, available
on most personal computers. However, pictures such as those in the book
Alice in Wonderland present special problems for electronic
reproduction because of the computer disc space which they take up.
Nevertheless, Project Gutenberg is very interested in including pictures
and other graphics and will continue to take advantage of developments in
computer technology to add to the richness of its library of free, readily
available literary and reference works.
Scope of the library
The
cataloguing and indexing of the library is still under review and is, in
itself, a major undertaking. However, works may be broadly classified as
follows:
Light literature such as Alice in Wonderland, Through the
Looking-Glass, Peter Pan and Aesop's Fables.
Heavy Literature such as the Bible and other religious
documents, Shakespeare, Moby Dick and Paradise Lost.
References such as Roget's Thesaurus, almanacs, a set of
encyclopedias and dictionaries, philosophy and natural history.
There is no substitute for a good book
Many
people point out that there is no substitute for the look, feel and smell
of a book and that it is easy to browse through it, mark relevant passages
and look at the illustrations. This is perfectly true, and one might say
that the use of etexts has, until now, been largely restricted to using
them to find specific references, since one needs to sit at a computer to
view them. Until now, that is.
Sometimes
we must wait for technology to catch up before we can make use of an
existing situation. The Internet existed in only a crude form when Mr Hart
started keying in the Declaration of Independence. We had to wait
for computers to become cheap and ubiquitous for the production of PG
etexts to explode. In the same way, technology is only now making
available portable electronic readers with which we will be able to read
etexts, or have them read aloud to us via text-to-speech software,
wherever we can now read a book. As one sits on Mars and uses a voice
command to open The Iliad to a bookmarked position one might issue
the command “mouldy old paper” to have the reader exude the smell one most
associates with old books.
It is
part of Michael Hart's genius that he saw the potential of Project
Gutenberg and persisted with the concept for over twenty years before
technology turned the project into something beyond, dare I say, even his
wildest dreams. There is no substitute for a good book. It is just that
its present form may not matter all that much to future generations.
Volunteers
The
continuing success of Project Gutenberg depends on volunteers. As Michael
Hart has frequently pointed out, PG is made up entirely of volunteers who
produce etexts, proof read them, post them to the PG Internet site, post
copies on “mirror” sites around the world, maintain the computer hardware
and software involved in the project, correct errors in the text as noted
by end-users, do copyright checks and attend to the many administrative
tasks involved with any major co-operative project.
Volunteers
choose which texts they wish to work on and hence which etexts are posted
to the PG site. Since any book out of copyright [4] may be used,
there is a bewildering choice of titles. Any title chosen is subject to a
copyright “clearance” after which it will usually be accepted for posting.
Some volunteers prefer to proof read work prepared by others. Or, one may
become involved in “helping” Mr Hart put the finishing touches to texts
before posting, such as adding headers and footers or making minor
formatting changes.
When you
are reading your etext of The Iliad whilst holidaying on Mars, spare
a thought for the prodigious amount of work which has been undertaken by
Michael S. Hart and the PG team to bring it to you just when and where you
want it.
Project Gutenberg on the Internet
The
official PG site may be found at http://www.promo.net/pg/. A regular
newsletter is produced and information is provided about volunteering.
The
Bikwil site has a link to PG at http://www.bikwil.com/LinksEtexts.html.
It is rumoured that Tony Rogers exhibits unseemly enthusiasm about
the PG site.
For a
list of Australian texts on PG, try http://www.gutenberg.net.au/.
Conclusion
When
Johann Gutenberg invented the printing press he unleashed an unstoppable
process which facilitated communication between members of the human race
and the passing of knowledge and ideas in ways previously undreamed of.
The invention of the computer and the expansion of the Internet have
extended the capacity to pass on such knowledge and ideas. Project
Gutenberg, as the repository of the condensed knowledge and ideas of some
of the greatest minds in human history, contributes in no small way to
this process.
Acknowledgments
Much of
the background information in this article was drawn from the Project
Gutenberg Internet site at http://www.promo.net/pg/.
[ Col runs the Australian Project Gutenberg site
mentioned above, and he can be contacted via email at colc@gutenberg.net.au.
]

[1] Michael S. Hart, Professor of Electronic Text at Benedictine University
(Illinois, U.S.A.) and Visiting Scientist at Carnegie Mellon University
(Pennsylvania, U.S.A.), founded Project Gutenberg in 1971 and is currently
its Executive Director. In a November 1998 article in Wired
Magazine, Hart was chosen among The Wired 25: A Salute to Dreamers,
Inventors, Mavericks, and Leaders. (See
http://www.honco.net/9904/contributors.html#TOPICS4
dated July 1999).
[2] ASCII is an acronym for American
Standard Code for Information Interchange, a standard for storing
characters and numbers in computers.
[3] HTML is an acronym for Hyper Text
Markup Language.
[4] In the U.S.A., books are generally out
of copyright seventy-five years after publication. As a rule of thumb,
books published before 1923 are eligible. Full details are provided on the
PG site.