-----
title: "Project Gutenberg: The What, When and Why"
content-type: article
timestamp: 1134215728
tags: "writing|internet"
-----


<p>
    I have always liked reading Shakespeare, and I always wanted a copy of every one of his plays and sonnets on my
    bookshelf, ready for consultation. That always seemed unrealistic, though: I had neither the space for them, nor the
    time to find them all, nor the money to buy them when I did find them.
    <br />
    Now I can store the complete works of William Shakespeare directly on my mobile phone, and they take up as little as
    1.4 MB compressed...
</p>
<h3>Origins</h3>
<p>
    Even if you have never heard the word "e-book"[1] before, you can probably guess its meaning: <em>electronic
    book</em>, or a book in digital format. What you probably don't know is that people started copying books into
    digital format nearly as soon as computers were available to the public, and maybe even before: the first
    <em>e-book</em> was created in 1971.
</p>
<p>
    That year, a student at the University of Illinois named Michael Hart was given the equivalent of $100,000,000 (or
    $100,000, or $1,000,000; there is no official estimate) in <em>computer time</em>. Because he was friends with some
    of the operators at the Materials Research Lab, he was given an operator account on the Xerox Sigma V mainframe,
    which later became one of the 15 nodes of the network that eventually grew into the Internet. At that time, having
    that much computer time at your disposal was a great privilege, and Hart felt that he had to use it for something
    useful that could, in theory, generate a profit. That was not an easy task, considering that only a limited number
    of people in the world had access to a computer, and that those computers weren't even connected together.
</p>
<p>
    Foreseeing an era in which computers were interconnected and regular people had access to them, Michael Hart
    thought that virtually all texts and books could be made available in digital format, for free, to anyone who
    wanted to read them. Such a <em>project</em> certainly seemed unrealistic and excessively time-consuming at the
    time; nevertheless, he decided to start by copying the first text himself: the Declaration of Independence of the
    United States, which he was carrying in his backpack.
</p>
<p>
    Project Gutenberg[2] was born with that one single text, and it has grown through the years. Today, there are more
    than 16,000 e-books available to download and read.
</p>



<h3>What is Project Gutenberg?</h3>
<p>
    With that name, Michael Hart probably wanted to convey the project's scope and vision: an idea as revolutionary for
    the diffusion of literature as the invention of movable type printing[3] in the 1450s.
</p>
<p>
    The mission of the project can be summarized as follows[4]:
</p>
<div style="text-align: center;"><em> "To encourage the creation and distribution of eBooks."<br /> </em></div>
<p>
    To achieve this, Project Gutenberg is set up so that <em>anyone</em> can contribute to it, in many different ways.
    It is run entirely by volunteers, hundreds of people around the world who share the same ideals and believe that
    literature should be freely available to everyone at virtually no cost.
</p>
<p>
    The Internet serves this purpose magnificently: all of the more than 16,000 free e-books can be downloaded from the
    Project Gutenberg website[5], in different formats and in many different languages[6]!
</p>
<p>
    However, having such a large number of books available within a few clicks can make people forget how
    time-consuming it is to produce even a single e-book: originally, after acquiring a paper copy of the book,
    Gutenberg's volunteers had to transcribe it themselves, typing every word from beginning to end. Then the book had
    to be checked for mistakes before it was accepted into the Project.
</p>
<p>
    Producing a single e-book can therefore take many people and many hours from beginning to end, and this was
    presumably one of the reasons why Project Gutenberg was criticized for being more of a utopian ideal than a
    tangible reality: every year since its creation, people have doubted the project, accusing Hart of pursuing an
    impossible dream and prophesying that fewer and fewer people would join the team and that Project Gutenberg had no
    future.
</p>
<p>
    Oddly enough, they were all wrong: not only is the Project still active today, but the number of books released
    every year has grown consistently over time, from a few dozen in the early days to thousands per year now.
</p>
<p>
    More and more people became involved, partly because they share the same ideals and partly because it has always
    been easy to get involved[7]: Project Gutenberg strives to remove all the institutional barriers that could
    interfere with members' motivation; it tries not to impose any restrictions, and it does not encourage
    perfectionism. The Project's philosophy[8] is that there shouldn't be any <em>proper</em> or <em>standard</em> way
    to release e-books, but rather many different ways, to appeal to many tastes. It therefore doesn't favor any
    particular standard for releasing e-books, although it normally takes the simplest path. As a result, the majority
    of the books are available in <em>Plain Vanilla ASCII</em>, i.e., texts are written using only ASCII characters,
    and bold, italicized or underlined words are capitalized instead. While this format is the most limited, it is
    also the most portable.
</p>
<p>
    At this point, you might wonder why they don't simply scan the original books and make them available as image
    files or PDF files. While that would be much faster, it also has disadvantages, such as large file sizes and the
    inability to adapt the page to different screen sizes and resolutions: a scanned book probably wouldn't be readable
    on a PDA, mobile phone, or other similarly small device.
</p>
<p>
    Nonetheless, scanners now play an important part in the process of making an e-book: texts are no longer copied
    manually if a printed edition already exists. Instead, they are scanned, converted to text with OCR[9], and then
    proofread twice before being accepted. The (un)official procedure recommends scanning at least one page a day,
    having it proofread once by someone in charge of doing so (a "junior" proofread), and then again by a more
    experienced member. This has undoubtedly sped up the process.
</p>


<h3>Not All Books Are Equal (for now)</h3>
<p>
    By looking at some of the titles available on Project Gutenberg, you'll notice that most of them are
    <em>classics</em> or relatively old works: for example, you won't find the latest <em>Harry Potter</em>[10]
    available for download.
</p>
<p>
    Since <em>all</em> of the books at Project Gutenberg are free to download (more details of the license will be given
    later on), and therefore not subject to fees or copyrights, only books in the public domain[11] can generally be
    included in the Project.
</p>
<p>
    The public domain includes all those works whose intellectual property cannot be legally claimed or exploited by
    any person, institution or legal entity, and which therefore belong to all mankind. In the case of books, copyright
    expires <em>only if</em> certain conditions are met:
</p>
<ul>
    <li>The work was created and first published before January 1, 1923, or at least 95 years before January 1 of the
        current year, whichever is later.</li>
    <li>The last surviving author died at least 70 years before January 1 of the current year.</li>
    <li>The work is not protected by a <em>perpetual copyright</em> under the Berne Convention, and no particular
        government (US or EU) has passed a copyright term extension covering it.</li>
</ul>
<p>
    Now we can see why there are so few <em>new</em> publications available in the project, and that's really
    frustrating for Michael Hart and the other volunteers:
</p>
<em> "In the USA, no copyrights will expire from now to 2019!!! It is even much worse in many other countries, where
    they actually removed 20 years from the public domain. Books that had been legal to publish all of a sudden were
    not. Friends told me that in Italy, for example, all the great Italian operas that had entered the public domain are
    no longer there... Same goes for the United Kingdom. Germany increased their copyright term to more than 70 years
    back in the 1960's. It is a domino effect. Australia is the only country I know of that has officially stated they
    will not extend the copyright term by 20 years to more than 70."</em>[12]
<p>
    After all these considerations, we can take a closer look at Gutenberg's license[13], which comes in two versions:
    <em>informative</em> and normative ("legalese", as they call it), the latter being the legally binding document.
    Luckily, the non-legalese version is simple and complete enough: basically, PG releases books that are either in
    the public domain or, if copyrighted, whose authors gave express permission to redistribute them. The difference
    is that if you remove PG's trademark and license from a book in the public domain, you can redistribute it freely
    on your own, but if the book is copyrighted and permission to distribute was given <em>only</em> to PG, you'll have
    to contact the author to obtain permission.
</p>
<p>
    Furthermore, anybody can use the PG trademark when distributing <em>verbatim</em> copies of a book, with no changes
    (re-formatting is allowed); if you want to charge money for the copies you distribute, you have to pay royalties to
    PG.
</p>


<h3>Satellite Sites and Similar Projects</h3>
<p>
    Michael Hart was, and still is, an authentic pioneer in his field: he had the idea to create the largest free
    library on the Internet to <em>"Break Down the Bars of Ignorance and Illiteracy"</em>. A lot of people thought he
    wouldn't achieve anything, but his dedication and perseverance were so exemplary that more and more people got
    involved, a few satellite sites were created, and similar projects sharing the same goals were started all over the
    world.
</p>
<p>
    Hart is obviously aware of the fact that there are also some sites <em>selling</em> e-books, but he explains that
    neither those sites nor any other free online library should be considered a competitor to Project Gutenberg: they
    all contribute to the diffusion of e-books.
</p>
<p>
    One of the most important <em>satellite sites</em> of PG is "Distributed Proofreading"[14], which is now
    considered the main source of PG books: every month more than 100 books are proofread by hundreds of volunteers,
    who can register on the site for free and then join the project. The key concept of this parallel organization is
    that a single book can be proofread by more than one person at the same time, thereby speeding up a project that
    would otherwise be very difficult to coordinate.
</p>
<p>
    Another site that helps the main project is HWG, the HTML Writers Guild[15]. It aims to convert PG's plain text
    e-books into more feature-rich HTML documents: using a markup language makes it possible to add footnotes, and the
    resulting documents can be analyzed easily by automated tools.
</p>
<p>
    Although Project Gutenberg releases well-known books in many languages, a few sites officially affiliated with the
    project were created to focus on their own regional literature. That's the case for both Australia[16] and
    Germany[17], for example; each focuses on its own national heritage. As for the latter, it recently claimed its own
    copyright on its e-books, and thus a new foundation is being created: Project Gutenberg Europe[18], which aims,
    among other things, to address the myriad copyright issues and laws of the EU.
</p>
<p>
    Last but not least, there's an interesting discussion[19] about similarities and differences between Project
    Gutenberg and Wikisource[20], a sister project of Wikipedia[21] that aims to create a free repository of texts that
    are either in the public domain or licensed under the GFDL[22].
</p>
<p>
    The Wikisource community obviously noticed that their project was quite similar to PG, but with an important
    difference: their texts were formatted and freely editable by any user who spotted a mistake or inaccuracy; PG
    doesn't offer this. In this context, Project Gutenberg was sometimes criticized for allowing inaccurate material
    into the project: even though PG uses the Distributed Proofreading website to proofread e-books, that process is
    not comparable to a wiki system, where any reader can correct an error at any time. However, in PG's defense, wiki
    articles, being much more open, are subject to much more vandalism, and therefore must be watched more closely. One
    can imagine a high school student changing <em>Hamlet</em> to read "To be or not to be, who gives a crap."
</p>
<p>
    Nevertheless, the members of Project Gutenberg have proposed a sort of mutual cooperation between PG and
    Wikisource: Wikisource should maintain a broader scope, focusing not only on literary works but also on quotations
    and other kinds of texts, while providing revised editions of some books to Project Gutenberg.
</p>

<h3>The Future of Project Gutenberg</h3>
<p>
    Project Gutenberg has demonstrated its ability to grow considerably during its more than 30 years of existence.
    During that same time, copyright laws were extended and some new technologies tried to <em>intimidate</em> the
    Project, yet it seems to have remained relatively unchanged. However, last year a long-awaited DVD containing all
    the Project's e-books was released, showing the world that PG can keep up with the progress of technology, at least
    to a certain extent.
</p>
<p>
    One aspect that makes PG a successful project even today is its ability to adapt: CD-ROMs and a DVD were released,
    OCR was almost immediately taken into consideration, and since last year, all e-books have been released in both
    plain text and HTML format: there are still no fixed standards or rigid guidelines, but common sense seems to
    prevail over chaos, and for now, the system works.
</p>
<p>
    So far, Michael Hart has shown the entire world that a single person can do <em>a lot</em> when pursuing a noble
    goal. Call him an idealist, call him a dreamer, but he surely created something able to gratify and motivate him
    and his fellow volunteers forever:
</p>
<em>"I can't think of anything more rewarding to do as a career than Project Gutenberg. It is something that will reach
    more people than any other project in all of history. It is as powerful as The Bomb, but everyone can benefit from
    it."</em>[12]

<h3>Notes &amp; Further Readings</h3>
<ul>
    <li>[1] Ebook, Wikipedia page - <a href="http://en.wikipedia.org/wiki/Ebook">http://en.wikipedia.org/wiki/Ebook</a>
    </li>
    <li>[2] Project Gutenberg, Wikipedia page - <a
            href="http://en.wikipedia.org/wiki/Project_Gutenberg">http://en.wikipedia.org/wiki/Project_Gutenberg</a>
    </li>
    <li>[3] Movable type, Wikipedia page - <a
            href="http://en.wikipedia.org/wiki/Printing_press">http://en.wikipedia.org/wiki/Printing_press</a></li>
    <li>[4] Project Gutenberg FAQ0 - <a
            href="http://www.gutenberg.org/about/faq0">http://www.gutenberg.org/about/faq0</a></li>
    <li>[5] Project Gutenberg Official Website - <a href="http://www.gutenberg.org">http://www.gutenberg.org</a></li>
    <li>[6] Project Gutenberg's catalog - <a
            href="http://www.gutenberg.org/catalog/">http://www.gutenberg.org/catalog/</a></li>
    <li>[7] Project Gutenberg's volunteering page - <a
            href="http://www.gutenberg.org/info/volunteer">http://www.gutenberg.org/info/volunteer</a></li>
    <li>[8] Project Gutenberg FAQ3 - <a
            href="http://www.gutenberg.org/about/faq3">http://www.gutenberg.org/about/faq3</a></li>
    <li>[9] Optical Character Recognition, Wikipedia page - <a
            href="http://en.wikipedia.org/wiki/Optical_character_recognition">http://en.wikipedia.org/wiki/Optical_character_recognition</a>
    </li>
    <li>[10] "Harry Potter and the Half-Blood Prince", Scholastic Inc. website - <a
            href="http://www.scholastic.com/harrypotter/books/prince/index.htm">http://www.scholastic.com/harrypotter/books/prince/index.htm</a>
    </li>
    <li>[11] Public Domain, Wikipedia page - <a
            href="http://en.wikipedia.org/wiki/Public_domain">http://en.wikipedia.org/wiki/Public_domain</a></li>
    <li>[12] "The Second Gutenberg Interview with Michael Hart", Sam Vaknin, Ph.D. - <a
            href="http://samvak.tripod.com/busiweb29.html">http://samvak.tripod.com/busiweb29.html</a></li>
    <li>[13] Project Gutenberg license - <a href="http://www.gutenberg.org/license">http://www.gutenberg.org/license</a>
    </li>
    <li>[14] Project Gutenberg's Distributed Proofreading - <a
            href="http://www.pgdp.net/c/default.php">http://www.pgdp.net/c/default.php</a></li>
    <li>[15] HTML Writers Guild Project Gutenberg - <a href="http://gutenberg.hwg.org/">http://gutenberg.hwg.org/</a>
    </li>
    <li>[16] Project Gutenberg Australia - <a href="http://gutenberg.net.au/">http://gutenberg.net.au/</a></li>
    <li>[17] Project Gutenberg Germany - <a href="http://gutenberg.spiegel.de/">http://gutenberg.spiegel.de/</a></li>
    <li>[18] Project Gutenberg Europe - <a href="http://gutenberg.nl/">http://gutenberg.nl/</a></li>
    <li>[19] Wikisource and Project Gutenberg, Wikisource page - <a
            href="http://wikisource.org/wiki/Wikisource:Wikisource_and_Project_Gutenberg">http://wikisource.org/wiki/Wikisource:Wikisource_and_Project_Gutenberg</a>
    </li>
    <li>[20] Wikisource main page - <a
            href="http://wikisource.org/wiki/Main_Page">http://wikisource.org/wiki/Main_Page</a></li>
    <li>[21] Wikipedia main page - <a href="http://www.wikipedia.org/">http://www.wikipedia.org/</a></li>
    <li>[22] GNU Free Documentation License - <a
            href="http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License">http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License</a>
    </li>
</ul>