New World Library

Google digitizes University of Texas' books in its campaign to bring more books online.

by Chandler Harris / July 5, 2007

Any day now, trucks will start arriving on weekdays at the University of Texas (UT) at Austin to haul away thousands of books. The books will be ferried to undisclosed locations to be scanned and put into a database posted on Google.

By scanning books from libraries, Google is creating the largest digital database of books in the world through the Google Books Library Project. The effort began in December 2004 and makes books searchable online, the same way Google makes Web sites searchable. For copyrighted books, users see only bibliographic information and book snippets. Books in the public domain, however, can be downloaded from cover to cover.

Because of the promise of digital archives and preservation of books, libraries nationwide are participating in the program.

However, the project has been bogged down by controversy and copyright infringement lawsuits. While Google says the project is only meant to make books more accessible to the public and to generate interest to the benefit of authors and publishers, critics argue Google's scanning of copyrighted material - although it's not publicly accessible - is entirely illegal.


Full Speed Ahead

Despite ongoing court cases, Google is moving forward with its project, scanning and digitizing thousands of volumes of books each day from libraries worldwide, including the New York Public Library, the Complutense University of Madrid and the National Library of Catalonia (and four affiliate Catalonian libraries).

With millions of books available electronically on Google Book Search, the company's Web site claims that it is "expanding the frontiers of human knowledge."

"Google's mission is to organize the world's information and make it universally accessible and useful," said Adam Smith, product management director at Google. "That mission would be incomplete if we did not include books. There's an incredible wealth of knowledge held on bookshelves in libraries and publishing houses, and we want to help people find it."

Google is also piloting the World Digital Library with the Library of Congress. The project is an online collection of rare books, manuscripts, maps, posters, stamps and other library materials. Google has contributed $3 million to the project.

The Austin UT Library is part of the UT Libraries system, the fifth-largest academic library system in the United States, which holds more than 9 million books. The Austin UT Library joined the Google Library Project in January 2007 with an initial six-year contract to digitize 1 million books, although all 9 million books will be considered for digitization. Library officials are ironing out details of book handling with representatives from Google, who are particularly interested in UT's Latin America collection, regarded as the nation's most comprehensive.

Similar to other library partnerships, books will be scanned by Google and approved by the university. In return for their involvement in the project, the UT Libraries will receive digital files of scanned books, which will help with long-term preservation of the volumes. All books, no matter how carefully they are handled, will deteriorate over time, said Doug Barnett, chief of staff for the UT Libraries.

"There are many ways we go about safeguarding the books, and there are different levels of safeguards depending on how valuable or rare or fragile the item is, but to the degree we let anybody use them, there's always a risk of damage. And in terms of very long term, even carefully handled, the materials will eventually deteriorate," Barnett said. "It's not a complete solution by any means, or the only way the libraries are approaching this, but it does have the value of helping preserve the information. Even if the item itself was to somehow be lost or destroyed, there would be a digital copy of it."


Increased Exposure

Librarians are eager to participate in the Google Books Library Project because it coincides with their goal of making books more visible and accessible, Barnett said.

Google is also working with more than 10,000 publishers to copy books and give limited previews, which is expected to help the book industry by giving more exposure to new and out-of-print books.

"We feel like there are incentives for many parts of the book industry to become involved here, both libraries in terms of sharing information, but also authors and publishers, by making people more aware of work and providing opportunities to find them in the library or buy them," Barnett said.

Books from the UT Libraries scanned for the project are expected to be in Google Book Search next year, but some of the titles previously digitized are already available at the beta version of the Google Book Search Web site. Books on the Web site have already been integrated into standard Google searches.

When a search is entered in Google Book Search, several books are typically returned, with bibliographic data, title, author, publication data, length and subject. If a book is out of copyright - for instance, literary classics like Moby Dick and Sense and Sensibility - the book can be read online and downloaded. Search terms are highlighted throughout the book to help readers find the information they want.

True to old library books, some of the scanned books contain marks, such as underlines and notes from previous users. For copyrighted books, an entry similar to a card catalog is shown with basic information about the book. Either way, the search engine directs users to places they can buy or borrow the book.  


Legal Challenges

Google is not the first book digitization project. Carnegie Mellon University hosts a project called the Universal Library, which has scanned nearly 1.5 million books; the Open Content Alliance, an association of technology, nonprofit and governmental organizations, and several major college libraries, has scanned more than 100,000 books; Microsoft's Windows Live Book Search service is considered a response to Google's project; the Library of Congress has the American Memory project; has digitized hundreds of thousands of books it sells; and there are many smaller projects.

Yet because of the Google project's magnitude, two lawsuits filed in a federal court in New York have challenged it - one suit coming from several writers and the Authors Guild, and the other from a group of publishers, including McGraw-Hill, Penguin Group, Simon and Schuster, and Pearson Education.

The lawsuits contend that Google's copying of complete volumes under copyright constitutes infringement, even though Google does not make the full texts of copyrighted material available through its Web site. The publishers who filed suit are actually collaborators in the initiative, who acknowledge that the Google Books Library Project will help promote and sell books, but remain opposed to the copying of copyrighted work without consent.

The Authors Guild is also concerned with the vulnerability of massive digital databases of books held by Google and by universities.

"In our view, Google needs a license to do what it wants to do," said Paul Aiken, executive director of the Authors Guild. "There are additional concerns with security since authors and publishers have not been contacted to audit security measures to make sure the database is reasonably hacker-proof and the data center is secure."

The Authors Guild is also concerned that Google could set a precedent for other scanning ventures with a widespread proliferation of copyrighted text available for viewing, Aiken said.

Google argues the Google Book Search, which is designed to comply with international copyright laws, helps people find books and increases the incentive of publishers to publish them.

"Because Google Book Search makes all of the knowledge contained within the world's books searchable online, it exposes readers to information they might not otherwise see, and it provides authors and publishers with a new way to be discovered," Google's Smith said. "Many of our publishing partners are reporting increased sales and traffic to their sites since joining the program."

Libraries across the country have known for some time that they need to adapt to the digital age, and some developed their own digitization projects before Google's. In 1995, Stanford University founded HighWire Press, which provides electronic access to more than 1,000 scholarly journals. When Stanford digitized its card catalog a few years later, its book circulation increased by 50 percent. Around the same time, the library at Princeton University digitized its card catalog. Other libraries nationwide are digitizing their collections.

Some publishers who are partnering with Google have talked with the company about making their books available for purchase online through Google Book Search.

Regardless of the outcome of the lawsuits, the Google Books Library Project signifies a major shift for libraries into the digital world. Google will help reacquaint a new generation with books in libraries and direct students and other researchers to information that is more reliable than Web sites, according to the American Library Association.

However, the association warns that Google's search index could allow viewers to forgo library research, and instead rely on information snippets provided by Google Book Search. Regardless, librarians across the country, including Barnett and his staff at the UT Libraries, are waiting to see how Google's project will fit into the future of libraries and research. 

"I don't think this is the only library of the future, but this sort of digital collaboration is part of the future," Barnett said. "Certainly working with various technology and various partners to make information easier to find and more widely accessible is a major part of the future."

Chandler Harris, Contributing Writer