As librarians, we’ve long championed fair-use causes, so last week’s US District Court ruling in favor of Google’s book-scanning project drew great interest in our community. I think it’s exciting, but I also see a new and serious challenge to libraries ahead.
In a legal challenge brought by the Authors Guild against Google, US District Judge Denny Chin ruled (PDF) that Google’s book scanning was fair use. Chin’s main rationale boils down to the fact that Google is not selling these books and does not make any complete work available in a practical way that would let users read it online. Moreover, the books being scanned were often supplied to Google by organizations that had purchased them, and thus had every right under fair use to scan these books and cite portions of them online. In Google’s case, this means the company can show users relevant segments of a book in the form of in-page search results.
The final details are still to be worked out in a settlement between the two parties.
It was a good day for the public as well as Google. But the ruling will definitely affect libraries, especially the relevance of our (increasingly irrelevant) catalogs.
I can’t say this strongly enough: Libraries have missed the boat, big-time, when it comes to our approach to information retrieval over the past 15 years or so. Our anachronistic approach to marking up books, journals and other resources with metadata has been wildly inefficient for too long. Take the Internet model for retrieving knowledge, which uses algorithms augmented with elements of artificial intelligence. These retrieval functions integrate a given user’s social history, learn from how users interact with results and even handle spelling, synonyms, grammar and natural language. The indexes these algorithms search, in turn, weight everything more or less democratically, based largely on the strength of interconnections between files (via hyperlinks) and keywords.
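To make that “democratic, link-based weighting” idea concrete, here is a toy PageRank-style power iteration over a three-page link graph. This is a rough sketch of the general technique, not Google’s actual algorithm; the damping factor and iteration count are conventional illustrative defaults.

```python
# Toy link graph: each page lists the pages it links to.
links = {
    "a": ["b", "c"],   # page "a" links to "b" and "c"
    "b": ["c"],
    "c": ["a"],
}

def pagerank(links, damping=0.85, iters=50):
    """PageRank-style power iteration: each page repeatedly
    shares its current rank among the pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            share = rank[p] / len(outs)
            for q in outs:
                new[q] += damping * share
        rank = new
    return rank

ranks = pagerank(links)
# "c" is linked to by both "a" and "b", so it accumulates the top rank.
```

The point is that a page’s weight emerges from how others link to it, rather than from any label a human cataloger assigned in advance.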
Now consider the library, where human agents try to pre-guess the user’s intent by applying metadata with controlled vocabularies in unforgiving and, yes, arrogant fashion. We don’t use the users’ words. We use our words. Woe be to the user brought up on Google’s much more forgiving (i.e. helpful) search tool, who tries to use their own vocabulary to find relevant materials. When we do allow for keyword access, it is so weighted down the algorithmic chain (below our “authorized” terms) that users are often forced to page through results before they find what they’re looking for. If they can even find our catalog to start with (but I get ahead of myself here).
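A toy scorer shows how this plays out. The records, weights and field names below are all hypothetical — no real ILS works exactly this way — but the pattern is the familiar one: matches on the “authorized” subject heading dominate, while the user’s own words barely register.

```python
# Hypothetical catalog records: an authorized subject heading
# ("Cookery", LCSH-style) plus free-text keywords.
RECORDS = [
    {"title": "Everyday Cooking",
     "subjects": ["Cookery"],
     "keywords": ["cooking", "recipes"]},
]

SUBJECT_WEIGHT = 10   # authorized vocabulary dominates the score...
KEYWORD_WEIGHT = 1    # ...while the user's own words barely count

def score(record, query_terms):
    s = 0
    for term in query_terms:
        if any(term.lower() == subj.lower() for subj in record["subjects"]):
            s += SUBJECT_WEIGHT
        if term.lower() in record["keywords"]:
            s += KEYWORD_WEIGHT
    return s

# The same book, scored against the cataloger's word vs. the user's word:
cataloger_score = score(RECORDS[0], ["cookery"])   # strong match
user_score = score(RECORDS[0], ["cooking"])        # near-miss, tanks
```

Search with the cataloger’s term and the record scores highly; search with the everyday word a Google-trained user would type and the very same book sinks.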
Even many of our most advanced discovery systems still don’t address this fundamental flaw in our approach. One discovery system I have experience with doesn’t account for the fact that contemporary searchers often enter search strings assuming they will be understood as phrase searches. At least, that’s the habit they have leaned toward since Google reinforced it many years ago by adding proximity scoring based on the order of the words entered into its search engine. In the discovery system I’m referring to, if you search for a title without quotes, there is a very good chance (depending on a number of factors) that the item you want won’t appear anywhere near the top ten results, or even the top 100.
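Here is a minimal sketch of why that happens — illustrative scoring only, not any vendor’s actual relevance algorithm. A bag-of-words ranker gives credit for each shared word regardless of order, so titles stuffed with common words can outrank the exact title; a phrase-aware ranker adds a (hypothetical) bonus for a contiguous match and pins the intended item to the top.

```python
TITLES = [
    "the history of the book",
    "a book about history",
    "history: the book of books, the history",
]

def bag_of_words_score(title, query):
    # Order-blind: count every query word wherever it appears.
    title_words = title.replace(":", "").replace(",", "").split()
    return sum(title_words.count(w) for w in query.split())

def phrase_aware_score(title, query):
    # Same base score, plus a large bonus for a contiguous phrase match.
    score = bag_of_words_score(title, query)
    if query in title:
        score += 100   # hypothetical phrase bonus
    return score

query = "the history of the book"

bow = sorted(TITLES, key=lambda t: bag_of_words_score(t, query), reverse=True)
phr = sorted(TITLES, key=lambda t: phrase_aware_score(t, query), reverse=True)
```

Under bag-of-words scoring, the word-stuffed third title beats the exact title the user typed; with the phrase bonus, the exact title comes first — which is what a Google-trained searcher expects an unquoted title search to do.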
So, to summarize: libraries are still plodding down a dead end. Rather than forcing people to learn how we’ve organized things, we need to use machines to understand how our users (and the world) organize things.
So how does the Google book scanning ruling relate to this dismal state of affairs?
Try an experiment: go to Google and type in the title of a currently playing movie. Chances are, Google will provide you with immediate information at the top of your results on which local cinema is showing that particular movie, and what time it is playing.
Now try searching for a recent book title. You’ll likely get Amazon.com, perhaps the publisher’s site, and so on. What you won’t see is a link to your local library, let alone your library’s catalog.
So from this, consider the very near future (like next month) when you type keywords into Google and begin seeing results from books. Click a book and you’re treated to a lot of metadata, especially social metadata: Goodreads reviews, links to author websites and, in some cases, the ability to read the book itself if it’s open access.
Fortunately, there is a small “Find in a library” link on the left-hand side once you’ve clicked through to the Google Books interface from your initial search results. It sits just below the much more obvious [Get Print Book] button, which takes you to the big online booksellers. If you click the library link (not “Add to your library,” which is something else), you’ll get the OCLC record with links to your nearest libraries. So it’s not a total disaster. But how did we get to a place where users looking for instant access to information don’t get a link to their local library’s request form (or directly to their library’s e-version), and instead have to fork over moolah to learn something new? At least they still have YouTube and Wikipedia, I suppose.
We really need to get on the bandwagon here, folks. A huge victory for fair use has just been won, but until libraries start pushing our way to the front of the line (and the top of the results), we’ll keep coming in last. And that’s not where our users are looking.