Google is Bad. Let's be like Google

Yesterday I attended a plenary on the topic of seamless library service access at the University of Michigan Library. I was excited at the prospect of discussion about making digital library services less chunky, just as I am excited about discussion about making digital publishing less chunky. The speaker was ProQuest’s VP of Discovery Services, John Law, and his talk was titled “Attracting and Keeping Net-Gen Student Researchers.”

The first part of the talk was a pretty dry but informative overview of an ethnographic study ProQuest did to better understand the research habits of its primary users: students. Primary research included:

  • field observation of mostly undergraduate student researchers, either in person in their “natural habitat” or remotely
  • online chat-based focus groups with end-user researchers
  • focus groups with librarians
  • end-user researcher surveys

Some results included the fact that participants overwhelmingly thought that libraries have better information than the Internet at large, but they often start their searches in search engines because they don’t know where to begin on library sites. Library sites usually have no clear or compelling starting places, and it’s difficult to identify useful and appropriate resources without prior knowledge. Users generally satisfice (what Law called a compensatory behavior) by going to search engines instead, sometimes even returning to their own or another library through search results. (And when I say he talked about search engines, I mean that he was talking about Google. No other search engines were really discussed. Bing was mentioned in passing, but not in the context of its seeming ability to be relatively good at getting deep web stuff.)

To sum up this portion of the talk, he cited the Ithaka Report to say that usage of the library as gateway is decreasing, and library disintermediation from the research process is increasing.

I was totally with him up to this point: Libraries are not being used as they used to be. Check. People are using other online tools, tools thought by librarians to be inferior. Check. Librarians want people to use their tools. Check.

I thought we were on the cusp of talking about getting web tools and library tools to work better together. But we were not.

Law views Google as competition. Instead of making library databases and catalogs and what-have-you play nicer with search engines, he has headed up the creation of a new product called Summon (taglined “Web Scale Discovery”).

Summon:

  • is a single, pre-harvested unified index, including full text when available
  • makes results available without authentication
  • is customized for a given library’s content
  • is vendor and resource neutral (but still dependent on vendor buy-in)

He referred to it as being “like Google” but for library websites.

I do think the product would do a few things well. Its core mission is to provide a way for people to search across content types in a timely fashion, replacing slow and clunky federated search. It also provides the opportunity for the user to do some vetting before they even look at a resource, if useful metadata is provided. This would allow users to look at only a few really pertinent results and then go off to use whatever resources are at their disposal. It would also be a great discovery tool for figuring out what databases and tools are useful for a given topic, especially since once people figure out what tools to use for that topic, they tend to return to them again and again. It’s a true gateway model, in that sense.

However, as you can probably gather because I only write information science-related posts when I’m angry, I have a few problems with Summon. I think Summon makes some assumptions, which I will put in quotes so someone doesn’t think these are ideas I espouse:

  • “Google is bad for research.” In an increasingly cross-disciplinary world, where traditional media and traditional publishing are no longer the sole, or even primary, sources of research, I think it’s problematic to say that Google is not a research tool. Discouraging people from using Google is just going to make your library look even more out-of-touch than people already think it is.
  • “People start their research at the library.” Some people do. I know I start with search engines and Wikipedia to get a grasp on what terms and names I should look for when I go into a research database. During the Q&A period I asked Law what suggestions he had for reaching users who start their research in Google and getting them to Summon. His reply, which I think was a good one in general but not specific to the tool, was that libraries should have keyword-rich landing pages that pull in hits. I guess I was hoping that you could make canned searches in Summon and make landing pages with them.
  • “Librarians should decide what resources are legit and which aren’t, even on the open web.” Someone in Q&A asked about whether open web content could be included in Summon. The answer was yes: whatever librarians deem important. I do think the opportunity to include open web content with scholarly sources is a good one, but, as I said before, decisions about what’s important on the Internet are made by completely different metrics, especially as resources become more cross-disciplinary. I imagine this capacity of the tool would generally be limited to a few scholarly sites, populated on some sort of as-used basis, or ignored altogether in the face of the enormity of the information that would have to be vetted by librarians to make it a truly integrated search. (This was tried once. It was called the Internet Public Library, and it was outgrown by the post-2000 web and made obsolete by, yes, search engines.)
  • “We can change user behavior.” This is the We Know Best Approach. Instead of following users’ lead and meeting them where they’re starting, we’re trying to force them to start where we want them to start. That doesn’t work, and it’s generally a waste of time and money.

Ultimately, I think the “Google is bad, but let’s build a tool like Google” business model is not going to work out. Because, as long as there is Google, people are going to use Google. I think a better use of library resources is figuring out how to get scholarly work visible in search engines. Take that same metadata and work with search engine companies to get it into search engines. If a resource is restricted, let those listings tell a person if they can access the resource or not via their institution, given IP range or current authentication or a cookie or a plugin or whatever their library’s method is. And let them get straight at the content from there. The library doesn’t have to be invisible, but it also doesn’t have to be an unmoving wall.
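To make the IP range idea a little more concrete, here is a minimal sketch of how a listing could decide what to tell a reader. Everything in it is an assumption for illustration: the function name `access_label`, the label wording, and the campus network ranges (which are reserved documentation addresses, not any real library's). A real setup would more likely go through a proxy or single sign-on, so treat this as a toy version of the logic, not a description of how any library or search engine actually does it.

```python
import ipaddress

# Hypothetical campus IP ranges. These are reserved documentation
# addresses, standing in for whatever a library registers with vendors.
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def access_label(client_ip, restricted, networks=CAMPUS_NETWORKS):
    """Return the note a search listing could show next to a result."""
    # Open resources need no institutional check at all.
    if not restricted:
        return "open access"

    # Restricted resource: see whether the reader is on a known campus range.
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in networks):
        return "available through your library"

    # Off campus: point the reader at the library's sign-in instead of a wall.
    return "sign in through your library to read"


print(access_label("192.0.2.17", restricted=True))
print(access_label("203.0.113.5", restricted=True))
```

The same function could just as well check a cookie or an authenticated session instead of an address; the point is only that the listing itself can carry the answer.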

Social media and academic presentations

Daniel MacArthur of the Genetic Future blog at scienceblogs has broached the topic of using realtime online technologies (liveblogging, twitter, flickr, etc.) at science conferences in a few posts, and has now posted an update about a peer who created a set of slides and icons to indicate how the information in a presentation can be disseminated:

A while back I pondered the possibility of creating icons for conference presenters to add to their first slide to alert bloggers/tweeters in the audience about whether the presented data was “blog-safe”. This was provoked by a recent episode illustrating general confusion among bloggers (in this case, me) and scientists about the use of social media at conferences.

Fellow Australian-turned-UK-resident-scientist Cameron Neylon has now put together a handy set of slides for presenters to label both “blog-safe” and “no-blogging” presentations. The slides have a ccZero license and so are freely available for download and modification; the original icons can be found on Cameron’s Flickr account and Christopher Ross’ website.

Coming from information science, my default assumption at conferences or talks is that presenters want their information disseminated as far as possible, and services like twitter and the practice of liveblogging seem the obvious way to go for real time info, provided quotes and data are accurately attributed at the time of publication. I’ve twittered about the last two professional conferences I’ve attended, and subsequently provided my notes on those conferences as publicly available Google Docs (see my BookCamp Toronto 2009 and Internet Librarian 2009 notes).

But I understand some disciplines, particularly biomedical sciences, depend on keeping their data and findings within a limited sphere of people and publications. For example, this winter I attended a talk presented as part of the Health Informatics Grand Rounds series, which is sponsored by a variety of health science departments and institutions at the University of Michigan. The talk was given by John Wilbanks, Creative Commons VP and Science Commons ED, and was about mechanisms for sharing and storing data sets online, and how such mechanisms would affect how researchers think about what information belongs to them, how they collaborate with colleagues, etc. Sharing research data is a no-brainer for me, but I’m also not trying to beat my competitors to the cure for cancer, or secure research funding for the special thing that only my lab (at least I think it’s only my lab) does.

Developing a set of symbols, or a written statement, that tells viewers what can and can’t be discussed outside the original presentation forum is, I think, a step in the right direction, and will get people talking about the issue. But, for better or worse, I think we’re moving closer and closer toward a world where people assume that information they see and that they find important can and should be further disseminated.

What do you think?