Easy Search: Feature Set and Search Assistance

Easy Search Overview

 

The University of Illinois Library has developed a suite of metasearch software modules that are being used to power several access and retrieval systems. This metasearch software provides the search functionality behind the "Easy Search", "Books", "Journals", and "Other" search tabs on the Illinois  Library Gateway at:

http://www.library.uiuc.edu/ 

and a custom portlet-based Engineering Library gateway at:

http://search.grainger.uiuc.edu/top/ 

There is also a "native mode" version which gives the user more flexibility in choosing multiple subject target resources at:

http://search.grainger.uiuc.edu/searchaid2/searchassist.asp.

In general, the metasearch feature set is referred to as "Easy Search" (ES). The ES suite presently provides asynchronous, broadcast search over some 75 target resources (actually more, since any Ebsco, Proquest, OCLC, Wilson, CSA, or ISI database can be targeted). XML gateways and HTML parsing (screen scraping) are used to retrieve results from the target resources; Z39.50 is not used for the target searches. A middleware layer that connects to target resources based on the NISO MXG (Metasearch XML Gateway) search capability has also been implemented. This provides a REST-based SRU function that returns results information in XML to the invoking application.
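As a rough illustration of asynchronous broadcast search, the following Python sketch sends one query to several targets concurrently and collects whatever comes back, including failures. It is not the ES implementation: the target names, the search_target stand-in, and the simulated hit counts are all hypothetical.

```python
import asyncio

# Hypothetical target names; the real system covers some 75 resources.
TARGETS = ["catalog", "article_index", "ebooks"]


async def search_target(target, query):
    """Stand-in for one target search (an XML gateway call or an HTML parse)."""
    await asyncio.sleep(0.01)  # simulated network latency
    if target == "ebooks":
        raise TimeoutError("target did not respond")  # simulated failure
    # Simulated hit count so the example is deterministic.
    return {"target": target, "hits": len(query) * (1 + TARGETS.index(target))}


async def broadcast(query):
    """Send the query to every target at once and gather all outcomes."""
    tasks = [search_target(t, query) for t in TARGETS]
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    results = []
    for target, outcome in zip(TARGETS, outcomes):
        if isinstance(outcome, Exception):
            # A failed target is reported rather than aborting the whole search.
            results.append({"target": target, "hits": None, "error": str(outcome)})
        else:
            results.append(outcome)
    return results


results = asyncio.run(broadcast("illinois football"))
```

Gathering with return_exceptions=True means one slow or broken target does not prevent the others from being displayed, which suits a broadcast design.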

 

Easy Search Features

 

The Illinois metasearch system employs a recommender or resource discovery approach. Users are presented with result matches from target resource searches and links from these result matches into the native interface at the point of completed search. In this way, a user can click through to a target system's native interface display of the completed search and they can then modify the search argument, browse additional pages, link to full-text, etc.

 

The Illinois Gateway went operational in September 2007. Since that time there have been over 2.2 million searches performed in the Easy Search suite and over 2.5 million user clickthroughs to target resources. The software is undergoing continual development, with new functionality being added during the course of several grant investigations. These grants include an IMLS National Leadership Grant (http://search.grainger.uiuc.edu/IMLS/) on portal design and development, an NSF NSDL (National Science Digital Library) grant (http://search.grainger.uiuc.edu/nsdl/) on metasearch XML gateway development, and a Mellon Foundation grant to the DLF Aquifer American Social History project (http://www.dlfaquifer.org/) which funded a metasearch scenario. The IMLS and NSDL work includes a deep transaction log analysis and the identification and implementation of assisted search and resource integration techniques. The focus is on developing useful assisted search techniques, including suggestive prompts, to assist users with search query formulation and refinement.

 

The ES software offers a variety of assisted search and user search argument processing techniques, including: stopword removal, specified term substitution from a table, spelling suggestions, direct link suggestions generated from a database, selective removal of target labels after zero matches, suggestive prompts for author searches, and suggestions to limit searches to phrase and title searches.
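Two of the techniques listed above, stopword removal and table-driven term substitution, can be sketched in a few lines of Python. The stopword list and substitution table here are invented for illustration; the actual ES tables are not described in this document.

```python
# Hypothetical tables, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "on", "in", "for"}
SUBSTITUTIONS = {"nyt": "new york times"}


def preprocess(query):
    """Apply term substitution, then drop stopwords from the query."""
    words = query.lower().split()
    substituted = [SUBSTITUTIONS.get(word, word) for word in words]
    kept = [word for word in substituted if word not in STOPWORDS]
    # If every word was a stopword, fall back to the original query
    # rather than sending an empty search to the targets.
    return " ".join(kept) if kept else query
```

For example, preprocess("the history of jazz") yields "history jazz", and preprocess("NYT archive") yields "new york times archive".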

 

All user-entered search terms, user clickthrough actions, and all system prompts and suggestions (such as suggested spelling changes and author search options) are recorded in custom transaction logs. As part of the grant projects, we've spent a great deal of time looking at the ES transaction logs to better understand user information seeking behavior and to help us determine needed improvements in system functionality. There are a lot of very good searches being done - but there are also some searches that we can improve algorithmically. Our focus is on developing useful automated assisted search techniques, including suggestive prompts, to help users with search query formulation and refinement. The goal is to make Easy Search "smarter" and more responsive to user information seeking needs.

The transaction logs database has yielded a rich amount of information. The base set of 800K user entered search arguments for 2007-2008 provides the following percentages for various search characteristics:

10.4% contain Boolean operators (10.15% are AND, 0.2% OR, 0.1% NOT)
7.9% have Commas
0.1% contain Parentheses
4.9% contain Quotes
20.5% contain Prepositions
9.8% offer Spelling Suggestions (31.5% of these are clicked)
0.06% use a plus(+) symbol
0.05% are in a question form
40.4% of searches are Follow-Ups from the pull-down menu of keyword, title, author search. Author is 10% of these
6.7% offer Author redo link (17% are clicked)
3% of searches are from keyword phrase/keyword title/title phrase links
3.3% of searches offer Direct suggests links (60.5% are clicked)

A summary of the transaction log analysis findings and, in particular, the usage of the interactive search assistance techniques, was presented at the Fall DLF Forum in November, 2008.

See http://search.grainger.uiuc.edu/searchaid/dlf_forum_mischo.ppt.

Another presentation on Easy Search was made at the Elsevier Joint Development Partners meeting in November, 2008. See http://search.grainger.uiuc.edu/searchaid/scopus_presentation_mischo.ppt.

 

 

Easy Search Enhancements

 

There are a number of new features and functions implemented in the latest release of Easy Search. More recent changes will always appear in the test system at:

http://search.grainger.uiuc.edu/searchaid3/searchassist.asp

There is also another test system which employs an experimental two-column approach with facets at:

http://search.grainger.uiuc.edu/searchaid3/searchaid3columns/sidebar/searchassist.asp.

 

The new functionality has been tested with users and features have been added as user feedback has been received or key problems identified from the detailed transaction log analysis.

 

We have added Springer E-Books, the Center for Research Libraries (CRL), and the Discover SFX A to Z list as search targets. The Springer search results display two different result links, one for matches at the book chapter and the other for matches at the book title level. We plan on applying this capability of performing multiple searches within the same resource to other targets, including the Voyager Catalog, in the near future.

 

One of the most requested added features has been the ability to perform follow-on user queries from within the Easy Search results page. We've added a results page search box that retains the user's most recent search and includes the capability of selecting a keyword, title, or author search via a pull-down list.

The transaction log analysis has shown that a high percentage of searches, over 49%, can be categorized as specific-item or "known-item" searches. We are also seeing a higher number of words per entered query, approximately 3.7, than has typically been reported in the literature on web search engines and OPACs.

 

Search Assistance Features

 

The assisted search techniques we have introduced fall into several categories. The "Do you mean" spelling suggestion feature has been present from the beginning and has been heavily used. Our transaction log analysis revealed that approximately 11% of all searches contained spelling errors or typos. This is consistent with observations made in several "next generation" bibliographic search systems which have also identified spell checking as a critical interactive search feature.

 

In addition, we have added several new assisted search features, all of them based on our transaction log analysis of search terms and retrieved search results. The ES system looks at all user-entered search queries and attempts to identify semantic and syntactic patterns in the search terms. This is critical in single-entry search box systems such as ES, as opposed to systems offering fielded search. One specific pattern check is for user-entered author names. If a user enters a string such as "Smith, Robert A." or "Robert A. Smith", Easy Search will do a keyword search but will also ask the user if they want to re-do the search as an author search. If the suggestive prompt link is clicked, the system resends the search in the custom author search format expected by each of the targets.


The author search pattern algorithm checks for the following term patterns:

1. Robert A. Smith
2. Smith, Robert A.
3. Smith, Robert
4. Smith R. A. (ignores capitalization)
5. Smith RA (looks for capitalization to distinguish from 3.)
6. Smith, RA (looks for capitalization)
7. R. Alan Smith
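The seven patterns above could be approximated with regular expressions, as in the Python sketch below. The expressions are an assumption about how such checks might be written, not the ES algorithm itself; in particular, requiring lowercase letters in the given name of pattern 3 is just one possible way to keep it apart from the all-capital initials of pattern 6.

```python
import re

NAME = r"[A-Za-z][A-Za-z'\-]+"   # a name word of two or more characters
INITIAL = r"[A-Za-z]\."          # a single letter followed by a period

# (pattern number from the list, compiled expression)
AUTHOR_PATTERNS = [
    (1, re.compile(rf"^{NAME} {INITIAL} {NAME}$")),         # Robert A. Smith
    (2, re.compile(rf"^{NAME}, {NAME} {INITIAL}$")),        # Smith, Robert A.
    (3, re.compile(rf"^{NAME}, [A-Za-z][a-z'\-]+$")),       # Smith, Robert
    (4, re.compile(rf"^{NAME} {INITIAL} ?{INITIAL}$")),     # Smith R. A.
    (5, re.compile(r"^[A-Z][a-z'\-]+ [A-Z]{1,3}$")),        # Smith RA
    (6, re.compile(r"^[A-Z][a-z'\-]+, [A-Z]{1,3}$")),       # Smith, RA
    (7, re.compile(rf"^{INITIAL} {NAME} {NAME}$")),         # R. Alan Smith
]


def detect_author_pattern(query):
    """Return the number of the first matching pattern, or None."""
    text = " ".join(query.split())  # collapse runs of whitespace
    for number, pattern in AUTHOR_PATTERNS:
        if pattern.match(text):
            return number
    return None
```

A query such as "solar energy" matches none of the expressions, so no author prompt would be offered for it, while "Smith RA" is recognized only because of its capitalization.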


The system also performs a database lookup on both complete and partial user-entered search arguments, looking for frequently searched or custom search terms, and offers the user direct links to specific resources. For example, if the user types in "facebook" at the Easy Search prompt, in addition to searching "facebook" as a topic in multiple targets, it will offer a direct link to Facebook. In the same way, if the user types "IEEE conference on wireless computing" in the single-entry search box, the system will identify the word "IEEE" in the search string and provide an additional direct link to the IEEE Digital Library. We have populated the direct links database with frequently entered terms and queries drawn from the first and second semester transaction logs.
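The complete-then-partial lookup can be sketched as follows. The in-memory dictionary stands in for the direct links database, and its two entries simply mirror the examples in the text; the real table structure is not described here.

```python
# Hypothetical stand-in for the direct links database:
# search term -> label of the resource to link to.
DIRECT_LINKS = {
    "facebook": "Facebook",
    "ieee": "IEEE Digital Library",
}


def direct_link_suggestions(query):
    """Look up the whole query first, then each word within it."""
    normalized = " ".join(query.lower().split())
    suggestions = []
    # Complete search argument.
    if normalized in DIRECT_LINKS:
        suggestions.append(DIRECT_LINKS[normalized])
    # Partial search argument: individual words.
    for word in normalized.split():
        label = DIRECT_LINKS.get(word)
        if label is not None and label not in suggestions:
            suggestions.append(label)
    return suggestions
```

With these entries, "IEEE conference on wireless computing" produces a single suggestion for the IEEE Digital Library, and a query with no known term produces none.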

We have also developed additional functionality which intervenes during the search process to offer search refinement suggestions. We are offering a suite of automated assistance functions that are designed to help users reduce search results (this is the well-known "Too Many Retrieved" problem). If a keyword (or later a title) search retrieves a large number of matches, the user will receive a context-sensitive prompt to reduce the number of results by repeating the search as a keyword phrase, a title word search, or a title phrase search. They can also add additional terms in the results page search box and in certain situations will be prompted to do so.
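A minimal sketch of the context-sensitive prompt logic follows. The threshold value and the rule that single-word queries are not offered phrase searches are assumptions made for the example; the document does not state the actual cutoffs or rules.

```python
TOO_MANY_THRESHOLD = 500  # assumed cutoff for "a large number of matches"


def refinement_options(search_type, query, match_count):
    """Return the narrower search types to suggest, most general first."""
    if match_count <= TOO_MANY_THRESHOLD:
        return []
    multiword = len(query.split()) > 1
    if search_type == "keyword":
        if multiword:
            return ["keyword phrase", "title word", "title phrase"]
        # Assumption: a phrase search adds nothing for a one-word query.
        return ["title word"]
    if search_type == "title":
        return ["title phrase"] if multiword else []
    return []
```

Under these assumptions, a keyword search for "civil war" with 12,000 matches would be offered all three narrower searches, while the same query with 40 matches would receive no prompt.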

We are also testing several mechanisms to deal with the "Too-Few Retrieved" problem, including the display of broader subject term facets, the selective elimination of words within multiword phrases, automatic truncation of words, and searching non-selected databases. We are testing a "dark target" background search capability to identify unselected targets that might provide relevant information for users. Here, a number of unselected targets are searched in the background and their result matches are displayed only if the selected subject targets retrieve zero or a small number of matches and the dark target searches result in hits.
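The dark target display rule can be expressed compactly. In this sketch the limit for "a small number of matches" and the use of the summed hit count across selected targets are assumptions; the target names are hypothetical.

```python
SMALL_RESULT_LIMIT = 5  # assumed meaning of "a small number of matches"


def targets_to_display(selected_hits, dark_hits):
    """Combine selected-target results with dark-target results when warranted.

    Both arguments map a target name to its number of matches.
    """
    display = dict(selected_hits)
    if sum(selected_hits.values()) <= SMALL_RESULT_LIMIT:
        # Show only those dark targets that actually retrieved something.
        display.update({t: n for t, n in dark_hits.items() if n > 0})
    return display
```

When the selected targets already return plenty of matches, the dark target results are simply discarded, so the background searches never clutter a successful result page.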


Also implemented is the NISO MXG (Metasearch XML Gateway) search capability. This provides a REST-based SRU function that returns XML to the invoking application. For example, the URL:

http://search.grainger.uiuc.edu/searchaid2/saresultsug.asp?version=1.1&query=illinois football&db=acse

returns the result:

 

<zs:searchRetrieveResponse>

<zs:version>1.1</zs:version>

<zs:resultSetId label="Academic Search Premier">

http://search.ebscohost.com/login.aspx?direct=true&db=aph&bquery=(illinois+AND+football)&type=1&site=ehost-live

</zs:resultSetId>

<zs:numberOfRecords>832</zs:numberOfRecords>

</zs:searchRetrieveResponse>


The resultSetId tag returns the direct URL that will take a user to the result set at the point of completed search.
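A client of this interface might build the request URL and read the response as in the Python sketch below. Two assumptions are made: that the zs: prefix is bound to the standard SRU namespace (the sample shown in this document omits the declaration), and that ampersands in the returned URL are escaped as &amp;amp; in the raw XML. The sample string is embedded so the example runs without contacting the server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://search.grainger.uiuc.edu/searchaid2/saresultsug.asp"
ZS_NAMESPACE = "http://www.loc.gov/zing/srw/"  # assumed binding for zs:


def build_request_url(query, db, version="1.1"):
    """Assemble the request URL, encoding spaces in the query."""
    params = urlencode({"version": version, "query": query, "db": db})
    return BASE_URL + "?" + params


def parse_response(xml_text):
    """Extract the target label, click-through URL, and match count."""
    ns = {"zs": ZS_NAMESPACE}
    root = ET.fromstring(xml_text)
    result_set = root.find("zs:resultSetId", ns)
    count = root.find("zs:numberOfRecords", ns)
    return {
        "label": result_set.get("label"),
        "url": result_set.text.strip(),
        "count": int(count.text),
    }


# Sample response modeled on the one in the text, with the namespace
# declaration and ampersand escaping added (both assumptions).
SAMPLE = """<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
<zs:version>1.1</zs:version>
<zs:resultSetId label="Academic Search Premier">
http://search.ebscohost.com/login.aspx?direct=true&amp;db=aph&amp;bquery=(illinois+AND+football)&amp;type=1&amp;site=ehost-live
</zs:resultSetId>
<zs:numberOfRecords>832</zs:numberOfRecords>
</zs:searchRetrieveResponse>"""

parsed = parse_response(SAMPLE)
```

A pathfinder or bibliography page could then render parsed["label"] as link text pointing at parsed["url"], with parsed["count"] shown alongside.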

The REST-based approach allows other developers or third parties to build custom bibliographies or include links to results within pathfinders and other applications. The MXG functionality is being used in the DLF Aquifer American Social History search system and will be utilized by the NSDL Core Integration team and the NSDL Pathways projects to complement content contained in their subject vertical collections.


We will continue to develop enhanced features for the Gateway and the Easy Search system.