“Design, query and evaluate information retrieval systems.”
I’ve met a number of librarians, MLIS students, and library users who have a nostalgic yearning for the card catalogs of yore — the rows and rows of oaken cabinets displayed under an arched ceiling; the long narrow drawers; days of languorous research sifting through cards with your index finger, a stub-length pencil in hand. There is a tactile romance associated with the card catalog that is lost in the modern library. Amongst the most charming legacies of the old card catalogs were the handwritten notes scratched onto the index cards by patrons and researchers – each one hinting at a hidden story or secret. When the San Francisco Public Library shifted to an online catalog, the artists Ann Hamilton and Ann Chamberlain transformed thousands of these defaced catalog cards into an art project now plastered onto the walls of the Main Branch.
Photo © Ann Chamberlain and Ann Hamilton
While it is easy to become nostalgic for the past, we have to admit that we live in the Golden Age of Information. The ability to check a library catalog or journal database using our own, natural language and retrieve relevant, sortable and searchable results in seconds – from home – ranks amongst the greatest developments in the history of the written word. Regardless of specialty or career track, all information professionals need to have a strong understanding of the architecture behind this information access in order to use it effectively.
There are a handful of concepts and terms essential for information professionals to understand when designing or evaluating information retrieval systems. One of the first to consider is the balance of precision versus recall. Precision refers to the accuracy of a search: the proportion of retrieved items that are actually relevant. Recall refers to the breadth of a search: the proportion of all relevant items in the collection that are actually retrieved. From the perspective of an information retrieval system’s end-user, using more specific search terms generally leads to more precise results, at the expense of recall. By including more (or more specific) terms, the searcher reduces the number of retrieved items. If the information retrieval system is designed to emphasize precision, the searcher could miss a great many valuable results that were filtered out because they did not meet the specific criteria. Conversely, a system that emphasizes broad recall runs the risk of overwhelming the searcher with endless results that are not always relevant.
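The trade-off can be made concrete with a little arithmetic. What follows is only a minimal sketch in Python, assuming the standard definitions (precision as the share of retrieved items that are relevant, recall as the share of relevant items that are retrieved). The function name `precision_recall` and the document identifiers are invented for illustration and do not come from any real system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: identifiers the system returned
    relevant:  identifiers that truly answer the searcher's need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four documents truly answer the question.
relevant = {"doc1", "doc2", "doc3", "doc4"}

# A narrow, specific query returns few items, all of them on target.
narrow = ["doc1", "doc2"]

# A broad query finds every relevant item, plus four irrelevant ones.
broad = ["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8"]

narrow_scores = precision_recall(narrow, relevant)  # (1.0, 0.5)
broad_scores = precision_recall(broad, relevant)    # (0.5, 1.0)
print("narrow:", narrow_scores)
print("broad: ", broad_scores)
```

In this invented example the narrow query is perfectly precise but finds only half of what exists, while the broad query finds everything but buries it among an equal number of irrelevant results.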
Amongst the tools available to the designer to swing the pendulum of precision vs. recall is vocabulary. By adopting a strict, controlled vocabulary, the designer can emphasize high precision in retrieved results. However, a system with a highly controlled vocabulary can be challenging for inexperienced users who are unfamiliar with the language required; it is also inflexible, especially as new terms emerge and definitions change. Information retrieval systems that employ free-text or natural language searching (such as virtually every search engine for the World Wide Web) are simpler to use and more adaptable to changing language, but they are hindered by inconsistency and confusion over homonyms.
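The difference between the two approaches can be sketched in a few lines of Python. Everything here is hypothetical: the thesaurus, the subject headings, the records, and the function names are invented for illustration, and a real system would be far more elaborate. The sketch assumes that each record carries both a free-text description and a subject heading assigned from the controlled vocabulary.

```python
# Hypothetical thesaurus: variant terms mapped to one preferred heading.
PREFERRED_TERMS = {
    "cup": "drinking vessels",
    "mug": "drinking vessels",
    "tumbler": "drinking vessels",
    "dish": "plates",
    "plate": "plates",
}

# Hypothetical records with a free-text description and an assigned heading.
RECORDS = [
    {"id": 1, "description": "blue ceramic mug", "subject": "drinking vessels"},
    {"id": 2, "description": "glass tumbler", "subject": "drinking vessels"},
    {"id": 3, "description": "cup hook for cabinets", "subject": "hardware"},
    {"id": 4, "description": "white dinner plate", "subject": "plates"},
]


def free_text_search(term):
    """Match the term against the words of each description."""
    term = term.lower()
    return [r["id"] for r in RECORDS if term in r["description"].lower().split()]


def controlled_search(term):
    """Translate the term to its preferred heading, then match headings only."""
    heading = PREFERRED_TERMS.get(term.lower())
    if heading is None:
        return []  # the vocabulary does not know this term
    return [r["id"] for r in RECORDS if r["subject"] == heading]


print(free_text_search("cup"))      # [3]
print(controlled_search("cup"))     # [1, 2]
print(controlled_search("goblet"))  # []
```

In this toy collection, the free-text search for "cup" returns only the cabinet hardware and misses the mug and tumbler, while the controlled search gathers both drinking vessels under one heading. The empty result for "goblet" shows the inflexibility: a term the vocabulary has not yet adopted retrieves nothing at all.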
At the heart of all information retrieval systems is the database. A computer database is a set of files structured in such a way that it can communicate with a retrieval interface. As database construction is a practical pursuit, it is best learned by doing: building one from scratch, creating the terminology and definitions from the ground up. I had just that opportunity in LIBR-202, Information Retrieval, with instructor Kristen Clark. Working with a partner, Jessica Lohr, I created a simple but effective database to organize a collection of kitchen dishware. We had to create categories for styles, colors and uses, each with consistent terminology. I’ve attached two related documents as evidence of my ability to construct a database. The first is a paper detailing our design process, our take on database design theory, and the decisions we made in creating the database. The second is a query run on our database (built with Inmagic’s DB/Textworks), with a complete listing of the database contents.
The ability to operate information retrieval systems is essential to all information professionals regardless of expertise or career niche. Retrieving information is a frequent responsibility, and it is impossible to design or evaluate a database system without practical experience using one. During LIBR-210, Reference and Information Services, I participated in a series of exercises designed to test our ability to query information retrieval systems (while couching our responses in the language employed in a reference interview environment). I am attaching two of these exercises as evidence of my ability to query information retrieval systems. They provide an interesting contrast, as the two exercises share a number of overlapping questions. For the first, I was required to use paid subscription databases to answer reference inquiries; for the second, I had to use public web search engines. I compare and contrast my experiences with each in my documentation. The results of these queries reflect the points made above about precision vs. recall and controlled language vs. free-text searching.
Information professionals also need to be able to critically evaluate information retrieval systems. This skill is relevant in a number of different applications. For example, a staff librarian needs to be able to analyze subscription-based databases in order to determine the most worthwhile choices for his or her institution. Given the thousands of dollars in subscription costs at stake, the ability to make critical judgments about user interface and search structure, with a technical understanding of what is ‘under the hood’, is a necessary professional skill. Without a concrete understanding of the information architecture, the librarian could offer, at best, only a layman’s criticism.
In LIBR-256, Archives & Manuscripts with David de Lorenzo, I critiqued a variety of archival databases. I submit the resulting document as evidence of my ability to evaluate information retrieval systems.
Exhibit E-1: Assignment #1: Meal Planning Database | Available upon request
Exhibit E-2: Dinnerware Database Report | Available upon request
Exhibit E-3: Reference Exercise #1 | Available upon request
Exhibit E-4: Reference Exercise #2 | Available upon request
Exhibit E-5: Reference Resources Review | Available upon request