Author Archives: puu

SERP: sorting and relevancy

Sorting screenshot

Standard sorting options are offered, although we attempted to present them in plainer language. By default, results sort by “best match”, which is another controversial label. Originally, it was “relevancy”, but some staff felt that was too technical a term. Amazon uses “relevance”, Zappos “Relevance”, Barnes and Noble “Top matches”, and Best Buy “Best match”. Some search types, mostly browsing style, make the label seem ridiculous (e.g. a blank search doesn’t have any relevance).

“Title” and “author” are common, but then we use “newest to oldest” and “oldest to newest” rather than “publication date” or some variation on that. Offering both ascending and descending date order gives the user more control, as each has merit depending on the search: a user would want to see a series ordered oldest to newest, and a nonfiction search with the newest materials first (newest to oldest). Unfortunately, each time the sorting is changed, the whole page must refresh rather than simply reordering the result set. More dynamic interface reactions are highly desirable as long as they don’t come at the cost of accessibility and usability.

In my opinion, relevancy is where our catalog shines compared to other library catalogs. We built in boosts for specific fields so a main title has more “weight” than a subtitle, which then has more weight than an additional title. Likewise, the main author entry has more weight than an additional author. This type of boosting should be fairly common for catalogs and is easy to manipulate because most of the data is in the MARC record. However, most library catalogs do not account for a major facet in relevancy: popularity.
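As a rough illustration of field boosting, here is a minimal Python sketch. The field names and weight values are hypothetical and chosen only to mirror the ordering described above (main title over subtitle over additional title, main author over additional author); they are not the catalog's actual configuration.

```python
# Hypothetical field weights: illustrative only, not the catalog's real boosts.
FIELD_BOOSTS = {
    "main_title": 10.0,
    "subtitle": 5.0,
    "additional_title": 2.0,
    "main_author": 8.0,
    "additional_author": 3.0,
}


def field_score(record: dict, query: str) -> float:
    """Sum the boost of every field that contains all of the query terms."""
    terms = query.lower().split()
    score = 0.0
    for field, boost in FIELD_BOOSTS.items():
        text = record.get(field, "").lower()
        if terms and all(term in text for term in terms):
            score += boost
    return score


main_title_hit = {"main_title": "Dune"}
subtitle_hit = {"main_title": "The Road to Arrakis", "subtitle": "The Making of Dune"}

# A match in the main title outranks a match that only appears in a subtitle.
print(field_score(main_title_hit, "dune") > field_score(subtitle_hit, "dune"))
```

A real search engine would combine weights like these with its own text-matching score rather than simple substring checks.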

Boost code screenshot

Popularity can be difficult to capture unless you have more control over the indexing than a typical catalog administrative interface allows. It becomes much more complex with digital content, which does not have “physical copies” but rather “digital copies” that MARC cannot easily account for and that mostly go unaccounted for. To create a useful popularity boost, two numbers are needed: number of requests and number of copies. The number of copies should include the number of “future” copies, or those on order. For instance, when Prince passed away a couple weeks ago, we had 2 copies of a “best of” album. Within a day, the number of requests skyrocketed, and the library acquisition team immediately saw the need and ordered 30+ more copies. So even though the album was many years old, the results can adjust to reflect the current popularity of each title.
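The exact formula is not given here, so the following Python sketch is only one possible way to combine the two numbers. It assumes that both a high request count and a high copy count (owned plus on order) signal demand, and it dampens each with a logarithm so a single runaway title does not swamp the text-relevancy score. The function name and the formula itself are assumptions.

```python
import math


def popularity_boost(requests: int, copies: int, on_order: int = 0) -> float:
    """Hypothetical popularity signal built from requests and total copies.

    Copies on order count toward the total, as described in the text.
    The log-sum formula is an assumption, not the catalog's real boost.
    """
    total_copies = copies + on_order
    return math.log1p(requests) + math.log1p(total_copies)


# With the same number of requests, counting the copies on order
# raises the signal compared with counting only the 2 owned copies.
print(popularity_boost(200, 2, on_order=30) > popularity_boost(200, 2))
```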

Popularity results screenshot

So how does one measure the popularity of a digital title? Should a physical equivalent be taken into account when considering a digital alternative’s popularity? Probably. Thankfully, our users use our digital copies enough that the digital version of a title usually lands very close to the physical version in the results. As it is, the system treats them as separate titles rather than as versions of the same work.

Digital title popularity screenshot

OverDrive is our largest supplier of digital titles, and their API is also the most robust. In fact, they offer many APIs to build tools with, and the one that aids with this particular feature is the Library Availability API. It includes the copies owned and number of holds metadata. Our service harvests this data and populates our index, mostly for use by the popularity boost and the availability limit. So although our primary library system does not have the holdings data, our index does, and it can be used to properly boost digital titles.

Digital title popularity screenshot

Disclaimer

SERP: limits

The search engine results page, or SERP, contains a lot of data and functionality, so I am breaking this down into multiple sections.

Applying post-search options to a query could be called by many names: facets, filters, limits, refine, and narrow/broaden are the most common. As developers, we referred to this feature as facets. However, for the interface we chose “Limit search results”. At the time it felt like the least technical terminology that could be understood by a broad and diverse audience, but ultimately it may simply be a matter of preference. We have had suggestions to rename it as “filter”, and larger websites often use the term “refine” (e.g. Amazon, Barnes and Noble); I would think this is another instance where the label may become irrelevant if the interface becomes intuitive enough.

The limits, as on most sites, display on the left side of the results on desktop and have primary headings. We included tooltips for each heading to provide more explanation of each category. The total number of options in each limit set was capped at 60 for those whose size and options were not predetermined. This applied to the author, series, special collections, and subject (broken into four categories) limit sets. The cap only became a potential issue for very broad searches. When any limit set is “opened” (using accordions), the display initially only shows the first five results, sorted by how many “hits” each option has in the results. If the limit set has more than five options, the user may toggle the view to load the remaining options.
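The cap and the initial five-option display can be sketched in a few lines of Python. The constants come from the description above; the function name and the input shape (one value per matching record) are assumptions for illustration.

```python
from collections import Counter

MAX_OPTIONS = 60      # cap for limit sets that are not predetermined
INITIAL_VISIBLE = 5   # options shown when the accordion first opens


def build_limit_set(values):
    """Return (visible, hidden) options, each a list of (value, hits) pairs.

    Options are ranked by hit count, capped at MAX_OPTIONS, and split so
    only the first INITIAL_VISIBLE are shown until the user expands the set.
    """
    ranked = Counter(values).most_common(MAX_OPTIONS)
    return ranked[:INITIAL_VISIBLE], ranked[INITIAL_VISIBLE:]


authors = ["Stine, R. L."] * 7 + ["Rowling, J. K."] * 3 + ["A", "B", "C", "D"]
visible, hidden = build_limit_set(authors)
print(visible[0])   # the option with the most hits comes first
print(len(hidden))  # options revealed by the "show more" toggle
```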

The limit sets are:

  • Format
  • Checked in at
  • Age level
  • Language
  • Author
  • Publication year
  • Series
  • Fiction/Nonfiction
  • Genre
  • Topic
  • Place
  • Time period
  • Special collections

When initially launched, we had one more limit set called Bestseller express, for a service that ended shortly afterwards. The service allowed patrons to pay a small fee to borrow a bestseller and bypass the waitlist for the copies that were free to borrow. However, it was determined the library would simply buy more copies of bestsellers rather than offer any charged service. Genre, Topic, Place, and Time period are all subjects broken down from the 6xx MARC fields.

Top limits

When determining the limits, three types were identified as the most often sought: formats (Format), location (Checked in at), and audience (Age level). These three limit groups are open by default, displaying the first five results in each. Format is straightforward, although we had a recurring feature request to break down the econtent formats. A draft idea was to keep the existing “eBook” format (selecting it at the top level would select all sub-formats) and add a subset of formats underneath (e.g. EPUB, Kindle, PDF) that could be checked or unchecked. This feature would be very helpful for econtent users, since they are limited by their device’s format support. We have 23 listed formats (listed here by size in the total collection):

  • Book
  • Government document
  • eBook
  • Online government resource
  • eJournal
  • Music CD
  • Audiobook download
  • Printed music
  • DVD
  • Audiobook on CD
  • Large print
  • Magazine
  • Microform
  • Online book
  • Video download
  • Mixed media
  • Newspaper
  • Map
  • Braille
  • Audiobook on tape
  • Videotape
  • CD-ROM
  • Music on tape

Because Hennepin is a federal depository, we have a substantial government documents collection represented by the “government document” and “online government resource” formats. “Online book” is a slight catch-all primarily used for full text reference material. And the last four listed are the embers of “nearly extinct” formats.

“Checked in at” is a more nuanced limit set. Rather than being based on owning library, it is based on availability. This approach seemed more useful because if a title is unavailable (especially because HCL does have “floating” collections), its owning library is mostly irrelevant. If the user requests an unavailable title, the copy’s “home” location has little bearing; it will be sent to whatever library the user chose during the request process. Another layer of complication with this limit was reference, or in-library use only, materials. Because this limit is based on availability by location, materials that can only be used in the library are misleadingly “available”. A toggle was added to include in-library use only materials (a checkbox at the top of the limit’s list labeled “include in-library use copies”), but users found it mostly confusing during testing. It is only applicable when a location is selected (otherwise the results will include all titles). The limit options include each library location as well as “online” and “anywhere”.

The third primary limit set is age level, and this is determined by the collection code. Collection codes were constructed using a pattern that allowed us to use the codes intelligently in the data indexing, generally audience + fiction/nonfiction (when applicable) + format. From that, we get our four primary age levels: adult, teen, children’s, and “easy”. The last one, “easy”, has been debated, since it isn’t an age level but rather a reading level. It includes picture and easy reader books. In the context of the other three levels, it can be deduced what might be found using this option, but it would be better to either make this limit “reading levels” or “age levels” and not mix them.

Subject-based limits

Limits screenshot

The most confusing limits are the subject-based sets of genre, topic, place, and time period. The confusion lies not only in the mixed “quality” of the catalog data captured there, but also in the naming of the limits. For instance, “Place” was often misunderstood as the library where the title was available, and “Time period” as the publication date. Once opened, usability participants found the options confusing. We originally were going to put “subject” in front of each of these limit sets, such as “Subject: Time period”, but this felt too jargon-y and unlikely to clarify for most users the difference between time period and publication date.

The issue with the data itself comes from some ambiguity in the cataloging rules themselves. Form and genre can be intermingled, so one might find “downloadable books” next to “historical fiction” or “Minnesota music” when browsing the “Genre” limit. This muddies the effectiveness and clarity of these limits for the user, which makes them not terribly useful.

Other limits

Again, because of the way collection codes were made, we could break down fiction vs. nonfiction. However, it only applies to books. To extend this limit to include DVD (e.g. documentary vs. feature films) would have taken more algorithmic work when indexing data and may have been impossible to do with the “genre” breakdown currently being used (i.e. does not include “documentary”).

Another modified limit set is “publication year”. Rather than giving the user two inputs to enter a year range, we chose five publication date options: current year (which also includes the previous year because of some ambiguity in publication dates), last 3 years, last 5 years, last 10 years, and over 10 years. Unlike the other limit sets that employ checkboxes (to allow multiple options to be selected), this limit set uses radio buttons so only one may be selected at a time. Unlike standard radio button functionality, however, a user may remove the limit by clicking on the radio button again.
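The five options can be expressed as a simple age check, sketched below in Python. The option labels follow the text; the exact boundaries (for example, whether “last 3 years” means three or four calendar years) are assumptions, as is the function name.

```python
# Maximum age in years for each option; boundaries are assumed, not confirmed.
MAX_AGE = {
    "current year": 1,    # also includes the previous year, as noted above
    "last 3 years": 3,
    "last 5 years": 5,
    "last 10 years": 10,
}


def matches_publication_limit(option: str, pub_year: int, current_year: int) -> bool:
    """Return True if a title's publication year falls within the chosen option."""
    age = current_year - pub_year
    if option == "over 10 years":
        return age > 10
    return age <= MAX_AGE[option]


print(matches_publication_limit("current year", 2015, 2016))   # previous year counts
print(matches_publication_limit("last 3 years", 2012, 2016))   # four years old
print(matches_publication_limit("over 10 years", 2000, 2016))
```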

The series limit has not proven as useful as one would hope, but it does offer the user the ability to narrow the results to any series that are present. So if searching for “goosebumps”, the series limit offers: Goosebumps, Goosebumps HorrorLand, Goosebumps most wanted, Give yourself goosebumps, and so forth. A further source of confusion is that it is still a limit: selecting “Goosebumps HorrorLand” will not display the full series, only those titles that are present in the search results. Most usability participants expected otherwise.

And the final limit to discuss is Special Collections. This limit often “disappears” when search results have nothing that is defined as from a special collection, so to use the previous example, “goosebumps” would not have this limit. However, a search for “minneapolis” will present the limit with several non-predetermined options such as: “Minneapolis History Collection”, “Book Arts and Fine Press Collection”, “North American Indians Collection”, “WWII Collection”, and so forth. This limit is an initial foray into tying less standard collections that already had MARC records in our system into the catalog. The intent was to expand the catalog’s ability to ingest digitized collections (once they had been digitized) and to research how best to index more in-depth metadata than a MARC record provides, such as that presented in EAD (encoded archival description) finding aids.

Mobile

Mobile limits screenshot

For mobile users, limits are more cumbersome to use. For both desktop and mobile users, each time a limit is selected, the page reloads. However, on a mobile device, that then means the limit menu “disappears” and must be re-opened if more than one limit was wanted. The limits slide in from the left and squish the responsive design, which allows the user to see the results still but not easily (especially on phones in portrait mode).

Ideally, the results would respond more dynamically to limit requests, so the page would not require a refresh to update the results. Users expect it to happen without disrupting their flow, whether by making them repeat steps or by making them wait to select or un-select options. This is often done with client-side technologies such as AJAX, but we didn’t have time to research how to manage this with such large result responses.


Search box

Search screenshot
Mobile search screenshot

“Catalog”

The word “catalog” is a library device that refers to a system of indexing an inventory of materials that are available in a collection and displaying rudimentary characteristics of an indexed item. Prior to the redesign, “catalog” was a link to the native catalog’s interface and acted as a main access point to the primary feature of the library’s online presence.

Ideally, the search box on a library site is synonymous with “catalog”, and hence a visible label shouldn’t be necessary (it is still recommended to use a <label> element, hidden visually, to provide context for screen reader users, e.g. <label class="sr-only">Search</label>). However, most libraries have at least three primary searches: catalog, website, and online databases (most of them via a single interface). To make a single search for a library, the preferred interface is a bento box. I would suggest it needs to be more algorithmically intelligent than that, but that is currently beyond the scope of this project.

Because of our three-search situation, we ended up with a tabbed search feature with the label “Books, movies, music”. I have several ideas for moving away from tabs, which have poor visibility and clarity (the “website” label in particular, since its scope is quite ambiguous to most users that consider the catalog and other modules such as events to be part of the library website, or they think it’s similar to a Google search engine to find websites). The “catalog” label represents the top three popular generic types of materials.

Indices

All of the search options are keyword, so no “title starts with” or other obscure options most search engines buried long ago.

By default, the index is set to “All”. This index contains an incredible amount of data, which I will go into in detail in a later post. To summarize, it includes all the bibliographic data (everything in a MARC record from 010-899 a la SolrMARC), item data (primarily holdings data), and external data from vendors such as Syndetics, NoveList, and OverDrive. We then apply general boosts to specific fields, and a special popularity boost that makes an incredible difference. Users may search by call number, ISBN, keywords, or simply submit a blank search (this will return the most popular titles in the system due to the popularity boosting).

Because we did not provide pre-limiting options (the most desired by users were location and format), we instead included the fixed limit options as specialized query parameters. For example, “harry potter dvd” would boost the dvd format limit. This followed in line with the idea of natural language processing, but pre-limits would have helped to define the scope of the tools available (e.g. the user would know “dvd” was an option because it would have been listed). Synonyms were intended to be added, so a search for “harry potter movie” would have also boosted movie formats such as dvd. Fixed-value limits that can be queried like this include: format, age level, language, and fiction/nonfiction.
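A minimal Python sketch of this idea follows: query terms are checked against a table of fixed limit values, and any match becomes a boost on the corresponding limit. The table contents, the limit identifiers, and the function name are hypothetical, and the “movie” synonym reflects the planned behavior rather than something that was shipped.

```python
# Hypothetical lookup of query terms to (limit set, value) pairs.
FIXED_VALUE_TERMS = {
    "dvd": ("format", "DVD"),
    "movie": ("format", "DVD"),        # synonym: planned, not implemented
    "spanish": ("language", "Spanish"),
    "teen": ("age_level", "Teen"),
    "nonfiction": ("fiction_nonfiction", "Nonfiction"),
}


def detect_limit_boosts(query: str):
    """Return the (limit set, value) pairs whose terms appear in the query."""
    boosts = []
    for term in query.lower().split():
        hit = FIXED_VALUE_TERMS.get(term)
        if hit is not None and hit not in boosts:
            boosts.append(hit)
    return boosts


print(detect_limit_boosts("harry potter dvd"))
print(detect_limit_boosts("harry potter movie"))
```

Because the matches are boosts rather than hard limits, titles in other formats still appear in the results, only lower down.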

The other index options are mostly standard: title, author, subject, and series. Series is based on book series and lacked some clarity, or at least some functionality, that users expected. What is indexed are the official names a series has (MARC 4xx and 80x-83x), but users wanted not only the series titles but the titles themselves (MARC 20x-24x) to be searched. So users expected a series search for “game of thrones” to return a result. However, in order for A Game of Thrones to come back in the results, the user currently has to know and enter “song of ice and fire”.

The author index originally did not include cross-references either; these were added during the beta preview to staff (e.g. searching “robert galbraith” would return “J.K. Rowling” titles). However, the cross-referenced data is not displayed anywhere in the record, so users at times struggle to understand why an entry would be in the result set. We never had an opportunity to dissect cross-references to determine if we could find a reasonable display, since they contain a vast amount of data, including spelling variations for multiple languages.

Advanced search

More search options screenshot

When considering more complex search functionality, the initial concepts included an advanced search page. However, looking at industry equivalents such as Amazon and Barnes and Noble, finding an advanced search page was a painful task. If one existed, it was buried. Our competitors relied on smart algorithms in their basic search to manage the bulk of the usage, and that is what library users also have come to expect.

Our algorithm for searching is good in some aspects, but the search itself lacks some basic features such as spellcheck and auto-suggest and we could store even more data and synonyms. An advanced search was determined useful, but the search and more search options became a part of every page rather than have its own home. Labeled “more search options” for plain language, it is a downward-opening tab on the “books, movies, music” tab. The tab itself is lost in the low contrast sea of blues and greys, one of our ongoing accessibility problems.

The more options box appends four more index-specific input fields, a brief instruction for using wildcards, the ability to “clear all text”, and an additional search button (both search buttons would submit the full form). The basic search does also have a clear-text-from-input-field “x” that was introduced with mobile devices to aid in clearing fields with a single tap, but the “clear all text” button removes the values from all of the fields.

If a different index was selected for the basic search, such as “title”, then as soon as the user engages one of the more search options fields, the main search field reverts back to “all”, locks it in, and migrates the value that was there into its respective input in the advanced search form. This allows the user to explore the options without unintentionally switching indexes.

Users, in usability testing, thought format and location limits would be in the basic search’s index dropdown or in the more search options.

New search

Search with limits
New search modal

One feature native catalogs consistently offer is the ability to perform a “new search” or “start over” (using III’s Millennium interface as an example). However, we observed NO ONE used the button unless explicitly instructed to. If performing a string of different queries, nearly every user simply changed the keywords. So the button got the boot. It didn’t have anywhere to link to anyway, since the search box didn’t have a home of its own.

But what about the searches we asked users to perform that encouraged use of limits? The limits were applied, but did the user intend to keep the limits and simply modify the existing keyword query, or was the user performing a whole new search? And did that new search intend to stay within the selected parameters (e.g. moving from a search for this year’s English cookbooks to a new search for this year’s English dvds)? Who were we to tell the user how to do their searching?

My solution was to ask the user directly. If no limits were applied, changing the query doesn’t matter. If limits were applied, we take the cognitive load off a user who did not remember to remove them when switching gears: we check whether limits were applied and whether the query itself changed substantially (that is, if the root query is still in the search string, we leave the user alone). If limits were applied and the base query changed significantly, a “new search” modal comes up.
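The decision can be sketched as a small function. This is a simplified Python rendering of the rule described above, not the JavaScript that runs in the catalog; the function name and the whitespace and case normalization are assumptions.

```python
def should_prompt_new_search(previous_query: str, new_query: str, applied_limits) -> bool:
    """Decide whether to show the "new search" modal.

    Prompt only when limits are applied AND the root (previous) query
    no longer appears in the new search string.
    """
    if not applied_limits:
        return False  # nothing to carry over, so changing the query is harmless

    old = " ".join(previous_query.lower().split())
    new = " ".join(new_query.lower().split())

    if old == new:
        return False  # query unchanged
    if old and old in new:
        return False  # root query still present: the user is refining
    return True


limits = ["Language: English", "Publication year: current year"]
print(should_prompt_new_search("cookbooks", "italian cookbooks", limits))  # refining
print(should_prompt_new_search("cookbooks", "dvds", limits))               # switching gears
```

A substring check is deliberately crude; it treats reordered or respelled queries as new searches, which is one reason a better change-detection rule would help.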

The modal presents the limits and offers the ability to uncheck any of them or simply remove them all by clicking the “Search no limits” button. For ease of use, the “Search with limits” button is already in focus so power users (staff are most likely to find themselves in this situation when helping patrons find materials) can perform a search quickly by hitting the enter key once to submit the search and once more to accept the “Search with limits” option.

When the “new search” button was removed and this put in its place, I was uncertain how users would react to it. Amazon doesn’t do it, and in fact I could find no search interface with limits/filters that gave the user the option at all. Most, when the query changed, would simply clear the limits. The feedback was initially mixed, but many usability participants when given this option expressed delight. Improving the change detection algorithm (written in JS) would make it an even more useful tool.

Ideas based on feedback

Remove the tabs and change to a select dropdown. The tabs in mobile especially caused confusion, so utilizing the dropdown instead provides more expected functionality. Initially, some users (primarily staff) also missed the “title begins with”, which would be a “browse search”. At one point we had considered implementing it, so I integrated the concept into the design based on Northeastern University’s interface: http://onesearch.northeastern.edu

Basic search concept
Mobile basic search concept

The basic search eliminates the specific index search for the catalog and improves the algorithm instead. The dropdown is next to the action button because the thought order is different. Instead of thinking “the title I want is To Kill a Mockingbird”, the scope becomes a secondary feature that one can consider right before submitting a query. Depending on the strength of the single-search (“all”) interface, it would be first, or last. This is arguable, but then again, this is only a concept lacking testing.

To the right is the ability to toggle the more search options or switch into browse search mode. Placing the links next to the search rather than below at larger screen sizes saves some vertical real estate.

More options concept
Mobile more search options concept

The more search options panel is extended to include pre-limits as well as the ability to choose what index to add search terms to. The tool would mostly support staff, but some power users expressed interest in materials distributed by specific publishers (such as Christian fiction and manga titles) or the ability to more specifically select publication ranges. These users are very few, but this follows the UX concept of treating edge cases as stress cases: the existing way to do this is not obvious (one would need to sort by publication date and slog through the results).

Browse search concept
Mobile browse search concept

For the search browse, a new search box with a query types dropdown would replace the main search. The results would simply “drop” the user in the middle of a large referential index with cross-reference data available primarily from authority records. The links would then execute a keyword query.

These ideas remain conceptual and have not had any testing to provide proof of concept.

Supporting patrons experiencing homelessness, extreme poverty, and high mobility

The purpose of this research was to determine how web technology impacts user experiences in libraries for often overlooked groups of people. For ease of describing these audiences, I will be grouping users experiencing homelessness, extreme poverty, and high mobility (such as teens couch hopping) as the “disenfranchised”. This loosely ties in those who are experiencing physical and mental accessibility concerns.

This is a slightly modified version of an internal proposal for the library-built catalog, so it does not cover the full spectrum of modifications that could be made to improve the online catalog experience. I may come back to edit this so the concepts and ideas are more fleshed out.

Research summary

Disenfranchised users often have mobile devices and use them as a primary source of data and communication. They are likely going through traumatic and stressful points in their lives.

Chronic homelessness in the U.S. is a very small percentage (7.3%); many who are experiencing homelessness are in a temporary situation and working to get back into a home. See The State of Homelessness in America report for more details.

Based on a survey conducted by the Silicon Valley-based Community Technology Alliance in March 2013, 69% of low income and unsheltered people (498 surveyed) owned a mobile phone, and of those, 54% had data access.

A study from 2011 surveyed 169 homeless youth in Los Angeles and found that 62% had a mobile phone.

A fantastic study done in San Diego looked into plans as well and found most mobile phones were prepaid with unlimited text and talk. Cricket was the carrier of choice, likely due to the proximity of its physical locations; it offers a basic plan at $40 monthly with 2.5GB of data. According to Cricket’s phone plans, data access starts at 4G speeds but is reduced to 128Kbps when the allowance is used up.
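To put the throttled speed in perspective, the short Python calculation below estimates how long a page would take to download at 128Kbps. The 1 MB page size is a hypothetical figure chosen for illustration, and the calculation ignores latency and protocol overhead.

```python
def transfer_seconds(size_bytes: int, kbps: float) -> float:
    """Time to transfer size_bytes at a given speed in kilobits per second."""
    bits = size_bytes * 8
    return bits / (kbps * 1000)


# Hypothetical 1 MB results page at the throttled 128 Kbps rate:
# 8,000,000 bits / 128,000 bits per second = 62.5 seconds.
print(transfer_seconds(1_000_000, 128))
```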

According to the Census Bureau, those in near poverty have these demographics: they have less than a high school education, include more women than men, and are predominantly black and young (under 18).

Minnesota saw a 6% increase of homelessness in 2013. The Wilder report from 2012 (focusing on Minnesota specifically) identified that 42% of homeless adults are white and 38% are black. Most (77%) that are 18 and older have a high school diploma or GED. Violence is often associated with homelessness, either experienced before (as a cause) or during (almost 20% were physically or sexually assaulted). For youth, a common cause is sexual orientation (around 40% for 18-22 year olds). Eviction is the primary cause for homelessness (38%), followed by reduced hours at work.

Technology considerations

One noticeable theme is that many of the recommended features also improve the experience for the general public, so while they should benefit this specific audience greatly, many of these would be beneficial to implement in general.

Search

Change the catalog to keyword-based querying, allowing for more natural language queries. This shift makes content more accessible to wider audiences, specifically impacting those with fewer means and/or less education.

  • Improve natural language processing in searching, e.g. with synonyms (removes the barrier of understanding library jargon, for example a user may type “harry potter movie” and the algorithm can boost the relevancy of DVD/Blu-ray formats for “harry potter”).
  • Build a bento or alternative single search and make all resources findable (removes the barrier of understanding the content breakdown and different search techniques with different search-related products). (I would label the search “all” and the catalog specific search “books and media”.)
  • Build dynamic search suggestions (removes some spelling dependencies and provides immediate feedback before committing to a query). Alternatively, this could load possible queries and the number of results, or even more dynamically act like Google with immediate result updating as the user types.
  • Improve No Results logic (e.g. change AND to OR) paired with a spellcheck tool (that checks against the author name index as well) to ensure a user never gets a “No Results” page (post-search spelling variations offer search progress instead of a dead end).
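The AND-to-OR fallback in the last item can be sketched as follows. This Python example searches a plain list of strings rather than a real index, and the function name and return shape are assumptions; the spellcheck step is not included.

```python
def search_with_fallback(query: str, documents):
    """Try to match all terms (AND); if nothing matches, fall back to any term (OR).

    Returns (matching documents, mode used).
    """
    terms = query.lower().split()
    if not terms:
        return list(documents), "ALL"

    def tokens(doc):
        return set(doc.lower().split())

    and_hits = [d for d in documents if all(t in tokens(d) for t in terms)]
    if and_hits:
        return and_hits, "AND"

    or_hits = [d for d in documents if any(t in tokens(d) for t in terms)]
    return or_hits, "OR"


titles = ["harry potter and the goblet of fire", "the tale of peter rabbit"]
print(search_with_fallback("harry potter", titles))   # strict match succeeds
print(search_with_fallback("harry rabbit", titles))   # falls back to OR
```

Returning the mode lets the interface tell the user that the search was broadened instead of silently changing the results.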

Performance

The experience of library catalogs on mobile devices has been often cited by users as undesirable for several reasons including lack of a well-designed responsive layout. Especially ignored is the problem with limited data plans.

  • Minify the code and combine it all into one file for CSS and one file for JavaScript (this reduces the amount of data transferred and the number of requests to the server).
  • Image optimization (server tools are available to automate image size optimization, which could be applied to the cover art server as well as the few images used for the website design).
  • Improve the mobile and responsive experience, especially for Android devices. The Bootstrap framework (v3) implements features such as modals in ways that have produced bad experiences (such as faulty z-index layering that results in triggering content beneath the visible content).
  • Change the method of search engine results page (SERP) layouts so the list display does not load the cover art (to make the transition between the different layouts quick, this was all written with CSS so all the content loads and is simply styled differently; allowing this choice to be stored either in a cookie or in a personalized setting would drastically reduce load time for those who prefer the text only option).
  • Build in search customization/preferences including pre-select limits (e.g. English only language filter), preferred layout, and preferred sort (although the user needs to log in for preferences to take effect, this removes any extra page loads currently needed for post-search modifications).
  • Test My Account pages for someone who may have limited data in a patron record (e.g. lack a phone number, address) to ensure the pages and services respond appropriately.

Extending access (additional considerations)

Larger projects to implement that would increase visibility and provide ease for our more diverse users would include globalization and Linked Data efforts.

  • Globalization, or offering web content in multiple languages, could reach immigrant communities. General studies indicate ESL residents can be a significant portion of struggling community members that also often do not understand the fundamental differences of public libraries in America. If a default global language was selected, algorithm boosts based on language could be an added feature (that could be opted out of) so if Spanish were chosen as the default interface language, Spanish titles would have a higher boost over other languages. The benefit, usefulness, and maintenance problems with this feature would require further investigation.
  • Exposing the catalog’s holdings data to search engines via Linked Data could increase visibility to non-library users looking for cost-free access to the materials they are seeking. For example, when a person does a Google search for The Fault in Our Stars book, the content box on Google’s search results page could include not only the summary but also the library holdings information.

References

Designs on Mobility: Perceptions of Mobile Phones Among the Homeless (2014)
http://www.melissaccameron.com/project/DesignsonMobility.pdf
http://www.melissaccameron.com/deliverables/designsonmobility.pdf

Cricket Wireless Cell Phone Plans
https://www.cricketwireless.com/cell-phone-plans

Tech Alliance Keeps Homeless Connected through Mobile4All
http://www.sanjoseinside.com/2014/12/26/tech-alliance-keeps-homeless-connnected-through-mobile4all/

A Homeless Man and His iPhone
http://2machines.com/181276/

Here’s What Homeless People Think About Internet Access at Seattle’s ‘Tent Cities’
http://www.geekwire.com/2014/heres-homeless-people-think-internet-access-seattles-tent-cities/

Cell Phone Use Among Homeless Youth (2011)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3232411/

The State of Homelessness in America 2014
http://www.endhomelessness.org/library/entry/the-state-of-homelessness-2014

Poverty (Living in Near Poverty in the United States: 1966-2012)
https://www.census.gov/hhes/www/poverty/

Homelessness in Minnesota
http://www.wilder.org/Wilder-Research/Research-Areas/Homelessness/Pages/default.aspx

Homeless in Minnesota (2013)
http://www.wilder.org/Wilder-Research/Publications/Studies/Homelessness%20in%20Minnesota%202012%20Study/Homelessness%20in%20Minnesota%20-%20Findings%20from%20the%202012%20Statewide%20Homeless%20Study.pdf

Survey Shows: 5 Key Reasons People are Homeless in Minnesota (2013)
http://www.wilder.org/Blog/Lists/Posts/Post.aspx?ID=63

Hennepin County 2012 Minnesota Homeless Study
http://www.wilder.org/Wilder-Research/Publications/HomelessStudyTables2012/Hennepin-2012-Homeless-Counts-3-13.pdf