Iain Fletcher
The Role and Limitations of Relevancy Ranking

In his series on enterprise search, Iain Fletcher, VP of marketing at Search Technologies, looks at relevancy ranking.

This article looks at relevancy ranking, the second of three key areas of search technology innovation.

Early search systems did not attempt to order documents by relevancy. Instead, the searcher would submit a search clue and the system would report how many documents matched the query, without displaying any of them. The user would then repeatedly add terms to the search clue, typically using Boolean operators, to narrow the scope of the search until the number of results was manageable. Only then would the system provide a listing of results to be examined.

Relevancy algorithms changed this process. From the early 1990s onwards, users could instead submit a simple search and immediately view results, ordered to show the most relevant documents first.

Relevancy Criteria

There are hundreds of criteria that can be used. All leading engines use multiple criteria in various combinations. In this article, we’ll limit ourselves to just a few common approaches:

Criteria based on document content:

  • Completeness: Does a document contain all of the search terms, or just some?
  • Proximity: Are the search terms found close together, or spread out across the document?
  • Density: What percentage of the words in the document are query terms?
  • Title Text: Do query words also appear in the title text or in other important metadata?
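To make the four content-based criteria concrete, here is a minimal Python sketch that turns each one into a number between 0 and 1. It is an illustration only, not how any particular search engine works: the function names, the simple tokeniser and the way proximity is measured (the shortest run of words containing every query term that the document has) are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def smallest_window(positions, needed):
    """Length, in words, of the shortest span containing every term in `needed`.

    `positions` is a list of (word_index, word) pairs for query terms only.
    """
    best, counts, left = None, {}, 0
    for pos_r, word_r in positions:
        counts[word_r] = counts.get(word_r, 0) + 1
        # Shrink the window from the left while it still covers every term.
        while len(counts) == len(needed):
            pos_l, word_l = positions[left]
            span = pos_r - pos_l + 1
            if best is None or span < best:
                best = span
            counts[word_l] -= 1
            if counts[word_l] == 0:
                del counts[word_l]
            left += 1
    return best


def content_signals(query, title, body):
    """Return illustrative scores (0.0 to 1.0) for the four content criteria."""
    terms = set(tokenize(query))
    words = tokenize(body)
    signals = {"completeness": 0.0, "proximity": 0.0, "density": 0.0, "title": 0.0}
    if not terms:
        return signals
    # Title text: share of query terms that also appear in the title.
    signals["title"] = len(terms & set(tokenize(title))) / len(terms)
    present = terms & set(words)
    if not present:
        return signals
    # Completeness: share of query terms found anywhere in the body.
    signals["completeness"] = len(present) / len(terms)
    # Density: share of body words that are query terms.
    signals["density"] = sum(1 for w in words if w in terms) / len(words)
    # Proximity: 1.0 when the terms found sit side by side, lower when spread out.
    positions = [(i, w) for i, w in enumerate(words) if w in present]
    signals["proximity"] = len(present) / smallest_window(positions, present)
    return signals


print(content_signals("red wine", "Red wine guide",
                      "This guide covers red wine and white wine"))
```

A real engine would combine such signals with many others, and would normally compute them from an index rather than by scanning the text at query time.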

“Off-page” criteria:

  • Link Analysis: How many other documents in the data set hyperlink to the document? (This is heavily used by Web search engines, but is much less effective behind the firewall because there are fewer links between corporate documents)
  • Recommendation: Usually based on an analysis of the behaviour of a peer group, or previous behaviour of the actual user

Application-specific criteria:

  • Best bets: For example, always make document XYZ the top hit if the query contains the phrase ‘red wine’ (much like Google AdWords)
  • Boosting: Artificially boost certain items. In e-commerce, for example, boost items that carry a higher margin, or items that we have a lot of in stock and need to shift
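Both application-specific criteria can be sketched as a small post-processing step applied to an already ranked result list. The Python below is a hedged illustration: the function name apply_business_rules, the best_bets and boosts dictionaries and the multiplicative boost are assumptions chosen for this example, not the interface of any particular search product.

```python
def apply_business_rules(query, results, best_bets, boosts):
    """Re-rank results using best bets and boosting.

    results   -- list of (doc_id, relevancy_score) pairs from the engine
    best_bets -- hypothetical mapping of trigger phrase -> doc_id to pin on top
    boosts    -- hypothetical mapping of doc_id -> score multiplier
    """
    # Boosting: multiply the engine's score by a per-document factor.
    boosted = [(doc_id, score * boosts.get(doc_id, 1.0)) for doc_id, score in results]
    ranked = [doc_id for doc_id, _ in sorted(boosted, key=lambda p: p[1], reverse=True)]

    # Best bets: pin a document to the top whenever its phrase is in the query.
    lowered = query.lower()
    pinned = [doc_id for phrase, doc_id in best_bets.items() if phrase in lowered]
    pinned = list(dict.fromkeys(pinned))  # drop duplicates, keep order

    return pinned + [doc_id for doc_id in ranked if doc_id not in pinned]


results = [("A", 1.0), ("B", 0.9), ("XYZ", 0.1)]
best_bets = {"red wine": "XYZ"}
boosts = {"B": 1.5}  # for example, a high-margin item
print(apply_business_rules("cheap red wine", results, best_bets, boosts))
```

Keeping these rules separate from the core relevancy score means they can be changed by the business without retuning the engine itself.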

An in-depth look at this subject can be found at Relevancy Ranking 301.

Relevancy and Subjectivity

Relevancy is in the eye of the beholder. In the enterprise, user needs are diverse and it is hard to satisfy ‘all of the people all of the time’. A specific relevancy setup will always suit some people more than others. Most leading search products offer the ability to customise relevance and enable different ‘relevance profiles’ to be used by different users.
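One simple way to picture a relevance profile is as a set of weights applied to the same underlying signals. The Python sketch below assumes two hypothetical profiles and two hypothetical documents. The profile names, weights and signal values are invented for illustration and do not reflect how any specific product implements profiles.

```python
# Hypothetical profiles: each weights the same signals differently.
PROFILES = {
    "title_focused": {"title": 0.8, "proximity": 0.2},
    "body_focused": {"title": 0.2, "proximity": 0.8},
}


def score(signals, profile_name):
    """Weighted sum of a document's signals under the named profile."""
    weights = PROFILES[profile_name]
    return sum(weights.get(name, 0.0) * value for name, value in signals.items())


def rank(docs, profile_name):
    """Return document ids ordered from most to least relevant."""
    return sorted(docs, key=lambda doc_id: score(docs[doc_id], profile_name), reverse=True)


# Illustrative signal values for two documents (assumed, not measured).
docs = {
    "a": {"title": 1.0, "proximity": 0.2},
    "b": {"title": 0.0, "proximity": 0.9},
}
for profile in PROFILES:
    print(profile, rank(docs, profile))
```

The same two documents swap places depending on the profile, which is the point: neither ordering is wrong, they simply suit different users.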

Data growth makes the task of delivering relevant results harder. The bigger the data set, the more documents will match a given query. In addition, users are generally reluctant to browse far down the results list. Indeed, most give up after just 20 or so results. If there are 1,000 results to list, then such a user will view only about two per cent of the matching documents. Even more strikingly, if there are 100,000 matching documents, then the same user will view only 0.02 per cent.
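The arithmetic behind those percentages is easy to check. Here is a minimal Python sketch, assuming a user who stops after 20 results (the function name share_viewed is ours, for illustration):

```python
def share_viewed(total_matches, results_viewed=20):
    """Percentage of matching documents seen by a user who stops early."""
    if total_matches <= 0:
        return 0.0
    return 100.0 * min(results_viewed, total_matches) / total_matches


for total in (1_000, 100_000):
    print(f"{total:,} matching documents: {share_viewed(total):g}% viewed")
```

Every tenfold increase in matching documents cuts the share a user sees by a factor of ten, which is why ranking quality matters more as data grows.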

Add the data growth factor to the subjectivity of search and it is understandable why many people are dissatisfied with enterprise search. The problem is compounded by the generally high level of satisfaction that users get from Web search. Yet this is not a fair comparison, as Web search engines have a number of important advantages over enterprise search systems, primarily concerning the nature of the data.

In summary, most modern search systems have very sophisticated relevancy capabilities. Depending on the particular application, these may work perfectly well out-of-the-box. However, where you are serving a community with diverse needs, tuning and customisation can help to raise overall satisfaction levels amongst users.

The next article will discuss browsing and navigation of search results - the third and final key area of innovation in search.

