Citation-Based and Descriptor-Based Search Strategies
This essay was originally published in the Current Contents print editions February 28, 1994, when Thomson Reuters was known as the Institute for Scientific Information (ISI).
In the last two essays, we explained citation indexing and its usefulness in navigating the research literature.1, 2 In this essay we will explore the possibilities of retrieval using key word and cited reference searches.
The Science Citation Index® (SCI®) was originally designed as an alternative approach for retrieval of relevant information; but the concept of relevance is not as simple as it sounds. Relevance, like beauty, is in the eye of the beholder.
Regardless of the initial approach to a search—whether through a key word index or through a citation index—only the citation index will easily permit retrieval of subsequent papers that refer to a specific paper or book that the user has deemed "relevant."
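The forward lookup described here can be sketched in a few lines of Python. This is only an illustrative model, not a description of how SCI is actually built: the paper identifiers and the `build_citation_index` helper are hypothetical. The idea is that inverting each paper's reference list gives, for any paper deemed relevant, the later papers that cite it.

```python
from collections import defaultdict

def build_citation_index(reference_lists):
    """Invert 'paper -> its references' into 'reference -> papers citing it'."""
    cited_by = defaultdict(set)
    for citing_paper, references in reference_lists.items():
        for cited in references:
            cited_by[cited].add(citing_paper)
    return cited_by

# Hypothetical reference lists, invented for illustration.
reference_lists = {
    "paper_B": ["paper_A"],
    "paper_C": ["paper_A", "paper_B"],
    "paper_D": ["paper_C"],
}

cited_by = build_citation_index(reference_lists)

# Starting from a paper judged relevant, find the later papers that cite it.
later_papers = sorted(cited_by["paper_A"])
print(later_papers)
```

A key word index alone has no equivalent step, since it records what a paper says about itself rather than which earlier work it builds on.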
Systems like SCI rely on the judgment of authors and referees who choose references for published papers. In systems like MEDLINE®, the judgment of indexers determines the terms used, and the systems are based on the thesaurus called Medical Subject Headings (MeSH). Since human effort is involved, there is always the problem of consistency from one article to another. And, in traditional indexing, there are economic limitations to the number of headings that can be assigned to each new article.
In any case, thesauri have innate problems in dealing with active, fast-moving fields in which the terminology changes rapidly. Following the example set by the online version of SCI on DIALOG back in 1972, MEDLINE adopted title word indexing several years ago to partially offset this difficulty. Nevertheless, a major complaint about MeSH indexing is that in many cases the generally broader terms retrieve too much information. However, skilled users of MEDLINE can use the standard list of subheadings available to reduce retrieval to a more manageable number of hits. Thus, as an example, compare a search on cancer with a search on cancer epidemiology.
Studies comparing citation-based retrieval with the use of MeSH have been conducted, including an early study by Spencer. She found that in the beginning of a search, use of SCI provided results in a more rapid and efficient manner.3 But to obtain a more comprehensive result, back-up with Index Medicus was necessary. Later studies, including McCain's,4 focus on the complementary aspects of the two systems. McCain found that retrieval by descriptor-based and citation-based searches does not significantly overlap.4 Depending on the subject matter, there are topics for which the use of either a single word or citation may capture 90% or more of the "relevant" literature. While a search on a specific disease can be run by a key word, it is almost impossible to use key words to retrieve every paper that uses or modifies a method or theory.
McCain's study considered 11 search topics (such as interpersonal problem solving, rehabilitation and therapy for aphasia following stroke, and the classical conditioning of drug effects), which were suggested by researchers. McCain also asked the researchers to identify relevant older contributions that were likely to be cited in more recent work. In either case, the search results were evaluated in terms of relevance and novelty. Interpreting the results, McCain suggests that "subsets of both literatures may be relevant to a given researcher's information needs, serving related rather than identical functions."4
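One simple way to picture what "overlap" between the two kinds of search means is to compare the two result sets directly. The sketch below is an assumption-laden illustration and does not reproduce McCain's actual measures: the record identifiers are invented, the `compare_retrieval` helper is hypothetical, and the Jaccard ratio is just one common choice of overlap statistic.

```python
def compare_retrieval(descriptor_hits, citation_hits):
    """Summarize how two result sets for the same topic overlap."""
    descriptor_hits = set(descriptor_hits)
    citation_hits = set(citation_hits)
    shared = descriptor_hits & citation_hits
    union = descriptor_hits | citation_hits
    return {
        "shared": shared,
        # Items that only one of the two approaches retrieved.
        "descriptor_only": descriptor_hits - citation_hits,
        "citation_only": citation_hits - descriptor_hits,
        # Jaccard overlap: shared items as a fraction of everything retrieved.
        "overlap": len(shared) / len(union) if union else 0.0,
    }

# Hypothetical record identifiers, invented for illustration.
descriptor_hits = {"A1", "A2", "A3", "A4"}
citation_hits = {"A4", "B1", "B2"}

summary = compare_retrieval(descriptor_hits, citation_hits)
print(sorted(summary["citation_only"]))
print(round(summary["overlap"], 2))
```

A low overlap value in a comparison like this is what makes the two approaches complementary: each retrieves items the other misses.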
Relevance is a vast subject that deserves a discussion in its own right. Nevertheless, most evaluation studies designed to measure relevance do not capture the significance of "being cited." If you specifically ask whether a particular author or paper has been cited, then any citing paper is relevant. However, a paper on topic A could be cited in a paper on topic B, but the latter might not be deemed relevant in a traditional comparison of A and B (or other papers C and D) since they may not be terminologically connected.
There are countless examples in which two or more subsequent articles will cite a designated paper, but the various citing titles will not necessarily overlap in the terms used to describe their content—neither in the title nor in the key words or abstracts. Whether the citation-based common thread is methodological, theoretical, or otherwise, only the searcher can determine its relevance. Indeed, it is frequently the unexpected connection that may prove to be most relevant—that is, the most interesting. This will vary with the purpose of the search. That is why I often contrast the needs of information recovery with those of information discovery.
If your primary aim is to find the known literature on a topic, then precision of search may be all-important. But if you are interested in finding previously unknown connections, then the system must facilitate your ability to do this without retrieving everything that is published. In traditional searching, this is done by using Boolean combinations of terms.
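Boolean combination of terms can be illustrated with a small title-word index, using the cancer and epidemiology terms mentioned earlier. This is a minimal sketch under stated assumptions: the titles are invented, `build_index` is a hypothetical helper, and real systems add stemming, phrase handling, and field restrictions that are omitted here.

```python
from collections import defaultdict

def build_index(records):
    """Map each lowercased title word to the set of record ids containing it."""
    index = defaultdict(set)
    for record_id, title in records.items():
        for word in title.lower().split():
            index[word].add(record_id)
    return index

# Hypothetical titles, invented for illustration.
records = {
    1: "cancer epidemiology in urban populations",
    2: "cancer chemotherapy outcomes",
    3: "epidemiology of thyroid disease",
}
index = build_index(records)

both = index["cancer"] & index["epidemiology"]     # cancer AND epidemiology
either = index["cancer"] | index["epidemiology"]   # cancer OR epidemiology
without = index["cancer"] - index["epidemiology"]  # cancer NOT epidemiology

print(sorted(both), sorted(either), sorted(without))
```

AND narrows the result, OR broadens it, and NOT excludes a subset, which is how a searcher trades off precision against completeness.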
One of the problems with traditional indexing is the inherent delay introduced by using human indexers. To overcome this problem, many journals have implemented author key word indexing. Unfortunately, only about 25% of published articles contain author assigned key words. Thomson Reuters uses these to augment its unique capability to provide derivative indexing called KeyWords Plus®.
KeyWords Plus is called derivative indexing because the terms are derived from the titles of articles cited by the author of the article being indexed.5 KeyWords Plus augments traditional key word or title retrieval to a varied extent, anywhere from 10% to over 100%. For example, using Current Contents on Diskette®, you can search on an article such as "The spectrum of autoimmune thyroid disease with urticaria" from Clinical Endocrinology, and find that the key words URTICARIA, VASCULITIS, THYROID DISEASE, and HASHIMOTO THYROIDITIS are expanded to include the additional KeyWords Plus terms ASSOCIATION and ANGIOEDEMA. Again, the user is the ultimate filter. When KeyWords Plus® is used in a weekly or monthly file, as with Current Contents®, you can readily filter out the noise from the music. On the other hand, doing an annual search may require further refinement, as mentioned above, by combining one or more words and cited references.
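The general idea of deriving index terms from cited titles can be sketched as follows. This is not the KeyWords Plus algorithm, whose details are not given here; it is a simple frequency-based stand-in. The cited titles, the stopword list, the `min_count` threshold, and the `derived_terms` helper are all assumptions made for illustration, loosely echoing the thyroid example above.

```python
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "of", "and", "in", "with", "a", "an", "for", "to", "on"}

def derived_terms(cited_titles, existing_terms, min_count=2):
    """Return words that recur across cited titles and are not already index terms."""
    counts = Counter()
    for title in cited_titles:
        # Count each word once per title so one long title cannot dominate.
        words = {w.strip(".,:;()").lower() for w in title.split()}
        counts.update(w for w in words if w and w not in STOPWORDS)
    # Break multi-word terms such as THYROID DISEASE into single words.
    existing = {w for term in existing_terms for w in term.lower().split()}
    return sorted(
        w.upper() for w, n in counts.items()
        if n >= min_count and w not in existing
    )

# Hypothetical titles of cited articles, invented for illustration.
cited_titles = [
    "Chronic urticaria and angioedema: association with thyroid autoimmunity",
    "Angioedema in patients with thyroid disease",
    "Association of vasculitis with autoimmune disorders",
]
author_terms = ["URTICARIA", "VASCULITIS", "THYROID DISEASE", "HASHIMOTO THYROIDITIS"]

extra_terms = derived_terms(cited_titles, author_terms)
print(extra_terms)
```

Because the added terms come from the cited literature rather than from the article's own title, they can surface connections that the author's key words alone would not.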
Both descriptor-based and citation-based systems have unique advantages. In the next installment, I will illustrate by example how these systems can work to narrow or maximize search results.
Dr. Eugene Garfield
Founder and Chairman Emeritus, ISI
1. Garfield, E. The concept of citation indexing: A unique and innovative tool for navigating the research literature. Current Contents® (1-4):3-5, 3-24 January 1994.
2. ----------. Where was this paper cited? Current Contents (5-8):3-5, 31 January - 21 February 1994.
3. Spencer, C C. Subject searching with Science Citation Index®; Preparation of a drug bibliography using Chemical Abstracts, Index Medicus, and Science Citation Index 1961 and 1964. Am. Doc. 18(2):87-96, 1967.
4. McCain, K W. Descriptor and citation retrieval in the medical behavioral sciences literature: Retrieval overlaps and novelty distribution. J. Amer. Soc. Inform. Sci. 40(2):110-4, 1989.
5. Garfield, E. KeyWords Plus®: ISI®'s breakthrough retrieval method. Part I. Expanding your searching power on Current Contents on Diskette®. Current Contents (32):5-9, 6 August 1990. (Reprinted in: Essays of an Information Scientist. Philadelphia: ISI Press®, 1991. Vol. 13. 295-9.)