Tailoring the Search
This essay was originally published in the Current Contents print editions on March 28, 1994, when Thomson Reuters was known as the Institute for Scientific Information (ISI).
So far this year, we have explored the usefulness of citation indexing in conducting searches and its relationship to descriptor-based systems.1, 2, 3 In this essay, we will examine ways to narrow or expand search results.
The Search Query
Textbook descriptions of searching tend to idealize the searching environment. In reality, most scientists start a search without developing a formal search query. Yet rushing in, as fools do where angels fear to tread, can in fact produce interesting results.
The Role of the Intermediary
Unfortunately for the scientist, turning an ill-defined search over to an intermediary—such as a librarian—does not always solve the problem. As I have often said, the librarian may be handicapped by a lack of specialized knowledge. The end user is handicapped by a lack of system knowledge. Often the best result is obtained when they work together.
The number of different types of searches is huge. In theory, every paper ever published treats a "different" topic. In practice, most papers are similar or related to previously published papers or books, even though the idealized scientific paper is absolutely original and therefore different by definition.
Librarians have to deal with a kind of ambivalence on the part of searchers. While scientists are eager to uncover previous work in their field of study, they also hope the search will prove that their "original" ideas are unique. Unfortunately, the latter often proves not to be the case, and the intermediary may be the bearer of bad news.
Depending upon the searcher's sophistication, it may or may not be surprising that there is an abundance of information available on most subjects. In a college or public library, questions are asked that generally have been asked many times before. On the other hand, the typical Current Contents® reader is a research scientist and should be expected, more often than not, to formulate questions that have not been asked many times before.
Many researchers follow the example of Nobel laureate Peter Medawar, who chose to ignore the literature until the initial phase of exploration was over. In that case, the literature search is done when the resulting paper is about to be submitted for publication. Such researchers begin to document the paper by recalling, essentially from memory, a dozen or so papers relevant to its background. Even then, unless the researcher has kept reprints of the papers to be cited, indexes must be consulted to verify the references.
Verification is a very significant part of the intermediary's work: many physicians and researchers routinely submit their bibliographies to librarians to ensure that the references are accurate. As the literature amply demonstrates, however, this practice is far from universal.
Most journal editors would argue that any reference cited should be consulted in the original. But that does not always occur. It is hard to believe that the 8,000 authors who cited the Lowry method last year all consulted the original.4 The same is likely true of citations to methods such as the polymerase chain reaction (PCR)5 and of Watson and Crick's primordial 1953 paper on the double helix in Nature.6
Ideally, every searcher would check the Science Citation Index® (SCI®) to determine if and where each of the references they used is cited in the current literature. This would, from time to time, turn up current papers that have modified the results or methods reported in those earlier papers. I am not, however, suggesting that if you use PCR or Lowry you read all of those current citing papers.
Actually, apart from the few hundred papers that have been cited 1,000 times or more since their publication, most papers (including methods papers) that are cited at all are cited only a few times each year,7 so checking their citing works is a modest task. If you refer to an older paper that is rarely cited, there is all the more reason to check the most recent citing work.
Users have different goals for their searches. To tailor the search to specific needs, the user can employ different means to control the amount of information retrieved. Both high recall (or expanded) and high precision (or narrowed) searches are possible.
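The trade-off between the two kinds of search can be put in numbers. The sketch below uses the standard information-retrieval definitions, recall being the share of relevant papers a search retrieves and precision the share of retrieved papers that are relevant; the paper identifiers are hypothetical toy data, not output from any index.

```python
from __future__ import annotations


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a single search.

    precision = relevant papers retrieved / all papers retrieved
    recall    = relevant papers retrieved / all relevant papers
    """
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: p1-p4 are the papers that actually answer the question.
relevant = {"p1", "p2", "p3", "p4"}
expanded = {"p1", "p2", "p3", "p5", "p6", "p7", "p8", "p9"}  # high-recall search
narrowed = {"p1", "p2"}                                      # high-precision search

print(precision_recall(expanded, relevant))  # (0.375, 0.75)
print(precision_recall(narrowed, relevant))  # (1.0, 0.5)
```

In this toy case the expanded search finds more of the relevant papers at the cost of more to scan, while the narrowed one wastes less reading but misses half of them.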
High Recall Strategies
In a search designed to expand the results to the appropriate maximum, any one or more terms can be a useful starting point. Back in the '60s, we and others determined that most keyword searches involved a combination of two or three terms. As reported in JASIS, Irv Sher designed the Permuterm® Subject Index (PSI) so that you could most easily search a combination of two or more terms.8 You start by looking up term A and then, depending on the number of papers indexed under it, go on to pair it with term B.
Although some researchers have little difficulty scanning a list of 50 to 100 references displayed on a PC, the same task using a printed index may be a formidable one indeed (unless, that is, an intermediary has copied out all the references for you). Having scanned such a list, one then zeroes in on the most relevant paper. In this process, one is often reminded of alternative terminology that may be even more useful for the search at hand. If I were doing a search on the term CITATION, for example, the PSI might remind me to consider REFERENCE as an alternative. In fact, one of the beauties of the PSI, not yet matched by any existing computer search method, is that it displays the full complement of co-occurring terms. Thus, one can see CITATION CLASSIC, CITATION ANALYSIS, JOURNAL CITATION, and CITATIONS displayed.
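The pairing idea behind the PSI can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not ISI's production procedure: each title is assumed to have been reduced already to a set of significant words, and every ordered pair of those words is recorded, so that looking up a primary term lists all of its co-terms and looking up a pair lists the papers containing both. Titles and paper IDs are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import permutations


def build_permuterm(titles: dict[str, set[str]]) -> dict[str, dict[str, set[str]]]:
    """Map primary term -> co-term -> IDs of papers whose titles contain both."""
    index = defaultdict(lambda: defaultdict(set))
    for paper_id, terms in titles.items():
        # Every ordered pair of distinct terms: each word becomes a primary term
        # paired with each word that co-occurs with it in the same title.
        for primary, co_term in permutations(sorted(terms), 2):
            index[primary][co_term].add(paper_id)
    # Convert to plain dicts so that looking up a missing term raises KeyError
    # instead of silently adding an empty entry.
    return {primary: dict(co_terms) for primary, co_terms in index.items()}


# Hypothetical titles, already reduced to significant words.
titles = {
    "a1": {"CITATION", "ANALYSIS", "JOURNAL"},
    "a2": {"CITATION", "CLASSIC"},
    "a3": {"CITATION", "ANALYSIS", "REFERENCE"},
}
psi = build_permuterm(titles)

# Step 1: look up term A and see the full complement of co-occurring terms.
print(sorted(psi["CITATION"]))              # ['ANALYSIS', 'CLASSIC', 'JOURNAL', 'REFERENCE']
# Step 2: go on to term B to reach the papers that contain both.
print(sorted(psi["CITATION"]["ANALYSIS"]))  # ['a1', 'a3']
```

Because every pair is stored in both orders, the same papers can be reached from either term, at the cost of an index whose size grows with the square of the number of words per title.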
If your initial search retrieves relatively little, you can use several associative techniques to expand it. In MEDLINE®, for example, one can use the category search: each subject heading in MeSH (Medical Subject Headings) has been assigned to one or more categories. Thus, INSULIN COMA might be expanded to HYPOGLYCEMIA.
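Conceptually, a category search replaces one heading with every heading that shares a category with it and then combines their postings with OR. The sketch below assumes a tiny hypothetical table; the category label and record numbers are invented for illustration and are not real MeSH tree numbers or MEDLINE data.

```python
from __future__ import annotations

# Hypothetical category assignments (a heading may belong to several categories).
HEADING_CATEGORIES: dict[str, set[str]] = {
    "INSULIN COMA": {"glucose-disorders"},
    "HYPOGLYCEMIA": {"glucose-disorders"},
    "CELL MEMBRANE": {"cell-structures"},
}
# Hypothetical postings: record numbers indexed under each heading.
POSTINGS: dict[str, set[int]] = {
    "INSULIN COMA": {101},
    "HYPOGLYCEMIA": {102, 103, 104},
    "CELL MEMBRANE": {201},
}


def expand_heading(heading: str) -> set[str]:
    """Return the heading plus every heading that shares at least one category."""
    categories = HEADING_CATEGORIES.get(heading, set())
    related = {h for h, cats in HEADING_CATEGORIES.items() if cats & categories}
    return related | {heading}


def category_search(heading: str) -> set[int]:
    """OR together the postings of the heading and its category neighbours."""
    results: set[int] = set()
    for h in expand_heading(heading):
        results |= POSTINGS.get(h, set())
    return results


print(sorted(expand_heading("INSULIN COMA")))   # ['HYPOGLYCEMIA', 'INSULIN COMA']
print(sorted(category_search("INSULIN COMA")))  # [101, 102, 103, 104]
```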
High Precision Strategies
In a search designed to make the result highly specific, the boolean operators AND and NOT can be used effectively. For instance, searching on CELL OR MEMBRANE will retrieve a large number of hits, thus expanding the search, whereas searching on CELL AND MEMBRANE will focus the results significantly, and CELL NOT MEMBRANE will exclude every record that contains MEMBRANE.
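In a computer search these operators act on the sets of records indexed under each term: OR is a set union, AND an intersection, and NOT a difference. A minimal sketch over a hypothetical inverted index (term to record IDs), with invented record numbers:

```python
from __future__ import annotations

import operator

# Hypothetical inverted index: each term maps to the IDs of the records indexed under it.
INDEX = {
    "CELL": {1, 2, 3, 4, 5, 6},
    "MEMBRANE": {4, 5, 6, 7, 8},
}

# Each boolean operator is a set operation on the two posting sets.
OPERATORS = {
    "OR": operator.or_,    # union: records with either term (expands)
    "AND": operator.and_,  # intersection: records with both terms (narrows)
    "NOT": operator.sub,   # difference: first term but not the second (narrows)
}


def boolean_search(term_a: str, op: str, term_b: str) -> set[int]:
    """Combine the record sets of two terms; unknown terms match nothing."""
    return OPERATORS[op](INDEX.get(term_a, set()), INDEX.get(term_b, set()))


print(len(boolean_search("CELL", "OR", "MEMBRANE")))   # 8
print(len(boolean_search("CELL", "AND", "MEMBRANE")))  # 3
print(len(boolean_search("CELL", "NOT", "MEMBRANE")))  # 3
```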
When you look under the term POLYMERASE in the 1985-89 SCI®, you will be overwhelmed by eight full pages listing thousands of entries. If you look up CHAIN REACTION in the same index, however, there are only about two pages of entries, and the combination CHAIN REACTION-POLYMERASE produces about two columns. As you scan, you immediately see that the term PCR also occurs with CHAIN REACTION. This process can be extended to any one of dozens of terms that co-occur with CHAIN REACTION, PCR, or POLYMERASE. If you want to know whether PCR has been used in studies of hepatitis B, you immediately see that there are half a dozen such entries. The analogous process on a computer uses the boolean operator AND. However, you must be able to name the terms in advance.
The online and CD-ROM versions of SCI permit you to do hybrid boolean searches, which are unique to citation-based searching. You can combine a cited reference search with a keyword search to find the articles that link PCR and hepatitis B. An initial search on HEPATITIS B in 1991 yields 702 titles. A cited reference search on the seminal 1988 PCR paper reveals 1,770 citing papers in 1991.5 Combining these two sets with AND identifies 28 papers.
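The hybrid search is the same AND applied to two sets built in different ways: one from words in titles, the other from the reference lists of citing papers. The sketch below assumes a tiny hypothetical collection in which each record carries its title words and a set of cited-reference keys; the key `saiki1988` is an invented shorthand for reference 5, not the SCI's actual cited-reference format, and the toy records have nothing to do with the 1991 counts above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical bibliographic record: title words plus cited-reference keys."""
    rec_id: str
    title_words: set[str]
    cited_refs: set[str] = field(default_factory=set)


def keyword_search(records: list[Record], words: set[str]) -> set[str]:
    """IDs of records whose titles contain every word in `words`."""
    return {r.rec_id for r in records if words <= r.title_words}


def cited_reference_search(records: list[Record], ref_key: str) -> set[str]:
    """IDs of records whose reference lists include `ref_key`."""
    return {r.rec_id for r in records if ref_key in r.cited_refs}


records = [
    Record("r1", {"HEPATITIS", "B", "VIRUS", "DNA", "DETECTION"}, {"saiki1988", "lowry1951"}),
    Record("r2", {"HEPATITIS", "B", "VACCINE"}, {"lowry1951"}),
    Record("r3", {"HLA", "TYPING"}, {"saiki1988"}),
]

hep_b = keyword_search(records, {"HEPATITIS", "B"})       # keyword set
cites_pcr = cited_reference_search(records, "saiki1988")  # citation set
print(sorted(hep_b & cites_pcr))  # ['r1']: papers that link PCR and hepatitis B
```

Record r1 never mentions PCR in its title; it is found only because it cites the PCR paper, which is exactly the link a keyword search alone would miss.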
In the SCI Compact Disc Edition with abstracts, and in all of the other citation products made available by Thomson Reuters on CD-ROM, there are other methods for narrowing searches. The Basic Index mode searches four types of indexing at once: title words, author keywords, KeyWords Plus®, and abstract text. If this produces too many hits, you can narrow the search by limiting it to title words. Alternatively, you could start your search with author keywords and expand to the full Basic Index as required.
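Narrowing by field amounts to choosing which indexed fields a term must match. A minimal sketch, assuming each record keeps its four Basic Index fields as separate word sets; the field names below are hypothetical labels, not the CD-ROM's actual field codes, and the records are toy data.

```python
from __future__ import annotations

# Hypothetical names for the four kinds of indexing searched in Basic Index mode.
BASIC_INDEX_FIELDS = ("title", "author_keywords", "keywords_plus", "abstract")


def field_search(records: list[dict], term: str,
                 fields: tuple[str, ...] = BASIC_INDEX_FIELDS) -> list[str]:
    """IDs of records in which `term` occurs in at least one of `fields`."""
    return [rec["id"] for rec in records
            if any(term in rec.get(f, set()) for f in fields)]


records = [
    {"id": "x1", "title": {"PCR", "HEPATITIS"}, "abstract": {"PCR"}},
    {"id": "x2", "title": {"HEPATITIS"}, "keywords_plus": {"PCR"}},
    {"id": "x3", "title": {"VIRUS"}, "author_keywords": {"PCR"}, "abstract": {"PCR"}},
]

print(field_search(records, "PCR"))                               # ['x1', 'x2', 'x3']
print(field_search(records, "PCR", fields=("title",)))            # ['x1']
print(field_search(records, "PCR", fields=("author_keywords",)))  # ['x3']
```

In this sketch, passing a smaller `fields` tuple narrows the search, and passing the full tuple reproduces the four-way search described above.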
If you are being inundated by literature on a given topic, you may want to limit your results arbitrarily to English-language papers. This approach could, however, cause you to miss important papers. One option is to segregate the results by language, scan each set separately, and then decide, on the basis of title or abstract, whether the foreign-language papers are worth perusing.
In both descriptor-based and citation-based systems, there are many methods available to either increase or decrease the extent of the search. The methods vary from system to system, and their results may overlap but are never quite the same. In the next essay, we will look at the interplay of citing and cited references.
Dr. Eugene Garfield
Founder and Chairman Emeritus, ISI
1. Garfield E. The concept of citation indexing: a unique and innovative tool for navigating the research literature. Current Contents® (1):3-5, 3 January 1994.
2. -------------------. Where has this paper been cited? Current Contents® (5):3-5, 31 January 1994.
3. -------------------. Citation-based and descriptor-based search strategies. Current Contents® (9):3-5, 28 February 1994.
4. Lowry O H, Rosebrough N J, Farr A L, Randall R J. Protein measurement with the Folin phenol reagent. J. Biol. Chem. 193:265-75, 1951.
5. Saiki R K, Scharf S J, Mullis K B, Horn G T, Higuchi R, Gelfand D H, Erlich H A, Stoffel S. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239:487-91, 1988.
6. Watson J D, Crick F H C. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature 171:737-38, 1953.
7. Garfield E. The most-cited papers of all time, SCI® 1945-1988. Part 1A. The SCI top 100: will the Lowry method ever be obliterated? Current Contents® (7):3-14, 12 February 1990.
8. ------------------. The Permuterm® Subject Index: an autobiographical review. J. Amer. Soc. Inform. Sci. 27(5-6):288-291, 1976.