
Auditing catalogue quality by random sampling

2. Literature review

2.1 Management tools and performance measurement

The desire for a quantitative measure of catalogue quality falls under the heading of performance measurement, the collection and analysis of data on the delivery of library services (Goodall 1988, Griffiths 1997). Individual statistics, or measures derived from several statistics, are called performance indicators and are often compared against targets or benchmarks. These indicators can, and should, summarise qualitative data such as users' opinions of services.

IFLA guidelines for performance measurement in academic libraries required performance indicators to be appropriate, reliable (that is, unambiguous), reproducible, helpful and practical (Poll & Boekhorst 1996, pp. 18-19). To these might be added stipulations that a test be quick, affordable and minimally disruptive. An objective indicator will enable comparisons to be made with similar tests performed at different times and different sites. Taken in isolation, a quantitative measure may be insufficient to diagnose problems, but it is a necessary first step.

There is a statutory requirement for local authorities in the UK to provide a 'comprehensive and efficient' public library service. Reappraisals of spending priorities brought on by economic constraints spurred interest in performance measurement, with a corresponding emphasis on a library's public services. The first officially approved framework for developing performance indicators for public libraries was the Keys to success manual (King Research 1990). It explained the use of performance measurement as a management tool and suggested monitoring cataloguing processes both quantitatively (number of titles catalogued, speed of processing) and qualitatively (the proportion conforming to standards). A recent consultation document, Comprehensive and efficient: standards for public libraries, recommended that all libraries open more than ten hours per week offer access to the online catalogue and provide 0.7 computers per thousand people served, but did not mention the accuracy of the catalogue ('23 steps...' 2000).

Indeed, because cataloguing is not usually seen as a service directly affecting readers, lists of performance indicators often consider cataloguing only in terms of public accessibility of the online catalogue. An exception is the toolbox of over 100 performance indicators developed with European Commission funding which proposed checking the proportion of unsatisfactory records from a randomly-selected day's cataloguing as part of ongoing quality control (Ward et al. 1995, p. 123).

Two alternative methods are given in IFLA's guidelines (Poll & Boekhorst 1996, pp. 70-76). The first asks readers conducting a known-item search to list the bibliographic information they have and to report whether or not they are successful. Cataloguing staff then examine the failures to determine how many result from user failure and how many from inaccurate cataloguing. The second measures the precision and recall of subject searches, again by evaluating readers' self-declared success or failure. Although the involvement of readers confuses the interpretation of results, it can also highlight a need for improved user education.

Formal standards for quality management, of which performance measurement is a part, became fashionable in librarianship at the beginning of the 1990s. Cataloguing was addressed specifically by James (1993), who applied BS 5750 (ISO 9000), and by Khurshid (1997), who outlined the role of Total Quality Management, both emphasising that performance criteria should focus on the needs of the user. Jones, Kinnell & Usherwood (2000) introduced theories of quality management to extend the framework of Keys to success for constructing tools for self-assessment.

For management purposes it may be desirable to evaluate individual cataloguers' work. Training commonly involves review of records by senior staff, who may also perform spot checks on cataloguers. Reeb (1984) proposed totalling the errors found during regular reviews and dividing by the number of items checked as a quantitative measure of quality. Matters of judgement were excluded and one point was assigned for both serious and trivial errors, defended by Reeb on the grounds of simplicity and impartiality. A similar procedure was used at the Library of Congress, with three points assigned for a major error and one for a minor error; this distinction was later removed and only errors in access points were counted (Thomas 1996).
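The arithmetic of these measures is simple enough to express directly. In the sketch below the function names and the worked figures are invented for illustration; the weighting follows the Library of Congress practice described above.

```python
def reeb_score(errors, items_checked):
    """Reeb's measure: total errors found during review divided by the
    number of items checked, every error weighted equally."""
    return errors / items_checked

def weighted_score(major, minor, items_checked):
    """Library of Congress variant: three points per major error,
    one point per minor error."""
    return (3 * major + minor) / items_checked

print(reeb_score(12, 200))        # 0.06 errors per item
print(weighted_score(2, 6, 200))  # 0.06: two major and six minor errors
```

The equal scoring of the first function reflects Reeb's preference for simplicity and impartiality over graded severity.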

2.2 Database quality

The phrase 'dirty data' was coined to describe erroneous information in databases, especially online bibliographic databases, of which library catalogues can be considered specific instances. In a highly theoretical discussion, Fox, Levitin & Redman (1994) enumerated the aspects of database quality as accuracy, completeness, consistency and currency, and proposed recording the magnitude of an inaccuracy as well as its occurrence. Sources of error range from the input of wrong information to data corruption, but most relevant to the present work are keyboarding errors, taken throughout to include both spelling errors and typing errors. The literature has been surveyed by O'Neill & Vizine-Goetz (1988) and Medawar (1995).

A landmark study in this area is that of Bourne (1977), who investigated the frequency of keyboarding errors in bibliographic databases by browsing a composite list of index terms in three arbitrarily-chosen alphabetic ranges from several databases. Misspellings were identified by inspection, excluding questionable terms which may have been technical or non-English words or abbreviations. Errors were found not to be uniformly distributed but clustered around certain high-frequency words; as many as 23% of index terms contained errors, although expressed as a proportion of citations the figure dropped to less than 1%.

An expensive but effective technique for detecting such keyboarding errors is to require all information to be input twice by independent typists. This method has been used in bibliographic databases and could help retrospective conversion from catalogue cards, but would be impractical for original cataloguing.

Concern about the quality of commercial databases grew at the beginning of the 1990s. Basch (1990) reported on the Southern California Online Users' Group meeting which proposed ten criteria for assessing quality in bibliographic, directory and full-text databases: consistency, coverage/scope, timeliness, error rate/accuracy, accessibility/ease of use, integration, output, documentation, customer support and training, and value-to-cost ratio. The Finnish Society for Information Services took a similar enumerative approach to evaluating bibliographic databases, with criteria falling into five broad groups: connecting to the system & communications, search language, content quality, practical help and cost (Juntunen et al. 1991, Juntunen, Mickos & Jalkanen 1995).

The Library Association and the UK Online User Group established the Centre for Information Quality Management to act as a clearing house for reports of poor quality in databases (Armstrong 1995). A study (Armstrong & Medawar 1996) was subsequently undertaken to investigate the type and frequency of errors in commercial databases, evaluating keyboarding errors with unsystematic searches for common misspellings such as 'goverment' for 'government', but the error rate was not quantified. It was recommended that the same tests be repeated to discover whether quality improves over time.

As recommended by Jacsó (1993a), Armstrong & Medawar tested the completeness of a database by searching fields whose presence was mandatory and comparing the total number of records reported; discrepancies would indicate the omission of information on which searchers might rely. They argued that errors occurring in free-text fields such as the title are less significant than errors in descriptors which are assumed to be validated. Other techniques for browsing online databases include checking the frequency of 'see also' terms in thesauri, cross-referencing ISSNs and serial titles for false matches and plausibility searches such as publication dates prior to the twentieth century (Jacsó 1993b).

It was recognised that independent labelling could give the user information on the scope of a database and the quality of its records (Jacsó 1993c; Armstrong 1995, pp. 6-9). The Database Labels published by the Centre for Information Quality Management provide objective information such as geographical coverage, the number of records in the database, the time-lag when adding abstracts, the number of fields which are indexed, the proportion of records in which a particular field is empty and the use of controlled vocabulary for descriptors (Armstrong 1996). However, Labels are not designed to quantify the occurrence of keyboarding or other errors.

2.3 Quality in cataloguing

2.3.1 What is cataloguing quality?

There is a profusion of literature on what makes good catalogues, or at least good catalogue records, and the apparent conflict between detail and productivity has been debated since the work of the Centre for Catalogue Research on short entries (Seal, Bryant & Hall 1982, Seal 1983). Following Poll & Boekhorst (1996, p. 12) in defining quality as fitness for purpose, where the purpose of a library is reader satisfaction, quality in cataloguing means satisfying readers' needs for a catalogue that is comprehensive, current and usable, but not necessarily detailed or even perfectly accurate.

One of those who argued that it was desirable to reduce the amount of bibliographic information in catalogue records was Graham (1990), who explored the relative importance of accuracy and fullness in each part of the record, distinguishing between mechanical accuracy (correct transcription and description) and intellectual accuracy (appropriate description, access points and classification). Mandel (1988) gave a model for cost-benefit analyses balancing extra productivity against lost access.

The expense of original cataloguing has stimulated developments such as minimal-level, collection-level and 'core' cataloguing, as well as co-operative bibliographic utilities, discussed by Thomas (1996) in an issue of Library trends devoted to quality and effectiveness. She presents definitions of cataloguing quality encompassing consistency, depth, appropriateness and timeliness as well as accuracy.

Even if perfection is unnecessary and users are satisfied with only partial success in retrieval, high standards of accuracy are undeniably expected of libraries, and libraries themselves need accurate inventories of their collections. Quality control in online catalogues includes error detection and correction through automated verification of mandatory fields and MARC (Machine Readable Cataloguing) coding, use of the system to mark records needing review, and production of reports helping to identify errors (Hudson 1984). Membership of a cataloguing consortium obliges libraries to maintain quality standards, an obligation balanced by enhanced opportunities such as global changes to headings (Horny 1985, Mifflin & Williams 1991). Readers can be asked to report problems by filling in a form (Hanson & Schalow 1999) or emailing from the OPAC; such procedures are valuable because they catch precisely those items being sought, although under-reporting is inevitable.

Dyson (1984) reported on a computerised circulation database at the University of Hull created with minimal quality control for the sake of speed. Misfiling in the database indicated the presence of errors, so a computer-generated random sample was compared against the printed shelflist. Errors were categorised as inputting errors (transcription from catalogue cards), editing errors (failure to follow instructions) and 'others', and considered serious if they occurred in classmark, author or title fields. There was an unacceptably high average of 2.16 errors per record, 91% of which were inputting errors which could be explained by illegible handwritten cards; other errors arose from truncated information on cards.

Romero & Romero (1992) examined the quality of original cataloguing by analysing errors in over 2,000 records from the University of Illinois at Urbana-Champaign. They counted deviations from AACR2, Library of Congress Rule Interpretations (LCRIs) and MARC format, necessitating familiarity with those schemes for analysis of their data. Unfortunately, the number of records with deficient descriptive cataloguing was not given, but there were 365 errors in total, the largest share (35%) occurring in notes fields, followed by extent (21%), place, publisher, date (20%) and title and statement of responsibility (12%). The small number of errors in edition and series fields was attributed to the infrequency of their appearance in records. Punctuation accounted for 272 errors, skewed by the requirement in USMARC to insert ISBD-prescribed punctuation manually.

2.3.2 Impact studies

Impact studies test whether an intervention of some kind has had the intended effect: in this case, online catalogues are compared with the card catalogues they replaced, as in Knutson (1990). A small presample was taken to estimate the error rate from which the size of the full, systematic sample could be determined. Only access points were checked, not any part of the descriptive cataloguing; matching pairs of records in the two catalogues were compared, rather than the books to which they referred, so it was possible to check call numbers. Subject headings were checked for obsolescence against the tenth edition of the Library of Congress Subject Headings (LCSH). The majority of errors in the online catalogue were found in the call number and location fields rather than being typographical errors affecting retrieval.
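The step from presample to full sample rests on the usual normal-approximation formula for estimating a proportion. Knutson's exact calculation is not given, so the sketch below uses the conventional formula with invented figures.

```python
import math

def sample_size(p_hat, margin, z=1.96):
    """Number of records to sample so that an error rate near p_hat is
    estimated to within +/- margin at roughly 95% confidence (z = 1.96)."""
    return math.ceil(z * z * p_hat * (1 - p_hat) / (margin * margin))

# A presample suggesting a 5% error rate, to be pinned down to +/- 2%:
print(sample_size(0.05, 0.02))  # 457 records
```

A lower presample estimate shrinks the required sample, which is the point of taking the presample at all.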

Following this study, Cook & Payne (1991) undertook a similar comparison, finding their online catalogue to be marginally less intact (complete) and substantially more accurate than the card catalogue. Accuracy was defined as being 'present in a manner that permitted accessibility', that is, errors were considered only in access points. A sample was taken by measuring a random distance into each drawer of the shelflist (assumed to be accurate), finding all related cards in the catalogue and printing the associated online record. Keyboarding and MARC errors were included but differences in capitalisation, bracketing and 'insignificant' punctuation were ignored. Only 1.35% of online records were found to be inaccurate, compared to 5.89% of the cards, with subject headings and titles containing most errors.

2.3.3 Co-operative cataloguing

The quality of records contributed to shared cataloguing utilities has been of interest since their inception, with libraries often wishing to produce a 'whitelist' of institutions from which records can be accepted with little verification. Similarly, libraries where cataloguing is outsourced may wish to compare its quality with in-house work.

Ryans (1978) examined the descriptive cataloguing of a run of 700 records contributed by Kent State University to OCLC. Cataloguing was judged against the first edition of AACR in nine areas; keyboarding errors and omissions were also counted as errors. Deviations from AACR were found in 40% of the records, with 56% of these containing only one error. The collation field accounted for 28% of all errors, mostly the possibly intentional omission of height or pagination. A further 22% of errors were incorrectly established subject headings, and another 22% occurred in main and added entries, some merely containing keyboarding errors. Other fields each accounted for less than 5% of all errors.

Intner (1989) showed that OCLC and RLIN share almost identical error rates by matching and comparing records from both systems. Errors of quality (deviations from AACR2, LCRIs and MARC format, and incorrect spelling, punctuation or capitalisation) were distinguished from errors of fullness. An average of almost 2.5 errors per record was found, but the majority were deviations from AACR2 or LCRIs, especially in publication details, or errors of punctuation. It was estimated that only 5% of the errors affected retrieval.

A different approach was taken by McCue, Weiss & Wilson (1991), who took random samples from a backlog and searched for corresponding records in the catalogues of nine RLIN member libraries and the Library of Congress. These records were then used for independent copy cataloguing by five cataloguers and the number of revisions made was investigated as an indicator of the original records' quality. 70% of records were found to have keyboarding errors, predominantly in the title, although these were mostly insignificant errors in punctuation. Among other modifications, subject headings were most frequently altered, usually to add or update a heading. No major differences in quality between the ten libraries were observed.

The number of modifications made to records was also investigated by Chapman (1993, 1994). A random sample was taken from the BLCMP Union Catalogue, including both monographic and serial records. By comparing the records before and after editing by member libraries, it was possible to count the number of edits in each of seven areas. Separately, each member library annotated 10 randomly-selected edited records with the reasons for editing. While generally satisfactory, around 20% of records were edited in some way, more often to enhance a record by adding a field or adding data to an existing field than to change data. The most frequently edited areas were author headings for monographic records and titles and publication details for serials. 19% of edits were made for reasons of accuracy, including transliteration and punctuation. Other reasons for editing included consistency and the functionality of other components of library software such as circulation or serials control.

A simple way to monitor quality is to canvass the opinions of users, but useful results are likely to be obtained only if there are serious quality issues or users are particularly dissatisfied. Davis (1989) surveyed cataloguers at a random sample of institutions downloading records from OCLC's Online Union Catalog. Only 7.5% of respondents considered the records to be only fair or worse, although academic research libraries held generally lower opinions. The mean estimate was that 8% of records downloaded contained errors (slightly more in non-book material). Questions on specific aspects of the service elicited requests for improved authority control and the removal of duplicates, but more depth could be obtained from interviews.

2.3.4 Cataloguing in publication

Cataloguing in publication (CIP) programmes have made copy cataloguing easier by offering draft records, but they are prone to error when publishers submit incomplete information or alter details between cataloguing and publication. (Similar issues arise with the incorporation in catalogues of records created by booksellers (Beall 2000).) Seal & Overton (1984) investigated the differences between British Library CIP and CIP Revised records, finding that only three records in their sample of 562 were unaltered. Most frequently revised were classification, date and the form and presence of headings.

Taylor & Simpson (1986) compared the accuracy of Library of Congress CIP records and original LC records by taking an exhaustive sample of records created in their institution by copy cataloguing from both sources. Since the records were dated, it was possible to break down the results by year of input, taking into account, for example, headings changed for AACR. Errors were defined as significant solely according to the field they occurred in, regardless of gravity. Fewer non-CIP records contained errors, but of the errors which occurred, a greater number of significant errors was found in non-CIP data.

2.3.5 Searching for misspellings

A note in American libraries ('Ideas...' 1991) reported Jeffrey Beall's quick 'dirty database test' for gauging the proportion of spelling mistakes in a catalogue, giving ten common errors to be sought as keywords. The number of instances of each misspelling was totalled and subtracted from 100 to give a score for comparison with catalogues of similar size. This approach can be considered a (non-random) sample of all possible searches in the catalogue.
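As a sketch, the scoring is a single subtraction; the hit counts below are invented and the misspellings shown are common examples rather than Beall's published list of ten.

```python
def beall_score(hit_counts):
    """Beall's 'dirty database test' score: 100 minus the total number
    of hits across the target misspellings."""
    return 100 - sum(hit_counts.values())

hits = {"goverment": 3, "managment": 1, "libary": 2}  # illustrative counts
print(beall_score(hits))  # 94
```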

Dwyer (1991a, 1991b) extended Beall's test by taking the ratio of incorrect spellings to correct spellings to compensate for the varying sizes of collections. Cahn (1994) developed the idea further, first truncating Beall's ten words, then determining the sizes of databases for a comparative error rate, although no correlation between size and error rate was found. She provided her own list of ten words in a variety of incorrect forms and repeated the test taking into account 'redundant' errors (where the word appears in a correct form elsewhere in a record) and whether the error occurred in a searchable field. The results are greatly affected by the choice of misspellings but Cahn observed that a standard list would be counterproductive as a benchmark because database producers might correct only those errors.
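Dwyer's size-independent variant is equally simple to sketch; the figures below are invented.

```python
def dwyer_ratio(misspelled_hits, correct_hits):
    """Dwyer's refinement: misspelled occurrences as a ratio of correct
    occurrences of the same word, compensating for collection size."""
    return misspelled_hits / correct_hits

# 4 records containing 'goverment' against 2,000 containing 'government':
print(dwyer_ratio(4, 2000))  # 0.002
```

Because it is a ratio, the figure can be compared across catalogues of very different sizes, which the raw Beall score cannot.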

Randall (1999) used Beall's and Dwyer's measurements to evaluate the online catalogue of the University at Albany for keyboarding errors before it was merged into a union catalogue. She searched for sixteen incorrect variations on the stem 'econom-', finding most errors in notes fields, then in title fields. Randall was also interested in whether a clean-up operation would be hampered by the introduction of errors from other catalogues; exploratory checks implied that this was likely because similar errors were present elsewhere. She concluded that perfection is impossible, probably unnecessary and possibly an unjustifiable expense.

Others have suggested the less systematic approach of browsing through catalogue indexes for unique terms which are therefore candidates for verification. Ballard & Lifshin (1992) performed a (time-consuming) visual inspection of 117,000 keywords in Adelphi University's OPAC, calling up the full record for context whenever a keyboarding error was suspected. Like Randall, they found a majority of errors in title and notes fields. They provided a list of the most frequent errors, updated by Ballard (1997, 2000), which would be useful for performing the dirty database test in an academic library.

2.3.6 Intactness of the catalogue

A library in the process of building a computerised catalogue, whether through retrospective conversion or original cataloguing, will be interested in ensuring that no items are missed. When both cataloguing and circulation have been computerised, uncatalogued items should be detected during stocktaking or at the point of issue, but confirmation that the catalogue is complete may still be desirable, perhaps after a system conversion or amalgamation.

One approach was described by Payne (1985) who audited the stock of what was then the City of London Polytechnic for discrepancies between the catalogue and the library's holdings. This was done simply by performing an exhaustive census of the stock - counting each physical item in the range of interest - and counting the number of bibliographic records for the same range. Payne assumed that there would be more records than items and so any difference in the totals indicated missing items. Fewer records, however, would suggest uncatalogued material, although this symptom might be swamped by a large number of missing books.

A more effective and quicker technique than an inventory is that of Kiger & Wise (1993), who described taking a random sample of items from the shelves and searching for the corresponding records by main entry and by barcode. It is unlikely that an item will have both an incorrect main entry and an incorrect or missing barcode and so uncatalogued items will be discovered.
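A minimal sketch of this two-way lookup, with the catalogue simulated by two in-memory indexes and all data invented:

```python
# The catalogue is simulated as two look-up tables keyed by the two
# access routes used in the Kiger & Wise check.
by_main_entry = {"Smith, J. Catalogue audits": "rec0001"}
by_barcode = {"300012345": "rec0001"}

def is_catalogued(main_entry, barcode):
    """An item found by neither its main entry nor its barcode is a
    candidate uncatalogued item."""
    return main_entry in by_main_entry or barcode in by_barcode

print(is_catalogued("Smith, J. Catalogue audits", "999999999"))  # True
print(is_catalogued("Smyth, J. Catalog audits", "999999999"))    # False
```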

2.4 Automating error detection

Pressures to automate quality control include minimising time spent on database maintenance, making error detection more consistent and accurate and saving money. In fixed fields and authority-controlled fields, input can confidently and automatically be identified as incorrect or at least suspect. This is less true in titles and other descriptive fields, but even these can be verified with more sophisticated algorithms.

Bausser, Davis & Gleim (1986) described quality assurance by data validation: the cataloguing system refuses to accept a record in which it detects an error. They predicted a shift from retrospective error detection in a card catalogue, for example when filing cards, to detection at the time of creation in an online system. Their suggestions included automatic shelflisting to guarantee unique shelfmarks, automated authority control and broad validation of classification. The possibility of mechanical validation of the intellectual content of a catalogue as well as mere description seems optimistic, but Morita (1986) was confident that an advanced system might spot inappropriate subject headings or propose better classmarks.

2.4.1 Spelling checkers and spelling correctors

Spelling checkers and correctors, familiar from word processing software, are increasingly integrated with library systems. They appear a natural choice to detect keyboarding errors, but they suffer from the dual problems of false positives (correct words not recognised) and false negatives (incorrect words not detected).

False positives arise from the inclusion in library catalogues of more proper names, non-English words and abbreviations than ordinary prose. One solution is to run the spelling checker over the keyword index and add to its dictionary every keyword it fails to recognise; this is time-consuming but will make subsequent checks quicker and more successful.
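That workflow of flagging unrecognised index terms, reviewing them and growing the dictionary can be sketched with a plain set standing in for the checker's dictionary; all terms are invented.

```python
known_words = {"government", "library", "catalogue"}   # seeded dictionary

def unrecognised(keywords, dictionary):
    """Index terms the checker fails to recognise, queued for review."""
    return [w for w in keywords if w.lower() not in dictionary]

index_terms = ["Government", "Goverment", "catalogue", "onomastics"]
flagged = unrecognised(index_terms, known_words)
print(flagged)  # ['Goverment', 'onomastics']
# A reviewer confirms 'onomastics' is genuine and adds it, so that
# subsequent passes flag only new, unrecognised words:
known_words.add("onomastics")
```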

The problem of false negatives, or real word errors, where a word that exists has been substituted for the correct word, is less tractable. An expanded dictionary may actually be a liability from this perspective, as some rare words appear more often as misspellings than intentionally. Real word errors also occur when a word is mistakenly broken into two words by an intrusive space ('jelly fish', 'in finite') and when spellings differ between British and American English.

It would be misleading to pretend that computers are incapable of detecting context (see, for example, Mays, Damerau & Mercer 1991; some spelling checkers examine clusters of letters or word frequency instead of using a dictionary) but it is clear that human supervision, however fallible, will remain necessary for some time. A computer's performance may exceed that of a human when the visual form of errors makes them resistant to human inspection, as with 'milllion' or the confusion of the letters 'l' and 'O' with the numerals '1' and '0' in certain typefaces.

O'Neill & Vizine-Goetz (1989) experimented with a spelling corrector in OCLC's Online Union Catalog. They argued that the larger the database, the higher the proportion of incorrect index terms, observing that there will be 'one entry for the correct spelling and dozens of entries for variant spellings' (p. 314). They extracted words from titles and subject headings in a random sample of catalogue records and found that although 76% were validated by the spelling checker, only 3.5% of the remainder were confirmed as keyboarding errors after human review. Thirty words were reviewed for each word corrected, so the process was time-consuming, especially when no correction was suggested automatically.

Better results might be obtained from systems designed with library catalogues in mind. The experimental OKAPI system incorporated spelling correction in subject searches (Walker 1987) and the same approach could theoretically be applied to match correct input against misspellings in the catalogue. However, there is both an expectation and a need for accurate catalogues, so this is something of a diversion.

2.4.2 Duplicate records

In their discussion of the characteristics of duplicate catalogue records, O'Neill, Rogers & Oskins said: 'Duplicate records that are identical are rare and easy to identify. It is the duplicate records that are similar but not identical that are hard to identify' (1993, pp. 59-60). Legitimate variations in cataloguing and plain error mean that an existing record may not be found for copy cataloguing and an unnecessary near-duplicate created. Despite the complexity of detecting duplicates, the scale of the task is such that it is more suited to computers than humans.

O'Neill, Rogers & Oskins examined thirteen bibliographic elements in a random sample of records from OCLC's Online Union Catalog. Potential duplicates were detected by an automated search and verified by professional cataloguers. Only a third (34%) of duplicate pairs differed in at most one element; 50% differed in two or three elements. Keyboarding errors and incorrect MARC coding were responsible for the differences between many duplicates. Differences most frequently occurred in the date, author or publisher elements, often with author headings established in different forms. Pagination and statement of responsibility were most useful for differentiating near-duplicates that did refer to distinct items. Fixed MARC fields were considered to contain less reliable information than the free-text fields from which they are derived as they have little or no internal redundancy.

Toney (1992) concurred that 'one cannot always distinguish between errors and legitimate differences in cataloguing practices' (p. 21) and that it may be more effort to develop algorithms for automated checking than to find duplicates by hand. The selection of fields for matching is important, as records that differ only in pagination and title might correctly represent two articles by the same author in the same journal in the same year, and Toney noted the diversity of fields examined in previous deduplication studies. Another approach is to construct a single key or 'fingerprint' for comparisons, based on the contents of several fields, like the Universal Standard Bibliographic Code used in the QUALCAT project (Ridley 1992).
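The idea of a composite key can be sketched as follows; the choice of fields, truncation lengths and normalisation are invented for illustration and are not the Universal Standard Bibliographic Code itself.

```python
import re

def match_key(title, author, year, pages):
    """A single comparison key built from normalised fragments of
    several fields, so that near-duplicate records collide."""
    norm = lambda s: re.sub(r"[^a-z0-9]", "", s.lower())
    return "|".join([norm(title)[:10], norm(author)[:4],
                     str(year), str(pages)])

a = match_key("The catalogue.", "Smith, J.", 1993, 212)
b = match_key("The Catalogue", "SMITH J", 1993, 212)
print(a == b)  # True: flagged as candidate duplicates
```

Records whose keys collide are only candidates; as Toney warns, a human must still distinguish error from legitimate variation in cataloguing practice.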

2.5 Statistical methods in librarianship

Advantages of statistics in presenting a case are that they are objective and can summarise large amounts of data, while the use of sampling techniques allows decisions to be made on incomplete information. Drott (1969) gave a basic introduction to random sampling (using tables of random numbers) for the production of library statistics. Examples given of the use of sampling include estimating the number of missing books, estimating the number of books which have not circulated for some years, and selecting patrons for a survey. Payne's guide to sampling in library research (1990) is also valuable and clear. Hewitt (1972) applied sampling specifically to catalogue quality, observing that 'the sample catalog audit is a neglected or undiscovered tool in research libraries' (p. 24). He investigated misfiling, mutilated cards and blind references in a card catalogue, none of which is relevant in online systems.

Although superficially straightforward, random sampling can be misleading if poorly implemented. Bookstein (1974) listed three common faults in sample design which can invalidate sampling and illustrated them with reference to libraries. The first is the misapplication of tables of random numbers, now less of a problem given the widespread adoption in libraries of computers which can generate random numbers (technically 'pseudorandom', as they are determined by an algorithm, but effectively unpredictable). Next he considered 'frame problems' where it is difficult to delimit the population clearly, giving as an example the fallacy of assuming that catalogue entries constitute the same population as holdings, because of added entries, multiple copies, multivolume sets and other complications. Finally he examined intrinsically biased selection techniques, such as choosing French-language items by taking the first one after a randomly-chosen point in a list of titles; this will be biased because French titles are likely to be clustered unevenly in an alphabetical list.
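The pseudorandom generation mentioned above amounts to a few lines in practice; the population and sample sizes below are invented.

```python
import random

random.seed(42)                  # fixed seed so the audit can be re-run
population = range(1, 120001)    # record numbers 1..120000 (invented)
sample = random.sample(population, 400)
print(len(sample), len(set(sample)))  # 400 400: no record drawn twice
```

Sampling directly from the full list of record numbers sidesteps Bookstein's third fault, since no item's chance of selection depends on its neighbours in an ordered list.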

Kiger & Wise (1993) detailed the use of sampling to determine properties of a library's collection, avoiding equations by presenting ready-reckoner tables for sample size and confidence limits. In their 1996 article they discussed the use of sampling in auditing the University of Tennessee's library for missing books. They considered the catalogue as an inventory of books and took two samples. The 'catalog-to-collection test' checked a sample of catalogue records to see whether the corresponding items could be found on the shelves, while the 'collection-to-catalog test' sampled the shelves by taking the book five places to the right of each randomly-selected item from the catalogue and seeking a catalogue record for it by main entry and by barcode.
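Ready-reckoner tables of the kind Kiger & Wise present can be reproduced with the standard formula for estimating a proportion; the sketch below assumes the usual normal approximation with a finite-population correction, and the default values (95% confidence, ±5% margin, worst-case p = 0.5) are conventional choices rather than figures from their tables.

```python
import math

def sample_size(population, margin=0.05, p=0.5, z=1.96):
    """Sample size needed to estimate a proportion p to within +/-margin
    at the confidence level implied by z (1.96 ~ 95%), applying a
    finite-population correction."""
    n0 = z * z * p * (1 - p) / (margin * margin)  # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)          # finite-population correction
    return math.ceil(n)

print(sample_size(10000))  # -> 370 records for a 10,000-record catalogue
```

The correction matters for small catalogues: the required sample grows only slowly with collection size, approaching 385 however large the catalogue becomes under these settings.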

Copeland (1994) explained the application of random sampling for quality control, checking for errors in each batch of records uploaded to a co-operative cataloguing system. Starting with a systematic sample of 5% of records, the size of the next sample is increased or decreased according to the prevalence of errors. This is a sensible and practical refinement for ongoing quality control.
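The shape of Copeland's scheme can be sketched as follows: a systematic sample is drawn from each batch, and the sampling rate for the next batch is raised or lowered according to the errors found. The adjustment thresholds and bounds below are illustrative assumptions, not Copeland's figures.

```python
def systematic_sample(batch, rate):
    """Take every k-th record of a batch, where k is approximately 1/rate."""
    step = max(1, round(1 / rate))
    return batch[::step]

def next_rate(rate, errors, sampled, low=0.01, high=0.03,
              minimum=0.01, maximum=0.25):
    """Adjust the sampling rate for the next batch: halve it if the observed
    error rate is below `low`, double it if above `high`.
    (Thresholds and bounds are illustrative, not Copeland's.)"""
    observed = errors / sampled if sampled else 0.0
    if observed < low:
        rate /= 2
    elif observed > high:
        rate *= 2
    return min(maximum, max(minimum, rate))

batch = list(range(1, 201))              # a batch of 200 record numbers
sample = systematic_sample(batch, 0.05)  # 5% starting rate -> 10 records
```

The feedback loop concentrates checking effort where errors are being found, which is what makes the scheme economical for ongoing quality control.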

An alternative form of sampling is sequential analysis, described by DiCarlo & Maxfield (1988), also for auditing the proportion of missing books. Sequential analysis is useful when the actual error rate is less important than whether it exceeds a certain threshold: for example, if an error rate over 6% triggers a recataloguing exercise then it suffices to know that there are at least that many errors, regardless of the exact proportion. Fewer items need to be sampled, in most cases, because sampling can stop as soon as there is enough evidence to make a decision.

To perform sequential analysis, set acceptable boundaries for the parameter of interest, choose a confidence limit (typically 95%) for the estimate and decide after how many items the sampling will stop if no decision has been reached. This information allows the computation of upper and lower limits for each stage of the sampling process; if the number of items with the particular property (missing, miscatalogued, etc.) breaches these limits at any point then sampling can stop and a decision can be made.
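The upper and lower limits described above are those of Wald's sequential probability ratio test. The sketch below computes them for a choice between an acceptable error rate p0 and an unacceptable rate p1; the particular rates and risk levels shown are illustrative assumptions.

```python
import math

def sprt_boundaries(n, p0, p1, alpha=0.05, beta=0.05):
    """Lower (accept p0) and upper (accept p1) limits on the cumulative
    number of errors after inspecting n items, from Wald's sequential
    probability ratio test."""
    s = math.log((1 - p0) / (1 - p1))
    r = math.log(p1 / p0) + s
    lower = (math.log(beta / (1 - alpha)) + n * s) / r
    upper = (math.log((1 - beta) / alpha) + n * s) / r
    return lower, upper

def decide(errors_so_far, n, p0=0.03, p1=0.06, alpha=0.05, beta=0.05):
    """'accept' (error rate acceptably low), 'reject' (e.g. trigger
    recataloguing), or 'continue' sampling.  Rates are illustrative."""
    lower, upper = sprt_boundaries(n, p0, p1, alpha, beta)
    if errors_so_far <= lower:
        return 'accept'
    if errors_so_far >= upper:
        return 'reject'
    return 'continue'
```

For example, with these settings a run of 100 items with no errors already crosses the lower boundary and the audit can stop with an 'accept' decision, whereas a fixed-size design would have committed to its full sample in advance.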

2.6 Implications for the technique

The proposed research draws largely on the methods of Kiger & Wise (1993, 1996), which, although designed for an audit of missing books, extend naturally to gathering unbiased samples for other purposes, including audits of catalogue quality. The literature lacks examination of how to generate a random sample in a library: many studies apparently gloss over the difficulties or assume that the library management software can draw records at random, a facility which many systems lack.

The intactness of catalogues will be checked exactly as Kiger & Wise suggest, but the auditing of quality and accuracy requires a refinement to the simple selection procedure. Each record is to be compared to the item it describes using a checklist divided into types of error and the field in which they occur, similar to that used by Reeb (1984). The choice of errors and fields to inspect is a difficult one.

One approach would be to examine fields prone to error in more detail and to place less emphasis on those which are generally accurate. Unfortunately, it is hard to find any commonality in the results of the studies discussed. Titles are likely to contain errors, according to Cook & Payne (1991), McCue, Weiss & Wilson (1991), Ballard & Lifshin (1992) and Randall (1999), while notes were a significant problem in the latter two studies and that of Romero & Romero (1992). Ryans (1978), however, found neither to be particularly inaccurate, finding instead many errors in extent (collation), in common with Romero & Romero.

These inconsistencies are partly attributable to diverse methodologies, with bibliographic fields grouped in different ways and no consensus on whether to include errors in MARC coding or of punctuation and capitalisation. Such variations are excusable, since many studies took place within a single institution, but they prevent effective comparisons. Another obstacle is that non-mandatory fields such as edition are under-represented, because the number of records containing those fields is rarely given.

In any case, there is seldom enough detail to be sure that the sample constructed is truly random, so it may not be valid to generalise from the results of previous studies. It is conceivable that efforts to clean up databases, especially if focused on a particular area, have disproportionate effects on the characteristics of errors.

Graham (1990), O'Neill, Rogers & Oskins (1993) and Armstrong & Medawar (1996) concur that errors have more significance in fixed or controlled fields than elsewhere, partly because such fields are used for access and partly because they will be relied upon as correct. This implies that name and subject headings should be included in the checklist as well as purely descriptive cataloguing.

Users have consistently preferred topical access to known-item searches and tend to use keyword searches more than controlled subject headings (Larson 1991). The increasing number of fields which are indexed for keyword searching requires accuracy even in fields which are not subject to authority control, quite apart from the need for confident identification of a retrieved record. For this reason, and for convenience, all bibliographic areas are treated uniformly in the pilot, except that notes are excluded because of their diversity.

The desire for objectivity means that greater emphasis has been placed on what Graham called mechanical accuracy than the subjective intellectual accuracy of the catalogue. For speed and convenience, errors are divided into only two classes, fields which contain incorrect information and fields which are wrongly omitted, rather than separating misspellings, MARC errors and deviations from cataloguing rules.
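A checklist divided this way reduces to a simple tally keyed by bibliographic field, with two counters per field. The field names and the sample record below are hypothetical, chosen only to show the structure.

```python
# Checklist: for each field, count records where the field is incorrect
# and records where it is wrongly omitted. Field names are illustrative.
FIELDS = ['title', 'statement of responsibility', 'edition',
          'imprint', 'extent', 'series', 'headings']

def blank_checklist():
    return {field: {'incorrect': 0, 'omitted': 0} for field in FIELDS}

def record_errors(checklist, incorrect=(), omitted=()):
    """Tally one inspected record against the checklist."""
    for field in incorrect:
        checklist[field]['incorrect'] += 1
    for field in omitted:
        checklist[field]['omitted'] += 1

tally = blank_checklist()
# A hypothetical record with a title error and a missing edition statement:
record_errors(tally, incorrect=['title'], omitted=['edition'])
```

Restricting the tally to two error classes keeps each inspection quick and the resulting counts directly comparable across fields.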


Owen Massey McKnight <owen.mcknight@worc.ox.ac.uk>