Starting Points on the Internet

Frances Condron

On this page

Digital libraries and text archives

Arts and Humanities Data Service

Oxford Text Archive

Electronic Text Center

Project Gutenberg

Alex

Gateways

Resource Discovery Network

HUMBUL Humanities Hub

BUBL

Voice of the Shuttle

Library of Congress

World Lecture Hall

Searching the Web

Guides

Dykes Library at University of Kansas

University of California Berkeley Library

Search engines

AltaVista

AllTheWeb

Google

NorthernLight

Evaluating digital resources

Internet Detective

Terena Guide to Network Resource Tools

Information Quality WWW Virtual Library

Discussion lists and email

Directory of Scholarly and Professional E-Conferences

Mailbase

Humanist electronic seminar

Organizations for humanities computing

Association for Literary and Linguistic Computing

Association for Computers and the Humanities

Through the 1990s, the Internet grew exponentially, both in the number and range of electronic discussion lists and programs available, and in the number of pages on the Web. The cost of access to email and the Web has gone down in real terms for many people in the western world, and many individuals and businesses, as well as institutions, have taken advantage of this. For those outside an educational institution, libraries and museums have been providing Internet access for several years, and have been joined by Internet cafés, which provide an alternative environment for browsing the Web. The British government is spending millions on ensuring that school children have access to the Internet, and to quality educational resources. A recent report estimated that, as of February 1999, the Web contained 800 million pages, held on about 3 million servers (Lawrence and Giles 1999). The amount and diversity of material continue to grow, and attempting to sift through the millions of Web pages now available can be an overwhelming task. However, services have developed to support users of the Web, both for finding useful resources and for providing ideas and frameworks to put them to good use in research and teaching.

Humanities scholars are embracing the opportunities offered by developments in computing hardware, programs and quality electronic datasets, for creating and using digital resources. This Guide contains details of hundreds of electronic resources, ranging from corpora and analysis tools to teaching aids and virtual learning environments. While these are primarily organized by subject area, there is a great wealth of material that cannot be fitted so easily into these categories, because it covers so many disciplines. Over the years, digital libraries and text centres have built up impressive collections of quality electronic editions of writings and images, providing access to rare or widely used resources for the academic community and beyond. In addition, individuals and groups have been compiling and updating catalogues and guides to the Internet, providing gateways to online resources. Communication tools now enable scholars to have regular contact with colleagues around the world, and there are hundreds (if not thousands) of discussion lists serving specialist communities. Within the UK, several organizations can support the creation and use of digital resources, both for research and teaching.

Digital libraries and text archives

Over the past ten years, significant effort has been spent on generating digital versions of texts and images, and in defining standards and guidelines for their digitization, access, and preservation. Many of the resources listed in the following chapters of this Guide are the results of these labours. The major digital libraries listed below have developed in different ways, and hold important but contrasting collections of digital resources.

The Arts and Humanities Data Service provides support, guidelines and an archiving service for scholarly digital material in the humanities, for the higher education community in the UK. The AHDS has five service providers: the Archaeology Data Service, the History Data Service, the Performing Arts Data Service, the Oxford Text Archive, and the Visual Arts Data Service. All of these can be accessed through the AHDS Web pages. Each service provider supplies guidance on creating digital resources in a way that ensures their longevity, and can serve as a secure long-term archive, providing access to these resources for future reuse. Currently, the AHDS service providers are the recommended repositories for digital resources resulting from projects funded by the AHRB, Leverhulme Trust, Wellcome Trust, and the ESRC, though projects funded through other sources can contact the AHDS for support, advice and archiving services.

The Oxford Text Archive has one of the largest collections of digital resources amongst the AHDS service providers, and is one of the oldest digital archives for the humanities in the world. The OTA has been built up since 1976, and consists of literary and linguistic resources in a range of languages, covering many humanities disciplines. The collection holds more than 2,500 resources, from the Dictionary of Old English Corpus (containing 3,000 texts) to individual novels, manuscript transcriptions and reference works. The OTA's archive largely consists of text-based resources, though it will be expanding to include databases, and image and audio files (where they are an integral part of a resource). The majority of the catalogued resources can be obtained directly from the Web site. For the remainder, deposit agreements require the user to sign an agreement setting out the conditions of use before gaining access. Almost every resource held by the OTA contains documentation that explains how and why the resource was created, identifies those involved in its creation, and gives information regarding reuse. The resources are available in a variety of formats: as SGML encoded files following the TEI Lite guidelines, as HTML documents, in plain text, RTF, and other formats. They can therefore be viewed, and used in various text analysis tools for teaching and research.

The other digital libraries listed in this section have evolved differently from the OTA collection, and some are more involved than the OTA in creating digital resources. The Electronic Text Center, based at the University of Virginia, digitizes and provides access to text-based material. It holds thousands of publicly accessible documents, the great majority in English, though with material in other European and East Asian languages. These are organized by language, and some resources have been bundled into themed collections, for example, Shakespeare Resources, and Religious Resources. The documents are transcribed from particular print editions, and encoded in SGML, using the TEI guidelines. These are then checked against the original print edition. High quality images of illustrations, and sometimes of the entire text, are provided. As well as the document itself, information about the document, its transcription and digitization history, and the names of those involved in the process are provided. Documents are available online in HTML. The Electronic Text Center is one of the major digital libraries in the USA, and maintains excellent standards for the preservation and presentation of digital resources.

Two other digital libraries are worth exploring. Project Gutenberg was started in 1971 by Michael Hart, at the Materials Research Lab, University of Illinois. It now holds over 2,000 texts, largely in English (and also in German, French, Italian, Latin, Spanish, and Japanese). The collection focuses on popular literature now out of (USA) copyright. Each text has been digitized and checked by volunteers; the texts are not based on critical editions and may contain transcription and typographic errors. Because Project Gutenberg aims to make texts available to the widest possible audience, all files are in plain text format, with no encoding. For documents in languages with accented characters, the files are available with and without accents. Alex is another catalogue of electronic texts, housed at the North Carolina State University. It holds about 2,000 documents on American and English literature and western philosophy. The collection is based on canonical works in English, and the documents are available in plain text or HTML. Two added features of Alex are a concordance search, which can be applied to selected documents, and the option to convert documents to PDF format. Both Project Gutenberg and Alex provide access to many documents, though users should be aware of their limitations: because the texts have not been systematically proofed, there may be transcription errors and omissions, and no information on the provenance of the digital texts is available. As a result, texts from these libraries are not suitable for detailed scholarly work.
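
For readers unfamiliar with the term, a concordance search lists each occurrence of a word together with the text on either side of it. The sketch below is only an illustration of the idea applied to a plain text file of the kind these libraries supply; it is not the method used by Alex, and the function name, the `width` parameter and the sample sentence are assumptions made for the example.

```python
import re


def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword.

    Matching is case-insensitive and on whole words only. `width` is the
    number of characters of context kept on each side (an arbitrary choice).
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return lines


# Illustrative sample only; any plain text string can be used.
sample = "To be, or not to be, that is the question."
for line in concordance(sample, "be", width=12):
    print(line)
```

Because such a listing reproduces the digital text exactly, any transcription errors in the source will appear in the results.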

Gateways

In addition to the digital libraries listed above, there are now many excellent gateways to online resources. The ones listed below point to quality resources suitable for use in higher education. For the UK academic community, the new Resource Discovery Network is providing access to online resources for the majority of disciplines studied in higher education. This service is funded by the Joint Information Systems Committee (JISC), the body that oversees and organizes communications and information technology initiatives for UK higher education institutions and research councils. The RDN contains four hubs, the majority based on existing gateways. Each hub contains extensive, searchable catalogues of online resources, all reviewed by subject specialists for their suitability for use in higher education. For the humanities, the main gateway is Humbul which, unlike the others, is a new service. This provides information and guidance about online resources for literature and linguistics in English and other modern European languages, classics, archaeology, history, religion, and philosophy. Humbul includes entries for the full range of available online resources, including bibliographies, documentary resources and databases. Of the other RDN hubs, SOSIG may be useful for humanities scholars. It contains descriptions of digital resources for the social sciences, law and business studies. All the RDN hubs share the same underlying structure, and in the near future it will be possible to run a single search across all the hubs.

Another useful gateway for the UK academic community is BUBL, also funded by the JISC. Resources catalogued in BUBL have been identified by searching through subject-specialist discussion lists, and reviewed for quality of content. The gateway was primarily designed for librarians and information science staff, though it includes many resources of interest to humanities scholars.

The Voice of the Shuttle is a major gateway to the Internet. It is managed by Alan Liu of the English Department, University of California, Santa Barbara, and is a significant achievement for just one person. Voice of the Shuttle contains briefly annotated links to Web sites covering a wide range of humanities and arts disciplines, and should be visited by anyone looking for subject-specific online resources.

The Library of Congress offers a gateway to online resources, with subject-specific catalogues. Most links are to pages in English, and include material aimed at school levels, as well as higher education. Resources are divided into different categories, largely by subject area. For the humanities, this gateway is particularly useful for classics and sources of electronic texts in general, and also for issues concerning the use of, and publishing on, the Internet.

The World Lecture Hall is a gateway to online lectures, course outlines and entire Web-based modules for a vast range of subject areas. Some of the courses require students to enroll and pay a fee. The great majority of these courses are in English, and from North American higher education institutions. The catalogue is organized by subject area, and can also be searched.

Searching the Web

There are several good reasons for searching through the Web in addition to using Internet gateways. The most significant reason is that new material is added to the Web on a daily basis, and gateways will not be up-to-date because pages and sites are reviewed prior to being included in their catalogues. Secondly, Web pages are moved or deleted by their authors, and gateway sites may contain broken links. Finally, one may be looking for material on a very specific topic that is not clearly catalogued in Web gateways. For further information about searching the Web, guides are available from the Dykes Library at the University of Kansas, and the University of California Berkeley Library. Many search engines are now available for the Web, and it is important to note that none of them gives comprehensive coverage of the Web. By using more than one search engine, one can get reasonably good coverage of available material. Because search engines do not index the same resources, some are more suited to finding scholarly resources than others. Four are listed below, selected for their broad coverage and reliability of service.

AltaVista has become one of the most renowned search engines. Queries can be entered in a variety of forms, using Boolean expressions, plus or minus signs, and brackets to build complex queries. AltaVista also includes a natural language processor, and can interpret formal questions as search queries (such as, 'where is Michigan', or 'what is the CTI Centre for Textual Studies'). Another key feature of AltaVista is Babel Fish, providing a basic, automated translation for both queries and text from Web pages.
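
As a rough illustration of how plus and minus signs shape a query, the sketch below treats a term prefixed with `+` as required, a term prefixed with `-` as excluded, and any other term as optional. This is a simplified assumption for teaching purposes, not a description of AltaVista's actual query parser; the function name and sample texts are hypothetical, and punctuation in the text is not stripped.

```python
def matches(query, text):
    """Return True if text satisfies a simple plus/minus query.

    Assumed rules (not any real engine's): '+term' must appear, '-term'
    must not appear, and if there are no '+' terms then at least one
    plain term must appear.
    """
    words = set(text.lower().split())
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)

    if any(term in words for term in excluded):
        return False
    if not all(term in words for term in required):
        return False
    if required or not optional:
        return True
    return any(term in words for term in optional)


print(matches("+chaucer -shakespeare", "the works of chaucer"))
print(matches("+chaucer -shakespeare", "chaucer and shakespeare"))
```

Experimenting with a small function like this can help when deciding which terms in a real query to mark as required or excluded.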

AllTheWeb has a very basic search engine, supporting searches for words and phrases, but its catalogue of Web pages is vast (currently covering about 30% of the Web). The search engine is also very quick, and results load onto the screen rapidly as there are no adverts.

Google also has a simple user interface with no advertising, so pages finish loading quickly. Google can search for individual or multiple words. Its strength is that results are filtered to ensure that the most relevant pages are at the top of the results list. Google searches through the entire contents of Web pages and ranks matching pages by how close to the top of the document the key terms occur, how frequently, and also how many other Web pages link to that page.
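
The three ranking signals mentioned above (how early the key terms occur, how often, and how many other pages link to the page) can be combined in a toy scoring function. The sketch below is purely illustrative: the weights, the function names and the example pages are assumptions, and it makes no claim to reproduce Google's actual ranking.

```python
def score_page(text, inbound_links, terms, link_weight=0.1):
    """Score a page for a query using three simple signals.

    For each query term found: add its relative frequency and an
    'earliness' bonus (1.0 if it is the first word, approaching 0 near
    the end). Then add a bonus per inbound link. The weighting is an
    arbitrary assumption. Pages matching no term score 0.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    score = 0.0
    matched = False
    for term in terms:
        positions = [i for i, word in enumerate(words) if word == term.lower()]
        if not positions:
            continue
        matched = True
        frequency = len(positions) / len(words)
        earliness = 1.0 - positions[0] / len(words)
        score += frequency + earliness
    if not matched:
        return 0.0
    return score + link_weight * inbound_links


def rank(pages, terms):
    """Return URLs of matching pages, highest score first."""
    scored = [(score_page(p["text"], p["inbound_links"], terms), p["url"])
              for p in pages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for score, url in scored if score > 0]


# Hypothetical pages for illustration only.
pages = [
    {"url": "http://example.org/a", "text": "old english corpus texts", "inbound_links": 5},
    {"url": "http://example.org/b", "text": "notes on the corpus of old english", "inbound_links": 0},
    {"url": "http://example.org/c", "text": "cookery", "inbound_links": 50},
]
print(rank(pages, ["corpus"]))
```

Note that in this sketch a heavily linked page that does not contain the search term is excluded altogether, which is one of several design choices a real engine might make differently.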

NorthernLight has good coverage of academic sites, and will find online articles as well as other types of Web page. One can enter complex searches, and restrict queries to particular parts of a Web page (e.g. the title, URL, main body of text). The online articles are not necessarily accessed for free, and NorthernLight includes a secure page for pay-to-view documents.

The above search engines differ in their coverage of the Web, the way in which one enters queries, how the searches are run, and (to a lesser extent) the presentation of results. For effective searching, one should rely on several search engines, and experiment with defining and re-defining search terms.
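
When results from several search engines are gathered by hand, the same page often appears more than once. The sketch below shows one possible way of merging such lists: it normalises each address, removes duplicates, and places pages returned by more engines first. The function names and example addresses are hypothetical, and the normalisation rules (lower-casing the host, dropping a trailing slash) are simplifying assumptions.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce trivially different forms of an address to one form."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def merge_results(result_lists):
    """Merge several result lists into one list without duplicates.

    Pages found by more engines come first; ties keep the order in
    which the pages were first seen.
    """
    counts = {}
    for results in result_lists:
        # dict.fromkeys removes repeats within one list, keeping order.
        for url in dict.fromkeys(normalise(u) for u in results):
            counts[url] = counts.get(url, 0) + 1
    return sorted(counts, key=lambda url: -counts[url])


# Hypothetical result lists from two engines.
engine_a = ["http://Example.org/texts/", "http://example.org/guide"]
engine_b = ["http://example.org/texts", "http://example.org/lists"]
print(merge_results([engine_a, engine_b]))
```

A page returned by several engines is not necessarily more reliable, so a merged list of this kind is a starting point for evaluation rather than a substitute for it.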

Evaluating digital resources

Unlike print publications, material online may not have gone through any formal review process. Anyone can publish material on the Web, and search engines cannot easily distinguish between pages with reliable, quality content and others. One therefore needs to be very critical of the material found online. Fortunately, there are many ways in which one can evaluate the content of Web pages, and several guidelines are available. The Internet Detective is an online tutorial taking one through the stages of exploring and judging the value of Web pages. A more detailed study, now available as the Terena Guide to Network Resource Tools, introduces a range of Internet tools, from Web browsers and search engines to group communication software, publishing resources and security issues. In addition, the Information Quality WWW Virtual Library has a useful annotated catalogue of Web pages covering all aspects of evaluation.

There are several aspects to evaluating online resources. Resources listed on a subject gateway have already been through a review process (those failing to meet the criteria of the gateway are excluded). Another good indication is where Web pages are hosted by academic or research institutions, and the author gives their name and affiliation. However, this information is not always available, and one needs to look in more detail at the document in order to assess its scholarly value. The content, its organization, and its presentation are important factors to bear in mind. The way that material is organized is another indication of quality; ideally this organization maximizes the scholarly value of the resource, and helps the user to navigate through the material. This leads on to the final issue, that of presentation. This is possibly the least reliable factor when evaluating the scholarly value of a resource, though still an important one. Presentation influences the user's opinion of the Web site; good presentation holds the user's attention and supports them in making the most of online material.

The three resources on evaluation are worth exploring if one is interested in building catalogues of online resources for research projects or teaching, and in general will help one to make more effective use of one's time browsing the Web. Ultimately though, decisions regarding the utility of online resources will no doubt be based on one's research and teaching interests.

Discussion lists and email

Electronic communication tools are a key component of the Internet, enabling one to exchange ideas, information (and documents) with individuals and groups. They can keep one up-to-date with developments in one's field, and provide ample opportunities for exploring different approaches to one's subject. Discussion groups use email to distribute messages around a group, providing an easy method for sharing ideas and documents. Newsgroups operate in a different way, and usually require members to log onto somewhere (a central server) in order to access and exchange messages. In both cases, messages are ideally archived, and can be browsed to see how particular themes have developed. Discussion groups and newsgroups can be set up for local or widespread use, with open or closed subscription options, and can therefore support an individual class or a dispersed group of individuals sharing a common interest.

A useful first port of call is Diane Kovacs' Directory of Scholarly and Professional E-Conferences. This is one of the most comprehensive catalogues of online discussion and conferencing lists available, and has global coverage. The catalogue is searchable, and is updated annually. Within the UK, the major provider of discussion groups is Mailbase. From their Web page, one can find details on current Mailbase discussion groups, and access archives (where the list owner has granted permission). One interesting example of the use of communications technologies is the Humanist electronic seminar, edited by Willard McCarty of King's College London. This is a forum for discussing all aspects of humanities computing, with discussions summarized and grouped by the editor. The archive can be accessed through the Humanist Web pages.

Organizations for humanities computing

We finish this section with a brief mention of two subject associations for humanities computing. The Association for Literary and Linguistic Computing supports the application of new technologies to humanities disciplines. As the use of C&IT has developed, the ALLC has expanded its remit to cover literature and linguistics, and also work including image and audio data. The ALLC publishes the journal Literary and Linguistic Computing, and also organizes one of the major conferences for humanities computing, jointly with its counterpart in the USA, the Association for Computers and the Humanities (ACH). The ACH is based at Brigham Young University, Utah. The Humanist electronic seminar, managed by Willard McCarty (now at King's College London), is associated with the ACH. For anyone using new technologies in their teaching or research, these associations and their activities may be useful for disseminating the results of one's projects, and for finding colleagues working in related areas.
