Computers & Texts 16/17: Page

Computers & Texts No. 16/17

Winter 1998

Computer indices for the Zuozhuan

John Page, Isabel García-Hidalgo, Rosa Elena Moncayo
El Colegio de Mexico
jpage@colmex.mx

This project is designed to create indices to facilitate the study and translation of the early Chinese text known as the Zuozhuan. It is financed by a grant from Mexico's National Council on Science and Technology. The Chinese text, without intercalated notes, and the Chinese-English lexical material in the Fraser-Lockhart Index to the Tso-chuan (Oxford 1930) are being entered and linked in a database. The aim is to complete and update the Fraser-Lockhart lexicon as a tool for translation and to create English-Chinese and Spanish-Chinese subject indices to the text.

Description of the text

The Zuozhuan is a late fourth century BCE historical text covering events during what is known as the Spring and Autumn period (722-468 BCE) of the Zhou dynasty, inconclusively attributed to Zuo Qiuming, a contemporary of Confucius. The text's canonical status derives from the fact that it has come down to us as one of three commentaries to the Spring and Autumn Annals of Confucius' home state of Lu, said to have been compiled by the master himself, and so studied, edited and commented upon since the Han dynasty.

The Zuozhuan consists of 179,570 Chinese graphs, written in a consistent literary Chinese of its period and edited to conform to the annals of the state of Lu. It contains a great wealth of political, military, social, geographical and astronomical information as well as literature, tradition, folklore and religion shared by Lu and sixteen other important feudatories of the Zhou dynasty as well as numerous smaller ones which were absorbed throughout the period by their larger neighbours.

However, several characteristics that make the text difficult to study invite computer analysis. Not all the feudatories included are covered in the text for each year, nor are all the events that occurred in those that are covered consistently presented, which considerably complicates continuity and coherent reading of the text.

Actors in the various feudatories are not always referred to by a single title or name. They often appear successively under family and personal names, changing social styles, changing political titles and posthumous honorifics, all of which follow Chinese custom and are used by the compiler in describing their activities during their lifetime.

Extremely valuable supporting material, in the form of more than 2,000 years of commentary and explication normally intercalated in the columns of the printed text, further complicates reading both the original and existing western translations, which follow the same procedure.

Additional complication arises out of the density of the language. Its universe of 179,570 characters condenses to a lexicon of 3,789. Setting aside the 1,214 characters registering only one meaning, the remaining characters share 15,491 different meanings including all the place and personal names and titles in the text.
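The density described above can be checked with simple arithmetic. The following Python sketch uses only the counts quoted in this section; the variable names are illustrative and not part of the project.

```python
# Counts quoted in the text above.
total_graphs = 179_570     # running characters in the Zuozhuan
lexicon_size = 3_789       # distinct characters
single_meaning = 1_214     # characters registering only one meaning
shared_meanings = 15_491   # meanings shared by the remaining characters

# Characters that carry more than one meaning.
multi_meaning = lexicon_size - single_meaning

# Average number of meanings per multi-meaning character.
avg_meanings = shared_meanings / multi_meaning

# Average number of occurrences per distinct character.
avg_occurrences = total_graphs / lexicon_size

# Total meanings, assuming one meaning for each single-meaning character.
total_meanings = single_meaning + shared_meanings

print(multi_meaning, round(avg_meanings, 2), round(avg_occurrences, 1), total_meanings)
```

On these figures, 2,575 characters carry about six meanings each on average, and the total of 16,705 meanings agrees with the number of rows reported for the SIGNIF table.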

The Fraser-Lockhart Index (FLI)

The Chinese characters in the printed FLI are arranged solely by 'radical' and stroke count, i.e. by a series of 214 classifying elements visible in those characters, chosen and established in the 18th century for that purpose, and within each classification by the number of strokes additional to the 'radical'. Since the phonetic value of each character was originally supplied in a system of transliteration no longer in use for English, this has been converted to Wade-Giles and pinyin, and the database will allow a search by either. However, the compilers of the FLI did not include all the characters in the text, nor did they record the meaning of every character they did register at every position in which it occurs. Though only 242 Zuozhuan characters are missing, many meanings are missing for characters that are registered, and these fall into two categories. The larger consists of characters that appear very frequently but rarely change meaning, principally grammatical particles. The second consists of characters for words that appear often: once the compilers had covered all their several meanings up to a certain point in the text, the remaining appearances were left to et saepe, etc., et alia and passim. These will be supplied in the database.
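The arrangement by radical and residual stroke count, together with a search by transliteration, can be sketched in Python as follows. This is only an illustration: the Entry class, its field names and the four sample characters are assumptions made here, and tone marks and Wade-Giles diacritics are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    char: str
    radical: int      # number of the classifying radical, 1..214
    residual: int     # strokes additional to the radical
    wade_giles: str
    pinyin: str


# A few sample entries, deliberately listed out of order.
ENTRIES = [
    Entry("左", 48, 2, "tso", "zuo"),
    Entry("傳", 9, 11, "chuan", "zhuan"),
    Entry("仁", 9, 2, "jen", "ren"),
    Entry("人", 9, 0, "jen", "ren"),
]


def printed_order(entries):
    """Sort as in a radical index: by radical, then by residual strokes."""
    return sorted(entries, key=lambda e: (e.radical, e.residual))


def find_by_pinyin(entries, syllable):
    """Return the characters with the given pinyin reading, in index order."""
    return [e.char for e in printed_order(entries) if e.pinyin == syllable]


ordered = "".join(e.char for e in printed_order(ENTRIES))
print(ordered)
print(find_by_pinyin(ENTRIES, "ren"))
```

A search by Wade-Giles would follow the same pattern using the wade_giles field.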

The isolating character of Chinese is nowhere better illustrated than in the early literary language, which tends to include few digraphs. Nevertheless, many personal and place names are composed of two to four characters, and searching for them in the printed FLI requires cumbersome cross-referencing, since such multicharacter names are rarely supplied complete; instead the reader is referred to another character in the string. These omissions and cross-references were perhaps strategies to reduce the bulk, and hence the cost, of a publication that would enjoy limited demand at the time.

The positions of characters in the FLI never include the line number; they specify only the page and column of the printed text used, which is set in most instances at 18 columns by 42 lines per page but is not entirely uniform.

The project

Fortunately, the inception of the project found Windows-compatible Chinese software already available. Entering the text's Chinese characters and those in the FLI by the alphanumeric codes in the software was straightforward, a tool being provided in it to create characters not included in its dictionary.

The printing chosen for data entry is a punctuated version of the 1815 Ruan Yuan edition without intercalated commentary, used by James Legge (1982), its first English translator, to which the FLI was keyed on compilation in 1930, and the contents of which have become standard since.

A printed text in literary Chinese set in the traditional way appears as a grid of characters presented in columns and lines to be read from top to bottom, right to left, each character occupying one cell of the grid. Thus, the meaning of a character can be registered and displayed at the intersection of any given column and line on any given page. The printed FLI provides only page and column numbers. The database, by contrast, will respond to a query at any given column and line, and a search by character will return the line number as well, making it possible for the reader or translator to display the meaning of the character at the specific page and intersection. The database will also respond to queries for di-, tri- and tetragraphs and differentiate between multiple meanings of the same character repeated in the same column, or supply all meanings if the query is general.
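The idea of addressing each character by page, column and line can be illustrated with a minimal Python sketch. The function names are hypothetical, and the page and column numbers given to the sample phrases are invented for the example; they do not reproduce the layout of the Legge printing. Multicharacter queries are matched only within a single column here.

```python
# Maps (page, column, line) to the character in that cell of the grid.
grid: dict[tuple[int, int, int], str] = {}


def load_column(page: int, column: int, text: str) -> None:
    """Store one column of text, the first character being line 1."""
    for line, char in enumerate(text, start=1):
        grid[(page, column, line)] = char


def char_at(page: int, column: int, line: int):
    """Return the character at an intersection, or None if the cell is empty."""
    return grid.get((page, column, line))


def find_char(char: str):
    """Return every (page, column, line) at which the character occurs."""
    return sorted(loc for loc, c in grid.items() if c == char)


def find_string(s: str):
    """Return the start of every run of s read downwards within one column."""
    hits = []
    for page, column, line in find_char(s[0]):
        if all(grid.get((page, column, line + k)) == ch for k, ch in enumerate(s)):
            hits.append((page, column, line))
    return hits


# Invented layout: two short columns on a hypothetical page 1.
load_column(1, 1, "惠公元妃孟子")
load_column(1, 2, "孟子卒")

print(char_at(1, 1, 4))
print(find_char("孟"))
print(find_string("孟子"))
```

A fuller version would also follow a name that runs from the foot of one column to the head of the next.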

Data entry

The FLI: The immediate difficulty was the need to enter Chinese characters along with their English meanings: not only the entry character but also the additional characters which combine with it to form personal and place names. The task was undertaken by two separate teams, one unfamiliar with Chinese and the other consisting of two MA sinologists. The former entered the English meanings and the page and column data, adding a slash in the position of every Chinese character encountered. The Chinese characters were then entered by the BIG5 code numbers included in the Twinbridge Chinese System 3.3. Since 242 Zuozhuan characters are missing from the FLI, and others are either alternates or not in general use and therefore absent from the Twinbridge dictionary, these, plus the 64 hexagrams and 8 trigrams of the Yijing, had to be computer-drawn in the Twinbridge Editor, raising the total number of graphs in the database to 3,909. Altogether 33,698 Chinese graphs had to be correctly positioned in the database. Both phases of data entry were carried out in Windows 3.1, MS-Word 6 and Twinbridge 3.3. Integration of the two phases has yet to be completed.

The Zuozhuan: Since one of the project's principal goals is to take full advantage of the contents of the FLI by linking it with the text in a database, the first step was to retain the page and column numbers of the Legge printing as they appear in the FLI and subsequently add line numbers. This was achieved by entering the text in a grid exactly duplicating the number of columns and lines of each page in the Legge printing, each page becoming a file in MS-Word 6.0. Though there is little variation from the number of 18 columns per page, the number of lines varies from the standard of 42 to a minimum of one. The complete text has been successfully printed.

The Database

Computer application to the study and translation of the Zuozhuan required 1) a method to associate each and every occurrence of a Chinese character with its meaning in context in English and Spanish, and any other characteristic of interest to the analysis and translation of the text; 2) methods for selective retrieval of those meanings and characteristics; 3) methods as user-friendly as the common controls used in the Windows environment, to handle Chinese characters. Creation of a database in which Chinese characters were represented by alphanumeric codes would have fulfilled requirements 1 and 2, but not 3. To cover all three, a computer solution was proposed that would combine those functions in a database and a tool that could handle Chinese characters and the Latin alphabet for English and Spanish.

The solution implemented consists of the following: 1) A relational database developed in Personal Oracle 7 (a database management system that conforms to the Structured Query Language (SQL) standard). This database fundamentally stores the information from the FLI. In it, the Chinese characters are handled by means of alphanumeric codes. 2) The Zuozhuan is organized in 351 files each corresponding to a single page in the Legge printing of the Ruan Yuan text. Each file/page is entered in MS-Word, and is displayed by means of Twinbridge, which converts the Chinese character alphanumeric codes into the Chinese characters themselves. 3) A group of macros programmed in WordBasic, and used in MS-Word to access information in the database.

The database consists of 5 basic tables related as follows:

CCHAR: Contains 3,909 rows, one for each character in the database. The data for each entry character include: name (in Wade-Giles transliteration), consecutive FLI number, BIG5 code, pinyin transliteration, BIG5 code for the radical, residual stroke count, and a unique identification number (CCID) whereby this table relates to the table ASOCIA. Since the Yijing hexagrams and trigrams are not in the Zuozhuan, but appear only as illustrations among the FLI English meanings, not as entry characters, they are represented only by a BIG5 code and a CCID.

ASOCIA: this is an intermediate table whereby any given entry character stored in the table CCHAR relates to its corresponding meanings stored in the table SIGNIF. It contains a row for every meaning of every entry character in the FLI. These rows include: the identification number CCID of the entry character to which the meaning belongs, the unique meaning identifier, SIGID, used in the table SIGNIF, and its own unique identifier, ASOCID. This setup makes it possible to retrieve one, several, or all of the meanings for a given entry character.

SIGNIF: this table contains 16,705 rows, one for each meaning of every entry character in the FLI. The rows are uniquely identified by the SIGID identifier. Meanings in the FLI are composed of a mixture of English words and Chinese characters. It is not possible to handle mixed Chinese characters and English words in a user-friendly fashion in the database. Therefore, a special character (the slash) is used to mark the location of each Chinese character within an English phrase. Each slash is tracked and replaced by the corresponding BIG5 code by means of an SQL query. The resulting codes are interpreted as Chinese characters in MS-Word. Since there may be several Chinese characters in an English phrase, the macros use the position information stored in the table CCCODES to recognize them.
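The slash substitution can be illustrated with a short Python sketch. It is not the project's SQL query or WordBasic macro: the function names and the sample phrase are invented, the codes are assumed to be supplied in their order of appearance, and Python's standard big5 codec stands in for the Twinbridge code table.

```python
def big5_code(char: str) -> str:
    """Return the Big5 code of one character as an upper-case hex string."""
    return char.encode("big5").hex().upper()


def fill_placeholders(meaning: str, codes: list[str]) -> str:
    """Replace each slash in an English phrase with the character for the next code."""
    parts = meaning.split("/")
    if len(parts) - 1 != len(codes):
        raise ValueError("number of slashes and number of codes differ")
    chars = [bytes.fromhex(code).decode("big5") for code in codes]
    out = [parts[0]]
    for ch, tail in zip(chars, parts[1:]):
        out.append(ch)
        out.append(tail)
    return "".join(out)


# Invented example: two adjacent placeholders inside an English phrase.
codes = [big5_code("左"), big5_code("傳")]
filled = fill_placeholders("the // commentary", codes)
print(filled)
```

The length check matters because one misplaced slash would shift every later character in the phrase. An English meaning containing a literal slash would need a different placeholder or an escape.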

CCCODES: This table contains 33,698 rows, one for each Chinese character appearing among the English meanings of the FLI including hexagrams and trigrams. The information stored in each row includes: the FLI consecutive number of the entry to which the character belongs, the FLI consecutive number of the character represented, the FLI page number where the character is located, and the unique identifier whereby the order of appearance of the Chinese characters is controlled throughout the meanings of a given entry.

OCUR: This table relates the meanings contained in the database to the characters in the Zuozhuan. Each row contains the location (page and column) of one of the 93,056 character occurrences in the text recorded in the FLI, together with the unique ASOCID identifier used in the ASOCIA table, by means of which the meaning of the character at that place in the text may be retrieved. Since the text runs to 179,570 characters, the meanings of the remaining 86,514 characters (179,570 − 93,056), at as many intersections in the text which do not appear in the FLI, can be supplied in the updated database.
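The relations among CCHAR, ASOCIA, SIGNIF and OCUR can be sketched as follows. This is a minimal illustration only: SQLite, through Python's sqlite3 module, stands in for Personal Oracle 7; the identifiers CCID, SIGID and ASOCID come from the description above, while every other column name, the two functions and all sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CCHAR  (CCID INTEGER PRIMARY KEY, big5 TEXT, wade_giles TEXT, pinyin TEXT);
CREATE TABLE SIGNIF (SIGID INTEGER PRIMARY KEY, meaning TEXT);
CREATE TABLE ASOCIA (ASOCID INTEGER PRIMARY KEY,
                     CCID  INTEGER REFERENCES CCHAR(CCID),
                     SIGID INTEGER REFERENCES SIGNIF(SIGID));
CREATE TABLE OCUR   (page INTEGER, col INTEGER,
                     ASOCID INTEGER REFERENCES ASOCIA(ASOCID));
""")

# Invented sample rows: one entry character with two meanings,
# each meaning attested at one page and column.
conn.execute("INSERT INTO CCHAR VALUES (1, ?, 'kung', 'gong')",
             ("公".encode("big5").hex().upper(),))
conn.executemany("INSERT INTO SIGNIF VALUES (?, ?)", [(1, "duke"), (2, "public")])
conn.executemany("INSERT INTO ASOCIA VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 2)])
conn.executemany("INSERT INTO OCUR VALUES (?, ?, ?)", [(1, 1, 1), (7, 3, 2)])


def meanings_of(ccid):
    """All meanings of an entry character, reached through ASOCIA."""
    rows = conn.execute(
        "SELECT s.meaning FROM ASOCIA a JOIN SIGNIF s ON s.SIGID = a.SIGID "
        "WHERE a.CCID = ? ORDER BY s.SIGID", (ccid,))
    return [r[0] for r in rows]


def meaning_at(ccid, page, col):
    """Meanings of an entry character at one page and column, via OCUR."""
    rows = conn.execute(
        "SELECT s.meaning FROM OCUR o "
        "JOIN ASOCIA a ON a.ASOCID = o.ASOCID "
        "JOIN SIGNIF s ON s.SIGID = a.SIGID "
        "WHERE a.CCID = ? AND o.page = ? AND o.col = ? ORDER BY s.SIGID",
        (ccid, page, col))
    return [r[0] for r in rows]


print(meanings_of(1))
print(meaning_at(1, 7, 3))
```

Because OCUR refers to ASOCIA rather than directly to SIGNIF, each location is tied to one particular pairing of character and meaning, which is what allows a single meaning to be retrieved in context.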

The database includes other tables, and specific fields in the tables already described, with information used to control the data entry processes. The structure and organization of the database reflects solutions for the simultaneous management of Chinese and English designed to facilitate the stage of data entry and to be used at the stage of retrieval.

Each page of the Zuozhuan text was transcribed in table format, so every character in the text has a location: file-page, column and line. The method of association and the selective retrieval procedure mentioned above consist of macros that use the location of characters in the text to recover information from the database via the OCUR table. By the logic of this solution the macros also make it possible to add the missing datum, the line, to the OCUR table, as well as any other references required.
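One way the missing line number might be recovered is sketched below in Python. This is an assumption about the procedure, not a description of the WordBasic macros: each OCUR-style row is taken to carry page, column, character and ASOCID, the transcribed text is assumed to be available column by column, and the sample columns and identifiers are invented. Rows whose character occurs more than once, or not at all, in the column are set aside for an editor.

```python
def lines_for(column_text: str, char: str) -> list[int]:
    """Line numbers (from 1) at which a character occurs in one column."""
    return [i for i, c in enumerate(column_text, start=1) if c == char]


def add_lines(ocur_rows, columns):
    """Add the line number to each row when the character is unique in its column.

    ocur_rows: list of (page, column, char, asocid)
    columns:   dict mapping (page, column) to the text of that column
    """
    completed, unresolved = [], []
    for page, col, char, asocid in ocur_rows:
        lines = lines_for(columns.get((page, col), ""), char)
        if len(lines) == 1:
            completed.append((page, col, lines[0], asocid))
        else:
            # Repeated or absent character: needs a decision by a reader.
            unresolved.append((page, col, char, asocid, lines))
    return completed, unresolved


# Invented layout and identifiers for illustration.
columns = {(1, 1): "惠公元妃孟子", (1, 2): "孟子卒繼室以聲子"}
rows = [(1, 1, "公", 10), (1, 2, "子", 11)]

completed, unresolved = add_lines(rows, columns)
print(completed)
print(unresolved)
```

The second sample row is left unresolved because the character appears twice in its column, which is the case in which the database must differentiate between meanings of a character repeated in the same column.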

The macros already programmed, together with those yet to be programmed, constitute a system whose front end is a customization of Word in which Chinese, English and Spanish are managed in a user-friendly way.

References

Legge, James (1982). The Ch'un Ts'ew, with the Tso chuen. Hong Kong University Press.
