Why can’t a computer index my book?

Editor’s Corner

Recently a reader wrote to us questioning some of the alphabetizing recommendations in The Chicago Manual of Style:

“In all computer-based systems that are in current use, the ASCII sort order is used. Any sort order that violates the ASCII order cannot be maintained except through repeated manual adjustments. Solutions such as sorting St. as if it were Saint violate this order and thus cannot be correctly manipulated in MS Word, for example. What is your thinking on this issue?”

The reader has a point when it comes to computer sorting. Surprisingly, however, back-of-the-book indexes are still made almost universally by hand, by human indexers. Although the software that professional indexers use can help with basic sorting and styling, the brainwork of choosing and organizing index entries cannot (yet) be done perfectly by a computer, because it involves reasoned, discretionary choices and, most important, flexibility.

Indexes are not simply concordances (alphabetical listings of words that appear in a book). For instance, an index entry might consist of a concept that describes a whole section of a book. In fact, the concept word in the index might not even appear elsewhere in the book. The indexer decides how that concept relates to other concepts (and therefore how to organize index entries and subentries) by reading the book. So far, this process has not been reducible to an algorithm.

A proper index therefore offers readers something that a computer-compiled list of words cannot: the ability to browse among a book’s main topics and be directed to the subtopics they contain.
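To make the contrast concrete, here is a minimal Python sketch of a concordance, the kind of alphabetical word list a computer can compile unaided. The two-page sample text and the function name `build_concordance` are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict


def build_concordance(pages: dict[int, str]) -> dict[str, list[int]]:
    """Map every word (lowercased) to the sorted pages on which it appears."""
    found: dict[str, set[int]] = defaultdict(set)
    for page_number, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            found[word].add(page_number)
    # An alphabetical list of words with page numbers: no concepts, no subentries.
    return {word: sorted(found[word]) for word in sorted(found)}


# Hypothetical two-page "book".
pages = {
    1: "Jack London sailed from London.",
    2: "London fog; Amy London wrote about fog.",
}
concordance = build_concordance(pages)
for word, page_list in concordance.items():
    print(word, page_list)
```

The output lumps the city and both people named London under a single entry, `london [1, 2]`, and it can never produce an entry for a concept that the text does not name in so many words.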

Even alphabetizing requires an intelligent human eye at times. Here’s a simple example from CMOS 16.62, showing the alphabetizing of a place and a person with the same name:

London, England
London, Jack

But what if Amy London turns up in the index? A computer would sort like this:

London, Amy
London, England
London, Jack

A human would see potential for confusion and make refinements to aid readers:

London (England)
London, Amy
London, Jack
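A minimal Python sketch of the difference, assuming the indexer has tagged each entry as a place or a person. The tags, the hypothetical `sort_key` and `display` helpers, and the places-before-persons ordering simply mirror the example above; they are not a rule quoted from CMOS.

```python
# What a computer does with no extra knowledge: a plain string sort.
entries = ["London, Jack", "London, England", "London, Amy"]
plain_order = sorted(entries)
# ['London, Amy', 'London, England', 'London, Jack']

# Hypothetical: the indexer has tagged each entry with what kind of thing it names.
tagged = [
    ("London", "Jack", "person"),
    ("London", "England", "place"),
    ("London", "Amy", "person"),
]
# Assumption: places file before persons, mirroring the example above.
KIND_ORDER = {"place": 0, "person": 1}


def sort_key(entry: tuple[str, str, str]) -> tuple[str, int, str]:
    heading, qualifier, kind = entry
    return (heading.lower(), KIND_ORDER[kind], qualifier.lower())


def display(heading: str, qualifier: str, kind: str) -> str:
    # Places get a parenthetical qualifier; persons keep the inverted "Surname, Given" form.
    return f"{heading} ({qualifier})" if kind == "place" else f"{heading}, {qualifier}"


human_order = [display(*entry) for entry in sorted(tagged, key=sort_key)]
print(plain_order)
print(human_order)
# ['London (England)', 'London, Amy', 'London, Jack']
```

The plain sort reproduces the computer’s list; the tagged sort reproduces the human’s list only because a person supplied the tags in the first place.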

The alphabetizing guidelines in CMOS follow long-standing conventions. Readers understand them intuitively because dictionaries, library catalogues, and indexes have followed the same rules for centuries.

For more on indexing and an insider’s view on the process, read the Shop Talk interview with indexer Mary Laur.

~ ~ ~

Editor’s Corner posts are the opinion of Carol Fisher Saller, editor of the Chicago Manual of Style Online Q&A and author of The Subversive Copy Editor, now in its 2nd edition. Find Carol on Facebook and Twitter (@SubvCopyEd).

Photo: Selmer van Alten. Iceland’s phonebook is sorted by first name.

 


One thought on “Why can’t a computer index my book?”

  1. It’s worth pointing out that the question’s “In all computer-based systems… in current use, the ASCII sort order is used” either reflects a notion of sort algorithms at least twenty years out of date or misunderstands “ASCII sort order.”

    For example, ASCII places the full set of capital letters before any lowercase ones; thus, “Zorba” would come before “alligator” in ASCII sort order. ASCII also puts most (but not all) punctuation before the alphabet, meaning it would put “x-ray” before “xenon.”

    But modern sorting tools typically offer a case-insensitive, alphabetical sort, which correctly handles these two cases and many others that ASCII sort order gets wrong.

    Of course, this still doesn’t handle sorting “St.” as “Saint.” That has nothing to do with the algorithm’s collating sequence; it requires contextual knowledge the computer simply doesn’t have.
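The commenter’s contrast can be sketched in Python. For ASCII-only strings, Python’s default string comparison (by code point) matches ASCII order, so a plain `sorted()` shows the ASCII behavior. The letter-by-letter `alpha_key` is one simple way to get a case-insensitive alphabetical sort, not the method of any particular program; as a simplification it drops accented letters along with punctuation. The hypothetical `EXPANSIONS` table stands in for the contextual knowledge that someone has to supply before “St.” can file as “Saint.”

```python
import re

words = ["xenon", "Zorba", "x-ray", "alligator"]

# Python compares strings by code point, which for ASCII text is ASCII order:
# capitals (65-90) precede lowercase (97-122), and "-" (45) precedes every letter.
ascii_order = sorted(words)
# ['Zorba', 'alligator', 'x-ray', 'xenon']


def alpha_key(term: str) -> str:
    # Letter-by-letter key: fold case, then drop everything except a-z.
    # Simplification: accented letters are dropped too.
    return re.sub(r"[^a-z]", "", term.casefold())


alphabetical_order = sorted(words, key=alpha_key)
# ['alligator', 'xenon', 'x-ray', 'Zorba']

# Hypothetical table of contextual knowledge that a person must supply.
EXPANSIONS = {"st.": "saint"}


def saint_key(term: str) -> str:
    # Expand a recognized abbreviation in the first word only, then sort letter by letter.
    first, _, rest = term.partition(" ")
    first = EXPANSIONS.get(first.casefold(), first)
    return alpha_key(f"{first} {rest}")


headings = ["Stanley", "St. Louis", "Saint Paul", "Sailing"]
print(sorted(headings, key=alpha_key))
# ['Sailing', 'Saint Paul', 'Stanley', 'St. Louis']
print(sorted(headings, key=saint_key))
# ['Sailing', 'St. Louis', 'Saint Paul', 'Stanley']
```

Without the table, “St. Louis” files after “Stanley”; with it, “St. Louis” files just before “Saint Paul.” The table helps only as far as someone has filled it in and checked that each expansion is right in context.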
