Teaching Computers to Spell

By Charlotte Bruce Harvey '78 / May/June 2010
May 13th, 2010

It's not surprising that Professor Emeritus of Linguistics and Cognitive Sciences Henry Kucera, who died February 20 at age 85, wrote his own obituary. A pioneer in the field of computational linguistics who retired from the Brown faculty in 1990, Kucera wanted to make sure the record was accurate, which seems fitting given his work: he invented the first spell-check software. He also created the building blocks upon which much of the text-critiquing software we use daily (spell check, grammar check, style check, and readability scaling) is based.

Although Kucera spent most of his career analyzing American English usage, it was not his native language. He was born Jindrich Kucera in the former Czechoslovakia and studied philosophy and linguistics at Charles University in Prague. Fearing that his political writings and activities put him at risk, in April 1948 he fled to occupied Germany, where he worked for Czech refugee organizations until 1949, when he was given the chance to emigrate to the United States and earn a PhD at Harvard. Kucera taught for two years at the University of Florida at Gainesville before returning to Harvard on a research fellowship. Brown hired him in 1955; he was promoted to full professor in 1965 and retired in 1990 as the Fred M. Seed Professor Emeritus of Linguistics and Cognitive Sciences.

After taking an interest in using computers to analyze language, Kucera offered his faculty colleagues Russian lessons in exchange for instruction on using Brown's early mainframe computer. He later joked that he'd received the better end of the deal. In 1963–64, he and his linguistics colleague W. Nelson Francis created the Brown Corpus of Standard American English database. They culled more than a million words from 500 text samples published in 1961, covering a wide variety of subject matter. For years the Brown Corpus, as it was called, was the most cited resource in the field. Kucera and Francis used it to compile their classic 1967 study, Computational Analysis of Present-Day American English, and, later, their 1982 Frequency Analysis of English Usage: Lexicon and Grammar.

In the early 1960s, intrigued by the word-frequency analysis made possible by the Brown Corpus, publisher Houghton Mifflin asked Kucera to create a million-word, three-line citation base for its American Heritage Dictionary. The groundbreaking 1969 dictionary was the first to be compiled electronically using corpus linguistics for word frequency and other information.

In 1981 Kucera founded what became Language Systems Software Inc., and over Christmas break that year he created his first spell checker for Digital Equipment Corp.'s VAX machines. He followed that with International Spell Check, which was used in WordStar and Microsoft Word, and he later oversaw the development of Houghton Mifflin's CorrecText grammar checker.

Kucera earned honorary degrees from Pembroke College, Bucknell University, and Masaryk University in Brno, Czech Republic. He was also a member of Phi Beta Kappa.

When interviewed for a 1991 profile in the journal Language Industry Monitor, Kucera was asked what software he used. "Oh, I couldn't live without a spell checker," he responded. "I'm a fast typist and make typos, even phonetic errors occasionally. You know how it is, an author always reads what he anticipates is on the page."

Kucera is survived by two sons, Tomas and Edward Kucera, and three grandchildren.
