Corpus Linguistics: Say What???
This year I had the benefit of taking a course entitled "Introduction to Lexicography," which dealt with the study of dictionaries. Although it sounds as exciting as folding laundry, it was actually an intriguing class: we learned about the history of dictionaries, how they're compiled, and the issues facing them in the future. One of the things discussed was corpus linguistics, the method of compiling words to determine how they are used in the language being studied (computers are typically used because of their processing power when compiling data). What follows is a short commentary I wrote on it.
Corpus linguistics offers many benefits to society, but does not get the funding or recognition it deserves. As Landau notes, the first major computer-based study in corpus linguistics dates back to “the Brown Corpus” (278), and advances in computer processing power have since opened up many more possibilities for improving the field. Unfortunately, that potential has not been realized in the United States. The benefits include preserving historical records of our society’s language, improving our dictionaries, and improving voice recognition. Regrettably, there is currently no motivation other than profit to move corpus linguistics forward.
The concept of the corpus mixed with the power of computing makes it seem as if lexicographers will finally have the chance to develop a thorough corpus—if they have the proper staff to do so. As Landau points out, even advances in computers do not guarantee a broad corpus if publishers are unwilling to give the staff the paid time they need to develop it. The idea of “doing more with less” may be good for a profit margin, but it is not good for fully developing a corpus of words.
Language is such a central part of society that it seems obvious we should have a dedicated mission of compiling our language. This reminds me of Isaac Asimov’s Foundation series, whose premise is a group of scholars preserving a galactic empire’s knowledge in anticipation of its eventual demise. Knowing what we know about language and its ability to tell us about a culture’s past, why would we not assemble the most thorough corpus possible?
As Landau notes, there are a number of benefits to having a corpus. A lexicon will benefit from a broad corpus. With computers capable of processing data much faster and more efficiently than when Landau’s book was written, there is even more that can be accomplished with a linguistics corpus. Whether it’s compiling a broader selection of words or keeping a dictionary more up to date, a well-developed linguistics corpus will improve any dictionary.
Landau also notes the use of the corpus for voice recognition. There are several voice recognition technologies now, ranging from home devices such as Alexa, to voice recognition in automobiles, to automated telephone customer service, to voice-assisted computers. However, anyone who has tried to use these devices can attest that they have not been perfected. Furthering our development of a corpus by including regional differences in the spoken word will likely improve voice recognition.
Voice recognition has a special place for me because I remember when I thought I had carpal tunnel syndrome and it looked like I might be unable to use my computer at work. I relied on my computer for much of my work and my employer graciously purchased Dragon Dictate for me. Having worked with people with disabilities, I know voice technology can help people work who might not otherwise be able to. That makes it more important to improve voice recognition technology.
As John McWhorter noted during his recent appearance on Stephen Colbert’s show, our language is not static, which means any language corpus must be updated constantly. With computer processing power, this is easier than it was before computers. Although it may be impossible to keep a corpus fully up to date, it is unwise to let it lag so far behind that a “new” corpus actually reflects language from several years (or more) back.
The question of whether the United States wishes to invest public funds in a corpus linguistics program should be raised. It may not have the appeal of improvements to national infrastructure or military spending, but it is both an educational and a cultural need that is not being addressed by the private sector. Corpus linguistics offers too many benefits to be ignored because the private sector sees it as a waste of money. The United Kingdom’s dedication to its language corpus suggests the public sector is necessary in order to develop a superior body of words.
Landau, Sidney. Dictionaries: The Art and Craft of Lexicography. Cambridge University Press, 2001.