Languages and linguistics in 2003: The potential contribution of corpus linguistics
Abstract
The nature of linguistic research, and even its goals, are changing as a result of information technology. This paper discusses what counts as legitimate linguistic data, and the new standards of data collection, organisation and analysis associated with the methodology of corpus linguistics. Two of the more familiar kinds of text annotation are described, namely tagging and parsing, and attention is drawn to the problems of working on Asian languages, including the pitfalls of applying European categories. Two corpus-based projects currently underway in Malaysia are described, one on English and the other on Malay. The paper ends with a look forward to the possible contribution of corpus linguistics to language-based research in Malaysia.