Lemmatization and parsing with TACT preprocessing programs

By Ray Siemens

University of Victoria

Description

By its ideal definition, lemmatization is a process wherein the inflectional and variant forms of a word are reduced to their lemma: their base form, or dictionary look-up form. When one lemmatizes a text, one replaces each individual word in that text with its lemma; a text in English which has been lemmatized, then, would contain all forms of a verb represented by its infinitive, all forms of a noun by its nominative singular, and so forth.[1]
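The substitution described above can be illustrated with a minimal sketch. It assumes a small hand-built lookup table (the hypothetical `LEMMA_TABLE`) and a simple regular-expression tokenizer; it is not the method used by the TACT preprocessing programs, only an illustration of replacing each word form with its dictionary look-up form.

```python
import re

# Hypothetical lookup table mapping inflected or variant forms to lemmas.
# The entries are illustrative assumptions, not data from the article.
LEMMA_TABLE = {
    "is": "be", "are": "be", "was": "be", "were": "be",
    "goes": "go", "going": "go", "went": "go",
    "children": "child", "mice": "mouse",
}

def lemmatize(text: str) -> list[str]:
    """Return the lemma for each word in text, in order.

    Words missing from the table are passed through unchanged,
    on the assumption that they are already in base form.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return [LEMMA_TABLE.get(token, token) for token in tokens]

print(lemmatize("The children were going"))
```

A table-driven approach like this handles irregular forms such as "went" or "mice" directly, but it covers only the forms someone has entered by hand; any realistic lemmatizer would need a far larger dictionary or additional rules.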

Notes

Original publication information:

Originally published in Digital Studies/le Champ Numérique (1)

Year: 1996

DOI: http://doi.org/10.16995/dscn.233

License: CC BY 4.0

Original citation:

Siemens, R. G. (1996). Lemmatization and parsing with TACT preprocessing programs. Digital Studies/le Champ Numérique, (1). DOI: http://doi.org/10.16995/dscn.233