Lemmatization and parsing with TACT preprocessing programs

By Ray Siemens

University of Victoria

Description

By its ideal definition, lemmatization is a process wherein the inflectional and variant forms of a word are reduced to their lemma: their base form, or dictionary look-up form. When one lemmatizes a text, one replaces each individual word in that text with its lemma; a text in English which has been lemmatized, then, would contain all forms of a verb represented by its infinitive, all forms of a noun by its nominative singular, and so forth.[1]
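
The definition above can be illustrated with a minimal sketch in Python. It assumes a small hand-built lookup table (`LEMMAS`) and a hypothetical helper named `lemmatize`; neither is part of TACT or its preprocessing programs, and a practical lemmatizer would need a full lexicon and some way of resolving ambiguous forms.

```python
import re

# Hypothetical toy lookup table (an assumption for illustration only):
# each inflected or variant form maps to its lemma.
LEMMAS = {
    "is": "be", "was": "be", "were": "be", "been": "be",
    "goes": "go", "went": "go", "gone": "go",
    "loves": "love", "loved": "love", "loving": "love",
    "children": "child", "mice": "mouse",
}


def lemmatize(text):
    """Return the words of `text` with each one replaced by its lemma.

    Words missing from the table are passed through in lowercase.
    """
    # Split into alphabetic word tokens; punctuation is discarded.
    words = re.findall(r"[A-Za-z']+", text)
    return [LEMMAS.get(word.lower(), word.lower()) for word in words]
```

Under these assumptions, `lemmatize("The children went")` returns `['the', 'child', 'go']`: the plural noun is reduced to its nominative singular and the past-tense verb to its infinitive.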

Notes

Original publication information:

Originally published in Digital Studies/le Champ Numérique (1)

Year: 1996

DOI: http://doi.org/10.16995/dscn.233

License: CC BY 4.0

Original citation:

Siemens, R. G. (1996). Lemmatization and parsing with TACT preprocessing programs. Digital Studies/le Champ Numérique, (1). DOI: http://doi.org/10.16995/dscn.233

 
