Corpus of Modern Tamil text

This page is linked to a tagged corpus of modern Tamil prose text. You can use it to search for Tamil sentence patterns of your choice, such as sentences with specific tense markers, participle forms, or modal forms. A list of options is provided here to make your selections easier. Consult the list of tags used by the tagger (written in PROLOG) to tag Tamil sentences. We expect this tagged corpus to be useful to both Tamil language learners and linguistic researchers.

We hope to expand this corpus with more text from different sources. If you have any on-line Modern Tamil text, please let us know. We may be able to tag it and include it in the present corpus for the purpose of corpus mining. Currently the corpus contains text from only two short novels: Yuka Santhi by Jeyakanthan and Kangkaa Snaanam by Akilan.

The recent version of the DOS-based Tamil tagger written by Vasu Renganathan may be downloaded from this directory.
I. You can type your own search terms in this field, for example: paar; paar pa; paar, pa, 3fem.sg; paakiirati dat; empe; and so on. Alternatively, use the selections below. After making your selections, press the send button at the bottom.
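
The free-text queries above can be pictured with a minimal sketch. It is not the site's actual search code: it assumes that a query is a list of terms separated by commas or spaces, that each corpus record carries a tagged sentence made of space-separated tokens, and that a sentence matches when every query term appears as a whole token. The names `parse_query`, `matches`, `search`, and `TOY_CORPUS` are hypothetical, and the two records are made-up placeholders rather than real tagger output.

```python
import re

# Hypothetical placeholder records; real tagged sentences will look different.
TOY_CORPUS = [
    {"id": 1, "tagged": "paar pa 3fem.sg"},
    {"id": 2, "tagged": "paar dat"},
]

def parse_query(query: str) -> list[str]:
    """Split a query such as 'paar, pa, 3fem.sg' into its individual terms."""
    return [term for term in re.split(r"[,\s]+", query.strip()) if term]

def matches(tagged_sentence: str, terms: list[str]) -> bool:
    """Assumed rule: every term must occur as a whole token in the sentence."""
    tokens = set(tagged_sentence.split())
    return set(terms) <= tokens

def search(corpus: list[dict], query: str) -> list[dict]:
    """Return the records whose tagged sentence contains all query terms."""
    terms = parse_query(query)
    if not terms:  # an empty query selects nothing
        return []
    return [record for record in corpus if matches(record["tagged"], terms)]
```

Under these assumptions, `search(TOY_CORPUS, "paar")` returns both records, while `search(TOY_CORPUS, "paar, pa, 3fem.sg")` returns only the first.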

II. Select verb and other tense forms from this list:

Other combinations for verbs:
Adjectival participle; Participial noun; Negation

III. Select noun and case form:

If you prefer to fetch only the sentences in Tamil script, without the other two forms (the romanized Tamil sentence and the tagged sentence), click here to use a different HTML page.