The tagset of a language covers its main parts of speech as well as its
morphological information. A tagset may consist either of syntactic
categories or of morpho-syntactic categories. To keep the machine
learning process efficient and to reduce lexical and syntactic
ambiguity, it was decided to concentrate on the syntactic categories
of the language.
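The trade-off between the two kinds of tagset can be illustrated with a minimal sketch. It assumes a hypothetical morpho-syntactic tag format in which the main category is followed by dot-separated feature values (e.g. `NN.PL`), and it uses English placeholder words; neither the tags nor the words come from the tagset described here. Collapsing each tag to its syntactic category shrinks the tagset and removes ambiguity that exists only at the level of morphological features.

```python
# Hypothetical corpus tagged with morpho-syntactic tags of the form
# CATEGORY.FEATURE (illustrative only, not the tagset of this paper).
MORPHO_TAGGED = [
    ("sheep", "NN.SG"),
    ("sheep", "NN.PL"),
    ("book", "NN.SG"),
    ("books", "NN.PL"),
    ("read", "VB.PRS"),
    ("read", "VB.PST"),
    ("red", "JJ"),
]


def to_syntactic(tag: str) -> str:
    """Keep only the main syntactic category (the part before the first dot)."""
    return tag.split(".", 1)[0]


def collapse(tagged):
    """Replace every morpho-syntactic tag with its syntactic category."""
    return [(word, to_syntactic(tag)) for word, tag in tagged]


def tagset(tagged):
    """Sorted list of the distinct tags used in a tagged corpus."""
    return sorted({tag for _, tag in tagged})


def ambiguous_words(tagged):
    """Sorted list of words that receive more than one distinct tag."""
    tags_per_word = {}
    for word, tag in tagged:
        tags_per_word.setdefault(word, set()).add(tag)
    return sorted(w for w, tags in tags_per_word.items() if len(tags) > 1)


if __name__ == "__main__":
    syntactic = collapse(MORPHO_TAGGED)
    print(len(tagset(MORPHO_TAGGED)), len(tagset(syntactic)))  # 5 3
    print(ambiguous_words(MORPHO_TAGGED))  # ['read', 'sheep']
    print(ambiguous_words(syntactic))  # []
```

Collapsing only removes ambiguity within a category; a word that is ambiguous across categories (for example, a form used as both noun and verb) stays ambiguous in a purely syntactic tagset.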
Three types of corpus were available for analysis: literature, news
and poetry. For the design of the tagset, only the literature and news
corpora were analyzed. The corpus was based on the most recent
vocabulary in use among local people.