Center for Research in
Urdu Language Processing

 
 


 

 

 


 
 


 
   
  Urdu Part of Speech Tagset  
     
  Release Notes  
 

The tagset of a language covers its main parts of speech and may also carry morphological information. A tagset may consist either of syntactic categories or of morpho-syntactic categories. To keep the machine learning process efficient and to reduce lexical and syntactic ambiguity, it was decided to concentrate on the syntactic categories of the language.

Three types of corpus were available for analysis: literature, news, and poetry. For the design of the tagset, only the literature and news corpora were analyzed. These corpora were based on the most recent available vocabulary used by local people.

 
     
  Download  
 

Urdu Part of Speech Tagset

License  
     
 

webmaster@crulp.org