Center for Research in
Urdu Language Processing

Urdu-Nepali-English Parallel Corpus

The Center for Research in Urdu Language Processing (CRULP) is pleased to release Urdu and Nepali corpora, each parallel to a common 100,000-word English source text from the Penn Treebank corpus, which is available through the Linguistic Data Consortium (LDC). The source text files used are listed in the README file provided with each corpus. The corpora are also tagged for part of speech.

This work has been supported by the Language Resource Association (GSK) of Japan and the International Development Research Centre (IDRC) of Canada, through the PAN Localization project (www.PANL10n.net).

Download
Urdu Corpus Read Me License
Urdu Corpus Extended Read Me License
POS Tagged Urdu Corpus Read Me License
Nepali Corpus Read Me License
POS Tagged Nepali Corpus Read Me License
 

webmaster@crulp.org