Urdu 5000 Most Frequently Used Words

Release Notes

The word list was extracted from a 19.3-million-word corpus gathered from a wide range of domains, listed in the following table and chosen with the end-user perspective in mind.

| Domains | Sub domains |
|---|---|
| C1. Sports/Games | C1.1. Sports (special events) |
| C2. News | C2.1. Local and international affairs |
| | C2.2. Editorials and opinions |
| C3. Finance | C3.1. Business, domestic and foreign market |
| C4. Culture/Entertainment | C4.1. Music, theatre, exhibitions, review articles on literature |
| | C4.2. Travel / tourism |
| C5. Consumer Information | C5.1. Health |
| | C5.2. Popular science |
| | C5.3. Consumer technology |
| C6. Personal communications | C6.1. Emails, online, discussions, editorials, e-zines |

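As a rough illustration of how a most-frequent-words list can be derived from a corpus, the sketch below counts tokens and keeps the top `n`. It is not the procedure actually used for this release: the function name `top_words`, the whitespace tokenization, and the two sample sentences are all assumptions made for the example. Real Urdu text generally needs a proper word segmenter, since spaces do not reliably mark word boundaries.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_words(lines: Iterable[str], n: int = 5000) -> List[Tuple[str, int]]:
    """Return the n most frequent tokens with their counts.

    Assumption: tokens are separated by whitespace. This sketch does not
    attempt real Urdu word segmentation or any normalization.
    """
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts.most_common(n)


# Tiny illustrative sample, not taken from the actual corpus.
sample = [
    "یہ ایک کتاب ہے",
    "یہ ایک قلم ہے",
]
top3 = top_words(sample, n=3)
for word, count in top3:
    print(word, count)
```

For a full corpus, the same function could be fed an open file object line by line, so the text never has to be held in memory at once.
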
The domain-wise distribution of corpus size is given in the following table.

| Domains | Raw corpora: size | Raw corpora: distinct words |
|---|---|---|
| C1. Sports/Games | 1666304 | 23118 |
| C2. News | 8957259 | 67365 |
| C3. Finance | 1162019 | 17024 |
| C4. Culture/Entertainment | 3845117 | 59214 |
| C5. Consumer Information | 1980723 | 34151 |
| C6. Personal communications | 1685424 | 30469 |
| Total | 19296846 | 104341 |

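The figures in the size table can be cross-checked with a few lines of arithmetic. The sketch below copies the per-domain sizes and distinct-word counts from the table, confirms that the sizes add up to the reported total, and prints each domain's share of the corpus. The variable names are chosen here for illustration only.

```python
# Raw corpus sizes per domain, copied from the table above.
sizes = {
    "C1. Sports/Games": 1_666_304,
    "C2. News": 8_957_259,
    "C3. Finance": 1_162_019,
    "C4. Culture/Entertainment": 3_845_117,
    "C5. Consumer Information": 1_980_723,
    "C6. Personal communications": 1_685_424,
}
reported_total = 19_296_846

# Distinct-word counts per domain, also from the table.
distinct = [23_118, 67_365, 17_024, 59_214, 34_151, 30_469]
reported_distinct_total = 104_341

total = sum(sizes.values())
shares = {domain: 100 * size / total for domain, size in sizes.items()}

print("sizes match reported total:", total == reported_total)
for domain, share in shares.items():
    print(f"{domain:<28} {share:5.1f}%")

# Distinct counts are not additive: the same word can occur in several
# domains, so the per-domain sum exceeds the overall distinct total.
print("sum of per-domain distinct words:", sum(distinct))
```

The per-domain sizes sum exactly to the reported total, and News is the largest domain at a little under half of the corpus. The distinct-word column behaves differently: because domains share vocabulary, its total is smaller than the sum of the individual rows.
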
Download

Urdu 5000 Most Frequently Used Words List | License