@dginev
Created May 1, 2019 18:48
arXMLiv 08.2018 dataset, subject classification frequencies
Subject Document count
math 334932
astro-ph 223437
cond-mat 212384
cs 132338
hep-ph 130788
hep-th 116499
physics 99881
quant-ph 80888
gr-qc 68642
cond-mat.stat-mech 48636
math.mp 48497
math-ph 48497
cond-mat.mes-hall 47489
cond-mat.str-el 44217
astro-ph.co 42940
nucl-th 40583
stat 35585
cond-mat.mtrl-sci 35122
hep-ex 34327
math.co 33170
astro-ph.sr 33116
astro-ph.ga 30579
math.ag 30090
math.pr 29195
math.ap 29134
astro-ph.he 29122
nlin 27453
cond-mat.supr-con 26286
math.dg 25112
cond-mat.soft 22189
math.it 21991
cs.it 21991
math.nt 21909
hep-lat 20220
cs.lg 19031
math.ds 18745
q-bio 18136
math.fa 17015
cs.cv 16901
nucl-ex 16334
math.oc 15800
physics.optics 15593
cond-mat.dis-nn 15186
stat.ml 15128
math.na 13623
math.gt 13406
math.rt 13260
math.ca 12900
math.gr 12521
physics.atom-ph 12260
cond-mat.quant-gas 12246
math.qa 12207
astro-ph.ep 12119
astro-ph.im 11643
physics.flu-dyn 11123
nlin.cd 10860
physics.soc-ph 10517
cond-mat.other 10498
cs.ai 10102
cs.ds 10070
math.cv 9958
stat.th 9689
math.st 9689
math.ra 9575
physics.comp-ph 8387
stat.me 8381
nlin.si 8270
physics.chem-ph 8225
physics.ins-det 8151
math.oa 7946
cs.cl 7808
math.ac 7656
physics.plasm-ph 7477
math.at 7259
cs.si 7152
q-fin 7112
cs.ni 6993
physics.bio-ph 6796
math.lo 6767
cs.dm 6329
cs.lo 6149
math.mg 6052
cs.cr 5836
cs.dc 5812
nlin.ps 5789
stat.ap 5634
math.sp 5622
cs.sy 5520
math.sg 5429
physics.data-an 5424
physics.gen-ph 5335
cs.cc 5189
q-bio.pe 4905
physics.class-ph 4218
cs.gt 4102
cs.ne 3789
nlin.ao 3672
cs.ro 3546
q-bio.qm 3425
q-bio.nc 3164
cs.cg 3062
cs.ir 3061
math.kt 3055
stat.co 3005
math.gn 2865
physics.acc-ph 2721
cs.se 2718
math.ct 2680
physics.space-ph 2410
cs.db 2338
cs.pl 2331
q-bio.bm 2321
cs.cy 2309
physics.geo-ph 2170
physics.ao-ph 2145
chao-dyn 2140
eess 2121
cs.fl 2022
cs.na 1896
q-bio.mn 1869
physics.hist-ph 1824
cs.hc 1750
q-fin.st 1715
cs.ce 1682
math.ho 1646
math.gm 1530
physics.atm-clus 1475
cs.ma 1467
q-alg 1426
solv-int 1319
eess.sp 1301
physics.ed-ph 1300
alg-geom 1271
cs.sd 1247
physics.med-ph 1242
q-fin.pr 1190
cs.sc 1180
q-fin.gn 1169
cs.pf 1134
cs.mm 1086
q-bio.gn 1065
cs.dl 1033
physics.app-ph 1002
q-bio.cb 1000
physics.pop-ph 950
q-fin.cp 934
nlin.cg 932
q-fin.rm 919
cs.gr 918
cs.ms 915
q-bio.sc 915
q-fin.mf 817
q-fin.tr 790
cs.et 783
q-fin.pm 777
q-bio.to 676
dg-ga 655
patt-sol 589
eess.as 581
cs.ar 559
cmp-lg 536
cs.oh 517
adap-org 466
q-fin.ec 417
funct-an 391
econ 382
stat.ot 380
q-bio.ot 377
econ.em 312
eess.iv 302
mtrl-th 255
cs.os 250
chem-ph 233
comp-gas 155
supr-con 153
atom-ph 117
cs.gl 90
econ.th 41
econ.gn 35
acc-phys 34
plasm-ph 25
ao-sci 13
bayes-an 12
dginev commented May 1, 2019

Metadata statistics for the arXMLiv 08.2018 dataset, computed from arXiv's metadata as harvested through its OAI-PMH interface.
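As a rough illustration of where the category labels come from, the following sketch extracts each record's categories from an OAI-PMH `ListRecords` response. It assumes arXiv's `arXiv` metadata format, in which each record's metadata element uses the namespace `http://arxiv.org/OAI/arXiv/` and carries a space-separated `<categories>` element. The embedded sample record and its placeholder id are hypothetical, and this is not the pipeline that actually produced the table.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for an OAI-PMH response in arXiv's "arXiv" metadata format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "arxiv": "http://arxiv.org/OAI/arXiv/",
}

# Hypothetical one-record response; the id is a placeholder, not a real paper.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:arXiv.org:0000.00000</identifier></header>
      <metadata>
        <arXiv xmlns="http://arxiv.org/OAI/arXiv/">
          <id>0000.00000</id>
          <categories>cs.AI math.LO</categories>
        </arXiv>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def categories_from_response(xml_text: str) -> dict[str, list[str]]:
    """Map each record's arXiv id to the list of categories it is tagged with."""
    root = ET.fromstring(xml_text)
    result: dict[str, list[str]] = {}
    for meta in root.iterfind(".//oai:record/oai:metadata/arxiv:arXiv", NS):
        arxiv_id = meta.findtext("arxiv:id", default="", namespaces=NS)
        cats = meta.findtext("arxiv:categories", default="", namespaces=NS)
        result[arxiv_id] = cats.split()  # categories are space-separated
    return result


if __name__ == "__main__":
    print(categories_from_response(SAMPLE))
```

A real harvest would page through many such responses using resumption tokens; only the per-response parsing step is shown here.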

Documents may carry more than one category label, so the counts sum to more than the number of documents. In addition, preprocessing adds the general label for every specific category (e.g., a document tagged cs.AI is also counted under cs).
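A minimal sketch of this counting scheme, assuming each document's labels arrive as a space-separated category string (hypothetical input) and that the general label is the part of a category before the first dot. Each label is counted at most once per document, which is why multi-label documents push the totals above the document count. The names `labels_for` and `subject_frequencies` are illustrative only, not part of any arXMLiv tooling.

```python
from collections import Counter
from typing import Iterable


def labels_for(categories: str) -> set[str]:
    """Expand one document's category string into its set of labels.

    Each specific category (e.g. "cs.AI") also contributes its general
    label, assumed here to be the text before the first dot (e.g. "cs").
    Labels are lowercased to match the table.
    """
    labels = set()
    for cat in categories.split():
        cat = cat.lower()
        labels.add(cat)
        labels.add(cat.split(".", 1)[0])  # general label; unchanged if no dot
    return labels


def subject_frequencies(docs: Iterable[str]) -> Counter:
    """Count how many documents carry each label (once per document per label)."""
    counts = Counter()
    for categories in docs:
        counts.update(labels_for(categories))
    return counts


if __name__ == "__main__":
    # Hypothetical category strings for three documents.
    sample = ["cs.AI math.LO", "cs.LG stat.ML", "math-ph math.MP"]
    for label, n in subject_frequencies(sample).most_common():
        print(label, n)
```

On the three sample documents, cs and math each get a count of 2 and every other label a count of 1. Note that a hyphenated archive name such as math-ph contains no dot, so under this assumption it does not add a math label on its own.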
