@symbolboxer
Last active March 25, 2024 20:35
Shift-as-letter analysis results
## Overall character frequency
(space): a lot; the exact count is inaccurate because of the corpus format
e: 4965563
t: 3796360
a: 3366681
o: 3229192
i: 3003731
n: 2880999
s: 2692688
r: 2470163
h: 2119643
l: 1714555
d: 1546806
¤: 1395737 <-- Shift
c: 1229776
u: 1205511
m: 1050041
p: 927183
g: 902122
y: 861389
f: 840468
w: 824253
.: 675067
b: 644296
,: 540836
v: 434956
k: 381381
': 230595
-: 187483
": 141299
<: 111205
>: 111096
j: 89050
?: 74907
x: 74820
:: 58806
): 45344
(: 44998
!: 44441
z: 43587
q: 36929
#: 25225
;: 24034
/: 13462
*: 9002
$: 8124
&: 5047
%: 4017
]: 3752
[: 3751
=: 2292
_: 2059
+: 658
Numbers and @ were removed from results because this corpus used them to mark different samples, and so their occurrence was artificially high.
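
The gist does not include the script that produced these counts, so the following is only a minimal sketch of one way to get numbers of this shape. It assumes each ASCII capital is rewritten as `¤` followed by its lowercase letter, and that digits and `@` are dropped as described in the note above. The names `expand_shift` and `char_counts` are hypothetical.

```python
import string
from collections import Counter

SHIFT = "¤"  # marker used in these tables for a Shift press


def expand_shift(text: str) -> str:
    """Rewrite each ASCII uppercase letter as SHIFT + its lowercase form."""
    out = []
    for ch in text:
        if ch in string.ascii_uppercase:
            out.append(SHIFT + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def char_counts(text: str) -> Counter:
    """Count characters after Shift expansion, skipping digits and '@'."""
    expanded = expand_shift(text)
    return Counter(ch for ch in expanded if not ch.isdigit() and ch != "@")


# Tiny illustrative input; a real run would read the whole corpus.
counts = char_counts("I think The end")
```

Counting Shift only before letters is an assumption of this sketch; shifted symbols such as `?` or `"` are counted as themselves here.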
## Frequency of bigrams (top 250)
th: 1073975
he: 915091
in: 786075
er: 637156
an: 621225
re: 579337
on: 485476
at: 471859
ou: 412104
en: 411688
nd: 397639
or: 387015
es: 377357
ha: 375321
to: 375117
ng: 373767
it: 360774
st: 350207
te: 348129
ar: 344949
ti: 335140
ed: 332034
is: 328754
al: 318794
ve: 291261
nt: 291023
le: 272852
se: 271888
me: 270783
as: 270057
hi: 258579
ea: 257806
of: 249921
ne: 235059
ll: 235043
co: 229202
ro: 219083
de: 215500
ri: 214281
li: 213779
¤i: 204708
ic: 192137
ra: 187578
om: 186027
be: 185960
ca: 183029
ho: 182049
io: 178003
el: 176617
ch: 171476
ma: 170499
ce: 169949
no: 168850
ur: 166929
ut: 163025
ta: 160037
us: 157455
ot: 153872
yo: 152845
wa: 152475
si: 151988
fo: 151701
la: 150243
et: 149663
il: 146943
pe: 145210
wh: 143243
we: 142034
so: 141641
ac: 139693
di: 139023
ee: 137932
¤t: 137191
ow: 136634
rs: 135088
ly: 132838
lo: 132200
ns: 130714
ge: 129715
ec: 127861
wi: 123997
un: 123951
sh: 120299
ad: 118979
id: 117800
pr: 117007
ie: 115460
tr: 113230
mo: 112961
ol: 112920
ke: 112331
ay: 110280
ss: 108713
rt: 108087
¤a: 108009
ul: 105787
ai: 105580
p>: 105525
am: 104917
¤s: 104822
oo: 104646
<p: 104356
ct: 104254
do: 104005
ts: 103883
ni: 103309
em: 101161
nc: 100131
's: 100112
na: 99974
mi: 99849
po: 97897
pa: 96899
im: 95530
ir: 95521
ig: 94807
ld: 94620
sa: 92466
gh: 92133
os: 87689
fi: 87454
pl: 86885
ev: 86450
wo: 86383
ry: 82444
¤w: 82423
bo: 82130
av: 81574
go: 80876
su: 80325
vi: 80169
ab: 77440
iv: 77120
bu: 76151
op: 76093
¤c: 75647
ey: 73270
¤m: 72687
fe: 72361
ba: 71834
tu: 71537
ia: 71439
ci: 71336
ck: 69514
bl: 69101
if: 66576
¤b: 66157
¤h: 65609
da: 65288
fr: 64590
ag: 64453
ov: 64072
tt: 63751
mp: 63044
od: 61290
gr: 58414
ht: 58221
ty: 57306
sp: 56941
ap: 56837
rd: 55572
ki: 55340
ei: 55162
ex: 54589
fa: 52021
up: 51472
ga: 51360
ep: 50439
ye: 50393
cr: 50278
uc: 50046
rn: 49665
n': 49608
't: 49307
ak: 49274
ff: 47821
ug: 47694
gi: 47304
cl: 47227
¤p: 47226
sc: 46553
¤d: 46105
au: 46035
..: 45586
oc: 45069
ls: 44720
cu: 43727
pp: 43329
ew: 42958
pi: 42938
ef: 42894
¤n: 42146
ru: 41706
oi: 41416
rr: 41382
ue: 41322
¤r: 39744
ds: 39744
rm: 39629
ok: 39194
¤y: 38791
ny: 38608
¤f: 38501
by: 38397
nk: 38333
bi: 38290
¤o: 38203
my: 37337
¤l: 37106
br: 36849
um: 36790
ys: 36729
¤g: 36284
ju: 35353
lu: 35173
rk: 35049
eo: 34873
eg: 34593
lt: 34241
mu: 33876
ob: 33407
ua: 33406
qu: 33385
dr: 33321
du: 32783
rc: 32651
ud: 32500
pu: 32287
¤j: 32042
va: 31879
ik: 31171
kn: 30847
nn: 30708
tl: 30646
fu: 30602
gu: 30088
hr: 29878
¤e: 29535
rg: 29044
wn: 28823
rl: 28373
og: 28167
mm: 27615
ft: 27209
ui: 26923
ip: 26818
mb: 26205
ms: 26136
af: 25794
--: 25053
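
Bigram counts like these can be tallied with a sliding window of width two. The sketch below is an assumption about the method, not the author's script: it expects text in which each capital has already been rewritten as `¤` plus the lowercase letter, and it counts bigrams only within whitespace-separated tokens, since no bigram containing a space appears in the list above. `bigram_counts` is a hypothetical name.

```python
from collections import Counter


def bigram_counts(expanded_text: str) -> Counter:
    """Count adjacent character pairs inside each whitespace-separated token."""
    counts = Counter()
    for token in expanded_text.split():
        # zip pairs each character with its right-hand neighbour.
        for left, right in zip(token, token[1:]):
            counts[left + right] += 1
    return counts


# Tiny illustrative input, already Shift-expanded.
sample = bigram_counts("¤the then ¤i")
top = sample.most_common(250)  # same cutoff as the list above
```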
## Frequency of letters following Shift
i: 204708
t: 137191
a: 108009
s: 104822
w: 82423
c: 75647
m: 72687
b: 66157
h: 65609
p: 47226
d: 46105
n: 42146
r: 39744
y: 38791
f: 38501
o: 38203
l: 37106
g: 36284
j: 32042
e: 29535
k: 17838
u: 16233
v: 12061
q: 2765
z: 2613
x: 1291
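
As a consistency check, the counts in this section add up to the `¤` total in the overall character frequencies (1395737), which suggests Shift was counted only when followed by a letter. The sketch below copies the values from the list above and redoes the arithmetic; `after_shift` and `share` are hypothetical names.

```python
# Counts of letters following Shift, copied from the list above.
after_shift = {
    "i": 204708, "t": 137191, "a": 108009, "s": 104822, "w": 82423,
    "c": 75647, "m": 72687, "b": 66157, "h": 65609, "p": 47226,
    "d": 46105, "n": 42146, "r": 39744, "y": 38791, "f": 38501,
    "o": 38203, "l": 37106, "g": 36284, "j": 32042, "e": 29535,
    "k": 17838, "u": 16233, "v": 12061, "q": 2765, "z": 2613,
    "x": 1291,
}

SHIFT_TOTAL = 1395737  # the "¤" row in the overall character frequency list

total = sum(after_shift.values())
assert total == SHIFT_TOTAL

# Fraction of Shift presses that are followed by each letter.
share = {letter: n / total for letter, n in after_shift.items()}
print(f"i: {share['i']:.1%}")  # about 14.7% of Shift presses precede "i"
```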