These numbers are taken from the FastParse test suite, which runs over the following libraries:
- fastparse
- scalaJs
- scalaz
- shapeless
- akka
- lift
- play
- PredictionIO
- spark
- sbt
- cats
- finagle
- kafka
- breeze
- spire
- saddle
- scala
The suite counts how often each rule succeeds across these libraries, covering >15,000 files and >12,000,000 LOC.
The raw numbers are below:
Rule | Count |
---|---|
Ideographic | 0 |
PI | 0 |
EmptyElemTagPEnd | 0 |
CDStart | 2 |
CData | 2 |
CDSect | 2 |
CDEnd | 2 |
CharRef | 3 |
ScalaPatterns | 4 |
Patterns | 4 |
ContentP | 7 |
ETagP | 7 |
XmlPattern | 7 |
ElemPattern | 7 |
TagPHeader | 7 |
STagPEnd | 7 |
EntityRef | 24 |
Reference | 25 |
OctalEscape | 73 |
ClsAnnot | 89 |
DoWhile | 109 |
do | 109 |
<% | 130 |
CharA | 160 |
Assign | 168 |
ClassQualifier | 171 |
ImplicitLambda | 179 |
Digit | 189 |
" | " |
PkgObj | 312 |
Implicit | 316 |
EmptyElemTagEnd | 378 |
EarlyDefTmpl | 385 |
forSome | 392 |
ExistentialClause | 392 |
PkgBlock | 399 |
← | 432 |
# | 806 |
macro | 807 |
XmlExpr | 1099 |
finally | 1129 |
Finally | 1130 |
SelfType | 1243 |
sealed | 1597 |
>: | 1620 |
Refinement | 1629 |
Return | 1639 |
return | 1640 |
ScalaExpr | 1715 |
PostFix | 1754 |
yield | 1774 |
Binding | 1810 |
UnicodeEscape | 1881 |
FloatType | 1890 |
"*" | 1908 |
Exp | 1909 |
Symbol | 1945 |
Eq | 1963 |
Attribute | 1964 |
AttValue | 1964 |
_* | 2067 |
Enumerator | 2370 |
Content | 2396 |
ETag | 2396 |
STagEnd | 2397 |
{ | 2419 |
super | 2547 |
XmlContent | 2634 |
Content1 | 2744 |
abstract | 2769 |
Element | 2774 |
TagHeader | 2775 |
Catch | 2799 |
catch | 2799 |
CharData | 3075 |
HexNum | 3207 |
BacktickId | 3233 |
While | 3549 |
while | 3657 |
ThisPath | 3713 |
ThisSuper | 3714 |
Try | 3782 |
try | 3782 |
Guard | 4210 |
lazy | 4585 |
TripleTail | 4616 |
TripleChars | 4616 |
throw | 4963 |
Throw | 4963 |
protected | 5736 |
Enumerators | 5935 |
For | 5935 |
for | 5936 |
TypeDef | 6196 |
<: | 6568 |
⇒ | 6677 |
ClsArgMod | 6797 |
PatLiteral | 7126 |
Name | 7169 |
BaseChar | 7170 |
XNameStart | 7170 |
HexDigit | 7521 |
<- | 7565 |
Generator | 7998 |
final | 8851 |
PlainIdNoDollar | 9224 |
TQ | 9231 |
Selectors | 9559 |
EscapedChars | 9771 |
TopPkgSeq | 9842 |
AccessQualifier | 10200 |
with | 10811 |
AscriptionType | 10969 |
type | 11060 |
TypePattern | 11661 |
TypePat | 11681 |
TraitDef | 11894 |
trait | 11896 |
TypeArgList | 12882 |
Ascription | 12988 |
QualId | 13092 |
Annot | 13194 |
CharQ | 13375 |
match | 13379 |
package | 13574 |
var | 13900 |
CompilationUnit | 15064 |
@ | 15097 |
TopStatSeq | 15434 |
AllArgs | 16338 |
Thingy | 16563 |
ClsArgs | 16609 |
ExprPrefix | 18739 |
ObjDef | 19601 |
object | 19615 |
this | 19755 |
Variant | 19860 |
NameStartChar | 20087 |
implicit | 20323 |
NameChar | 20354 |
FunTypeArgs | 20711 |
override | 21124 |
else | 21729 |
Else | 21730 |
Selector | 23101 |
Bool | 23596 |
Thing | 24398 |
CaseClauses | 24502 |
private | 24904 |
Thing2 | 26236 |
Float | 26332 |
MatchAscriptionSuffix | 26371 |
LambdaRhs | 26849 |
ClsDef | 27830 |
class | 27885 |
LocalMod | 27970 |
AccessMod | 30636 |
ClsArg | 30743 |
extends | 33124 |
LetterDigitDollarUnderscore | 33174 |
If | 33797 |
TupleEx | 36982 |
ExtractorArgs | 36986 |
if | 37993 |
ParenedLambda | 39642 |
TmplBody | 49989 |
Char1 | 51177 |
DefTmpl | 51319 |
New | 53019 |
new | 53026 |
Import | 54099 |
import | 54100 |
ImportExpr | 54280 |
CtxBounds | 55627 |
TypeArg | 55634 |
Parened | 56529 |
Char | 57771 |
CaseClause | 59454 |
Mod | 79023 |
TopStat | 83166 |
NamedTmpl | 85746 |
Constrs | 85747 |
AnonTmpl | 86155 |
Constr | 94889 |
Args | 97834 |
=> | 99875 |
FunArgs | 114242 |
BlockExpr | 119049 |
_ | 121207 |
Pattern | 123107 |
TypeOrBindPattern | 131584 |
val | 139110 |
ValVarDef | 146190 |
FunArg | 146440 |
Tmpl | 148466 |
FunSig | 148928 |
FunDef | 149123 |
def | 149142 |
Int | 154846 |
} | 171254 |
SingleChars | 171605 |
String | 176462 |
case | 177082 |
"}" | 183623 |
BlockEnd | 184599 |
Block | 184996 |
TypeArgs | 190898 |
"{" | 202803 |
OpChar | 203886 |
Types | 205562 |
Operator | 219160 |
InfixSuffix | 228631 |
"[" | 234697 |
"]" | 234701 |
Extractor | 238699 |
TmplStat | 240386 |
VarId | 255910 |
InfixPattern | 266774 |
BindPattern | 266951 |
SimplePattern | 272459 |
Dcl | 301426 |
= | 319508 |
BlockDef | 321937 |
: | 323664 |
Body | 343983 |
DecNum | 358460 |
ExprLiteral | 393978 |
Literal | 400798 |
MultilineComment | 417655 |
SameLineCharChunks | 422960 |
LineComment | 428650 |
BlockStat | 434312 |
"." | 487694 |
"," | 497062 |
PostDotCheck | 527584 |
ArgList | 532846 |
ParenArgList | 542139 |
Exprs | 553102 |
PostfixType | 607743 |
InfixType | 608010 |
Type | 612167 |
Unbounded | 615180 |
CompoundType | 628819 |
Letter | 648719 |
TypeBounds | 670383 |
AnnotType | 723032 |
SimpleType | 737524 |
BasicType | 738579 |
O | 779453 |
")" | 827263 |
"(" | 828441 |
Comment | 845933 |
Prelude | 875773 |
W | 1076937 |
Semis | 1132011 |
Path | 1204887 |
AlphabetKeywords | 1248607 |
SymbolicKeywords | 1484209 |
PostfixLambda | 1538313 |
PostfixExpr | 1545698 |
PostfixSuffix | 1566877 |
SmallerExprOrLambda | 1579317 |
Expr | 1627600 |
SimpleExpr | 1767224 |
PrefixExpr | 1771497 |
ExprSuffix | 1799842 |
Upper | 1938985 |
IdPath | 2189408 |
StableId | 2195822 |
OneNLMax | 2345703 |
ConsumeComments | 2353451 |
Semi | 2530044 |
CommentChunk | 2646017 |
Keywords | 2716775 |
NotNewline | 5355498 |
VarId0 | 5860421 |
Lower | 5890725 |
IdUnderscoreChunk | 6689840 |
IdRest | 7467894 |
PlainId | 7710709 |
WS | 12040535 |
Newline | 12456005 |
Id | 12580572 |
WSChars | 42329645 |
WL | 62297135 |
Cool stuff! Is there a mapping from rule name to definition? For example, what does "WL" mean? It's the most common thing here.
And, if you do another run, throw in the rapture libraries 😃. They contain a lot of interesting Scala code.
I wonder how those libraries' idioms break down against "mainstream" ones. Maybe we could do some topic modeling or clustering on these rules to see what the "neighborhoods" of code structure are in Scala. For example, maybe there's a Scalaz/cats/rapture space that is distinct from a Spark/"Scala as Java" space.
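As a rough idea of what that comparison could look like, here is a minimal Python sketch. It assumes the counts were collected per library rather than in aggregate, which the table above does not provide. The library names (`lib_a`, `lib_b`, `lib_c`) and all numbers are hypothetical placeholders, not measurements. Each library's counts are turned into a rule-frequency vector, and the libraries are compared pairwise with cosine similarity, which is only one of several reasonable starting points before real clustering or topic modeling.

```python
import math

def normalize(counts):
    """Turn raw rule counts into relative frequencies so library size cancels out."""
    total = sum(counts.values())
    return {rule: n / total for rule, n in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse rule-frequency vectors."""
    rules = set(a) | set(b)
    dot = sum(a.get(r, 0.0) * b.get(r, 0.0) for r in rules)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(per_library):
    """Pairwise cosine similarity for every ordered pair of libraries."""
    freqs = {lib: normalize(c) for lib, c in per_library.items()}
    return {(x, y): cosine(freqs[x], freqs[y]) for x in freqs for y in freqs}

# Hypothetical placeholder counts, NOT taken from the table above.
per_library = {
    "lib_a": {"Implicit": 30, "TypeArgs": 50, "While": 2},
    "lib_b": {"Implicit": 28, "TypeArgs": 55, "While": 3},
    "lib_c": {"Implicit": 1, "TypeArgs": 10, "While": 40},
}

sim = similarity_matrix(per_library)
for (x, y), s in sorted(sim.items()):
    if x < y:
        print(f"{x} vs {y}: {s:.3f}")
```

Libraries whose vectors point in similar directions would land in the same "neighborhood". Very common rules such as `WL` or `Id` would probably dominate the vectors, so some down-weighting (TF-IDF, for example) would likely be needed before the result says much.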