Gist by @alexott (last active July 22, 2020 12:57)
Testing the quality of Spark NLP's language detection, compared with FastText
// full code is here: https://github.com/alexott/spark-playground/tree/master/spark-nlp
Results of the evaluation against the dataset linked from the following blog post:
http://alexott.blogspot.com/2017/10/evaluating-fasttexts-models-for.html
+--------+-----+-------+-------------------+
|src_lang|count|correct|          precision|
+--------+-----+-------+-------------------+
|      bg|  203|    146| 0.7192118226600985|
|      de|  236|    150|  0.635593220338983|
|      el|  199|    165| 0.8291457286432161|
|      en|  249|     99|0.39759036144578314|
|      es|  255|     45|0.17647058823529413|
|      fi|  199|    176| 0.8844221105527639|
|      fr|  205|    140| 0.6829268292682927|
|      hr|  175|    130| 0.7428571428571429|
|      hu|  197|    170| 0.8629441624365483|
|      it|  206|    144| 0.6990291262135923|
|      no|  183|    142| 0.7759562841530054|
|      pl|  196|    158| 0.8061224489795918|
|      pt|  206|    156| 0.7572815533980582|
|      ro|  193|    105| 0.5440414507772021|
|      ru|  446|    373| 0.8363228699551569|
|      sk|  195|    164|  0.841025641025641|
|      sv|  202|    201|  0.995049504950495|
|      tr|  195|    152| 0.7794871794871795|
|      uk|  197|    171|  0.868020304568528|
+--------+-----+-------+-------------------+
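The following minimal sketch shows how the table's columns relate to each other. It is not taken from the linked repository. It assumes that `correct` counts the samples whose detected language equals `src_lang`. Under that assumption, `precision` is `correct / count` per source language. That is the share of each language's samples detected correctly, which is usually called per-class recall. The helper names `per_language_scores`, `micro_average` and `macro_average` are hypothetical. The `(count, correct)` numbers are copied from the table.

```python
from collections import defaultdict

# Hypothetical helper: rebuilds the table's columns (count, correct, and
# correct / count) from (src_lang, detected_lang) pairs.
def per_language_scores(pairs):
    counts = defaultdict(int)
    correct = defaultdict(int)
    for src_lang, detected in pairs:
        counts[src_lang] += 1
        if detected == src_lang:
            correct[src_lang] += 1
    # Sorted by language code, like the table above.
    return {
        lang: (counts[lang], correct[lang], correct[lang] / counts[lang])
        for lang in sorted(counts)
    }

# (count, correct) per source language, copied from the table above.
TABLE = {
    "bg": (203, 146), "de": (236, 150), "el": (199, 165), "en": (249, 99),
    "es": (255, 45), "fi": (199, 176), "fr": (205, 140), "hr": (175, 130),
    "hu": (197, 170), "it": (206, 144), "no": (183, 142), "pl": (196, 158),
    "pt": (206, 156), "ro": (193, 105), "ru": (446, 373), "sk": (195, 164),
    "sv": (202, 201), "tr": (195, 152), "uk": (197, 171),
}

def micro_average(table):
    """Pooled score: total correct over total samples (large languages weigh more)."""
    total = sum(count for count, _ in table.values())
    hits = sum(ok for _, ok in table.values())
    return hits / total

def macro_average(table):
    """Unweighted mean of the per-language ratios (every language weighs the same)."""
    return sum(ok / count for count, ok in table.values()) / len(table)

if __name__ == "__main__":
    # Toy, made-up detections just to exercise the helper.
    print(per_language_scores([("en", "en"), ("en", "de"), ("sv", "sv")]))
    print(f"micro: {micro_average(TABLE):.4f}, macro: {macro_average(TABLE):.4f}")
```

The two averages answer different questions. The micro average pools all 4137 samples, so `ru` (446 samples) counts for more. The macro average gives weak languages such as `en` and `es` the same weight as every other language.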