@djanowski
Created December 11, 2012 14:09
Snowball Spanish stemmer improvements
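
The patch below adds an unaccented 'acion' to the suffix group that the Spanish stemmer deletes when the suffix lies in R2, so that words typed without the accent (educacion) stem the same way as their accented forms (educación). The following is a rough Python sketch of that one rule only, not the Snowball implementation: the names region_after and strip_suffix are hypothetical, the R1/R2 computation follows the usual Snowball definition (the region after the first non-vowel following a vowel, applied twice), and the other steps of the real algorithm (pronoun removal, verb suffixes, residual suffixes, accent removal) are left out, so results for other words can differ from diffs.txt.

```python
# Simplified illustration of the R2 suffix-deletion rule touched by the patch.
# Assumption: only this single rule is modelled; the real stemmer has more steps.

VOWELS = set("aeiouáéíóúü")


def region_after(word, start):
    """Return the index just past the first non-vowel that follows a vowel,
    searching from `start`; return len(word) if there is none."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def strip_suffix(word, suffixes):
    """Delete the longest matching suffix if it lies entirely inside R2."""
    r1 = region_after(word, 0)
    r2 = region_after(word, r1)
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix):
            if len(word) - len(suffix) >= r2:
                return word[: -len(suffix)]
            return word
    return word


# Suffix group before and after the patch (only the entries relevant here).
BEFORE = ("ación",)
AFTER = ("ación", "acion")

for w in ("educacion", "educación", "coronacion"):
    print(w, strip_suffix(w, BEFORE), strip_suffix(w, AFTER))
```

With the patched group, the unaccented and accented spellings reduce to the same stem in this sketch, which is the behaviour the changes to diffs.txt and output.txt record.
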
Index: snowball/algorithms/spanish/stem_ISO_8859_1.sbl
===================================================================
--- snowball/algorithms/spanish/stem_ISO_8859_1.sbl (revision 556)
+++ snowball/algorithms/spanish/stem_ISO_8859_1.sbl (working copy)
@@ -98,7 +98,7 @@
(
R2 delete
)
- 'adora' 'ador' 'aci{o'}n'
+ 'adora' 'ador' 'aci{o'}n' 'acion'
'adoras' 'adores' 'aciones'
'ante' 'antes' 'ancia' 'ancias'// Note 1
(
Index: data/spanish/voc.txt
===================================================================
--- data/spanish/voc.txt (revision 556)
+++ data/spanish/voc.txt (working copy)
@@ -9742,6 +9742,7 @@
edú
eduard
eduardo
+educacion
educación
educada
educador
Index: data/spanish/diffs.txt
===================================================================
--- data/spanish/diffs.txt (revision 556)
+++ data/spanish/diffs.txt (working copy)
@@ -1288,7 +1288,7 @@
alimento aliment
alimentó aliment
alimentos aliment
-alineacion alineacion
+alineacion alin
alineación alin
alineaciones alin
alineado alin
@@ -6967,7 +6967,7 @@
cornoyer cornoy
coro cor
corona coron
-coronacion coronacion
+coronacion coron
coronación coron
coronada coron
coronado coron
@@ -8183,7 +8183,7 @@
deprimente depriment
deprimido deprim
deprimidos deprim
-depuracion depuracion
+depuracion depur
depuración depur
depuradísima depuradisim
depurado depur
@@ -9742,6 +9742,7 @@
edú edu
eduard eduard
eduardo eduard
+educacion educ
educación educ
educada educ
educador educ
@@ -11334,7 +11335,7 @@
evadía evad
evadió evad
evadir evad
-evaluacion evaluacion
+evaluacion evalu
evaluación evalu
evaluaciones evalu
evaluado evalu
@@ -20127,7 +20128,7 @@
pendiente pendient
pendientes pendient
pendleton pendleton
-penetracion penetracion
+penetracion penetr
penetración penetr
penetrados penetr
penetran penetr
@@ -22686,7 +22687,7 @@
recomiendo recom
recompensa recompens
recompra recompr
-reconciliacion reconciliacion
+reconciliacion reconcili
reconciliación reconcili
reconcilió reconcil
reconducción reconduccion
Index: data/spanish/output.txt
===================================================================
--- data/spanish/output.txt (revision 556)
+++ data/spanish/output.txt (working copy)
@@ -1288,10 +1288,10 @@
aliment
aliment
aliment
-alineacion
alin
alin
alin
+alin
aline
alist
aliusk
@@ -6967,11 +6967,11 @@
cornoy
cor
coron
-coronacion
coron
coron
coron
coron
+coron
coronel
coron
cor
@@ -8183,8 +8183,8 @@
depriment
deprim
deprim
-depuracion
depur
+depur
depuradisim
depur
depur
@@ -9753,6 +9753,7 @@
educ
educ
educ
+educ
eduqu
edward
efe
@@ -11334,7 +11335,6 @@
evad
evad
evad
-evaluacion
evalu
evalu
evalu
@@ -11344,6 +11344,7 @@
evalu
evalu
evalu
+evalu
evangel
evangeliz
evangeliz
@@ -20127,12 +20128,12 @@
pendient
pendient
pendleton
-penetracion
penetr
penetr
penetr
penetr
penetr
+penetr
penich
peninsul
peninsular
@@ -22686,8 +22687,8 @@
recom
recompens
recompr
-reconciliacion
reconcili
+reconcili
reconcil
reconduccion
reconform