@kostasx
Last active December 16, 2022 09:03
Convert Greek strings to URL slugs in JavaScript.
function string_to_slug(str) {
 str  = str.replace(/^\s+|\s+$/g, '') // TRIM WHITESPACE AT BOTH ENDS.
          .toLowerCase();            // CONVERT TO LOWERCASE
const from = [ "ου", "ΟΥ", "Ού", "ού", "αυ", "ΑΥ", "Αύ", "αύ", "ευ", "ΕΥ", "Εύ", "εύ", "α", "Α", "ά", "Ά", "β", "Β", "γ", "Γ", "δ", "Δ", "ε", "Ε", "έ", "Έ", "ζ", "Ζ", "η", "Η", "ή", "Ή", "θ", "Θ", "ι", "Ι", "ί", "Ί", "ϊ", "ΐ", "Ϊ", "κ", "Κ", "λ", "Λ", "μ", "Μ", "ν", "Ν", "ξ", "Ξ", "ο", "Ο", "ό", "Ό", "π", "Π", "ρ", "Ρ", "σ", "Σ", "ς", "τ", "Τ", "υ", "Υ", "ύ", "Ύ", "ϋ", "ΰ", "Ϋ", "φ", "Φ", "χ", "Χ", "ψ", "Ψ", "ω", "Ω", "ώ", "Ώ" ];
const to = [ "ou", "ou", "ou", "ou", "au", "au", "au", "au", "eu", "eu", "eu", "eu", "a", "a", "a", "a", "b", "b", "g", "g", "d", "d", "e", "e", "e", "e", "z", "z", "i", "i", "i", "i", "th", "th", "i", "i", "i", "i", "i", "i", "i", "k", "k", "l", "l", "m", "m", "n", "n", "ks", "ks", "o", "o", "o", "o", "p", "p", "r", "r", "s", "s", "s", "t", "t", "y", "y", "y", "y", "y", "y", "y", "f", "f", "x", "x", "ps", "ps", "o", "o", "o", "o" ];
 for ( var i = 0; i < from.length; i++ ) {
    while( str.indexOf( from[i]) !== -1 ){
        str = str.replace( from[i], to[i] );    // CONVERT GREEK CHARACTERS TO LATIN LETTERS
    }
 }
 str = str.replace(/[^a-z0-9 -]/g, '') // REMOVE INVALID CHARS
          .replace(/\s+/g, '-')        // COLLAPSE WHITESPACE AND REPLACE BY DASH -
          .replace(/-+/g, '-');        // COLLAPSE DASHES
 return str;
}
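
A possible variant of the function above, as a minimal sketch only. It assumes accents can be removed with Unicode normalization (NFD, then dropping combining marks), which makes the accented and uppercase table entries unnecessary. The names greekToSlug, DIGRAPHS and LETTERS are hypothetical and not part of the gist, and the output may differ from string_to_slug on edge cases such as letters with dialytika.

```javascript
// Hypothetical lookup tables: lowercase, unaccented Greek only.
const DIGRAPHS = { "ου": "ou", "αυ": "au", "ευ": "eu" };
const LETTERS = {
  "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z", "η": "i", "θ": "th",
  "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "ks", "ο": "o", "π": "p",
  "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "x", "ψ": "ps", "ω": "o"
};

function greekToSlug(str) {
  return str
    .trim()
    .toLowerCase()
    .normalize("NFD")                            // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "")             // drop the combining marks (tonos, dialytika)
    .replace(/ου|αυ|ευ/g, (m) => DIGRAPHS[m])    // two-letter sequences first
    .replace(/[α-ω]/g, (ch) => LETTERS[ch] || "") // then single letters
    .replace(/[^a-z0-9 -]/g, "")                 // remove invalid chars
    .replace(/\s+/g, "-")                        // whitespace to dash
    .replace(/-+/g, "-");                        // collapse dashes
}

console.log(greekToSlug("Αυτό είναι ένα μήλο")); //=> "auto-einai-ena-milo"
```

Each regular expression scans the string once, so no while loop is needed to replace repeated occurrences.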
@kostasx
Author

kostasx commented Dec 6, 2022

Do you have an example of what you are trying to convert, so I can get an idea? @alexroumi

@alexroumi

I am trying to convert a sentence to Greeklish, e.g. "Αυτό είναι ένα μήλο", for which it returns "auto-einai-ena-milo". I took the function and swapped the from and to arrays, and it returns "αυτο εηναη ενα μηλο", having replaced lines 19, 20, 21 with str = str.replaceAll("-", ' ');

@kostasx
Author

kostasx commented Dec 16, 2022

In general, it is very hard to reverse the string => slug process, since it is one-way by nature.

You could change the conversion of "Η, η, ή, Ή" to "h" instead of "i" and solve the problem for the sentence you gave, but you would also need to check the other conversions thoroughly.
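
One way to run such a check, as a rough sketch: group the Greek letters by the Latin text they produce and list every Latin output that more than one Greek letter shares. The function name findCollisions and the reduced table (lowercase, unaccented letters, with "η" already mapped to "h") are assumptions made for illustration.

```javascript
// Illustrative subset of the table: lowercase, unaccented letters, with "η" mapped to "h".
const greek = ["α","β","γ","δ","ε","ζ","η","θ","ι","κ","λ","μ","ν","ξ","ο","π","ρ","σ","ς","τ","υ","φ","χ","ψ","ω"];
const latin = ["a","b","g","d","e","z","h","th","i","k","l","m","n","ks","o","p","r","s","s","t","y","f","x","ps","o"];

// Returns [latinText, [greekLetters...]] for every Latin output shared by several Greek letters.
function findCollisions(greek, latin) {
  const byLatin = new Map();
  for (let i = 0; i < greek.length; i++) {
    if (!byLatin.has(latin[i])) byLatin.set(latin[i], []);
    byLatin.get(latin[i]).push(greek[i]);
  }
  return [...byLatin].filter(([, letters]) => letters.length > 1);
}

console.log(findCollisions(greek, latin));
// "o" is shared by ο and ω, and "s" by σ and ς
```

This only catches single-entry collisions. Sequences stay ambiguous as well: with "η" mapped to "h", the text "th" can come from "θ" or from "τ" followed by "η".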

Here is code that does the conversion in both directions, but without accents:


function string_to_slug(str) {

 str  = str.replace(/^\s+|\s+$/g, '') // TRIM WHITESPACE AT BOTH ENDS.
          .toLowerCase();            // CONVERT TO LOWERCASE

 const from = [ "ου", "ΟΥ", "Ού", "ού", "αυ", "ΑΥ", "Αύ", "αύ", "ευ", "ΕΥ", "Εύ", "εύ", "α", "Α", "ά", "Ά", "β", "Β", "γ", "Γ", "δ", "Δ", "ε", "Ε", "έ", "Έ", "ζ", "Ζ", "η", "Η", "ή", "Ή", "θ", "Θ", "ι", "Ι", "ί", "Ί", "ϊ", "ΐ", "Ϊ", "κ", "Κ", "λ", "Λ", "μ", "Μ", "ν", "Ν", "ξ", "Ξ", "ο", "Ο", "ό", "Ό", "π", "Π", "ρ", "Ρ", "σ", "Σ", "ς", "τ", "Τ", "υ", "Υ", "ύ", "Ύ", "ϋ", "ΰ", "Ϋ", "φ", "Φ", "χ", "Χ", "ψ", "Ψ", "ω", "Ω", "ώ", "Ώ" ];
 const to   = [ "ou", "ou", "ou", "ou", "au", "au", "au", "au", "eu", "eu", "eu", "eu", "a", "a", "a", "a", "b", "b", "g", "g", "d", "d", "e", "e", "e", "e", "z", "z", "h", "h", "h", "h", "th", "th", "i", "i", "i", "i", "i", "i", "i", "k", "k", "l", "l", "m", "m", "n", "n", "ks", "ks", "o", "o", "o", "o", "p", "p", "r", "r", "s", "s", "s", "t", "t", "y", "y", "y", "y", "y", "y", "y", "f", "f", "x", "x", "ps", "ps", "o", "o", "o", "o" ];

 for ( var i = 0; i < from.length; i++ ) {

    while( str.indexOf( from[i]) !== -1 ){

        str = str.replace( from[i], to[i] );    // CONVERT GREEK CHARACTERS TO LATIN LETTERS

    }
 
 }

 str = str.replace(/[^a-z0-9 -]/g, '') // REMOVE INVALID CHARS
         .replace(/\s+/g, '-')        // COLLAPSE WHITESPACE AND REPLACE BY DASH - 
         .replace(/-+/g, '-');        // COLLAPSE DASHES

 return str;

}
function slug_to_string(str) {
 str  = str.replace(/^\s+|\s+$/g, '') // TRIM WHITESPACE AT BOTH ENDS.
          .toLowerCase();            // CONVERT TO LOWERCASE
 const to = [ "ου", "ΟΥ", "Ού", "ού", "αυ", "ΑΥ", "Αύ", "αύ", "ευ", "ΕΥ", "Εύ", "εύ", "α", "Α", "ά", "Ά", "β", "Β", "γ", "Γ", "δ", "Δ", "ε", "Ε", "έ", "Έ", "ζ", "Ζ", "η", "Η", "ή", "Ή", "θ", "Θ", "ι", "Ι", "ί", "Ί", "ϊ", "ΐ", "Ϊ", "κ", "Κ", "λ", "Λ", "μ", "Μ", "ν", "Ν", "ξ", "Ξ", "ο", "Ο", "ό", "Ό", "π", "Π", "ρ", "Ρ", "σ", "Σ", "ς", "τ", "Τ", "υ", "Υ", "ύ", "Ύ", "ϋ", "ΰ", "Ϋ", "φ", "Φ", "χ", "Χ", "ψ", "Ψ", "ω", "Ω", "ώ", "Ώ" ];
 const from   = [ "ou", "ou", "ou", "ou", "au", "au", "au", "au", "eu", "eu", "eu", "eu", "a", "a", "a", "a", "b", "b", "g", "g", "d", "d", "e", "e", "e", "e", "z", "z", "h", "h", "h", "h", "th", "th", "i", "i", "i", "i", "i", "i", "i", "k", "k", "l", "l", "m", "m", "n", "n", "ks", "ks", "o", "o", "o", "o", "p", "p", "r", "r", "s", "s", "s", "t", "t", "y", "y", "y", "y", "y", "y", "y", "f", "f", "x", "x", "ps", "ps", "o", "o", "o", "o" ];

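 // NOTE: ONLY THE FIRST GREEK ENTRY FOR EACH LATIN SEQUENCE IS EVER USED (E.G. "o" ALWAYS BECOMES "ο"),
 // AND "th", "ks", "ps" NEVER MATCH BECAUSE "h", "k", "p" ARE REPLACED EARLIER IN THE LIST.
 for ( var i = 0; i < from.length; i++ ) {
    while( str.indexOf( from[i]) !== -1 ){
        str = str.replace( from[i], to[i] );    // CONVERT LATIN LETTERS TO GREEK CHARACTERS
    }
 }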
 return str;
}

const slug = string_to_slug("Αυτό είναι ένα μήλο");
console.log( slug ); //=> "auto-einai-ena-mhlo"
const string = slug_to_string( slug );
console.log( string ); //=> "αυτο-ειναι-ενα-μηλο"

If you want to reverse the conversion accurately (e.g. including the accents), you will probably need to use a dictionary and do spelling comparison and correction, but that takes a fair amount of work and will certainly not be as fast as a simple search and replace like the algorithm above.
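
A minimal sketch of the dictionary idea, assuming the only thing lost is the accent: index every dictionary word by its unaccented form, then look each word of the reversed string up in that index. The helpers toKey, buildIndex and restoreAccents and the five-word list are hypothetical; a real word list would be loaded from a file.

```javascript
// Comparison key: lowercase, no tonos/dialytika, final sigma folded into "σ".
function toKey(word) {
  return word
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .normalize("NFC")
    .replace(/ς/g, "σ");
}

// Map each key to every dictionary word that shares it.
function buildIndex(words) {
  const index = new Map();
  for (const word of words) {
    const key = toKey(word);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(word);
  }
  return index;
}

// Replace each word with the first dictionary match, or keep it unchanged if there is none.
function restoreAccents(text, index) {
  return text
    .split(/[\s-]+/)
    .map((word) => (index.get(toKey(word)) || [word])[0])
    .join(" ");
}

// Tiny illustrative word list.
const index = buildIndex(["αυτό", "είναι", "ένα", "μήλο", "μιλώ"]);
console.log(restoreAccents("αυτο-ειναι-ενα-μηλο", index)); //=> "αυτό είναι ένα μήλο"
```

When several dictionary words share a key, this sketch simply takes the first one, so a real implementation would need some way to choose between them.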

@alexroumi

@kostasx
Author

kostasx commented Dec 16, 2022

You can find some lists of Greek words here:

https://raw.githubusercontent.com/kalpetros/greek-dictionary/main/files/el.txt
https://raw.githubusercontent.com/huertatipografica/greekguide/master/greek-dictionary.txt
https://raw.githubusercontent.com/titoBouzout/Dictionaries/master/Greek.dic

You could use string comparison algorithms (e.g. the Levenshtein distance algorithm) to compare the result of the reverse process against the dictionary and make the correction. For example, if you scan the dictionary you will find that the word "μηλο" is at a distance of 1 from "μήλο" (according to the algorithm, since only one letter has changed, the "η"), so you would replace "μηλο" with the dictionary word "μήλο". You would also need to take the individual characters into account, since the algorithm may find other candidate words as well, e.g. "μηλο" => "μιλώ" (distance of 2) and so on, so from that point on the reversal gets complicated.
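
The comparison described above could be sketched as follows. This is the standard dynamic-programming Levenshtein distance with unit costs; the helper closestWords, its maxDistance parameter and the three-word list are assumptions made for illustration.

```javascript
// Levenshtein distance: insertions, deletions and substitutions each cost 1.
function levenshtein(a, b) {
  const s = [...a.normalize("NFC")]; // compare precomposed characters, so "ή" counts as one letter
  const t = [...b.normalize("NFC")];
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[t.length];
}

// Rank dictionary words by distance and drop anything further away than maxDistance.
function closestWords(word, dictionary, maxDistance = 2) {
  return dictionary
    .map((candidate) => ({ candidate, distance: levenshtein(word, candidate) }))
    .filter((entry) => entry.distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance);
}

console.log(closestWords("μηλο", ["μιλώ", "μήλο", "καλημέρα"]));
// "μήλο" comes first (distance 1), then "μιλώ" (distance 2); "καλημέρα" is filtered out
```

Scanning a whole dictionary this way costs one distance computation per dictionary word for every input word, which is where the extra time mentioned above comes from.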

In any case, you need to consider factors such as how much you need this algorithm and the context it will run in, to decide whether it is worth studying and implementing such a solution, and what it would cost (in time and CPU cycles).
