Soonmok Kwon ep1804

  • Mesh Korea
  • Korea
amir-rahnama / create-ngrams.R
Last active February 21, 2019 18:54
Create n-grams for large text files (very fast)
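# clean(), decode() and text_to_ngrams() are defined in fast-ngrams.R
source("fast-ngrams.R")
# Read the file line by line, marking the strings as UTF-8
con <- file("path_to_file", "r")
data <- readLines(con, encoding = "UTF-8")
close(con)
data <- clean(data)
# Decode once and reuse the result for every n
decoded <- decode(data)
onegram <- text_to_ngrams(decoded, 1)
bigram <- text_to_ngrams(decoded, 2)
trigram <- text_to_ngrams(decoded, 3)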
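The helpers used above come from fast-ngrams.R, whose contents are not part of this listing, so the exact behaviour of text_to_ngrams() is not shown. As a rough illustration of what word-level n-gram counting computes, here is a minimal Java sketch. It assumes lowercase, whitespace-separated tokens; the class NGrams and the methods tokenize and countNGrams are hypothetical names, not part of the gist.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class NGrams {
    /** Hypothetical tokenizer: lowercases the text and splits it on runs of whitespace. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // skip the single empty token produced by blank input
            }
        }
        return tokens;
    }

    /** Counts every contiguous run of n tokens, joined by single spaces, in first-seen order. */
    static Map<String, Integer> countNGrams(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("the cat sat on the cat mat");
        System.out.println(countNGrams(tokens, 1)); // {the=2, cat=2, sat=1, on=1, mat=1}
        System.out.println(countNGrams(tokens, 2)); // {the cat=2, cat sat=1, sat on=1, on the=1, cat mat=1}
    }
}
```

Keying the map on space-joined strings keeps the sketch short; for very large corpora a more compact representation (for example, hashed or integer-encoded n-grams) would likely use less memory.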
rponte / StringUtils.java
Last active September 12, 2024 16:18
Removing accents and special characters in Java: StringUtils.java and StringUtilsTest.java
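A common way to implement this kind of helper is Unicode canonical decomposition (NFD) with java.text.Normalizer, followed by deleting the combining marks that carry the accents. The sketch below illustrates that technique; it is not necessarily identical to the gist's own unaccent body, and the class Unaccent, the helper removeSpecialChars and its "ASCII letters, digits and spaces only" policy are hypothetical.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class Unaccent {
    // Unicode combining marks (general category M), e.g. U+0301 COMBINING ACUTE ACCENT
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");
    // Hypothetical "special character" policy: anything but ASCII letters, digits and spaces
    private static final Pattern NON_ALNUM = Pattern.compile("[^A-Za-z0-9 ]");

    /** Splits accented letters into base letter + combining mark (NFD), then drops the marks. */
    static String unaccent(String src) {
        String decomposed = Normalizer.normalize(src, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    /** Hypothetical helper: unaccents, then deletes every remaining non-alphanumeric character. */
    static String removeSpecialChars(String src) {
        return NON_ALNUM.matcher(unaccent(src)).replaceAll("");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source ASCII-only: "Acentuação é útil"
        System.out.println(unaccent("Acentua\u00e7\u00e3o \u00e9 \u00fatil")); // Acentuacao e util
        // "Olá, mundo! (ç)"
        System.out.println(removeSpecialChars("Ol\u00e1, mundo! (\u00e7)"));   // Ola mundo c
    }
}
```

Letters that have no canonical decomposition, such as ø or ß, pass through unaccent unchanged, and removeSpecialChars would then delete them entirely.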
package br.com.triadworks.rponte.util;
import java.text.Normalizer;
public class StringUtils {
/**
* Removes all accents from the string, replacing each accented character with its plain, unaccented equivalent.
*/
public static String unaccent(String src) {