Vikash Singh vi3k6i5

👨‍💻
Learning...
View GitHub Profile
@vi3k6i5
vi3k6i5 / regex_re_with_special_chars.py
Created December 12, 2017 16:23
Trie regex with special characters
import re
class Trie():
    """Regex::Trie in Python. Creates a Trie out of a list of words. The trie can be exported to a Regex pattern.
    The corresponding Regex should match much faster than a simple Regex union."""
    def __init__(self):
        self.data = {}
    def add(self, word):
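The preview stops at add(). Below is a minimal sketch of how a trie of words can be exported as one regex. It assumes that re.escape is what handles the "special characters" in the title. RegexTrie is a hypothetical name, and this is not the gist's own code. Shared prefixes are factored out, so "node" and "node.js" become node(?:\.js)? rather than two separate alternatives.

```python
import re


class RegexTrie:
    """Hypothetical sketch: a character trie exported as a single regex.

    Not the gist's code. Every character goes through re.escape, so words
    containing regex metacharacters such as '+', '.', or '#' match literally.
    """

    def __init__(self):
        self.data = {}

    def add(self, word):
        node = self.data
        for char in word:
            node = node.setdefault(char, {})
        node[""] = True  # marks "a word may end here"

    def _pattern(self, node):
        # One branch per child character, followed by that subtree's pattern.
        branches = [re.escape(char) + self._pattern(node[char])
                    for char in sorted(node) if char != ""]
        if not branches:
            return ""
        if len(branches) == 1 and "" not in node:
            return branches[0]
        group = "(?:" + "|".join(branches) + ")"
        # If a word can also end here, the rest is optional. Greedy '?'
        # prefers the longer word.
        return group + "?" if "" in node else group

    def pattern(self):
        return self._pattern(self.data)


trie = RegexTrie()
for word in ["c++", "c#", "node", "node.js"]:
    trie.add(word)

# \b misbehaves next to non-word characters like '+', so use lookarounds instead.
regex = re.compile(r"(?<!\w)(?:" + trie.pattern() + r")(?!\w)")
print(regex.findall("I use c++ and node.js, not c or nodejs."))  # ['c++', 'node.js']
```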
vi3k6i5 / gist:4ea37490cddf6d8b4a1daf13f6e51457
Created December 12, 2017 16:04
Compare FlashText to a Trie-based regex (re module)
import re
class Trie():
    """Regex::Trie in Python. Creates a Trie out of a list of words. The trie can be exported to a Regex pattern.
    The corresponding Regex should match much faster than a simple Regex union."""
    def __init__(self):
        self.data = {}
    def add(self, word):
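The preview shows only the Trie-regex half of this comparison. The FlashText half uses no regex. It scans the text once while walking a character trie, and a keyword counts only if it starts and ends on a word boundary. The simplified pure-Python sketch below illustrates that idea. build_trie and extract_keywords are hypothetical names, not the flashtext API, and treating letters, digits, and underscore as word characters is an assumption.

```python
import string

# Assumption: characters that count as part of a word.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")
END = "_end_"  # multi-character key, so it never clashes with a single text character


def build_trie(keywords):
    root = {}
    for keyword in keywords:
        node = root
        for char in keyword:
            node = node.setdefault(char, {})
        node[END] = keyword
    return root


def extract_keywords(text, root):
    """Return keywords found in text, preferring the longest match at each start."""
    found = []
    i, n = 0, len(text)
    while i < n:
        # Only try to start a match at the beginning of a word.
        if i > 0 and text[i - 1] in WORD_CHARS:
            i += 1
            continue
        node, j, best = root, i, None
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            # Accept a keyword only if it also ends on a word boundary.
            if END in node and (j == n or text[j] not in WORD_CHARS):
                best = (node[END], j)
        if best:
            found.append(best[0])
            i = best[1]  # resume scanning after the matched keyword
        else:
            i += 1
    return found


trie = build_trie(["new york", "new york city", "york", "python"])
print(extract_keywords("I love new york city and python3.", trie))  # ['new york city']
```

The inner walk is bounded by the longest keyword, not by how many keywords the trie holds. That is the property the benchmarks on this page compare against a regex union.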
vi3k6i5 / flashtext_vs_cython_automaton_benchmark.py
Created November 14, 2017 16:35
Comparing FlashText with a Cython implementation of a similar algorithm
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import re
from automaton import Automaton
import time
def get_word_of_length(str_length):
    # generate a random word of given length
vi3k6i5 / flashtext_regex_timing_keyword_extraction_regex_module.py
Created October 25, 2017 16:23
Benchmarking keyword extraction timing between regex (the regex module) and FlashText
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import regex
import time
def get_word_of_length(str_length):
    # generate a random word of given length
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(str_length))
vi3k6i5 / flashtext_regex_timing_keyword_extraction.java
Created October 25, 2017 15:49
Benchmarking keyword extraction timing using regex in Java
// compare the results with FlashText here https://gist.github.com/vi3k6i5/604eefd92866d081cfa19f862224e4a0
import java.util.regex.*;
import java.lang.StringBuilder;
import java.util.*;
public class RegexBenchmark {
    public static String getWordOfLength(int length) {
        String SALTCHARS = "abcdefghijklmnopqrstuvwxyz1234567890";
        StringBuilder salt = new StringBuilder();
vi3k6i5 / guided_lda_example.py
Created October 7, 2017 07:57
GuidedLDA example code
import numpy as np
import guidedlda
X = guidedlda.datasets.load_data(guidedlda.datasets.NYT)
vocab = guidedlda.datasets.load_vocab(guidedlda.datasets.NYT)
word2id = dict((v, idx) for idx, v in enumerate(vocab))
print(X.shape)
print(X.sum())
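The word2id dictionary maps each vocabulary word to its column index in X. In guided LDA, this mapping turns hand-picked seed words into a dictionary from word id to topic id. The guidedlda README passes a dictionary of this shape to GuidedLDA.fit through a seed_topics argument. That argument is an assumption here because the preview does not show it. The sketch uses a hypothetical toy vocabulary instead of the NYT dataset, so it runs without guidedlda installed. build_seed_topics is also hypothetical.

```python
# Hypothetical toy vocabulary standing in for guidedlda.datasets.load_vocab(...)
vocab = ("game", "team", "win", "market", "stock", "bank", "music", "film")
word2id = dict((v, idx) for idx, v in enumerate(vocab))

# Hypothetical seed words: one list per topic to anchor.
seed_topic_list = [
    ["game", "team", "win"],
    ["market", "stock", "bank", "bond"],  # "bond" is not in vocab
]


def build_seed_topics(seed_topic_list, word2id):
    """Map each seed word's id to the index of the topic it should anchor."""
    seed_topics = {}
    for topic_id, words in enumerate(seed_topic_list):
        for word in words:
            if word in word2id:  # skip seed words missing from the vocabulary
                seed_topics[word2id[word]] = topic_id
    return seed_topics


seed_topics = build_seed_topics(seed_topic_list, word2id)
print(seed_topics)  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```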
vi3k6i5 / flashtext_regex_timing_keyword_replace.py
Last active May 28, 2023 19:54
Benchmarking keyword replacement timing between regex and FlashText
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import re
import time
def get_word_of_length(str_length):
    # generate a random word of given length
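The preview stops before the regex side of the replace benchmark, so that code is not visible. One common regex approach is assumed here. It builds a single alternation of escaped keywords, ordered longest first so that "new york city" wins over "new york". re.sub then calls a function that looks up the replacement for each match. build_replacer is a hypothetical helper.

```python
import re


def build_replacer(mapping):
    """Hypothetical helper: return a function that replaces whole-word keywords."""
    # Longest keywords first: regex alternation takes the first branch that
    # matches, not the longest one.
    keywords = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b")

    def replace(text):
        # The callback receives each match object and returns its replacement.
        return pattern.sub(lambda m: mapping[m.group(0)], text)

    return replace


replace = build_replacer({"new york": "NYC", "new york city": "NYC", "py": "Python"})
print(replace("new york city loves py, not pyc"))  # NYC loves Python, not pyc
```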
vi3k6i5 / flashtext_regex_timing_keyword_extraction.py
Last active May 28, 2023 21:05
Benchmarking keyword extraction timing between regex and FlashText
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import re
import time
def get_word_of_length(str_length):
    # generate a random word of given length
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(str_length))
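The harness these benchmarks share can be sketched with the standard library alone. It generates random keywords with the helper above, builds a text with some of those keywords planted in it, and times the extraction call. Only the regex side is shown, and the same timer could wrap the FlashText call. time_regex_extraction and its word lengths, text size, and planting rate are illustrative assumptions, not the gist's settings.

```python
import random
import re
import string
import time


def get_word_of_length(str_length):
    # generate a random word of given length (same body as the helper above)
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(str_length))


def time_regex_extraction(keyword_count, text_words=5000, seed=0):
    """Return (number of matches, seconds spent in findall) for a union regex."""
    random.seed(seed)
    keywords = sorted({get_word_of_length(random.randint(3, 8)) for _ in range(keyword_count)})
    words = [get_word_of_length(random.randint(3, 8)) for _ in range(text_words)]
    for i in range(0, text_words, 10):  # plant a known keyword in every 10th slot
        words[i] = random.choice(keywords)
    text = " ".join(words)

    # Compilation is kept outside the timed region.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b")
    start = time.perf_counter()
    matches = pattern.findall(text)
    return len(matches), time.perf_counter() - start


for count in (100, 1000, 2000):
    found, seconds = time_regex_extraction(count)
    print(f"{count:>5} keywords: {found} matches in {seconds:.4f} s")
```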
vi3k6i5 / flashtext_regex_timing_find_and_replace.ipynb
Created October 3, 2017 07:36
Find-and-replace comparison of FlashText and regex
vi3k6i5 / comparison.md
Last active September 16, 2017 11:35
Comparison results for FlashText vs. regex

Text length: 319065
Keyword count: 47326

Method          Time per loop
FlashText       156 ms
Compiled regex  19.5 s
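Based on the rounded timings above, the compiled regex took 125 times as long as FlashText on this input:

```latex
\frac{19.5\,\mathrm{s}}{156\,\mathrm{ms}} = \frac{19500\,\mathrm{ms}}{156\,\mathrm{ms}} = 125
```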