Vikash Singh vi3k6i5
@vi3k6i5
vi3k6i5 / regex_re_with_special_chars.py
Created Dec 12, 2017
trie regex with special characters
View regex_re_with_special_chars.py
import re
class Trie():
    """Regex::Trie in Python. Creates a Trie out of a list of words. The trie can be exported to a Regex pattern.
    The corresponding Regex should match much faster than a simple Regex union."""
    def __init__(self):
        self.data = {}
    def add(self, word):
View gist:4ea37490cddf6d8b4a1daf13f6e51457
import re
class Trie():
    """Regex::Trie in Python. Creates a Trie out of a list of words. The trie can be exported to a Regex pattern.
    The corresponding Regex should match much faster than a simple Regex union."""
    def __init__(self):
        self.data = {}
    def add(self, word):
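The preview above is cut off before the body of `add`, so the gist's actual implementation is not visible here. As a rough illustration of the idea in the docstring (a trie exported to a regex, with special characters escaped), here is a minimal sketch. The class name `TrieRegexSketch` and its methods are hypothetical and are not taken from the gist.

```python
import re


class TrieRegexSketch:
    """Hypothetical helper for illustration; not the gist's Trie class."""

    def __init__(self):
        self.root = {}

    def add(self, word):
        # Walk/create one nested dict per character.
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # the empty key marks the end of a word

    def _pattern(self, node):
        ends_here = "" in node
        # One alternative per child character; re.escape handles '+', '.', etc.
        branches = [
            re.escape(ch) + self._pattern(child)
            for ch, child in sorted(node.items())
            if ch != ""
        ]
        if not branches:
            return ""
        if len(branches) == 1 and not ends_here:
            return branches[0]
        # A word may stop at this node, so the remaining suffix is optional.
        return "(?:" + "|".join(branches) + ")" + ("?" if ends_here else "")

    def pattern(self):
        return self._pattern(self.root)


words = ["foo", "foobar", "fo+o", "C++"]
trie = TrieRegexSketch()
for w in words:
    trie.add(w)

trie_re = re.compile(trie.pattern())
# The "simple Regex union" the docstring compares against.
union_re = re.compile("|".join(map(re.escape, words)))
```

Both patterns accept the same set of words. The trie version shares common prefixes, so the engine does not have to retry each alternative from the start of the word.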
@vi3k6i5
vi3k6i5 / flashtext_vs_cython_automaton_benchmark.py
Created Nov 14, 2017
Comparing flashtext with a cython implementation of similar algo
View flashtext_vs_cython_automaton_benchmark.py
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import re
from automaton import Automaton
import time
def get_word_of_length(str_length):
    # generate a random word of given length
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_keyword_extraction_regex_module.py
Created Oct 25, 2017
Benchmarking timing performance Keyword Extraction between regex (regex module) and flashtext
View flashtext_regex_timing_keyword_extraction_regex_module.py
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import regex
import time
def get_word_of_length(str_length):
    # generate a random word of given length
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(str_length))
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_keyword_extraction.java
Created Oct 25, 2017
Benchmarking timing performance Keyword Extraction using regex in java
View flashtext_regex_timing_keyword_extraction.java
// compare the results with FlashText here https://gist.github.com/vi3k6i5/604eefd92866d081cfa19f862224e4a0
import java.util.regex.*;
import java.lang.StringBuilder;
import java.util.*;
public class RegexBenchmark {
    public static String getWordOfLength(int length) {
        String SALTCHARS = "abcdefghijklmnopqrstuvwxyz1234567890";
        StringBuilder salt = new StringBuilder();
@vi3k6i5
vi3k6i5 / guided_lda_example.py
Created Oct 7, 2017
guidedlda example code
View guided_lda_example.py
import numpy as np
import guidedlda
X = guidedlda.datasets.load_data(guidedlda.datasets.NYT)
vocab = guidedlda.datasets.load_vocab(guidedlda.datasets.NYT)
word2id = dict((v, idx) for idx, v in enumerate(vocab))
print(X.shape)
print(X.sum())
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_keyword_replace.py
Last active Jul 8, 2021
Benchmarking timing performance Keyword Replace between regex and flashtext
View flashtext_regex_timing_keyword_replace.py
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import re
import time
def get_word_of_length(str_length):
    # generate a random word of given length
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_keyword_extraction.py
Last active Mar 7, 2021
Benchmarking timing performance Keyword Extraction between regex and flashtext
View flashtext_regex_timing_keyword_extraction.py
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import re
import time
def get_word_of_length(str_length):
    # generate a random word of given length
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(str_length))
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_find_and_replace.ipynb
Created Oct 3, 2017
Find and replace FlashText and regex comparison
View flashtext_regex_timing_find_and_replace.ipynb
@vi3k6i5
vi3k6i5 / comparison.md
Last active Sep 16, 2017
Comparison results for FlashText vs Regex
View comparison.md
Text Length 319065 Keywords Count 47326
FlashText 156 ms per loop
Compiled Regex 19.5 s per loop
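The two timings above imply a gap of about 125x (19.5 s / 156 ms). The script that produced them is not shown in this listing, so the following is only a small sketch of how such a comparison might be set up. The corpus sizes and the seed are invented for illustration, and the FlashText part runs only if the package is installed. `get_word_of_length` mirrors the helper in the gist previews; everything else is an assumption.

```python
import random
import re
import string
import time


def get_word_of_length(str_length):
    # generate a random word of given length
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(str_length))


random.seed(0)  # assumed seed, only to make the sketch repeatable
all_words = [get_word_of_length(random.choice([3, 4, 5, 6, 7, 8])) for _ in range(2000)]
story = ' '.join(random.sample(all_words, 500))
keywords = sorted(set(random.sample(all_words, 100)))

# Plain compiled union of escaped keywords, bounded by word boundaries.
compiled_re = re.compile(r'\b(?:' + '|'.join(map(re.escape, keywords)) + r')\b')

start = time.perf_counter()
regex_hits = compiled_re.findall(story)
regex_seconds = time.perf_counter() - start
print('Compiled Regex: %.6f s' % regex_seconds)

try:
    from flashtext.keyword import KeywordProcessor
except ImportError:
    KeywordProcessor = None  # FlashText not installed; skip that half

if KeywordProcessor is not None:
    keyword_processor = KeywordProcessor()
    for keyword in keywords:
        keyword_processor.add_keyword(keyword)
    start = time.perf_counter()
    flashtext_hits = keyword_processor.extract_keywords(story)
    flashtext_seconds = time.perf_counter() - start
    print('FlashText: %.6f s' % flashtext_seconds)
```

At this toy size the difference will be small and noisy. The figures reported above were measured on a text of 319065 characters with 47326 keywords.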