Vikash Singh vi3k6i5

@vi3k6i5
vi3k6i5 / regex_re_with_special_chars.py
Created Dec 12, 2017
trie regex with special characters
View regex_re_with_special_chars.py
import re


class Trie():
    """Regex::Trie in Python. Creates a Trie out of a list of words. The trie can be exported to a Regex pattern.
    The corresponding Regex should match much faster than a simple Regex union."""

    def __init__(self):
        self.data = {}

    def add(self, word):
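The preview above is cut off at `add`, so the rest of the gist is not visible here. As a rough illustration of the idea in the docstring, the following is a minimal sketch (not the author's code) of a trie that is exported to a regex pattern, with `re.escape` applied to every character so that keywords such as `c++` or `a.b` are treated literally. The method names `_pattern` and `pattern`, the end-of-word marker, and the lookaround boundaries are assumptions made for this example.

```python
import re


class Trie:
    """Illustrative trie that can be exported as a regex pattern."""

    def __init__(self):
        self.data = {}

    def add(self, word):
        node = self.data
        for char in word:
            node = node.setdefault(char, {})
        node[""] = True  # assumed marker: the empty key flags the end of a word

    def _pattern(self, node):
        # A node that only holds the end marker adds nothing to the pattern.
        if "" in node and len(node) == 1:
            return ""
        branches = [
            re.escape(char) + self._pattern(node[char])
            for char in sorted(key for key in node if key != "")
        ]
        # A single mandatory branch needs no group around it.
        if len(branches) == 1 and "" not in node:
            return branches[0]
        group = "(?:" + "|".join(branches) + ")"
        # If a word also ends at this node, everything after it is optional.
        return group + "?" if "" in node else group

    def pattern(self):
        return self._pattern(self.data)


trie = Trie()
for word in ["foo", "foobar", "c++", "a.b"]:
    trie.add(word)

# \b does not behave as a boundary next to non-word characters such as "+",
# so this sketch uses lookarounds instead.
matcher = re.compile(r"(?<!\w)" + trie.pattern() + r"(?!\w)")
found = matcher.findall("I like c++ and foobar, not a.b or axb or foobars")
print(trie.pattern())
print(found)
```

Shared prefixes are stored once (`foo` and `foobar` become `foo(?:bar)?`), which is what lets the regex engine avoid retrying every keyword from scratch at each position.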
@vi3k6i5
vi3k6i5 / flashtext_vs_cython_automaton_benchmark.py
Created Nov 14, 2017
Comparing flashtext with a cython implementation of similar algo
View flashtext_vs_cython_automaton_benchmark.py
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import re
from automaton import Automaton
import time
def get_word_of_length(str_length):
    # generate a random word of given length
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_keyword_extraction_regex_module.py
Created Oct 25, 2017
Benchmarking timing performance Keyword Extraction between regex (regex module) and flashtext
View flashtext_regex_timing_keyword_extraction_regex_module.py
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import regex
import time
def get_word_of_length(str_length):
    # generate a random word of given length
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(str_length))
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_keyword_extraction.java
Created Oct 25, 2017
Benchmarking timing performance Keyword Extraction using regex in java
View flashtext_regex_timing_keyword_extraction.java
// compare the results with FlashText here https://gist.github.com/vi3k6i5/604eefd92866d081cfa19f862224e4a0
import java.util.regex.*;
import java.lang.StringBuilder;
import java.util.*;

public class RegexBenchmark {
    public static String getWordOfLength(int length) {
        String SALTCHARS = "abcdefghijklmnopqrstuvwxyz1234567890";
        StringBuilder salt = new StringBuilder();
@vi3k6i5
vi3k6i5 / guided_lda_example.py
Created Oct 7, 2017
guidedlda example code
View guided_lda_example.py
import numpy as np
import guidedlda
X = guidedlda.datasets.load_data(guidedlda.datasets.NYT)
vocab = guidedlda.datasets.load_vocab(guidedlda.datasets.NYT)
word2id = dict((v, idx) for idx, v in enumerate(vocab))
print(X.shape)
print(X.sum())
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_keyword_replace.py
Last active Mar 18, 2018
Benchmarking timing performance Keyword Replace between regex and flashtext
View flashtext_regex_timing_keyword_replace.py
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import re
import time
def get_word_of_length(str_length):
    # generate a random word of given length
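The preview stops before the benchmark body. As a hedged sketch of what the regex side of a keyword-replace timing run could look like, the block below uses only the standard library. The corpus sizes, the `_keyword_` replacement string, the fixed seed, and the helper `build_replacer` are assumptions for illustration and are not taken from the gist; the FlashText side is left out so the block runs without third-party packages.

```python
import random
import re
import string
import time


def get_word_of_length(str_length):
    # generate a random lowercase word of the given length
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(str_length))


def build_replacer(replacements):
    """Compile one alternation for all keywords and return a replace function."""
    # Longest keywords first, so a longer keyword is tried before its own prefix.
    ordered = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in ordered) + r')\b')
    return lambda text: pattern.sub(lambda m: replacements[m.group(0)], text)


random.seed(0)  # assumed fixed seed, so repeated runs use the same corpus
all_words = [get_word_of_length(random.choice([3, 4, 5, 6, 7, 8])) for _ in range(10000)]
story = ' '.join(random.sample(all_words, 2000))
keywords = random.sample(all_words, 500)
replacements = {keyword: '_keyword_' for keyword in keywords}

replace = build_replacer(replacements)
start = time.perf_counter()
replaced = replace(story)
elapsed = time.perf_counter() - start
print(f'{len(replacements)} keywords, {len(story)} chars: {elapsed:.4f} s')
```

Timings from a run like this depend heavily on the number of keywords, since the alternation grows with every keyword added.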
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_keyword_extraction.py
Last active Dec 12, 2017
Benchmarking timing performance Keyword Extraction between regex and flashtext
View flashtext_regex_timing_keyword_extraction.py
#!/bin/python
from flashtext.keyword import KeywordProcessor
import random
import string
import re
import time
def get_word_of_length(str_length):
    # generate a random word of given length
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(str_length))
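Only the word generator is visible in this preview. The sketch below shows one way a keyword-extraction comparison along these lines might be set up. It is not the gist's code: the corpus sizes, the seed, and the helper `time_call` are assumptions. The FlashText part runs only if the `flashtext` package is installed, and uses its `KeywordProcessor.add_keywords_from_list` and `extract_keywords` methods.

```python
import random
import re
import string
import time

try:
    # Optional third-party dependency (pip install flashtext).
    from flashtext.keyword import KeywordProcessor
except ImportError:
    KeywordProcessor = None


def get_word_of_length(str_length):
    # generate a random lowercase word of the given length
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(str_length))


def time_call(func, *args):
    """Hypothetical helper: run func once and return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


random.seed(0)  # assumed fixed seed for repeatable runs
all_words = [get_word_of_length(random.choice([3, 4, 5, 6, 7, 8])) for _ in range(10000)]
story = ' '.join(random.sample(all_words, 2000))
keywords = sorted(set(random.sample(all_words, 500)))

# Regex side: one compiled alternation with word boundaries.
compiled = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keywords) + r')\b')
regex_hits, regex_time = time_call(compiled.findall, story)
print(f'regex:     {len(regex_hits)} hits in {regex_time:.4f} s')

# FlashText side: skipped when the package is not available.
if KeywordProcessor is not None:
    processor = KeywordProcessor()
    processor.add_keywords_from_list(keywords)
    flash_hits, flash_time = time_call(processor.extract_keywords, story)
    print(f'flashtext: {len(flash_hits)} hits in {flash_time:.4f} s')
```

A single timed call is noisy; repeating the measurement over several keyword counts gives a clearer picture of how each approach scales.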
@vi3k6i5
vi3k6i5 / flashtext_regex_timing_find_and_replace.ipynb
Created Oct 3, 2017
Find and replace FlashText and regex comparison
View flashtext_regex_timing_find_and_replace.ipynb
@vi3k6i5
vi3k6i5 / comparison.md
Last active Sep 16, 2017
Comparison results for FlashText vs Regex
View comparison.md
Text Length 319065 Keywords Count 47326
FlashText 156 ms per loop
Compiled Regex 19.5 s per loop
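To put the two timings above on the same scale, the ratio can be worked out directly. This is plain arithmetic on the reported numbers, not a new measurement:

```python
flashtext_seconds = 156e-3  # 156 ms per loop, as reported above
regex_seconds = 19.5        # 19.5 s per loop, as reported above

speedup = regex_seconds / flashtext_seconds
print(f'Compiled regex took about {speedup:.0f}x as long as FlashText on this input')
```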