Data Scientist JayKimBravekjh

  • LA, USA
@JayKimBravekjh
JayKimBravekjh / KJV_Spacy_.idea_KJV_Spacy.iml
Created November 16, 2017 08:24 — forked from denjn5/KJV_Spacy_.idea_KJV_Spacy.iml
Topic Modeling with Spacy and Gensim
<?xml version="1.0" encoding="UTF-8"?>
<module type="PYTHON_MODULE" version="4">
  <component name="NewModuleRootManager">
    <content url="file://$MODULE_DIR$" />
    <orderEntry type="jdk" jdkName="Python 3.5.2 (~/anaconda/bin/python)" jdkType="Python SDK" />
    <orderEntry type="sourceFolder" forTests="false" />
  </component>
  <component name="TestRunnerService">
    <option name="PROJECT_TEST_RUNNER" value="Unittests" />
  </component>
</module>
@JayKimBravekjh
JayKimBravekjh / build.sh
Created December 25, 2017 05:45
nltk-with-data conda recipe
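#!/bin/bash
# Install nltk into the conda build prefix
python setup.py install --single-version-externally-managed --record=record.txt
# Download all NLTK data packages into $PREFIX/nltk_data
python -m nltk.downloader -d "$PREFIX/nltk_data" all
# Remove the original zip files; without `shopt -s globstar`, ** behaves like *,
# so this matches zips exactly one directory below nltk_data (e.g. corpora/*.zip)
rm "$PREFIX"/nltk_data/**/*.zip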
@JayKimBravekjh
JayKimBravekjh / Tensorflow_Build_GPU.md
Created December 30, 2017 08:27 — forked from smitshilu/Tensorflow_Build_GPU.md
Tensorflow 1.4 Mac OS High Sierra 10.13 GPU Support

TensorFlow

System information

  • OS - macOS High Sierra 10.13
  • TensorFlow - 1.4
  • Xcode command line tools - 8.2 (download from the Apple Developer site, Xcode - Support page; switch to that clang version with sudo xcode-select --switch /Library/Developer/CommandLineTools; check the version with clang -v)
  • CMake - 3.7
  • Bazel - 0.7.0
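Before building, it can help to confirm that the installed tools match the versions listed above. The following is a minimal sketch, not part of the original gist: it assumes CMake and Bazel are on PATH and that `cmake --version` and `bazel version` print a dotted version number somewhere in their output. It skips the Xcode command line tools, because `clang -v` reports the Apple clang/LLVM compiler version rather than the 8.2 tools number.

```python
import re
import shutil
import subprocess

# Versions from the list above (the tested setup, not necessarily hard minimums).
EXPECTED = {"cmake": (3, 7), "bazel": (0, 7, 0)}

# How each tool reports its version; for Bazel, "version" is a subcommand.
VERSION_ARGS = {"cmake": ["--version"], "bazel": ["version"]}


def parse_version(text):
    """Return the first dotted number in text (e.g. '3.7.2') as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def installed_version(tool, args):
    """Run the tool with args and parse its version; None if it is not on PATH."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run([tool] + args, capture_output=True, text=True)
    return parse_version(result.stdout + result.stderr)


if __name__ == "__main__":
    for tool, expected in EXPECTED.items():
        found = installed_version(tool, VERSION_ARGS[tool])
        matches = found is not None and found[:len(expected)] == expected
        print(f"{tool}: expected {expected}, found {found} -> {'ok' if matches else 'check'}")
```

Comparing only the leading components (`found[:len(expected)]`) lets a CMake 3.7.2 install count as a match for the listed 3.7.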
@JayKimBravekjh
JayKimBravekjh / remove_stop_words.py
Created January 16, 2018 13:41 — forked from glenbot/remove_stop_words.py
Test various ways of removing stop words in Python.
"""
Demonstration of ways to implement this API:
    sanitize(user_input, stop_words)

Related discussions:
- Modifying a list while looping over it:
    - http://stackoverflow.com/questions/1207406/remove-items-from-a-list-while-iterating-in-python
- Remove all occurrences of a value in a list:
    - http://stackoverflow.com/questions/1157106/remove-all-occurences-of-a-value-from-a-python-list
"""
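A few ways the `sanitize(user_input, stop_words)` API could be implemented, sketched under an assumption the preview does not confirm: `user_input` is a list of word strings and `stop_words` is any iterable of words. The function names below are hypothetical variants, not the gist's own. Each version drops every occurrence of a stop word, and the in-place variant avoids the remove-while-iterating bug discussed in the first linked question.

```python
def sanitize_comprehension(user_input, stop_words):
    """Return a new list without stop words (all occurrences are dropped)."""
    stop = set(stop_words)  # a set gives O(1) membership tests
    return [word for word in user_input if word not in stop]


def sanitize_filter(user_input, stop_words):
    """Same result using the built-in filter()."""
    stop = set(stop_words)
    return list(filter(lambda word: word not in stop, user_input))


def sanitize_in_place(user_input, stop_words):
    """Mutate the caller's list without removing items while looping over it."""
    stop = set(stop_words)
    # Slice assignment replaces the contents in one step, avoiding the skipped
    # elements you get from calling remove() inside `for word in user_input`.
    user_input[:] = [word for word in user_input if word not in stop]
    return user_input


if __name__ == "__main__":
    words = ["the", "cat", "sat", "on", "the", "mat"]
    print(sanitize_comprehension(words, ["the", "on"]))  # ['cat', 'sat', 'mat']
```

The first two return a new list and leave the input untouched; use the slice-assignment version only when callers rely on the original list object being updated.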
import pandas as pd
# List unique values in a DataFrame column
# h/t @makmanalp for the updated syntax!
df['Column Name'].unique()
# Convert Series datatype to numeric (will error if column has non-numeric values)
# h/t @makmanalp
pd.to_numeric(df['Column Name'])
# Convert Series datatype to numeric, changing non-numeric values to NaN
# h/t @makmanalp for the updated syntax!
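The snippets above can be exercised end to end on a small DataFrame. This is a sketch with hypothetical sample data; the column name just mirrors the placeholder used above. It shows `unique()`, the default `pd.to_numeric` raising on a non-numeric value, and `errors="coerce"` turning such values into NaN.

```python
import pandas as pd

# Hypothetical sample data; "Column Name" is the placeholder used above.
df = pd.DataFrame({"Column Name": ["1", "2", "2", "x"]})

# Unique values in order of first appearance (returned as a NumPy array)
uniques = df["Column Name"].unique()
print(list(uniques))  # ['1', '2', 'x']

# errors="coerce" converts unparseable values such as "x" to NaN
numeric = pd.to_numeric(df["Column Name"], errors="coerce")
print(numeric.tolist())  # [1.0, 2.0, 2.0, nan]

# The default, errors="raise", raises ValueError on the same column
try:
    pd.to_numeric(df["Column Name"])
except ValueError as exc:
    print("to_numeric raised:", exc)
```

Because NaN is a float, the coerced Series comes back as `float64` even though every parseable value was an integer.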
import pandas as pd
# Calculate information value
def calc_iv(df, feature, target, pr=0):
    lst = []
    # dropna() keeps the loop consistent with nunique(), which excludes NaN
    for val in df[feature].dropna().unique():
        # [feature, value, rows with this value, rows with this value and target == 1]
        lst.append([feature, val,
                    df[df[feature] == val].count()[feature],
                    df[(df[feature] == val) & (df[target] == 1)].count()[feature]])
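The preview stops after collecting per-category counts. The remaining steps are not shown, so the following is an assumption about where the calculation usually goes: each category's share of events (target == 1) and non-events, the weight of evidence WoE = ln(%non-events / %events), and IV = Σ (%non-events − %events) × WoE. This sketch uses a `groupby` in place of the loop above, and `information_value` is a hypothetical name. Which class goes in the numerator only flips the sign of WoE; IV is the same either way.

```python
import numpy as np
import pandas as pd


def information_value(df, feature, target):
    """Information value of a categorical feature against a 0/1 target.

    Sketch only: assumes target is numeric 0/1 and that every category contains
    both classes; a category with zero events or non-events makes WoE infinite.
    Rows with a missing feature value are dropped by groupby.
    """
    counts = df.groupby(feature)[target].agg(["count", "sum"])
    events = counts["sum"]                        # rows with target == 1
    non_events = counts["count"] - counts["sum"]  # rows with target == 0
    dist_events = events / events.sum()
    dist_non_events = non_events / non_events.sum()
    woe = np.log(dist_non_events / dist_events)   # weight of evidence per category
    return float(((dist_non_events - dist_events) * woe).sum())


if __name__ == "__main__":
    data = pd.DataFrame({"grade": ["a", "a", "a", "b", "b", "b"],
                         "default": [1, 0, 0, 1, 1, 0]})
    print(round(information_value(data, "grade", "default"), 4))  # 0.4621
```

In the example, category "a" holds 1/3 of the events and 2/3 of the non-events (and "b" the reverse), so IV = 2 × (1/3) × ln 2 ≈ 0.4621.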