$ python data.py --nrows 1000
@ADGEfficiency
ADGEfficiency / pd_subset.py
Last active December 4, 2020 09:07
Mistakes Data Scientists Make
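import pandas as pd
# hardcoded subset size, buried inside the function call
data = pd.read_csv('data.csv', nrows=1000)
# named parameter: the subset size can be changed in one place
nrows = 1000
data = pd.read_csv('data.csv', nrows=nrows)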
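A quick way to confirm that `nrows` really limits what gets read: the sketch below parses an in-memory CSV through `io.StringIO`, so it runs without any file on disk. The CSV text is a hypothetical stand-in for `data.csv`, and the helper name `load_subset` is illustrative only.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: a header plus 5000 data rows.
CSV_TEXT = "x,y\n" + "\n".join(f"{i},{i * 2}" for i in range(5000))


def load_subset(source, nrows=None):
    """Read at most `nrows` data rows; nrows=None reads the whole file."""
    return pd.read_csv(source, nrows=nrows)


nrows = 1000
subset = load_subset(io.StringIO(CSV_TEXT), nrows=nrows)
full = load_subset(io.StringIO(CSV_TEXT))
print(subset.shape, full.shape)  # (1000, 2) (5000, 2)
```

The header line is not counted by `nrows`, so the subset holds exactly 1000 data rows.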
ADGEfficiency / home.py
Last active December 4, 2020 08:29
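import os
import numpy as np
home = os.environ['HOME']  # avoid hardcoding a machine-specific home path
path = os.path.join(home, 'adg')
os.makedirs(path, exist_ok=True)
# save inside the new folder (file name is illustrative); np.save(path, data) would write ~/adg.npy
np.save(os.path.join(path, 'data.npy'), data)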
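The same idea can be written with `pathlib`, whose `Path.home()` returns the current user's home directory. This is a sketch, not the original author's code: the function name `save_under_home`, the `home` override (added so the demo can use a temporary directory) and the file name `data.npy` are all hypothetical; the folder name `adg` comes from the snippet above.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_under_home(data, folder="adg", filename="data.npy", home=None):
    """Save `data` to <home>/<folder>/<filename> and return the full path.

    `home` defaults to the current user's home directory (Path.home()),
    so no machine-specific path is hardcoded.
    """
    base = Path(home) if home is not None else Path.home()
    target_dir = base / folder
    target_dir.mkdir(parents=True, exist_ok=True)  # like os.makedirs(..., exist_ok=True)
    target = target_dir / filename
    np.save(target, data)  # .npy suffix already present, so numpy adds nothing
    return target


# Demo against a temporary directory so the example does not touch the real home folder.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_under_home(np.arange(5), home=tmp)
    print(saved.name, np.load(saved))  # data.npy [0 1 2 3 4]
```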
ADGEfficiency / cli.py
Last active December 4, 2020 09:04
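# data.py
import argparse
import pandas as pd
parser = argparse.ArgumentParser()
# type=int turns the command-line string into an integer; default=None reads every row
parser.add_argument('--nrows', type=int, default=None)
args = parser.parse_args()
data = pd.read_csv('data.csv', nrows=args.nrows)
print(f'loaded {data.shape[0]} rows')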
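Passing an explicit argument list to `parse_args` lets you check the CLI without a shell, mirroring `python data.py --nrows 1000`. The sketch assumes the `--nrows` flag from `data.py`; `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Parser for a hypothetical data.py with an optional --nrows flag."""
    parser = argparse.ArgumentParser(description="Load data.csv, optionally a subset.")
    # type=int converts the command-line string to an integer;
    # default=None means "read every row" when the flag is omitted.
    parser.add_argument("--nrows", type=int, default=None)
    return parser


parser = build_parser()
print(parser.parse_args(["--nrows", "1000"]).nrows)  # 1000
print(parser.parse_args([]).nrows)  # None
```

Because both outcomes (an `int` or `None`) are valid values for `pd.read_csv(..., nrows=...)`, `args.nrows` can be passed straight through.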
ADGEfficiency / int_index.py
Created December 19, 2019 05:14
data = data[:1000]  # keep the first 1000 rows (a positional slice)
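On a DataFrame, a plain `[:n]` row slice is interpreted by position, which is easy to confuse with label-based selection when the index is made of integers. The toy frame below (hypothetical data whose integer index starts at 10) contrasts position-based `.iloc` with label-based `.loc`; this is an illustration, not taken from the original gist.

```python
import pandas as pd

# Hypothetical frame whose integer index does not start at 0.
data = pd.DataFrame({"x": [1, 2, 3, 4]}, index=[10, 11, 12, 13])

# .iloc slices by position: the first two rows, whatever their labels.
first_two = data.iloc[:2]
# .loc slices by label (stop label included); no label is <= 2, so nothing matches.
by_label = data.loc[:2]

print(list(first_two.index))  # [10, 11]
print(list(by_label.index))  # []
```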
ADGEfficiency / normalizer.py
Created December 19, 2019 05:13
import numpy as np
normalized = (data - np.min(data)) / (np.max(data) - np.min(data))  # min-max scaling to [0, 1]
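The one-liner above uses the global minimum and maximum of `data`. For a 2-D feature array you often want per-column statistics instead. The sketch below assumes `data` is a NumPy array with one row per sample and one column per feature (an assumption, not from the original); the function name `min_max_normalize` is hypothetical, and mapping constant columns to 0 is one possible choice for avoiding a division by zero.

```python
import numpy as np


def min_max_normalize(data, axis=0):
    """Rescale each column of `data` to [0, 1] via (x - min) / (max - min).

    Columns whose max equals their min are mapped to 0 instead of
    dividing by zero.
    """
    data = np.asarray(data, dtype=float)
    lo = data.min(axis=axis, keepdims=True)
    span = data.max(axis=axis, keepdims=True) - lo
    span[span == 0] = 1.0  # avoid 0 / 0 for constant columns
    return (data - lo) / span


x = np.array([[0.0, 5.0], [5.0, 5.0], [10.0, 5.0]])
print(min_max_normalize(x))
```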
ADGEfficiency / standardizer.py
Last active December 19, 2019 05:17
import numpy as np
standardized = (data - np.mean(data)) / np.std(data)  # zero mean, unit standard deviation
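A per-column version of standardization (z-scores) follows the same pattern. As with the normalizer sketch, this assumes a 2-D NumPy array with features in columns; `standardize` is a hypothetical name. One detail worth knowing: `np.std` defaults to `ddof=0` (population standard deviation), while pandas' `.std()` defaults to `ddof=1`, so the two can give slightly different results on the same data.

```python
import numpy as np


def standardize(data, axis=0, ddof=0):
    """Return (x - mean) / std per column, i.e. z-scores.

    ddof=0 matches np.std's default (population standard deviation).
    Columns with zero standard deviation are only centred (left at 0).
    """
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=axis, keepdims=True)
    std = data.std(axis=axis, ddof=ddof, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (data - mean) / std


x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
z = standardize(x)
print(z.mean(axis=0), z.std(axis=0))  # means close to 0, standard deviations close to 1
```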