Felipe Farias ftfarias

  • Data Lead at Alice
  • São Paulo, Brazil
# from tqdm import tqdm
import csv

with open('source.csv', 'r', encoding='utf-8', errors='replace') as input_file:
    # protect against NUL ("null") bytes, which can break csv.reader
    lines = (line.replace('\0', '') for line in input_file)
    input_csv = csv.reader(lines, delimiter=';', quotechar='"')
    # skip the header row if the file has one
    header = next(input_csv)
import re
DATA = "Hey, you - what are you doing here!?"
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split(r'(\W+)', 'Words, words, words.')
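The last call above wraps the pattern in a capturing group, and its output is cut off in this listing. As a small sketch of the difference (the variable names are only illustrative), the two calls can be compared side by side: with a capturing group, `re.split` also returns the separators it matched.

```python
import re

text = 'Words, words, words.'

# Without a capturing group the separators are discarded.
without_group = re.split(r'\W+', text)

# With a capturing group the matched separators are kept in the result.
with_group = re.split(r'(\W+)', text)

print(without_group)  # ['Words', 'words', 'words', '']
print(with_group)     # ['Words', ', ', 'words', ', ', 'words', '.', '']
```

The trailing empty string appears in both results because the text ends with a separator.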
@ftfarias
ftfarias / README.md
Created January 24, 2017 23:10 — forked from datagrok/README.md
Circular imports in Python 2 and Python 3: when are they fatal? When do they work?

When are Python circular imports fatal?

In your Python package, you have:

  • an __init__.py that designates this as a Python package
  • a module_a.py, containing a function action_a() that references an attribute (like a function or variable) in module_b.py, and
  • a module_b.py, containing a function action_b() that references an attribute (like a function or variable) in module_a.py.

This situation can introduce a circular import error: module_a attempts to import module_b, but can't, because module_b needs to import module_a, which is in the process of being interpreted.

But, sometimes Python is magic, and code that looks like it should cause this circular import error works just fine!
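To make the two outcomes concrete, here is a minimal sketch, assuming Python 3.5 or later. It writes two throwaway packages to a temporary directory and imports `module_a` from each. The package names `circ_fatal` and `circ_ok` and the helpers `name_a()` and `name_b()` are hypothetical, invented for this illustration. In the first package each module pulls a name out of the other at import time; in the second each module only binds the other module object and looks attributes up when a function is called.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Fatal style: each module needs a *name* from the other at import time.
FATAL_A = """
    from .module_b import action_b

    def action_a():
        return "a"
"""
FATAL_B = """
    from .module_a import action_a

    def action_b():
        return "b"
"""

# Working style: each module only binds the other *module object* and
# looks attributes up later, when the functions are actually called.
OK_A = """
    from . import module_b

    def action_a():
        return "a calls " + module_b.name_b()

    def name_a():
        return "a"
"""
OK_B = """
    from . import module_a

    def action_b():
        return "b calls " + module_a.name_a()

    def name_b():
        return "b"
"""


def write_package(root, name, src_a, src_b):
    """Create a throwaway package: __init__.py, module_a.py, module_b.py."""
    pkg = Path(root) / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "module_a.py").write_text(textwrap.dedent(src_a))
    (pkg / "module_b.py").write_text(textwrap.dedent(src_b))


def try_import(dotted_name):
    """Return 'ok', or a short description of the ImportError."""
    try:
        importlib.import_module(dotted_name)
        return "ok"
    except ImportError as exc:
        return f"ImportError: {exc}"


root = tempfile.mkdtemp()
write_package(root, "circ_fatal", FATAL_A, FATAL_B)
write_package(root, "circ_ok", OK_A, OK_B)
sys.path.insert(0, root)
importlib.invalidate_caches()

fatal_result = try_import("circ_fatal.module_a")
ok_result = try_import("circ_ok.module_a")
call_result = sys.modules["circ_ok.module_a"].action_a()

print(fatal_result)
print(ok_result, "->", call_result)
```

In the fatal package, `module_b` asks for `action_a` while `module_a` is still only partially initialized, so the name does not exist yet. In the working package nothing is looked up until `action_a()` runs, by which point both modules have finished loading.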

import re
DOUBLE_SPACES_REMOVER = re.compile(r'[ ]+')
def remove_double_spaces_re(text):
    return DOUBLE_SPACES_REMOVER.sub(' ', text)
%timeit remove_double_spaces_re('ads sadf scbvcxb ret h fdgh jj gh erty ')
4.3 µs ± 62.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
############################
ftfarias / data_cleaning.py
Last active April 7, 2020 17:59
Data cleaning
import csv
import re
EMAIL_REGEXP = r'^[_A-Za-z0-9-+]+(\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\.[A-Za-z0-9]+)*(\.[A-Za-z]{2,3})$'
#from tqdm import tqdm_notebook as tqdm
import gensim
import collections
import nltk
ftfarias / import_request_zipped.py
Created November 3, 2017 19:38
Read from zipped file in internet
import pandas as pd  # needed for pd.read_table below
import requests
from io import BytesIO
from zipfile import ZipFile
# Download the dataset
dk = requests.get('http://www.ssfpack.com/files/DK-data.zip').content
f = BytesIO(dk)
zipped = ZipFile(f)
df = pd.read_table(
    BytesIO(zipped.read('internet.dat')),
ftfarias / nlp_portugues.py
Created December 5, 2017 15:49
NLP Portugues
# Combinations and contractions in Portuguese
# https://pt.wiktionary.org/wiki/Ap%C3%AAndice:Adv%C3%A9rbios_do_portugu%C3%AAs
# https://pt.wiktionary.org/wiki/Ap%C3%AAndice:Gent%C3%ADlicos_e_top%C3%B3nimos_em_portugu%C3%AAs
CONTRACOES = [
    # The preposition "com" + definite articles
    (('com','um'),'cum'),
    # The preposition "de" + definite articles
    (('de','o'),'do'),
    (('de','a'),'da'),
ftfarias / basic.py
Last active August 4, 2022 13:52
basic general all-purpose python file template
"""
{USAGE}
python thisfile.py param1
"""
import sys
import os
import pprint as pp
import argparse
import logging
import traceback
ftfarias / readS3CSV.txt
Created May 22, 2018 19:09
How to read files from Amazon AWS S3 line by line
import boto3
import argparse
import elasticsearch
from io import TextIOWrapper
from gzip import GzipFile
import csv
fact_key = "/2018/05/15/mycsv_files"
BUCKET = 'csv_data'
print(f'Reading files at {fact_key}')
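The listing above is cut off before the actual read loop. As a rough sketch of the line-by-line part only, and assuming the objects are gzip-compressed CSV files (suggested by the `GzipFile` and `TextIOWrapper` imports, but not confirmed here), a binary stream can be decompressed and parsed row by row without loading it whole. The function name `iter_csv_rows` and the in-memory `fake_body` stand-in are hypothetical; in real use the stream would be whatever file-like body the S3 client returns.

```python
import csv
import gzip
from gzip import GzipFile
from io import BytesIO, TextIOWrapper


def iter_csv_rows(binary_stream, delimiter=','):
    """Yield CSV rows one at a time from a gzip-compressed binary stream."""
    with GzipFile(fileobj=binary_stream) as unzipped:
        # Decode bytes to text lazily; csv.reader then pulls one line at a time.
        text = TextIOWrapper(unzipped, encoding='utf-8', newline='')
        yield from csv.reader(text, delimiter=delimiter)


# Stand-in for a downloaded object body (assumption: gzip-compressed CSV).
fake_body = BytesIO(gzip.compress(b'id,name\n1,ana\n2,joao\n'))

rows = list(iter_csv_rows(fake_body))
print(rows)
```

Because the generator reads through `TextIOWrapper`, only a small buffer is held in memory at a time, which is the point of reading large files line by line.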
ftfarias / Linux_useful_commands.txt
Last active October 11, 2018 16:55
Linux useful commands
watch -n 2 'ip address'
# du for directories
https://dev.yorhel.nl/ncdu
Freeing disk space on your Linux server
- Get to the root of your machine by running cd /
- Run sudo du -h --max-depth=1.
- Note which directories are using a lot of disk space.