
@jamespaultg
jamespaultg / SPSS_to_CSV.R
Created March 17, 2021 14:49
Convert SPSS files to CSV using R
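library(foreign)
# Converting one SPSS file into a data frame
# (use.value.labels = FALSE keeps the numeric codes instead of the value labels)
spss_data <- read.spss("File_path_SPSS.sav",
                       reencode = "utf-8",
                       use.value.labels = FALSE,
                       to.data.frame = TRUE)
head(spss_data)
View(spss_data)  # spreadsheet-style viewer; only useful in interactive sessions
# write.csv2 writes ';'-separated values with ',' as the decimal mark;
# use write.csv instead for ','-separated values with '.' as the decimal mark
write.csv2(spss_data, "C:/Users/LYCJPG1/Documents/survey_results.csv", row.names = FALSE)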
jamespaultg / csv_diff.py
Created December 15, 2020 13:23
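Compare two CSV files in Python with csv-diff
# Install first (in a shell): pip install csv-diff
from csv_diff import load_csv, compare

# Load both files keyed on the "id" column, then compare them row by row
with open("one.csv") as f1, open("two.csv") as f2:
    diff = compare(
        load_csv(f1, key="id"),
        load_csv(f2, key="id")
    )
# diff is a dict with entries such as "added", "removed" and "changed"
print(diff)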
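To see what a keyed comparison does, here is a minimal standard-library sketch. It is not how csv-diff is implemented: the helper `keyed_diff` and the sample data are hypothetical, and it only reports which keys were added, removed or changed rather than the changed column values.

```python
import csv
import io


def keyed_diff(old_rows, new_rows, key):
    """Hypothetical helper: match rows on the `key` column and report key-level differences."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # a key present in both files counts as changed if any column value differs
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


# In-memory stand-ins for one.csv and two.csv
one = io.StringIO("id,name\n1,Ann\n2,Bob\n3,Cy\n")
two = io.StringIO("id,name\n1,Ann\n2,Rob\n4,Di\n")
result = keyed_diff(csv.DictReader(one), csv.DictReader(two), key="id")
print(result)
```

Keying on a column (rather than comparing line by line) is what lets reordered rows count as unchanged.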
jamespaultg / iterrows.py
Created May 20, 2020 13:53
Use iterrows to process each row of a pandas DataFrame
# iterrows functionality for quick reference
# `document` is assumed to be a DataFrame with the columns used below
for index, row in document.iterrows():
    print(index, row['section name'], len(row['tekst']), len(row['sentence_list']))
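A self-contained version of the loop above, assuming a small hypothetical DataFrame with the same column names (`section name`, `tekst`, `sentence_list`); each iteration yields the index label and the row as a Series.

```python
import pandas as pd

# Hypothetical sample data with the same column names as the snippet above
document = pd.DataFrame({
    "section name": ["intro", "methods"],
    "tekst": ["Short text.", "A somewhat longer text."],
    "sentence_list": [["Short text."], ["A somewhat", "longer text."]],
})

# iterrows yields (index label, row as a Series) pairs
summary = []
for index, row in document.iterrows():
    summary.append((index, row["section name"], len(row["tekst"]), len(row["sentence_list"])))

print(summary)
```

iterrows builds a new Series for every row, so on large frames a vectorised expression such as `document["tekst"].str.len()` is usually much faster.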
jamespaultg / enumerate.py
Created May 20, 2020 11:18
Use enumerate in loops
# just to check the enumerate functionality on lists
temp_list = ['a', 'b', 'c']
for i, val in enumerate(temp_list):
    print(i, val)
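Two other common uses of enumerate, shown on the same list: the `start` argument shifts the counter, and a dict comprehension turns the pairs into a value-to-position lookup.

```python
temp_list = ["a", "b", "c"]

# start=1 makes the counter begin at 1 instead of 0 (handy for numbered output)
numbered = [f"{i}. {val}" for i, val in enumerate(temp_list, start=1)]
print(numbered)  # ['1. a', '2. b', '3. c']

# enumerate also gives a quick value -> position lookup
positions = {val: i for i, val in enumerate(temp_list)}
print(positions)  # {'a': 0, 'b': 1, 'c': 2}
```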
jamespaultg / readDBF.py
Created April 15, 2020 10:41
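Read a DBF file in Python and convert it to a pandas DataFrame
# Install first (in a shell): pip install dbfread pandas
from dbfread import DBF
import pandas as pd

# DBF yields one dict-like record (field name -> value) per row
dbf = DBF('your_dbf_file.dbf')
frame = pd.DataFrame(iter(dbf))
frame  # in a notebook this displays the DataFrame; in a script use print(frame)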
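The conversion works because pandas accepts any iterable of dicts, turning each dict into a row and the keys into column names. A minimal sketch without a .dbf file, where the list `records` and its field names are hypothetical stand-ins for what `DBF(...)` yields:

```python
import pandas as pd

# Hypothetical stand-in for DBF(...): an iterable of dicts sharing the same keys
records = [
    {"NAME": "a", "VALUE": 1},
    {"NAME": "b", "VALUE": 2},
]

# Each dict becomes one row; the dict keys become the column names
frame = pd.DataFrame(iter(records))
print(frame.columns.tolist())  # ['NAME', 'VALUE']
print(len(frame))  # 2
```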
jamespaultg / getworddoc.py
Created April 14, 2020 18:09
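Read a Word document in Python
# Install first (in a notebook: !pip3 install python-docx -q)
import docx  # module provided by the python-docx package

# replace the following line with the location of your .docx file
wordfile = "your word document.docx"

# get the text contents of the Word document, one paragraph per line
def getDocxContent(filename):
    doc = docx.Document(filename)
    fullText = ""
    for para in doc.paragraphs:
        fullText += para.text + "\n"
    return fullText

print(getDocxContent(wordfile))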
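A .docx file is a ZIP archive whose body text lives in `word/document.xml`, so the plain text can also be pulled out with the standard library alone. This is a swapped-in, dependency-free technique rather than what python-docx does; the helper `docx_text` is hypothetical, and it ignores headers, footers, tabs and line breaks.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source):
    """Hypothetical helper: text of a .docx (path or binary file object), one paragraph per line."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # a paragraph's text is split across one or more <w:t> runs
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


# Usage: docx_text("your word document.docx")
# Self-check with a minimal in-memory .docx that contains only word/document.xml
sample_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body><w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>'
    "<w:p><w:r><w:t>Bye</w:t></w:r></w:p></w:body></w:document>"
)
sample_docx = io.BytesIO()
with zipfile.ZipFile(sample_docx, "w") as z:
    z.writestr("word/document.xml", sample_xml)
print(docx_text(sample_docx))
```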
jamespaultg / formattedprint.py
Last active April 14, 2020 15:39
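Formatted printing in Python
# https://docs.python.org/3/tutorial/inputoutput.html
# https://www.python-course.eu/python3_formatted_output.php
# file_names, i and section_text come from the surrounding script;
# :70 pads the name to 70 characters, :10d right-aligns the integer in a 10-character field
print(f'section {file_names[i]:70} has length {len(section_text):10d}')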
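A runnable version of the same format specifications, using hypothetical section names and lengths: strings are left-aligned and numbers right-aligned by default, so every line comes out the same width.

```python
# Hypothetical section names and lengths, just to show the alignment
sections = [("introduction", 1234), ("results", 56)]

lines = []
for name, length in sections:
    # :15 pads the string to 15 characters (left-aligned by default);
    # :6d right-aligns the integer in a 6-character field
    lines.append(f"section {name:15} has length {length:6d}")

print("\n".join(lines))
```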
jamespaultg / regex_match.py
Created April 14, 2020 11:48
Match a regular expression in a given string and get back the value and its start and end position
import re
p = re.compile("[a-z]")
# finditer yields one Match object per hit: start(), end() and group() give its position and value
for m in p.finditer('a1b2c3d4'):
    print(m.start(), m.end(), m.group())
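The same idea wrapped in a small function that collects the results instead of printing them; the helper name `find_with_positions` is hypothetical. With `+` in the pattern each match is a run of letters, so start and end are more than one character apart.

```python
import re


def find_with_positions(pattern, text):
    """Hypothetical helper: (value, start, end) for every non-overlapping match of pattern in text."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]


print(find_with_positions(r"[a-z]+", "ab12cd3e"))
# [('ab', 0, 2), ('cd', 4, 6), ('e', 7, 8)]
```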
jamespaultg / gettweets.py
Created March 5, 2020 02:35
Get tweets from Twitter using the tweepy API
# Thanks to Kaggle user yassinehamdaoui1
# https://www.kaggle.com/c/nlp-getting-started/discussion/132762
import pandas as pd
import tweepy as tw
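# Credentials of your Twitter developer app (placeholders; keep real keys out of shared code)
consumer_key = "put your consumer_key here"
consumer_secret = "put your consumer_secret here"
access_token = "put your access_token here"
access_token_secret = "put your access_token_secret here"
# Authenticate (tweepy 3.x-style OAuth 1a setup)
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)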
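Instead of pasting keys into the script, they can be read from environment variables. A minimal sketch, assuming hypothetical variable names such as `TWITTER_CONSUMER_KEY` and a hypothetical helper `load_twitter_credentials`:

```python
import os

# Hypothetical environment variable names; set them in your shell before running
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_credentials(environ=os.environ):
    """Read the four credentials, failing loudly if any variable is missing or empty."""
    missing = [var for var in CREDENTIAL_VARS.values() if not environ.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[var] for name, var in CREDENTIAL_VARS.items()}
```

For example, `consumer_key = load_twitter_credentials()["consumer_key"]` would replace the hard-coded placeholder.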
jamespaultg / accessS3.R
Created May 6, 2019 12:33
Access AWS S3 objects from R
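# install aws.s3 from the cloudyr drat repository
install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat"))
library("aws.s3")
# set the environment parameters (replace the placeholders with your own key, secret and region)
Sys.setenv("AWS_ACCESS_KEY_ID" = "Key_id",
           "AWS_SECRET_ACCESS_KEY" = "secret_access_key",
           "AWS_DEFAULT_REGION" = "region")
# list the buckets these credentials can access
bucketlist()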