GabrielSGoncalves 🏀 Data Engineer @ Big Data

Public gists:
GabrielSGoncalves / lambda_function.py
Last active May 8, 2022 20:16
Amazon Lambda function used on medium article

import json
import os
from io import StringIO

import boto3
import pandas as pd


def write_dataframe_to_csv_on_s3(dataframe, filename, bucket):
    """Write a dataframe to a CSV on S3."""
    # Serialize the dataframe to an in-memory buffer, then upload it
    csv_buffer = StringIO()
    dataframe.to_csv(csv_buffer)
    s3_resource = boto3.resource('s3')
    s3_resource.Object(bucket, filename).put(Body=csv_buffer.getvalue())
GabrielSGoncalves / invoke_lambda.py
Last active May 8, 2022 20:16
Python script to invoke Amazon Lambda using boto3
import boto3
import json
import sys
BUCKET = sys.argv[1]
KEY = sys.argv[2]
OUTPUT = sys.argv[3]
GROUP = sys.argv[4]
COLUMN = sys.argv[5]
CREDENTIALS = sys.argv[6]
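The preview cuts off before the actual invocation. A minimal sketch of how the collected arguments might be sent to the Lambda function (the payload keys and the function name `lambda-medium` are assumptions, not taken from the gist; `client.invoke` itself is the real boto3 call):

```python
import json


def build_payload(bucket, key, output, group, column):
    """Assemble the JSON payload for the Lambda invocation.

    The key names here are illustrative assumptions.
    """
    return json.dumps({
        'bucket': bucket,
        'key': key,
        'output': output,
        'group': group,
        'column': column,
    })


def invoke_lambda(payload, function_name='lambda-medium'):
    """Synchronously invoke the Lambda function and decode its response."""
    import boto3  # imported here so the helper above works without AWS deps
    client = boto3.client('lambda')
    response = client.invoke(
        FunctionName=function_name,
        InvocationType='RequestResponse',
        Payload=payload,
    )
    return json.loads(response['Payload'].read())
```

With the script's arguments, the call would be `invoke_lambda(build_payload(BUCKET, KEY, OUTPUT, GROUP, COLUMN))`.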
GabrielSGoncalves / fifa19_kaggle.csv
Last active August 6, 2019 16:27
CSV file with player information from FIFA 19 provided by Kaggle (https://www.kaggle.com/karangadiya/fifa19)
,ID,Name,Age,Photo,Nationality,Flag,Overall,Potential,Club,Club Logo,Value,Wage,Special,Preferred Foot,International Reputation,Weak Foot,Skill Moves,Work Rate,Body Type,Real Face,Position,Jersey Number,Joined,Loaned From,Contract Valid Until,Height,Weight,LS,ST,RS,LW,LF,CF,RF,RW,LAM,CAM,RAM,LM,LCM,CM,RCM,RM,LWB,LDM,CDM,RDM,RWB,LB,LCB,CB,RCB,RB,Crossing,Finishing,HeadingAccuracy,ShortPassing,Volleys,Dribbling,Curve,FKAccuracy,LongPassing,BallControl,Acceleration,SprintSpeed,Agility,Reactions,Balance,ShotPower,Jumping,Stamina,Strength,LongShots,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Marking,StandingTackle,SlidingTackle,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Release Clause
0,158023,L. Messi,31,https://cdn.sofifa.org/players/4/19/158023.png,Argentina,https://cdn.sofifa.org/flags/52.png,94,94,FC Barcelona,https://cdn.sofifa.org/teams/2/light/241.png,€110.5M,€565K,2202,Left,5,4,4,Medium/ Medium,Messi,Yes,RF,10,"Jul 1, 2004",,2021,5'7,159lbs,88+2,88+2,88+2,92+2,93+2,93+2,93+
GabrielSGoncalves / fifa19_output.csv
Last active August 6, 2019 16:25
Resulting CSV file from the group-by analysis using pandas
Club Overall
Juventus 82.28
Napoli 80.0
Inter 79.75
Real Madrid 78.24
Milan 78.07
FC Barcelona 78.03
Paris Saint-Germain 77.43
Roma 77.42
Manchester United 77.24
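Output like the table above can be produced with a pandas group-by on the FIFA 19 data; a minimal sketch with a toy stand-in for the Kaggle dataset (only the `Club` and `Overall` columns are assumed, and the two-decimal rounding matches the table):

```python
import pandas as pd

# Toy stand-in for the Kaggle FIFA 19 dataset
df = pd.DataFrame({
    'Club': ['Juventus', 'Juventus', 'Napoli', 'Napoli'],
    'Overall': [94, 90, 85, 75],
})

# Mean overall rating per club, highest first
club_overall = (
    df.groupby('Club')['Overall']
    .mean()
    .round(2)
    .sort_values(ascending=False)
)
print(club_overall)
```

Writing `club_overall` out with `to_csv` would yield a file shaped like `fifa19_output.csv` above.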
GabrielSGoncalves / nlp_aws_medium_part1.py
Last active September 24, 2019 14:29
First part of the NLP analysis for the Medium article on AWS ML/AI tools

from __future__ import print_function
import json
import logging
import os
import time
from datetime import date

import boto3
import matplotlib.pyplot as plt
import pandas as pd
from botocore.exceptions import ClientError
GabrielSGoncalves / nlp_aws_medium_part2.py
Last active September 18, 2019 16:41
Second part of the NLP analysis for the Medium article on AWS ML/AI tools
# 5) Creating a new S3 bucket to upload the audio files
bucket_name = 'medium-nlp-aws'
client_s3 = boto3.client('s3')
client_s3.create_bucket(Bucket=bucket_name)
# 6) Uploading the files to the created bucket
for audio_file in df_audio.filename.values:
    print(audio_file)
    client_s3.upload_file(audio_file, bucket_name, audio_file)
GabrielSGoncalves / nlp_aws_medium_part3.py
Last active September 24, 2019 14:31
Third part of the NLP analysis for the Medium article on AWS ML/AI tools
# 10) Function to get text from the JSON file generated using Amazon Transcribe
def get_text_from_json(bucket, key):
    """Get the transcript text from a Transcribe JSON result on S3."""
    s3 = boto3.client('s3')
    obj = s3.get_object(Bucket=bucket, Key=key)
    serialized_object = obj['Body'].read()
    data = json.loads(serialized_object)
    return data.get('results').get('transcripts')[0].get('transcript')

# 11) Reading the original transcription from the JSON file
with open('original_transcripts.json', 'r') as f:
    original_transcriptions = json.load(f)
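The nesting that `get_text_from_json` unpacks follows the shape of an Amazon Transcribe result document; a self-contained sketch with a hand-built sample (the `jobName` value is illustrative):

```python
import json

# Minimal sample mirroring the shape of an Amazon Transcribe result
sample = json.dumps({
    'jobName': 'demo-job',
    'results': {
        'transcripts': [{'transcript': 'hello world'}],
    },
})

# Same lookup chain used in get_text_from_json above
data = json.loads(sample)
text = data.get('results').get('transcripts')[0].get('transcript')
print(text)  # hello world
```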
GabrielSGoncalves / nlp_aws_medium_part5.py
Last active September 24, 2019 16:26
Fifth part of the NLP analysis for the Medium article on AWS ML/AI tools for NLP.

# 16) Function to call Amazon Comprehend service using boto3
def start_comprehend_job(text):
    """
    Execute sentiment analysis of a text using Amazon Comprehend.
    The text can be larger than 5000 bytes (the limit for a single request),
    as the function will split it into multiple requests and return an
    averaged value for each sentiment.
    Parameter
    - text (str): The text to be analyzed
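The gist preview cuts off before the function body, but its docstring describes splitting long text and averaging the per-chunk scores. A sketch of that approach (the chunking helper, its word-boundary strategy, and the `'en'` language code are assumptions; `detect_sentiment` and its `SentimentScore` response field are the real boto3 Comprehend API):

```python
def chunk_text(text, max_bytes=5000):
    """Split text on word boundaries into pieces of at most max_bytes (UTF-8)."""
    chunks, current = [], ''
    for word in text.split(' '):
        candidate = (current + ' ' + word).strip()
        if len(candidate.encode('utf-8')) > max_bytes and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def average_sentiment(text, language_code='en'):
    """Run Comprehend sentiment analysis per chunk and average the scores."""
    import boto3  # imported here so chunk_text works without AWS deps
    client = boto3.client('comprehend')
    totals = {'Positive': 0.0, 'Negative': 0.0, 'Neutral': 0.0, 'Mixed': 0.0}
    chunks = chunk_text(text)
    for chunk in chunks:
        score = client.detect_sentiment(
            Text=chunk, LanguageCode=language_code)['SentimentScore']
        for key in totals:
            totals[key] += score[key]
    return {key: value / len(chunks) for key, value in totals.items()}
```

One caveat of this word-boundary strategy: a single word longer than `max_bytes` would still produce an oversized chunk, so real input should be sanitized first.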
GabrielSGoncalves / nlp_aws_medium_part4.py
Created September 24, 2019 14:48
Fourth part of the NLP analysis for the Medium article on AWS ML/AI tools for NLP.
# Load the spaCy model once; reloading it on every iteration would be wasteful
nlp = spacy.load('en_core_web_lg')

# 15) Iterate over the speakers and apply spaCy visualizer on each speech
for index, row in df_audio.iterrows():
    print(f"Rendering {index}'s texts")
    original_transcription = nlp(original_transcriptions.get(index))
    transcribe_transcription = nlp(get_text_from_json(bucket_name, row.json_transcription))
    svg_original = spacy.displacy.render(original_transcription, style="ent", jupyter=False)
    svg_transcribe = spacy.displacy.render(transcribe_transcription, style="ent", jupyter=False)