Romain Granger romain9292
@romain9292
romain9292 / concat.R
Last active November 4, 2019 09:42
[Concatenate several text files into a data frame] By applying pattern matching on the .txt extension #dataframe #text #R #bind
# A library for data manipulation (joins, subsetting)
library(data.table)
# Build a list of the names of all .txt files in the working directory
filelist = list.files(pattern = "\\.txt$")
# Load every .txt file in the list as a data frame, using the first line as the header
data_list = lapply(filelist, read.table, sep = ",", header = TRUE)
# Build the data frame, assuming all files have the same headers
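For comparison, here is a minimal Python sketch of the same idea using only the standard library: collect every file ending in `.txt`, parse each one as comma-separated text with a header row, and stack the rows, assuming (as the R comment does) that all files share the same header. The function name `concat_txt_files` and the header check are illustrative choices, not part of the original gist.

```python
import csv
from pathlib import Path


def concat_txt_files(directory="."):
    """Stack all comma-separated .txt files in `directory` into one table.

    Returns (header, rows). Assumes every file starts with the same header line.
    """
    header = None
    rows = []
    # sorted() makes the concatenation order deterministic
    for path in sorted(Path(directory).glob("*.txt")):
        with path.open(newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path.name} has a different header: {file_header}")
            rows.extend(reader)
    return header, rows
```

Raising on a header mismatch makes the "same headers" assumption explicit instead of silently misaligning columns.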
romain9292 / dedup.R
Created September 12, 2017 16:34
Deduplicate xlsx rows in a data frame, two ways
library(readxl)
library(digest)
library(data.table)
library(rJava)
#Data path
dataPath <-"/Users/romain/Desktop/your_file.xlsx"
#Load data with read
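The title mentions two deduplication approaches, and the gist loads `digest`, which hashes R objects. Assuming one approach compares rows directly and the other compares a hash of each row, a minimal Python sketch of both could look like this. Rows are plain tuples here rather than rows read from the xlsx file, and the function names are hypothetical.

```python
import hashlib


def dedup_by_value(rows):
    """Keep the first occurrence of each row, comparing whole rows."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def dedup_by_hash(rows):
    """Keep the first occurrence of each row, comparing an MD5 digest of the row."""
    seen = set()
    unique = []
    for row in rows:
        # repr() of the tuple is type-aware, so 1 and "1" produce different digests
        digest = hashlib.md5(repr(tuple(row)).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(row)
    return unique
```

The hash version stores only a fixed-size digest per row, which helps when rows are wide; for small in-memory data the tuple version is simpler.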
romain9292 / compare_two_dataframe.R
Last active November 1, 2019 11:05
[Compare two data frames with R] Find the rows missing between two datasets using SQLDF #R #datascience #data #compare
library(sqldf)
# Load our two datasets
df_1 <- read.csv2('/Users/romain/Downloads/df_1.csv',sep=',')
df_2 <- read.csv2('/Users/romain/Desktop/df_2.csv',sep=',')
# Keep only the column we want to compare
df_1 <- as.data.frame(df_1$col_name)
df_2 <- as.data.frame(df_2$col_name)
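sqldf runs its queries on an SQLite database by default, so the comparison can be reproduced with Python's built-in `sqlite3` module. The sketch below loads one column from each dataset into an in-memory table and uses `EXCEPT` to return the values present in `df_1` but missing from `df_2`. The table and column names mirror the R code; using `EXCEPT` (rather than, say, a `LEFT JOIN ... IS NULL`) is an assumption about how the comparison is written.

```python
import sqlite3


def missing_values(values_1, values_2):
    """Return the distinct values in values_1 that do not appear in values_2."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE df_1 (col_name TEXT)")
        con.execute("CREATE TABLE df_2 (col_name TEXT)")
        con.executemany("INSERT INTO df_1 VALUES (?)", [(v,) for v in values_1])
        con.executemany("INSERT INTO df_2 VALUES (?)", [(v,) for v in values_2])
        rows = con.execute(
            "SELECT col_name FROM df_1 EXCEPT SELECT col_name FROM df_2 ORDER BY col_name"
        ).fetchall()
    finally:
        con.close()
    return [r[0] for r in rows]
```

Running it in both directions (`df_1` against `df_2`, then `df_2` against `df_1`) lists the rows missing on each side.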
romain9292 / clean_text.R
Last active November 1, 2019 15:16
[Clean text with R] Remove line breaks, HTML tags, extra spaces and more #R #text #clean #datacleansing
clean_text <- function(text){
# Replace line breaks (\n) with a space
text <- gsub("\n"," ",text)
# Remove URLs
text <- gsub('http\\S+\\s*',"",text)
# Collapse extra whitespace into a single space
text <- gsub("\\s+"," ",text)
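The three substitutions shown translate directly to Python's `re` module. This sketch covers only the visible steps (line breaks, URLs, extra whitespace); the HTML-tag removal mentioned in the description is not reproduced, because its pattern is not shown.

```python
import re


def clean_text(text):
    # Replace line breaks (\n) with a space
    text = text.replace("\n", " ")
    # Remove URLs: "http" followed by non-space characters and any trailing whitespace
    text = re.sub(r"http\S+\s*", "", text)
    # Collapse runs of whitespace into a single space
    text = re.sub(r"\s+", " ", text)
    return text
```

For example, `clean_text("Hello\nworld  see http://x.com/a  now")` returns `"Hello world see now"`.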
romain9292 / mysql_to_bq.py
Last active June 22, 2023 14:31
[MySQL to BigQuery using pandas] Load MySQL tables into BigQuery, using pandas to auto-generate the schema #python #pandas #bigquery
import os
import pandas as pd
import pandas_gbq as pd_gbq
import modin.pandas as pd_mod
from sqlalchemy import create_engine
from google.oauth2 import service_account
# Service account file for GCP connection
credentials = service_account.Credentials.from_service_account_file('key.json')
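When no explicit schema is passed, pandas-gbq derives the destination table's schema from the DataFrame's column dtypes, which is what makes the pandas route convenient. To illustrate the idea without a MySQL server, pandas or GCP credentials, here is a standard-library sketch that infers a BigQuery-style schema from Python values. The function name `infer_bq_schema` and the exact type mapping are assumptions for illustration, not pandas-gbq's implementation.

```python
from datetime import date, datetime


def infer_bq_schema(records):
    """Infer a BigQuery-style schema from a non-empty list of dicts.

    Each column's type comes from its first non-None value (hypothetical mapping).
    """
    schema = []
    for name in records[0]:
        value = next((r[name] for r in records if r.get(name) is not None), None)
        # bool must be checked before int, because bool is a subclass of int
        if isinstance(value, bool):
            bq_type = "BOOLEAN"
        elif isinstance(value, int):
            bq_type = "INTEGER"
        elif isinstance(value, float):
            bq_type = "FLOAT"
        # datetime must be checked before date, because datetime is a subclass of date
        elif isinstance(value, datetime):
            # naive values are civil time (DATETIME); aware values are instants (TIMESTAMP)
            bq_type = "DATETIME" if value.tzinfo is None else "TIMESTAMP"
        elif isinstance(value, date):
            bq_type = "DATE"
        else:
            bq_type = "STRING"  # strings, and columns that are entirely None
        schema.append({"name": name, "type": bq_type, "mode": "NULLABLE"})
    return schema
```

Looking past leading `None` values matters for sparse MySQL columns: typing a column from its first row alone would turn a mostly numeric column into `STRING`.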
romain9292 / growth_rate.sql
Last active December 2, 2020 17:21
[Growth rate using SQL in BigQuery] Computing a month-over-month revenue growth rate #SQL #BigQuery
WITH
revenue_over_month AS (
SELECT
DATE_TRUNC(PARSE_DATE('%Y%m%d', date), MONTH) AS months,
-- totalTransactionRevenue is stored multiplied by 10^6; divide by 1e6 to get currency units
ROUND(SUM(totals.totalTransactionRevenue)/1e6, 2) AS revenue
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20*`
GROUP BY
1)
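The CTE only aggregates revenue per month; the growth rate itself compares each month with the previous one, growth = (revenue - previous revenue) / previous revenue, which in SQL is typically done with the `LAG` window function. Here is a minimal Python sketch of that step, assuming the input is a list of `(month, revenue)` pairs already produced by the aggregation. Returning `None` for the first month and when the previous month is zero is a choice made here, similar to BigQuery's `SAFE_DIVIDE`, which returns NULL on division by zero.

```python
def month_over_month_growth(monthly_revenue):
    """Add a growth rate to (month, revenue) pairs, ordered by month.

    Growth is (revenue - previous) / previous, like LAG() in SQL;
    it is None for the first month or when the previous revenue is 0.
    """
    result = []
    previous = None
    for month, revenue in sorted(monthly_revenue):
        if previous is None or previous == 0:
            growth = None
        else:
            growth = round((revenue - previous) / previous, 4)
        result.append((month, revenue, growth))
        previous = revenue
    return result
```

With `[("2017-01", 100.0), ("2017-02", 150.0), ("2017-03", 120.0)]` the growth column is `None`, `0.5` and `-0.2`.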
romain9292 / growth_rate_1.sql
Last active December 16, 2020 10:38
[Growth rate using SQL in BigQuery - Part 1] Computing a month-over-month revenue growth rate #SQL #BigQuery
SELECT
DATE_TRUNC(PARSE_DATE('%Y%m%d', date), MONTH) AS months,
-- totalTransactionRevenue is stored multiplied by 10^6; divide by 1e6 to get currency units
ROUND(SUM(totals.totalTransactionRevenue)/1e6, 2) AS revenue
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20*`
GROUP BY
1
ORDER BY
1
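This query parses the `YYYYMMDD` `date` string, truncates it to the first day of its month, and sums revenue per month. In the Google Analytics sample export, `totals.totalTransactionRevenue` is stored multiplied by 10^6, so it is divided by one million to get currency units. Here is a minimal Python equivalent over hypothetical `(date_string, revenue_micros)` rows; the function name is illustrative.

```python
from datetime import datetime


def revenue_by_month(rows):
    """Sum revenue per month from (YYYYMMDD string, revenue in micros) rows.

    Mirrors DATE_TRUNC(PARSE_DATE('%Y%m%d', date), MONTH) and
    ROUND(SUM(revenue) / 1e6, 2), ordered by month.
    """
    totals = {}
    for date_string, revenue_micros in rows:
        # Parse YYYYMMDD, then truncate to the first day of the month
        month = datetime.strptime(date_string, "%Y%m%d").date().replace(day=1)
        totals.setdefault(month, None)
        if revenue_micros is not None:
            # Like SQL SUM: NULLs are skipped, and an all-NULL group stays NULL
            totals[month] = (totals[month] or 0) + revenue_micros
    return [
        (month, None if total is None else round(total / 1e6, 2))
        for month, total in sorted(totals.items())
    ]
```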
romain9292 / growth_rate_2.sql
Created December 16, 2020 10:41
[Growth rate using SQL in BigQuery - Part 2] Computing a month-over-month revenue growth rate #SQL #BigQuery
WITH
revenue_over_month AS (
SELECT
DATE_TRUNC(PARSE_DATE('%Y%m%d', date), MONTH) AS months,
-- totalTransactionRevenue is stored multiplied by 10^6; divide by 1e6 to get currency units
ROUND(SUM(totals.totalTransactionRevenue)/1e6, 2) AS revenue
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20*`
GROUP BY
1)
romain9292 / growth_rate_3.sql
Created December 16, 2020 10:43
[Growth rate using SQL in BigQuery - Part 3] Computing a month-over-month revenue growth rate #SQL #BigQuery
WITH
revenue_over_month AS (
SELECT
DATE_TRUNC(PARSE_DATE('%Y%m%d', date), MONTH) AS months,
-- totalTransactionRevenue is stored multiplied by 10^6; divide by 1e6 to get currency units
ROUND(SUM(totals.totalTransactionRevenue)/1e6, 2) AS revenue
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20*`
GROUP BY
1)
romain9292 / transactions_per_sessions.sql
Created January 25, 2021 16:43
[Leverage the power of variables in BigQuery using SQL - Part 1] Defining a single value #SQL #BigQuery
SELECT
fullVisitorId,
totals.transactions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE totals.transactions IS NOT NULL
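In BigQuery scripting, a single value can be declared once with `DECLARE` and then reused in a query instead of being hard-coded. The closest everyday Python analogue is a bound query parameter. Here is a minimal sketch using the built-in `sqlite3` module on a hypothetical, flattened `sessions` table; the table, its columns and the `min_transactions` parameter are assumptions standing in for the GA sample data, not part of the original gist.

```python
import sqlite3


def visitors_with_transactions(sessions, min_transactions=1):
    """Return (fullVisitorId, transactions) rows with at least min_transactions.

    min_transactions is bound as a named query parameter, playing the role of a
    single-valued variable declared once and reused in the query.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sessions (fullVisitorId TEXT, transactions INTEGER)")
        con.executemany("INSERT INTO sessions VALUES (?, ?)", sessions)
        return con.execute(
            "SELECT fullVisitorId, transactions FROM sessions "
            "WHERE transactions IS NOT NULL AND transactions >= :min_transactions "
            "ORDER BY fullVisitorId",
            {"min_transactions": min_transactions},
        ).fetchall()
    finally:
        con.close()
```

Binding the value rather than formatting it into the SQL string keeps the query text fixed and avoids quoting problems.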