Skip to content

Instantly share code, notes, and snippets.

View khunreus's full-sized avatar

Anastasia Reusova khunreus

View GitHub Profile
@khunreus
khunreus / bq_standard_page_views_agg.sql
Created August 4, 2022 17:33
This query returns an ordered table of pages from Google Merchandise Store GA4 dataset. Pages are ordered from most to least viewed.
-- pulling user page views from GA4 events
WITH base_table AS (
SELECT
event_name,
event_date,
event_timestamp,
user_pseudo_id,
user_id,
device,
geo,
@khunreus
khunreus / bq_standard_page_power_user_agg.sql
Created August 3, 2022 18:51
this query performs aggregations to find Power Users on GA4 sample data from Google Merchandise Score
-- pulling user page views from GA4 events
WITH base_table AS (
SELECT
event_name,
event_date,
event_timestamp,
user_pseudo_id,
user_id,
device,
geo,
@khunreus
khunreus / bq_standard_page_views_standard_unnest_categorised.sql
Last active August 2, 2022 16:31
The query pulls page views on Google Merchandise Store from the GA4 public datasource. Pages are categorised as PLP, PDPs and the rest of pages to understand User Journeys without going into too much detail.
-- pulling user page views from GA4 events
WITH base_table AS (
-- pulls relevant columns from relevant dates to decrease the size of data scanned
SELECT
event_name,
event_date,
event_timestamp,
user_pseudo_id,
user_id,
device,
@khunreus
khunreus / bq_standard_page_title_mapping.sql
Created August 2, 2022 16:15
Mapping Google Merchandise Store GA4 page titles to categories such as PLP, PDP and rest
WITH base_table AS (
SELECT
event_name,
event_date,
event_timestamp,
user_pseudo_id,
user_id,
device,
geo,
traffic_source,
@khunreus
khunreus / bq_standard_page_views_standard_unnest.sql
Created July 30, 2022 10:29
The query pulls page_view event data from the Google BigQuery public dataset for ecommerce (GA4). Using standard UNNEST practice.
-- pulling user page views from GA4 events
WITH base_table AS (
SELECT
event_name,
event_date,
event_timestamp,
user_pseudo_id,
user_id,
device,
geo,
@khunreus
khunreus / bq_standard_page_views.sql
Created July 28, 2022 05:53
The query pulls page_view event data from the Google BigQuery public dataset for ecommerce (GA4). The implementation goes around using traditional UNNEST approach as it can cause missing rows when a CROSS JOIN is applied to a table with NULL values.
-- the query pulls page_view event data from the Google BigQuery public dataset for ecommerce (GA4)
-- the implementation with array aggregation is addressing an issue raised here https://stackoverflow.com/questions/44918108/google-bigquery-i-lost-null-row-when-using-unnest-function
-- stackoverflow thread is exploring a different solution
WITH base_table AS (
-- pulls relevant columns from relevant dates to decrease the size of data scanned
SELECT
event_name,
event_date,
event_timestamp,
"""
python3.6
writes a csv to MySQL database
"""
mydb = mysql.connector.connect(host = 'localhost',
user = 'root',
passwd = '****',
db = '****')
cursor = mydb.cursor()
"""
python 3.6
Scrapy + Selenium
"""
scrapy_selector = Selector(text = self.driver.page_source)
homes_selector = scrapy_selector.xpath('//*[@itemtype="http://schema.org/ListItem"]')
self.logger.info('Theres a total of ' + str(len(homes_selector)) + ' links.')
profile_urls_distinct = []
try:
s = 0
#reviews_dict = {}
reviews_list = []
sleep(2)
try:
sleep(4)
reviews_button = self.driver.find_element_by_xpath('//*[@class="_ff6jfq"]')
reviews_button.click()
k = 0
profile_scrapy_selector_1 = Selector(text = self.driver.page_source)
# reviewers = profile_scrapy_selector_1.xpath('//*[@id = "reviews"]//section/div[2]//*[@class="_hgs47m"]/div[2]/div[1]/div/div/text()').extract()
"""
python3.6
Scrapy + Selenium
"""
for profile_url in profile_urls_distinct:
self.logger.info('Home #' + str(q))
self.driver.get(profile_url)
q = q+1
sleep(10)
link_to_home = profile_url