Watched this video: https://www.youtube.com/watch?v=W_Sp4jo1ACg
Installed Superset locally using these instructions, slightly tweaked for use with pipenv.
cd ~/learn
mkdir superset
cd superset
pipenv --three
pipenv install superset
#!/bin/bash

# Get start time in seconds since the epoch
start=$(date +%s)

# Run chatblade
output=$(chatblade -c 4 write a poem)

# Get end time in seconds since the epoch
end=$(date +%s)

# Report how long the chatblade call took
echo "Elapsed: $((end - start)) seconds"
cachetools==4.1.1
certifi==2020.6.20
chardet==3.0.4
google-api-core==1.21.0
google-auth==1.19.2
google-auth-oauthlib==0.4.1
google-cloud-bigquery==1.25.0
google-cloud-core==1.3.0
google-cloud-language==1.3.0
google-resumable-media==0.5.1
I had a really interesting journey today with a thorny little challenge: deleting all the files in an S3 bucket with tons of nested files.
The bucket path (s3://buffer-data/emr/logs/) contained log files created by ElasticMapReduce jobs that ran every day over a couple of years (from early 2015 to early 2018).
Each EMR job ran hourly every day, firing up a cluster of machines, and each machine would output its logs. That resulted in thousands of nested paths (one for each job), each containing thousands of other files. I estimated the total number of nested files to be between 5 and 10 million.
I had to estimate this number by looking at sample counts of some of the nested directories, because getting the true count would mean recursing through the whole S3 tree, which was just too slow. This is also exactly why it was challenging to delete all the files.
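The sampling estimate above can be sketched in a few lines. This is a rough illustration, not the actual script I used; the sample counts and prefix total below are made-up numbers, and `estimate_total` is a hypothetical helper (with boto3, the per-prefix counts would come from paginating `list_objects_v2` over a handful of job prefixes).

```python
def estimate_total(sample_counts, total_prefixes):
    """Extrapolate a total object count from counts of a few sampled prefixes."""
    average = sum(sample_counts) / len(sample_counts)
    return int(average * total_prefixes)

# Hypothetical numbers: three sampled job prefixes counted by hand,
# and a rough count of how many job prefixes exist overall.
print(estimate_total([2100, 1800, 2400], 3000))  # → 6300000
```

The point is that counting a few prefixes and multiplying is fast, while a full recursive listing of millions of keys is not.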
Deleting all the files under an S3 prefix like this is pretty challenging, since S3 doesn't really work like a true filesystem.
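A minimal sketch of one way to do a bulk delete, assuming boto3 is available and credentials are configured. `delete_prefix` is a hypothetical helper, not code from this post; the only hard constraint it encodes is that S3's `DeleteObjects` API takes at most 1000 keys per request, so keys have to be batched.

```python
def chunk(keys, size=1000):
    # S3's DeleteObjects call accepts at most 1000 keys per request
    for i in range(0, len(keys), size):
        yield keys[i:i + size]

def delete_prefix(bucket, prefix):
    # Requires boto3 and AWS credentials; not run here.
    import boto3
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        for batch in chunk(keys):
            s3.delete_objects(Bucket=bucket, Delete={"Objects": batch})
```

For a one-off cleanup at this scale, a bucket lifecycle expiration rule scoped to the prefix is another option: S3 then does the deletion itself in the background.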
import pandas as pd
from rsdf import redshift

table = """
<table border="1">
<thead valign="bottom"><tr class="row-odd"><th class="head">Code</th>
<th class="head">Text</th>
<th class="head">Description</th>
</tr></thead><tbody valign="top"><tr><td>3</td>
<td>Invalid coordinates.<br>
FROM python:3.6

ENV GRPC_PYTHON_VERSION 1.4.0
RUN python -m pip install --upgrade pip
RUN pip install grpcio==${GRPC_PYTHON_VERSION} grpcio-tools==${GRPC_PYTHON_VERSION}

COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

WORKDIR /usr/src/app
2017-07-05 20:50:38.320 +0000 [ERROR|6e766|] :: LookerSDK::InternalServerError : An error has occurred.
uri:classloader:/bundler/gems/looker-sdk-ruby-595320e261c6/lib/looker-sdk/response/raise_error.rb:15:in `on_complete'
uri:classloader:/gems/faraday-0.9.0/lib/faraday/response.rb:9:in `block in call'
uri:classloader:/gems/faraday-0.9.0/lib/faraday/response.rb:57:in `on_complete'
uri:classloader:/gems/faraday-0.9.0/lib/faraday/response.rb:8:in `call'
uri:classloader:/gems/faraday-0.9.0/lib/faraday/rack_builder.rb:139:in `build_response'
uri:classloader:/gems/faraday-0.9.0/lib/faraday/connection.rb:377:in `run_request'
uri:classloader:/gems/faraday-0.9.0/lib/faraday/connection.rb:140:in `delete'
uri:classloader:/gems/sawyer-0.6.0/lib/sawyer/agent.rb:94:in `call'
uri:classloader:/bundler/gems/looker-sdk-ruby-595320e261c6/lib/looker-sdk/client.rb:256:in `request'