@joshuarobinson
joshuarobinson / SimpleDownloader.ipynb
Last active April 25, 2019 10:15
Trivial example to illustrate how to use Spark to parallelize URL downloads.
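The notebook itself does not render on this page. The following is a minimal sketch of the idea only, not the notebook's code. It assumes a text file with one URL per line and an output directory mounted at the same path on every executor, and it spreads the list over the cluster with `mapPartitions`. The names `local_name`, `download_partition` and `run`, and the hash-based file naming, are hypothetical.

```python
import hashlib
import os
import urllib.parse
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple


def local_name(url: str) -> str:
    """Hypothetical naming scheme: hash the URL so names are unique and filesystem-safe."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:16]
    # Keep the extension from the URL path (ignoring any query string), if there is one.
    ext = os.path.splitext(urllib.parse.urlparse(url).path)[1] or ".bin"
    return digest + ext


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Fetch one URL with the standard library; raises on HTTP or network errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_partition(
    urls: Iterable[str],
    out_dir: str,
    fetch: Callable[[str], bytes] = fetch_url,
) -> Iterator[Tuple[str, bool]]:
    """Download every URL in one partition, yielding (url, succeeded) per URL."""
    os.makedirs(out_dir, exist_ok=True)
    for url in urls:
        url = url.strip()
        if not url:
            continue
        try:
            data = fetch(url)
            with open(os.path.join(out_dir, local_name(url)), "wb") as f:
                f.write(data)
            yield url, True
        except Exception:
            # Dead or slow links are common in large URL lists; record the failure and move on.
            yield url, False


def run(url_file: str, out_dir: str, num_partitions: int = 64) -> int:
    """Spark driver: spread the URL list over executors and count successful downloads."""
    from pyspark.sql import SparkSession  # lazy import so the helpers above work without Spark

    spark = SparkSession.builder.appName("parallel-url-download").getOrCreate()
    try:
        urls = spark.sparkContext.textFile(url_file).repartition(num_partitions)
        results = urls.mapPartitions(lambda part: download_partition(part, out_dir))
        return results.filter(lambda r: r[1]).count()
    finally:
        spark.stop()
```

Something like `run("urls.txt", "/mnt/shared/images")` would then report how many URLs succeeded. `mapPartitions` is used rather than `map` so that per-partition setup (here the `makedirs` call, or an HTTP session if you add one) runs once per partition instead of once per URL, and `repartition` controls how the list is split into tasks.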
@joshuarobinson
joshuarobinson / DownloadImagenet.ipynb
Last active October 26, 2022 03:05
Working PySpark notebook to retrieve imagenet URL list and parallelize downloads.
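This notebook does not render here either. One step worth sketching is turning the raw URL list into download tasks. The sketch assumes the tab-separated layout of the ImageNet fall 2011 URL list (`<wnid>_<index>`, a tab, then the URL); the helper names and the one-directory-per-synset output layout are hypothetical.

```python
import os
from typing import Iterable, Iterator, Optional, Tuple


def parse_imagenet_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Split '<wnid>_<index> TAB <url>' into (wnid, image_id, url); None if malformed."""
    parts = line.rstrip("\r\n").split("\t", 1)
    if len(parts) != 2:
        return None
    image_id, url = parts[0].strip(), parts[1].strip()
    if "_" not in image_id or not url.startswith(("http://", "https://")):
        return None
    wnid = image_id.split("_", 1)[0]  # synset id, e.g. "n01440764"
    return wnid, image_id, url


def to_tasks(lines: Iterable[str], out_root: str) -> Iterator[Tuple[str, str]]:
    """Turn raw list lines into (url, destination path) pairs, one directory per synset."""
    for line in lines:
        parsed = parse_imagenet_line(line)
        if parsed is None:
            continue  # skip blank or garbled lines instead of failing the whole job
        wnid, image_id, url = parsed
        yield url, os.path.join(out_root, wnid, image_id + ".jpg")
```

In PySpark these could plug in as `sc.textFile(list_path).mapPartitions(lambda lines: to_tasks(lines, out_root))`, with the resulting `(url, path)` pairs fed to a per-partition download loop. Skipping malformed lines rather than raising keeps one bad line from failing a job over a very large list.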
FROM openjdk:8-slim
RUN apt-get update && apt-get install -y curl python --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# Download and extract the Presto package.
ARG PRESTO_VER=0.221
RUN curl https://repo1.maven.org/maven2/com/facebook/presto/presto-server/$PRESTO_VER/presto-server-$PRESTO_VER.tar.gz \
| tar xvz -C /opt/ \
&& ln -s /opt/presto-server-$PRESTO_VER /opt/presto-server \
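The `RUN` step above streams the tarball from Maven Central straight into `tar`, then adds a version-independent `/opt/presto-server` symlink so the rest of the image does not depend on `PRESTO_VER`. A rough Python equivalent of that step, for illustration only: `install_versioned` is a hypothetical helper, and unlike a plain `ln -s` it swaps an existing link, so it can be re-run for a new version.

```python
import os
import tarfile
import urllib.request

# Same Maven Central layout the Dockerfile downloads from.
MAVEN_URL = ("https://repo1.maven.org/maven2/com/facebook/presto/presto-server/"
             "{ver}/presto-server-{ver}.tar.gz")


def install_versioned(url: str, dest: str, versioned_dir: str, link_name: str) -> str:
    """Stream a .tar.gz into dest, then point a stable symlink at the versioned directory."""
    os.makedirs(dest, exist_ok=True)
    with urllib.request.urlopen(url) as resp:
        # "r|gz" reads the archive as a forward-only stream, like `curl ... | tar xz`.
        with tarfile.open(fileobj=resp, mode="r|gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: refuse absolute paths, "..", device files, etc.
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
    link = os.path.join(dest, link_name)
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Build the new link beside the old one, then rename it into place in one step.
    os.symlink(os.path.join(dest, versioned_dir), tmp_link)
    os.replace(tmp_link, link)
    return link
```

Called as `install_versioned(MAVEN_URL.format(ver="0.221"), "/opt", "presto-server-0.221", "presto-server")`, it mirrors the Dockerfile's layout.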
FROM openjdk:8-slim
ARG HADOOP_VERSION=3.2.0
RUN apt-get update && apt-get install -y curl --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# Download and extract the Hadoop binary package.
RUN curl https://archive.apache.org/dist/hadoop/core/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz \
| tar xvz -C /opt/ \
<configuration>
  <property>
    <name>metastore.thrift.uris</name>
    <value>thrift://10.62.205.205:9083</value>
  </property>
  <property>
    <name>metastore.task.threads.always</name>
    <value>org.apache.hadoop.hive.metastore.events.EventCleanerTask</value>
  </property>
  <property>
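The preview is cut off, but the file follows the usual Hadoop `<configuration>`/`<property>` layout. A minimal sketch of reading it and pulling the metastore endpoint(s) out of `metastore.thrift.uris`, which a client such as Presto or Spark needs. Treating the value as a comma-separated list and defaulting to port 9083 when none is given are assumptions, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple
from urllib.parse import urlparse


def read_site_xml(text: str) -> Dict[str, str]:
    """Collect <property><name>/<value> pairs from a Hadoop-style configuration file."""
    props: Dict[str, str] = {}
    for prop in ET.fromstring(text).iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def metastore_endpoints(props: Dict[str, str], default_port: int = 9083) -> List[Tuple[str, int]]:
    """Split metastore.thrift.uris (assumed comma-separated thrift://host:port) into (host, port)."""
    endpoints = []
    for uri in props.get("metastore.thrift.uris", "").split(","):
        uri = uri.strip()
        if not uri:
            continue
        parsed = urlparse(uri)
        if parsed.scheme != "thrift" or not parsed.hostname:
            raise ValueError(f"not a thrift:// URI: {uri!r}")
        endpoints.append((parsed.hostname, parsed.port or default_port))
    return endpoints
```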
@joshuarobinson
joshuarobinson / purewatch.yaml
Last active June 16, 2020 23:37
Example Prometheus+Grafana Standalone with PureExporter
---
apiVersion: v1
kind: Service
metadata:
  name: purewatch
  labels:
    app: purewatch
spec:
  clusterIP: None
  ports:
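With `clusterIP: None` this Service is headless, so DNS for `purewatch` resolves to the backing pods' IPs directly. Once Prometheus is scraping the exporter, its HTTP API can be queried. The sketch below assumes Prometheus is reachable at `http://purewatch:9090` (9090 is Prometheus's default port; the manifest's port list is cut off) and uses the standard `/api/v1/query` instant-query endpoint; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], float]


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: dict) -> List[Sample]:
    """Turn an instant-vector response body into (labels, value) pairs."""
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector, got " + data["resultType"])
    # Each sample's "value" is [unix_timestamp, "value as a string"].
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def instant_query(base: str, promql: str, timeout: float = 5.0) -> List[Sample]:
    """Run one instant query (e.g. against the assumed http://purewatch:9090)."""
    with urllib.request.urlopen(query_url(base, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

For example, `instant_query("http://purewatch:9090", "up")` returns one `(labels, value)` pair per scrape target, with value 1.0 for targets that are up.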
{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
apiVersion: v1
kind: Service
metadata:
  name: confluent
  labels:
    app: confluent
spec:
  clusterIP: None
  ports:
  - name: kafka-port
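`clusterIP: None` again makes this a headless Service. If it governs a StatefulSet (an assumption; the rest of the manifest is not shown), each broker pod gets a stable DNS name of the form `<pod>.<service>.<namespace>.svc.cluster.local`, which is what Kafka clients need in `bootstrap.servers`. A small sketch that builds that list, assuming a StatefulSet also named `confluent`, the `default` namespace and Kafka's usual port 9092 for `kafka-port`; the helper names are hypothetical.

```python
from typing import List


def statefulset_pod_hosts(statefulset: str, service: str, replicas: int,
                          namespace: str = "default", domain: str = "cluster.local") -> List[str]:
    """Stable per-pod DNS names a headless Service gives a StatefulSet's pods."""
    return [f"{statefulset}-{i}.{service}.{namespace}.svc.{domain}" for i in range(replicas)]


def bootstrap_servers(statefulset: str, service: str, replicas: int,
                      port: int = 9092, namespace: str = "default") -> str:
    """Comma-separated host:port list in the form Kafka clients take for bootstrap.servers."""
    hosts = statefulset_pod_hosts(statefulset, service, replicas, namespace)
    return ",".join(f"{h}:{port}" for h in hosts)
```

For two replicas, `bootstrap_servers("confluent", "confluent", 2)` gives `confluent-0.confluent.default.svc.cluster.local:9092,confluent-1.confluent.default.svc.cluster.local:9092`.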