@jcrist
Last active May 6, 2019 17:29
Hadoop Pseudodistributed Skein debugging
Package: *
Pin: release o=Cloudera, l=Cloudera
Pin-Priority: 501
FROM ubuntu:xenial
RUN apt-get update && \
    apt-get install -y -q curl bzip2 git && \
    rm -rf /var/lib/apt/lists/*
# Install CDH5 in a single node: Pseudo Distributed
# Docs: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_qs_yarn_pseudo.html
ADD cloudera.pref /etc/apt/preferences.d/cloudera.pref
RUN curl -s https://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/archive.key | apt-key add - && \
    echo 'deb [arch=amd64] http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh xenial-cdh5 contrib' > /etc/apt/sources.list.d/cloudera.list && \
    echo 'deb-src http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh xenial-cdh5 contrib' >> /etc/apt/sources.list.d/cloudera.list && \
    apt-get update && \
    apt-get install -y -q sudo openjdk-8-jre-headless hadoop-conf-pseudo && \
    sudo -u hdfs hdfs namenode -format -force && \
    for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done && \
    bash /usr/lib/hadoop/libexec/init-hdfs.sh && \
    sudo -u hdfs hdfs dfs -mkdir /user/testuser && \
    sudo -u hdfs hdfs dfs -chown testuser /user/testuser && \
    rm -rf /var/lib/apt/lists/*
RUN useradd -m testuser
USER testuser
# Install conda & build conda environments:
RUN curl https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /tmp/miniconda.sh \
    && /bin/bash /tmp/miniconda.sh -b -p /home/testuser/miniconda \
    && echo 'export PATH="/home/testuser/miniconda/bin:$PATH"' >> /home/testuser/.bashrc \
    && rm /tmp/miniconda.sh
USER root
ADD start.sh /tmp/start.sh
CMD ["bash", "/tmp/start.sh", "-d"]
#!/bin/bash
# Start the HDFS and YARN daemons
sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-datanode start
sudo service hadoop-yarn-resourcemanager start
sudo service hadoop-yarn-nodemanager start
echo "HDFS Started"
if [[ $1 == "-d" ]]; then
    # Running as a daemon: block forever so the container stays alive
    sleep infinity
fi
import time
import skein

# Simulate a dask cluster without dask dependency
spec = skein.ApplicationSpec.from_yaml("""
name: test
services:
  scheduler:
    instances: 1
    resources:
      vcores: 1
      memory: 256
    script: sleep infinity
  worker:
    instances: 0
    resources:
      vcores: 1
      memory: 256
    depends:
      - scheduler
    script: sleep infinity
""")

client = skein.Client()
app = client.submit_and_connect(spec)
app.scale('worker', 2)

timeout = 60
while timeout >= 0:
    n = len(app.get_containers(services=["worker"], states=['RUNNING']))
    print("N workers: %d" % n)
    if n == 2:
        break
    time.sleep(2)
    timeout -= 2
else:
    print("FAILED TO GET WORKERS IN TIME")

app.shutdown()

jcrist commented May 6, 2019

To set up, copy all of the files into the same directory. From that directory:

# Build image
docker build -t debug .

# Start hadoop daemons
container_id=`docker run -d -v $(pwd):/home/testuser/workdir debug`

# Wait a bit for everything to start up (~30 seconds)

# Install skein
docker exec -u testuser $container_id /home/testuser/miniconda/bin/pip install skein

# Run the test
docker exec -u testuser $container_id /home/testuser/miniconda/bin/python /home/testuser/workdir/test.py

# Stop docker and remove container
docker stop $container_id
docker rm $container_id
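
The fixed "wait ~30 seconds" step above could instead be replaced by polling until the ResourceManager accepts connections. The following is only a sketch: `wait_for_port` is a hypothetical helper (not part of skein or Hadoop), and port 8032 is assumed from the `Connecting to ResourceManager at /0.0.0.0:8032` line in the log output. An open port does not guarantee that the NodeManager has registered yet, so this narrows the wait rather than eliminating it.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to (host, port) succeeds,
    or False if roughly `timeout` seconds pass without success."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Inside the container this might be called as `wait_for_port("localhost", 8032)` before submitting the application.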

On my machine the test outputs:

$ docker exec -u testuser $container_id /home/testuser/miniconda/bin/python /home/testuser/workdir/test.py
19/05/06 17:26:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/05/06 17:26:38 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
19/05/06 17:26:38 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/05/06 17:26:39 INFO skein.Driver: Driver started, listening on 34213
19/05/06 17:26:40 INFO skein.Driver: Uploading application resources to hdfs://localhost:8020/user/testuser/.skein/application_1557163543973_0001
19/05/06 17:26:42 INFO skein.Driver: Submitting application...
19/05/06 17:26:43 INFO impl.YarnClientImpl: Submitted application application_1557163543973_0001
/home/testuser/miniconda/lib/python3.7/site-packages/skein/exceptions.py:76: UserWarning: Skein global security credentials not found, writing now to '/home/testuser/.skein'.
  warnings.warn(msg)
N workers: 0
N workers: 1
N workers: 2
19/05/06 17:26:55 INFO skein.Driver: Driver shut down
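
For reuse across several debugging scripts, the polling loop in test.py could be factored into a small function. This is a sketch under assumptions: `wait_for_count` and its parameters are hypothetical names, and like the original loop it counts down a budget in `interval` steps rather than measuring wall-clock time. The `sleep` argument is injectable so the loop can be exercised without actually waiting.

```python
import time


def wait_for_count(get_count, target, timeout=60, interval=2, sleep=time.sleep):
    """Poll get_count() until it returns `target` or the budget runs out.

    Returns True if the target was reached, False otherwise.
    """
    remaining = timeout
    while remaining >= 0:
        n = get_count()
        print("N workers: %d" % n)
        if n == target:
            return True
        sleep(interval)
        remaining -= interval
    return False


# Possible use with the `app` object from test.py (requires a running cluster):
#
#   ok = wait_for_count(
#       lambda: len(app.get_containers(services=["worker"], states=["RUNNING"])),
#       target=2,
#   )
#   if not ok:
#       print("FAILED TO GET WORKERS IN TIME")
```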
