@BHSPitMonkey
Last active August 29, 2015 13:56
A Dockerfile for DataStage (https://github.com/dataflow/DataStage). Now supports DataStage 0.4.0.
# DataStage
# for Docker 0.7 or newer
#
# This Dockerfile builds an image that runs DataStage.
#
# This performs a manual installation of DataStage from GitHub using a
# modified version of the install script found here:
# https://oxfile.ox.ac.uk/oxfile/work/extBoxNoJs?execution=e2s1
#
# How to use (assuming Docker is installed):
#
# # This builds the Docker image. Wait for the command to finish.
# sudo docker build -t datastage .
#
# # This creates a container based on the image and runs it in daemon mode,
# # binding http and ssh to ports 8888 and 2222 on the host machine.
# # The container name will be 'ds'. Change as desired.
# sudo docker run -d -p 8888:80 -p 2222:22 -v /srv/datastage:/srv/datastage:rw -v /tmp:/tmp:rw --name ds datastage
#
# # You should be able to SSH into the container now (pass: badpassword):
# ssh -p 2222 root@localhost
#
# # Once you're in, you should change the root password:
# passwd
#
# # And then use datastage-config to set up users, etc:
# datastage-config
#
# # You may need to stop/start the container for all changes to take effect.
#
# If at any point the container becomes stopped, you can start it again using:
#
# sudo docker start ds
#
# To stop the container, use:
#
# sudo docker stop ds
#
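# Note that each "docker run" creates a fresh container from this image, so
# database changes made in an earlier container are not carried over to the
# new one. Use "docker start ds" to resume an existing container instead.
# (See docker.io for docs.)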
#
# VERSION 0.2
FROM ubuntu:12.04
# Install prerequisites
RUN echo "debconf debconf/frontend select Teletype" | debconf-set-selections
RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN apt-get update && apt-get install -y postgresql-8.4 openssh-server sudo git net-tools supervisor python-setuptools python-all build-essential debhelper
RUN mkdir -p /var/run/sshd
RUN mkdir -p /var/log/supervisor
# Run installation scripts
ADD install-deps.sh /tmp/
RUN bash /tmp/install-deps.sh
ADD install-datastage.sh /tmp/
RUN bash /tmp/install-datastage.sh
# Reconfigure DataStage with PostgreSQL running, since the package's configure step would have failed during installation
RUN /etc/init.d/postgresql start && dpkg-reconfigure dataflow-datastage
# Installation cleanup
RUN apt-get clean
# Set root password
RUN echo "root:badpassword" | chpasswd
# Remove default apache site so the user doesn't have to
RUN rm /etc/apache2/sites-enabled/000-default
# Inject the supervisor config
ADD supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# Specify what to do when starting this image
CMD /etc/init.d/postgresql start && /usr/bin/supervisord
# Expose the ports for SSH and HTTP to outside the container
EXPOSE 22 80
# Define external mount points
VOLUME ["/srv/datastage", "/tmp"]
# A little documentation for the end user
RUN printf '\nRun datastage-config to set up DataStage.\n' >> /etc/motd
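The usage notes in the Dockerfile header go straight from `docker run` to `ssh -p 2222 root@localhost`. On a slow host, sshd inside the container might not be accepting connections yet. The following is a minimal sketch of a host-side wait loop, assuming bash is available on the host. The helper names `port_open` and `wait_for_port` are hypothetical and not part of the gist.

```shell
#!/bin/bash
# port_open HOST PORT: succeed if a TCP connection can be opened.
# Relies on bash's /dev/tcp redirection, so it needs bash, not plain sh.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# wait_for_port HOST PORT [TRIES]: poll once per second, up to TRIES times (default 30).
wait_for_port() {
    local host=$1 port=$2 tries=${3:-30}
    local i=0
    while [ "$i" -lt "$tries" ]; do
        if port_open "$host" "$port"; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Usage, with the port mapping from the header (host 2222 -> container 22):
# wait_for_port localhost 2222 && ssh -p 2222 root@localhost
```

The connection attempt runs in a subshell so that the temporary file descriptor is closed as soon as the check finishes.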
#!/bin/bash
# install-datastage.sh: builds the DataStage Debian package from GitHub and installs it
set -e
TMP_DIR=/tmp/datastage_install
mkdir -p $TMP_DIR
cd $TMP_DIR
git clone https://github.com/dataflow/DataStage.git
cd DataStage
# Remove bad dependencies
echo "***** Removing bad dependencies from control file *****"
for NAME in django-conneg django-longliving django-pam sword-client python-libmount; do
    sed -i "/${NAME}/d" debian/control
done
make debian-package
dpkg --force-all -i debian-package/*.deb
cd
rm -rf $TMP_DIR
apt-get -y -f install
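The `sed` loop in the script above deletes every line of `debian/control` that mentions one of the listed names. Below is a small sketch of that behaviour on a throwaway file. It assumes GNU sed (for `-i` without a suffix), the sample control content is invented for illustration, and `strip_deps` is a hypothetical helper name.

```shell
#!/bin/bash
set -e

# strip_deps FILE NAME...: delete every line of FILE that mentions one of the NAMEs.
# (Hypothetical helper; the original script inlines this loop.)
strip_deps() {
    local file=$1
    shift
    local name
    for name in "$@"; do
        sed -i "/${name}/d" "$file"
    done
}

# Made-up sample standing in for debian/control; the real file will differ.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
Package: dataflow-datastage
Depends: python,
 python-django,
 django-conneg,
 django-pam,
 python-libmount
EOF

strip_deps "$SAMPLE" django-conneg django-longliving django-pam sword-client python-libmount
cat "$SAMPLE"
rm -f "$SAMPLE"
```

Because `sed` removes whole lines, this approach only works cleanly when each dependency sits on its own line. A line that lists several packages would be dropped entirely if any one of them matched.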
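#!/bin/bash
# install-deps.sh: installs the Python dependencies that the DataStage package needs, straight from Git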
set -e
TMP_DIR=/tmp/datastage_install
mkdir -p $TMP_DIR
cd $TMP_DIR
git clone https://github.com/ox-it/python-libmount.git
cd python-libmount
python setup.py install
cd ..
git clone https://github.com/ox-it/django-conneg.git
cd django-conneg
python setup.py install
cd ..
git clone https://github.com/ox-it/django-longliving.git
cd django-longliving
python setup.py install
cd ..
git clone https://github.com/dataflow/python-client-sword2.git
cd python-client-sword2
python setup.py install
cd ..
git clone https://github.com/tehmaze/django-pam.git
cd django-pam
python setup.py install
cd ..
cd
rm -rf $TMP_DIR
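The script above repeats the same clone, install, and step-back sequence for each repository. As a sketch only, the sequence could be factored into a function. The names `repo_dir` and `install_from_git` are hypothetical, and the code assumes each repository's directory name is the last URL component without `.git`, which is how `git clone` names it by default.

```shell
#!/bin/bash
set -e

# repo_dir URL: print the directory name that "git clone URL" creates by default.
repo_dir() {
    basename "$1" .git
}

# install_from_git URL: clone the repository and run its setup.py.
install_from_git() {
    local url=$1
    local dir
    dir=$(repo_dir "$url")
    git clone "$url"
    # Subshell so the caller's working directory is left untouched.
    (cd "$dir" && python setup.py install)
}

# Usage (not run here, since it needs network access and Python):
# for url in \
#     https://github.com/ox-it/python-libmount.git \
#     https://github.com/ox-it/django-conneg.git; do
#     install_from_git "$url"
# done
```

With `set -e` in effect, a failed clone or install stops the script at the first broken repository, as it does in the original.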
; supervisord.conf: keeps sshd, Apache and the DataStage server running in the foreground
[supervisord]
nodaemon=true
[program:sshd]
command=/usr/sbin/sshd -D
[program:apache2]
command=/bin/bash -c "source /etc/apache2/envvars && exec /usr/sbin/apache2 -DFOREGROUND"
[program:datastage]
command=/bin/bash -c "rm -f /var/run/datastage.pid && exec datastage-server start --no-daemonize"

cffgan commented Nov 12, 2014

Apologies, just realised this Dockerfile was tested only on EC2.
