@BHSPitMonkey
Last active August 29, 2015 13:56
A Dockerfile for DataStage (https://github.com/dataflow/DataStage). Now supports DataStage 0.4.0.
# DataStage
# for Docker 0.7 or newer
#
# This Dockerfile builds an image that runs DataStage.
#
# This performs a manual installation of DataStage from GitHub using a
# modified version of the install script found here:
# https://oxfile.ox.ac.uk/oxfile/work/extBoxNoJs?execution=e2s1
#
# How to use (assuming Docker is installed):
#
# # This builds the Docker image. Wait for the command to finish.
# sudo docker build -t datastage .
#
# # This creates a container based on the image and runs it in daemon mode,
# # binding http and ssh to ports 8888 and 2222 on the host machine.
# # The container name will be 'ds'. Change as desired.
# sudo docker run -d -p 8888:80 -p 2222:22 -v /srv/datastage:/srv/datastage:rw -v /tmp:/tmp:rw --name ds datastage
#
# # You should be able to SSH into the container now (pass: badpassword):
# ssh -p 2222 root@localhost
#
# # Once you're in, you should change the root password:
# passwd
#
# # And then use datastage-config to set up users, etc:
# datastage-config
#
# # You may need to stop/start the container for all changes to take effect.
#
# If at any point the container becomes stopped, you can start it again using:
#
# sudo docker start ds
#
# To stop the container, use:
#
# sudo docker stop ds
#
# Note that each "docker run" creates a fresh container from the image, so
# database changes made in an earlier container do not carry over. Use
# "docker start" to resume an existing container. (See docker.io for docs).
#
# VERSION 0.2
FROM ubuntu:12.04
# Install prerequisites
RUN echo "debconf debconf/frontend select Teletype" | debconf-set-selections
RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
# Update and install in one layer so a cached package index is never reused
RUN apt-get update && apt-get install -y \
    postgresql-8.4 openssh-server sudo git net-tools supervisor \
    python-setuptools python-all build-essential debhelper
RUN mkdir -p /var/run/sshd
RUN mkdir -p /var/log/supervisor
# Run installation scripts
ADD install-deps.sh /tmp/
RUN bash /tmp/install-deps.sh
ADD install-datastage.sh /tmp/
RUN bash /tmp/install-datastage.sh
# Configure datastage since it would have failed the first time
RUN /etc/init.d/postgresql start && dpkg-reconfigure dataflow-datastage
# Installation cleanup
RUN apt-get clean
# Set root password
RUN echo "root:badpassword" | chpasswd
# Remove default apache site so the user doesn't have to
RUN rm /etc/apache2/sites-enabled/000-default
# Inject the supervisor config
ADD supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# Specify what to do when starting this image
CMD /etc/init.d/postgresql start && /usr/bin/supervisord
# Expose the ports for SSH and HTTP to outside the container
EXPOSE 22 80
# Define external mount points
VOLUME ["/srv/datastage", "/tmp"]
# A little documentation for the end user
# (printf rather than "echo -e": RUN uses /bin/sh, whose echo may print "-e" literally)
RUN printf '\nRun datastage-config to set up DataStage.\n' >> /etc/motd
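The usage notes at the top of the Dockerfile hard-code the container name and the host ports. A minimal sketch of a helper that assembles the same `docker run` command with those three values as optional arguments is shown below. The function name `ds_run_cmd` is hypothetical and not part of this gist, and the helper only prints the command (a dry run) so it can be inspected before being executed by hand.

```shell
#!/bin/bash
# ds_run_cmd [NAME] [HTTP_PORT] [SSH_PORT]
# Hypothetical helper (not part of the gist): prints the "docker run"
# command from the Dockerfile header, with the container name and the
# host-side ports made configurable. It does not run Docker itself.
ds_run_cmd() {
    local name=${1:-ds}
    local http_port=${2:-8888}
    local ssh_port=${3:-2222}
    echo "sudo docker run -d -p ${http_port}:80 -p ${ssh_port}:22" \
         "-v /srv/datastage:/srv/datastage:rw -v /tmp:/tmp:rw" \
         "--name ${name} datastage"
}

# Defaults reproduce the command from the usage notes.
ds_run_cmd
# A second container on different host ports.
ds_run_cmd ds2 8080 2200
```

Printing rather than executing keeps the sketch safe to try on a machine without Docker; pipe the output to `sh` only after checking it.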
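#!/bin/bash
# install-datastage.sh: build the DataStage .deb from source and install it.
set -e

TMP_DIR=/tmp/datastage_install
mkdir -p "$TMP_DIR"
cd "$TMP_DIR"
git clone https://github.com/dataflow/DataStage.git
cd DataStage

# Remove bad dependencies (install-deps.sh installs them from source instead)
echo "***** Removing bad dependencies from control file *****"
for NAME in django-conneg django-longliving django-pam sword-client python-libmount; do
    sed -i "/${NAME}/d" debian/control
done

make debian-package
# Force the install despite the stripped dependencies
dpkg --force-all -i debian-package/*.deb

cd
rm -rf "$TMP_DIR"
# Let apt pull in any remaining dependencies
apt-get -y -f install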
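The dependency-stripping loop in the script above deletes every line of `debian/control` that contains one of the listed names. A minimal sketch of that behaviour on a made-up control fragment follows. The sample file contents and the function name `strip_deps` are assumptions for illustration only; this is not DataStage's real control file. It assumes GNU `sed`, as on Ubuntu.

```shell
#!/bin/bash
set -e

# strip_deps FILE NAME...
# Hypothetical helper: delete every line of FILE that contains any NAME,
# using the same sed expression as the install script.
strip_deps() {
    local file=$1
    shift
    local name
    for name in "$@"; do
        sed -i "/${name}/d" "$file"
    done
}

# Made-up control fragment, for illustration only.
CONTROL=$(mktemp)
cat > "$CONTROL" <<'EOF'
Package: example-package
Depends: python,
 python-django,
 python-django-conneg,
 python-django-pam,
 python-psycopg2
EOF

strip_deps "$CONTROL" django-conneg django-pam
RESULT=$(cat "$CONTROL")
rm -f "$CONTROL"

# Only the two matching lines are gone; the rest is untouched.
echo "$RESULT"
```

Because the match is a plain substring, a name such as `django-pam` also removes any line that merely contains it, so the approach relies on each dependency sitting on its own line.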
#!/bin/bash
# install-deps.sh: install from source the Python dependencies that
# install-datastage.sh strips from the DataStage control file.
set -e

TMP_DIR=/tmp/datastage_install
mkdir -p "$TMP_DIR"
cd "$TMP_DIR"

for REPO in \
    https://github.com/ox-it/python-libmount.git \
    https://github.com/ox-it/django-conneg.git \
    https://github.com/ox-it/django-longliving.git \
    https://github.com/dataflow/python-client-sword2.git \
    https://github.com/tehmaze/django-pam.git
do
    git clone "$REPO"
    # The clone directory is the repository name without the .git suffix
    (cd "$(basename "$REPO" .git)" && python setup.py install)
done

cd
rm -rf "$TMP_DIR"
; supervisord.conf: runs sshd, Apache and the DataStage server in the foreground
[supervisord]
; Stay in the foreground so the container keeps running
nodaemon=true

[program:sshd]
command=/usr/sbin/sshd -D

[program:apache2]
command=/bin/bash -c "source /etc/apache2/envvars && exec /usr/sbin/apache2 -DFOREGROUND"

[program:datastage]
command=/bin/bash -c "rm -f /var/run/datastage.pid && datastage-server start --no-daemonize"

@cffgan

cffgan commented Nov 11, 2014

Hi

I'm trying to install DataStage onto a clean VirtualBox system: Ubuntu 12.04 LTS with Linux kernel upgraded from 3.2 (tag: precise) to 3.8 (tag: raring) as suggested by Docker (see Docker installation instructions for Ubuntu 12.04 LTS -> http://docs.docker.com/installation/ubuntulinux/#ubuntu-precise-1204-lts-64-bit)

The installation failed with the following error:

The following packages have unmet dependencies:
 build-essential : Depends: libc6-dev but it is not going to be installed or
                            libc-dev
                   Depends: g++ (>= 4:4.4.3) but it is not going to be installed
                   Depends: dpkg-dev (>= 1.13.5) but it is not going to be installed
 debhelper : Depends: perl but it is not going to be installed
             Depends: dpkg-dev (>= 1.16.0) but it is not going to be installed
             Depends: po-debconf but it is not going to be installed
             Depends: dh-apparmor but it is not going to be installed
 git : Depends: perl-modules but it is not going to be installed
       Depends: liberror-perl but it is not going to be installed
 python-all : Depends: python (= 2.7.3-0ubuntu2) but it is not going to be installed
              Depends: python2.7 (>= 2.7.2-3) but it is not going to be installed
 python-setuptools : Depends: python (>= 2.6) but it is not going to be installed
                     Depends: python (< 2.8) but it is not going to be installed
                     Depends: python-pkg-resources (= 0.6.24-1ubuntu1) but it is not going to be installed
 supervisor : Depends: python (>= 2.3) but it is not going to be installed
              Depends: python-medusa (>= 0.5.4) but it is not going to be installed
              Depends: python-meld3 but it is not going to be installed
              Depends: python-pkg-resources (>= 0.6c7) but it is not going to be installed
              Depends: python-support (>= 0.90.0) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
2014/11/11 09:25:13 The command [/bin/sh -c apt-get install -y postgresql-8.4 openssh-server sudo git net-tools supervisor python-setuptools python-all build-essential debhelper] returned a non-zero code: 100

My guess is that it is caused by the different tags (precise vs. raring), but I'm unsure how to resolve this. Any ideas?

Thanks in advance.

gan

@cffgan

cffgan commented Nov 12, 2014

Apologies, just realised this Dockerfile was tested only on EC2.