@ibnesayeed
Created July 20, 2016 20:58
ArchiveSpark Docker image
FROM jupyter/notebook
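# MAINTAINER is deprecated in current Docker releases; a label carries the same information.
LABEL maintainer="Sawood Alam <ibnesayeed@gmail.com>"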
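# Install a Java runtime for Spark and drop the apt package lists to keep the layer small.
RUN apt-get update \
    && apt-get install -y default-jre \
    && rm -rf /var/lib/apt/lists/*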
# Download Spark 1.6.1 (prebuilt for Hadoop 2.6) and unpack it into /spark.
RUN curl -L -O http://mirrors.ocf.berkeley.edu/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz \
    && mkdir /spark \
    && tar -xf spark-1.6.1-bin-hadoop2.6.tgz --strip-components=1 -C /spark \
    && rm spark-1.6.1-bin-hadoop2.6.tgz
# Download the ArchiveSpark kernel and unpack it into the IPython kernels directory.
RUN curl -L -O http://l3s.de/~holzmann/archivespark-kernel.tar.gz \
    && mkdir -p /root/.ipython/kernels \
    && tar -xf archivespark-kernel.tar.gz -C /root/.ipython/kernels \
    && rm archivespark-kernel.tar.gz
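# Add the local kernel spec, the example notebook, and the cdx and warc paths from the build context.
ADD kernel.json /root/.ipython/kernels/archivespark/kernel.json
ADD example.ipynb /notebooks/
ADD cdx /cdx
ADD warc /warc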
CMD ["jupyter", "notebook", "--no-browser"]
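# A minimal usage sketch, under stated assumptions: the image tag "archivespark"
# is a hypothetical name, and publishing port 8888 assumes Jupyter's default port
# and a server that accepts connections from outside the container.
#   docker build -t archivespark .
#   docker run --rm -p 8888:8888 archivespark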