@dkapitan
Forked from alonisser/Dockerfile
Last active January 11, 2023 13:24
Dockerfile for spaCy Prodigy in a cloud setup with a remote PostgreSQL database, including a custom instructions file and an overridden index.html. This is the leanest image I've managed so far.
FROM python:3.6-alpine
# Opted for alpine to get as lean a Docker image as possible
RUN apk add --no-cache openssl
ENV DOCKERIZE_VERSION=v0.6.1
RUN wget https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-alpine-linux-amd64-$DOCKERIZE_VERSION.tar.gz \
&& tar -C /usr/local/bin -xzvf dockerize-alpine-linux-amd64-$DOCKERIZE_VERSION.tar.gz \
&& rm dockerize-alpine-linux-amd64-$DOCKERIZE_VERSION.tar.gz
# Python deps for alpine
RUN apk add --no-cache postgresql-libs && \
apk add --no-cache --virtual .build-deps gcc musl-dev postgresql-dev g++
RUN mkdir -pv /prodigy /prodigy/src
WORKDIR /prodigy
# The Prodigy wheel file is what you get when you buy Prodigy; it is not a free package.
# The trailing slash on the destination is required when the wildcard matches more than one file.
COPY ./*.whl /prodigy/
COPY requirements.txt /prodigy/
RUN pip install -r requirements.txt --no-cache-dir \
&& find /usr/local -depth \
\( \
\( -type d -a \( -name test -o -name tests \) \) \
-o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \
\) -exec rm -rf '{}' + \
&& runDeps="$( \
scanelf --needed --nobanner --recursive /usr/local \
| awk '{ gsub(/,/, "\nso:", $2); print "so:" $2 }' \
| sort -u \
| xargs -r apk info --installed \
| sort -u \
)" \
&& apk add --virtual .rundeps $runDeps \
&& apk del .build-deps
ENV PRODIGY_HOME=/prodigy
# Template that dockerize renders into the actual prodigy.json config, based on environment variables
COPY ./prodigy.json.tpl /prodigy/prodigy.json.tpl
# Prepare the instructions file
COPY ./instructions.txt /prodigy/instructions.txt
COPY *.sh /prodigy/
COPY src/* /prodigy/src/
# Comment out the next two instructions if you don't override the index.html file
COPY static/index.html index.html
RUN PRODIGY_FILES="$(python -c 'import os, prodigy; print(os.path.dirname(prodigy.__file__))')" \
&& cp index.html "$PRODIGY_FILES/static/"
# The actual entry point: dockerize renders the config template, then runs launch.sh
CMD ["dockerize", "-template", "/prodigy/prodigy.json.tpl:/prodigy/prodigy.json", "./launch.sh"]
EXPOSE 8080
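
The `runDeps` step in the Dockerfile above is probably its least obvious part: `scanelf` lists the shared libraries that each binary under `/usr/local` needs, and the `awk` program turns that comma-separated list into `so:`-prefixed names that `apk` can look up. Here is a rough Python sketch of just that text transformation, assuming `scanelf` prints one `TYPE NEEDED FILE` row per binary. The function name `needed_to_apk_names` and the sample rows are made up for illustration.

```python
def needed_to_apk_names(scanelf_output):
    """Mimic the awk + sort -u steps: one 'so:<library>' entry per needed library."""
    names = set()
    for line in scanelf_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        # The second column holds the comma-separated NEEDED entries (awk's $2).
        for lib in fields[1].split(","):
            names.add("so:" + lib)
    return sorted(names)


# Illustrative rows only, not captured from a real build.
sample = """\
ET_DYN libc.musl-x86_64.so.1,libpq.so.5 /usr/local/lib/python3.6/site-packages/psycopg2/_psycopg.so
ET_DYN libc.musl-x86_64.so.1,libstdc++.so.6 /usr/local/lib/python3.6/site-packages/example.so
"""
print(needed_to_apk_names(sample))
```

In the image, the resulting names are passed to `apk info --installed` and then registered under the `.rundeps` virtual package, so the runtime libraries should stay in place when `.build-deps` is deleted.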

For now we only need one Dockerfile, because the only difference between the two is `EXPOSE` and `CMD`. If we take the batch image as the default, we could launch the Prodigy server via `docker run -p 8080:8080 <image> python -m prodigy ctgov_titles tsr_worthy_dataset -F recipe.py`.

I'm getting an error when building the batch Dockerfile: an issue with the Prodigy wheel file. I don't know why, but I think the following is the best way to solve it:

- upgrade to Python 3.10-slim-buster
- since we are at it, upgrade to Prodigy 1.11.8
- include all Linux wheels in the Docker build process, just to be sure

At a later stage we could use Docker multi-stage builds to keep Docker images to a minimum.

Using `FROM python:3.10-slim-buster` will make the image size smaller (see this article). The full 3.10 image is almost 1 GB, the slim version around 165-200 MB.

{
  "batch_size": 5,
  "host": "0.0.0.0",
  "instructions": "/prodigy/instructions.txt",
  "hide_meta": true,
  "choice_auto_accept": true,
  "db": "postgresql",
  "db_settings": {
    "postgresql": {
      "host": "{{ .Env.DB_HOST }}",
      "dbname": "{{ .Env.DATABASE_NAME }}",
      "port": 5432,
      "user": "{{ .Env.DB_USERNAME }}",
      "password": "{{ .Env.DB_PASSWORD }}"
    }
  }
}
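
Before building the image, it can be handy to check that the template above still renders to valid JSON. The sketch below imitates, in plain Python, the one substitution this template relies on: replacing `{{ .Env.NAME }}` with the value of an environment variable. It is not dockerize itself, which uses Go's template engine and supports far more. Values are inserted verbatim, which is an assumption about how the rendered file ends up. The function name `render_env_template` and the sample values are hypothetical.

```python
import json
import re

# Matches the only placeholder form used in prodigy.json.tpl: {{ .Env.NAME }}
ENV_PLACEHOLDER = re.compile(r"\{\{\s*\.Env\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_env_template(template, env):
    """Replace each {{ .Env.NAME }} with env[NAME]; raise KeyError if NAME is missing."""
    return ENV_PLACEHOLDER.sub(lambda match: env[match.group(1)], template)


# Shortened copy of the database section of the template.
template = """
{
  "db": "postgresql",
  "db_settings": {
    "postgresql": {
      "host": "{{ .Env.DB_HOST }}",
      "dbname": "{{ .Env.DATABASE_NAME }}",
      "port": 5432,
      "user": "{{ .Env.DB_USERNAME }}",
      "password": "{{ .Env.DB_PASSWORD }}"
    }
  }
}
"""

# Made-up values; in the container these would come from os.environ.
sample_env = {
    "DB_HOST": "db.example.internal",
    "DATABASE_NAME": "prodigy",
    "DB_USERNAME": "annotator",
    "DB_PASSWORD": "not-a-real-password",
}

# json.loads raises an error if the rendered text is not valid JSON.
config = json.loads(render_env_template(template, sample_env))
print(config["db_settings"]["postgresql"]["host"])
```

Because nothing is escaped in this sketch, a value containing a double quote or a backslash would make `json.loads` fail, which is exactly the kind of problem worth catching before the container starts.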