@jacksonjos
Last active May 6, 2018 16:33
Repository with instructions to build a Docker image and create a container in which to compile and install the development version of the pandas Python module, in order to contribute to the pandas project

Instructions to contribute to pandas-dev Python module inside a Docker container

All commands are executed as root because the Docker image is not built with a regular Linux user. The only command that actually needs root permission is the chmod on the '/opt' directory, because root owns it. The other commands can be run without sudo.

Fork the pandas project on GitHub

1. Log in to github.com with your account

2. Go to the pandas project page: https://github.com/pandas-dev/pandas

3. Click the Fork button in the upper-right corner of the pandas project page

Create and build the Docker container for your pandas development repository

1. Create the directory on the host machine into which the pandas repository will be cloned
`$ mkdir pandas-jackson`

2. Build the Docker image (run this in the directory that contains the Dockerfile)
`# docker build -t pandas_dev_i --build-arg yourname=jackson .`

3. Create the Docker container, bind-mounting the pandas project directory on the host onto the pandas directory in the container  
`# docker run --name pandas_dev_c -v "$(pwd)/pandas-jackson":/pandas-jackson -i -t pandas_dev_i /bin/bash`
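
The directory name, the `yourname` build argument and the mount target all have to agree. As a rough sketch of how the three host-side commands above depend on one name, the helper below only prints them (a dry run) rather than executing them. The function name `print_setup_commands` is an assumption made for illustration and is not part of the original instructions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the host-side setup commands for a developer name.
# Nothing is executed; the output can be reviewed and pasted into a terminal.
print_setup_commands() {
    local name="$1"
    local dir="pandas-${name}"
    # Host directory that will hold the cloned repository
    printf 'mkdir -p %s\n' "$dir"
    # The Dockerfile creates /pandas-<yourname> from this build argument
    printf 'docker build -t pandas_dev_i --build-arg yourname=%s .\n' "$name"
    # Bind-mount the host directory onto the directory created in the image
    printf 'docker run --name pandas_dev_c -v "$(pwd)/%s":/%s -i -t pandas_dev_i /bin/bash\n' "$dir" "$dir"
}

print_setup_commands jackson
```

Calling it with another name, e.g. `print_setup_commands maria`, shows which parts of each command change.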

Clone the pandas development repository and configure Git inside the container

1. Clone Pandas development repository
`# git clone git@github.com:jacksonjos/pandas.git .`

2. Connect your repository to the upstream (main project) pandas repository
`# git remote add upstream https://github.com/pandas-dev/pandas.git`

3. Configure your e-mail and name so that your commits are identified on GitHub
`$ git config --global user.email "jacksonjsouza@gmail.com"`
`$ git config --global user.name "Jackson Souza"`
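
Git reads the committer address from the `user.email` key, so it is worth reading the values back after setting them. The sketch below does this in a throwaway repository with repository-local settings, so it does not touch any global configuration. The placeholder identity and the temporary directory are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: set identity and upstream remote in a scratch repository, then read them back.
tmp="$(mktemp -d)"
git init -q "$tmp/demo"

# Repository-local equivalents of the `git config --global` commands
git -C "$tmp/demo" config user.email "you@example.com"
git -C "$tmp/demo" config user.name "Your Name"
git -C "$tmp/demo" remote add upstream https://github.com/pandas-dev/pandas.git

# Read the values back to confirm that Git stored them under the expected keys
configured_email="$(git -C "$tmp/demo" config user.email)"
configured_name="$(git -C "$tmp/demo" config user.name)"
upstream_url="$(git -C "$tmp/demo" remote get-url upstream)"

echo "email:    $configured_email"
echo "name:     $configured_name"
echo "upstream: $upstream_url"

rm -rf "$tmp"
```

Inside the container, the same checks are `git config user.email` and `git remote -v` run from the cloned repository.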

Configure anaconda to use a virtual environment and to install pandas

1. To make the 'conda activate' command available, run the following
<!-- # rm /etc/profile.d/conda.sh && ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh -->
`# chmod +x /opt/conda/etc/profile.d/conda.sh`  
`# echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc`
<!-- `# echo "conda activate" >> ~/.bashrc` -->

2. Install the IPython interpreter and debugger to help debug pandas-dev errors
`# conda install ipython`

3. Create and activate the build environment
`# conda env create -f ci/environment-dev.yaml && source activate pandas-dev`

4. Build and install pandas
`# python setup.py build_ext --inplace -j 4 && python -m pip install -e .`

5. Install the rest of the optional dependencies
`# conda install -c defaults -c conda-forge --file=ci/requirements-optional-conda.txt`

Instructions to enter and exit the virtual environment and to stop, start and enter the container

1. To activate pandas-dev:
`# conda activate pandas-dev`  

2. To deactivate pandas-dev:
`# conda deactivate`  

3. To exit the container type:
`# exit`

4. To stop this container type:
`# docker stop pandas_dev_c`

5. To start this container type:
`# docker start pandas_dev_c`

6. To enter the running container type:
`# docker exec -ti pandas_dev_c /bin/bash`
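
`docker exec` only works on a running container, so after `docker stop` the container has to be started again before it can be entered. As a small sketch of that decision, the function below maps a container state (as reported by `docker inspect -f '{{.State.Status}}' pandas_dev_c`) to the matching command from the list above. The function name `command_for_state` is hypothetical, and it only prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the next command for a given container state.
command_for_state() {
    case "$1" in
        running)
            # A running container can be entered directly
            echo "docker exec -ti pandas_dev_c /bin/bash" ;;
        exited|created)
            # A stopped (or never started) container must be started first
            echo "docker start pandas_dev_c" ;;
        *)
            echo "unhandled state: $1" >&2
            return 1 ;;
    esac
}

command_for_state running
command_for_state exited
```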

Bring your Git repository up to date with the official/main pandas repository

1. Download changes made in the main/upstream repository:
`# git fetch upstream`

2. Check out the local master branch
`# git checkout master`

3. Merge the downloaded changes into the local master branch
`# git merge upstream/master`
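
To see what the three steps above do without touching a real fork, the sketch below replays them on two scratch repositories. The local directory `upstream` stands in for pandas-dev/pandas and `fork` for your clone. All paths, commit messages and the demo identity are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: fetch + checkout + merge against a local stand-in for the upstream repository.
work="$(mktemp -d)"

# Stand-in for the main project, with a master branch and one commit
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first upstream commit"

# Stand-in for your fork, connected to the main project as `upstream`
git clone -q "$work/upstream" "$work/fork"
git -C "$work/fork" remote add upstream "$work/upstream"

# The main project moves on after the fork was cloned
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "second upstream commit"

# The three steps from the list above
git -C "$work/fork" fetch -q upstream
git -C "$work/fork" checkout -q master
git -C "$work/fork" merge -q upstream/master

# After the merge both repositories point at the same commit
fork_head="$(git -C "$work/fork" rev-parse HEAD)"
upstream_head="$(git -C "$work/upstream" rev-parse HEAD)"
fork_commits="$(git -C "$work/fork" rev-list --count HEAD)"
echo "fork:     $fork_head"
echo "upstream: $upstream_head"

rm -rf "$work"
```

Because the local master has no commits of its own here, the merge is a fast-forward and creates no merge commit.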

Contributing through a branch

1. Commit the changes you have made

2. Tidy up your commits with an interactive rebase, replacing # with the number of commits to rewrite:
`git rebase -i HEAD~#`
Be careful when rebasing so that you do not lose your work

3. Update the Git repository as described in the section above

4. Check out the branch you are contributing to
E.g.: `git checkout shiny-new-feature`

5. Rebase your branch onto the master branch:
`git rebase -i master`

6. Push your changes
`git push origin shiny-new-feature -f`
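
One way to find the number to put in place of `#` is to count the commits that your branch has on top of master. The sketch below does this in a scratch repository. The repository, the commit messages and the wrapper function `g` are assumptions for illustration. It only prints the rebase command, because an interactive rebase needs an editor.

```shell
#!/usr/bin/env bash
# Sketch: count the commits on a feature branch to fill in `git rebase -i HEAD~#`.
work="$(mktemp -d)"
git init -q "$work/repo"

# Hypothetical wrapper so every call uses the scratch repository and a demo identity
g() { git -C "$work/repo" -c user.name=demo -c user.email=demo@example.com "$@"; }

g symbolic-ref HEAD refs/heads/master
g commit -q --allow-empty -m "base commit on master"

# Two commits on the feature branch
g checkout -q -b shiny-new-feature
g commit -q --allow-empty -m "step 1"
g commit -q --allow-empty -m "step 2"

# Commits reachable from HEAD but not from master
n="$(g rev-list --count master..HEAD)"
echo "git rebase -i HEAD~${n}"   # prints: git rebase -i HEAD~2

rm -rf "$work"
```

For the final push, `git push --force-with-lease origin shiny-new-feature` is a more cautious alternative to `-f`: it refuses to overwrite the remote branch if that branch has moved since you last fetched it.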

References:

[1] https://help.github.com/articles/syncing-a-fork/
[2] https://pandas.pydata.org/pandas-docs/stable/contributing.html#combining-commits

Dockerfile:

FROM continuumio/miniconda3:4.4.10
MAINTAINER Jackson Souza "jackson@ime.usp.br"
RUN echo "\ndeb-src http://archive.ubuntu.com/ubuntu/ xenial main" >> /etc/apt/sources.list
RUN apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 40976EAF437D05B5
RUN apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 3B4FE6ACC0B21F32
RUN apt-get update
# The python3.5 package is also required, but it is already installed in the
# miniconda3 image
RUN apt-get install -y git-core vim
# Install the dependencies needed to compile Python for development
RUN apt-get build-dep -y python3.5
ARG yourname
ENV yourname=${yourname}
RUN mkdir -p /pandas-${yourname}
WORKDIR /pandas-${yourname}
CMD ["/bin/bash"]