@yspkm
Last active January 18, 2024 06:52
NAS-related notes

PyTorch and NVIDIA CUDA Issue Resolution

Overview

With CUDA 12.3 and the nvidia-545 driver, PyTorch fails to find the cuDNN headers.

Recommended Installation Method

One option is to install the CUDA 11.8 build alongside the existing setup and have PyTorch use it:

pip3 install torch torchvision torchaudio torchtext --index-url https://download.pytorch.org/whl/cu118
# For reference: pip3 install torchtext portal
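
After installing the cu118 wheels, it can help to confirm which CUDA and cuDNN versions PyTorch actually picked up. The following is a minimal sketch, not part of the original notes; the function name `cuda_report` is hypothetical, and the script only reports what `torch` exposes without fixing anything.

```python
import importlib.util


def cuda_report():
    """Return a dict describing the CUDA/cuDNN stack PyTorch reports.

    Hypothetical helper for illustration; it is not part of PyTorch.
    """
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are visible at runtime.
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        # Integer cuDNN version, or None when cuDNN is unavailable.
        "cudnn_version": torch.backends.cudnn.version(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` does not show the CUDA version you expected from the index URL, the wheel that got installed is probably not the one you intended.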

The alternative is to run training inside a Docker container:

# Dockerfile
#FROM pytorch/pytorch:2.1.2-cuda11.8-cudnn8-runtime
FROM pytorch/pytorch:2.1.2-cuda12.1-cudnn8-runtime
# Remove any third-party apt sources to avoid issues with expiring keys.
RUN rm -f /etc/apt/sources.list.d/*.list
# Install some basic utilities & python prerequisites
RUN apt-get update -y && apt-get install -y --no-install-recommends \
wget \
vim \
curl \
ssh \
tree \
sudo \
git \
libgl1-mesa-glx \
libglib2.0-0 \
zip && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Set up time zone
ENV TZ=Asia/Seoul
RUN sudo ln -snf /usr/share/zoneinfo/$TZ /etc/localtime
# pip
RUN python -m pip install --upgrade pip
WORKDIR /app
COPY . .

# Dataset download script (shell; run separately from the Dockerfile above)
echo "=== Acquiring datasets ==="
echo "---"
mkdir -p save
mkdir -p data
cd data
echo "- Downloading WikiText-2 (WT2)"
wget --quiet --continue https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip
unzip -q wikitext-2-v1.zip
cd wikitext-2
mv wiki.train.tokens train.txt
mv wiki.valid.tokens valid.txt
mv wiki.test.tokens test.txt
cd ..
echo "- Downloading WikiText-103 (WT103)"
wget --continue https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip
unzip -q wikitext-103-v1.zip
cd wikitext-103
mv wiki.train.tokens train.txt
mv wiki.valid.tokens valid.txt
mv wiki.test.tokens test.txt
cd ..
echo "- Downloading enwik8 (Character)"
mkdir -p enwik8
cd enwik8
wget --continue http://mattmahoney.net/dc/enwik8.zip
python prep_enwik8.py
cd ..
echo "- Downloading Penn Treebank (PTB)"
wget --quiet --continue http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
tar -xzf simple-examples.tgz
mkdir -p penn
cd penn
mv ../simple-examples/data/ptb.train.txt train.txt
mv ../simple-examples/data/ptb.test.txt test.txt
mv ../simple-examples/data/ptb.valid.txt valid.txt
cd ..
echo "- Downloading Penn Treebank (Character)"
mkdir -p pennchar
cd pennchar
mv ../simple-examples/data/ptb.char.train.txt train.txt
mv ../simple-examples/data/ptb.char.test.txt test.txt
mv ../simple-examples/data/ptb.char.valid.txt valid.txt
cd ..
rm -rf simple-examples/
echo "---"
echo "Happy language modeling :)"
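
Since most downloads in the script run with `--quiet`, a failed download can go unnoticed. As a rough sketch (not part of the original script), the check below lists any of the renamed split files that are missing. The helper name `missing_dataset_files` is hypothetical, and enwik8 is left out because the output of `prep_enwik8.py` is not shown here.

```python
from pathlib import Path

# Directories and split names taken from the mv commands in the download script.
EXPECTED_DIRS = ("wikitext-2", "wikitext-103", "penn", "pennchar")
SPLITS = ("train.txt", "valid.txt", "test.txt")


def missing_dataset_files(data_root="data"):
    """Return the expected dataset files that do not exist under data_root."""
    root = Path(data_root)
    return [
        str(root / dataset / split)
        for dataset in EXPECTED_DIRS
        for split in SPLITS
        if not (root / dataset / split).is_file()
    ]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected dataset files are present.")
```

Run it from the directory that contains `data/`, or pass another root, for example `missing_dataset_files("/app/data")`.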

Pull the images in advance:

docker pull pytorch/pytorch:2.1.2-cuda12.1-cudnn8-runtime 
docker pull pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel 

Build the attached Dockerfile and start a container:

docker image build . -t <image>:<tag>
docker container run --name <container> -d -i -t --rm --runtime=nvidia --gpus device=0 <image>:<tag> # device=0 selects GPU 0; to use every GPU, pass --gpus all

Check that training runs correctly inside the container:

root@3af30ec07621:/app# ls
Dockerfile  LICENSE  README.md  cnn  data  img  packages.txt  rnn  test  venv
root@3af30ec07621:/app# cd cnn
root@3af30ec07621:/app/cnn# python3 train_search.py --unrolled --batch_size 32
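
Before launching a long search run, a quick smoke test inside the container can show whether a convolution executes on the GPU at all. This is a minimal sketch under stated assumptions: the function name `cudnn_smoke_test` is hypothetical, and it assumes cuDNN is enabled in PyTorch (the default), in which case CUDA convolutions are normally dispatched to cuDNN.

```python
def cudnn_smoke_test():
    """Run one small convolution on the GPU and return a status string.

    Hypothetical helper for illustration; it is not part of PyTorch.
    """
    try:
        import torch
    except ImportError:
        return "torch not installed"

    if not torch.cuda.is_available():
        return "no CUDA device visible"

    # A small conv layer and input batch, both placed on the default GPU.
    conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1).cuda()
    x = torch.randn(4, 3, 32, 32, device="cuda")

    with torch.no_grad():
        y = conv(x)
    # Wait for the kernel to finish so that any CUDA error surfaces here.
    torch.cuda.synchronize()

    name = torch.cuda.get_device_name(0)
    return f"ok: output shape {tuple(y.shape)} on {name}"


if __name__ == "__main__":
    print(cudnn_smoke_test())
```

If the call raises an error instead of returning a status string, the problem most likely lies in the CUDA or cuDNN setup and not in the training script.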