
Xinfeng GD06

  • UC Santa Barbara
  • https://seal.ece.ucsb.edu/location
import argparse
from datetime import datetime
import numpy as np
import os
import logging
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
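
The preview above stops at the imports. As a rough, hedged sketch of how these pieces (torch.distributed, torch.multiprocessing, and DDP) typically fit together, and not this gist's actual training code, a minimal CPU-only setup using the gloo backend might look like this; the toy model, tensor sizes, and master address/port are placeholder assumptions.

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    # Placeholder rendezvous settings; gloo avoids assuming NCCL/GPUs are available.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    model = DDP(torch.nn.Linear(16, 1))  # DDP all-reduces gradients across ranks
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(10):
        loss = model(torch.randn(8, 16)).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2  # placeholder process count
    mp.spawn(worker, args=(world_size,), nprocs=world_size)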
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <memory.h>
#include <string.h>
#include <math.h>
void DataInit(float* ptr, int length)
{
    srand(7);
    /* The preview is truncated here; a plausible completion: fill the buffer with pseudo-random values. */
    for (int i = 0; i < length; ++i) {
        ptr[i] = (float)rand() / (float)RAND_MAX;
    }
}
#include <stdlib.h>
#include <stdio.h>
#include <mkldnn.h>
#include <sys/time.h>
#include <memory.h>
/* Error-checking macro for mkldnn calls (the truncated tail of the preview is completed here). */
#define CHECK(f) do { \
    mkldnn_status_t s = f; \
    if (s != mkldnn_success) { \
        printf("[%s:%d] error: %s returned %d\n", __FILE__, __LINE__, #f, s); \
        exit(2); \
    } \
} while (0)
@GD06
GD06 / Makefile
Last active August 10, 2017 04:48
Update the Makefile to use only the GNU thread library
MKLROOT=/home/security/intel/mkl
MKLDNNROOT=/home/security/.local
COMMON_FLAGS=-O4 -std=c++11
all: main.o mkldnn_conv.o im2col_mkl.o
	g++ $(COMMON_FLAGS) -o main $^ \
	-L ${MKLDNNROOT}/lib -lmkldnn -lmklml_intel \
	-Wl,--start-group \
	${MKLROOT}/lib/intel64/libmkl_intel_lp64.a \
	${MKLROOT}/lib/intel64/libmkl_gnu_thread.a \
@GD06
GD06 / nv-topo-matrix.txt
Created June 15, 2017 07:10
The GPU interconnect topology of a machine equipped with eight Titan X GPUs
        GPU0   GPU1   GPU2   GPU3   GPU4   GPU5   GPU6   GPU7   mlx4_0  CPU Affinity
GPU0    X      PIX    PHB    PHB    SOC    SOC    SOC    SOC    SOC     0-9,20-29
GPU1    PIX    X      PHB    PHB    SOC    SOC    SOC    SOC    SOC     0-9,20-29
GPU2    PHB    PHB    X      PIX    SOC    SOC    SOC    SOC    SOC     0-9,20-29
GPU3    PHB    PHB    PIX    X      SOC    SOC    SOC    SOC    SOC     0-9,20-29
GPU4    SOC    SOC    SOC    SOC    X      PIX    PHB    PHB    PHB     10-19,30-39
GPU5    SOC    SOC    SOC    SOC    PIX    X      PHB    PHB    PHB     10-19,30-39
GPU6    SOC    SOC    SOC    SOC    PHB    PHB    X      PIX    PHB     10-19,30-39
GPU7    SOC    SOC    SOC    SOC    PHB    PHB    PIX    X      PHB     10-19,30-39
mlx4_0  SOC    SOC    SOC    SOC    PHB    PHB    PHB    PHB    X
@GD06
GD06 / tf_multiGPU.py
Created June 15, 2017 06:11
A Python script to test the scalability of TensorFlow on a single machine equipped with multiple GPUs.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
import os
import argparse
import pwd
import re
import csv
import numpy as np
import pickle
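
Only the imports survive in this preview. As a hedged illustration of the kind of single-machine multi-GPU scaling measurement the description refers to, and not the gist's actual script, a minimal TensorFlow 1.x sketch could look like the following; the matmul workload, matrix size, iteration count, and GPU counts are placeholder assumptions.

import time
import tensorflow as tf

def scaling_benchmark(num_gpus, dim=4096, iters=50):
    # One independent matmul tower per GPU; throughput should scale with num_gpus.
    tf.reset_default_graph()
    towers = []
    for i in range(num_gpus):
        with tf.device('/gpu:%d' % i):
            a = tf.random_normal([dim, dim])
            b = tf.random_normal([dim, dim])
            towers.append(tf.reduce_sum(tf.matmul(a, b)))
    total = tf.add_n(towers)
    config = tf.ConfigProto(allow_soft_placement=True)
    with tf.Session(config=config) as sess:
        sess.run(total)  # warm-up
        start = time.time()
        for _ in range(iters):
            sess.run(total)
        return iters / (time.time() - start)  # steps per second

for n in (1, 2, 4, 8):
    print('%d GPU(s): %.2f steps/s' % (n, scaling_benchmark(n)))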
@GD06
GD06 / training.py
Created May 3, 2017 10:03
Worker 1 exits after producing its first output
'''
Distributed TensorFlow 0.8.0 example of using data parallelism and shared model parameters.
Trains a simple sigmoid neural network on MNIST for 20 epochs on three machines using one parameter server.
Replace the hardcoded host URLs below with your own hosts.
Run like this:
pc-01$ python example.py --job_name="ps" --task_index=0
pc-02$ python example.py --job_name="worker" --task_index=0
pc-03$ python example.py --job_name="worker" --task_index=1
@GD06
GD06 / training.py
Last active July 25, 2017 11:44
Distributed training script for the workers and the parameter server
'''
Distributed TensorFlow 0.8.0 example of using data parallelism and shared model parameters.
Trains a simple sigmoid neural network on MNIST for 20 epochs on three machines using one parameter server.
Replace the hardcoded host URLs below with your own hosts.
Run like this:
pc-01$ python example.py --job_name="ps" --task_index=0
pc-02$ python example.py --job_name="worker" --task_index=0
pc-03$ python example.py --job_name="worker" --task_index=1
'''
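
The launch commands above only show the flags. As a hedged sketch of the TensorFlow 0.8-style cluster boilerplate that such flags usually drive, rather than necessarily this gist's exact code, the parameter-server/worker setup could look like the following; the host addresses and model variables are placeholder assumptions.

import argparse
import tensorflow as tf

parser = argparse.ArgumentParser()
parser.add_argument('--job_name', choices=['ps', 'worker'])
parser.add_argument('--task_index', type=int, default=0)
FLAGS = parser.parse_args()

# Placeholder hosts; replace with the real pc-01/pc-02/pc-03 addresses.
cluster = tf.train.ClusterSpec({
    'ps': ['pc-01:2222'],
    'worker': ['pc-02:2222', 'pc-03:2222'],
})
server = tf.train.Server(cluster, job_name=FLAGS.job_name, task_index=FLAGS.task_index)

if FLAGS.job_name == 'ps':
    server.join()  # the parameter server only hosts variables
else:
    # Variables are placed on the ps job; compute ops stay on this worker.
    with tf.device(tf.train.replica_device_setter(
            worker_device='/job:worker/task:%d' % FLAGS.task_index,
            cluster=cluster)):
        weights = tf.Variable(tf.zeros([784, 10]))  # placeholder model parameters
        # ... build the model and training op here ...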
@GD06
GD06 / run.sh
Last active May 3, 2017 10:01
Run the distributed training on localhost across multiple processes
#!/bin/bash -e
# Start the parameter server on CPU only (no visible GPUs), in the background.
CUDA_VISIBLE_DEVICES='' python3 training.py --job_name="ps" --task_index=0 &
# Give the parameter server time to come up before launching the workers.
sleep 60
# Pin each worker to its own GPU and capture its stderr in a separate log file.
CUDA_VISIBLE_DEVICES='0' python3 training.py --job_name="worker" --task_index=0 2> worker_1_log &
CUDA_VISIBLE_DEVICES='1' python3 training.py --job_name="worker" --task_index=1 2> worker_2_log