Mehdi Cherti mehdidc
import io
import tarfile
import random
from collections import defaultdict
from lxml import etree
import uuid
from PIL import Image, ImageDraw
from glob import glob
import time
import os
@mehdidc
mehdidc / pytorch_performance_profiling.md
Created February 16, 2023 08:44 — forked from mingfeima/pytorch_performance_profiling.md
How to do performance profiling on PyTorch

(Internal Training Material)

The first step in performance optimization is usually profiling, e.g. to identify the performance hotspots of a workload. This gist covers the basics of performance profiling on PyTorch. It answers the following questions:

  • How do I find the bottleneck operator?
  • How do I trace the source file of a particular operator?
  • How do I identify threading issues (oversubscription)?
  • How do I tell whether a specific operator is running efficiently?

This tutorial uses one of my recent projects, pssp-transformer, as an example to guide you through PyTorch CPU performance optimization. The focus is on Part 1 and Part 2.
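
As a rough illustration of the first question (finding the bottleneck operator), here is a minimal sketch using `torch.autograd.profiler`. The toy model, input shape, and the helper `profile_forward` are hypothetical stand-ins, not taken from pssp-transformer; only the profiler calls themselves are standard PyTorch.

```python
import torch
from torch.autograd import profiler

# Hypothetical toy model standing in for the real workload.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)
inputs = torch.randn(64, 256)


def profile_forward(model, inputs, steps=5):
    """Profile a few forward passes on CPU and return the profiler object."""
    model.eval()
    with torch.no_grad():
        model(inputs)  # warm-up pass, kept outside the profiled region
        with profiler.profile() as prof:
            for _ in range(steps):
                model(inputs)
    return prof


prof = profile_forward(model, inputs)

# Aggregate events by operator name and sort by time spent in the operator
# itself (excluding child operators); the top rows are bottleneck candidates.
summary = prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10)
print(summary)

# The intra-op thread count is a useful first check for oversubscription.
print("intra-op threads:", torch.get_num_threads())
```

Sorting by self CPU time rather than total CPU time keeps wrapper operators from masking the kernels that actually do the work.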

model_fullname,model_fullname_pretty,model_arch,samples_seen,gmacs_per_sample,gmacs_total,upstream_dataset,downstream_dataset,acc1,acc5,mean_per_class_recall,image_retrieval_recall@5,text_retrieval_recall@5
ViT-g-14 /fsx/rom1504/open_clip/good_models/g_90.pt,g/14 2B,ViT-g-14,12208147020,290.74,3549396664594.8003,LAION-2B,vtab+,0.5654112282297443,0.8329414582676622,0.56279878057792,,
ViT-g-14 /fsx/rom1504/open_clip/good_models/g_90.pt,g/14 2B,ViT-g-14,12208147020,290.74,3549396664594.8003,LAION-2B,vtab/caltech101,0.8522353714661407,0.963346482577252,0.944284654839904,,
ViT-g-14 /fsx/rom1504/open_clip/good_models/g_90.pt,g/14 2B,ViT-g-14,12208147020,290.74,3549396664594.8003,LAION-2B,imagenet1k,0.76664,0.9485,0.76656,,
ViT-g-14 /fsx/rom1504/open_clip/good_models/g_90.pt,g/14 2B,ViT-g-14,12208147020,290.74,3549396664594.8003,LAION-2B,vtab/cifar100,0.8391,0.9729,0.8388,,
ViT-g-14 /fsx/rom1504/open_clip/good_models/g_90.pt,g/14 2B,ViT-g-14,12208147020,290.74,3549396664594.8003,LAION-2B,imagenetv2,0.6961,0.9086,0.6
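
In the rows above, `gmacs_total` appears to be the product of `samples_seen` and `gmacs_per_sample`. This relationship is inferred from the values, not stated in the source. A small check using the numbers from the rows above:

```python
import math

# Values copied from the CSV rows above (identical across the listed rows).
samples_seen = 12_208_147_020
gmacs_per_sample = 290.74
gmacs_total_reported = 3549396664594.8003

# Assumption: total compute = samples seen x compute per sample.
gmacs_total = samples_seen * gmacs_per_sample

# Compare with a relative tolerance to avoid depending on float rounding.
matches = math.isclose(gmacs_total, gmacs_total_reported, rel_tol=1e-9)
print(gmacs_total, matches)
```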
mehdidc / example.sbatch
Created September 27, 2022 10:17
Content of the files
#!/bin/bash -x
#SBATCH --account=cstdl
#SBATCH --nodes=8
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=12
#SBATCH --wait-all-nodes=1
#SBATCH --time=00:30:00
#SBATCH --partition=batch
#SBATCH --job-name=open_clip
This file has been truncated, but you can view the full file.
{"info": {"description": "COCO 2014 Dataset", "url": "http://cocodataset.org", "version": "1.0", "year": 2014, "contributor": "COCO Consortium", "date_created": "2017/09/01"}, "images": [{"license": 3, "file_name": "COCO_val2014_000000391895.jpg", "coco_url": "http://images.cocodataset.org/val2014/COCO_val2014_000000391895.jpg", "height": 360, "width": 640, "date_captured": "2013-11-14 11:18:45", "flickr_url": "http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg", "id": 391895}, {"license": 4, "file_name": "COCO_val2014_000000060623.jpg", "coco_url": "http://images.cocodataset.org/val2014/COCO_val2014_000000060623.jpg", "height": 427, "width": 640, "date_captured": "2013-11-14 17:24:15", "flickr_url": "http://farm7.staticflickr.com/6080/6113512699_37b4c98473_z.jpg", "id": 60623}, {"license": 3, "file_name": "COCO_val2014_000000483108.jpg", "coco_url": "http://images.cocodataset.org/val2014/COCO_val2014_000000483108.jpg", "height": 640, "width": 428, "date_captured": "2013-11-14 18:27:53", "flickr_u
import matplotlib as mpl
mpl.use('Agg')
import argparse
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
def plot_scaling_and_efficiency(df):
"""
mehdidc / config.yaml
Created September 24, 2021 05:32
Config example with diversity
lr: 0.001
epochs: 10
noise_dim: 128
dim: 128
depth: 8
vq_image_size: 16
dropout: 0
cutn: 16
batch_size: 2
repeat: 8
This file has been truncated, but you can view the full file.
https://i.pinimg.com/736x/66/01/6c/66016c3ba27c0e04f39e2bd81a934e3e--anita-ekberg-bob-hope.jpg
http://www.standard.net/image/2015/02/04/800x_a16-9_b0_q81_p1/winter-fly-fishing.jpg
http://indianapolis-photos.funcityfinder.com/files/2009/12/Clearwater-Crossing-Shopping-Center-sign-Indianapolis-Indiana.jpg
http://www.abc.net.au/news/image/9066492-3x2-700x467.jpg
https://www.featurepics.com/StockImage/20090316/carrying-globe-stock-image-1115085.jpg
http://i.dailymail.co.uk/i/pix/2014/11/05/1415187324676_wps_31_Home_is_a_little_Deer_Ivy.jpg
http://www.waste360.com/sites/waste360.com/files/styles/article_featured_standard/public/Trista%2002%20007_0.jpg?itok=F1eJZsX3
https://media.gettyimages.com/photos/young-woman-seated-on-the-beach-picture-id97545987?s=612x612
https://worldjourneysdiscover.files.wordpress.com/2014/07/kyoto-07.jpg?w=860&h=645
http://piquemagazine.uk/wp-content/uploads/2017/10/LPO-24-Feb-Albrecht-Menzel-%C2%AE-Anne-Hornemann-300dpi.jpg
import os
from imageio import imread
import pandas as pd
import lmdb
from caffe2.proto import caffe2_pb2
# Folder should contain a set of images
image_folder = "flickr30k_images"
# CSV should contain image filenames with corresponding captions
dataframe_path = "flickr30k_images/results.csv"