@m-klasen
Last active November 22, 2023 19:48

Get pretrained weights:

wget https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth

Remove the class-head weights:

import torch

checkpoint = torch.load("detr-r50-e632da11.pth", map_location='cpu')
del checkpoint["model"]["class_embed.weight"]
del checkpoint["model"]["class_embed.bias"]
torch.save(checkpoint, "detr-r50_no-class-head.pth")

and make sure to use non-strict weight loading in main.py:

model_without_ddp.load_state_dict(checkpoint['model'], strict=False)
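
With strict=False, load_state_dict returns the keys that did not match, which gives a quick sanity check that only the class head was dropped. A minimal sketch, using the same names as in main.py:

missing, unexpected = model_without_ddp.load_state_dict(checkpoint['model'], strict=False)
print('missing keys:', missing)        # expect only class_embed.weight / class_embed.bias
print('unexpected keys:', unexpected)  # expect an empty list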

Your dataset should ideally be in the COCO format. Write your own dataset builder (or alternatively rename your train/valid image folders and annotation files to match the COCO dataset layout). In datasets/coco.py add:

def build_your_dataset(image_set, args):
    root = Path(args.coco_path)
    assert root.exists(), f'provided COCO path {root} does not exist'
    mode = 'instances'
    PATHS = {
        "train": (root / "train", root / "annotations" / f'train.json'),
        "val": (root / "valid", root / "annotations" / f'valid.json'),
    }

    img_folder, ann_file = PATHS[image_set]
    dataset = CocoDetection(img_folder, ann_file, transforms=make_coco_transforms(image_set), return_masks=args.masks)
    return dataset

In datasets/__init__.py add your builder as an option:

def build_dataset(image_set, args):
    if args.dataset_file == 'coco':
        return build_coco(image_set, args)
    if args.dataset_file == 'your_dataset':
        return build_your_dataset(image_set, args)
    [...]

And lastly, define how many classes you have in models/detr.py (in DETR, num_classes is conventionally set to the maximum class id in your annotations plus one; the no-object class is added internally):

def build(args):
    [...]
    if args.dataset_file == 'your_dataset': num_classes = 4
    [...]

Run your model (example):

python main.py --dataset_file your_dataset --coco_path data --epochs 50 --lr=1e-4 --batch_size=2 --num_workers=4 --output_dir="outputs" --resume="detr-r50_no-class-head.pth"

@quangdaist01

> Hi @dsshean, I have done the same as mentioned above and I am still getting a single bounding box in the same position for all images. I trained the model on a custom dataset with a class label of 1. Have you trained on a custom dataset and then used the same colab notebook example for inference?

I've been encountering the same issue for a while. Yesterday I finally figured out the correct way to load my custom DETR model, just in time for my project deadline later that day :))). Here is the colab notebook with some impressive results. The DETR class I use to load all matched keys lies in hubconf.py in the original detr repository. Hope this helps.
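
For reference, a minimal sketch of that loading route, assuming the detr_resnet50 entrypoint in hubconf.py accepts a num_classes override; the class count and checkpoint path below are placeholders:

import torch

# rebuild DETR through the repo's hubconf.py entrypoint with a custom head
# (num_classes=4 and the checkpoint path are placeholders for your own setup)
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50',
                       pretrained=False, num_classes=4)
checkpoint = torch.load('outputs/checkpoint.pth', map_location='cpu')
model.load_state_dict(checkpoint['model'])
model.eval()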

@zobeirraisi

> Hi @dsshean, I have done the same as mentioned above and I am still getting a single bounding box in the same position for all images. I trained the model on a custom dataset with a class label of 1. Have you trained on a custom dataset and then used the same colab notebook example for inference?
>
> I've been encountering the same issue for a while. Yesterday I finally figured out the correct way to load my custom DETR model, just in time for my project deadline later that day :))). Here is the colab notebook with some impressive results. The DETR class I use to load all matched keys lies in hubconf.py in the original detr repository. Hope this helps.

I saw your notebook, thanks for sharing it. I am wondering where I can find the datasets used in the notebook?

@rsharmapty

@zobeirraisi
Can you please mention the number of epochs you trained for and the dataset size? I see 100 predictions with the same value, none scoring more than 0.2.

I trained on a dataset of ~2k images for 300 epochs.

@quangdaist01

@zobeirraisi
@rsharmapty
I've updated the model and dataset links in the notebook; you can download them and try it yourself. I trained for 200 epochs, but it seemed to plateau around the 70th or 80th epoch due to the small dataset size.

@rsharmapty

rsharmapty commented Jul 3, 2020

@quangdaist123
Can you please share some information on how I can visualize the training graphs?

@dsshean

dsshean commented Jul 4, 2020

I did about 400 epochs on ~10 GB of data with one class. I have some weird quirks with the loss, but at the end of the day, loading the model with the colab notebook above or the official detr colab notebook gets you predictions.

As for visualizing the graphs, import https://github.com/facebookresearch/detr/blob/master/util/plot_utils.py

There are only two functions; use either to plot against the output dir.
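
For instance, a minimal sketch of calling plot_logs from that file; the 'outputs' directory is a placeholder for your own --output_dir and must contain the log.txt written during training:

from pathlib import Path
from util.plot_utils import plot_logs

# each directory passed in should contain the log.txt written by main.py
log_dirs = [Path('outputs')]
plot_logs(log_dirs, fields=('class_error', 'loss_bbox_unscaled', 'mAP'))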

@Haresh-16

> Hi @dsshean, I have done the same as mentioned above and I am still getting a single bounding box in the same position for all images. I trained the model on a custom dataset with a class label of 1. Have you trained on a custom dataset and then used the same colab notebook example for inference?
>
> I've been encountering the same issue for a while. Yesterday I finally figured out the correct way to load my custom DETR model, just in time for my project deadline later that day :))). Here is the colab notebook with some impressive results. The DETR class I use to load all matched keys lies in hubconf.py in the original detr repository. Hope this helps.

That was a very useful colab notebook for custom DETR training, @quangdaist123. I want to know whether you also provided samples for the "background class" in the training data. I ask because the num_classes you've given is 3, and I wondered if that was the case. Thanks in advance.

@Haresh-16

> @zobeirraisi
> Can you please mention the number of epochs you trained for and the dataset size? I see 100 predictions with the same value, none scoring more than 0.2.
>
> I trained on a dataset of ~2k images for 300 epochs.

@rsharmapty I've been facing the same issue: I also get 100 predictions, none scoring more than 0.2. Have you overcome it? And may I ask what num_classes is in your case? My dataset is ~1k images with 2 classes.

@rsharmapty

rsharmapty commented Jul 14, 2020

@Haresh-16
Make sure you use the pre-trained weights properly; if you are still failing to get decent results, please let me know.

num_classes = 3

Your dataset size and num_classes are good enough to get some decent results.

@rsharmapty

@lessw2020
@mlk1337
@quangdaist123
@dsshean

A big thanks to all of you for the information provided; I was able to generate a satisfactory first-pass result with its help.

@Haresh-16

> @Haresh-16
> Make sure you use the pre-trained weights properly; if you are still failing to get decent results, please let me know.
>
> num_classes = 3
>
> Your dataset size and num_classes are good enough to get some decent results.

Thank you @rsharmapty. How many epochs do you think I should train for? Thanks in advance.

@rsharmapty

@Haresh-16
With your dataset, 100 epochs would give you good results.

@Haresh-16

Thank you @rsharmapty

@woctezuma

woctezuma commented Jul 21, 2020

> Hey, yes this is done quite easily. One example:
>
>     for param in model.parameters():
>         param.requires_grad = False
>     model.class_embed.weight.requires_grad = True
>     model.class_embed.bias.requires_grad = True
>
> to disable all weights except the specified ones.
>
> Another example could be via named_parameters:
>
>     for n, p in model.named_parameters():
>         if "backbone" in n:
>             p.requires_grad = False
>         if "transformer" in n:
>             p.requires_grad = False
>
> to disable sections of the network. Best of luck.

For information, the two snippets of code are not equivalent:

for n, p in model.named_parameters():
    if "backbone" not in n and "transformer" not in n:
        print(n)

which prints:

class_embed.weight
class_embed.bias
bbox_embed.layers.0.weight
bbox_embed.layers.0.bias
bbox_embed.layers.1.weight
bbox_embed.layers.1.bias
bbox_embed.layers.2.weight
bbox_embed.layers.2.bias
query_embed.weight
input_proj.weight
input_proj.bias

By the way, would you recommend freezing weights when fine-tuning with new classes? If so, for which layers?

@Haresh-16

> @Haresh-16
> With your dataset, 100 epochs would give you good results.

Thank you @rsharmapty. I was able to achieve the desired results by following your directions.

@Haresh-16

> Hi @dsshean, I have done the same as mentioned above and I am still getting a single bounding box in the same position for all images. I trained the model on a custom dataset with a class label of 1. Have you trained on a custom dataset and then used the same colab notebook example for inference?
>
> I've been encountering the same issue for a while. Yesterday I finally figured out the correct way to load my custom DETR model, just in time for my project deadline later that day :))). Here is the colab notebook with some impressive results. The DETR class I use to load all matched keys lies in hubconf.py in the original detr repository. Hope this helps.
>
> I saw your notebook, thanks for sharing it. I am wondering where I can find the datasets used in the notebook?

@quangdaist123 How many epochs did you train for using the dataset you provided in your colab notebook? Thanks in advance.

@shafinbinhamid

Hi @quangdaist123! I cannot seem to access the colab notebook you provided for custom dataset training. It says the file is not accessible. Can you please make it accessible again? Thanks in advance.

@quangdaist01

> Hi @dsshean, I have done the same as mentioned above and I am still getting a single bounding box in the same position for all images. I trained the model on a custom dataset with a class label of 1. Have you trained on a custom dataset and then used the same colab notebook example for inference?
>
> I've been encountering the same issue for a while. Yesterday I finally figured out the correct way to load my custom DETR model, just in time for my project deadline later that day :))). Here is the colab notebook with some impressive results. The DETR class I use to load all matched keys lies in hubconf.py in the original detr repository. Hope this helps.
>
> I saw your notebook, thanks for sharing it. I am wondering where I can find the datasets used in the notebook?
>
> @quangdaist123 How many epochs did you train for using the dataset you provided in your colab notebook? Thanks in advance.

I trained the model for 200 epochs, but it stopped improving around the 70th or 80th epoch.

@quangdaist01

> Hi @quangdaist123! I cannot seem to access the colab notebook you provided for custom dataset training. It says the file is not accessible. Can you please make it accessible again? Thanks in advance.

I checked the link and didn't see any problem. If you want a direct download link to the notebook, I just created one here.

@shafinbinhamid

Thank you @quangdaist123. The download link is working perfectly fine :)

@Madhusakth

Hi, I am trying to train on a custom dataset with 38k training images and 6 classes. I fine-tuned the ResNet detr-r50 model for about 15 epochs and the mAP remains at zero. What is the recommended number of epochs to train with this dataset?
Since I have just one GPU, training takes about an hour per epoch, and I wanted to make sure I had the other parameters right before training for longer.

Thanks!

@1chimaruGin

> Can you please share the inference script for a custom-trained checkpoint.pth for prediction?
> Thanks in advance!

Check it

https://github.com/woctezuma/finetune-detr/blob/master/finetune_detr.ipynb

It works for me.
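
For a self-contained starting point, here is a rough inference sketch along the lines of the official DETR demo notebook; the class count, checkpoint path, image path, and score threshold are all placeholders:

import torch
import torchvision.transforms as T
from PIL import Image

# rebuild the model with the custom head (num_classes=4 and paths are placeholders)
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50',
                       pretrained=False, num_classes=4)
checkpoint = torch.load('outputs/checkpoint.pth', map_location='cpu')
model.load_state_dict(checkpoint['model'])
model.eval()

# same normalization as the official demo notebook
transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open('example.jpg').convert('RGB')
with torch.no_grad():
    outputs = model(transform(img).unsqueeze(0))

# drop the trailing no-object column, then keep confident queries only
probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
keep = probas.max(-1).values > 0.7
print(probas[keep].argmax(-1))         # predicted class ids
print(outputs['pred_boxes'][0, keep])  # boxes as (cx, cy, w, h), relative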

@rsharmapty

Hello,
I made an attempt to change num_queries to 500, as my images have approximately 450 objects, and I received the following error:

Traceback (most recent call last):
  File "main.py", line 248, in <module>
    main(args)
  File "main.py", line 178, in main
    model_without_ddp.load_state_dict(checkpoint['model'], strict=False)
  File "/home/rsharma/git/detr/.venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DETR:
    size mismatch for query_embed.weight: copying a param with shape torch.Size([100, 256]) from checkpoint, the shape in current model is torch.Size([500, 256]).

Can anybody please help me with this?

@woctezuma

Maybe:

> If you're fine-tuning, I don't recommend changing the number of queries on the fly, it is extremely unlikely to work out of the box. In this case you're probably better off retraining from scratch (you can change the --num_queries arg from our training script).

facebookresearch/detr#9 (comment)

@rsharmapty

@woctezuma
I was not able to resolve the issue.

@m-klasen
Author

m-klasen commented Oct 8, 2020

Hi,
if you change your number of queries, unfortunately, you will pretty much have to train from scratch (except for the resnet backbone). You cannot use transformer weights trained with num_queries=100 for a transformer with 500; the same goes for class_embed and bbox_embed. Basically, retraining everything except the backbone is required. Sorry, and good luck.
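
In practice that would mean stripping everything except the backbone from the pretrained checkpoint before resuming; a rough sketch under that assumption (the output filename is a placeholder):

import torch

# keep only the ResNet backbone weights from the pretrained checkpoint, so a
# model built with num_queries=500 can still reuse them via strict=False loading
checkpoint = torch.load("detr-r50-e632da11.pth", map_location='cpu')
checkpoint["model"] = {k: v for k, v in checkpoint["model"].items()
                       if k.startswith("backbone.")}
torch.save(checkpoint, "detr-r50_backbone-only.pth")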

@rsharmapty

@m-klasen thanks for your quick response.
I have a small dataset (~2k images) with up to 450 objects per image. If I don't use transfer learning, the results show 0 mAP (no accuracy) with the default hyperparameters. Can you suggest something so that I can predict up to 500 objects per image with a small dataset?

@m-klasen
Author

m-klasen commented Oct 8, 2020

Transformers are notoriously difficult to train and take a long time to converge (200 epochs on COCO). Unless you can split your images into smaller crops which each feature fewer than 100 detections, as in the sketch below, it is going to be difficult.
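
A naive tiling sketch of that idea; the grid size is arbitrary, and the COCO annotation boxes would need to be clipped and shifted per crop as well (not shown):

from PIL import Image

def tile_image(path, rows=2, cols=2):
    """Split an image into a rows x cols grid of crops so that each crop
    contains fewer objects than num_queries."""
    img = Image.open(path)
    w, h = img.size
    tw, th = w // cols, h // rows
    return [img.crop((c * tw, r * th, (c + 1) * tw, (r + 1) * th))
            for r in range(rows) for c in range(cols)]

crops = tile_image("example.jpg", rows=3, cols=3)  # up to 9 crops per image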

@rsharmapty

Just to be clear:

  1. There is no way I can change num_queries and still use transfer learning.
  2. If I train from scratch, can I get decent results in 200 epochs? (I ask because I see 0 mAP after 300 epochs in this case as well.)

@woctezuma

woctezuma commented Oct 8, 2020

See facebookresearch/detr#216 to work around the issue.
