@akrueger
Created March 6, 2017 06:07
Comparison of Antony Stubbs' Bash script and NickK9's Python script on the Linux kernel repo
// bash script
All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file.
size pack SHA location
2053 218 ee1da0cbc84d6ccf5d0714602e4364b6b8a85a32 drivers/gpu/drm/amd/include/asic_reg/bif/bif_5_1_sh_mask.h
1484 121 6fa98ea0ae40f9a38256f11e5dc270363f785aee sound/soc/codecs/wm8962-tables.c
1254 110 1ddc4183a1c91cd08a05557d575b3bcdc90a1ea6 drivers/gpu/drm/amd/include/asic_reg/dce/dce_11_2_sh_mask.h
1240 188 03f473116f78769af0434366387b3ca8f7a72db4 crypto/testmgr.h
1180 105 a438c2b6e2801327bff0746ec8e9b42f5cc2d70d drivers/gpu/drm/amd/include/asic_reg/dce/dce_11_0_sh_mask.h
1086 453 1f22c9ab66d407accb797e532ab194b043970a85 drivers/net/bnx2x_init_values.h
1069 79 c3f4f4387e162caced02ad58dc36938345c1cd05 drivers/media/dvb/frontends/drxk_map.h
989 72 0bbd4ae1f52466923e9103db8a5b40c7cb822f58 drivers/media/dvb-frontends/drx39xyj/drxj_map.h
982 98 4509c8237db508c5e85d3a63dd38814c1e05238a drivers/gpu/drm/amd/include/asic_reg/gca/gfx_7_2_sh_mask.h
977 404 63019055e4bb56962ba5767f662136a787921c4f drivers/net/bnx2x_init_values.h
// python script
All sizes in kB. The pack column is the compressed size of the object inside the pack file.
size pack hash path
1086 453 1f22c9ab66d407accb797e532ab194b043970a85 drivers/net/bnx2x_init_values.h
977 404 63019055e4bb56962ba5767f662136a787921c4f drivers/net/bnx2x_init_values.h
664 323 8405e719e7fb08c5f9cd17f7faa7f588b58a0e30 firmware/bnx2x/bnx2x-e2-6.2.9.0.fw.ihex
664 322 aef9aa622420d52a93942cec18f69b889e21a72f firmware/bnx2x/bnx2x-e2-6.2.5.0.fw.ihex
663 322 78b41615e7d9b8049b4495e03b0751c2cec91e1f firmware/bnx2x/bnx2x-e2-6.0.34.0.fw.ihex
298 298 aad3c80d07c83c6564918cba5b74c3048f1bba24 drivers/staging/ft1000/ft1000-pcmcia/ft1000.img
552 276 e78c86378f89220bb79b6e2ca23a03eb0c1ed761 firmware/bnx2x-e1h-5.0.21.0.fw.ihex
566 276 ba1ce53df1d87493599873c05571869b39b6645b firmware/bnx2x/bnx2x-e1h-6.2.9.0.fw.ihex
551 276 280bbcf4f2a133aa51c730dd87cdca65f0bff0db firmware/bnx2x-e1h-5.2.7.0.fw.ihex
@nk9

nk9 commented Mar 7, 2017

Thanks for the reproducible case. When I use the -p option on the Python script to sort the same way as the bash script (by the size column rather than the pack column), I get the same results from the two scripts. Is this what you're seeing too?

⋊> ~/P/linux on master ⨯ ~/bin/largestFiles.py                          00:42:02
Finding the 10 largest objects…
Finding object paths…

All sizes in kB. The pack column is the compressed size of the object inside the pack file.

size  pack  hash                                      path
1086  453   1f22c9ab66d407accb797e532ab194b043970a85  drivers/net/bnx2x_init_values.h
977   404   63019055e4bb56962ba5767f662136a787921c4f  drivers/net/bnx2x_init_values.h
664   323   8405e719e7fb08c5f9cd17f7faa7f588b58a0e30  firmware/bnx2x/bnx2x-e2-6.2.9.0.fw.ihex
664   322   aef9aa622420d52a93942cec18f69b889e21a72f  firmware/bnx2x/bnx2x-e2-6.2.5.0.fw.ihex
663   322   78b41615e7d9b8049b4495e03b0751c2cec91e1f  firmware/bnx2x/bnx2x-e2-6.0.34.0.fw.ihex
298   298   aad3c80d07c83c6564918cba5b74c3048f1bba24  drivers/staging/ft1000/ft1000-pcmcia/ft1000.img
552   276   e78c86378f89220bb79b6e2ca23a03eb0c1ed761  firmware/bnx2x-e1h-5.0.21.0.fw.ihex
566   276   ba1ce53df1d87493599873c05571869b39b6645b  firmware/bnx2x/bnx2x-e1h-6.2.9.0.fw.ihex
551   276   280bbcf4f2a133aa51c730dd87cdca65f0bff0db  firmware/bnx2x-e1h-5.2.7.0.fw.ihex
551   276   ea3e254335b1845bb646be5db265343978c0f7e4  firmware/bnx2x-e1h-5.2.13.0.fw.ihex
⋊> ~/P/linux on master ⨯ ~/bin/largestFiles.py -p                                                                00:46:46
Finding the 10 largest objects…
Finding object paths…

All sizes in kB. The pack column is the compressed size of the object inside the pack file.

size  pack  hash                                      path
2053  218   ee1da0cbc84d6ccf5d0714602e4364b6b8a85a32  drivers/gpu/drm/amd/include/asic_reg/bif/bif_5_1_sh_mask.h
1484  121   6fa98ea0ae40f9a38256f11e5dc270363f785aee  sound/soc/codecs/wm8962-tables.c
1254  110   1ddc4183a1c91cd08a05557d575b3bcdc90a1ea6  drivers/gpu/drm/amd/include/asic_reg/dce/dce_11_2_sh_mask.h
1180  105   a438c2b6e2801327bff0746ec8e9b42f5cc2d70d  drivers/gpu/drm/amd/include/asic_reg/dce/dce_11_0_sh_mask.h
1086  453   1f22c9ab66d407accb797e532ab194b043970a85  drivers/net/bnx2x_init_values.h
1069  79    c3f4f4387e162caced02ad58dc36938345c1cd05  drivers/media/dvb/frontends/drxk_map.h
989   72    0bbd4ae1f52466923e9103db8a5b40c7cb822f58  drivers/media/dvb-frontends/drx39xyj/drxj_map.h
982   98    4509c8237db508c5e85d3a63dd38814c1e05238a  drivers/gpu/drm/amd/include/asic_reg/gca/gfx_7_2_sh_mask.h
977   404   63019055e4bb56962ba5767f662136a787921c4f  drivers/net/bnx2x_init_values.h
964   176   1e701bc075b907521c0e0cd4a06e41858cda0a5f  crypto/testmgr.h
⋊> ~/P/linux on master ⨯ ~/bin/large.sh                                                                          00:52:19
All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file.
size  pack  SHA                                       location
2053  218   ee1da0cbc84d6ccf5d0714602e4364b6b8a85a32  drivers/gpu/drm/amd/include/asic_reg/bif/bif_5_1_sh_mask.h
1484  121   6fa98ea0ae40f9a38256f11e5dc270363f785aee  sound/soc/codecs/wm8962-tables.c
1254  110   1ddc4183a1c91cd08a05557d575b3bcdc90a1ea6  drivers/gpu/drm/amd/include/asic_reg/dce/dce_11_2_sh_mask.h
1180  105   a438c2b6e2801327bff0746ec8e9b42f5cc2d70d  drivers/gpu/drm/amd/include/asic_reg/dce/dce_11_0_sh_mask.h
1086  453   1f22c9ab66d407accb797e532ab194b043970a85  drivers/net/bnx2x_init_values.h
1069  79    c3f4f4387e162caced02ad58dc36938345c1cd05  drivers/media/dvb/frontends/drxk_map.h
989   72    0bbd4ae1f52466923e9103db8a5b40c7cb822f58  drivers/media/dvb-frontends/drx39xyj/drxj_map.h
982   98    4509c8237db508c5e85d3a63dd38814c1e05238a  drivers/gpu/drm/amd/include/asic_reg/gca/gfx_7_2_sh_mask.h
977   404   63019055e4bb56962ba5767f662136a787921c4f  drivers/net/bnx2x_init_values.h
964   176   1e701bc075b907521c0e0cd4a06e41858cda0a5f  crypto/testmgr.h
⋊> ~/P/linux on master ⨯                                                                                         01:02:57
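
The two orderings in the transcript above can be reproduced from the rows themselves. This is a minimal sketch, assuming the only difference between the runs is the sort key (the size column versus the pack column). The `ROWS` sample and the `top` helper are illustrative and are not taken from either script:

```python
# A few (size_kB, pack_kB, short_hash) rows copied from the outputs above.
ROWS = [
    (2053, 218, "ee1da0cb"),
    (1484, 121, "6fa98ea0"),
    (1086, 453, "1f22c9ab"),
    (977, 404, "63019055"),
    (664, 323, "8405e719"),
    (298, 298, "aad3c80d"),
]


def top(rows, column, n=3):
    """Return the n rows with the largest value in the given column index."""
    return sorted(rows, key=lambda row: row[column], reverse=True)[:n]


by_size = top(ROWS, column=0)  # ordering seen with -p and from the bash script
by_pack = top(ROWS, column=1)  # ordering seen from the Python script by default

print([short_hash for _, _, short_hash in by_size])
print([short_hash for _, _, short_hash in by_pack])
```

Sorting by size puts the large but highly compressible headers first, while sorting by pack puts the bnx2x files first, which matches the two listings.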

@akrueger
Author

akrueger commented Mar 8, 2017

I tried with -p and I got similar results:

time python ~/git_find_big_blobs.py -p
Finding the 10 largest objects…
Finding object paths…

All sizes in kB. The pack column is the compressed size of the object inside the pack file.

size  pack  hash                                      path
2053  218   ee1da0cbc84d6ccf5d0714602e4364b6b8a85a32  drivers/gpu/drm/amd/include/asic_reg/bif/bif_5_1_sh_mask.h
1484  121   6fa98ea0ae40f9a38256f11e5dc270363f785aee  sound/soc/codecs/wm8962-tables.c
1254  110   1ddc4183a1c91cd08a05557d575b3bcdc90a1ea6  drivers/gpu/drm/amd/include/asic_reg/dce/dce_11_2_sh_mask.h
1180  105   a438c2b6e2801327bff0746ec8e9b42f5cc2d70d  drivers/gpu/drm/amd/include/asic_reg/dce/dce_11_0_sh_mask.h
1086  453   1f22c9ab66d407accb797e532ab194b043970a85  drivers/net/bnx2x_init_values.h
1069  79    c3f4f4387e162caced02ad58dc36938345c1cd05  drivers/media/dvb/frontends/drxk_map.h
989   72    0bbd4ae1f52466923e9103db8a5b40c7cb822f58  drivers/media/dvb-frontends/drx39xyj/drxj_map.h
982   98    4509c8237db508c5e85d3a63dd38814c1e05238a  drivers/gpu/drm/amd/include/asic_reg/gca/gfx_7_2_sh_mask.h
977   404   63019055e4bb56962ba5767f662136a787921c4f  drivers/net/bnx2x_init_values.h
python ~/git_find_big_blobs.py -p  552.18s user 56.91s system 135% cpu 7:28.68 total
time bash ~/git_find_big_blobs.sh
All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file.
size  pack  SHA                                       location
2053  218   ee1da0cbc84d6ccf5d0714602e4364b6b8a85a32  drivers/gpu/drm/amd/include/asic_reg/bif/bif_5_1_sh_mask.h
1484  121   6fa98ea0ae40f9a38256f11e5dc270363f785aee  sound/soc/codecs/wm8962-tables.c
1254  110   1ddc4183a1c91cd08a05557d575b3bcdc90a1ea6  drivers/gpu/drm/amd/include/asic_reg/dce/dce_11_2_sh_mask.h
1180  105   a438c2b6e2801327bff0746ec8e9b42f5cc2d70d  drivers/gpu/drm/amd/include/asic_reg/dce/dce_11_0_sh_mask.h
1086  453   1f22c9ab66d407accb797e532ab194b043970a85  drivers/net/bnx2x_init_values.h
1069  79    c3f4f4387e162caced02ad58dc36938345c1cd05  drivers/media/dvb/frontends/drxk_map.h
989   72    0bbd4ae1f52466923e9103db8a5b40c7cb822f58  drivers/media/dvb-frontends/drx39xyj/drxj_map.h
982   98    4509c8237db508c5e85d3a63dd38814c1e05238a  drivers/gpu/drm/amd/include/asic_reg/gca/gfx_7_2_sh_mask.h
977   404   63019055e4bb56962ba5767f662136a787921c4f  drivers/net/bnx2x_init_values.h
964   176   1e701bc075b907521c0e0cd4a06e41858cda0a5f  crypto/testmgr.h
bash ~/git_find_big_blobs.sh  1133.82s user 84.65s system 124% cpu 16:21.06 total
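
For a rough comparison of the two runs, the wall-clock totals from the `time` lines above can be converted to seconds and divided. This is only a sketch over a single run of each script; the `to_seconds` helper is a hypothetical name introduced here:

```python
def to_seconds(elapsed):
    """Convert an 'M:SS.ss' (or 'H:MM:SS.ss') elapsed-time string to seconds."""
    seconds = 0.0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Wall-clock totals copied from the two `time` lines above.
python_total = to_seconds("7:28.68")
bash_total = to_seconds("16:21.06")

ratio = bash_total / python_total
print(f"python: {python_total:.2f}s, bash: {bash_total:.2f}s, ratio: {ratio:.2f}")
```

On these figures the bash run took about 2.2 times as long as the Python run.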

The 10th entry appears to be missing from the Python script's output, but I'm not sure why. Either way, the sort order definitely seems to be the reason for the discrepancy. Thanks!
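
One way to pin down the missing row is to compare the hashes in the two listings. This is a minimal sketch, with hashes shortened to eight characters and copied from the two outputs in this comment. It only identifies which row differs, not why the Python script left it out:

```python
# Short hashes from the Python -p run above (9 rows).
python_hashes = [
    "ee1da0cb", "6fa98ea0", "1ddc4183", "a438c2b6", "1f22c9ab",
    "c3f4f438", "0bbd4ae1", "4509c823", "63019055",
]

# Short hashes from the bash run above (10 rows).
bash_hashes = [
    "ee1da0cb", "6fa98ea0", "1ddc4183", "a438c2b6", "1f22c9ab",
    "c3f4f438", "0bbd4ae1", "4509c823", "63019055", "1e701bc0",
]

# Rows the bash script printed that the Python script did not.
seen = set(python_hashes)
missing = [h for h in bash_hashes if h not in seen]
print(missing)
```

The only hash left over is the one starting 1e701bc0, which is the crypto/testmgr.h row (964 kB) at the end of the bash listing.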
