Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save charlesroper/dd68945379334503835e8ef3d08efe83 to your computer and use it in GitHub Desktop.
Save charlesroper/dd68945379334503835e8ef3d08efe83 to your computer and use it in GitHub Desktop.
How to find what's using storage space in Linux.md

How to find what's using storage space in Linux

#linux #bash #cli #filesystem

See this post on GitHub for context.

We can use the du command to determine disk usage. We need to combine it with the find command to check only file sizes and exclude directory sizes.

Manual pages:

List directories, current level only (depth 1), in human readable format, then sort in reverse order in human format. Human sorting takes into account, e.g., order of SI prefixes such as K, MB, GB, etc:

du -h -d 1 | sort -hr

or

du --human-readable --max-depth=1 | sort --human-numeric-sort --reverse

Same as above, but include --all files as well:

du -ahd 1 | sort -hr

or

du --all --human-readable --max-depth=1 | sort --human-numeric-sort --reverse

Top 20 biggest directories:

du -ah | sort -hr | head -n 20

or

du --all --human-readable | sort --human-numeric-sort --reverse | head --lines=20

Find only files (-type f) and execute du (du -ah {} +) on them. Use human numbers (SI prefixes) for the size values, human sort them, and return just the top 50 results:

find . -type f -exec du -ah {} + | sort -hr | head -n 50

Exclude the .git directory:

find . -not -path "./.git/*" -type f -exec du -ah {} + | sort -hr

We can output the results to bat to make them easier to browse (bat is non standard and needs to be installed; if you can’t install it, try less or nano instead).

find . -not -path "./.git/*" -type f -exec du -ah {} + | sort -hr | bat

Bonus

We could also use the fd command, which is a more modern version of the old GNU find utility, is easier to use, and is faster.

fd . --hidden --type=file --exclude='.git' -x du --all --human-readable | sort -hr | bat

or

fd . -H -tf -E '.git' -x du -ah | sort -hr | bat

We can also do the same in Nushell. It’s a touch more verbose, but is very readable and easy to understand at a glance. In this case we’re

  • Using Nushell’s own ls command to find everything in current directory and under (using the glob pattern **/*)
  • Excluding everything with .git or node_modules in the name using a regex pattern
  • Excluding all directories (so we’re returning files only),
  • Sorting intelligently by size in reverse order (largest first),
  • Selecting only the size and the name columns
  • Converting to tsv format
  • Copy the result to the clipboard ready for easy pasting into a spreadsheet.
ls -a **/* | where name !~ '\.git\\|node_modules' and type != dir | sort-by size -r | select size name | to tsv | clip

The command is longer than the typical Linux command, but it’s much easier to read at a glance and remember how to write, especially as Nushell comes with excellent command completion support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment