Skip to content

Instantly share code, notes, and snippets.

@raghavsethi
Created November 12, 2020 00:29
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save raghavsethi/6a1e51a3eec76814228b43b933c09e6c to your computer and use it in GitHub Desktop.
Save raghavsethi/6a1e51a3eec76814228b43b933c09e6c to your computer and use it in GitHub Desktop.
Duplicate Files
Write a function that identifies sets of files with identical contents.
find_dupes(root_path) → sets/lists of file paths that have identical contents
find_dupes(“/home/whatever”) → [
[".bashrc", "Backups/2017_bashrc"],
["Photos/Vacation/DSC1234.JPG", "profile.jpeg", ".trash/lej2dp28/87msnlgyr"],
]
Context:
- Imagine this function being packaged up into a command-line tool and people running it on their laptops or servers.
- For scale, imagine hard drives with at most 2 TB of data and at most 1 million files.
For traversing the filesystem, use these library functions:
- list_folder(path) → list of immediate file and folder children
- is_folder(path) → boolean
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment