@GeorgiPachov
Created June 25, 2014 19:44
Spoilers on last task of Files-1 problems @ HackBulgaria, Core-Java1

Approach 1: Group all files by the hash code of their bytes.

Use a Map<Integer, List<Path>> to represent the groups. Here, the key is the hash code of all the bytes of a file (Arrays.hashCode(byte[]) returns an int, so the key type is Integer).

The algorithm:

For each file (descendant) of the given folder:
   bytes = Files.readAllBytes(file);
   hashCode = Arrays.hashCode(bytes);
   map.computeIfAbsent(hashCode, k -> new ArrayList<>()).add(file);
   

In this way you can form the groups. Do not worry about hash collisions at first; once you are done with every other task, worry about them :D How can you quickly detect collisions with very high probability, without reading and comparing the files?
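A minimal Java sketch of Approach 1, assuming the Java 8 NIO file API (the class and method names here are illustrative, not part of the original task):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.*;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class HashGrouping {

        // Walks the given folder and groups every regular file
        // by the hash code of its bytes.
        public static Map<Integer, List<Path>> groupByHash(Path root) throws IOException {
            Map<Integer, List<Path>> groups = new HashMap<>();
            List<Path> files;
            try (Stream<Path> walk = Files.walk(root)) {
                files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (Path file : files) {
                byte[] bytes = Files.readAllBytes(file);
                int hash = Arrays.hashCode(bytes);
                // Create the group on first sight of this hash, then add the path.
                groups.computeIfAbsent(hash, k -> new ArrayList<>()).add(file);
            }
            return groups;
        }
    }

Every list with more than one path is a candidate group of duplicates (modulo hash collisions).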

Approach 2: Group all paths by file size. Two files cannot be duplicates if they have different sizes, right?

So once we form the size groups, within each group any two files are either duplicates or have different content of the same size. We start building the duplicate group for the first file in the size group:

for file from 2 .. group.length
   if (Arrays.equals(getBytes(firstFile), getBytes(file)))   (1)
     groups.get(firstFile).add(file); // the current file is a duplicate of the first file
   else // the current file is not in the first file's group; it seeds a new group, built the same way from the remaining files
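A hedged Java sketch of this splitting step (names are illustrative; note that Arrays.equals is needed because calling equals on a byte[] only compares references):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.*;

    public class SizeGrouping {

        // Splits a list of same-size files into groups of identical content.
        public static List<List<Path>> splitIntoDuplicateGroups(List<Path> sameSize)
                throws IOException {
            List<List<Path>> result = new ArrayList<>();
            Deque<Path> remaining = new ArrayDeque<>(sameSize);
            while (!remaining.isEmpty()) {
                Path first = remaining.removeFirst();
                byte[] firstBytes = Files.readAllBytes(first);
                List<Path> group = new ArrayList<>();
                group.add(first);
                // Every remaining file either joins the first file's group
                // or stays behind to seed a later group.
                Iterator<Path> it = remaining.iterator();
                while (it.hasNext()) {
                    Path candidate = it.next();
                    if (Arrays.equals(firstBytes, Files.readAllBytes(candidate))) { // (1)
                        group.add(candidate);
                        it.remove();
                    }
                }
                result.add(group);
            }
            return result;
        }
    }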

If you run into performance problems, (1) can be optimized not to read all the bytes up front, but to read and compare them in chunks. If two files differ, the difference usually shows up early in their bytes, so the comparison can stop there.
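One simple way to get that chunked behavior in Java, sketched under the assumption that both files are already known to have the same size: wrap the streams in BufferedInputStream, which performs the actual reads in chunks under the hood, and stop at the first mismatching byte.

    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class ChunkedCompare {

        // Returns true iff the two same-size files have identical content.
        // BufferedInputStream reads ahead in chunks; the loop compares byte
        // by byte and bails out at the first difference.
        public static boolean sameContent(Path a, Path b) throws IOException {
            try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
                 InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
                int byteA;
                while ((byteA = inA.read()) != -1) {
                    if (byteA != inB.read()) {
                        return false; // first difference ends the comparison early
                    }
                }
                return true; // same length and no mismatch: duplicates
            }
        }
    }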
