I was introduced to a simple optimization problem by one of my interns when working at Ericsson. He was about to participate in a friendly competition where they were asked to find a performant solution to the problem:
You have a file containing up to 6 million lines where each line has the following form: ABC123. Letters used are A-Z except I, Q and V. Digits used are 0-9.
You only need to answer yes or no to whether the file contains duplicates.
The performance measurement shall include the cost of reading the file into memory.
You can find test data here.
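To make the constraints concrete: 23 letters in three positions and 10 digits in three positions give 23^3 × 10^3 = 12,167,000 possible lines, so each line can be mapped to a unique small integer and tracked in a bit array of about 1.5 MB. The following is only a minimal sketch of that idea in Python, not necessarily the solution used in the competition. The names `plate_to_index` and `has_duplicates` are made up for illustration, and the code assumes every non-empty line is well formed.

```python
# Alphabet assumed from the problem statement: A-Z without I, Q and V (23 letters).
LETTERS = "ABCDEFGHJKLMNOPRSTUWXYZ"
LETTER_INDEX = {ch: i for i, ch in enumerate(LETTERS)}
KEY_SPACE = len(LETTERS) ** 3 * 10 ** 3  # 23^3 * 1000 = 12,167,000


def plate_to_index(plate: str) -> int:
    """Map a string such as 'ABC123' to a unique integer in [0, KEY_SPACE)."""
    index = 0
    for ch in plate[:3]:
        index = index * len(LETTERS) + LETTER_INDEX[ch]
    return index * 1000 + int(plate[3:6])


def has_duplicates(lines) -> bool:
    """Return True as soon as any line is seen a second time."""
    seen = bytearray((KEY_SPACE + 7) // 8)  # one bit per possible line
    for line in lines:
        plate = line.strip()
        if not plate:
            continue  # skip blank lines, e.g. a trailing newline
        idx = plate_to_index(plate)
        byte, mask = idx >> 3, 1 << (idx & 7)
        if seen[byte] & mask:
            return True
        seen[byte] |= mask
    return False
```

Since a file object iterates over its lines, this could be called as `has_duplicates(open(path))`, where `path` is a placeholder for the test data file. The sketch shows the data structure rather than a fast implementation; the same mapping carries over directly to a lower-level language.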