
Easiest way to find duplicate values in a JavaScript array - Native unique function implementation

gistfile1.js
//lo sauer, 2011; lsauer.com
/**
* I saw this thread: http://stackoverflow.com/questions/840781/easiest-way-to-find-duplicate-values-in-a-javascript-array
* The solutions there lacked the elegance that map-reduce-like operations allow.
* Since this implementation relies on the native sort() and filter() methods, it is in most circumstances faster
* than solutions written as hand-coded loops.
* Additionally, I needed to quickly filter duplicate URL entries for: http://lsauer.github.com/chrome-session-restore/
*/
//copy and paste: without error handling
//sorts a copy, so the original array keeps its order; the callback returns a boolean, so falsy values such as 0 are kept
Array.prototype.unique = function(){return this.slice().sort().filter( function(v,i,o){return i===0 || v!==o[i-1];});};
//copy and paste: with error handling
Array.prototype.unique = function(){if(!(this instanceof Array)) throw new TypeError('Not an Array!'); return this.slice().sort().filter( function(v,i,o){return i===0 || v!==o[i-1];});};
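The same sort-then-filter idea can also be written as a plain function, so that Array.prototype does not have to be extended. This is a minimal sketch; the name uniqueSorted is hypothetical. It checks the [0, 1, 2] case raised in the comments and shows that the caller's array is left untouched.

```javascript
// Sort-then-filter de-duplication as a plain function (hypothetical name: uniqueSorted),
// so Array.prototype does not have to be extended.
function uniqueSorted(arr) {
  // Sort a copy so the caller's array keeps its order.
  // Without a compare function, sort() compares the elements as strings.
  var sorted = arr.slice().sort();
  return sorted.filter(function (v, i, o) {
    // Return a boolean, not v: returning v would drop falsy values such as 0 or ''.
    return i === 0 || v !== o[i - 1];
  });
}

var input = [0, 1, 2, 0, 2];
console.log(JSON.stringify(uniqueSorted(input))); // → [0,1,2]
console.log(JSON.stringify(input));               // → [0,1,2,0,2] (unchanged)
```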
 
 
/**
* Numbers
*/
var arr = [324,3,32,5,52,2100,1,20,2,3,3,2,2,2,1,1,1];
//1. sort: without a compare function, sort() compares the elements as strings, which is why 20 comes before 3
var a = arr.sort();
>>>[1, 1, 1, 1, 2, 2, 2, 2, 20, 2100, 3, 3, 3, 32, 324, 5, 52]
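//2. filter: after sorting, duplicates are adjacent, so keep an element only if it is the first one or differs from its predecessor
//Note: sort() reorders arr itself; if you need to keep the original array, sort a copy made with Array.prototype.slice()
a.filter( function(v,i,o){return i===0 || v!==o[i-1];});
>>>[1, 2, 20, 2100, 3, 32, 324, 5, 52]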
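Step 1 relies on the default string comparison. If the result should be in numeric order, a compare function can be passed to sort(); equal numbers still end up next to each other, so the same filter works. A sketch under that assumption (uniqueNumbers is a hypothetical name):

```javascript
// De-duplicate numbers and return them in numeric order.
// uniqueNumbers is a hypothetical helper name used only for this sketch.
function uniqueNumbers(arr) {
  return arr
    .slice()                                  // leave the input untouched
    .sort(function (x, y) { return x - y; })  // numeric, not string, comparison
    .filter(function (v, i, o) { return i === 0 || v !== o[i - 1]; });
}

var arr = [324, 3, 32, 5, 52, 2100, 1, 20, 2, 3, 3, 2, 2, 2, 1, 1, 1];
console.log(JSON.stringify(uniqueNumbers(arr))); // → [1,2,3,5,20,32,52,324,2100]
```

Arrays containing NaN are not handled by this sketch: NaN !== NaN, so repeated NaN values are never collapsed.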
 
 
/**
* Strings
*/
var a = 'Magic belongs to Jerry Harry Jerry Harry Potter and Banana Joe'.split(' ');
a = a.sort()
>>>["Banana", "Harry", "Harry", "Jerry", "Jerry", "Joe", "Magic", "Potter", "and", "belongs", "to"]
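//the default sort compares UTF-16 code units, so all capitalised words come before lower-case ones
a.filter( function(v,i,o){return i===0 || v!==o[i-1];});
>>>["Banana", "Harry", "Jerry", "Joe", "Magic", "Potter", "and", "belongs", "to"]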
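Sorting changes the order of the words. If the original order matters, a different technique from the one in this gist is to keep only the first occurrence of each value. This is a sketch, not part of the original approach; uniqueInOrder is a hypothetical name.

```javascript
// Keep the first occurrence of each value, in the original order (no sorting).
// indexOf scans from the start each time, so this is O(n^2): fine for short arrays.
// uniqueInOrder is a hypothetical helper name.
function uniqueInOrder(arr) {
  return arr.filter(function (v, i, o) {
    return o.indexOf(v) === i; // true only at the first position where v occurs
  });
}

var words = 'Magic belongs to Jerry Harry Jerry Harry Potter and Banana Joe'.split(' ');
console.log(JSON.stringify(uniqueInOrder(words)));
// → ["Magic","belongs","to","Jerry","Harry","Potter","and","Banana","Joe"]
```

indexOf uses strict equality, so NaN values are dropped entirely. In engines with ES2015 support, Array.from(new Set(arr)) also keeps first occurrences in insertion order, treats NaN as equal to itself, and avoids the quadratic scan.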
 
/**
* Additional information
*/
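//you can also use a more compact form of the filter, similar to the one used in is-lib ( github.com/lsauer/is-library )
a.filter( function(v,i,o){ return !i||v!==o[i-1];});
//...or a case-insensitive filter for arrays of strings; it only removes case variants that are adjacent after sorting
//(comparing lower-cased strings avoids RegExp(o[i-1],'i'), which also matches substrings and throws on metacharacters such as '(')
a.filter( function(v,i,o){ return !i||v.toLowerCase()!==o[i-1].toLowerCase();});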
 
 
//example
var a = 'Magic belongs to Jerry Harry Jerry JERRY AND HARRY Harry Potter and Banana Joe'.split(' ');
a.sort()
>>>["AND", "Banana", "HARRY", "Harry", "Harry", "JERRY", "Jerry", "Jerry", "Joe", "Magic", "Potter", "and", "belongs", "to"]
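a.filter( function(v,i,o){ return !i||v.toLowerCase()!==o[i-1].toLowerCase();});
>>>["AND", "Banana", "HARRY", "JERRY", "Joe", "Magic", "Potter", "and", "belongs", "to"]
//"and" is kept because the default sort is case-sensitive, so "AND" and "and" are not neighbours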
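Because the filter only compares neighbours, a fully case-insensitive result also needs a case-insensitive sort, so that variants such as "AND" and "and" end up adjacent. A sketch under that assumption; uniqueIgnoreCase is a hypothetical name, and toLowerCase() stands in for locale-aware case folding.

```javascript
// Case-insensitive de-duplication for arrays of strings.
// uniqueIgnoreCase is a hypothetical helper name; toLowerCase() is used as a
// simple stand-in for locale-aware case folding.
function uniqueIgnoreCase(strings) {
  return strings
    .slice()
    // 1. sort by the lower-cased value so case variants ("AND", "and") become neighbours
    .sort(function (x, y) {
      var lx = x.toLowerCase(), ly = y.toLowerCase();
      return lx < ly ? -1 : lx > ly ? 1 : 0;
    })
    // 2. keep an element only if it differs, ignoring case, from its predecessor
    .filter(function (v, i, o) {
      return i === 0 || v.toLowerCase() !== o[i - 1].toLowerCase();
    });
}

var names = 'Magic belongs to Jerry Harry Jerry JERRY AND HARRY Harry Potter and Banana Joe'.split(' ');
console.log(JSON.stringify(uniqueIgnoreCase(names)));
// → ["AND","Banana","belongs","Harry","Jerry","Joe","Magic","Potter","to"]
```

Which spelling survives depends on the order in which the sort leaves equal keys. Since ES2019, sort() is required to be stable, so the variant that appears first in the input is kept (here "AND", "Harry" and "Jerry"); older engines may keep a different one.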

Why are you returning the value in your filter function?
Ever tried [0,1,2].unique()?

This doesn't work I'm afraid.
The filter function should return true or false, not the element itself. Filtering an array containing 0s would return an array with no 0. Simply remove the ?v:0 to fix it.
Also, I assume the i&& is there to avoid going out of bounds of the array, but it also means that the first element of the sorted array is never included. For instance, "Banana" is missing from the filtered string array. Remove the i&& to fix it. Reading before the start of the array simply returns undefined (o[-1] is undefined), so it works unless undefined is the first element of the sorted array. Since sort() moves undefined to the end, that only happens when the array contains nothing but undefined values.

In conclusion, return i&&v!==o[i-1]?v:0; should be return v!==o[i-1];

The RegExp version is a bit trickier, but instead of checking for i&& you could check for !i||, which always keeps the first element. It is also missing a semicolon.
So return i&&v&&!RegExp(o[i-1],'i').test(v)?v:0 should be return !i||!RegExp(o[i-1],'i').test(v);
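The two points about the plain filter in this comment can be checked directly. A small sketch using the [0, 1, 2] example; the variable names are only for illustration.

```javascript
var nums = [0, 1, 2];

// Returning v: filter() keeps an element only if the callback's result is truthy,
// so 0 is dropped even though it is not a duplicate.
var byValue = nums.filter(function (v, i, o) { return v !== o[i - 1] ? v : 0; });

// Returning the comparison keeps 0. At i === 0, o[-1] is undefined, so the first
// element is kept unless it is itself undefined.
var byBoolean = nums.filter(function (v, i, o) { return v !== o[i - 1]; });

console.log(JSON.stringify(byValue));   // → [1,2]
console.log(JSON.stringify(byBoolean)); // → [0,1,2]

// sort() moves undefined elements to the end, so undefined can only come first
// when every element is undefined.
var mixed = [undefined, 'b', 'a'].sort();
console.log(mixed[0], mixed[2]); // → a undefined
```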

@mflodin: Thanks for your input; I am sure it may be helpful to some.
Sorry for the delay. I just looked up the filter function and filtered 10,000 tags down to about 1,000 in >100 ms, with console output. It works fine. Since JS juggles types like crazy, this offers potential for errors but also syntactic sugar. In the context of the filter function, such post-evaluation within the callback is not uncommon.
"Banana" is not missing, otherwise I wouldn't have posted the V8 engine's JS 1.6 result. Which JS version and which engine are you running?
i&& simply relies on the left-to-right evaluation of the expression: if i is falsy, nothing after it is computed, i.e. if(i){...}.

!i|| is not helpful in this context, as OR evaluates the following operand only if the left operand is falsy; see above. This matters all the more for RegExp in a garbage-collected memory context.
Cheers, lo
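Both guards discussed here short-circuit at index 0, so neither runs the comparison for the first element; they differ only in the value returned there. A small sketch that counts the comparisons on the input ["A", "B", "b"]. The helper differsIgnoringCase is hypothetical and uses toLowerCase() in place of the RegExp test.

```javascript
// Counts how often the (hypothetical) comparison helper runs; toLowerCase() stands in
// for the RegExp test used in the thread.
var calls = 0;
function differsIgnoringCase(v, prev) {
  calls++;
  return v.toLowerCase() !== prev.toLowerCase();
}

var letters = ['A', 'B', 'b'];

calls = 0;
var withAnd = letters.filter(function (v, i, o) { return i && differsIgnoringCase(v, o[i - 1]); });
var andCalls = calls;

calls = 0;
var withOr = letters.filter(function (v, i, o) { return !i || differsIgnoringCase(v, o[i - 1]); });
var orCalls = calls;

// Both guards skip the comparison at i === 0, so each runs it twice;
// only the result at i === 0 differs (0 drops "A", true keeps it).
console.log(JSON.stringify(withAnd), andCalls); // → ["B"] 2
console.log(JSON.stringify(withOr), orCalls);   // → ["A","B"] 2
```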

I'm using V8 3.8.9.19 at the moment. I'm not sure which version I was on last time, but your examples are still faulty. Even in your own gist, "Banana" is missing on line 35 (the output of the strings example) and "AND" is missing on line 51 (the output of the case-insensitive example). There is an "and" on line 51, but the "AND" should take precedence. My guess is you haven't noticed this because, when using the filter live, you always had duplicates of the first item in the sorted array, even though your examples don't.

To make it clearer, here are some examples with fewer elements.

// numbers
var a = [0, 1, 0];
a.sort();
>>>[0, 0, 1]
a.filter( function(v,i,o){if(i>0 && v!==o[i-1]) return v;}); // Your filter
>>>[1] // 0 is missing
a.filter( function(v,i,o){return v!==o[i-1];}); // My filter
>>>[0, 1]

// strings, case sensitive 
var a = "A B B".split(' ');
a.sort();
>>>["A", "B", "B"]
a.filter( function(v,i,o){if(i>0 && v!==o[i-1]) return v;}); // Your filter
>>>["B"] // A is missing
a.filter( function(v,i,o){return v!==o[i-1];}); // My filter
>>>["A", "B"]

// strings, case insensitive 
var a = "A B b".split(' ');
a.sort();
>>>["A", "B", "b"]
a.filter( function(v,i,o){ return i&&v&&!RegExp(o[i-1],'i').test(v)?v:0}); // Your filter
>>>["B"] // A is missing
a.filter( function(v,i,o){ return !i||v&&!RegExp(o[i-1],'i').test(v)}); // My filter
>>>["A", "B"]

But I must thank you for providing an excellent starting point. From your examples, I was able to make a version that worked correctly.

Cheers
/M

@mflodin: Have you already seen my fork?

@bergus Yeah, but I thought that was wrong too, remember? Turns out you had just forgotten to update the examples' return lines (i.e. the >>> lines). But I actually ended up with basically the same functions as you anyway. I wish I had actually tried yours before discarding them based on the examples. =)

I just wanted to make @lsauer aware that his code might not work as expected.

Ok. To take index position 0 into account, I could either replace i&& with 1+i&&, or remove i&& altogether. +1 to mflodin. Thanks & cheers

I just saw that my original filter function correctly stated 1+i&& all along, but it got lost as the filters underneath evolved... go figure :).

What about strings with numbers, such as Zone1, Zone 1 & 2, Zone 1, 2 & 3?
