@syxanash
Last active January 5, 2023 17:04
Quick and dirty script to find duplicate URLs in a sample.json file.
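const fs = require('fs');
const _ = require('lodash');
// Expects sample.json to hold an array of objects, each with a `links` array of { url } objects.
fs.readFile('sample.json', 'utf8', (err, data) => {
  if (err) {
    console.error(err);
    return;
  }
  const websiteLinks = JSON.parse(data);
  // Collect the links of every entry into one flat array.
  const allItems = _.flatten(websiteLinks.map((item) => item.links));
  const uniqueLinks = new Set();
  allItems.forEach((item) => {
    // Normalize the URL: strip the scheme, anything before the first "@" (meant for
    // user:pass@ credentials) and a leading "www.", keeping the host and any path,
    // so http/https/www variants of the same page compare equal.
    const website = item.url.match(/^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n?]*(\/.*)?)/mi)[1];
    if (uniqueLinks.has(website)) {
      console.log(`${website} is a duplicate`);
    } else {
      uniqueLinks.add(website);
    }
  });
});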
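A minimal sketch, not the author's code, that splits the script into two pure functions so the duplicate check can be tested without a sample.json file. Assumptions: the input has the shape the script above implies (an array of objects with a `links` array of `{ url }` objects), and `normalizeUrl`, `findDuplicateUrls` and the sample data are hypothetical names chosen for illustration. One deliberate change to the pattern: the credentials group also excludes `/`, so an `@` inside a path or query string (for example `?email=a@b.com`) is no longer mistaken for `user@`. Each duplicate is reported once instead of once per extra occurrence, and entries without a usable `url` are skipped rather than crashing.

```javascript
// Same normalization idea as the script, with one assumed fix: the credentials
// group `[^@\/\n]+@` cannot cross a "/", so only a real "user:pass@" prefix is stripped.
const URL_PATTERN = /^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/\n?]*(\/.*)?)/mi;

// Returns the comparison key for a URL: host plus any path, without scheme or "www.".
function normalizeUrl(url) {
  const match = url.match(URL_PATTERN);
  return match ? match[1] : url;
}

// Takes the parsed sample.json contents and returns each normalized URL
// that occurs more than once, listed once, in order of first repetition.
function findDuplicateUrls(websiteLinks) {
  const seen = new Set();
  const duplicates = new Set();
  for (const entry of websiteLinks) {
    for (const link of entry.links ?? []) {
      if (typeof link.url !== 'string') continue; // skip malformed links
      const key = normalizeUrl(link.url);
      if (seen.has(key)) {
        duplicates.add(key);
      } else {
        seen.add(key);
      }
    }
  }
  return [...duplicates];
}

// Hypothetical sample data in the assumed sample.json shape.
const sample = [
  { links: [{ url: 'https://www.example.com/a' }, { url: 'http://example.com/b' }] },
  { links: [{ url: 'example.com/a' }, { url: 'https://example.com/b' }] },
];
console.log(findDuplicateUrls(sample)); // [ 'example.com/a', 'example.com/b' ]

// Wiring it to the real file, as the original script does:
// const data = JSON.parse(require('fs').readFileSync('sample.json', 'utf8'));
// findDuplicateUrls(data).forEach((url) => console.log(`${url} is a duplicate`));
```

Like the original pattern, this still cuts everything after an explicit port (`https://example.com:8080/a` becomes `example.com`), so the pattern needs adjusting if ports or paths behind them matter.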