@jwmcgettigan
Last active June 30, 2024 16:31
Identifies and removes duplicate 'items' and 'folders' from your Bitwarden vault. 🎃
# Copyright © 2023 Justin McGettigan
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
# THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# This script will pull all of your vault 'items' using the 'bw list items' command and then it will compare
# all properties that are not inherently unique from the returned JSON to determine if they are duplicates.
# Note: It removes older duplicates first - the newest copy of the 'item' will be the only one to remain.
# You can simply flip the '>' sign to '<' if you want to preserve the oldest 'item' instead.
#
# Setup Steps
# 1. You must install Bitwarden CLI first: https://bitwarden.com/help/cli/#download-and-install
# 2. Log in to the CLI with the 'bw login' command. You need your session key set up before continuing: https://bitwarden.com/help/cli/#using-a-session-key
# 3. Make sure to back up your 'items'. You can use the 'bw export' command to do so: https://bitwarden.com/help/cli/#export
# 4. Run this Python script and your duplicate 'items' will start being deleted. https://bitwarden.com/help/cli/#delete
# Note: I am NOT using the '--permanent' flag. This means you can restore anything this script deletes within 30 days.
# Note2: The deletion process is pretty slow (1-2 items per second) so you'll likely need to let it run for a while.
import json
import hashlib
import subprocess
item_dict = {}
# Get the JSON data for each item in the vault
output = subprocess.check_output(['bw', 'list', 'items'])
items = json.loads(output)
for item in items:
    # Remove unique fields from the item data
    item_data = item.copy()
    del item_data['id']
    del item_data['folderId']
    del item_data['revisionDate']
    del item_data['creationDate']
    del item_data['deletedDate']
    # Calculate a hash of the item data
    item_hash = hashlib.sha256(str(item_data).encode('utf-8')).hexdigest()
    # Check if we've seen this item before
    if item_hash in item_dict:
        # Compare the revisionDate to see which item is newer
        if item['revisionDate'] > item_dict[item_hash]['revisionDate']:
            print(f'Duplicate item found: {item["name"]}')
            subprocess.run(['bw', 'delete', 'item', item_dict[item_hash]['id']])
            print(f'Deleted older item "{item_dict[item_hash]["name"]}".')
            item_dict[item_hash] = item
        else:
            print(f'Duplicate item found: {item["name"]}')
            subprocess.run(['bw', 'delete', 'item', item['id']])
            print(f'Deleted older item "{item["name"]}".')
    else:
        item_dict[item_hash] = item
@DavideRedivaD

Hi, is there a report at the end of the process that lets you know which items have been removed and which have not?
Also, does it compare the additional information stored in the password tab, like TOTP, notes, custom fields, etc.?
In addition, does it compare and remove only passwords or also other items, like cards and secure notes?
I ask just to know how it works and how to better use it.
It would be cool if it lets you choose what to keep and what to remove, but it doesn't seem to let you do that.

@jwmcgettigan
Author

@DavideRedivaD

is there a report at the end of the process that lets you know which items have been removed and which have not?

Not in its current form, no, but the script does print out the 'name' of each item as it deletes it.

does it compare the additional information stored in the password tab, like TOTP, notes, custom fields, etc.?
In addition, does it compare and remove only passwords or also other items, like cards and secure notes?

The answer should be yes to both. Since the script uses the 'bw list items' command, it should check for duplicates among anything that Bitwarden considers an 'item'. https://bitwarden.com/help/managing-items/
Do note that I only created this script for my specific use case, so it hasn't been thoroughly tested against all of the scenarios you mentioned. In theory, though, it should work for all of them.

Here is an example of a Bitwarden item:

{
	"object": "item",
	"id": "1e113c10-881f-4f01-b88d-afd20162981b",
	"organizationId": null,
	"folderId": "6ab751e1-74ad-4c38-912a-afd20162981c",
	"type": 1,
	"reprompt": 0,
	"name": "",
	"notes": null,
	"favorite": false,
	"login": {
		"uris": [
			{
				"match": null,
				"uri": ""
			}
		],
		"username": "",
		"password": "",
		"totp": null,
		"passwordRevisionDate": null
	},
	"collectionIds": [],
	"revisionDate": "2023-03-27T21:31:02.240Z",
	"creationDate": "2023-03-27T21:31:02.240Z",
	"deletedDate": null
}

Before checking for duplicates, these lines in the script remove the fields from the 'item' objects that are inherently unique regardless of item content.

del item_data['id']
del item_data['folderId']
del item_data['revisionDate']
del item_data['creationDate']
del item_data['deletedDate']
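
As a rough illustration of the idea above, the comparison can be written as a small function that drops the unique fields and hashes what is left. This is only a sketch, not the script's exact code: the names item_fingerprint and UNIQUE_FIELDS are made up for the example, and it swaps str() for json.dumps with sort_keys=True so the digest does not depend on key order.

```python
import hashlib
import json

# Fields that differ between otherwise identical copies of an item.
UNIQUE_FIELDS = ("id", "folderId", "revisionDate", "creationDate", "deletedDate")


def item_fingerprint(item: dict) -> str:
    """Return a SHA-256 hex digest of the item with its unique fields removed."""
    comparable = {k: v for k, v in item.items() if k not in UNIQUE_FIELDS}
    # sort_keys makes the digest independent of key order
    # (an assumption of this sketch; the original script hashes str(item_data)).
    canonical = json.dumps(comparable, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {
    "id": "a",
    "folderId": None,
    "name": "example",
    "login": {"username": "u"},
    "revisionDate": "2023-03-27T21:31:02.240Z",
    "creationDate": "2023-03-27T21:31:02.240Z",
    "deletedDate": None,
}
second = {**first, "id": "b", "revisionDate": "2023-04-01T00:00:00.000Z"}
third = {**first, "id": "c", "name": "different"}

print(item_fingerprint(first) == item_fingerprint(second))  # True
print(item_fingerprint(first) == item_fingerprint(third))   # False
```

Two items that differ only in the ignored fields get the same fingerprint, so any change to the name, notes, login data, or other fields makes them count as distinct.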

It would be cool if it lets you choose what to keep and what to remove, but it doesn't seem to let you do that.

That is correct as I just threw this script together quickly for my specific use case. Feel free to copy this script and add that functionality if you would like - I personally don't have a need for it and don't plan to add such functionality to this script.

I ask just to know how it works and how to better use it.

Thank you for asking! I'm happy to help explain what I can. I didn't have much luck with the other solutions online so I created this one and hope that it might help others as well.

@rcurwen

rcurwen commented Jun 11, 2023

Awesome - works a treat and I can see what it does! Thanks, mate.

@hoosierEE

I had almost the same idea, but not using the CLI app.

  1. export old.json from bitwarden
  2. purge vault
  3. run script to make new.json
  4. import
import json,sys

with open(sys.argv[1]) as f:
    d = json.load(f)

dd = {}
for item in d['items']:
    dd[repr({**item,'id':0})] = item

d['items'] = list(dd.values())
with open(sys.argv[2], 'w', encoding='utf-8') as f:
    json.dump(d, f, indent=2)

Usage:

python3 dedup.py old.json new.json

Adding the other fields (inspired by your script):

import json,sys

with open(sys.argv[1]) as f:
    d = json.load(f)

dd = {}
for item in d['items']:
    remove = {'id':0,'folderId':0,'revisionDate':0,'creationDate':0,'deletedDate':0}
    dd[repr({**item, **remove})] = item

d['items'] = list(dd.values())
with open(sys.argv[2], 'w', encoding='utf-8') as f:
    json.dump(d, f, indent=2)
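
The export-based scripts above keep whichever duplicate appears last in the file. If you would rather keep the most recently revised copy, as the CLI-based script does, something like the following might work. It is a sketch under the assumption that each exported item carries a 'revisionDate' string in the same ISO 8601 format as the 'bw list items' output; the function name dedupe_keep_newest is made up for the example.

```python
def dedupe_keep_newest(items: list) -> list:
    """Keep one item per group of duplicates, preferring the newest revisionDate."""
    ignored = {"id", "folderId", "revisionDate", "creationDate", "deletedDate"}
    newest = {}
    for item in items:
        key = repr({k: v for k, v in item.items() if k not in ignored})
        kept = newest.get(key)
        # Timestamps in the same ISO 8601 UTC format compare correctly as strings.
        # A missing or null revisionDate is treated as the oldest possible value.
        if kept is None or (item.get("revisionDate") or "") > (kept.get("revisionDate") or ""):
            newest[key] = item
    return list(newest.values())


sample = [
    {"id": "1", "name": "a", "revisionDate": "2023-01-01T00:00:00.000Z"},
    {"id": "2", "name": "a", "revisionDate": "2023-06-01T00:00:00.000Z"},
    {"id": "3", "name": "b", "revisionDate": "2023-02-01T00:00:00.000Z"},
]
print([item["id"] for item in dedupe_keep_newest(sample)])  # ['2', '3']
```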

@iansquenet

It keeps asking for the master password every time it finds a duplicate item.

@jwmcgettigan
Author

jwmcgettigan commented Sep 6, 2023

It keeps asking for the master password every time it finds a duplicate item.

That shouldn't happen if you've set up the session key properly: https://bitwarden.com/help/cli/#using-a-session-key
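
For anyone who prefers not to export the variable in their shell, one possible approach is to pass the session key to each bw call through the BW_SESSION environment variable from Python. This is a sketch rather than what the script does; the helper names bw_env and list_items are made up for the example.

```python
import os
import subprocess


def bw_env(session_key: str) -> dict:
    """Return a copy of the current environment with BW_SESSION set."""
    env = os.environ.copy()
    env["BW_SESSION"] = session_key
    return env


def list_items(session_key: str) -> bytes:
    """Run 'bw list items' with the session key supplied through the environment."""
    return subprocess.check_output(["bw", "list", "items"], env=bw_env(session_key))
```

The session key itself is the value printed by 'bw unlock --raw'. With it in the environment of every call, the CLI should not need to prompt for the master password again.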


@ewa

ewa commented Sep 22, 2023

This is awesome. I'd like to use it and extend it a little bit. Is that okay? Do you want to assign a specific license to it?

@jwmcgettigan
Author

This is awesome. I'd like to use it and extend it a little bit. Is that okay? Do you want to assign a specific license to it?

@ewa I've added MIT license text to the top of the gist. Feel free to use the script however you'd like! 👍

@jwmcgettigan
Author

jwmcgettigan commented Sep 24, 2023

Since many people have found this gist useful and I was feeling motivated, I've finished a pretty major overhaul of this script.

To any future readers, all comments preceding this one were referring to this revision.

Changelog

  • The script now also deletes duplicate folders where the previous version only deleted duplicate items.
  • The script output is now much fancier with colors and loading bars but does now require installing the colorama and tqdm packages.
  • The script now automatically syncs your Bitwarden vault to ensure the data is up-to-date. This can be disabled with the --no-sync flag.
  • Session setup can now be taken care of by the script itself. If the user wants to opt-out of it they can use the --no-auth flag.
  • Added the --dry-run flag for if you want to run the script without actually deleting anything.
  • Added the --ignore-history flag for if you want to ignore password history when determining duplicates.
  • Added the --oldest flag for if you want to keep the older duplicates instead of the newer ones.
  • Added the --empty-folders flag for if you want empty folders to be included when determining duplicate folders.
  • The script now has a VERSION associated with it that can be checked with the --version flag.
  • Added a summary to the end of a successful run of the script.
  • All functions are well documented and the code should be very readable. Unfortunately the script is far larger as a result.
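
For readers curious how flags like the ones in the changelog can be wired up, here is a minimal argparse sketch. It is not the script's actual implementation: the help strings are paraphrased from the changelog and the version string is a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the flags named in the changelog (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Delete duplicate Bitwarden items and folders."
    )
    parser.add_argument("--no-sync", action="store_true",
                        help="skip syncing the vault before checking for duplicates")
    parser.add_argument("--no-auth", action="store_true",
                        help="skip the script's own session setup")
    parser.add_argument("--dry-run", action="store_true",
                        help="report duplicates without deleting anything")
    parser.add_argument("--ignore-history", action="store_true",
                        help="ignore password history when comparing items")
    parser.add_argument("--oldest", action="store_true",
                        help="keep the oldest duplicate instead of the newest")
    parser.add_argument("--empty-folders", action="store_true",
                        help="include empty folders when comparing folders")
    # Placeholder version string, not the real script's VERSION.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0-sketch")
    return parser


args = build_parser().parse_args(["--dry-run", "--oldest"])
print(args.dry_run, args.oldest, args.no_sync)  # True True False
```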

Example Usage (Preview Image)

[image: example usage preview]

@ewa

ewa commented Sep 24, 2023

@jwmcgettigan thanks! Here's my forked version (merged up to before your overhaul) that I used. It's arguably cruftier than the original, but it includes a couple of command-line options and automates the login/session-key process, at least in simple situations. Having used it successfully to solve my problem, I don't really expect to develop it any further, but if there's anything in there that you want to incorporate back into the real version, feel free to!

https://gist.github.com/ewa/f5e115628b955bf8cd1e0540116b135a

@shvchk

shvchk commented Oct 28, 2023

May I suggest adding a shebang: #! /usr/bin/env python3 — so that script could be run using just filename, not python <filename>

@RalfZi

RalfZi commented Nov 6, 2023

using the script I get the following error
File ".....bitwarden_duplicate_cleaner.py", line 317, in
def identify_duplicate_item(item: dict, item_dict: dict) -> dict | None:
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

I use Python 3.9.1
regards
Ralf

@jwmcgettigan
Author

jwmcgettigan commented Nov 7, 2023

May I suggest adding a shebang: #! /usr/bin/env python3 — so that script could be run using just filename, not python <filename>

@shvchk Thanks! I've added it as you've suggested.

using the script I get the following error File ".....bitwarden_duplicate_cleaner.py", line 317, in def identify_duplicate_item(item: dict, item_dict: dict) -> dict | None: TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

I use Python 3.9.1 regards Ralf

@RalfZi The | operator for writing union types in annotations (as in dict | None) was added in Python 3.10, so any earlier Python version will raise that error. Unfortunately, the script would have to be refactored to support earlier versions of Python if you have that need.
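
For anyone who cannot upgrade, one way such a refactor could look is to replace dict | None with typing.Optional[dict], which also works on Python 3.9 and earlier. The signature below comes from the traceback; the function body is only a placeholder for illustration and is not the script's real logic.

```python
from typing import Optional


def identify_duplicate_item(item: dict, item_dict: dict) -> Optional[dict]:
    """Return a previously seen item with the same name, or None.

    Placeholder logic for illustration only; the real script compares more
    than the name.
    """
    return item_dict.get(item["name"])


seen = {"example": {"id": "1", "name": "example"}}
print(identify_duplicate_item({"id": "2", "name": "example"}, seen))  # {'id': '1', 'name': 'example'}
print(identify_duplicate_item({"id": "3", "name": "other"}, seen))    # None
```

Adding from __future__ import annotations as the first statement of the file is another option on Python 3.7 and later, since it stops annotations from being evaluated when the function is defined.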

@RalfZi

RalfZi commented Nov 8, 2023

May I suggest adding a shebang: #! /usr/bin/env python3 — so that script could be run using just filename, not python <filename>

@shvchk Thanks! I've added it as you've suggested.

using the script I get the following error File ".....bitwarden_duplicate_cleaner.py", line 317, in def identify_duplicate_item(item: dict, item_dict: dict) -> dict | None: TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
I use Python 3.9.1 regards Ralf

@RalfZi | was added in python 3.10 so any python version earlier than that will experience that error. Unfortunately, the script would have to be refactored to support earlier versions of python if you have that need.

OK, thanks, so I have to update my Python version.

@JacobCarrell

Can you please make this a proper repository? I think this has huge potential. However, I stumbled upon this deep into a link session, and it doesn't show up on github's default search. Additionally, instead of a long string of comments there is issue tracking, discussions, github pages for documentation, etc.

@iGallina

I am having problems with the lockfile:

Error: Lock file is already being held
at /usr/local/Cellar/bitwarden-cli/2023.12.0/libexec/lib/node_modules/@bitwarden/cli/node_modules/proper-lockfile/lib/lockfile.js:53:43
at FSReqCallback.oncomplete (node:fs:191:23) {
code: 'ELOCKED',
file: '/Users/ian.gallina/Library/Application Support/Bitwarden CLI/data.json'

Even when I run 'bw list items' manually, it does not work.
I even tried to close Chrome and Bitwarden to make sure there are no other processes using the file.

On the other hand, I don't find better information how to fix this on the Bitwarden docs.
Any help?

@iGallina

I am having problems with the lockfile:

Error: Lock file is already being held at /usr/local/Cellar/bitwarden-cli/2023.12.0/libexec/lib/node_modules/@bitwarden/cli/node_modules/proper-lockfile/lib/lockfile.js:53:43 at FSReqCallback.oncomplete (node:fs:191:23) { code: 'ELOCKED', file: '/Users/ian.gallina/Library/Application Support/Bitwarden CLI/data.json'

Even when I run 'bw list items' manually, it does not work. I even tried to close Chrome and Bitwarden to make sure there are no other processes using the file.

On the other hand, I don't find better information how to fix this on the Bitwarden docs. Any help?

Guys, I just found that a fix for this bug landed in Bitwarden a few hours ago:
bitwarden/clients#7126

@IvanLi-CN

I am having problems with the lockfile:

Error: Lock file is already being held at /usr/local/Cellar/bitwarden-cli/2023.12.0/libexec/lib/node_modules/@bitwarden/cli/node_modules/proper-lockfile/lib/lockfile.js:53:43 at FSReqCallback.oncomplete (node:fs:191:23) { code: 'ELOCKED', file: '/Users/ian.gallina/Library/Application Support/Bitwarden CLI/data.json'

Even when I run 'bw list items' manually, it does not work. I even tried to close Chrome and Bitwarden to make sure there are no other processes using the file.

On the other hand, I don't find better information how to fix this on the Bitwarden docs. Any help?

It seems that the currently released packages don't fix this issue. I was able to get it to work after returning bw from v2023.12.0 to v2023.10.0.

npm install -g @bitwarden/cli@2023.10.0

@jwmcgettigan
Author

jwmcgettigan commented Dec 21, 2023

Can you please make this a proper repository? I think this has huge potential. However, I stumbled upon this deep into a link session, and it doesn't show up on github's default search. Additionally, instead of a long string of comments there is issue tracking, discussions, github pages for documentation, etc.

@JacobCarrell Thank you for the suggestion. Now that I have some time and there's sufficient interest, I'll spend some of it transitioning this gist to a repo.

It seems that the currently released packages don't fix this issue. I was able to get it to work after returning bw from v2023.12.0 to v2023.10.0.

@IvanLi-CN @iGallina Thank you for sharing the issue. It appears that the hotfix for bitwarden/clients#7126 was finally released so you should be able to use the latest version. I can run it without issue with version 2023.12.1.

@JakubHruska

JakubHruska commented Mar 13, 2024

Hi, I'm having a problem running the script.

==================================================
Bitwarden Duplicate Cleaner - Version 1.1.0
A script that deletes duplicate items and folders.
==================================================
Traceback (most recent call last):
  File "c:\Users\jakub\Downloads\bitwarden-dedup_python_script\bitwarden_duplicate_cleaner.py", line 641, in <module>
    check_bw_installed()
  File "c:\Users\jakub\Downloads\bitwarden-dedup_python_script\bitwarden_duplicate_cleaner.py", line 165, in check_bw_installed
    subprocess.check_output(['bw', '--version'])
  File "C:\Program Files\Python312\Lib\subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Program Files\Python312\Lib\subprocess.py", line 1538, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified

I tried running it in Python 3.10, but got the same result.
I installed the Bitwarden CLI via npm, version 2024.2.1.
Do I need to explicitly add bw to PATH? When I run it from any location, it is accessible, so I'm a bit confused.
Or perhaps I did the installation wrong?
I have a bit of experience with Python and npm (from when I was learning React at school), and this is my first time commenting on a gist.

Thank you for any help,
Regards, Jakub

@JakubHruska

So, a little update.

I did a bit of digging and tinkering.
I found out that passing the shell=True argument to every subprocess call fixes the error.
Although I'm not exactly sure why it works, I'm happy that I was able to get it working with my limited knowledge of Python and cmd.
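
A likely explanation, stated here as an assumption rather than something confirmed in this thread: an npm install on Windows usually exposes bw as a bw.cmd wrapper. The command prompt resolves such wrappers itself, but subprocess without shell=True does not. A possible alternative to shell=True is to resolve the full path first with shutil.which, which honours PATHEXT on Windows. The helper names resolve_executable and bw_version below are made up for this sketch.

```python
import shutil
import subprocess


def resolve_executable(name: str) -> str:
    """Return the full path of an executable found on PATH, or raise FileNotFoundError."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name!r} was not found on PATH")
    return path


def bw_version() -> str:
    """Run 'bw --version' using the resolved path instead of the bare command name."""
    bw = resolve_executable("bw")
    return subprocess.check_output([bw, "--version"], text=True).strip()
```

Calling resolve_executable('bw') once at startup and reusing the result in every subprocess call would avoid spawning a shell for each command.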
