Skip to content

Instantly share code, notes, and snippets.

@Blakeinstein
Created January 11, 2023 19:50
Show Gist options
  • Save Blakeinstein/0cdf4a17ad3dc2e440e8dd389f5f7aba to your computer and use it in GitHub Desktop.
Save Blakeinstein/0cdf4a17ad3dc2e440e8dd389f5f7aba to your computer and use it in GitHub Desktop.
Move Microsoft EDGE collections to raindrop / csv

Move Microsoft edge collections to services like Raindrop (that can accept csv)

Get the sqlite db file

  1. You can find the db file in the following places
  • Windows: %LocalAppData%\Microsoft\Edge\User Data\Default\Collections
  • MacOS: ~/Library/Application Support/Microsoft Edge/Default/Collections
  • Linux: TBD
  1. Copy the files (esp. collectionsSQLite) to a seperate Folder.

Execute the script

  1. Install python 3.10 and then setup pandas via pip pip install pandas
  2. save collections_to_csv.py to the same folder with the db files from the previous step.
  3. Exec the script python collections_to_csv.py

Then you can simply import the file normally to raindrop.

import json
import sqlite3
import pandas as pd
con = sqlite3.connect("./collectionsSQLite")
cur = con.cursor()
df = pd.read_sql_query(
"""
SELECT i.title as title, c.title as folder, i.source as data, i.date_created as created
FROM collections as c, items as i, collections_items_relationship as r
WHERE r.item_id == i.id and r.parent_id == c.id
""",
con
)
def blob_to_json(col):
my_json = col.decode('utf8')
d = json.loads(my_json)
return d["url"]
df["description"] = ""
df["tags"] = ""
df["created"] = df["created"].astype(int)
df["url"] = df["data"].apply(blob_to_json)
df.to_csv(
"out.csv",
index=False,
columns=["url", "folder", "title", "description", "tags", "created"]
)
@ework221b
Copy link

thanks

@Kamek437
Copy link

You sir are my hero. Saved me like three days futsing with this. Genius, thanks man your my hero today.

@Kamek437
Copy link

Shoot it doesn't seem to have exported everything for some reason. I'm not sure what happened exactly. If I upload my db files can someone take a look? I'm not that great with sql.

@Blakeinstein
Copy link
Author

@Kamek437 I can do that for you! If possible, instead of sending the db file for privacy reasons, can you instead perhaps just send the table schema?

Assuming the script ran correctly with no errors (which means you have everything setup correctly)

import json
import sqlite3
import pandas as pd

con = sqlite3.connect("./collectionsSQLite")
cur = con.cursor()

for table_name in ['collections', 'items', 'collections_items_relationship']:
	df = pd.read_sql_query(
    f"PRAGMA table_info({table_name});",
    con
  )
  print(df, end="\n\n");

@Kamek437
Copy link

Thanks man! Sorry work stuff kept me. Here is the output:

    cid                    name         type  notnull dflt_value  pk
0     0                      id  LONGVARCHAR        0       None   1
1     1            date_created         REAL        1       None   0
2     2           date_modified         REAL        1       None   0
3     3                   title  LONGVARCHAR        1       None   0
4     4                position      INTEGER        1       None   0
5     5             is_syncable      INTEGER        0          1   0
6     6          suggestion_url  LONGVARCHAR        0       None   0
7     7    suggestion_dismissed      INTEGER        0       None   0
8     8         suggestion_type      INTEGER        0       None   0
9     9               thumbnail         BLOB        0       None   0
10   10     is_custom_thumbnail      INTEGER        1          0   0
11   11                     tag  LONGVARCHAR        0       None   0
12   12           thumbnail_url  LONGVARCHAR        0       None   0
13   13  is_marked_for_deletion      INTEGER        0       None   0

    cid                    name         type  notnull dflt_value  pk
0     0                      id  LONGVARCHAR        0       None   1
1     1            date_created         REAL        1       None   0
2     2           date_modified         REAL        1       None   0
3     3                   title  LONGVARCHAR        0       None   0
4     4                  source         BLOB        0       None   0
5     5             entity_blob         BLOB        0       None   0
6     6             favicon_url  LONGVARCHAR        0       None   0
7     7    canonical_image_data         BLOB        0       None   0
8     8     canonical_image_url  LONGVARCHAR        0       None   0
9     9            text_content  LONGVARCHAR        0       None   0
10   10            html_content  LONGVARCHAR        0       None   0
11   11                    type  LONGVARCHAR        0       None   0
12   12             progressing      INTEGER        0       None   0
13   13             is_syncable      INTEGER        0          1   0
14   14                   color  LONGVARCHAR        0       None   0
15   15        third_party_data         BLOB        0       None   0
16   16              remote_url  LONGVARCHAR        0       None   0
17   17                     tag  LONGVARCHAR        0       None   0
18   18  is_marked_for_deletion      INTEGER        0       None   0

   cid       name         type  notnull dflt_value  pk
0    0    item_id  LONGVARCHAR        1       None   0
1    1  parent_id  LONGVARCHAR        1       None   0
2    2   position      INTEGER        1       None   0


@Kamek437
Copy link

Kamek437 commented Sep 21, 2023

If you polished this up a bit and did a Medium article or something I bet you you'd get a ton of views man.

@jbrown1597
Copy link

Jesus, THANK YOU.

Microsoft is so frustrating sometimes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment