@dexter-stpierre
Last active December 17, 2023 14:07
Readwise-Obsidian Workflow

Motivation

I decided to create this workflow based on my belief that if information is synced into my Obsidian vault then that data should be completely owned by the tool that synced it. It should never be edited within my vault, and it should be able to be deleted and resynced (hard resync) from that tool with little to no impact to my vault. If you don't agree with this mindset then no worries! This workflow can still help you keep track of which articles you have processed and which ones have new highlights to process. Even if you think this workflow isn't for you, you might learn something in this that might help you solve some problems in your own vault.

Since I only edit Readwise highlights and notes in Readwise itself, I needed a way of knowing which highlights I have reviewed/processed and which ones I still need to process. Since I wanted to be able to do a hard resync, that record had to live completely outside of the information that Readwise tracks in my vault. So I created a system in which I keep a list of processed documents with the date that I processed them, and use Dataview to compare the date an article was processed with the date of its latest highlight.

I also had never used Dataview and this project gave me a good opportunity to do a deep dive into the Dataview API and I learned a ton while doing it.

The Readwise Setup

This workflow requires 2 simple things to be in your Readwise export:

  1. You must use the Readwise highlight_id as the block id for your highlights. This creates a stable reference to the block, so when you write notes based on your highlights and create a block reference to a highlight, the reference will not break when you do a hard resync. This strategy was invented by @TfTHacker and is detailed in this tweet
  2. You must include the highlight_date as part of your Readwise export. Most of this workflow will work out fine even if you don't include this, you just won't be able to see that new highlights were added to a document. This value must be set up to be a Dataview Date field in the format of [Date:: {highlight_date}] or Date:: {highlight_date}
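To make the two requirements concrete, here is a minimal JavaScript sketch of the Markdown line each exported highlight should end up as in your vault. The `renderHighlightLine` function and its parameter names are hypothetical and exist only for illustration; in practice the line is produced by your Readwise export template, not by code like this.

```javascript
// Hypothetical helper: builds the Markdown line for one exported highlight.
function renderHighlightLine({ id, text, date }) {
  // [Date:: ...] is the inline field Dataview reads for the highlight date.
  // The trailing ^id is an Obsidian block id; because it comes from Readwise,
  // it is the same after every resync.
  return `- ${text} [Date:: ${date}] ^${id}`;
}

console.log(renderHighlightLine({ id: 123456789, text: 'An example highlight.', date: '2023-01-15' }));
// - An example highlight. [Date:: 2023-01-15] ^123456789
```

A note elsewhere in the vault can then point at that highlight with a block reference such as `[[Document Name#^123456789]]`, and the reference keeps resolving after a hard resync.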

Here's the preview of my Readwise export (the only important things are the things listed above):

image

The Log File

This entire workflow hinges on a single file where you track the documents that you have processed in Obsidian. It can be created anywhere in your vault and have any name. Mine is Processed Document Log, but yours can be called whatever you like.

This file uses Dataview tasks to track all of the individual documents that you have processed. Each task consists of just a Document field and a processedAt field. As far as I can tell, both have to be inline fields to work properly. The task can be marked as complete or incomplete, so hopefully it won't impact your other workflows if you currently use Dataview to manage tasks.

A little tip: if you want a processed document to never show up in the Needs Additional Processing category, simply remove the date from the entry and it will always show up in Processed. Another tip: if you only got partway through processing a document, you can enter the date of the last highlight you processed and the article will show up under the Needs Additional Processing category. Just make sure there is a highlight dated after the date you put down.

Here's a screenshot of my log file that I made while testing this:

image

And here is a template for the log item that I made using the core Templates plugin:

- [ ] [Document:: [[]]] [processedAt:: [[{{date}}]]]
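To show what the query gets out of each log entry, here is a small plain-JavaScript sketch that pulls the two inline fields out of a task line. Dataview does this parsing for you (the query reads the values as `task.document` and `task.processedAt`), so the `parseLogEntry` function below is purely a hypothetical illustration and not part of the workflow.

```javascript
// Hypothetical parser for one line of the processed-document log.
function parseLogEntry(line) {
  // Only Markdown task lines count: "- [ ] ..." or "- [x] ...".
  if (!/^\s*- \[( |x)\] /i.test(line)) return null;
  // Each field is an inline field whose value is a wiki link.
  const document = line.match(/\[Document::\s*\[\[(.*?)\]\]\]/)?.[1] ?? null;
  const processedAt = line.match(/\[processedAt::\s*\[\[(.*?)\]\]\]/)?.[1] ?? null;
  return { document, processedAt };
}

console.log(parseLogEntry('- [x] [Document:: [[My Article]]] [processedAt:: [[2023-01-15]]]'));
// { document: 'My Article', processedAt: '2023-01-15' }
console.log(parseLogEntry('- [ ] [Document:: [[Evergreen Book]]]'));
// { document: 'Evergreen Book', processedAt: null }
```

The second entry has no processedAt field, which is the "remove the date" tip from above: with no date to compare against, the document can never be flagged as needing additional processing.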

The Query

And here is where everything gets connected: a short DataviewJS query that pulls in all of your Readwise documents and sorts them into 3 groups: Processed, Needs additional Processing, and Unprocessed. All you should need to edit is the first 3 lines, so that they match the date format you use (dateFormat), the folder where Readwise places your documents (readwiseFolder), and the name of the file where you log processed documents using the process outlined above (processedLog). As a little tip, if you want to have a different page for each document type, you can simply target the subfolders that Readwise creates. Without further ado, here is the query that powers this whole workflow:

```dataviewjs
const dateFormat = 'YYYY-MM-DD'; // Moment format used for your dates
const readwiseFolder = 'Readwise'; // folder where Readwise places your documents
const processedLog = 'Processed Document Log'; // file where you log processed documents

// A document with a single highlight yields one date rather than a list,
// and a document without highlights yields nothing, so handle all three cases.
const latestHighlight = (dates) =>
  dates && typeof dates !== 'string' && dates.length ? dates[dates.length - 1] : dates;

const groupDocuments = (documents, processedDocumentsLinks) => {
  const groups = documents.reduce((acc, doc) => {
    const processedLink = processedDocumentsLinks.find((processed) => processed.link.equals(doc.file.link));

    // The last date in the document is treated as the newest highlight.
    const highlightedAt = moment(latestHighlight(doc.date), dateFormat);

    if (!processedLink) acc.unprocessedDocuments.push(doc);
    // A missing date on either side gives an invalid moment, for which isBefore is false,
    // so the document falls through to the processed group.
    else if (processedLink.processedDate.isBefore(highlightedAt)) acc.needsAdditionalProcessing.push(doc);
    else acc.processedDocuments.push(doc);
    return acc;
  }, {processedDocuments: [], needsAdditionalProcessing: [], unprocessedDocuments: []});

  return {
    processedDocuments: dv.array(groups.processedDocuments),
    needsAdditionalProcessing: dv.array(groups.needsAdditionalProcessing),
    unprocessedDocuments: dv.array(groups.unprocessedDocuments),
  };
};

const processedDocumentsPage = dv.page(processedLog);

const links = processedDocumentsPage.file.tasks
  .where((task) => task.document) // skip tasks that have no Document field
  .map((task) => ({link: dv.page(task.document.path)?.file.link, processedDate: moment(task.processedAt, dateFormat)}))
  .where((entry) => entry.link); // skip entries whose document no longer exists

const documents = dv.pages(`"${readwiseFolder}"`);

const {
  processedDocuments,
  needsAdditionalProcessing,
  unprocessedDocuments,
} = groupDocuments(documents.array(), links);

dv.header(1, `Processed Documents (${processedDocuments.length})`);
dv.list(processedDocuments.file.link);

dv.header(1, `Needs additional Processing (${needsAdditionalProcessing.length})`);
dv.list(needsAdditionalProcessing.file.link);

dv.header(1, `Unprocessed Documents (${unprocessedDocuments.length})`);
dv.list(unprocessedDocuments.file.link);
```

I'll do a quick walk through of what exactly this query is doing:

  1. We assign a few variables so we can reuse them throughout the query, and to make customization easier
  2. Create the groupDocuments function, which I will walk through in a little bit.
  3. Query the processed document log so we can extract the tasks from it.
  4. Extract the document link and processed date from each task: query the page that the task's Document field points to and pull the link out of it, then transform the processedAt date into a Moment object for easy comparison later on
  5. Query all documents from the Readwise folder
  6. Use groupDocuments to sort the documents into 3 groups: Processed, Needs additional Processing, and Unprocessed. Here's a walkthrough of that process
    1. groupDocuments receives all of the Readwise documents as an array, and the links/dates that we created earlier.
    2. Loop through the objects (using Array.reduce) which will sort them into those groups. We do that through the following:
      1. Search the processed documents for a match using Dataview's link comparison method (equals)
      2. Grab the array of dates referenced in the document
      3. If there is no processed link (document is not listed in your processed log) then add the document to the unprocessed list
      4. If the date that the document was processed is before the date of the newest highlight (new highlights) then add to the needs additional processing list
      5. Otherwise add to the processed list
    3. Transform the created arrays into Dataview arrays for display
  7. Display each group under a header that shows the total count in the group. If you want the groups in a different order, or don't want a group displayed, simply move or delete the lines. Just make sure that you keep each header together with its list: move both or delete both.
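
If you want to experiment with the grouping rules outside of Obsidian, the walkthrough above can be sketched in plain JavaScript. This is a simplified stand-in and not the query itself: documents are matched by a hypothetical `name` property instead of Dataview links, and dates are assumed to be ISO `YYYY-MM-DD` strings so that they can be compared as plain strings without Moment.

```javascript
// Plain-JavaScript sketch of the grouping rules; no Dataview or Moment required.
// Assumption: all dates are 'YYYY-MM-DD' strings, which sort correctly as text.
function groupDocuments(documents, processedLog) {
  const groups = { processed: [], needsAdditionalProcessing: [], unprocessed: [] };
  for (const doc of documents) {
    const entry = processedLog.find((logEntry) => logEntry.name === doc.name);
    // As in the query, the last date in the list is treated as the newest highlight.
    const newestHighlight = doc.highlightDates[doc.highlightDates.length - 1];
    if (!entry) {
      groups.unprocessed.push(doc.name);
    } else if (entry.processedAt && newestHighlight && entry.processedAt < newestHighlight) {
      groups.needsAdditionalProcessing.push(doc.name);
    } else {
      groups.processed.push(doc.name);
    }
  }
  return groups;
}

const documents = [
  { name: 'Article A', highlightDates: ['2023-01-01', '2023-01-05'] },
  { name: 'Article B', highlightDates: ['2023-01-01', '2023-02-10'] },
  { name: 'Article C', highlightDates: ['2023-03-01'] },
  { name: 'Article D', highlightDates: ['2023-01-01', '2023-04-01'] },
];
const processedLog = [
  { name: 'Article A', processedAt: '2023-01-10' },
  { name: 'Article B', processedAt: '2023-01-15' },
  { name: 'Article D', processedAt: null }, // date removed: always counts as processed
];

console.log(groupDocuments(documents, processedLog));
// processed: Article A and Article D
// needsAdditionalProcessing: Article B
// unprocessed: Article C
```

Article B lands in the middle group because it was processed on 2023-01-15 but gained a highlight on 2023-02-10, while Article D stays processed despite its newer highlight because its log entry has no date.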