Cents & Sensibility: How to do accounting 101 in one hour with large language models

Cents & Sensibility: The Ledger Luminaries' Revolution

I'm scared to account; scared of statements of business dealings, credits and debits, and tracking of all this money-related stuff. Scared of reciting these transaction histories, and yet reckoning with these computations and enumerations is as deathly certain as taxes.

It is silly that this problem has not been solved given fancy new technology like AI and large language models. It's also silly how much big institutions skim off the top, like 1.5–3.5% per credit card transaction, and how our expanding web of subscriptions to Substacks, YouTubes, and the like grows more numerous and Medusa-like the harder the subscriptions become to track, aggregate, categorize, and analyze across institutions. I'm also scared of Mint.com (I used it for many years) as they broker our data, but I still need the help of software and technology to manage money. This feels like a data problem that should have been solved long ago 🤯

So the goal here was to spend a few hours on a weekend to see whether I could save my partner and me time in gaining visibility into dozens of accounts across multiple countries, currencies, and types of investments. Surprisingly, few companies have an incentive to solve this problem in a cheap and accessible way, but large language models such as ChatGPT and the plethora of open source plain text accounting tools like beancount (https://beancount.github.io/) and Fava (https://beancount.github.io/fava/) make this substantially easier. I've read about these tools for years on Hacker News (https://news.ycombinator.com/) but have been too intimidated to try them. It turns out that large language models, with the right prompt, can be coaxed into automating a lot of the features that Mint, Tiller, and other money management tools build.

For the goal of automatic transaction categorization that is maintainable over time, this workflow avoids the need to build and train a custom artificial intelligence or machine learning model. That's because we all have access to a big AI model in the form of a large language model (Google/OpenAI are paying the electricity cost, or you are if you use an open source local large language model like llamafile). Such a large language model removes the overhead of a custom machine learning model (one built purely for categorizing financial transactions) by creating a "deterministic" classifier in the form of a Python program that is likely more concise, maintainable, and interpretable than a model trained purely for this task.

Results: transactions automatically categorized in a spreadsheet that can then be visualized (I don't own Microsoft Excel, so this is with Google Sheets; transactions anonymized and de-identified by GPT-4):

[Screenshot: automatically categorized transactions visualized in Google Sheets]

Google Sheets example:

https://docs.google.com/spreadsheets/d/1GUhttMom0QoMbHQ3M02zqtSh6UxE7dKnz0g3_H73xEM/edit?usp=sharing

Raw transactions data:

date,account,number,currency
2023-12-13,Expenses:Transfer               ,   150.00,USD
2023-12-13,Assets:Checking:BankName        ,  -150.00,USD
2023-12-15,Assets:Checking:BankName        ,    15.00,USD
2023-12-15,Expenses:Cash-withdrawal        ,   -15.00,USD
2024-01-04,Expenses:Deposit                ,    40.00,USD
2024-01-04,Assets:Checking:BankName        ,   -40.00,USD
2024-01-30,Assets:Checking:BankName        ,     9.99,USD
2024-01-30,Expenses:Utilities              ,    -9.99,USD
2024-01-30,Expenses:Healthcare-contribution,     0.01,USD
2024-01-30,Assets:Checking:BankName        ,    -0.01,USD
2024-02-15,Expenses:Healthcare-contribution,     5.90,USD
2024-02-15,Assets:Checking:BankName        ,    -5.90,USD
2024-02-26,Assets:Checking:BankName        ,     3.00,USD
2024-02-26,Expenses:Bank-fees              ,    -3.00,USD
2024-02-26,Assets:Checking:BankName        ,     3.50,USD
2024-02-26,Expenses:Bank-fees              ,    -3.50,USD
2024-02-28,Assets:Checking:BankName        ,     9.99,USD
2024-02-28,Expenses:Utilities              ,    -9.99,USD
2024-02-28,Assets:Checking:BankName        ,     6.08,USD
2024-02-28,Expenses:Miscellaneous          ,    -6.08,USD
2024-02-29,Expenses:Healthcare-contribution,     5.90,USD
2024-02-29,Assets:Checking:BankName        ,    -5.90,USD
2024-03-05,Expenses:Deposit                ,     5.00,USD
2024-03-05,Assets:Checking:BankName        ,    -5.00,USD
2024-03-06,Assets:Checking:BankName        ,     4.11,USD
2024-03-06,Expenses:Rent                   ,    -4.11,USD
2024-03-06,Expenses:Healthcare-contribution,     6.37,USD
2024-03-06,Assets:Checking:BankName        ,    -6.37,USD
2024-03-06,Expenses:Healthcare-contribution,     1.31,USD
2024-03-06,Assets:Checking:BankName        ,    -1.31,USD
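
Because the export follows double-entry bookkeeping, every transaction appears as two equal-and-opposite rows. Here is a minimal sanity check (a sketch, assuming the rows above are saved as transactions.csv) that the amounts net to zero for each date:

import csv
from collections import defaultdict

# Sum the signed amounts per posting date; each date (and the file overall)
# should net to roughly zero if every transaction has both legs.
totals = defaultdict(float)
with open("transactions.csv", newline="") as f:
    for row in csv.DictReader(f):
        totals[row["date"]] += float(row["number"])

for date, total in sorted(totals.items()):
    print(f"{date}: {total:+.2f} {'OK' if abs(total) < 0.005 else 'IMBALANCED'}")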

Core criteria were:

  • automatic categorization of transactions that is flexible and easily editable across all historical transactions
  • transparency
  • security
  • ease of use
  • ease of changing tools
  • compatibility with Visual Studio Code or other plain text editors / viewers of large volumes of raw transactions data
  • < 1 hour per quarter for adding new transactions
  • plain text and export to CSV (for maximum interoperability and compatibility with current and future large language models)
  • export to an Excel/Google Sheets-compatible format for making dashboards / visualizations / using Business Intelligence tools as needed
  • something my partner could learn (their background is not technical, but they are proficient in prompt engineering).

Please note: this involves sharing private transaction data with for-profit companies that may have lax security and are vulnerable to prompt injection attacks and other worrying gaps in security practice. The security of closed-source large language models such as OpenAI's ChatGPT or Anthropic's Claude is unclear, and I don't recommend sharing financial data with them (leaks have been reported, in addition to lax security practices).

Assume that any information you provide these models will be leaked, including sensitive health & financial data, or their intersection.

Paid alternatives include Tiller (https://www.tillerhq.com/pricing/ - $79/year), and more secure alternatives involve hosting a large language model locally, such as via Llamafile (https://github.com/Mozilla-Ocho/llamafile).
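
For example, a llamafile starts a local server (by default at http://localhost:8080) that exposes an OpenAI-compatible API, so the same prompts can be run against a local model without sending financial data to a third party. A minimal sketch, assuming a llamafile server is already running on that default port and the openai Python package is installed (the model name is a placeholder that the local server largely ignores):

from openai import OpenAI

# Point the OpenAI client at the local llamafile server instead of OpenAI's API.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

response = client.chat.completions.create(
    model="local-model",  # placeholder; the server answers with whatever model the llamafile bundles
    messages=[
        {"role": "system", "content": "You are a plain-text accounting assistant."},
        {"role": "user", "content": "Categorize this transaction description: 'ACME UTILITIES AUTOPAYM 9.99'"},
    ],
)
print(response.choices[0].message.content)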

Open source alternatives include recent tools like Actual (https://github.com/actualbudget/actual) and Maybe (https://github.com/maybe-finance/maybe), but these do not reliably support multiple currencies or investment accounts. That narrowed it down to Beancount and Ledger, both of which seem fine. Beancount has Fava (a web interface: https://beancount.github.io/fava/) and Ledger has Paisa (https://paisa.fyi/); converting between the two formats is straightforward, so you can pick whichever gives the best user experience and dashboards.

Instructions for use with a bank such as Chase

  1. Sign in and download all transactions as a CSV. Copy the contents of the CSV so you can paste them into the prompt in the next step.
  2. Copy and paste the below prompt into a large language model such as GPT-4 or Claude Opus:
Develop a Python function named `categorize_descriptions` to categorize transactions based on keywords inferred from the CSV content below. This function aims to establish a simple and efficient categorization system that aligns with the principles of Beancount format and double-entry bookkeeping. Follow these guidelines for categorization:

- Infer concise, single-word descriptions that resemble the type found in the first sentence of a generic Wikipedia article, such as "Rent" or "Payroll". Convert these categories to lowercase, then capitalize the first letter of each word in the final category name, including after hyphens. For multi-word category names, replace spaces with hyphens (not underscores) to comply with the Beancount format, which does not support spaces in category names.
- Determine whether a category is associated with a debit or credit to accurately classify reimbursements versus contributions. This distinction is crucial for adhering to the double-entry bookkeeping system, where every transaction is represented by equal and opposite entries (debits and credits), ensuring that expenses and assets are correctly balanced.

Following the categorization, use the provided CSV file to convert its contents to the Beancount format using the `categorize_transactions` function. Adhere to the following conversion rules strictly:

- Ensure all category names and entries are lowercase except for the first letter of each word, which must be capitalized (including after hyphens). This formatting should be consistent across the entire file.
- Abide by the double-entry bookkeeping system of Beancount for plain-text accounting. Every entry must include a corresponding credit and debit, reflecting the fundamental principle that every financial transaction involves equal and opposite entries in at least two different accounts.
- Parse the description column into a string format accurately, maintaining the integrity of the transaction details.
- Format all dates as `YYYY-MM-DD` and provide the resulting Beancount file for download with the `.bean` extension. The filename should follow the format: `institution=institutionname_type=accounttype_lastfour=lastfourdigits_start=YYYY-MM-DD_end=YYYY-MM-DD.beancount`, ensuring it is entirely lowercase. This format includes the institution name, account type (checking, savings, or investment), the last four digits of the account number, and the start and end dates representing the earliest and most recent transactions in the file.
- Separate transactions with a double space, adhering to Beancount conventions and syntax. Use `rjust` and `ljust` to align the amount numbers vertically, considering the string lengths of categories and account naming schemes.
- Represent checking account transactions with at least one entry for `f"Assets:{accounttype}:{institution}-{lastfour}"`, where `lastfour` are the last four digits of the account number.
- Only use `csv.DictReader` for CSV processing; do not use `pandas.read_csv`. Ensure that the column names and datatypes are accurately handled during conversion.

As a principal software engineer at Google, execute each step with precision, crafting high-quality code that closely adheres to each criterion. Emphasize the importance of the double-entry bookkeeping system, where each credit must have a corresponding debit, and expenses and assets are tied in equal but opposite transactions, throughout your coding and categorization process. Please remember to provide the final file for download appropriately named and formatted.
  3. Download the resulting .beancount file, and in the same folder create a file called transactions.beancount with the following contents:
include institution=institutionname_type=accounttype_lastfour=lastfourdigits_start=YYYY-MM-DD_end=YYYY-MM-DD.beancount

(Replace the filename with that of the file you downloaded from GPT-4 or Claude Opus.)

The contents of the transactions file should include entries like this (the amount on the second posting is omitted; Beancount infers it so that the transaction balances to zero):

2024-02-28 * "COMPANY"
  Expenses:Utility-Payment        12.99 USD
  Assets:Checking:Bank
  4. Then submit the following prompt to save a copy of the function used to categorize and transform the raw transaction data into beancount format:
Great, now provide the full python file for exactly reproducing this entire procedure, which accepts a single command line argument (the path to the CSV file) and yields this exact same output, and provide a short example of its usage and documentation in markdown I can copy and paste into the README. Do not forget to include the specific file name, schema, and other naming conventions throughout that exactly replicate this process (and test it yourself first if able).

You should receive output similar to this; copy and paste it somewhere safe for reproducibility (or for aligning future schema or categories):

import csv
from datetime import datetime
import sys

def categorize_descriptions(description):
    # Keyword-to-category mapping inferred by the LLM from the original CSV;
    # keywords are checked in order and matched case-insensitively.
    keywords_to_categories = {
        "RENT": "Rent",
        "Health": "Health-Reimbursement",
        "PAYROLL": "Payroll",
        "PAYM": "Utilities",
        "ECO": "Electricity",
        "FEE": "Bank-Fee",
        "WITHDRAW": "Cash-Withdrawal",
        "ONLINE DEPOSIT": "Deposit",
        "WITHDRAWAL": "Withdrawal",
        "Online Transfer": "Transfer"
    }
    default_category = "Miscellaneous"
    for keyword, category in keywords_to_categories.items():
        if keyword.lower() in description.lower():
            return category
    return default_category

def format_category(category):
    # Capitalize each space-separated word and join with hyphens
    # (Beancount account names cannot contain spaces).
    return '-'.join(word.capitalize() for word in category.split())

def convert_csv_to_beancount(csv_file_path, institution, account_type, last_four, start_date, end_date):
    transactions = []
    with open(csv_file_path, newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            category = format_category(categorize_descriptions(row['Description']))
            # Classify the counter-account as Expenses or Income without
            # overwriting the asset account type (e.g. "checking").
            entry_type = "Expenses" if row['Type'].startswith("DEBIT") or row['Type'] == "FEE_TRANSACTION" else "Income"
            date = datetime.strptime(row['Posting Date'], "%m/%d/%Y").strftime("%Y-%m-%d")
            amount = "{:.2f}".format(abs(float(row['Amount'])))
            description = row['Description'].replace('"', "'")
            # Double-entry: the asset account and the counter-account carry
            # equal and opposite amounts.
            entry = f"""
{date} * "{description}"
  Assets:{account_type.capitalize()}:{institution}-{last_four}  {('-' if entry_type == 'Expenses' else '') + amount} USD
  {entry_type}:{category}  {('' if entry_type == 'Expenses' else '-') + amount} USD
"""
            transactions.append(entry)
    beancount_content = "\n".join(transactions)
    return beancount_content

def main(csv_file_path):
    # Placeholders for the file naming convention; edit these to match your
    # institution, account, and the date range covered by the CSV export.
    institution = "chase"
    account_type = "checking"
    last_four = "XXXX"
    start_date = "2023-12-13"
    end_date = "2024-03-06"

    beancount_content = convert_csv_to_beancount(csv_file_path, institution, account_type, last_four, start_date, end_date)
    # Write to the current directory (the original GPT-4 output wrote to its /mnt/data sandbox).
    beancount_file_path = f"institution={institution}_type={account_type}_lastfour={last_four}_start={start_date}_end={end_date}.beancount"

    with open(beancount_file_path, 'w', encoding='utf-8') as beancount_file:
        beancount_file.write(beancount_content)
    print(beancount_file_path)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python convert_to_beancount.py <path_to_csv_file>")
        sys.exit(1)
    csv_file_path = sys.argv[1]
    main(csv_file_path)
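
The saved script can then be rerun locally on each new export, e.g. python convert_to_beancount.py chase_activity.csv (a hypothetical filename); it writes the .beancount file using the naming convention above and prints its path.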
  5. Open the .beancount file in Visual Studio Code (https://code.visualstudio.com/) and install the Beancount Formatter extension and the Beancount extension.
  6. Click on View, then Open command palette (or use the Cmd + Shift + P shortcut), type in Format document, and press enter. This will format the document.
  7. On the command line, install Beancount for Python with pip install beancount.
  8. Convert the beancount transactions to a CSV by executing the following command on the command line:
bean-query transactions.beancount "SELECT date, account, number, currency WHERE account ~ '' " --format=csv > transactions.csv
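
If you prefer to stay in Python instead of bean-query, here is a minimal sketch (assuming Beancount v2's Python API, and a single currency for simplicity) that loads the same ledger and prints per-account totals:

from collections import defaultdict

from beancount import loader
from beancount.core import data

# load_file returns (entries, errors, options_map); entries include all directives.
entries, errors, options_map = loader.load_file("transactions.beancount")

totals = defaultdict(float)
for entry in entries:
    if isinstance(entry, data.Transaction):
        for posting in entry.postings:
            if posting.units is not None:
                totals[posting.account] += float(posting.units.number)

for account, total in sorted(totals.items()):
    print(f"{account:40s} {total:>12.2f}")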
  9. Now you can open the categorized transactions in Google Sheets, Excel, Mac's Numbers app, Tableau, or Python Jupyter notebooks, query them with DuckDB's read_csv functions (see the sketch below), or use a private dashboard with the Observable Framework (https://observablehq.com/framework/) for more interactivity. Beancount files are also supported out of the box by Fava (demo here: https://fava.pythonanywhere.com/huge-example-file/ and repo here: https://beancount.github.io/fava/).
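For example, a minimal DuckDB sketch (assuming the transactions.csv produced in the previous step and the duckdb Python package; all_varchar and TRIM are used because bean-query pads columns for alignment):

import duckdb

# Total the exported transactions per account and currency.
duckdb.sql("""
    SELECT
        TRIM(account) AS account,
        TRIM(currency) AS currency,
        ROUND(SUM(CAST(TRIM(number) AS DOUBLE)), 2) AS total
    FROM read_csv_auto('transactions.csv', all_varchar=true)
    GROUP BY 1, 2
    ORDER BY total
""").show()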
  10. Repeat this process monthly/quarterly/annually (including the previous transactions and categories that were inferred to ensure the same schema is used).
  11. Consider using a version control system to ensure you can merge and balance the accounts accurately, across multiple currencies, investments, and account types. GitHub may not be the safest option for storing financial data, and self-hosted solutions like GitLab may be best.

Note: If you are storing backups on Dropbox, Google Drive, or other cloud storage systems, ensure you have 2-factor authentication enabled, preferably with an authenticator app or a security key such as a Solokey rather than a cellphone number.

Tested with the following large language models:

  • GPT-4
  • Claude Opus (still blocked due to Anthropic's automatic new account flag disabling my account)

Tested with the following financial institutions' data:

  • Chase
  • Bank of America
  • Schwab
  • Wise
  • International

TODO

  • Test with DSPy to build a more complex prompt chain for various financial institutions, currencies, and types of accounts.

Note: I have a math, physics, and machine learning background and am not an accountant (credits and debits still confuse me), so the contents here are very much thanks to GPT-4 holding my hand while I spent as little time as possible on this, so that I don't have to do it again for a decade (similar to making my website ten years ago and not touching the design since, while continuously benefiting from the search engine optimization, lead generation, and friendships it has contributed to over my career).

rlan commented Apr 29, 2024

@jaanli Very interesting application of LLM and beancount. Thank you for sharing!

jaanli commented Apr 30, 2024

Thank you! I think this will only improve with Llama 3 and the open source tools out there :)
