
Fine-tuning Llama 2 7B to analyze financial reports and write “funny” tweets

Sharing some insights from a recent fun weekend project where I tried to analyze and summarize financial reports using a fine-tuned LLM.

My initial goal was to train a model to summarize the annual/quarterly financial reports of public companies (aka 10-K / 10-Q). But realizing that straightforward financial summaries are boring, I decided to fine-tune an LLM to generate sarcastic summaries of these reports instead: something short I could post on Twitter.

Data exploration and dataset prep

Working with financial reports ain’t easy. You download them in HTML format, and they’re dense: ~100 pages filled with tables that can be tough to parse, plus plenty of legal disclaimers and other filler. I knew I wanted 3-5 funny tweets as the output for each report, but I spent quite a while figuring out what to actually feed the model to get there: a page, a section, a table?
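One way to approach the “page, section, or table?” question is to strip the HTML down to prose and cut it at the standard “Item” headings of a 10-K. The sketch below does that with Python’s standard-library `html.parser` instead of BeautifulSoup (which the gist imports), so it runs without extra installs. It is an illustration under stated assumptions, not the approach used in the gist: the class `ReportTextExtractor`, the helper `split_into_items`, the choice to drop `<table>` contents entirely, and the `ITEM_RE` heading pattern are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Tags whose contents are dropped. Skipping <table> is an assumption of this sketch:
# financial tables are hard to linearize, so only the narrative prose is kept.
SKIP_TAGS = {"head", "script", "style", "table"}
# Block-level tags that end a line of text.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

# Hypothetical heading pattern for 10-K items such as "Item 1A. Risk Factors"
# or "ITEM 7. MANAGEMENT'S DISCUSSION ...". Real filings vary; tune as needed.
ITEM_RE = re.compile(r"^item\s+(\d+[a-z]?)\s*[.:]", re.IGNORECASE)


class ReportTextExtractor(HTMLParser):
    """Collects visible, non-table text from a filing's HTML, one block per line."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &nbsp; / &#8217; into characters
        self.skip_depth = 0  # > 0 while inside a skipped tag (handles nested tables)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

    def text(self):
        # Collapse whitespace (including non-breaking spaces) and drop empty lines.
        raw_lines = "".join(self.parts).split("\n")
        lines = (re.sub(r"\s+", " ", line).strip() for line in raw_lines)
        return "\n".join(line for line in lines if line)


def split_into_items(text):
    """Map an item number ('1A', '7', ...) to the text under that heading.

    A repeated heading (e.g. a plain-text table-of-contents entry followed by
    the real section) restarts that item, so the last occurrence wins.
    """
    sections = {}
    current = None
    for line in text.split("\n"):
        match = ITEM_RE.match(line)
        if match:
            current = match.group(1).upper()
            sections[current] = [line]
        elif current is not None:
            sections[current].append(line)
    return {key: "\n".join(lines) for key, lines in sections.items()}


# Tiny made-up filing fragment, just to exercise the two helpers.
sample = """<html><head><title>10-K</title></head><body>
<p>ITEM 1A. RISK FACTORS</p>
<p>Our business could be hurt by&nbsp;things.</p>
<table><tr><td>Revenue</td><td>$1,000</td></tr></table>
<p>Item 7. Management&#8217;s Discussion and Analysis</p>
<div>Revenue grew, mostly.</div>
</body></html>"""

parser = ReportTextExtractor()
parser.feed(sample)
parser.close()
items = split_into_items(parser.text())
print(sorted(items))  # ['1A', '7']
print(items["7"])
```

Working per Item (for example Item 7, Management’s Discussion and Analysis) yields inputs far smaller than a whole ~100-page report. Expect to tune `ITEM_RE` and `SKIP_TAGS` per filing: some filings put their headings inside tables or split them across several tags, and this sketch would miss those.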

miagkyi / 10k_reports_to_tweets.py (gist by @miagkyi, last active March 22, 2024 20:40)
Create funny tweets from 10-K/10-Q financial reports using GPT
from datetime import datetime, timedelta
import concurrent.futures
import csv
import html
import os
import time
from bs4 import BeautifulSoup
from dotenv import load_dotenv
import nltk