Skip to content

Instantly share code, notes, and snippets.

@simonw
Created December 31, 2024 05:19
Show Gist options
  • Save simonw/89c358ac3617b38afc41c79c995a4ebe to your computer and use it in GitHub Desktop.
Save simonw/89c358ac3617b38afc41c79c995a4ebe to your computer and use it in GitHub Desktop.

Blog Entries from 2024 with "llms" Tag

human (Dec 30, 2024, 09:12 PM)

  • blog_entry id, title, slug, body, created, metadata, search_document, tweet_html, import_ref, extra_head_html, card_image, custom_template, series_id, is_draft, live_timezone
  • blog_entry_tags id, entry_id, tag_id
  • blog_tag id, tag, description

SQL query postgresql to return id, title, created for every blog_entry in 2024 that was tagged with tag=llms

assistant (Dec 30, 2024, 09:12 PM)

I'll help you write a SQL query to get blog entries from 2024 tagged with "llms". We'll need to join all three tables to connect the entries with their tags.

SELECT DISTINCT 
    be.id,
    be.title,
    be.created
FROM blog_entry be
JOIN blog_entry_tags bet ON be.id = bet.entry_id
JOIN blog_tag bt ON bet.tag_id = bt.id
WHERE 
    bt.tag = 'llms'
    AND EXTRACT(YEAR FROM be.created) = 2024
ORDER BY be.created DESC;

This query:

  1. Joins blog_entry with blog_entry_tags and then with blog_tag
  2. Filters for entries tagged with "llms"
  3. Uses EXTRACT to filter for entries created in 2024
  4. Uses DISTINCT to avoid duplicate entries in case a post has multiple tags
  5. Orders results by creation date (newest first)

human (Dec 30, 2024, 09:14 PM)

SELECT DISTINCT 
    be.created,
    be.slug,
    be.title,
    be.id
FROM blog_entry be
JOIN blog_entry_tags bet ON be.id = bet.entry_id
JOIN blog_tag bt ON bet.tag_id = bt.id
WHERE 
    bt.tag = 'llms'
    AND EXTRACT(YEAR FROM be.created) = 2024
ORDER BY be.created

Now add a url column - you'll need to calculate that from the be.created and the be.slug where the URL format is e.g. https://simonwillison.net/2024/Dec/9/llama-33-70b/ - note the Dec month and the no-leading-0 9 for the day

assistant (Dec 30, 2024, 09:14 PM)

I'll help you construct the URL using date extraction functions. We'll need to:

  1. Extract the month as a 3-letter abbreviation (like 'Dec')
  2. Extract the day without leading zeros
  3. Combine with the slug

Here's the modified query:

SELECT DISTINCT 
    be.created,
    be.slug,
    be.title,
    be.id,
    'https://simonwillison.net/' || 
    EXTRACT(YEAR FROM be.created) || '/' ||
    TO_CHAR(be.created, 'Mon') || '/' ||
    TRIM(LEADING '0' FROM TO_CHAR(EXTRACT(DAY FROM be.created), '99')) || '/' ||
    be.slug || '/' as url
FROM blog_entry be
JOIN blog_entry_tags bet ON be.id = bet.entry_id
JOIN blog_tag bt ON bet.tag_id = bt.id
WHERE 
    bt.tag = 'llms'
    AND EXTRACT(YEAR FROM be.created) = 2024
ORDER BY be.created

This query:

  1. Uses TO_CHAR with 'Mon' format to get the 3-letter month abbreviation
  2. Uses EXTRACT(DAY) to get the day number
  3. Uses TO_CHAR and TRIM to remove leading zeros from the day
  4. Concatenates all parts with '/' between them
  5. Keeps all your existing columns and ordering

human (Dec 30, 2024, 09:16 PM)

Now it's this:

SELECT DISTINCT 
    be.created,
    be.slug,
    be.title,
    be.id,
    'https://simonwillison.net/' || 
    EXTRACT(YEAR FROM be.created) || '/' ||
    TO_CHAR(be.created, 'Mon') || '/' ||
    EXTRACT(DAY FROM be.created) || '/' ||
    be.slug || '/' as url
FROM blog_entry be
JOIN blog_entry_tags bet ON be.id = bet.entry_id
JOIN blog_tag bt ON bet.tag_id = bt.id
WHERE 
    bt.tag = 'llms'
    AND EXTRACT(YEAR FROM be.created) = 2024
ORDER BY be.created

But I need a query that outputs the following:

- January
  - [Title of post](URL to post)
  - [Title of post](URl to post)
- February
  - ...

So it needs to do some fancy aggregations and grouping by month to generate that text

assistant (Dec 30, 2024, 09:16 PM)

I'll help create a query that generates Markdown-formatted output grouped by month. We'll use string_agg to concatenate the entries and format them as Markdown links:

WITH monthly_posts AS (
  SELECT 
    EXTRACT(MONTH FROM be.created) as month_num,
    TO_CHAR(be.created, 'Month') as month_name,
    string_agg(
      '  - [' || be.title || '](https://simonwillison.net/' || 
      EXTRACT(YEAR FROM be.created) || '/' ||
      TO_CHAR(be.created, 'Mon') || '/' ||
      EXTRACT(DAY FROM be.created) || '/' ||
      be.slug || '/)',
      E'\n'
      ORDER BY be.created
    ) as post_links
  FROM blog_entry be
  JOIN blog_entry_tags bet ON be.id = bet.entry_id
  JOIN blog_tag bt ON bet.tag_id = bt.id
  WHERE 
    bt.tag = 'llms'
    AND EXTRACT(YEAR FROM be.created) = 2024
  GROUP BY 
    EXTRACT(MONTH FROM be.created),
    TO_CHAR(be.created, 'Month')
  ORDER BY month_num
)
SELECT 
  '- ' || trim(month_name) || E'\n' || post_links as markdown
FROM monthly_posts;

This query:

  1. Creates a CTE that groups posts by month
  2. Uses string_agg to create the bullet-pointed links for each post
  3. Formats each month as a heading with its posts underneath
  4. Orders everything chronologically by month
  5. Returns a single column containing the formatted Markdown text

The output will look exactly like your example, with each month as a heading and the posts as nested bullet points with proper Markdown link formatting.

human (Dec 30, 2024, 09:18 PM)

perfect - now add the day of the month like this:

- January
  - 1st: [Title of post](URL to post)
  - 14th: [Title of post](URl to post)

assistant (Dec 30, 2024, 09:18 PM)

Here's the modified query that adds the day with the correct ordinal suffix (st, nd, rd, th):

WITH monthly_posts AS (
  SELECT 
    EXTRACT(MONTH FROM be.created) as month_num,
    TO_CHAR(be.created, 'Month') as month_name,
    string_agg(
      '  - ' || 
      EXTRACT(DAY FROM be.created)::integer || 
      CASE 
        WHEN EXTRACT(DAY FROM be.created)::integer % 10 = 1 
          AND EXTRACT(DAY FROM be.created)::integer != 11 THEN 'st'
        WHEN EXTRACT(DAY FROM be.created)::integer % 10 = 2 
          AND EXTRACT(DAY FROM be.created)::integer != 12 THEN 'nd'
        WHEN EXTRACT(DAY FROM be.created)::integer % 10 = 3 
          AND EXTRACT(DAY FROM be.created)::integer != 13 THEN 'rd'
        ELSE 'th'
      END ||
      ': [' || be.title || '](https://simonwillison.net/' || 
      EXTRACT(YEAR FROM be.created) || '/' ||
      TO_CHAR(be.created, 'Mon') || '/' ||
      EXTRACT(DAY FROM be.created) || '/' ||
      be.slug || '/)',
      E'\n'
      ORDER BY be.created
    ) as post_links
  FROM blog_entry be
  JOIN blog_entry_tags bet ON be.id = bet.entry_id
  JOIN blog_tag bt ON bet.tag_id = bt.id
  WHERE 
    bt.tag = 'llms'
    AND EXTRACT(YEAR FROM be.created) = 2024
  GROUP BY 
    EXTRACT(MONTH FROM be.created),
    TO_CHAR(be.created, 'Month')
  ORDER BY month_num
)
SELECT 
  '- ' || trim(month_name) || E'\n' || post_links as markdown
FROM monthly_posts;

The key changes are:

  1. Added the day number using EXTRACT(DAY FROM be.created)
  2. Added a CASE statement to handle the ordinal suffixes correctly:
    • st for numbers ending in 1 (but not 11)
    • nd for numbers ending in 2 (but not 12)
    • rd for numbers ending in 3 (but not 13)
    • th for all other numbers (including 11th, 12th, 13th)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment