Matin Mahmood (mnm-matin)

@mnm-matin
mnm-matin / chunks.py
Last active April 19, 2026 10:50
Opus 4.7 tokenizer: per-chunk in-context cost + side-by-side visualization. SELECT: 2->5. PRIMARY: 1->6. CJK untouched. (via /v1/messages/count_tokens, claude-opus-4-6 vs 4-7)
"""Per-chunk token cost, side by side. Accurate counts via count_tokens.
BPE with pre-tokenization handles whitespace-delimited chunks independently,
so probing each chunk's token cost in isolation gives the true per-chunk
budget. We display each chunk as a tile labelled with its token count,
side by side for 4.6 and 4.7, so readers can see *where* the extra cost
falls inside a real line of code.
"""
from __future__ import annotations
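The preview cuts off before the probing logic. As a rough sketch of the approach the docstring describes, assuming a generic `count` callable that returns a token count for a string (the tile-rendering helper here is a hypothetical stand-in, not the gist's actual code):

```python
from typing import Callable, List, Tuple

def chunk_costs(line: str, count: Callable[[str], int]) -> List[Tuple[str, int]]:
    # Pre-tokenization splits on whitespace, so probing each chunk in
    # isolation still reflects its in-context per-chunk budget.
    return [(chunk, count(chunk)) for chunk in line.split()]

def side_by_side(line: str, count_old: Callable[[str], int],
                 count_new: Callable[[str], int]) -> str:
    # One tile per chunk, labelled "old->new", padded so columns line up.
    rows = [(c, f"{count_old(c)}->{count_new(c)}") for c in line.split()]
    widths = [max(len(c), len(lbl)) for c, lbl in rows]
    top = "  ".join(c.ljust(w) for (c, _), w in zip(rows, widths))
    bot = "  ".join(lbl.ljust(w) for (_, lbl), w in zip(rows, widths))
    return top + "\n" + bot
```

With real counters backed by `/v1/messages/count_tokens` for the two models, this reproduces the `SELECT: 2->5` style labels in the gist description.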
@mnm-matin
mnm-matin / probe.py
Created April 18, 2026 11:56
Probing the Opus 4.7 tokenizer atom by atom: which BPE merges did Anthropic drop? (177 atoms via /v1/messages/count_tokens, claude-opus-4-6 vs claude-opus-4-7)
"""Atom-level probe: where did Opus 4.7's tokenizer break merges that 4.6 had?
For each "atom" string we ask both models: how many tokens?
- If both report 1 token, both vocabs contain that exact merge.
- If 4.6=1 and 4.7=N>1, the merge was dropped in the new vocab.
- If 4.6=K and 4.7=K, atom is fragmented identically (likely byte-level
fallback or character-level, so vocab changes don't affect it).
Prepending a sentinel char and subtracting 1 isolates the atom from BOS/system
overhead. We do this for every atom and tag each with its category.
"""
@mnm-matin
mnm-matin / tokens_per_byte.py
Created April 17, 2026 15:46
Opus 4.7 tokenizer: tokens per UTF-8 byte across 10 languages/content types. English & code +30-58%, non-Latin scripts (CJK/Arabic/Hindi) +2-4%.
"""Tokens-per-byte follow-up: who actually got more expensive?
The first post measured Δ% in token count. That's a ratio. This measures
the absolute efficiency of each tokenizer — tokens per UTF-8 byte — across
a broader language set.
Usage:
export ANTHROPIC_API_KEY=sk-ant-...
uv run --with anthropic python tokens_per_byte.py
"""
@mnm-matin
mnm-matin / opus_4_7_tokenizer_tax.py
Created April 17, 2026 14:48
Measuring the Opus 4.7 tokenizer tax: +25% tokens on average for the same source text (claude-opus-4-6 vs claude-opus-4-7, via /v1/messages/count_tokens)
"""Compare claude-opus-4-6 vs claude-opus-4-7 tokenizer across content types.
Usage:
export ANTHROPIC_API_KEY=sk-ant-...
uv run --with anthropic python opus_4_7_tokenizer_tax.py
Hits api.anthropic.com/v1/messages/count_tokens for each sample and prints
the per-sample delta plus a dollar estimate at 1B input tokens/month.
"""
from __future__ import annotations
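A minimal sketch of the two pieces the docstring names: a counter backed by the `count_tokens` endpoint (the call matches the public `anthropic` Python SDK; the import is deferred so the arithmetic below runs offline), and the delta-plus-dollar math. The per-million-token price is a placeholder argument, not a claim about actual Opus pricing:

```python
def count_via_api(model: str, text: str) -> int:
    # Counter backed by /v1/messages/count_tokens (needs ANTHROPIC_API_KEY).
    import anthropic
    client = anthropic.Anthropic()
    resp = client.messages.count_tokens(
        model=model, messages=[{"role": "user", "content": text}]
    )
    return resp.input_tokens

def delta_pct(old_tokens: int, new_tokens: int) -> float:
    # Per-sample token-count delta between the 4.6 and 4.7 tokenizers.
    return (new_tokens - old_tokens) / old_tokens * 100.0

def monthly_cost_delta(avg_delta_pct: float, tokens_per_month: float,
                       usd_per_mtok: float) -> float:
    # Extra dollars per month if every input token carries the average tax.
    extra_tokens = tokens_per_month * avg_delta_pct / 100.0
    return extra_tokens / 1_000_000 * usd_per_mtok
```

At the gist's headline figures, a +25% average tax on 1B input tokens/month is 250M extra tokens, so the dollar estimate is just that volume times whatever the input price per MTok is.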