@dshkol
Created January 12, 2026
# The D-AI-LY Failure Mode Examples
### Failure Mode 1: Auxiliary Data Fabrication

**What happened:**
Backfill articles for Labour Force Survey had fabricated employment rate, participation rate, and full-time/part-time split values. The fabricated values weren't just slightly wrong - some were in the **opposite direction** from reality.

**How it was detected:**
- Time series data (unemployment rate, employment levels) were correct because they came from verified JSON sources
- But auxiliary indicators in summary tables were invented to fill gaps
- The fabrication pattern was suspiciously consistent: FT/PT splits always "mirrored" overall employment direction
- Real economic data is more nuanced - sometimes part-time rises while full-time falls

**Root cause:**
The generator derived or estimated values to complete summary tables, rather than fetching each value from Statistics Canada. Without enforcement, it was too easy to invent "plausible" numbers.

**Prevention implemented:**
- **Mandatory rule**: every numeric value in an article must be FETCHED from StatCan - never derived, never estimated
- **Pre-publish checklist**: For each number in summary tables - Can I cite the specific vector/table?
- **Safe vs. unsafe backfill patterns**: Extending line charts backward (each point validated) is SAFE. Filling in auxiliary table values is UNSAFE.
- **Reference vectors documented** for LFS auxiliary indicators
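
A minimal sketch of the "can I cite it?" pre-publish check, assuming a hypothetical schema in which each numeric claim carries a `vector` or `table` citation field (not the project's actual data structures):

```python
# Hypothetical pre-publish check: every numeric claim must cite a
# StatCan vector or table. The label/value/vector/table keys are
# illustrative, not the project's real schema.

def uncited_values(article_numbers):
    """Return the numeric claims that carry no vector or table citation."""
    return [item for item in article_numbers
            if not item.get("vector") and not item.get("table")]
```

Any non-empty result blocks publication: the flagged values were derived or estimated rather than fetched.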

---

### Failure Mode 2: Stale Breakdown Data in Historical Articles

**What happened:**
Five CPI articles for July-October 2025 were generated using historical reference dates. The headlines and time series correctly showed historical periods, BUT component breakdowns and provincial tables showed November 2025 data.

**The evidence:**
- Component percentages were **identical** across all five months: Food: 4.2%, Household: 3.3%, Transportation: 0.7%, etc.
- Provincial YoY values were **identical** across all five months
- Real economic data has natural month-to-month variation - identical values across 5 consecutive months is a red flag
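
The identical-values red flag can be automated. A sketch, assuming breakdowns are available as per-month dictionaries (a hypothetical shape, not the project's actual JSON):

```python
def stale_breakdowns(breakdowns_by_month):
    """Group months whose component values are byte-for-byte identical.
    Real data varies month to month, so any group larger than one is suspect."""
    groups = {}
    for period, components in breakdowns_by_month.items():
        key = tuple(sorted(components.items()))
        groups.setdefault(key, []).append(period)
    return [sorted(periods) for periods in groups.values() if len(periods) > 1]
```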

**Root cause:**
The generator's `rebase_data_to_period()` function only rebased headlines, not breakdowns. The JSON file contained only the latest period's `subseries[]` and `provincial[]` data. The function silently copied stale data without validation.

**Prevention implemented:**
- Generator now **strips `subseries`/`provincial`** when rebasing to historical periods
- Always verify `JSON.metadata.reference_period` matches article period
- For historical articles: only include headline + trend, not breakdowns
- **Variation check**: If values are identical across months, you're copying stale data
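
The stripping rule might look like this illustrative reimplementation of `rebase_data_to_period()` (the real function's signature and JSON shape may differ):

```python
def rebase_data_to_period(data, target_period):
    """Sketch of the corrected rebase: when the article targets a historical
    period, strip the subseries/provincial breakdowns, which exist in the
    JSON only for the latest reference period."""
    rebased = dict(data)
    if target_period != data["metadata"]["reference_period"]:
        rebased.pop("subseries", None)   # component breakdowns: latest period only
        rebased.pop("provincial", None)  # provincial tables: latest period only
    return rebased
```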

**The fix:**
Commit `444de03` removed 591 lines of fabricated breakdown data across 10 articles.

---

### Failure Mode 3: Year-over-Year Calculation Errors

**What happened:**
GDP October 2025 article claimed +0.4% year-over-year growth, but the actual calculation from time_series data was +0.04% - a 10x error.

**The data:**
```
time_series: Oct 2024 = 2317.1B, Oct 2025 = 2318.0B
Correct YoY: (2318.0 - 2317.1) / 2317.1 × 100 = 0.039% ≈ 0.04%
Article incorrectly stated: 0.4%
```

**Root cause:**
A decimal-place error when transcribing small percentage changes. The difference between 0.04% and 0.4% is enormous in economic terms: the former suggests stagnation, the latter modest growth.

**Prevention implemented:**
- **Always cross-validate YoY** by manual calculation from time_series
- Be especially careful with small percentage changes (<1%)
- Double-check decimal places: 0.04% ≠ 0.4% ≠ 4%
- Copy-paste values from JSON, don't type from memory
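
The cross-validation step can be sketched as follows; the tolerance is an assumption, chosen to catch decimal-place slips rather than ordinary rounding noise:

```python
def yoy_percent(old, new):
    """Year-over-year percentage change from two time_series levels."""
    return (new - old) / old * 100

def claimed_yoy_is_consistent(old, new, claimed_pct, tol=0.005):
    """Cross-validate an article's claimed YoY against the raw series."""
    return abs(yoy_percent(old, new) - claimed_pct) <= tol
```

With the GDP values above, the check rejects the 0.4% claim and accepts 0.04%.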

---

### Failure Mode 4: Article Generated for Unreleased Period

**What happened:**
International trade article claimed to cover October 2025, but the JSON file only contained September 2025 data. October data wasn't released by Statistics Canada until January 8, 2026.

**The evidence:**
```
JSON reference_period:    "2025-09"
JSON end_period:          "2025-09"
JSON fetched_at:          "2025-12-23"
Article claimed:          October 2025
Official October release: 2026-01-08
```

**The result:**
Without real data, the LLM fabricated internally-consistent but completely wrong October figures:

| Metric | Fabricated Value | Actual Value (Jan 8 release) |
|--------|------------------|------------------------------|
| Exports | $64.2B (flat) | $65.6B (+2.1%) |
| Imports | $66.8B (+4.2%) | $66.2B (+3.4%) |
| Trade deficit | $2.6B | $583M |

**Critical insight:** Internally consistent ≠ externally accurate. The fabricated data formed a coherent narrative, but was completely wrong.

**Root cause:**
The article was requested for a period beyond what existed in the JSON. Without real data, the LLM invented plausible-looking values.

**Prevention implemented:**
- **NEVER generate articles for periods beyond `metadata.reference_period`**
- Before generating, verify: `article_period <= JSON.metadata.reference_period`
- If user requests future period, STOP and report: "Data not yet available"
- Check StatCan release schedule before attempting to generate
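
The period guard can be sketched like this, assuming `"YYYY-MM"` period strings as in the JSON above (which compare correctly as plain strings):

```python
def assert_period_available(article_period, metadata):
    """Refuse to generate an article for a period beyond the fetched data."""
    ref = metadata["reference_period"]
    if article_period > ref:  # "YYYY-MM" strings sort chronologically
        raise ValueError(f"Data not yet available: requested {article_period}, "
                         f"JSON covers through {ref}")
```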

---

### Failure Mode 5: Percentage Fabrication from Dollar Values

**What happened:**
Building permits article showed industrial component +12.5%, but the source only provided a dollar change: "edged down $3.9 million."

**The evidence:**

```
Source text: "industrial component edged down $3.9 million"
Article claimed: Industrial +12.5%
Problem: no base value was available to calculate a percentage
```


**Root cause:**
The LLM invented a percentage when only absolute change was provided. Percentage requires: `(new - old) / old × 100`. Without the denominator (base value), percentage cannot be calculated - it can only be fabricated.

**Prevention implemented:**
- Only show percentages when BOTH values (before and after) are available
- If source only provides dollar change, report dollar change (not invented %)
- Ask: "Can I calculate this percentage from values I have?" If no, don't show %
- Better to omit a metric than to fabricate it
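
One way to encode the rule, as a sketch: return a percentage only when the base value exists, and `None` otherwise so the caller falls back to reporting the dollar change:

```python
def safe_percent_change(old, new):
    """Percent change only when both values are known and the base is nonzero;
    otherwise None, signalling 'report the absolute change instead'."""
    if old is None or new is None or old == 0:
        return None
    return (new - old) / old * 100
```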

**The fix:**
Commit `640141c` replaced unverified percentage table with verified dollar amounts from official StatCan release.

---

### Failure Mode 6: Hardcoded Plausible Values

**What happened:**
Articles were generated with numbers that looked reasonable but weren't from the fetched JSON data.

**Examples:**
- Interest rates article used 2.50% (the "Bank Rate") instead of 2.25% (the "Policy Rate" from JSON)
- Manufacturing capacity used 80.8% instead of actual 80.7% from JSON

The errors were small - close enough to seem right at a glance - but they were wrong.

**Root cause:**
1. JSON file wasn't read before generating article text
2. LLM used approximate values from training data instead of exact JSON values
3. Similar-sounding terms confused (Bank Rate ≠ Policy Rate)

**Prevention implemented:**
- **ALWAYS read JSON file before writing ANY numbers**
- **ALWAYS state headline value explicitly**: "The JSON shows X.X%"
- Copy-paste values from JSON, don't type from memory
- For financial data: verify exact terminology matches the JSON field name

---

### Failure Mode 7: Missing Verification JSON

**What happened:**
A verification audit found 32 articles without corresponding JSON verification files. During the audit, these articles couldn't be immediately validated because there was no saved data to compare against.

**Affected categories:**
Manufacturing, Food Services, IPPI, RMPI, Electricity, EI Claims, and others.

**The evidence:**
- Articles existed in `docs/en/`
- No JSON files in `output/` for these indicators
- Required manual re-fetch from CANSIM to verify article claims
- The data was correct (verified via re-fetch), but the audit trail was missing

**Root cause:**
Articles were generated using ad-hoc R fetches that didn't save JSON files. The workflow wasn't enforced.

**Prevention implemented:**
This failure led to the comprehensive verification JSON system described in this document:

1. **Every article MUST declare `verification_json` in frontmatter**
2. **Build fails** if `verification_json` is missing or file doesn't exist
3. **Single data fetching tool** (`fetch_cansim_enhanced.R`) that always saves JSON
4. **240 articles updated** with verification_json frontmatter
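
A build-time gate along those lines might look like this sketch; the `verification_json` frontmatter key comes from this document, while the `output/` layout and error messages are assumptions:

```python
import os

def check_verification_json(frontmatter, output_dir="output"):
    """Fail the build when an article lacks a declared verification_json
    or the declared file is absent from disk (sketch, not the real build)."""
    name = frontmatter.get("verification_json")
    if not name:
        raise RuntimeError("article missing verification_json in frontmatter")
    path = os.path.join(output_dir, name)
    if not os.path.isfile(path):
        raise RuntimeError(f"verification_json not found: {path}")
```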