Engineering

Why We Built a MISMO Format Registry

TrueABS Engineering · June 2026

The Problem: Agency Schemas Change Silently Across Vintages

Fannie Mae's loan-level disclosure format hasn't stood still since 1999. Field names get renamed, delinquency status codes get re-bucketed, new modification flags get introduced mid-stream — and the agencies don't always flag these changes loudly. A field like dq_status can mean something subtly different in a 2008 origination file than it does in a 2022 one, and nothing in the raw data tells you that happened.

If you're querying across 20+ years of vintage data — which is exactly what credit and prepay analysis requires — that silent drift turns into silently wrong answers.

What Most Platforms Do: Patch the ETL, Document Nothing

The common pattern: someone notices a field changed, patches the ingestion script to handle the new format, and moves on. The fix lives in a commit message or a Slack thread, not in anything queryable. Six months later, a different engineer hits the same vintage boundary, doesn't know the history, and either reintroduces the bug or burns a day rediscovering it.

What TrueABS Does: Every Schema Change Is a Versioned Registry Entry

We externalize format history as a database object — the Format Registry. Every agency format change becomes a new registry entry: which fields existed, what they meant, and exactly when the change took effect. It's bi-temporal (it tracks both when a format was valid in the source data and when we recorded that fact) and fully replayable — we can reconstruct exactly how any vintage of loan data should be parsed and interpreted, audit-proof, without code archaeology.

Staging retains the native agency format as originally published; the semantic layer is MISMO-aligned on top of that, with the Format Registry as the bridge between the two.

Why It Matters for AI Agents

Stable field semantics across vintages aren't a nice-to-have when an LLM is the one writing the query. An agent that doesn't know dq_status changed meaning in 2015 won't ask — it'll just hallucinate a consistent-sounding answer from inconsistent data. MISMO alignment plus the Format Registry means a query spanning multiple vintages returns what it actually claims to return, whether a human or an agent wrote it.

Want to see the Format Registry in action across Fannie Mae, Freddie Mac, and Ginnie Mae data?

Request Early Access →