Fannie Mae's loan-level disclosure format hasn't stood still since 1999. Field
names get renamed, delinquency status codes get re-bucketed, new modification
flags get introduced mid-stream, and the agencies don't always announce these
changes loudly. A field like dq_status can mean something subtly
different in a 2008 origination file than it does in a 2022 one, and nothing in
the raw data tells you that happened.
If you're querying across 20+ years of vintage data, which is exactly what credit and prepay analysis requires, that drift turns into silently wrong answers.
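Here is a toy illustration of how that plays out. The code tables are hypothetical, not Fannie Mae's actual dq_status definitions. Assume an older format reports whole months delinquent, while a newer one re-buckets code "3" to mean 90 or more days delinquent. A format-unaware filter still runs and still returns a plausible number.

```python
# Hypothetical code tables; NOT actual Fannie Mae dq_status definitions.
# Older format (assumed): the code is the number of whole months delinquent.
# Newer format (assumed): code "3" is re-bucketed to mean "90+ days delinquent".
loans = [
    {"vintage": 2008, "dq_status": "3"},  # exactly 3 months late (90+ days)
    {"vintage": 2008, "dq_status": "5"},  # 5 months late (also 90+ days)
    {"vintage": 2022, "dq_status": "3"},  # 90+ days late under the newer bucketing
]

def naive_90_plus_count(rows):
    """Count 90+ day delinquencies the way a format-unaware query might."""
    return sum(1 for r in rows if r["dq_status"] == "3")

print(naive_90_plus_count(loans))  # 2, although all 3 loans are 90+ days late
```

Nothing errors. The count is simply wrong, because the same code means different things in the two vintages.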
The common pattern: someone notices a field changed, patches the ingestion script to handle the new format, and moves on. The fix lives in a commit message or a Slack thread, not in anything queryable. Six months later, a different engineer hits the same vintage boundary, doesn't know the history, and either reintroduces the bug or burns a day rediscovering it.
We externalize format history as a database object: the Format Registry. Every agency format change becomes a new registry entry recording which fields existed, what they meant, and exactly when the change took effect. The registry is bi-temporal (it tracks both when a format was valid in the source data and when we recorded that fact) and fully replayable: we can reconstruct exactly how any vintage of loan data should be parsed and interpreted, with a complete audit trail and without code archaeology.
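A minimal sketch of the bi-temporal lookup follows, assuming an in-memory, append-only list. `FormatEntry`, `FormatRegistry`, `lookup`, the field meanings, and every date are hypothetical illustrations, not the production schema. Each entry carries a valid-time range (which vintages the definition covers) and a transaction time (when we recorded it). A lookup asks: what did this field mean for data dated X, as we understood it on date Y?

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class FormatEntry:
    field: str
    meaning: str
    valid_from: date          # valid time: first data date this definition covers
    valid_to: Optional[date]  # valid time: exclusive end; None means still current
    recorded_at: date         # transaction time: when we recorded this fact

class FormatRegistry:
    """Hypothetical in-memory registry; corrections are appended, never edited."""

    def __init__(self) -> None:
        self._entries: list[FormatEntry] = []

    def record(self, entry: FormatEntry) -> None:
        # Append-only, so any past state of knowledge can be replayed.
        self._entries.append(entry)

    def lookup(self, field: str, as_of_vintage: date, known_at: date) -> Optional[FormatEntry]:
        """Definition of `field` for data dated `as_of_vintage`, as known on `known_at`."""
        candidates = [
            e for e in self._entries
            if e.field == field
            and e.valid_from <= as_of_vintage
            and (e.valid_to is None or as_of_vintage < e.valid_to)
            and e.recorded_at <= known_at
        ]
        # Simplification: the most recently recorded matching entry wins.
        return max(candidates, key=lambda e: e.recorded_at, default=None)

# Hypothetical history: a meaning change effective 2015, first recorded in mid-2016.
reg = FormatRegistry()
reg.record(FormatEntry("dq_status", "months delinquent", date(1999, 1, 1), None, date(2016, 1, 1)))
reg.record(FormatEntry("dq_status", "months delinquent", date(1999, 1, 1), date(2015, 1, 1), date(2016, 6, 1)))
reg.record(FormatEntry("dq_status", "delinquency bucket", date(2015, 1, 1), None, date(2016, 6, 1)))

print(reg.lookup("dq_status", date(2020, 3, 1), known_at=date(2016, 3, 1)).meaning)  # months delinquent
print(reg.lookup("dq_status", date(2020, 3, 1), known_at=date(2017, 1, 1)).meaning)  # delinquency bucket
print(reg.lookup("dq_status", date(2008, 3, 1), known_at=date(2017, 1, 1)).meaning)  # months delinquent
```

The first call replays what we believed before the mid-2016 correction was recorded; the other two reflect it. Letting the latest recorded entry win assumes each correction restates the full validity range it affects. A production system would likely enforce that rule more strictly.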
Staging retains each agency's native format exactly as originally published. The semantic layer on top of it is aligned to MISMO (Mortgage Industry Standards Maintenance Organization) data standards, and the Format Registry is the bridge between the two.
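The staging-to-semantic bridge can be sketched in the same spirit. Assume each staged row was tagged at load time with the format version it arrived in, and let a plain dict stand in for the registry's code tables. The version names, code tables, and the `delinquency_bucket` field are hypothetical and are not MISMO element names.

```python
from typing import Callable

def v1_bucket(code: str) -> str:
    """Hypothetical older format: dq_status is the number of whole months delinquent."""
    months = int(code)
    return ("current", "30-59", "60-89")[months] if months < 3 else "90+"

# Hypothetical newer format: codes are already day-range buckets.
V2_CODES = {"0": "current", "1": "30-59", "2": "60-89", "3": "90+"}

# Stand-in for the Format Registry: format version -> dq_status translator.
TRANSLATORS: dict[str, Callable[[str], str]] = {
    "fmt_v1": v1_bucket,
    "fmt_v2": V2_CODES.__getitem__,
}

# Staging keeps native values untouched, tagged with the format they arrived in.
staging = [
    {"loan_id": "A", "format_version": "fmt_v1", "dq_status": "3"},
    {"loan_id": "B", "format_version": "fmt_v1", "dq_status": "5"},
    {"loan_id": "C", "format_version": "fmt_v2", "dq_status": "3"},
    {"loan_id": "D", "format_version": "fmt_v2", "dq_status": "1"},
]

def to_semantic(row: dict) -> dict:
    """Project a native staging row onto the normalized semantic layer."""
    translate = TRANSLATORS[row["format_version"]]
    return {"loan_id": row["loan_id"], "delinquency_bucket": translate(row["dq_status"])}

semantic = [to_semantic(r) for r in staging]
print(sum(1 for r in semantic if r["delinquency_bucket"] == "90+"))  # 3
```

Staging rows are never rewritten in this sketch; the semantic view is derived from them, so a corrected code table can be re-applied to the original values at any time.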
Stable field semantics across vintages aren't a nice-to-have when an LLM is the
one writing the query. An agent that doesn't know dq_status changed
meaning in 2015 won't ask — it'll just hallucinate a consistent-sounding answer
from inconsistent data. MISMO alignment plus the Format Registry means a query
spanning multiple vintages returns what it actually claims to return, whether a
human or an agent wrote it.
Want to see the Format Registry in action across Fannie Mae, Freddie Mac, and Ginnie Mae data?
Request Early Access →