Pick up any IRRBB regulation and you’ll find the same structural imbalance. Modelling methodology dominates: scenario design, behavioural segmentation, prepayment calibration, non-maturity deposits, optionality, credit spread risk. All of it gets extensive supervisory attention. The requirement to have robust systems and data infrastructure, the thing that actually determines whether your numbers are right, is comparatively underdeveloped.
And this isn’t some abstract observation. It has real consequences. Everyone who reviews you (supervisors, internal audit, model validation, external reviewers) benchmarks against the regulatory text. If that text devotes dozens of paragraphs to modelling and a handful to operational robustness, every oversight function weights its scrutiny accordingly. Budget follows. Headcount follows. Management attention follows. The regulation doesn’t just describe priorities. It creates them.
The evidence
Basel’s IRRBB standards (BCBS d368) dedicate the bulk of their content to modelling methodology, scenarios, and behavioural assumptions. Data integrity and systems infrastructure don’t get their own Principle. They share Principle 6 with model governance and validation. There’s no standalone Principle on data quality, production robustness, or operational reliability. Not one.
The EBA Guidelines follow the same pattern. Measurement gets over thirty paragraphs and two technical annexes. IT systems and data quality get nine paragraphs tucked inside the Governance section, requiring systems to “capture,” “record,” and “compute” without specifying thresholds, tolerances, or timeliness standards against which compliance could actually be assessed.
The asymmetry plays out across five dimensions:
- Specificity: the modelling guidance prescribes scenarios, behavioural treatments, and valuation methods. The data guidance says systems should “capture” and “compute,” and leaves it at that.
- Tiering: there’s a sophistication matrix classifying institutions 1 to 4 by model complexity. There’s no equivalent for data quality or production reliability.
- Frequency: the framework requires scenarios to be run at least quarterly. It’s silent on production cycle times, rerun capability, or production KPIs.
- Validation: there are dedicated paragraphs on backtesting, model validation, and sensitivity analysis. Nothing on data reconciliation, error tolerance, or remediation tracking.
- Cross-references: the modelling framework links extensively to scenario design, stress testing, and supervisory frameworks. Neither Basel nor the EBA cross-references BCBS 239, the one document that’s supposed to deal with this stuff.
The most revealing exhibit is the EBA’s sophistication matrix. Category 1 institutions, the most sophisticated, are expected to run Monte Carlo simulations, full optionality valuation, and daily risk factor updates. Category 4 can get away with gap analysis and standard shocks.
The matrix is defined entirely by modelling technique. There is no equivalent for data quality, production timeliness, or the ability to produce trusted numbers on time.
So a bank running Monte Carlo on unreliable data, with a production process that takes weeks and requires extensive manual intervention, would be classified as more sophisticated than a bank producing clean, decision-useful metrics from a robust and timely deterministic process. The regulatory text doesn’t merely fail to prevent this inversion. It makes it the path of least resistance.
How we got here
The Basel Committee’s original IRRBB guidance, the 2004 Principles (BCBS 108), was actually quite proportionate. Both modelling and data requirements were high-level. The measurement section outlined gap analysis, duration, and simulation as available techniques without prescribing which to use. No sophistication matrix, no standardised framework, no prescribed behavioural modelling for NMDs or prepayments. Data and systems received comparable treatment: paragraph 50 addressed the “integrity and timeliness of data” in broadly similar terms to what exists today.
When Basel revised the standards in 2016, the modelling content was comprehensively expanded: the standardised framework, the six prescribed shock scenarios, the detailed NMD treatment, the prepayment calibration, the optionality framework. The data and systems requirements? Carried forward unchanged. Copy-paste. The EBA then amplified the gap further with the sophistication matrix and technical annexes. The operational and data infrastructure requirements weren’t deliberately deprioritised; they simply never received the same revisiting.
Since 2016, the direction of travel has been unambiguous: a decade of pressure for more sophisticated models. Budget has followed, specialist teams have been built, and management attention gravitates to where supervisory scrutiny falls. The data and operational infrastructure underneath hasn’t had the same pressure. Not because anyone decided it was unimportant, but because nobody’s been asking about it with the same intensity. And in banking, things that aren’t being asked about don’t get fixed.
Every incremental layer of modelling complexity multiplies the data inputs and the points at which something can break. The EBA’s own ITS on supervisory reporting introduced thousands of reporting data points per institution, and the templates are materially revised annually, making stable technical implementation a moving target. A deterministic scenario EVE calculation is straightforward to validate and debug; every stochastic layer and behavioural overlay added on top makes a failure harder to detect and harder to trace.
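To make the contrast concrete, here is a minimal sketch of a deterministic parallel-shock EVE calculation in Python. Everything in it is illustrative: a real implementation would use full tenor-point curves, currency splits, and the six prescribed scenarios. But the core logic stays this transparent: discount, shock, discount again, take the difference.

```python
import math

def pv(cash_flows, zero_rate):
    """Present value of (time_in_years, amount) pairs under a flat zero curve."""
    return sum(cf * math.exp(-zero_rate * t) for t, cf in cash_flows)

def delta_eve(positions, base_rate, shock_bp):
    """Change in economic value of equity under a parallel rate shock."""
    shocked = base_rate + shock_bp / 10_000
    return sum(pv(p, shocked) - pv(p, base_rate) for p in positions)

# Illustrative netted balance sheet: one asset, one funding leg (negative flows).
positions = [
    [(1.0, 4.0), (2.0, 4.0), (3.0, 104.0)],  # 3y fixed-rate bond, 4% coupon
    [(1.0, -103.0)],                          # 1y wholesale funding
]
print(delta_eve(positions, base_rate=0.03, shock_bp=200))  # EVE impact of +200bp
```

The value of the deterministic version is its auditability: every number in the output traces back to an input cash flow and a discount factor, which is exactly what makes a break findable.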
CSRBB: a case study in misallocated effort
If the general imbalance between modelling and infrastructure is the structural problem, credit spread risk in the banking book (CSRBB) is perhaps its clearest illustration. The EBA gave it a dedicated framework within EBA/GL/2022/14, with its own scope rules, measurement criteria, and sophistication expectations. The title itself reflects the elevation: the 2018 “Guidelines on the management of interest rate risk arising from non-trading activities” became “Guidelines on IRRBB and CSRBB”, with credit spread risk promoted to co-equal billing.
A framework in search of a risk
The concept behind CSRBB is straightforward in a trading context: if an instrument is marked to market and its spread widens, the instrument devalues and the institution takes a loss. Fair enough. But most banking book instruments (customer loans, retail deposits, commercial lending) are originated by the institution, priced internally, and held to maturity. They were never priced off a market credit curve and their value doesn’t fluctuate with market credit spreads. You can decompose a customer loan into a risk-free rate and a “credit spread” component if you really want to, but you’re reverse-engineering a market concept onto an instrument that never had one.
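A minimal sketch, with illustrative numbers, of what that decomposition actually involves: given a risk-free rate and the loan’s internal carrying value, the “credit spread” is whatever residual makes the discounting equation close. Nothing below is observed in any market; the spread is back-solved from the loan’s own price.

```python
import math

def loan_pv(cash_flows, risk_free, spread):
    """PV of (time_in_years, amount) pairs at the risk-free rate plus a flat spread."""
    return sum(cf * math.exp(-(risk_free + spread) * t) for t, cf in cash_flows)

def implied_spread(cash_flows, risk_free, target_pv, lo=-0.05, hi=0.20):
    """Bisect for the flat spread that reprices the loan at its carrying value."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if loan_pv(cash_flows, risk_free, mid) > target_pv:
            lo = mid   # PV still too high: the spread must be wider
        else:
            hi = mid
    return (lo + hi) / 2

# 5y loan paying 22 a year, originated and carried internally at par (100).
loan = [(t, 22.0) for t in range(1, 6)]
print(implied_spread(loan, risk_free=0.025, target_pv=100.0))
```

The routine always returns a number. That is the trap: the output is an artefact of the decomposition, not evidence that a market credit spread risk exists in the position.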
The European Banking Federation’s July 2023 paper on CSRBB, published as a “Banking Industry Common Understanding”, makes this point precisely. That the paper exists at all is revealing: eight months after the guidelines were finalised, the European banking industry needed to write its own interpretive guide because the regulatory framework didn’t provide sufficient clarity on which instruments are actually in scope.
The answer, for the core of most banking books, is unambiguous. The EBF states that customer loans and deposits “granted and priced by the institutions, without using references to market liquid instruments” fall outside the scope of CSRBB. For the typical retail and commercial bank, the vast majority of the balance sheet is excluded. What remains is primarily the bond portfolio, the liquidity buffer, and a narrow subset of market-proximate instruments.
The regulatory capacity that went into designing, consulting on, implementing, and now supervising an entire CSRBB framework could have been spent on prescriptive data quality and production robustness standards that would improve every IRRBB number every institution produces. Instead, it was directed at a framework that, for the majority of banking book positions at the majority of institutions, doesn’t apply.
The EBF’s own reliability section effectively concedes the problem, warning that “complex approaches would be an indication that identified CSRBB is not reliable” and advising institutions to limit the number of reference curves used. The implication is worth considering: if measuring this risk requires elaborate decomposition, it may not be present in the banking book in the form the framework assumes.
The opportunity cost
Every hour a team spends building and maintaining a CSRBB calculation is an hour not spent automating a data reconciliation, reducing the production cycle, improving data quality, or building the capacity to rerun scenarios at short notice. And these aren’t hypothetical trade-offs. Teams with finite headcount must choose between a new modelling capability the regulator has asked about and fixing the data pipeline that determines whether any of their existing models produce reliable output. The regulatory incentive structure makes the choice for them: the CSRBB gap will appear in the next supervisory review; the data pipeline will not.
Imagine the planning meeting. Someone raises the data issue. Everyone agrees it’s important. Then someone points out the CSRBB finding from the last supervisory review. Guess which one gets the resource.
Why this matters
The fragility of net metrics
Anyone who has actually run an IRRBB production process, not designed one, not reviewed one, but been responsible for getting the numbers to tie on deadline, knows that the binding constraint is almost never the model. It’s the data, and the operational machinery required to get that data from source systems into a risk engine, through a calculation, and out the other side in a form that’s accurate, explainable, and timely.
The reason is structural. Interest rate risk is measured on a net basis: billions or trillions in notional exposure on each side of the balance sheet are offset against each other and netted down to a comparatively small risk number. This netting makes the metric useful but also inherently fragile. A handful of transactions with missing or incorrect data on one side can distort the whole result, because the error isn’t offset on the other side. A misclassified repricing date on a mortgage portfolio, a missing maturity field on wholesale funding, a missing start date on a forward-starting swap: any of these can move the EVE or NII result by a material amount relative to the net position.
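The arithmetic of that fragility is worth making explicit. A sketch with illustrative numbers, using PV01 (the value change per basis point of rates) as the sensitivity measure:

```python
# Illustrative sensitivities: large gross books, small net position.
asset_pv01 = 5_000_000       # value change per bp across the asset book
liability_pv01 = -4_900_000  # nearly offsetting liability book
net_pv01 = asset_pv01 + liability_pv01   # 100,000 per bp

# A data error affecting just 0.5% of the asset side...
error_pv01 = 0.005 * asset_pv01          # 25,000 per bp

print(error_pv01 / asset_pv01)  # 0.005 -> rounding noise at gross level
print(error_pv01 / net_pv01)    # 0.25  -> a quarter of the reported net risk
```

The same error that is immaterial against gross exposure is a material misstatement of the net metric, and the net metric is the one ALCO sees.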
When you’re dealing with millions of transactions across dozens of source systems, the production process can feel less like risk analysis and more like data triage. Fixing one break reveals another, and the whole structure can shift when you touch it. Teams spend a disproportionate share of their production cycle on data remediation rather than analysis. It’s unglamorous, thankless work. Nobody’s writing papers about it. But it determines whether the numbers reaching ALCO and the board can actually be trusted.
The validation gap
The regulatory imbalance shapes the entire internal oversight architecture. The IRRBB frameworks explicitly require independent model validation. Basel’s Principle 6 (paragraphs 58–65) and the EBA’s model governance section (paragraphs 71–79) set out detailed expectations: independent review of model inputs, assumptions, and methodologies; formal approval processes; ongoing monitoring; exception triggers; version control. This has given rise to specialist treasury model validation teams whose remit is to provide independent challenge on model design, calibration, and conceptual soundness, with the operative word being model.
BCBS 239, the Basel Committee’s Principles for effective risk data aggregation and risk reporting published in 2013, should fill the gap. It has fourteen principles covering governance, data architecture, accuracy, completeness, timeliness, and adaptability. It even calls for independent validation of risk data aggregation. In practice, it doesn’t fill the gap, for four reinforcing reasons.
First, BCBS 239 is structurally siloed from the IRRBB frameworks. Neither Basel’s d368 nor the EBA Guidelines cross-reference it. The teams responsible for BCBS 239 compliance are typically central data governance functions; they understand data lineage but not how a misclassified repricing date flows through an EVE calculation and moves the net position by tens of millions. Second, the specialist model validation teams that do understand IRRBB context have a remit focused on the model. They’ll challenge your NMD decay assumptions thoroughly but are less likely to ask whether the deposit balances feeding those assumptions are complete and correctly sourced.
Third, and most practically: BCBS 239 is not IRRBB-specific. It applies to every risk type in the bank. Its implementation budget sits centrally and gets carved up across credit risk, market risk, liquidity, operational risk, and everything else. When the sophistication matrix says your function needs sophisticated modelling, that budget flows directly to your team. When BCBS 239 says your data needs fixing, you’re competing with every other risk area for a fraction of whatever’s been allocated centrally. Your repricing date issues queue behind credit risk’s exposure aggregation problems and the liquidity team’s cash flow mapping gaps. The regulation that creates modelling demand is IRRBB-specific and creates direct accountability. The regulation that should address data quality is generic and creates a shared queue. The result is predictable.
Fourth, and this one rarely gets said out loud, nobody’s career was ever made by fixing data plumbing. The model validators who challenge your NMD assumptions get to write interesting papers, present at conferences, and build a reputation. The person who finally sorts out why the mortgage system keeps sending you repricing dates in the wrong format gets a “thanks” in a team meeting, if they’re lucky. The incentive structure isn’t just regulatory. It’s human.
The result is a validation gap where independent challenge focuses deeply on model sophistication with only rudimentary coverage of data quality. And where remediation is underway, IRRBB is fighting for priority in someone else’s backlog.
The compliance record confirms it. As of the ECB’s 2023 progress report, not a single one of the twenty-five significant institutions it examined had fully implemented BCBS 239. A decade after publication. The ECB noted that weaknesses “stem mainly from a lack of clarity regarding responsibility and accountability for data quality.” The framework designed to ensure data robustness hasn’t been implemented. The framework that depends on it doesn’t provide a read-across to it.
What sophistication should mean
Data remediation is not glamorous. Modelling is more technical, more specialised, and, let’s be honest, more intellectually rewarding. It’s natural for IRRBB professionals to gravitate towards it. I certainly did, earlier in my career. Which is precisely why the regulatory framework’s role in setting priorities matters so much. Without clear regulatory emphasis, the natural gravitational pull will always be toward model sophistication rather than operational foundations.
The total regulatory burden doesn’t need to increase. It needs to be rebalanced. Some of the modelling prescription in the EBA text, particularly in the measurement annexes, could be reduced in favour of strengthening data and operational infrastructure requirements. CSRBB is the clearest candidate, but the same logic applies to optionality valuation and to elaborate dynamic behavioural models for items that can’t be reliably modelled from historical data.
The rebalancing doesn’t require invention. It requires specifying, with the same prescriptive rigour already applied to modelling, what good operational practice looks like. That means data quality thresholds on the fields that drive IRRBB metrics: maturity dates, repricing dates, balances, rate indices. It means requirements to trend-analyse data quality issues over time. Not just fix them ad hoc each production cycle, but track whether the same breaks recur and whether the underlying causes are being addressed. It means formal governance: data quality forums with defined escalation paths, ALCO reporting of recurring data issues and their impact on risk metrics, and visibility of whether the numbers being reviewed were produced cleanly or patched together under time pressure.
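None of this requires regulatory invention technically, either. Here is a sketch of the kind of check a threshold requirement would mandate; the field names and tolerances are hypothetical, taken from no regulatory text:

```python
# Hypothetical completeness tolerances on the fields that drive IRRBB metrics.
CRITICAL_FIELDS = {
    "maturity_date": 0.999,   # at most 0.1% of records missing
    "repricing_date": 0.999,
    "balance": 1.0,           # zero tolerance
    "rate_index": 0.995,
}

def completeness_report(records: list[dict]) -> dict[str, bool]:
    """Flag each critical field as pass/fail against its tolerance."""
    n = len(records)
    report = {}
    for field, tolerance in CRITICAL_FIELDS.items():
        populated = sum(1 for r in records if r.get(field) is not None)
        report[field] = (populated / n) >= tolerance
    return report

positions = [
    {"maturity_date": "2031-06-30", "repricing_date": "2026-06-30",
     "balance": 250_000.0, "rate_index": "EURIBOR_3M"},
    {"maturity_date": None, "repricing_date": "2026-09-30",
     "balance": 1_000_000.0, "rate_index": None},
]
print(completeness_report(positions))  # maturity_date and rate_index fail
```

Trend analysis then amounts to persisting these reports per production cycle and escalating fields that fail repeatedly.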
The rebalancing also means KPIs on production timeliness and data quality, reported with the same rigour as model performance metrics. And it means monitoring of compensating controls and data fallbacks: the manual overrides, the hardcoded fixes, the “temporary” workarounds that persist for years, so that supervisors and boards can see how much of the production process depends on interventions that wouldn’t survive a key-person departure or a compressed reporting deadline. Most IRRBB teams will recognise at least one of these in their own production environment.
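The compensating-controls point lends itself to the same treatment. A sketch of a minimal workaround register, with hypothetical override identifiers: log every manual intervention per production cycle and escalate anything that outlives its “temporary” label.

```python
from collections import Counter

MAX_CYCLES = 3  # a "temporary" fix older than this gets escalated

# (cycle, override_id) pairs logged by the production process
override_log = [
    ("2024-Q1", "mortgage_repricing_date_patch"),
    ("2024-Q2", "mortgage_repricing_date_patch"),
    ("2024-Q2", "swap_start_date_default"),
    ("2024-Q3", "mortgage_repricing_date_patch"),
    ("2024-Q4", "mortgage_repricing_date_patch"),
]

cycles_active = Counter(override_id for _, override_id in override_log)
escalate = [o for o, n in cycles_active.items() if n > MAX_CYCLES]
print(escalate)  # ['mortgage_repricing_date_patch']
```

None of this is technically demanding. What is missing is the regulatory requirement to do it, and to report it with the same discipline as model performance.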
If I could set the examination agenda, the questions I’d ask any IRRBB team are not about model architecture:
- How quickly can you produce your numbers?
- How frequently do you run them?
- How much of your team’s time is consumed by data remediation rather than analysis?
- Do you trust the metrics enough to base risk management decisions on them?
- How do you represent your non-maturing deposits in the model?
The answers reveal the true sophistication of an IRRBB function far more than any discussion about Monte Carlo paths or optionality valuation. A team that can produce clean, decision-useful metrics within hours and rerun them under ad hoc scenarios at short notice has a genuinely advanced capability, and none of that requires 10,000 simulation paths. A team running an elaborate stochastic engine that takes days to produce numbers, requires extensive manual intervention, and yields outputs that can’t be easily decomposed or explained to ALCO does not, regardless of what Annex II says about methodological classification.
Regulatory frameworks that rate the latter as the more advanced institution are measuring sophistication by methodology rather than reliability of output. That may be the most consequential gap in the current framework. And it’s one that practitioners on the production side have been observing for some time.
For a full index of IRRBB regulatory documents across jurisdictions, see the regulatory tracker. For more on how emerging tools could reshape IRRBB workflows, see LLMs: A Practical Guide for Banking Professionals.