Back Testing Value-at-Risk

by Romain Berry
J.P. Morgan Investment Analytics & Consulting
romain.p.berry@jpmorgan.com

This article is the sixth in a series of articles exploring risk management for institutional investors.

Over the past five articles, we have covered the basics about computing Value-at-Risk (VaR) to assess the market risk of a portfolio of traditional financial instruments. We explained at a high level the pros and cons of the three main methodologies, namely Analytical, Historical and Monte Carlo Simulations. In our last article we described how one can stress test a portfolio to perform some sensitivity analysis or to account for extreme movements in the markets. But how accurate are these various VaR methodologies? At the end of the day, all these models may look elegant and quite sophisticated, but do they really do what they say they do - identify some areas in a portfolio where risks may arise and eventually affect the overall performance of the portfolio? Back Testing has been designed to answer this particular question. We present in this article the pros and cons of the two main techniques used to back test a VaR model.

Background

Back Testing is a technique used to reconcile forecasted losses from VaR with actual losses at the end of the time horizon (generally 1 day, 1 week, 2 weeks, 1 month, 1 quarter, 6 months or 1 year). VaR estimates with a given probability (generally 1% or 5%) the loss that a given portfolio may experience by the end of the analysis horizon under normal market conditions. For instance, if a portfolio has a monthly VaR of $5 million at 99% confidence level, this means that the portfolio has a 1% chance of experiencing a loss of at least $5mm by the end of the month under normal market conditions. At the end of the month, if the portfolio experiences a loss greater than $5mm, then we might conclude that the VaR model has not been very accurate in estimating the potential loss, and therefore the risk, that this portfolio was exposed to. Conversely, if the portfolio has made a gain or experienced a loss smaller than $5mm, then the chosen VaR methodology might have been appropriate and seems to have been accurate in predicting the potential loss exposure for that particular portfolio.

In cases where the VaR has been underestimated and the portfolio has experienced a loss greater than VaR, we say that VaR has been "breached", and such an event is called a "breach" or "violation" of VaR. If there have been too many violations, then the VaR model may not be adequate for the instruments composing the portfolio. For instance, using Analytical VaR for a portfolio heavily weighted in derivatives will most likely produce a higher number of VaR breaches than normal. But how many violations are required before one starts challenging the VaR methodology underlying the risk engine? That depends on the frequency at which the Back Testing is performed and on the confidence level employed. For instance, if we run a Back Testing every day (therefore requiring daily VaR computations) at 99% confidence level, then there should theoretically be about one violation on average over a window of 100 trading days. If we conduct a monthly Back Testing on a VaR (irrespective of the frequency at which it is calculated, since we will only compare the values at the beginning and end of the month) at a 95% confidence level, then we could expect about five violations over 100 months, that is, over the next 8.33 years. If a portfolio has registered more violations than it was supposed to have, then one should review the pricing models used for each instrument, the volatility clustering models, the VaR methodology, the number of simulations, the state of the economy, and so on to identify the source(s) of the breach.
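The arithmetic behind these two examples can be sketched in a few lines of Python. This is only an illustration: the function name `expected_violations` is our own, and the calculation simply multiplies the tail probability (one minus the confidence level) by the number of back-tested periods.

```python
def expected_violations(confidence_level: float, num_periods: int) -> float:
    """Expected number of VaR breaches over num_periods back-tested periods."""
    tail_probability = 1.0 - confidence_level
    return tail_probability * num_periods


# Daily Back Testing at 99% confidence over 100 trading days: about 1 breach.
daily_99 = expected_violations(0.99, 100)

# Monthly Back Testing at 95% confidence over 100 months: about 5 breaches.
monthly_95 = expected_violations(0.95, 100)

# 100 monthly observations span roughly 8.33 years.
years_for_100_months = 100 / 12
```

The result is an average, not a cap: a well-calibrated model can show slightly more or fewer breaches than this figure in any given window.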

We reproduce in Exhibit 1 the graph of a Back Testing exercise. In this example, we have computed a VaR at 95% confidence level over a period of one year and reconciled ex ante VaR estimates with ex post P&L values on a daily basis. Five breaches have been recorded, but this is still fewer than the theoretical 12.5 violations (5% x 250 days) we could have expected and possibly tolerated. The conclusion of this exercise is that the VaR model seems appropriate for that particular portfolio.

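[Exhibit 1: Back Testing of a daily VaR at 95% confidence level against daily P&L over one year. Image not available.]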
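One way to put a number on the Exhibit 1 comparison is to treat each day as an independent trial with a 5% chance of a breach, so that the breach count follows a binomial distribution. The independence assumption is ours and is not stated in the exercise above; the helper names below are likewise illustrative. The sketch computes the expected number of breaches and the probability of seeing at least as many breaches as were observed.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k breaches in n independent periods."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more breaches in n independent periods."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


n_days = 250       # trading days in the one-year window
p_breach = 0.05    # tail probability for a 95% confidence level
observed = 5       # breaches recorded in the exercise

expected = n_days * p_breach  # the theoretical 12.5 violations

# A small value here would suggest too many breaches for a 95% VaR;
# a value close to 1 gives no evidence that risk was underestimated.
prob_observed_or_more = prob_at_least(observed, n_days, p_breach)
```

Under these assumptions, only a small `prob_observed_or_more` would be a reason to challenge the model for underestimating risk.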

Having said that, since Back Testing is a statistical technique, it is prone to estimation errors. We could have Type I or II errors - respectively, the error of observing a violation of VaR when in truth there is none, and the error of failing to observe a VaR violation when in truth there is one. Furthermore, the Back Testing may identify or fail to identify a VaR breach because there has been an error in the set-up (operational risk) of the VaR engine, because the VaR model is not accurate enough (model risk) due to data or computational limitations, because the markets have been extremely volatile or because the correlations have changed. In other words, there is no clear-cut method to say when there is an actual violation until the source of the breach has been identified, investigated and validated.

There are basically two main approaches to performing Back Testing on VaR: dirty and clean. We describe the two techniques and provide some pros and cons for each.

Dirty Back Testing

Dirty Back Testing consists of comparing the VaR estimates with the actual P&L values at the end of the time horizon.

The main pitfall of dirty Back Testing is that it has little value if the portfolio has changed drastically between the beginning and the end of the period considered. Indeed, VaR is based on the past history of the assets present in the portfolio at the beginning of the time period. If the portfolio has been actively traded or has received substantial subscriptions and/or redemptions, the end portfolio may be so different from the beginning portfolio that any comparison between the two would be meaningless. Like Stress Testing, Back Testing starts with a given portfolio and relies heavily on the assumption that this portfolio will not evolve too much. Without dismissing dirty Back Testing altogether, the important question here is how much the portfolio must have changed before the results of the Back Testing are no longer relevant. It is difficult to answer this question, though there are several places to start looking. First, you could check the number of derivatives, which can exhibit sudden jumps in prices, and further check the weight of all derivatives compared to the fund NAV. Second, you could look at the number of instruments that have an individual VaR much higher than the VaR of the portfolio (very risky, and therefore very volatile, assets). One solution would be to perform the Back Testing, and possibly the VaR computations, more often, thus reducing the number of incremental trades that may affect the portfolio too drastically.

Having said that, the main advantage of dirty Back Testing is that it is really easy to implement. For instance, it can be implemented on an Excel spreadsheet, where you can store in one column the VaR estimates and in another column the actual P&L after removing any cashflow effects (redemptions/subscriptions). A third column would simply indicate whether the actual loss is greater than the VaR estimate, and therefore whether there has been a breach of VaR or not.
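The same three-column layout can be sketched in Python instead of a spreadsheet. The figures below are made up purely for illustration, the names `BackTestRow` and `dirty_back_test` are our own, and we assume VaR is quoted as a positive loss amount while P&L is negative for a loss.

```python
from dataclasses import dataclass


@dataclass
class BackTestRow:
    var_estimate: float  # ex ante VaR, quoted as a positive loss amount
    actual_pnl: float    # ex post P&L, net of subscriptions/redemptions
    breach: bool         # True when the actual loss exceeds the VaR estimate


def dirty_back_test(var_estimates, actual_pnls):
    """Build the three 'columns': VaR estimate, actual P&L, breach flag."""
    rows = []
    for var, pnl in zip(var_estimates, actual_pnls):
        loss = -pnl  # a negative P&L is a positive loss
        rows.append(BackTestRow(var, pnl, breach=loss > var))
    return rows


# Hypothetical figures in $mm, for illustration only.
var_estimates = [5.0, 5.2, 4.8, 5.1]
actual_pnls = [1.2, -6.0, -4.7, 0.3]

rows = dirty_back_test(var_estimates, actual_pnls)
breach_count = sum(row.breach for row in rows)
```

Here only the second period is flagged, since its loss of 6.0 exceeds the VaR estimate of 5.2.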

Clean Back Testing

Clean Back Testing consists of comparing, still at the end of the time horizon, the VaR estimates with some hypothetical P&L values of the portfolio, having kept its composition unchanged. In that case, we simply re-price the same portfolio at the end of the time interval, calculate the returns and then compare them with the ex-ante values of VaR.
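A minimal sketch of this re-pricing step is shown below. It assumes a simple portfolio of linear positions valued as quantity times price, and works in currency P&L rather than returns. The instrument names, prices and VaR figure are hypothetical, and a real implementation would call the appropriate pricing model for each instrument.

```python
def hypothetical_pnl(holdings, start_prices, end_prices):
    """P&L of the start-of-period portfolio, re-priced at end-of-period prices.

    The holdings are frozen at their start-of-period quantities, so trades,
    subscriptions and redemptions during the period have no effect.
    """
    return sum(
        quantity * (end_prices[name] - start_prices[name])
        for name, quantity in holdings.items()
    )


# Hypothetical start-of-period portfolio and prices, for illustration only.
holdings = {"EQUITY_A": 1000, "BOND_B": 500}
start_prices = {"EQUITY_A": 50.0, "BOND_B": 100.0}
end_prices = {"EQUITY_A": 46.0, "BOND_B": 99.0}

ex_ante_var = 4000.0  # VaR computed at the start, as a positive loss amount

clean_pnl = hypothetical_pnl(holdings, start_prices, end_prices)
breach = -clean_pnl > ex_ante_var  # compare the hypothetical loss with VaR
```

In this made-up case the frozen portfolio loses 4,500 against a VaR of 4,000, so the clean Back Testing records a breach whatever trading took place in between.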

This technique may seem worthless since the current portfolio may be quite different from the portfolio we started with at the beginning of the time interval. However, the fact that the portfolio may have drastically changed during that period of time does not make clean Back Testing irrelevant at all, since this method tries to assess the predictive ability of VaR irrespective of subsequent changes in the composition of the portfolio. Back Testing answers the question of how accurate the VaR estimate we calculated at the beginning of the period was. The best way to check the predictive ability of VaR is to compare apples with apples, and therefore to compare the identical portfolio at the beginning and at the end of the time period. If the VaR model has failed to foresee a loss greater than VaR, then irrespective of the trading activity that may have taken place between these two points in time, clean Back Testing tells us that there may be something wrong with the VaR model that would require further investigation.

The main advantage of this type of Back Testing is that it respects the fundamental assumption that when we compute VaR, the portfolio must remain unchanged over the selected time period. But some difficulties may make that exercise a bit more complicated than it appears to be. Indeed, some instruments may have matured between the beginning and the end of the chosen time interval, while some others may have defaulted (credit risk). How can one account for these pitfalls? Several solutions exist, such as rolling these instruments over by one period of time or removing the instruments that defaulted (and consequently recalculating the VaR at the beginning of the period without them if these positions were substantial).

Conclusion

Back Testing is not a panacea for identifying flaws in a VaR model but rather a required technique that indicates whether a VaR methodology is being applied appropriately to a given portfolio. As the composition of the portfolio evolves, Back Testing serves as a constant reminder that the VaR model should also be reviewed, and readjusted if necessary, when VaR has lost some of its predictive power. When applicable, it would be best practice to perform both clean and dirty Back Testing, as they bring to light different flaws in the VaR computations.

 

Copyright © 2013 JPMorgan Chase & Co. All rights reserved.