Data-driven and market-driven models


  • Data-driven models take a historical data set as input.
  • Market-driven models rely on the current state of the market as input.
  • The required skills and theoretical machinery differ, as do the goals of each model.

Hello!

When working in finance, one faces at least two broad categories of models with respect to their input. One relies on historical data, while the other employs current information from the market. Here, I will call them data-driven models and market-driven models.
In conceptual terms, data-driven models are those that depend on a historical data set. Models to determine creditworthiness, for instance, are usually based largely on historical data. In the opposite direction, market-driven models depend on the latest information available in the market: the current price, the current yield curve, the current spread, and so on. The most famous market-driven model is the one proposed by Black and Scholes for vanilla European options (derivation, analytical and numerical solution).
On one hand, data-driven models encompass the history, ensuring continuity of the parameters corresponding to the phenomena being modeled. However, these models might not capture fast changes of scenario. The opposite situation is observed with market-driven models. There are other fundamental differences between them, as well as in the required skills and tools used to handle them. In this post, I intend to briefly discuss these aspects in a fairly informal way, as usual.
Historical data

Fundamental assumptions: Markov property

Only "now" matters
In principle, a model relying on historical data assumes that future results depend on aspects that can be inferred from past events. These models assume that historical data can reveal tendencies that are sustained over some future horizon. A quite simple example is the graphical analysis of stock prices, which assumes that certain patterns repeat themselves systematically. Models for insurance premiums and mortgage credit are also famous for their heavy dependence on historical data.
Conversely, market-driven models depend on instantaneous information. They assume that markets are self-contained, meaning that aspects such as expectations are already priced in. Only the current information matters. In particular, financial derivatives are the great consumers of this kind of model.
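As a toy illustration of the "patterns in historical data" idea, here is a minimal sketch of a moving-average crossover, a staple of graphical analysis. The price series is synthetic (a random walk) and the window lengths are arbitrary choices, not a recommendation:

```python
import random

random.seed(42)

# Synthetic daily prices: a random walk standing in for a historical data set.
prices = [100.0]
for _ in range(250):
    prices.append(prices[-1] + random.gauss(0, 1))

def moving_average(series, window):
    """Plain rolling mean over the last `window` observations."""
    return [
        sum(series[i - window:i]) / window
        for i in range(window, len(series) + 1)
    ]

# A graphical analyst might compare a short and a long moving average:
# the short one sitting above the long one is read as an upward tendency.
short_ma = moving_average(prices, 10)[-1]
long_ma = moving_average(prices, 50)[-1]
signal = "up-trend" if short_ma > long_ma else "down-trend"
print(short_ma, long_ma, signal)
```

Everything the "model" knows comes from the past window of prices, which is exactly what makes it data-driven.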
Market-driven models possess the Markov property, which can be written as
$g(X(t))=E[h(X(T)) | F(t)]$
where $X(t)$ denotes the set of variables being monitored, $g(\bullet)$ and $h(\bullet)$ are deterministic functions, $E[\bullet]$ is the expectation operator, and $F(t)$ is a filtration. A filtration can be understood as an abstract entity that represents all the information available up to time $t$.
Well, how can we interpret the equation above? It is simple: the expected value of the function $h(X(T))$ at a future time $T$ depends only on a function $g(X(t))$ evaluated at the current time $t$. Only the "now" information matters to infer the future!
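We can check this numerically for the classic example: geometric Brownian motion under the risk-neutral measure, where $h(x) = x$ and $g(x) = x\,e^{r(T-t)}$. The sketch below (parameters are illustrative) estimates $E[S(T)\,|\,F(t)]$ by Monte Carlo and compares it against the function of the current state alone:

```python
import math
import random

random.seed(0)

# Geometric Brownian motion under the risk-neutral measure (assumed dynamics):
#   S(T) = S(t) * exp((r - 0.5*sigma^2)*(T - t) + sigma*sqrt(T - t)*Z)
r, sigma, t, T = 0.05, 0.2, 0.0, 1.0
S_t = 100.0  # current state: the only input the conditional expectation needs

n = 200_000
draws = (
    S_t * math.exp((r - 0.5 * sigma**2) * (T - t)
                   + sigma * math.sqrt(T - t) * random.gauss(0, 1))
    for _ in range(n)
)
mc_estimate = sum(draws) / n            # E[h(S(T)) | F(t)] with h(x) = x
g_of_now = S_t * math.exp(r * (T - t))  # g(S(t)): a function of "now" only

print(mc_estimate, g_of_now)
```

The two numbers agree up to Monte Carlo noise: no past value of $S$ entered the calculation, only $S(t)$.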
Notice that if we wanted to formalize data-driven models, we would have to write something like
$g(X(t), X(t-a), X(t-b), X(t-c), \ldots )=E[h(X(T)) | F(t)]$
where $a, b, c, \ldots$ indicate how far "in the past" we should look in order to obtain the expected value at the future time $T$. These cases do not possess the Markov property.
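A concrete toy version of this non-Markovian forecast is a simple autoregressive rule, where the prediction explicitly uses $X(t), X(t-a), X(t-b)$. The coefficients below are made up for illustration, not estimated from data:

```python
# A toy autoregressive forecast: the prediction explicitly uses lagged
# values X(t), X(t-1), X(t-2) -- the hallmark of a non-Markovian,
# data-driven model.
def ar_forecast(history, coeffs):
    """Linear forecast built from the most recent len(coeffs) observations."""
    lags = history[-len(coeffs):][::-1]   # X(t), X(t-1), X(t-2), ...
    return sum(c * x for c, x in zip(coeffs, lags))

history = [1.0, 1.2, 1.1, 1.3, 1.25]
coeffs = [0.6, 0.3, 0.1]  # illustrative weights on X(t), X(t-a), X(t-b)
print(ar_forecast(history, coeffs))
```

Dropping any of the lagged terms changes the forecast, which is precisely why knowing only "now" is not enough here.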

Market-driven models are more complicated.

The theoretical machinery needed to handle data-driven models is less complicated and less time-consuming than that for market-driven models. I even read somewhere that only "rocket scientists" could handle market-driven models (I don't agree with this comment, although market-driven models really are rather more complicated).
In principle, the main theoretical entities employed in market-driven models are stochastic differential equations (SDEs). The corresponding dynamics can be solved either directly via the SDEs or through a well-known connection with deterministic partial differential equations. Core libraries for solving and calibrating these models are usually written (when in production) in a compiled language (mostly C or C++) because of computational cost as well as accuracy issues. In C++, for instance, one can work with variables of type double, which provide roughly 16(±1) significant decimal digits. Data-driven models, on the other hand, require more data-science skills, applying more machine learning tools and more statistical formalism.
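To make "solving the SDE directly" less abstract, here is a minimal Euler-Maruyama sketch for the GBM equation $dS = \mu S\,dt + \sigma S\,dW$ (in Python rather than C++, purely for readability; all parameters are illustrative):

```python
import math
import random

random.seed(1)

def euler_maruyama_gbm(S0, mu, sigma, T, steps):
    """Euler-Maruyama discretization of dS = mu*S dt + sigma*S dW."""
    dt = T / steps
    S = S0
    for _ in range(steps):
        dW = random.gauss(0, math.sqrt(dt))  # Brownian increment
        S += mu * S * dt + sigma * S * dW
    return S

# Sample many paths and compare the mean at T with the exact E[S(T)] = S0*exp(mu*T).
paths = [euler_maruyama_gbm(100.0, 0.05, 0.2, 1.0, 100) for _ in range(20_000)]
mean_T = sum(paths) / len(paths)
print(mean_T, 100.0 * math.exp(0.05))
```

In production one would replace this with a compiled, carefully calibrated implementation, but the time-stepping structure is the same.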
I will definitely come back to this topic in some future post.
To answer the naive question of why we should choose more complicated models, let's take a look at the Black-Scholes model (always Black-Scholes... :) ). To calculate an option price with this model, one needs three exogenous parameters, namely the risk-free interest rate, the spot price, and the volatility. While the first two can be obtained directly by "taking a look" at the market, the volatility is rather more complicated. In principle, one uses a fitting formula to recover the volatility implied by the current prices of options, as a function of strike for different maturities. This whole process has important goals: it guarantees prices free of arbitrage opportunities with respect to the current state of the market, and it provides well-valued hedges.
Another aspect, already mentioned above, is that market-driven models can naturally capture fast changes of scenario. Achieving the same with data-driven models would be a much more complicated and cumbersome task. To be honest, I have never seen it done (but that does not mean it is not possible).

Neither witches nor muggles

It does not make sense to ask which model is the best. In fact, models can use elements of both worlds; it always depends on the goal. The Libor market model, for instance, might be considered the ultimate approach for interest rate derivatives. It uses market prices to obtain implied volatilities and, at the same time, uses historical correlation data between forward rates of different tenors.
Another interesting example lies in credit modeling. One could use fundamental and historical analysis to estimate the creditworthiness of a company. But one could also calibrate the hazard rate of a company (associated with its default probability) using liquid instruments whose prices rely on the market's trust in the company. These are Credit Default Swaps (CDS), as exemplified in detail by Brigo and Mercurio in their must-read book.
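The simplest market-driven version of this calibration is the so-called "credit triangle": under a flat hazard rate $\lambda$ and constant recovery $R$, the CDS par spread $s$ satisfies approximately $s \approx \lambda(1 - R)$, so the quoted spread can be inverted directly. The numbers below are illustrative, not real quotes:

```python
import math

def hazard_from_spread(spread_bps, recovery):
    """Credit triangle: invert s ~ lambda*(1 - R) for a flat hazard rate."""
    spread = spread_bps / 10_000.0
    return spread / (1.0 - recovery)

def survival_prob(lam, t):
    """Survival probability to time t under a constant hazard rate."""
    return math.exp(-lam * t)

lam = hazard_from_spread(120.0, 0.40)   # 120 bp CDS spread, 40% recovery
print(lam, survival_prob(lam, 5.0))     # implied hazard rate and 5y survival
```

A full calibration (as in Brigo and Mercurio) bootstraps a term structure of hazard rates from CDS of several maturities, but the input is the same: current market spreads, not historical defaults.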

I hope you enjoyed the post. Leave your comments and share.

May the Force be with you.

#=================================
Diogo de Moura Pedroso

