Synthetic Equity Market Data

Synthetic equity market data consists of simulated time series of spot and option prices for a given asset. Spot is one-dimensional, while options are defined on a high-dimensional grid of relative strikes (e.g. [80%, 90%, 100%, 110%, 120%]) and floating maturities (e.g. [20, 40, 60, 120]). The time series are sampled at daily intervals.

Simulated data is generated by a machine learning model trained on data derived from historical spot and option prices. Historical prices are sourced from Bloomberg via RMDS. For spot, we adjust raw prices by removing the impact of dividends, borrow costs and interest rates. For options, an internal vol fitting process converts raw prices to implied volatilities, which are then transformed to discrete local volatilities (DLVs). The main purpose of this transformation is to remove possible static arbitrage from the implied vol surface.
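To make the term "static arbitrage" concrete, the sketch below checks a grid of call prices for three textbook violations: prices increasing in strike, non-convexity in strike (butterfly), and prices decreasing in maturity (calendar). It is only an illustration and is not the internal vol fitting or DLV transformation described above. The function name `has_static_arbitrage`, the tolerance, and the assumption that prices are quoted per unit of spot with rates and dividends already removed are all hypothetical.

```python
import numpy as np

def has_static_arbitrage(calls, strikes, tol=1e-9):
    """Return True if a call price grid violates basic no-arbitrage shape rules.

    calls   : array of shape (num_maturities, num_strikes), call prices per
              unit of spot, rows ordered by increasing maturity (assumed layout).
    strikes : increasing relative strikes, e.g. [0.8, 0.9, 1.0, 1.1, 1.2].
    """
    calls = np.asarray(calls, dtype=float)
    strikes = np.asarray(strikes, dtype=float)

    # Slope of the call price between neighbouring strikes, per maturity.
    slopes = np.diff(calls, axis=1) / np.diff(strikes)

    increasing_in_strike = np.any(slopes > tol)               # calls must not get dearer as strike rises
    butterfly = np.any(np.diff(slopes, axis=1) < -tol)        # slopes must be non-decreasing (convexity)
    calendar = np.any(np.diff(calls, axis=0) < -tol)          # longer maturity must not be cheaper

    return bool(increasing_in_strike or butterfly or calendar)

strikes = [0.8, 0.9, 1.0, 1.1, 1.2]
calls_ok = np.array([
    [0.20, 0.10, 0.00, 0.00, 0.000],   # shorter maturity
    [0.21, 0.11, 0.02, 0.01, 0.005],   # longer maturity
])
calls_bad = calls_ok[::-1]             # longer maturity cheaper: calendar violation
```

Here `has_static_arbitrage(calls_ok, strikes)` reports no violation, while the row-swapped `calls_bad` grid is flagged because the longer maturity is cheaper than the shorter one.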

The machine learning model is then developed using the adjusted spot and DLV data. In the pipeline, the high-dimensional data is first compressed to a low-dimensional representation by an autoencoder. A neural-network-based generative model is then trained on the low-dimensional data. The generative model takes random noise plus an initial state up to time t as input and generates the next state at t+1. The training objective is to minimize the distance between the generated (fake) and historical (real) conditional distributions. Once trained, the model can generate synthetic low-dimensional data, which the decoder of the autoencoder reconstructs into high-dimensional data. The generated high-dimensional data contains synthetic spot and DLVs; the DLVs are then converted back to option prices.
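The flow of data through this pipeline (encode, roll the generator forward one step at a time, decode) can be sketched as follows. This is a toy illustration under loud assumptions, not the actual model: a linear PCA projection stands in for the autoencoder, a fixed autoregressive rule stands in for the trained neural network generator, the "historical" data is random, and the generator conditions only on the latest state rather than a longer history. All names (`encode`, `decode`, `generator`, `simulate`, `latent_dim`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

num_variables = 21   # spot + 20 DLVs on the example grid (assumed layout)
latent_dim = 3       # hypothetical size of the compressed representation

# Stand-in for historical adjusted spot and DLV data: (num_days, num_variables).
history = rng.normal(size=(500, num_variables))

# Linear stand-in for the autoencoder: PCA computed via SVD.
mean = history.mean(axis=0)
_, _, vt = np.linalg.svd(history - mean, full_matrices=False)
components = vt[:latent_dim]                 # (latent_dim, num_variables)

def encode(x):
    """Compress high-dimensional states to the low-dimensional representation."""
    return (x - mean) @ components.T

def decode(z):
    """Reconstruct high-dimensional states from the low-dimensional representation."""
    return z @ components + mean

def generator(state, noise):
    """Toy stand-in for the trained generator: next state from state and noise."""
    return 0.9 * state + 0.1 * noise

def simulate(initial_state, num_paths, num_days):
    """Roll the generator forward from one initial state, then decode all paths."""
    z = np.tile(encode(initial_state), (num_paths, 1))       # (num_paths, latent_dim)
    latent_paths = np.empty((num_paths, num_days, latent_dim))
    for t in range(num_days):
        noise = rng.normal(size=(num_paths, latent_dim))
        z = generator(z, noise)                               # state at t+1
        latent_paths[:, t] = z
    return decode(latent_paths)                               # (num_paths, num_days, num_variables)

paths = simulate(history[-1], num_paths=100, num_days=252)
```

The point of the sketch is the shape bookkeeping: generation happens entirely in the low-dimensional space, and the decoder is applied once at the end to recover the full set of variables for every path and day.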

The generated data set has shape (num_paths, num_days, num_variables). For example, suppose we want to simulate 10000 paths of an asset's spot and call option prices for the next 252 days. Using the aforementioned option grid, the shape will be (10000, 252, 21), where 21 counts the spot and 20 call options (5 strikes × 4 maturities). By default we include put options too, so the shape will be (10000, 252, 41).
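The variable count can be reproduced with a few lines of Python. The sketch below assumes a particular column ordering (spot first, then calls, then puts, each enumerated maturity by maturity) and assumes the maturities are in days; neither is stated in the text, and the helper names `column_labels` and `dataset_shape` are hypothetical.

```python
from itertools import product

relative_strikes = [0.8, 0.9, 1.0, 1.1, 1.2]
maturities = [20, 40, 60, 120]   # assumed to be in days

def column_labels(include_puts=True):
    """Label each variable: spot, then one column per (option type, maturity, strike)."""
    labels = ["spot"]
    option_types = ["call", "put"] if include_puts else ["call"]
    for option_type in option_types:
        for maturity, strike in product(maturities, relative_strikes):
            labels.append(f"{option_type}_{maturity}d_{strike:.0%}")
    return labels

def dataset_shape(num_paths, num_days, include_puts=True):
    """Shape of the generated data set: (num_paths, num_days, num_variables)."""
    return (num_paths, num_days, len(column_labels(include_puts)))

calls_only = dataset_shape(10000, 252, include_puts=False)
with_puts = dataset_shape(10000, 252)
```

With calls only, the last dimension is 1 + 5 × 4 = 21; adding the 20 puts gives 41, matching the shapes quoted above.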


1.  Deep Hedging: Learning to Simulate Equity Option Markets.
M Wiese, L Bai, B Wood, H Buehler.

2.  Conditional Sig-Wasserstein GANs for Time Series Generation.
H Ni, L Szpruch, M Wiese, S Liao, B Xiao.
