Misleading Effects of Backtest Overfitting in Investment Strategy

Backtesting is the process of testing an investment strategy by simulating how it would have performed on historical data. The strategy is then validated based on that simulated performance.

An article in the Notices of the American Mathematical Society examines this investment practice and highlights the perils of running a large number of backtests on a single historical data set: the results become skewed in favor of that data set, and the selected strategy may not perform nearly as well on an out-of-sample data set.

This is known as backtest overfitting. Because the strategy is tuned on past data to single out the best-performing portfolios, it may not capture any general financial structure and may simply end up highlighting the extremes and vagaries in the historical data.

The article, "Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance", was written by David H. Bailey, Jonathan M. Borwein, Marcos Lopez de Prado, and Qiji Jim Zhu.

Founded in 1888 to further mathematical research and scholarship, today the American Mathematical Society, with more than 30,000 members, fulfills its mission through programs and services that promote mathematical research and its uses, strengthen mathematical education, and foster awareness and appreciation of mathematics and its connections to other disciplines and to everyday life.

Misleading Practices

Your financial advisor calls you up to suggest a new investment scheme. Drawing on 20 years of data, he has set his computer to work on this question: If you had invested according to this scheme in the past, which portfolio would have been the best? His computer assembled thousands of such simulated portfolios and calculated for each one an industry-standard measure of return on risk. Out of this gargantuan calculation, your advisor has chosen the optimal portfolio. After briefly reminding you of the oft-repeated slogan that "past performance is not an indicator of future results", the advisor enthusiastically recommends the portfolio, noting that it is based on sound mathematical methods. Should you invest?

The somewhat surprising answer is: probably not. Examining a huge number of sample past portfolios---known as "backtesting"---might seem like a good way to zero in on the best future portfolio. But if the number of portfolios in the backtest is so large as to be out of balance with the number of years of data in the backtest, the portfolios that look best are actually just those that target extremes in the dataset. When an investment strategy "overfits" a backtest in this way, the strategy is not capitalizing on any general financial structure but is simply highlighting vagaries in the data.

The perils of backtest overfitting are dissected in the article "Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance", which will appear in the May 2014 issue of the Notices of the American Mathematical Society. The authors are David H. Bailey, Jonathan M. Borwein, Marcos Lopez de Prado, and Qiji Jim Zhu.

"Recent computational advances allow investment managers to methodically search through thousands or even millions of potential options for a profitable investment strategy," the authors write. "In many instances, that search involves a pseudo-mathematical argument which is spuriously validated through a backtest."

Unfortunately, the overfitting of backtests is commonplace not only in the offerings of financial advisors but also in research papers in mathematical finance. One way to lessen the problems of backtest overfitting is to test how well the investment strategy performs on data outside of the original dataset on which the strategy is based; this is called "out-of-sample" testing. However, few investment companies and researchers do out-of-sample testing.

The design of an investment strategy usually starts with identifying a pattern that one believes will help to predict the future value of a financial variable. The next step is to construct a mathematical model of how that variable could change over time. The number of ways of configuring the model is enormous, and the aim is to identify the model configuration that maximizes the performance of the investment strategy. To do this, practitioners often backtest the model using historical data on the financial variable in question. They also rely on measures such as the "Sharpe ratio", which evaluates the performance of a strategy on the basis of a sample of past returns.
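As a rough illustration of the measure mentioned above, the following Python sketch computes an annualized Sharpe ratio from a sample of periodic returns. The function name `sharpe_ratio`, the `periods_per_year` parameter, and the sample figures are assumptions made for this example and do not come from the article; conventions for the risk-free rate and for annualization vary between practitioners.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=12):
    """Annualized Sharpe ratio of a sample of per-period returns.

    risk_free_rate is expressed per period; periods_per_year=12 assumes
    monthly data (an assumption of this sketch, not of the article).
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns to estimate volatility")
    excess = [r - risk_free_rate for r in returns]
    mean_excess = statistics.fmean(excess)
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("returns have zero volatility")
    # Scale the per-period ratio to an annual figure.
    return mean_excess / volatility * math.sqrt(periods_per_year)


# Hypothetical monthly returns, invented purely for illustration.
monthly_returns = [0.01, 0.03, 0.02, -0.01, 0.05]
print(round(sharpe_ratio(monthly_returns), 2))
```

A value computed this way describes only the sample it was computed on, which is why the number of configurations tried against that sample matters.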

But if a large number of backtests are performed, one can end up zeroing in on a model configuration that has a misleadingly good Sharpe ratio. As an example, the authors note that, for a model based on 5 years of data, one can be misled by looking at even as few as 45 sample configurations. Within that set of 45 configurations, at least one of them is almost guaranteed to stand out with a good Sharpe ratio for the 5-year dataset but will have a dismal Sharpe ratio for out-of-sample data.
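The selection effect described above can be reproduced with a small simulation. The Python sketch below is not the authors' experiment. It assumes 45 configurations whose monthly returns are independent random noise with zero mean (so none has any real skill), picks the one with the best Sharpe ratio over 5 in-sample years, and then scores that same configuration on 5 fresh out-of-sample years. All function names, the 4% monthly volatility, and the number of repetitions are hypothetical choices for illustration.

```python
import math
import random
import statistics

MONTHS_PER_YEAR = 12


def annualized_sharpe(monthly_returns):
    """Annualized Sharpe ratio with a zero risk-free rate (sketch assumption)."""
    mean = statistics.fmean(monthly_returns)
    return mean / statistics.stdev(monthly_returns) * math.sqrt(MONTHS_PER_YEAR)


def best_in_sample_vs_out_of_sample(n_configs=45, years=5, rng=None):
    """Pick the best configuration in-sample and score it out-of-sample."""
    if rng is None:
        rng = random.Random()
    n_months = years * MONTHS_PER_YEAR
    best_is, best_oos = -math.inf, 0.0
    for _ in range(n_configs):
        # Zero true mean return: no configuration has any genuine edge.
        in_sample = [rng.gauss(0.0, 0.04) for _ in range(n_months)]
        out_of_sample = [rng.gauss(0.0, 0.04) for _ in range(n_months)]
        sr_is = annualized_sharpe(in_sample)
        if sr_is > best_is:
            best_is = sr_is
            best_oos = annualized_sharpe(out_of_sample)
    return best_is, best_oos


def average_over_trials(n_trials=200, seed=0):
    """Repeat the selection many times and average both Sharpe ratios."""
    rng = random.Random(seed)
    results = [best_in_sample_vs_out_of_sample(rng=rng) for _ in range(n_trials)]
    avg_is = statistics.fmean([r[0] for r in results])
    avg_oos = statistics.fmean([r[1] for r in results])
    return avg_is, avg_oos


if __name__ == "__main__":
    avg_is, avg_oos = average_over_trials()
    print(f"average best in-sample Sharpe ratio:       {avg_is:.2f}")
    print(f"average out-of-sample Sharpe ratio (same): {avg_oos:.2f}")
```

Under these assumptions the winning configuration looks attractive in-sample purely because it was selected as the maximum of many noisy estimates, while its out-of-sample Sharpe ratio averages out near zero. Real return series are not independent noise, so this sketch only shows the direction of the effect.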

The authors note that, when a backtest does not report the number of configurations that were computed in order to identify the selected configuration, it is impossible to assess the risk of overfitting the backtest. And yet, the number of model configurations used in a backtest is very often not revealed---neither in academic papers on finance, nor by companies selling financial products. "[W]e suspect that a large proportion of backtests published in academic journals may be misleading," the authors write. "The situation is not likely to be better among practitioners. In our experience, overfitting is pathological within the financial industry." Later in the article they state: "We strongly suspect that such backtest overfitting is a large part of the reason why so many algorithmic or systematic hedge funds do not live up to the elevated expectations generated by their managers."
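To see why the number of configurations matters so much, one can estimate how high the best in-sample Sharpe ratio is expected to be when nothing but chance is at work. The Python sketch below uses a standard extreme-value approximation for the expected maximum of N independent standard normal draws, and assumes that each configuration has a true Sharpe ratio of zero and that an annualized Sharpe ratio estimated from y years of data has a standard deviation of roughly 1/sqrt(y). The function names and these modelling assumptions are illustrative only and are not presented as the authors' exact procedure.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_standard_normal(n_trials):
    """Approximate expected maximum of n_trials independent N(0, 1) draws."""
    if n_trials < 2:
        raise ValueError("the approximation needs at least two trials")
    z = NormalDist().inv_cdf
    return ((1 - EULER_GAMMA) * z(1 - 1 / n_trials)
            + EULER_GAMMA * z(1 - 1 / (n_trials * math.e)))


def expected_best_sharpe(n_trials, years):
    """Expected best annualized in-sample Sharpe ratio under pure chance.

    Assumes every configuration has a true Sharpe ratio of zero and that
    the configurations are independent (both are sketch assumptions).
    """
    return expected_max_standard_normal(n_trials) / math.sqrt(years)


if __name__ == "__main__":
    for n in (10, 45, 100, 1000):
        print(n, round(expected_best_sharpe(n, years=5), 2))
```

With 45 configurations and 5 years of data, this approximation gives an expected best Sharpe ratio of roughly 1 even though no configuration has any skill, and the figure keeps rising as more configurations are tried. A reported Sharpe ratio therefore cannot be judged without knowing how many configurations were searched.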

Probably many fund managers unwittingly engage in backtest overfitting without understanding what they are doing, and their lack of knowledge leads them to overstate the promise of their offerings. Whether this is fraudulent is not so clear. What is clear is that mathematical scientists can do much to expose these problematic practices---and this is why the authors wrote their article. "[M]athematicians in the twenty-first century have remained disappointingly silent with regard to those in the investment community who, knowingly or not, misuse mathematical techniques such as probability theory, statistics, and stochastic calculus," they write. "Our silence is consent, making us accomplices in these abuses."

Quantum Day

10 April 2014
