About our Backtests (the Nuts & Bolts):
What assets do you trade?
We represent each asset class using the most liquid ETF (*) in that space. For example, when testing a strategy that trades the S&P 500, we apply the strategy to the ETF SPY, because it’s the largest and most liquid. In most cases, however, there is at least one other ETF that has performed similarly and could be used as a replacement with a negligible effect on performance. For example, IVV has been a close replacement for SPY.
In some instances, vehicles other than ETFs, such as mutual funds or futures, would also have performed similarly. We do not list every possible alternative asset, but key considerations would include:
- How well the asset tracks the underlying asset class or index
- Liquidity and ease of trading
- Any additional costs associated with the asset, such as a significant increase in the expense ratio, penalties for over-trading in the case of mutual funds, and transaction fees or slippage beyond what we’ve accounted for in our tests. See Backtest Assumptions.
(*) We use the term ETF here for simplicity’s sake to mean either an ETF (exchange traded fund) or, less commonly, an ETN (exchange traded note).
When do you trade?
We assume that all strategies trade at the market close (4:00 pm ET). We update strategies for members throughout the trading day in near real-time, so that members are prepared in advance for what each strategy, as well as their own custom model portfolio, will signal for that day’s close.
This raises the possibility that a strategy could change position in the final moments of the trading day, and that a trader might not have time to execute the correct trade. In practice, however, given the slow-moving nature of the strategies that we track, this is a very infrequent occurrence. Historically, any drift introduced by correcting for such discrepancies when the market reopens has had a negligible impact on returns over the long run.
Why do we assume trades are executed at the close, as opposed to the next open or at some other point during the day? Because it allows us to extend our backtests much further into history (see Simulated Asset Data). All things being equal, having more historical data to test is better than having less, as it allows us to see how the strategy has performed during a broader variety of market conditions.
What strategies do you track?
Click for the full list of strategies that we track.
We are continuously improving AllocateSmartly for our members, so expect this list to continue to grow in the months and years ahead.
What is a “custom model portfolio”?
Members are able to create their own custom model portfolio.
A member’s model portfolio is like any other portfolio, but rather than selecting assets to trade, the member selects individual strategies and how much to allocate to each. Members can backtest how the model portfolio would have performed historically, and then follow the portfolio in near real-time.
When following their model portfolio, members see the weighted asset allocation of the underlying individual strategies.
For example, consider a portfolio with just two strategies: half of the portfolio is allocated to Strategy A and half to Strategy B. If today Strategy A is 100% long SPY (S&P 500), and Strategy B is 100% in cash, the model portfolio would call for 50% SPY and 50% cash.
This is an extremely oversimplified example. Imagine a more realistic scenario, with a portfolio of 10 or 20 individual strategies, each trading a diverse array of assets. Keeping track of all those disparate strategies and diverse assets without the benefit of a tool like ours would be unmanageable.
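The weighting described above can be sketched in a few lines of Python. This is only an illustration, not the site's actual code: the function name `blend_allocations` and the dictionary layout are assumptions made for the example.

```python
from collections import defaultdict


def blend_allocations(strategy_weights, strategy_allocations):
    """Combine per-strategy asset allocations into one portfolio allocation.

    strategy_weights: {strategy name: share of the model portfolio}
    strategy_allocations: {strategy name: {asset: share of that strategy}}
    (Both layouts are hypothetical, chosen for this sketch.)
    """
    combined = defaultdict(float)
    for name, weight in strategy_weights.items():
        for asset, allocation in strategy_allocations[name].items():
            combined[asset] += weight * allocation
    return dict(combined)


# The two-strategy example from the text: A is 100% SPY, B is 100% cash.
weights = {"Strategy A": 0.5, "Strategy B": 0.5}
allocations = {
    "Strategy A": {"SPY": 1.0},
    "Strategy B": {"CASH": 1.0},
}
print(blend_allocations(weights, allocations))  # {'SPY': 0.5, 'CASH': 0.5}
```

The same function handles the more realistic case: when several strategies hold the same asset, their weighted positions simply add together.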
Note: Custom model portfolios are only available to paid members. Free members are provided with a sample model portfolio.
Why are your results different than I’ve seen elsewhere?
Of course, we can never 100% rule out the possibility that we’ve interpreted a strategy incorrectly. We do our very best to stay true to developers’ original design, but many of the strategies that we track are complex, and it’s always possible that we’ve made a mistake.
Having said that, generally speaking our results are more pessimistic than you’ll find elsewhere because our test assumptions tend to be stricter. Some examples:
- We account for transaction costs, which developers often do not. See Backtest Assumptions.
- When calculating indicator values, we assume that an investor did not know about a dividend until 3 days after the ex-dividend date. See Raw vs Dividend-Adjusted Data.
- When simulating historical asset data (read more), we only use indices or other data sources that are closely related to a large, liquid ETF trading today. Developers will often use data from French, Ibbotson, etc., and while those data sources have value in terms of strategy development, they often do not translate into large, liquid ETFs that can be traded in the real world today. We discuss this issue further on our blog: The Perils of Backtesting with Unrealistic Data
These types of strict assumptions mean that our tests are often more pessimistic than you’ll find elsewhere. We’re okay with that. We believe that it’s in our members’ best interest to see these strategies through as realistic a lens as possible.
Why are your backtests shorter than I’ve seen elsewhere?
In most cases this is because we’ve taken a stricter approach to simulating historical asset data, i.e. when simulating data from prior to the launch of a given ETF (read more).
When simulating historical asset data, we only use indices or other data sources that are closely related to a large, liquid ETF trading today. Developers will often use data from French, Ibbotson, etc., and while those data sources have value in terms of strategy development, they often do not translate into large, liquid ETFs that can be traded in the real world today. We discuss this issue further on our blog: The Perils of Backtesting with Unrealistic Data
That means that our data set is often smaller than you’ll find elsewhere. We’re okay with that. We believe that it’s in our members’ best interest to see these strategies through as realistic a lens as possible.
Backtest Assumptions
In order to provide an apples-to-apples comparison between strategies, we make certain simplifying assumptions that all backtests share, unless specifically noted otherwise. We assume that:
- All strategies trade at the market close (4:00 pm ET). Read more about why we make this assumption, potential issues it creates, and how we mitigate those issues.
- Transaction fees plus slippage total 0.10% per trade (0.20% round-trip). This may be too high or too low given the size of the trading account and the broker used, but should be in the ballpark for most investors. There are two factors that tend to keep trading frictions low for the type of strategies that we track: (1) the assets traded tend to be large and liquid, and (2) all strategies trade at the close, meaning that market-on-close (MOC) orders can often be employed, where slippage is usually minimal or non-existent.
- Both dividends and gains are reinvested.
- Return on cash (i.e. the return on any portion of the portfolio not invested) is equal to the 3-month US Treasury rate.
- We do not account for taxes, as this is highly specific to the individual.
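As a rough illustration of the transaction cost assumption above, the sketch below charges 0.10% of the value traded. How the cost is applied in the actual backtests is not spelled out here, so treat the details as assumptions: the function name `trading_cost`, the `"CASH"` label, and the idea that moving money into or out of cash is itself free are all choices made for this example.

```python
COST_PER_TRADE = 0.001  # 0.10% of traded value (fees plus slippage), from the list above


def trading_cost(old_weights, new_weights, cost_rate=COST_PER_TRADE, cash_key="CASH"):
    """Estimated cost of a reallocation, as a fraction of portfolio value.

    Assumption for this sketch: the cost is proportional to the value traded
    in each non-cash asset, and the cash leg of a trade costs nothing.
    """
    assets = (set(old_weights) | set(new_weights)) - {cash_key}
    traded = sum(
        abs(new_weights.get(a, 0.0) - old_weights.get(a, 0.0)) for a in assets
    )
    return traded * cost_rate


# Buying SPY with cash and later selling it: 0.10% each way.
buy = trading_cost({"CASH": 1.0}, {"SPY": 1.0})
sell = trading_cost({"SPY": 1.0}, {"CASH": 1.0})
print(f"{buy + sell:.2%}")  # 0.20%
```

Under these assumptions, a partial shift (say, from 100% SPY to 50% SPY and 50% IEF) is charged on the 50% sold plus the 50% bought.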
Simulated Asset Data
In order to extend our backtests as far into the past as possible, we often make use of simulated data.
For example, a strategy trading the S&P 500 ETF SPY could only be tested back to 1993, when SPY began trading, using actual data. But using alternate data sources, such as the S&P 500 mutual fund VFINX, we’re able to extend that data much further into the past.
This is inappropriate for hyperactive strategies that rely on small price changes to generate return, because this simulated data is not sufficiently accurate. It is also inappropriate for very illiquid assets where, had traders actually been trading the asset, it would have significantly moved the price. In our case, neither is true. The strategies that we track are relatively slow moving, capturing broad trends rather than quickly capturing small price changes. And the assets those strategies trade tend to be very broad in nature, like stock indices, bond indices, or gold.
In short, we believe that the benefit of having more data to consider far outweighs the potential drawbacks of using simulated data.
Note that when creating simulated data, we apply an expense ratio equal to that of the most liquid similar ETF. So for example, when simulating data for an S&P 500 ETF, we apply the same expense ratio as SPY, the most liquid ETF in that space.
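A minimal sketch of the expense ratio adjustment, under stated assumptions: the annual expense ratio is spread evenly across periods and subtracted from each index return. The exact method used on the site is not described here, and the 0.12% figure below is a placeholder rather than any real fund's expense ratio.

```python
def apply_expense_ratio(index_returns, annual_expense_ratio, periods_per_year=12):
    """Subtract a pro-rated share of the annual expense ratio from each return.

    Assumption for this sketch: a simple even split across periods.
    """
    fee_per_period = annual_expense_ratio / periods_per_year
    return [r - fee_per_period for r in index_returns]


# Hypothetical monthly index returns and a placeholder annual expense ratio.
index_returns = [0.010, -0.020, 0.015]
simulated = apply_expense_ratio(index_returns, annual_expense_ratio=0.0012)
print([round(r, 4) for r in simulated])
```

The effect per month is tiny, but it compounds over decades of simulated history, which is why it is worth including.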
Also note that we tend to be much stricter when simulating asset data than you’ll see elsewhere. We discuss this issue further on our blog: The Perils of Backtesting with Unrealistic Data
Monthly vs Daily Asset Data
Many of the strategies that we track were designed to only trade on the last day of the month. In these instances, we usually show two versions of our backtests, one performed on monthly data (i.e. using month-end values only) and the other on daily data. Each has unique benefits.
We use monthly data because it often allows us to extend our test farther back into history (see Simulated Asset Data). For example, the real estate ETF VNQ began trading in 2004. The underlying index that VNQ tracks, the MSCI US REIT Index (Bloomberg: RMS/RMZ), is available as daily data from 1995. But the FTSE NAREIT Index (Bloomberg: FNER/FNERTR), a very close proxy, is available as monthly data from 1972.
Even when backtesting with monthly data, we provide a second test over daily data. The daily data backtest may not extend as far back into history, but it allows us to show how the strategy has performed when trading on other days of the month. If a strategy has performed poorly on other days of the month, it could be a sign that the strategy is overfit and unlikely to perform as expected in the future.
Note that when adding these monthly trading strategies to your custom model portfolio, you have the option of either trading at month-end (the default), or choosing an alternate trading day.
Also see: Normalizing the Days of the Month
Raw vs Dividend-Adjusted Data
Prices that you see quoted in the financial press are often actually raw prices (also known as cash or nominal prices), meaning they don’t account for an important driver of total return: dividends. The backtested returns that we show on this site are always adjusted for dividends.
A more murky issue is whether to use raw or dividend-adjusted data when calculating the indicators that each strategy employs. To illustrate, consider a strategy that goes long asset X when asset X closes above its 200-day moving average (MA). To truly capture what the 200-day MA is intended to capture (i.e. the average value of asset X over the last 200 days), we should use dividend-adjusted data. The problem is that there is often a delay in that dividend adjustment being reflected in past and current prices by data providers. So if we were to assume that, historically, the 200-day MA had always accounted for all of the dividends that we know about today, we might be introducing look-ahead bias, meaning our test is based on data that wasn’t easily available at that moment in time.
Yes, an investor could spend a lot of time manually monitoring and adjusting historical data in real-time, but practically speaking, this is difficult to do when trading quantitative strategies across a broad range of assets as we do on this site.
To control for this, when calculating historical indicator values, we assume that we did not know about a dividend until 3 days after the ex-dividend date. This is a very pessimistic assumption to be sure, but one that we believe more than controls for any potential look-ahead bias.
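The delay can be pictured as a filter on which dividends are allowed into an indicator calculation on a given date. The sketch below is illustrative only: the function name `known_dividends` is hypothetical, the dividend amounts are made up, and the 3 days are counted as calendar days, which is an assumption (the text does not say whether calendar or trading days are meant).

```python
from datetime import date, timedelta

DIVIDEND_LAG_DAYS = 3  # from the text; counted here in calendar days (an assumption)


def known_dividends(dividends, as_of, lag_days=DIVIDEND_LAG_DAYS):
    """Return the (ex_date, amount) pairs assumed to be known on `as_of`."""
    lag = timedelta(days=lag_days)
    return [(ex_date, amount) for ex_date, amount in dividends if ex_date + lag <= as_of]


# Hypothetical dividend history for one asset.
dividends = [(date(2020, 3, 20), 1.00), (date(2020, 6, 19), 1.10)]

print(known_dividends(dividends, as_of=date(2020, 3, 22)))  # lag has not yet elapsed
print(known_dividends(dividends, as_of=date(2020, 3, 23)))  # first dividend now usable
```

An indicator computed on a given date would then adjust prices only for the dividends this filter returns.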
The good news is that this approach has had a negligible effect on long-term performance for the vast majority of strategies that we track. That’s because tactical asset allocation strategies, by their very nature, tend not to be overly sensitive to small differences in price.
Trades vs Rebalances
All of the strategies that we track include both trades and rebalances.
A “trade” is a change in the optimal allocation of a strategy. Over time though, the strategy will drift from that optimal allocation due to differences in each asset’s performance. A “rebalance” is done to bring the strategy back to that optimal allocation, even if the optimal allocation remains unchanged.
When tracking a strategy in near real-time, only trades (i.e. changes in optimal allocation) are signaled to members. Rebalance assumptions are explained in the strategy description and included in results, but are not signaled.
That’s because signaling rebalances would be impossible. We do not know when a member entered a strategy or the price at which they purchased assets. We leave it to members’ discretion to determine if and when to rebalance.
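To make the idea of drift concrete, here is a small sketch of how weights move away from the optimal allocation when assets perform differently. The function name `drifted_weights` and the return figures are hypothetical.

```python
def drifted_weights(target_weights, asset_returns):
    """Weights after each asset earns its return, with no rebalancing."""
    grown = {
        asset: weight * (1.0 + asset_returns.get(asset, 0.0))
        for asset, weight in target_weights.items()
    }
    total = sum(grown.values())
    return {asset: value / total for asset, value in grown.items()}


# Hypothetical: a 50/50 allocation where SPY gains 10% and IEF is flat.
target = {"SPY": 0.5, "IEF": 0.5}
after = drifted_weights(target, {"SPY": 0.10, "IEF": 0.0})
print({asset: round(weight, 4) for asset, weight in after.items()})
```

The optimal allocation is still 50/50 here, so no trade is signaled. A rebalance would sell enough SPY and buy enough IEF to return to 50/50, and how much that is depends on when and at what price each member entered.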
Normalizing the Days of the Month
Many of the strategies that we track were designed to only trade on the last day of the month. In most cases though, there’s nothing particularly special about the last day of the month, so we also show the results of trading on other days of the month.
The problem, of course, is that every month has a different number of trading days. For example, a member might opt to trade a strategy on day 20, but not every month has 20 trading days.
To account for this, we normalize the number of trading days each month to 21 (21 being the average number of trading days per month). Shorter months get stretched, and longer months get compressed to fit that number. We normalize the day of the month as follows (in Excel parlance for simplicity):
= Round ( ( Trading Day of Month / Total Trading Days in Month ) * 21, 0 )
Note that day 1 will always be the first trading day of the month, and day 21 will always be the last.
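The formula above can be sketched in Python as follows. One detail matters: Excel's ROUND rounds halves up for positive numbers, while Python's built-in `round` rounds halves to the nearest even number, so this sketch uses integer arithmetic instead. The function name `normalized_day` is an assumption made for the example.

```python
def normalized_day(trading_day, total_days, target=21):
    """Map a trading day of the month onto a normalized 1..target scale.

    Equivalent to Excel's ROUND(trading_day / total_days * target, 0) for
    positive inputs, using integer math so that halves always round up.
    """
    if not 1 <= trading_day <= total_days:
        raise ValueError("trading_day must be between 1 and total_days")
    return (2 * trading_day * target + total_days) // (2 * total_days)


# A short (19-day) and a long (23-day) month both span days 1 through 21.
for total in (19, 23):
    print(total, normalized_day(1, total), normalized_day(total, total))

# A half-way case: 10 / 20 * 21 = 10.5, which rounds up to 11 as in Excel.
print(normalized_day(10, 20))
```

A side effect of stretching and compressing: in a short month some normalized days never occur, and in a long month two trading days can share the same normalized day.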
We understand that this might be a bit confusing initially, but it’s a necessary evil to ensure that we’re comparing trading day performance across months accurately.
Normalized Trading Days and Month-End Indicators
Backtests are further complicated when calculating month-end indicators (e.g. a 10-month moving average) for other days of the month.
Consider a strategy that was originally designed to trade on the last day of the month using a 10-month moving average. If the member opted to instead trade that strategy on day 15 of the month, we would use the last 10 day-15 values to calculate that moving average.
In essence, we’re remaining as close as possible to the original intent of the developer, while still allowing members to trade the strategy on alternate trading days.
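As a minimal sketch of the day-15 example above: the moving average is taken over one observation per month, each sampled on the chosen normalized day. The function name and the price series are hypothetical, and the sketch assumes the caller has already picked out one value per month.

```python
def moving_average_on_day(day_values, window=10):
    """Average of the most recent `window` monthly observations.

    day_values: closes sampled on the same normalized day each month,
    oldest first (assumed to be prepared by the caller).
    """
    if len(day_values) < window:
        raise ValueError("not enough monthly observations for this window")
    recent = day_values[-window:]
    return sum(recent) / window


# Hypothetical closes on normalized day 15 for twelve consecutive months.
day_15_closes = [100, 102, 101, 105, 107, 110, 108, 111, 115, 117, 116, 120]
ma = moving_average_on_day(day_15_closes)
print(ma)  # 111.0
```

A strategy that compares price to its 10-month moving average would then compare the latest day-15 close (120 in this made-up series) against that average.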
For a more in-depth discussion of alternate trading days, and how they can be used to better assess a strategy, please see our blog: Alternate Trading Days: An Important Analytical Tool
Benchmark Selection and Calculation
Throughout this site, we use a benchmark of 60% S&P 500 (SPY), 40% 10-year US Treasuries (IEF), rebalanced monthly. We selected this benchmark both because of its ubiquity in the industry, and because it’s a difficult bar to beat given the steady downtrend in Treasury rates over the last 30+ years.
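With monthly data, a 60/40 mix that is rebalanced every month has a simple form: each month's benchmark return is the weighted sum of that month's asset returns. The sketch below shows this under stated assumptions: the function names and return figures are hypothetical, and trading costs on the benchmark are ignored for simplicity.

```python
def benchmark_returns(spy_returns, ief_returns, stock_weight=0.60):
    """Monthly returns of a stock/bond mix reset to its target every month."""
    if len(spy_returns) != len(ief_returns):
        raise ValueError("return series must be the same length")
    bond_weight = 1.0 - stock_weight
    return [
        stock_weight * s + bond_weight * b
        for s, b in zip(spy_returns, ief_returns)
    ]


def growth_of_one(returns):
    """Compound a series of periodic returns, starting from 1.0."""
    value = 1.0
    for r in returns:
        value *= 1.0 + r
    return value


# Hypothetical monthly returns for the two assets.
spy = [0.02, -0.03, 0.01]
ief = [-0.01, 0.02, 0.00]
monthly = benchmark_returns(spy, ief)
print([round(r, 4) for r in monthly], round(growth_of_one(monthly), 4))
```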
There are two alternative benchmarks that we think also have value, but have opted not to use:
Benchmark to the Global Market Portfolio:
The global market portfolio (GMP) represents, in theory, all financial assets held globally, which we believe makes it the ultimate benchmark. Unfortunately, in practice, the GMP is difficult to accurately calculate for the distant past with any granularity, because of a lack of data for many of the constituent asset classes. In short, the GMP is an excellent benchmark, but one that is difficult to employ with very long backtests.
Create a custom benchmark for each strategy:
There is a school of thought that a benchmark should be directly related to the strategy in question. So if, for example, Strategy X rotates between five asset classes, the benchmark should be composed of a static mix of those same five asset classes. Doing so isolates the impact of the strategy itself. While we think this approach has value, we’ve opted to instead use one benchmark across all strategies to make it easier for users to make apples-to-apples comparisons between strategies.
Backtesting Your Custom Model Portfolio
A member’s custom model portfolio is like any other portfolio, but rather than selecting assets to trade, the member selects individual strategies and how much to allocate to each. See: What is a Custom Model Portfolio?
When backtesting your model portfolio, we make all of the same assumptions that we do when testing individual strategies, including transaction fees and slippage, dividends, return on cash, and taxes (see Backtest Assumptions).
In addition, there are two unique aspects to backtesting your model portfolio: portfolio rebalancing and the backtest start date.
Portfolio Rebalancing:
We assume that a trader rebalanced between the individual strategies in the model portfolio at the close on the last trading day of the calendar month. This is in addition to any rebalancing done within the individual strategies themselves.
For example, assume a trader’s model portfolio consisted of 50% allocated to Strategy A and 50% to Strategy B. Over the course of a calendar month, because of differences in how each of those strategies performed, the portfolio is now 52% allocated to Strategy A and 48% to Strategy B. Our backtest assumes that at the close on the last trading day of the month, the trader rebalanced the portfolio back to 50/50. We make the same assumptions about transaction costs and slippage for this special monthly rebalance as we do when testing individual strategies.
This special rebalance is a simplifying assumption to allow for an apples-to-apples comparison between portfolios, but practically speaking, very similar results would be achieved rebalancing much less frequently (quarterly, annually, etc.).
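The 52/48 example above can be sketched as follows. This is a simplification: it measures turnover at the strategy level and charges 0.10% on it, whereas the value actually traded depends on the assets each strategy holds. The function names are hypothetical.

```python
COST_PER_TRADE = 0.001  # 0.10% of traded value, the same assumption used for strategies


def rebalance_trades(current_weights, target_weights):
    """Change in weight needed for each strategy to return to its target."""
    return {
        name: target_weights[name] - current_weights.get(name, 0.0)
        for name in target_weights
    }


def rebalance_cost(trades, cost_rate=COST_PER_TRADE):
    """Approximate cost of the rebalance as a fraction of portfolio value."""
    return sum(abs(change) for change in trades.values()) * cost_rate


# The example from the text: drifted to 52/48, rebalanced back to 50/50.
current = {"Strategy A": 0.52, "Strategy B": 0.48}
target = {"Strategy A": 0.50, "Strategy B": 0.50}
trades = rebalance_trades(current, target)
print({name: round(change, 4) for name, change in trades.items()})
print(f"{rebalance_cost(trades):.4%}")
```

With so little turnover the cost is a tiny fraction of the portfolio, which is consistent with the point that rebalancing less often would give very similar results.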
Backtest Start Date:
When backtesting individual strategies, we usually begin our test as far back in history as data allows. All things being equal, having more historical data to test is better than having less, as it allows us to see how the strategy has performed during a broader variety of market conditions.
When backtesting your model portfolio, however, determining the start date is a bit trickier.
Consider a portfolio that’s 99% invested in Strategy A, which began trading in 1970, and 1% invested in Strategy B, which began trading in 2010. It wouldn’t make sense to begin a backtest of our combined portfolio in 2010, losing 40 years of historical data, because Strategy B likely had little practical impact on our results.
Our solution to this problem is to begin the portfolio backtest on the first date when at least 80% of the required strategy data (weighted by the user’s allocation) is available, or in January of 1990, whichever date is later.
For example, consider a portfolio split evenly between five individual strategies (i.e. 20% invested in each). When backtest data for at least four of those strategies is available (20% x 4 = 80%), we would begin the portfolio backtest. Until all five strategies became available, the unallocated portion of the portfolio (20%) would be split, weighted by the user’s allocation, between the first four strategies.
We set the additional requirement that the portfolio backtest begin on or after January of 1990 simply to prevent the backtest from jumping around too frequently as the user makes changes to their portfolio.
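The start date rule and the temporary reallocation described above can be sketched as below. This is an illustration under assumptions, not the site's implementation: the function names, the data layout, and the start dates in the examples are all hypothetical.

```python
from datetime import date

EARLIEST_START = date(1990, 1, 1)  # "January of 1990" from the text
MIN_COVERAGE = 0.80                # 80% of the portfolio, weighted by allocation


def portfolio_start(weights, start_dates, min_coverage=MIN_COVERAGE, floor=EARLIEST_START):
    """First date with enough weighted strategy data, but never before `floor`."""
    covered = 0.0
    for name in sorted(weights, key=lambda n: start_dates[n]):
        covered += weights[name]
        if covered >= min_coverage - 1e-9:  # small tolerance for float sums
            return max(start_dates[name], floor)
    raise ValueError("weights sum to less than the required coverage")


def effective_weights(weights, start_dates, on):
    """Spread the weight of not-yet-available strategies over the available ones."""
    available = {n: w for n, w in weights.items() if start_dates[n] <= on}
    total = sum(available.values())
    return {n: w / total for n, w in available.items()}


# The 99% / 1% example: Strategy B's late start does not hold the backtest back.
print(portfolio_start({"A": 0.99, "B": 0.01},
                      {"A": date(1970, 1, 1), "B": date(2010, 1, 1)}))

# Five equal strategies with hypothetical start dates: begin when four are available.
weights = {name: 0.20 for name in "VWXYZ"}
starts = {"V": date(1975, 1, 1), "W": date(1985, 1, 1), "X": date(1992, 1, 1),
          "Y": date(1995, 1, 1), "Z": date(2005, 1, 1)}
begin = portfolio_start(weights, starts)
print(begin, effective_weights(weights, starts, begin))
```

In the five-strategy case, the four available strategies each carry 25% until the fifth becomes available.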
We understand that this might be a bit confusing initially, but it’s a necessary evil to provide users with the most consistent experience as they adjust and test their custom model portfolio.
Glossary of Statistics
This is a brief glossary of less commonly used statistics found on this site.
Drawdown Curve: For any given date, shows the percentage loss for the strategy relative to the strategy’s previous all time high. A value of -10% would mean that the strategy was down 10% from its previous all time high. A value of 0% would mean that the strategy was at a new all time high.
Longest Drawdown: The longest drawdown ever suffered by the strategy, measured from the start of the drawdown (i.e. the day of the previous all time high) until the end of the drawdown (the day a new all time high was recorded).
Max Drawdown: The worst loss ever suffered by the strategy, relative to a previous all time high. A value of -50% would mean that, at some point in the test, the strategy lost 50% of its value relative to its previous all time high.
Sharpe Ratio: A measure of a strategy’s historical return relative to volatility. Higher values are better than lower values. This is the most common measure of a strategy’s risk-adjusted performance. It’s often criticized for considering both upside and downside volatility equally.
Sortino Ratio: A measure of a strategy’s historical return relative to downside volatility (i.e. the volatility exhibited on just losing months). Higher values are better than lower values. It’s considered by some to be superior to the Sharpe Ratio because it excludes upside volatility.
Ulcer Performance Index (UPI): A measure of a strategy’s historical return relative to the length and depth of drawdowns. Higher values are better than lower values. This is the least commonly used of the risk-adjusted performance stats we provide, but we think it’s just as important, if not more so. Read more about UPI.
% Time in Market: The percentage of days that the strategy had at least a partial position on. Put another way, the percentage of days when the strategy was not entirely in cash.
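For readers who prefer code to definitions, here is a minimal sketch of the drawdown statistics and % Time in Market defined above. The function names, the equity values, and the choice to measure the longest drawdown in periods (rather than calendar days) are assumptions made for the example.

```python
def drawdown_curve(equity):
    """Percentage loss at each point relative to the running all time high."""
    peak = equity[0]
    curve = []
    for value in equity:
        peak = max(peak, value)
        curve.append(value / peak - 1.0)
    return curve


def max_drawdown(equity):
    """Worst value on the drawdown curve (0.0 if there was never a loss)."""
    return min(drawdown_curve(equity))


def longest_drawdown(equity):
    """Longest stretch, in periods, from an all time high until the next one.

    A drawdown that has not recovered is counted through the last period.
    """
    longest = 0
    peak_index = 0
    for i, value in enumerate(equity):
        if value >= equity[peak_index]:
            if i - peak_index > 1:  # at least one period below the high
                longest = max(longest, i - peak_index)
            peak_index = i
        else:
            longest = max(longest, i - peak_index)
    return longest


def pct_time_in_market(cash_weights):
    """Share of periods in which the strategy was not entirely in cash."""
    return sum(1 for w in cash_weights if w < 1.0) / len(cash_weights)


# Hypothetical equity curve and daily cash weights.
equity = [100, 110, 99, 104.5, 121, 115]
print([round(d, 4) for d in drawdown_curve(equity)])
print(round(max_drawdown(equity), 4), longest_drawdown(equity))
print(pct_time_in_market([1.0, 0.5, 0.0, 1.0]))  # 0.5
```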