This post gets a bit down into the weeds of how our backtest engine works, but we’ve received this question a few times recently, so we thought a blog post was in order for readers who are (like us) nerds.
We conform all of the strategies that we track to a common set of assets to represent each asset class. We override the specific asset that an author may have modelled and instead represent each asset class using the largest, most liquid ETF (or ETN) in that space.
For example, US aggregate bonds are always represented by AGG (as opposed to, say, BND), and US large cap stocks are always represented by SPY (versus IVV, et al.). There are currently 37 assets in this common list, so it isn’t small, mind you, but it is much smaller than if we allowed a free-for-all.
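To make the override concrete, here is a minimal sketch of how such a mapping might look. It is not our engine's actual code: the names `COMMON_ASSET`, `conform` and `conform_allocation` are hypothetical, and the table lists only the two pairings mentioned above rather than the full 37-asset list.

```python
# Hypothetical lookup from the ticker an author modelled to the common
# asset used in the backtest. Only pairings named in this post appear here.
COMMON_ASSET = {
    "BND": "AGG",  # US aggregate bonds
    "IVV": "SPY",  # US large cap stocks
}


def conform(ticker: str) -> str:
    """Return the common asset for a ticker, or the ticker unchanged if no override is listed."""
    ticker = ticker.upper()
    return COMMON_ASSET.get(ticker, ticker)


def conform_allocation(allocation: dict[str, float]) -> dict[str, float]:
    """Rewrite an allocation onto common assets, summing weights that collapse together."""
    conformed: dict[str, float] = {}
    for ticker, weight in allocation.items():
        common = conform(ticker)
        conformed[common] = conformed.get(common, 0.0) + weight
    return conformed


# An author's 60/40 allocation modelled on IVV and BND is tested on SPY and AGG.
print(conform_allocation({"IVV": 0.6, "BND": 0.4}))
```

Tickers without an entry pass through unchanged in this sketch; a real implementation would more likely reject anything outside the common list.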
We do this quite intentionally for two reasons. The first is practical, and the second analytical.
The practical reason:
We assume that members are combining multiple strategies together into what we call “custom model portfolios” (it’s a core feature of what we do). If we allowed authors to select a very specific asset to represent each asset class, it would quickly become unmanageable as users added more and more strategies together.
Consider a member who has combined 3 strategies, each trading a different ETF to represent the S&P 500. That member’s combined portfolio could call for 3 different S&P 500 ETFs (increasing transaction costs and the likelihood of short-term capital gains), even though, practically speaking, we can expect nearly identical returns from all of them.
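The three-strategy scenario can be sketched in a few lines. Everything here is illustrative: the allocations are made up, the equal weighting across strategies is an assumption, `combine` and `TO_COMMON` are hypothetical names, and VOO is included only as a third S&P 500 ETF for the sake of the example.

```python
from collections import defaultdict

# Hypothetical override table: S&P 500 ETFs conform to SPY, bond ETFs to AGG.
# VOO is not part of the discussion above; it is here only as a third example.
TO_COMMON = {"IVV": "SPY", "VOO": "SPY", "BND": "AGG"}


def combine(strategies, conform=False):
    """Equal-weight a list of strategy allocations into one portfolio."""
    portfolio = defaultdict(float)
    share = 1.0 / len(strategies)
    for allocation in strategies:
        for ticker, weight in allocation.items():
            # Optionally swap the author's ticker for its common asset.
            key = TO_COMMON.get(ticker, ticker) if conform else ticker
            portfolio[key] += share * weight
    return dict(portfolio)


# Made-up allocations: three strategies, each using a different S&P 500 ETF.
strategies = [
    {"SPY": 0.6, "AGG": 0.4},
    {"IVV": 0.6, "AGG": 0.4},
    {"VOO": 0.6, "BND": 0.4},
]

raw = combine(strategies)                      # five separate tickers to trade
conformed = combine(strategies, conform=True)  # only SPY and AGG remain
print(sorted(raw))
print(sorted(conformed))
```

Without conforming, the member has five positions to manage for what is really a two-asset portfolio; with it, the overlapping positions collapse into one line each.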
The analytical (and perhaps more important) reason:
Conforming strategies to a reasonable set of common assets is another protection against overfitting (along with other things we do, like “alternate trading days”).
If a strategy sings on IVV but flops on SPY, that’s a pretty strong indicator that your strategy is overfitted to IVV and likely to disappoint out of sample. There’s just not enough informational difference between the two to justify one being a significantly better proxy than the other.
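One rough way to express that check in code is to run the same strategy on both proxies and compare the compounded results. This is only a sketch under stated assumptions: the function names are hypothetical, the 2% tolerance is an arbitrary placeholder rather than a threshold we use, and the return series in the example are invented purely to exercise the functions.

```python
from math import prod


def total_return(monthly_returns):
    """Compound a sequence of monthly returns into one cumulative return."""
    return prod(1.0 + r for r in monthly_returns) - 1.0


def proxy_gap(returns_a, returns_b):
    """Absolute difference in cumulative return between two proxy runs."""
    return abs(total_return(returns_a) - total_return(returns_b))


def looks_proxy_sensitive(returns_a, returns_b, tolerance=0.02):
    """Flag a strategy whose result changes by more than `tolerance` when the proxy is swapped."""
    return proxy_gap(returns_a, returns_b) > tolerance


# Invented monthly strategy returns, one run per proxy (not real data).
run_on_ivv = [0.05, 0.05]
run_on_spy = [-0.05, -0.05]
print(looks_proxy_sensitive(run_on_ivv, run_on_spy))
```

A large gap between two near-identical proxies says more about the strategy's rules than about the ETFs themselves.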
We can’t eat historical returns, so precisely matching an author’s historical record is of no value to our members. The only question that’s relevant is understanding how a strategy is going to perform in the future. And in order to do that correctly, we need to whittle the performance record down to the most robust results.
The devil in the details:
We wanted to keep this post short with the 1-2 punch laid out above, but there are always devils in the details:
- All of this is only true for strategies trading in the timeframe that tactical asset allocation (TAA) trades, which is usually monthly. If you’re trading some hyperactive scalping strategy, then yes, IVV can be massively different from SPY.
- We model each asset class using the largest, most liquid ETF/ETN in that space, but how an investor actually opts to execute a strategy is of course an entirely separate decision. There are a host of reasons why an investor might prefer one ETP over another within the same asset class (for example, because the investor’s broker gives preferential treatment to certain ETPs).
- Getting the number of assets in our common list just right is a challenge. Many asset classes can be divided and subdivided ever smaller. Why not expand our current list of 37 assets to 50, or limit it to 20? It’s part art and part science. For example, we do allow for separate developed international (EFA) and emerging markets (EEM) assets, because there’s enough informational difference between the two to justify it. We don’t, however, allow for both a developed international asset (EFA) and a world ex-US asset (e.g., VEU), because there isn’t. Have we set the boundaries correctly? We think so, but of course Allocate Smartly is constantly evolving and we’re always open to expanding the list if enough strategies call for it.