Part II, Classical TA: Chapter 6

Trend Tools: Moving Averages, MACD, ADX

A moving average is a transformation of price. It cannot tell you something the price did not already say. The question is whether the transformation makes the message easier to read.

6.0 Why this chapter exists

Moving averages, MACD, and ADX are the three most-used trend indicators in retail TA. They appear on more screens than any other tool. Traded as standalone signals on liquid futures intraday, they also generate win-rate distributions that are not statistically distinguishable from a coin flip after slippage and commission. This is not a controversial claim. Where the literature does document a trend edge, it is at long horizons and at the portfolio level: Faber (2007) for the long-horizon ensemble case, where the edge survives at portfolio level but not per-instrument, and Hurst, Ooi, and Pedersen (2017) for the institutional time-series momentum case, which is again portfolio-level.

The reason this chapter exists, and is in the book at all, is that the indicators do have value. Not as crossover triggers, but as:

  1. Regime filters (price above/below a long MA).
  2. Dynamic deviation references (how far is price from its 20-period mean?).
  3. Decision-rule explicitness aids (an MACD cross gives a named moment when momentum changed; this is a discipline tool, not a profit signal).

This chapter treats the math, dismisses the mythology, and rebuilds each indicator as a context layer. It is shorter than the chapters before and after because there is less to say once the mythology is set aside.


6.1 Moving averages: math and the smoothing question

Simple Moving Average (SMA)

SMA_t = (1 / N) × Σ price_(t−i+1) for i = 1..N

Equal-weighted average of the last N prices. The SMA has two well-known weaknesses: it gives equal weight to the oldest and newest data point in the window, and it has no memory beyond N bars (a single huge bar moves the SMA when it enters the window and again when it leaves, producing the famous "drop-off" effect).
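The drop-off effect is easy to reproduce. What follows is a minimal Python sketch, not a reference implementation: the function name `sma` and the toy series (flat at 100 with one spike to 110) are illustrative assumptions.

```python
def sma(prices, n):
    """Simple moving average; None until n prices are available."""
    out = []
    window_sum = 0.0
    for i, p in enumerate(prices):
        window_sum += p
        if i >= n:
            window_sum -= prices[i - n]  # the oldest price leaves the window
        out.append(window_sum / n if i >= n - 1 else None)
    return out


# Hypothetical series: flat at 100 with a single spike at index 5.
prices = [100.0] * 5 + [110.0] + [100.0] * 6
values = sma(prices, 5)

# The spike enters the window at index 5 and leaves at index 10.
for i in (4, 5, 9, 10):
    print(i, prices[i], values[i])
```

The SMA moves twice: once when the spike enters the window, and again five bars later when it leaves, even though price did nothing on that later bar.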

Exponential Moving Average (EMA)

EMA_t = α × price_t + (1 − α) × EMA_(t−1)
where α = 2 / (N + 1)

Recursive weighting that emphasizes recent prices. The EMA has unbounded memory (every prior price contributes, with exponentially decaying weight). For the same N, the EMA reacts faster to current price changes than the SMA, at the cost of sensitivity to noise.
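A minimal sketch of the recursion, under one stated assumption: the EMA is seeded with the first price (other seeding conventions exist, such as an SMA of the first N prices). The function name `ema` and the toy series are illustrative.

```python
def ema(prices, n):
    """EMA with alpha = 2 / (n + 1), seeded with the first price (assumption)."""
    alpha = 2.0 / (n + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out


# Same hypothetical series as a spike test: flat at 100, one bar at 110.
prices = [100.0] * 5 + [110.0] + [100.0] * 6
values = ema(prices, 5)

# The spike's influence decays geometrically but never drops out entirely.
for i in range(5, len(prices)):
    print(i, round(values[i], 4))
```

Unlike the SMA, there is no bar on which the spike abruptly leaves the average; its weight shrinks by a factor of (1 − α) per bar.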

Wilder's smoothing

Wilder_t = ((N − 1) × Wilder_(t−1) + price_t) / N
which is equivalent to EMA with α = 1 / N

Wilder's smoothing is an EMA with a smaller α, and therefore a longer half-life for the same N. Setting 1 / N equal to 2 / (M + 1) gives M = 2N − 1, so Wilder smoothing over N = 14 bars matches a standard EMA over 27 bars. It is the smoothing used inside ATR, RSI, and ADX. The reason it appears so often is that Wilder calibrated his original RSI and ADX formulas with this smoothing; using a standard EMA with the same N produces materially different values.
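The relationship between the two α conventions can be checked numerically. This is a sketch under stated assumptions: both smoothers are seeded with the first price (Wilder's own worksheets seed with a simple average of the first N values), and the function names and price series are made up for illustration. Since 1 / N = 2 / (M + 1) when M = 2N − 1, Wilder smoothing with N = 14 should track a standard EMA with period 27.

```python
def wilder(prices, n):
    """Wilder smoothing, seeded with the first price (assumption)."""
    out = [prices[0]]
    for p in prices[1:]:
        out.append(((n - 1) * out[-1] + p) / n)
    return out


def ema(prices, n):
    """Standard EMA with alpha = 2 / (n + 1), seeded with the first price."""
    alpha = 2.0 / (n + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out


# Hypothetical zig-zag series, just to exercise both smoothers.
prices = [100.0 + (i % 7) - 0.3 * (i % 11) for i in range(200)]

w14 = wilder(prices, 14)
e27 = ema(prices, 27)   # 2 * 14 - 1 = 27
e14 = ema(prices, 14)

max_gap_27 = max(abs(a - b) for a, b in zip(w14, e27))
max_gap_14 = max(abs(a - b) for a, b in zip(w14, e14))
print(max_gap_27, max_gap_14)
```

The gap to EMA(27) is floating-point noise, while the gap to EMA(14) is visible, which is why substituting a standard EMA inside RSI or ADX changes the values.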

Which to use

In practice:

  • SMA is fine for long horizons (200-day, 50-day) where the smoothing detail matters less than the round number.
  • EMA is the default for shorter horizons and for trend filters. The faster reaction is generally an advantage.
  • Wilder smoothing appears specifically inside indicators that Wilder originally defined; do not substitute it casually elsewhere.

The choice between SMA and EMA is not the source of any meaningful edge. Both produce qualitatively similar charts at the same N. Spend the analytical budget elsewhere.


6.2 The crossover myth

The standard retail framing: "buy when fast MA crosses above slow MA, sell when it crosses below." The most-cited variants are the 9/21 EMA cross (intraday) and the 50/200 SMA cross ("golden cross" / "death cross", end-of-day).

What the evidence actually says

  • Faber (2007), "A Quantitative Approach to Tactical Asset Allocation": documents that a simple "be in the asset when above its 10-month SMA, be in cash otherwise" rule, applied across an ensemble of asset classes, produces favorable risk-adjusted returns. The edge is at the portfolio level, not per-instrument. Applied to a single liquid future, the rule is marginal.
  • Hurst, Ooi, Pedersen (2017), "A Century of Evidence on Trend-Following Investing": documents that time-series momentum (a generalization of MA cross) has produced durable returns across 67 markets and a century of data, but again at the portfolio level with diversification benefits, not as a single-instrument intraday strategy.
  • Practitioner backtests on liquid futures intraday: the 9/21 EMA cross on ES 5-min, traded mechanically with realistic slippage of 1 to 2 ticks per round-trip, produces win rates around 35 to 45% with R:R near 1:1. At 1:1 the break-even win rate is 50% before costs, so the system is below break-even even before slippage and commission are deducted.
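The break-even arithmetic behind the last bullet can be sketched as follows. The function names and the cost figure of 0.1R per trade are hypothetical; only the 35 to 45% win rates and the 1:1 R:R come from the text. Expectancy per trade in R units is p × reward − (1 − p) × risk − cost, and setting it to zero gives the break-even win rate.

```python
def break_even_win_rate(reward_r, risk_r=1.0, cost_r=0.0):
    """Win rate at which expectancy is zero, with cost expressed in R units."""
    return (risk_r + cost_r) / (reward_r + risk_r)


def expectancy(win_rate, reward_r, risk_r=1.0, cost_r=0.0):
    """Average result per trade in R units."""
    return win_rate * reward_r - (1.0 - win_rate) * risk_r - cost_r


# R:R of 1:1, no costs: break-even sits at a 50% win rate.
print(break_even_win_rate(1.0))

# Hypothetical cost of 0.1R per trade pushes break-even higher.
print(break_even_win_rate(1.0, cost_r=0.1))

# Win rates from the text at 1:1, before costs.
for p in (0.35, 0.45):
    print(p, expectancy(p, 1.0))
```

At the quoted win rates the expectancy is negative before any cost is charged; costs only widen the gap.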

The mythology survives because:

  • Crossovers visually align with major trends in hindsight (selection bias).
  • Practitioners point to chart segments where it worked (cherry-picking).
  • The edge that does exist (portfolio-level time-series momentum at multi-month horizons) gets confused with the edge that does not exist (single-instrument intraday crossovers).

What MAs are useful for

Three roles:

Role 1: Regime filter. The 200-period MA on a daily chart, used as a binary directional filter, is the most defensible use of an MA. The rule "long-only when price > 200-day MA, short-only when price < 200-day MA" filters out roughly half of all setups, and the half it filters out has historically been the half with worse expectancy. This filter does not generate trades; it conditions other strategies.

Role 2: Dynamic deviation reference. "Price is 3% above its 20-period EMA" is information about the current trend's pace and overextension. On any given chart, the distance between the current price and its MA is a measure of stretch. Combined with regime classification (Chapter 2), it can flag overextended conditions where mean-reversion setups have higher expectancy.
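A minimal sketch of the deviation measure, assuming a first-price seed for the EMA and a made-up price series; the names `ema` and `pct_deviation` are illustrative, not part of any library.

```python
def ema(prices, n):
    """EMA with alpha = 2 / (n + 1), seeded with the first price (assumption)."""
    alpha = 2.0 / (n + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out


def pct_deviation(prices, n=20):
    """Percent distance of each price from its own n-period EMA."""
    return [(p / m - 1.0) * 100.0 for p, m in zip(prices, ema(prices, n))]


# Hypothetical series: flat at 100, then one bar at 103.
prices = [100.0] * 50 + [103.0]
stretch = pct_deviation(prices, 20)
print(round(stretch[-1], 3))
```

Note that the final bar is itself part of the EMA, so a 3% jump in price registers as slightly less than 3% of stretch.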

Role 3: Decision-rule explicitness aid. An MACD cross gives a named moment when momentum changed. This is not a tradeable signal in isolation, but it is a useful tag for journaling: "I exited at the MACD cross because that was my predefined exit rule." The discipline of a named rule beats discretionary "I felt like it was time."


6.3 MACD: the second derivative of price

Construction

MACD line = EMA(12) − EMA(26)
Signal line = EMA(9, of the MACD line)
Histogram = MACD line − Signal line

MACD is the difference between a fast EMA and a slow EMA; the signal line is a further EMA of that difference. Mathematically, the MACD line behaves like a low-pass-filtered first derivative of price (on a steady ramp it settles to a constant proportional to the slope); the histogram is something close to a second derivative.
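The construction can be sketched in a few lines. Assumptions: EMAs are seeded with their first input value, and the test series is a synthetic ramp of one point per bar. On a steady ramp with slope s, an N-period EMA lags price by s × (N − 1) / 2, so the MACD line should settle near (12.5 − 5.5) × s = 7 × s and the histogram near zero.

```python
def ema(values, n):
    """EMA with alpha = 2 / (n + 1), seeded with the first value (assumption)."""
    alpha = 2.0 / (n + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) as lists."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    line = [f - s for f, s in zip(fast_ema, slow_ema)]
    sig = ema(line, signal)
    hist = [m - s for m, s in zip(line, sig)]
    return line, sig, hist


# Synthetic ramp: price rises exactly one point per bar.
prices = [100.0 + i for i in range(300)]
line, sig, hist = macd(prices)
print(round(line[-1], 6), round(hist[-1], 6))
```

A constant slope yields a constant MACD line (first-derivative behavior) and a histogram that decays to zero (no change in slope), which is the layered derivative structure described above.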

What MACD captures

The MACD line measures how far the fast EMA has pulled away from the slow EMA, that is, short-horizon trend relative to a slower baseline. When MACD is rising, the 12-period EMA is gaining on the 26-period EMA and momentum is building. When MACD is falling, momentum is decaying.

What MACD does not capture

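  • Direction from the histogram alone (the histogram's sign only says whether the MACD line is above or below its signal line; a positive histogram appears both in an accelerating up-trend and in a decelerating down-trend, so direction has to be read from the sign of the MACD line itself or from price).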
  • Whether the move is structural (trend) or noise (range chop with momentum oscillating around zero).
  • Regime (it does not know whether the current market is the kind where MACD is informative).
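The first limitation can be demonstrated on synthetic data. This is a sketch, assuming first-value seeding for the EMAs and a made-up series that declines one point per bar for 300 bars and then goes flat. While the decline stalls, the MACD line is still negative but rising toward zero, so the histogram is positive even though nothing resembling an up-trend has occurred.

```python
def ema(values, n):
    """EMA with alpha = 2 / (n + 1), seeded with the first value (assumption)."""
    alpha = 2.0 / (n + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) as lists."""
    line = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    sig = ema(line, signal)
    hist = [m - s for m, s in zip(line, sig)]
    return line, sig, hist


# Synthetic series: 300 bars of steady decline, then 30 flat bars.
decline = [500.0 - i for i in range(300)]
flat = [decline[-1]] * 30
line, sig, hist = macd(decline + flat)

# During the flat stretch: MACD line below zero, histogram above zero.
for i in (300, 310, 329):
    print(i, round(line[i], 3), round(hist[i], 3))
```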

MACD divergence

A widely-cited use: "bearish MACD divergence" (price prints a higher high, MACD prints a lower high) signals reversal. The empirical reality:

  • MACD divergence is approximately as informative as RSI divergence (Chapter 7), which is to say, valuable at structure, weak elsewhere.
  • MACD divergence in a strong trend is a continuation indicator more often than a reversal indicator; the trend extends through the divergence.
  • The few academic studies of MACD divergence (Chong and Ng, 2008; others) find weak in-sample effects and weaker out-of-sample.

The honest framing: MACD divergence is a variant of the divergence concept, not a separately powerful signal. Treat it as one of many possible divergences (CVD divergence is more directly informative, per Chapter 3 and Chapter 11).


6.4 ADX: trend strength without direction

Construction

The ADX is constructed in three steps:

  1. Compute +DM and −DM (directional movement up and down) per bar.
  2. Compute +DI and −DI as 100 × the ratio of Wilder-smoothed +DM and −DM to Wilder-smoothed true range.
  3. Compute DX = 100 × |(+DI) − (−DI)| / ((+DI) + (−DI)), then smooth (Wilder) over N=14 to get ADX.

ADX is bounded in [0, 100]. It captures trend strength by comparing the magnitude of directional movement on one side versus the other, normalized by total range.
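The three steps can be sketched in plain Python. Assumptions, labeled here and in the comments: each Wilder smoothing is seeded with the simple mean of its first N inputs (Wilder's worksheets use running sums, which give the same ratios), and the bar data is a synthetic staircase rather than market data. The helper names are illustrative.

```python
def wilder_smooth(values, n):
    """Wilder smoothing seeded with the mean of the first n values (assumption)."""
    if len(values) < n:
        return []
    out = [sum(values[:n]) / n]
    for v in values[n:]:
        out.append(((n - 1) * out[-1] + v) / n)
    return out


def adx(highs, lows, closes, n=14):
    """Return (+DI, -DI, ADX) lists; ADX is shorter due to the second smoothing."""
    plus_dm, minus_dm, tr = [], [], []
    for i in range(1, len(highs)):
        up = highs[i] - highs[i - 1]
        down = lows[i - 1] - lows[i]
        # Step 1: only the larger positive move counts as directional movement.
        plus_dm.append(up if up > down and up > 0 else 0.0)
        minus_dm.append(down if down > up and down > 0 else 0.0)
        tr.append(max(highs[i] - lows[i],
                      abs(highs[i] - closes[i - 1]),
                      abs(lows[i] - closes[i - 1])))
    # Step 2: smoothed DM over smoothed true range.
    s_plus, s_minus, s_tr = (wilder_smooth(x, n) for x in (plus_dm, minus_dm, tr))
    plus_di = [100.0 * p / t if t > 0 else 0.0 for p, t in zip(s_plus, s_tr)]
    minus_di = [100.0 * m / t if t > 0 else 0.0 for m, t in zip(s_minus, s_tr)]
    # Step 3: DX, then a second Wilder smoothing to get ADX.
    dx = [100.0 * abs(p - m) / (p + m) if (p + m) > 0 else 0.0
          for p, m in zip(plus_di, minus_di)]
    return plus_di, minus_di, wilder_smooth(dx, n)


def staircase(step, bars=60):
    """Synthetic bars: close moves by `step` each bar, range is 2 points."""
    closes = [100.0 + step * i for i in range(bars)]
    return [c + 1.0 for c in closes], [c - 1.0 for c in closes], closes


up_pdi, up_mdi, up_adx = adx(*staircase(+1.0))
dn_pdi, dn_mdi, dn_adx = adx(*staircase(-1.0))
print(up_adx[-1], dn_adx[-1])
```

The rising and falling staircases produce the same ADX, with the difference showing up only in which of +DI and −DI is larger. A perfect staircase is an artificial extreme; real bars give far lower readings.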

How to read it

  • ADX < 20: ranging or weak trend.
  • ADX 20 to 25: transitional.
  • ADX > 25: trending.
  • ADX > 40: strongly trending; often pre-exhaustion.
  • ADX > 50: very strong; rare and frequently a sign of a near-term peak.
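For journaling or automated tagging, the bands above can be written as a small lookup. This is a sketch: the function name is hypothetical, and the handling of exact boundary values (for example, treating exactly 25 as transitional) is an assumption the text does not specify.

```python
def adx_label(adx_value):
    """Map an ADX reading to the bands listed above (boundary handling assumed)."""
    if adx_value < 20:
        return "ranging or weak trend"
    if adx_value <= 25:
        return "transitional"
    if adx_value <= 40:
        return "trending"
    if adx_value <= 50:
        return "strongly trending, often pre-exhaustion"
    return "very strong, rare"


for reading in (12.0, 23.0, 31.0, 44.0, 57.0):
    print(reading, adx_label(reading))
```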

ADX is direction-blind

This is the key conceptual point: ADX rises during both up-trends and down-trends. It tells you a trend is in effect, not which direction. To use ADX directionally, pair it with the +DI / −DI separation (or simply check the direction of the recent price move).
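One way to pair the two readings is sketched below. The function name, the labels, and the default threshold of 25 are illustrative choices, not a prescribed rule.

```python
def trend_state(adx_value, plus_di, minus_di, threshold=25.0):
    """Combine ADX (strength) with the +DI / -DI split (direction)."""
    if adx_value <= threshold:
        return "no trend"
    if plus_di > minus_di:
        return "up-trend"
    if minus_di > plus_di:
        return "down-trend"
    return "undetermined"  # strength without a DI split to assign direction


# Same ADX reading, opposite direction: only the DI split tells them apart.
print(trend_state(32.0, 28.0, 12.0))
print(trend_state(32.0, 12.0, 28.0))
print(trend_state(18.0, 28.0, 12.0))
```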

Where ADX is useful in the stack

ADX is one of the four components of the regime composite from Chapter 2 (alongside ATR percentile, BBW percentile, KER). Its role there is as a tiebreaker for "trend strength" when KER and ATR disagree. Its threshold of 25 is widely used in practitioner literature, and using it as the regime cutoff aligns the indicator with the broader community's reference.

ADX is mostly not useful as a standalone trade trigger. "Buy when ADX crosses 25 from below" is a backtest-favorite that does not survive walk-forward validation on liquid futures intraday; the cross itself is lagging, and by the time ADX confirms the trend, the move is well underway.


6.5 The 200-day MA filter, treated honestly

The rule

"Trade long-only when price closes above the 200-day SMA on the daily chart; trade short-only when it closes below." Apply this filter to whatever entry strategy you use.

What it accomplishes

The filter aligns trades with the long-term trend. In bull-market regimes, it restricts shorts to opportunistic counter-trend setups; in bear regimes, it does the same to longs.

Empirical performance

When applied as a filter on top of an otherwise neutral strategy on liquid futures (ES, NQ daily), the filter has historically improved Sharpe by 20 to 40% relative to unfiltered, primarily by avoiding catastrophic counter-trend exposure during major bear moves. The filter is not the strategy; the strategy is the entry method, the filter is what keeps the strategy from being wrong-way during regime trouble.

Limitations

  • Whipsaws around the MA in transitional periods. Price oscillates above and below the 200-day at regime turns; the filter alternates between long-only and short-only multiple times. The fix: require a buffer (e.g., price must be at least 1% above to be "long-only," at least 1% below to be "short-only," and "neutral" in between).
  • Lag. The 200-day SMA reacts slowly to regime changes. In a 2008 or 2020-style downturn, the filter remains "long-only" for weeks into the move. The fix: combine with a faster filter (50-day) for confirmation.
  • Per-instrument basis. A 200-day MA filter on a single contract is less informative than the same filter applied across an ensemble of instruments. The portfolio-level edge documented in Faber (2007) is what actually compounds.
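The buffered version of the filter from the first limitation can be sketched as follows. The function names are hypothetical, the 1% buffer is the example value from the text, and treating a close exactly on a band edge as outside the neutral zone is an assumption.

```python
def sma_last(closes, n=200):
    """Most recent n-period SMA of daily closes, or None if history is too short."""
    if len(closes) < n:
        return None
    return sum(closes[-n:]) / n


def buffered_regime(close, ma, buffer_pct=1.0):
    """Three-state filter: long-only, short-only, or neutral inside the buffer."""
    upper = ma * (1.0 + buffer_pct / 100.0)
    lower = ma * (1.0 - buffer_pct / 100.0)
    if close >= upper:
        return "long-only"
    if close <= lower:
        return "short-only"
    return "neutral"


# Hypothetical history: 200 closes at 100, so the 200-day SMA is 100.
ma = sma_last([100.0] * 200)
for close in (101.5, 100.5, 99.6, 98.0):
    print(close, buffered_regime(close, ma))
```

Closes that wander within 1% of the average are tagged neutral instead of flipping the bias on every cross, which is the whipsaw the buffer is meant to absorb.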

The filter combined with the regime composite

The 200-day MA is a long-horizon regime filter. The composite from Chapter 2 is a real-time regime classifier. They serve different purposes:

  • 200-day MA: directional bias for the swing-trading or end-of-day decision.
  • Composite (ATR, BBW, KER, ADX): regime label for the intraday tactical decision.

A trader can use both. Start with the 200-day MA to set directional bias for the day, then use the composite to select which playbook to load when the session opens.


6.6 The traps of trend tools

  1. Crossover system worship. The 9/21 EMA cross is plotted on more retail charts than any other indicator. It does not produce a sustainable edge on liquid futures intraday after slippage. Use MAs as filters and references; do not trade their crosses mechanically.

  2. MACD divergence at random points. Divergence at structure is informative; divergence in the middle of a range or trend means little. The distinction matters.

  3. ADX as a directional signal. ADX rises in both directions of trend. Reading ADX without the +DI/−DI split, or without price context, leads to misclassification.

  4. Over-fitting MA periods. Some traders backtest 7/15 EMAs vs 9/21 vs 8/17 looking for the "best" cross. The differences are noise; the edge is not in the parameter choice but in whether the cross system has an edge at all (it generally does not for intraday).

  5. Confusing the long-horizon trend-following edge with intraday MA crosses. Time-series momentum at multi-month horizons across an ensemble of futures markets has documented institutional pedigree. That fact is not evidence that 9/21 EMA crosses on ES 5-min produce returns. The horizons and instrument-counts are completely different.

  6. Treating a flat MA as no information. A flat 50-day MA on a chart that is also showing range-bound action confirms the range regime. The flat MA is signal, not absence of signal; it is the absence of trend.


6.7 The integrated stack treatment

Where do trend tools fit in the institutional stack from §10 of the Research Summary?

  • Layer 6 (Regime composite): ADX is one of four inputs.
  • Outside the listed layers: the 200-day MA filter is a directional-bias overlay, not part of the structural stack. It serves as a binary filter on whatever the stack produces.
  • Not in the stack: 9/21 EMA crossover signals, MACD signal-line crosses traded mechanically. These do not earn screen real estate.

The right way to think about trend tools: they are secondary. The primary information is structure (Chapter 4), levels (Chapter 5), volume (Chapter 9), VWAP (Chapter 10), and order flow (Chapter 11). Trend tools are filters and references that condition the primary signals.


6.8 Diagram concepts referenced in this chapter

  • D6.1: SMA vs EMA vs Wilder smoothing comparison. Three lines on the same price series at the same N, showing the difference in reactivity.
  • D6.2: 9/21 EMA cross system performance plot. Equity curve from a backtest of the mechanical crossover on ES 5-min over 5 years, with realistic slippage. The flat or slightly-negative line is the diagnostic.
  • D6.3: MACD construction visualization. Three sub-panes: price with EMA(12) and EMA(26); the MACD line (their difference); the histogram (MACD minus signal line). Illustrating the layered derivative structure.
  • D6.4: ADX direction-blindness illustration. Two panels, one up-trend and one down-trend, both showing rising ADX. The reader should see that ADX alone does not distinguish.
  • D6.5: 200-day MA regime filter overlay. A multi-year ES daily chart with the 200-day SMA and shaded regions for "long-only" / "neutral" / "short-only" based on the buffered filter.


6.10 Exercises

Exercise 6.1: 9/21 EMA cross backtest. Implement a 9/21 EMA crossover system on ES 15-min over the most recent 3 years. Apply realistic slippage (1.5 ticks per round trip) and commission ($2.50 per contract round trip). Compute equity curve, win rate, average R, and Sharpe. The result should be approximately break-even or slightly negative; if it is materially positive, examine your slippage model for optimism.
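As a starting point for the cost model in Exercise 6.1, the per-trade friction can be computed as below. This is a sketch: the function names are hypothetical, and the ES tick value of $12.50 (0.25 index points at $50 per point) is the standard E-mini contract specification, which you should confirm for the contract you actually test.

```python
ES_TICK_VALUE = 12.50  # dollars per tick per contract (0.25 points x $50)


def round_trip_cost(slippage_ticks=1.5, commission=2.50, tick_value=ES_TICK_VALUE):
    """Dollar cost of one round trip per contract: slippage plus commission."""
    return slippage_ticks * tick_value + commission


def net_pnl(gross_ticks, contracts=1, **cost_kwargs):
    """Net dollar result of one trade, given its gross result in ticks."""
    gross = gross_ticks * ES_TICK_VALUE * contracts
    return gross - round_trip_cost(**cost_kwargs) * contracts


print(round_trip_cost())   # exercise defaults: 1.5 ticks slippage, $2.50 commission
print(net_pnl(4))          # a hypothetical 4-tick winner
print(net_pnl(-4))         # a hypothetical 4-tick loser
```

With these inputs, a trade has to clear roughly 1.7 ticks before it makes money, which is a large share of a typical intraday crossover target.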

Exercise 6.2: 200-day MA filter overlay. Take any entry strategy you currently use on ES daily. Add the 200-day MA filter (long-only above, short-only below, with a 1% buffer for "neutral"). Compute the difference in equity curve and Sharpe with and without the filter over 5 years. The filter should improve Sharpe; quantify by how much.

Exercise 6.3: MACD divergence at structure vs elsewhere. On a recent NQ session, identify five MACD divergences. For each, classify whether it occurred at a high-score level (per Chapter 5) or in the middle of a range. Tabulate the subsequent price reaction within 20 bars. The conditional reaction rate at structure should visibly exceed the rate elsewhere.

Exercise 6.4: ADX threshold sensitivity. Plot the regime composite from Chapter 2 with the ADX threshold varied across {20, 22, 24, 25, 28, 30}. For each threshold, compute the percentage of bars classified as "Trend" over a 60-day window. The plateau (range of thresholds where the percentage is stable) is your robust default.
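The sweep in Exercise 6.4 has a simple shape. The sketch below is a simplification: it counts bars where ADX alone exceeds each threshold, as a stand-in for the full Chapter 2 composite, and the ADX readings are made-up numbers. The function name is hypothetical.

```python
def trend_share(adx_values, thresholds=(20, 22, 24, 25, 28, 30)):
    """Percent of bars with ADX above each threshold (ADX-only stand-in)."""
    n = len(adx_values)
    return {t: 100.0 * sum(1 for a in adx_values if a > t) / n
            for t in thresholds}


# Hypothetical ADX readings; replace with the series from your own data.
sample_adx = [10.0, 21.0, 23.0, 26.0, 29.0, 35.0, 45.0, 18.0, 24.5, 31.0]
shares = trend_share(sample_adx)
for threshold, pct in shares.items():
    print(threshold, pct)
```

On real data, look for the stretch of thresholds over which the percentage barely changes; in this toy sample it falls steadily, so there is no plateau to find.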

Exercise 6.5: SMA vs EMA in the regime composite. Substitute SMA for EMA in the composite's filter logic. Re-run the regime classification over 60 sessions. Compare the regime tags from SMA vs EMA. The differences should be small; if they are large, the composite is over-sensitive to the smoothing choice and needs rebuilding.


Next chapter: oscillators, RSI, Stochastics, CCI, and the regime-conditioning rule that makes them tradeable.