Companion Document

Research Summary

This document is a structured digest of the research underpinning the book. Each section states what we know, the strongest supporting evidence, the failure conditions, and the actionable implication for a futures trader. Where evidence is thin or contradictory, we say so explicitly.

§1. The futures market is not the equity market

Findings

Continuous, near-24-hour electronic trading with a clearly demarcated Regular Trading Hours (RTH) session creates persistent intraday volume and volatility asymmetries. RTH typically carries 60–80% of contract volume in ES/NQ; the overnight (ETH/Globex) session is liquidity-light and dominated by Asia/EU rebalancing and event-driven flow.
Roll mechanics inject artifacts. A continuous "back-adjusted" chart erases gap risk but distorts absolute price and any absolute-level S/R. Calendar-month spreads (e.g. ESM6–ESH6) can show basis dislocations that look like price action on the unadjusted contract but are pure carry/term-structure noise.
Tick size and tick value are non-uniform across products and create different statistical distributions of intrabar moves: ES (0.25 = $12.50), NQ (0.25 = $5.00), GC (0.10 = $10.00), CL (0.01 = $10.00). This matters for stop placement, expected R, and how aggressively stop-runs propagate.
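
As a rough illustration of why the tick specification matters for stop placement, the sketch below converts a stop distance into whole ticks and then into dollars, using only the four tick specifications quoted above. The function name, the table layout, and the example prices are hypothetical.

```python
# (tick size in price units, dollar value of one tick for one contract),
# as quoted in the text above.
TICK_SPECS = {
    "ES": (0.25, 12.50),
    "NQ": (0.25, 5.00),
    "GC": (0.10, 10.00),
    "CL": (0.01, 10.00),
}

def stop_risk_dollars(symbol, entry, stop, contracts=1):
    """Dollar risk of a stop: price distance -> whole ticks -> dollars."""
    tick_size, tick_value = TICK_SPECS[symbol]
    ticks = round(abs(entry - stop) / tick_size)  # round away float noise
    return ticks * tick_value * contracts

# Hypothetical entries and stops, chosen only to show the conversion.
for sym, entry, stop in [("ES", 5000.00, 4998.00),
                         ("NQ", 18000.00, 17998.00),
                         ("CL", 80.00, 79.70)]:
    print(sym, stop_risk_dollars(sym, entry, stop))
```

The same two-point stop costs very different dollar amounts in ES and NQ, which is why stops and targets are better expressed in ticks or in R than in raw points.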

Supporting evidence

CME volume data (publicly published daily) consistently shows 65–75% of ES contract volume printed during RTH (08:30–15:15 CT cash session), the remainder during Globex.
Hasbrouck & Saar (2013) and subsequent microstructure work document a clear bid–ask asymmetry and quote-staleness behaviour during low-volume overnight windows, biasing standard volatility estimators (close-to-close > realized intraday).
The Yang-Zhang (2000) volatility estimator is preferred over close-to-close for futures specifically because it correctly weights opening jumps from the overnight session.
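
A minimal sketch of the Yang-Zhang estimator follows, assuming one (open, high, low, close) tuple per session in time order. It combines the variance of overnight (close-to-open) log returns, the variance of open-to-close log returns, and the Rogers-Satchell range term, weighted by the constant k from the original paper. The function name and input layout are illustrative choices, not part of the source.

```python
import math
from statistics import mean, variance

def yang_zhang_variance(bars):
    """bars: list of (open, high, low, close), time-ordered, at least 3 sessions.
    Returns the per-session variance of log returns."""
    overnight, open_close, rs = [], [], []
    for prev, cur in zip(bars, bars[1:]):
        o, h, l, c = cur
        overnight.append(math.log(o / prev[3]))      # prior close -> open jump
        oc = math.log(c / o)                         # open -> close move
        open_close.append(oc)
        u, d = math.log(h / o), math.log(l / o)
        rs.append(u * (u - oc) + d * (d - oc))       # Rogers-Satchell term
    n = len(overnight)
    k = 0.34 / (1.34 + (n + 1) / (n - 1))
    return variance(overnight) + k * variance(open_close) + (1 - k) * mean(rs)
```

To express the result as an annualized volatility one would typically take `math.sqrt(v * 252)`, where 252 sessions per year is an assumption rather than a figure from the text.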

Failure conditions for naive TA

Pivot points computed from the 24-hour daily close are not the same as pivot points computed from the RTH close. Most "PDH/PDL" indicators on retail charts conflate the two, producing noisy levels that overnight traders never reacted to.
Volume profile that includes ETH bars dilutes the institutional value area; RTH-only profiles are the institutional standard (Dalton, Mind Over Markets, 2006).

Implication

Always be explicit about your session window. Treat ETH and RTH as different markets statistically; their volume profiles, mean reversion tendencies, and trend persistence properties are not the same.
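
One way to make the session window explicit is to compute reference levels twice, once over all bars and once over RTH bars only, and compare. The sketch below assumes bar timestamps are already in exchange (CT) time and uses the 08:30–15:15 CT window quoted above; the function name, tuple layout, and sample prices are hypothetical.

```python
from datetime import datetime, time

RTH_START, RTH_END = time(8, 30), time(15, 15)  # CT window quoted in the text

def session_levels(bars, rth_only):
    """bars: time-ordered list of (timestamp, high, low, close) for one trading day.
    Returns (session high, session low, last close) for the chosen window."""
    if rth_only:
        bars = [b for b in bars if RTH_START <= b[0].time() < RTH_END]
    if not bars:
        raise ValueError("no bars in the requested window")
    return (max(b[1] for b in bars), min(b[2] for b in bars), bars[-1][3])

# Hypothetical day: the overnight bar sets the 24-hour extremes.
day = [
    (datetime(2024, 1, 2, 3, 0), 5010.0, 4980.0, 4990.0),   # ETH
    (datetime(2024, 1, 2, 9, 0), 5005.0, 4992.0, 5000.0),   # RTH
    (datetime(2024, 1, 2, 15, 0), 5003.0, 4995.0, 4998.0),  # RTH
    (datetime(2024, 1, 2, 15, 45), 4999.0, 4994.0, 4996.0), # post-RTH
]
print(session_levels(day, rth_only=False))
print(session_levels(day, rth_only=True))
```

In this made-up day the 24-hour high, low, and close all differ from their RTH counterparts, so any PDH/PDL or pivot derived from them would differ too.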

§2. Market structure: trend, range, regime

Findings

The trend / range / volatility-regime triad is the most useful first-order classification for a discretionary trader. The single largest source of loss for retail-trained traders entering futures is applying the wrong tool to the regime. Mean-reversion oscillators in trend = continuous loss. Trend-followers in range = continuous loss.
Empirically, regime persistence is real but short: in ES intraday (5-min bars), a "trend day" classification (Open in lower-quartile / Close in upper-quartile, or vice versa) occurs ~18–25% of sessions and tends to extend ATR-multiple targets reliably; "range days" cluster around the IB (Initial Balance, the 09:30–10:30 ET range on ES) with retest probability >70% on standard sessions.
Regime detection is best done with a composite, not a single indicator: ATR percentile + Bollinger Band Width percentile + Kaufman Efficiency Ratio + ADX gives a far more stable classification than any one of them alone. Each has a known failure mode (ATR is lagging, BBW is mean-reverting, ADX is direction-blind).
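
The composite idea can be sketched with two of the four components: the Kaufman Efficiency Ratio for trend versus range, and an ATR percentile for volatile versus calm. BBW and ADX are left out to keep the example short, and the two cut-off values are hypothetical placeholders that would need to be calibrated walk-forward.

```python
def efficiency_ratio(closes):
    """Kaufman Efficiency Ratio: net move divided by the sum of absolute
    bar-to-bar moves. Ranges from 0 (pure chop) to 1 (straight line)."""
    path = sum(abs(b - a) for a, b in zip(closes, closes[1:]))
    return abs(closes[-1] - closes[0]) / path if path else 0.0

def percentile_rank(history, value):
    """Fraction of historical readings at or below `value` (0..1)."""
    return sum(1 for h in history if h <= value) / len(history)

def classify_regime(closes, atr_now, atr_history, er_cut=0.4, vol_cut=0.7):
    # er_cut and vol_cut are hypothetical thresholds, not values from the text.
    trend = "Trend" if efficiency_ratio(closes) >= er_cut else "Range"
    vol = "Vol" if percentile_rank(atr_history, atr_now) >= vol_cut else "Calm"
    return f"{trend}-{vol}"
```

Ranking the current ATR against its own history, rather than using its raw value, is what lets the same rule travel across products with different tick values.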

Supporting evidence

Lo, Mamaysky, Wang (2000), "Foundations of Technical Analysis," found that conditional kernel-regression-based pattern recognition does extract incremental information versus null, but the effect sizes are small and only emerge with thousands of trials. This is the canonical academic study supporting the existence of some TA edge, and it is humble in scope.
Menkveld (2013) and follow-up HFT literature show that institutional order flow exhibits regime-dependent persistence: high during opening auction and event windows, low mid-session.
Practitioner literature: Dalton's Market Profile work classifies day types (Trend, Normal, Normal Variation, Neutral, Non-Trend) with empirically distinct extension and value-acceptance behaviour.

Failure conditions

Regime classifiers lag. The day you most need to know it's a trend day is the day when a single ATR-percentile reading at 10:00 ET is itself a noisy estimate.
"Choppy trend" (a trend day with deep retracements) defeats both pure trend-following and pure mean-reversion. There is no clean classification for these days; sizing must be reduced.

Implication

Classify regime explicitly before selecting a setup. Build a personal regime score, test it walk-forward, and accept that ~15% of sessions will be unclassifiable; those are skip days.

§3. Indicators: what survives, what doesn't

We separate indicators into three honest tiers:

Tier A: Robust under regime conditioning

VWAP and Anchored VWAP. Volume-weighted, institutionally tracked (it is the standard execution benchmark for cash-equivalent flow). Mean reversion to session VWAP on equity index futures is statistically real intraday during balanced regimes; AVWAP from a clear pivot (open, prior close, news event) acts as a magnet because real participants use it as their fill reference. Bands (±1σ, ±2σ on volume-weighted standard deviation) further condition entries.
ATR. Not a signal; a unit-of-measure. Used correctly for stop placement, position sizing, target setting. Failure mode: ATR collapses during ultra-low-vol regimes and over-tightens stops.
Volume Profile / TPO. POC, VAH, VAL, and naked POCs are visualizations of where institutional volume actually transacted, and they serve as durable reference levels. Naked (untested) POCs retest with high empirical frequency (Dalton cites ~75% within 30 sessions on ES; we have validated this directionally on internal data, see §11).
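
For the VWAP entry above, a minimal sketch of a cumulative VWAP with volume-weighted standard-deviation bands is given below. It assumes each bar is reduced to one representative price (for example the typical price) and a strictly positive volume; an anchored VWAP is the same calculation started at the anchor bar. The function name and input layout are illustrative.

```python
import math

def vwap_with_bands(bars, k=1.0):
    """bars: list of (price, volume) starting at the anchor bar, volume > 0.
    Returns a list of (vwap, lower_band, upper_band), one entry per bar."""
    out = []
    cum_v = cum_pv = cum_p2v = 0.0
    for price, vol in bars:
        cum_v += vol
        cum_pv += price * vol
        cum_p2v += price * price * vol
        vwap = cum_pv / cum_v
        # Volume-weighted variance; clamp tiny negative values from rounding.
        var = max(cum_p2v / cum_v - vwap * vwap, 0.0)
        sd = math.sqrt(var)
        out.append((vwap, vwap - k * sd, vwap + k * sd))
    return out
```

Calling it with `k=2.0` gives the ±2σ bands; both band sets come from the same running sums, so they can be produced in one pass if needed.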

Tier B: Useful as context, weak as standalone signals

Moving averages (SMA/EMA, 20/50/200). Useful as regime filters (price > 200-day MA = bullish bias) and dynamic deviation references. Crossover systems are extensively documented as marginal-to-negative net of slippage in liquid futures (see Faber 2007, Hurst-Ooi-Pedersen 2017 for the ensemble trend-following case, where the edge survives but at portfolio level, not single-instrument).
MACD, ADX, Stochastics, CCI. All are transformations of price alone; they encode no orthogonal information. Their value is in making the decision rule explicit (when does momentum change?), not in revealing hidden structure.
Classic chart patterns (head-and-shoulders, triangles, flags). Bulkowski's Encyclopedia of Chart Patterns (3rd ed.) gives reaction rates that hover between 55% and 75%, meaningfully positive but only when failure rates and reward-to-risk are accounted for. Most retail use ignores Bulkowski's strict rules.

Tier C: Mostly noise / overfit / mythological

Fibonacci retracements as predictive levels. No academic evidence of statistically distinguishable reaction at 0.382/0.618 versus arbitrary swing fractions in liquid futures. They become self-fulfilling near round-number confluence and AVWAP, but the fib itself is not the source of the level.
Elliott Wave in its strict five-up/three-down form is unfalsifiable in real time. Practitioners disagree on labeling 60% of the time; that makes it a descriptive labeling scheme, not a forecasting tool.
Most oscillator divergence systems traded as standalone. RSI divergence on a 5-min ES chart will trigger 4–10 times a day; without further conditioning (level confluence, sweep, CVD divergence), the win rate is approximately coin-flip.

Implication

Build your stack from Tier A, decorate with Tier B for context and rule-explicitness, and discard Tier C unless you have primary-source statistical validation on your specific contract and time frame.

§4. Order flow: the layer below price

Findings

Order flow is the cleanest measurable proxy for institutional aggression that a screen-based trader can access. The four primitives:
Delta: buy market volume minus sell market volume (per bar, per tick, or per cumulative window).
CVD (Cumulative Volume Delta): running sum of delta. Divergences between price and CVD at swing highs/lows are a high-quality filter.
Footprint: volume traded at each price level inside a bar, partitioned by side. Stacked imbalances (3+ price levels with 2:1 or 3:1 buy/sell skew) are a textbook absorption / aggressive-side signal.
Volume profile (already in Tier A above): the integrated volume-at-price.
Bulk Volume Classification (BVC) (Easley, López de Prado, O'Hara, 2012) is the canonical method for inferring trade direction when bid/ask-tagged tick data is unavailable: it attributes each bar's volume to the buy or sell side in proportion to the standardized price change over the bar. "Delta" indicators on platforms that lack a true bid/ask tape (TradingView, for example) have to rely on an approximation of this kind under the hood.
The footprint stacked-imbalance signal is most informative at structure, i.e. when imbalances cluster at a level the chart already identifies as relevant (PDH, PDL, naked POC, AVWAP). At random mid-bar prices, imbalances mean nothing useful.
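
The delta, CVD, and BVC primitives above can be sketched in a few lines. This version assumes a normal CDF for the buy fraction and standardizes price changes with the full-sample standard deviation, which introduces lookahead and would be replaced by a rolling estimate in live use. Function names and the input layout are hypothetical.

```python
import math
from statistics import pstdev

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bvc_delta(closes, volumes):
    """Per-bar signed volume via Bulk Volume Classification.
    closes has one more element than volumes: volumes[i] is the volume
    traded between closes[i] and closes[i + 1]."""
    changes = [b - a for a, b in zip(closes, closes[1:])]
    sigma = pstdev(changes)  # full-sample sigma: a simplifying assumption
    deltas = []
    for dp, vol in zip(changes, volumes):
        buy_frac = normal_cdf(dp / sigma) if sigma > 0 else 0.5
        deltas.append(vol * (2.0 * buy_frac - 1.0))  # buy volume minus sell volume
    return deltas

def cumulative(deltas):
    """CVD: running sum of per-bar delta."""
    total, out = 0.0, []
    for d in deltas:
        total += d
        out.append(total)
    return out
```

A price/CVD divergence check would then compare the slope of `cumulative(...)` with the slope of price over the same swing, but the thresholds for that comparison are a separate calibration problem.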

Supporting evidence

Easley, López de Prado, O'Hara (2012), "Flow Toxicity and Liquidity in a High-Frequency World": VPIN, the toxicity measure derived from BVC, originates here, and the paper remains the most-cited construction.
Menkveld (2013), Brogaard et al. (2014) on HFT participation rates show that order flow is the dominant intraday price-formation channel, well above any technical pattern.
Practitioner: Trevor Harnett (MarketDelta) and Jim Dalton both have decades of footprint pedagogy that, while not peer-reviewed, is internally consistent and supported by what the academic flow literature says about toxicity and adverse selection.

Failure conditions

Order flow signals fail badly during news. When liquidity-takers crossing the spread dominate the tape inside a 200 ms window (FOMC, NFP), delta/CVD readings are mechanical artifacts of stop cascades, not "intent."
Low-liquidity overnight footprints are unreliable: a single 50-lot sweep on overnight CL can dominate a bar and produce a false signal.
CVD is not normalized across regimes. A CVD slope you'd call "strong buying" in a calm session is noise on a high-vol day.

Implication

Treat order flow as a confirmation layer over structure, not a primary entry trigger. If price is at a meaningful level and order flow agrees with the trade thesis (e.g., absorption appearing on a sweep of equal lows), the setup compounds. If order flow disagrees, skip.
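
To make "confirmation over structure" concrete, the sketch below finds stacked imbalances inside one footprint bar and then checks whether any of them sits near a structural level. It compares buy and sell volume at the same price level, which is a simplification (many footprint tools compare diagonally across adjacent levels), and the 3:1 ratio, minimum stack of 3, function names, and tuple layout are all assumptions for illustration.

```python
def stacked_imbalances(footprint, ratio=3.0, min_stack=3):
    """footprint: list of (price, buy_vol, sell_vol) for one bar, sorted by price.
    Returns (side, [prices]) for each run of >= min_stack consecutive levels
    where one side's volume is at least `ratio` times the other's."""
    def dominant(buy, sell):
        if buy > 0 and buy >= ratio * sell:
            return "buy"
        if sell > 0 and sell >= ratio * buy:
            return "sell"
        return None

    runs, run_side, run_prices = [], None, []
    for price, buy, sell in footprint:
        side = dominant(buy, sell)
        if side is not None and side == run_side:
            run_prices.append(price)
            continue
        if run_side and len(run_prices) >= min_stack:
            runs.append((run_side, run_prices))
        run_side, run_prices = side, ([price] if side else [])
    if run_side and len(run_prices) >= min_stack:
        runs.append((run_side, run_prices))
    return runs

def confirms_at_level(runs, level, tolerance):
    """True if any stacked run includes a price within `tolerance` of the level."""
    return any(abs(p - level) <= tolerance for _, prices in runs for p in prices)
```

The second function is the filter the text argues for: a stack that appears away from any reference level returns False and is ignored.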

§5. Liquidity: zones, sweeps, imbalances

Findings

Liquidity in futures clusters at predictable structural points: prior session highs/lows, equal highs/lows (double tops/bottoms), round numbers, the prior session's settlement, and the value-area extremes. These are where retail and trail-stop orders accumulate.
The sweep-and-reverse pattern (price pierces a liquidity pool by 0.10–0.30 × ATR, takes out the resting orders, and reverses on the same or next bar) is a high-quality reversal signal when it occurs at structure with order-flow confirmation (e.g. delta divergence, CVD inflection, absorption at the wick high).
Imbalance / fair-value gaps (FVGs): a three-bar pattern where bar 1's high < bar 3's low (in an up move) creates a price range that did not transact. These act as liquidity vacuums and frequently get retested. The empirical retest rate on ES 5-min FVGs is roughly 60–75% within 20 bars, conditional on the FVG forming during a directional impulse, not in a chop.
Equal highs / equal lows are explicit liquidity flags: market makers and retail both park stops there, so they are highly likely to be swept.
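
The three-bar fair-value-gap definition above translates directly into code. The sketch below detects up and down gaps and checks whether a later bar trades back into the gap within a lookahead window; the 20-bar default mirrors the retest horizon quoted above, while the function names and tuple layout are hypothetical. It does not apply the "directional impulse" condition, which would need a separate regime filter.

```python
def find_fvgs(bars):
    """bars: list of (high, low). Returns (index_of_bar_3, direction, gap_low, gap_high)
    for every three-bar window whose outer bars do not overlap."""
    gaps = []
    for i in range(2, len(bars)):
        h1, l1 = bars[i - 2]
        h3, l3 = bars[i]
        if h1 < l3:                      # up move: bar 1 high below bar 3 low
            gaps.append((i, "up", h1, l3))
        elif l1 > h3:                    # down move: bar 1 low above bar 3 high
            gaps.append((i, "down", h3, l1))
    return gaps

def retested_within(bars, gap, lookahead=20):
    """True if any of the next `lookahead` bars trades back into the gap range."""
    i, _, gap_low, gap_high = gap
    later = bars[i + 1 : i + 1 + lookahead]
    return any(low <= gap_high and high >= gap_low for high, low in later)
```

Counting `retested_within` over all detected gaps gives an empirical retest rate for a given contract and bar size, which is the number that should be checked against the range quoted above.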

Supporting evidence

Order book research (Cont, Stoikov, 2010; Bouchaud et al., 2018) consistently shows liquidity clustering at salient price levels and rapid replenishment behaviour.
Practitioner ICT (Inner Circle Trader) frameworks codify these behaviours; while ICT pedagogy is uneven and sometimes overclaims, the underlying observations about liquidity sweeps and FVG retests are empirically defensible on liquid index futures.

Failure conditions

Sweeps without order-flow confirmation are not edges. A wick beyond the high of the prior session that closes back inside is necessary, not sufficient.
During strong trend regimes, liquidity sweeps frequently do not reverse; they continue. The same pattern, different regime, opposite outcome.
FVGs in low-volume products or off-hours are unreliable: the gap is just thin liquidity, not a meaningful imbalance.

Implication

Liquidity-based setups need three-way confluence: structural level + sweep mechanics + order-flow confirmation. Two of three is a watch; one of three is noise.
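
The three-way rule can be written as a small checklist. The sketch below tests the sweep mechanics for a swept high on a single bar (pierce of 0.10–0.30 × ATR, close back below the level) and then grades the confluence count. It only covers same-bar reversals, and treating a count of zero as "noise" is an assumption, as are the function names.

```python
def is_sweep_of_high(bar, level, atr, min_pierce=0.10, max_pierce=0.30):
    """bar: (high, low, close). True if the wick exceeds `level` by
    min_pierce..max_pierce x ATR and the bar closes back below the level."""
    high, _, close = bar
    pierce = (high - level) / atr
    return min_pierce <= pierce <= max_pierce and close < level

def confluence_grade(at_structural_level, sweep, order_flow_confirms):
    """Three of three is a setup, two is a watch, anything less is noise."""
    score = sum([bool(at_structural_level), bool(sweep), bool(order_flow_confirms)])
    return {3: "setup", 2: "watch"}.get(score, "noise")
```

Encoding the rule this way makes the disqualifier mechanical: a trade that grades "watch" or "noise" cannot be rationalized into an entry after the fact.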

§6. Institutional behaviour: what the desks actually do

Findings

Institutional flow is segmented. Menkveld (2013) and subsequent work classify participants into market makers, HFTs, and "fundamental" (slower) institutional flow. Each leaves different fingerprints on the tape.
Anchored VWAP is the dominant institutional execution benchmark for index futures. When you see a long, slow grind toward an AVWAP from open or from a key event, you are seeing real participants targeting their fill price.
Institutional order types include iceberg (hidden size), TWAP (time-weighted), POV (percentage-of-volume), VWAP slicers. These produce characteristic intraday signatures: persistent absorption at a price (iceberg), even hourly volume distribution (TWAP), volume-proportional participation (POV).
Open type matters. Dalton classifies the open as Open Drive, Open Test Drive, Open Auction, Open Auction in Range, or Open Rejection-Reverse; each implies different intraday probabilities for trend continuation, range-bound trade, or reversal.

Supporting evidence

Brogaard, Hendershott, Riordan (2014) on HFT participation; CME-published participant-type breakdowns; SEC/CFTC reports on institutional vs. retail volume share.
Dalton's empirical day-type classification, which has held up in updated data through the 2020s.

Failure conditions

"Reading" institutional intent from price action alone is overconfidence theatre. We can identify patterns consistent with institutional behaviour; we cannot identify intent.
News and macro-driven sessions often see retail and institutional flow aligned (everyone selling), which makes the segmentation framework less informative.

Implication

Use the open-type classifier as a daily playbook selector. Use AVWAP as your primary mean-reference. Treat absorption at a structural level as a high-confidence signal of institutional defense.

§7. Risk management: the only edge that compounds

Findings

Position sizing dominates entry quality for any realistic distribution of trade outcomes. Kelly-fraction sizing is theoretically optimal but practically too aggressive given parameter uncertainty; 0.25 to 0.5 Kelly is the institutional norm on systematic strategies.
For futures specifically: tick value is large relative to typical retail account size, so over-leverage is the dominant failure mode. A single CL contract risks $10/tick, an 8-tick stop is $80, a 30-tick stop is $300. A 5-contract size on a 30-tick stop is $1,500 per trade, which on a $25K account is 6%. Past 2% per trade on a directional setup, ruin probability rises non-linearly.
The compounding asymmetry: a 10% drawdown requires +11.1% to recover; 25% requires +33.3%; 50% requires +100%. The mathematics of geometric returns are unforgiving, and they are why prop firms enforce hard daily-loss caps.
Stop placement has structural rules: behind the level (PDH for shorts, PDL for longs), behind the swing pivot, or at an ATR multiple, but never at a round-number price that other traders' stops cluster at, because that is where the sweep occurs.
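
The drawdown arithmetic and the Kelly fraction above are short enough to verify directly. The sketch below uses the standard recovery identity (a fractional drawdown d needs a gain of d / (1 − d)) and the textbook binary-bet Kelly formula f* = p − (1 − p) / b; the function names and the example win rate and payoff ratio are illustrative assumptions.

```python
def recovery_gain(drawdown):
    """Gain needed to return to the prior peak after a fractional drawdown."""
    return drawdown / (1.0 - drawdown)

def kelly_fraction(win_rate, payoff_ratio):
    """Full-Kelly fraction for a binary bet: f* = p - (1 - p) / b,
    where b is average win divided by average loss."""
    return win_rate - (1.0 - win_rate) / payoff_ratio

for dd in (0.10, 0.25, 0.50):
    print(f"{dd:.0%} drawdown needs {recovery_gain(dd):+.1%}")

# Hypothetical edge: 50% win rate at 2:1 payoff.
full = kelly_fraction(0.5, 2.0)
print("full Kelly:", full, "quarter Kelly:", 0.25 * full)
```

Even a modest-looking edge produces a full-Kelly fraction far above the per-trade limits discussed in this section, which is one practical reason fractional Kelly is the norm.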

Supporting evidence

Kelly (1956); Vince's Mathematics of Money Management; Thorp's practitioner literature on fractional-Kelly behaviour.
Practitioner consensus on stop placement is uniform across Schwager's interviews and prop-firm risk handbooks.

Failure conditions

Sizing models assume stationary win-rate and edge; in reality both decay during regime shifts. A sizing model calibrated on a 12-month trend regime will over-size during the subsequent range regime.
Martingale-adjacent sizing (adding to losers) destroys accounts in fat-tailed distributions, which is what futures returns are. Avoid categorically.

Implication

Risk-of-ruin and per-trade risk are non-negotiable constraints. Build the trade plan inward from the maximum acceptable loss per session (typical institutional cap: 1.5–2% of account per session, 0.5–1% per trade). Size from there.
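
Sizing inward from the loss cap can be sketched as a single function: the risk budget in dollars divided by the dollar loss of one contract at the stop, rounded down. The example reuses the $25K account and the CL figures from the findings above; the function name is hypothetical.

```python
import math

def max_contracts(account, risk_pct, stop_ticks, tick_value):
    """Largest whole number of contracts whose stop-out loss stays
    within risk_pct of the account (risk_pct as a fraction, e.g. 0.01)."""
    risk_budget = account * risk_pct
    loss_per_contract = stop_ticks * tick_value
    return math.floor(risk_budget / loss_per_contract)

# $25K account, 1% per trade, CL at $10 per tick.
print(max_contracts(25_000, 0.01, 8, 10.0))   # 8-tick stop
print(max_contracts(25_000, 0.01, 30, 10.0))  # 30-tick stop
```

When the function returns zero, as it does for the 30-tick stop here, the constraint is saying that the setup cannot be taken at that account size with that stop, not that the cap should be relaxed.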

§8. Statistical validation: avoiding the curve-fit trap

Findings

Backtest survivorship (reporting only the backtests that worked) is the central failure mode. A research process that runs 1,000 backtests and finds 50 that appear profitable at p < 0.05 has produced exactly the false-positive count expected from chance alone (1,000 × 0.05 = 50). That is what most published TA looks like.
Walk-forward validation (e.g. Pardo-style anchored or rolling) is the minimum bar. Train on a window, test on the next out-of-sample window, slide forward, repeat. The aggregate out-of-sample performance is your honest estimate.
Bailey and López de Prado's "Deflated Sharpe Ratio" and "Probability of Backtest Overfitting" explicitly correct for selection bias when many strategies are tested. These should be the default lens for any TA claim.
Robustness to parameter perturbation: a strategy whose Sharpe collapses when MA period changes from 20 to 22 is overfit. Stable strategies degrade gracefully across nearby parameters.
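
The walk-forward procedure described above reduces to generating index windows. The sketch below yields consecutive (train, test) ranges, either rolling or anchored, stepping forward by one test window each time so that out-of-sample segments never overlap. The function name and signature are illustrative, not taken from any particular backtesting library.

```python
def walk_forward_splits(n_obs, train_size, test_size, anchored=False):
    """Yield (train_range, test_range) index ranges over n_obs observations.
    Rolling by default; anchored=True keeps the train start at 0 so the
    training window grows with each step."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train_start = 0 if anchored else start
        train_end = start + train_size
        yield range(train_start, train_end), range(train_end, train_end + test_size)
        start += test_size

for train, test in walk_forward_splits(10, 4, 2):
    print(list(train), "->", list(test))
```

Fitting parameters on each train range, recording performance only on the matching test range, and concatenating the test results gives the aggregate out-of-sample estimate the text refers to.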

Supporting evidence

Bailey, Borwein, López de Prado, Zhu (2014, 2016) on backtest overfitting; Harvey, Liu, Zhu (2016) "...and the Cross-Section of Expected Returns" on multiple-testing in factor research.
Lo (2004) "The Adaptive Markets Hypothesis" frames the underlying problem: edges are not stationary; they decay as participants discover them.

Failure conditions

Even disciplined walk-forward can be deceived by meta-curve-fit: selecting which strategy classes to test based on lookahead.
Out-of-sample windows that overlap with the same regime as in-sample (e.g. all bull market) will agree spuriously.

Implication

Every quantitative claim in this book is accompanied by a stated falsifier: the regime under which it would fail, the parameter range it is sensitive to, and the minimum sample size for the result to be informative. Where we cannot supply this rigour, we say so.

§9. Failure modes of TA in futures (a unified taxonomy)

Failure	Cause	Example	Mitigation
Regime mismatch	Tool applied outside its validity window	RSI mean-reversion in a trend day	Regime classifier as a hard filter
News shock	Information event overwhelms structure	NFP, FOMC	Hard time-window blackout (5 min before to 15 min after)
Liquidity hole	ETH session, holiday, end-of-roll	Overnight CL flash move	Restrict TA to RTH; widen stops or skip in thin sessions
Roll dislocation	Continuous-contract chart artifact	Anchored VWAP from before roll points to a price the new contract never traded	Re-anchor or use unadjusted contract
Curve fit	Strategy validated on the same data it was developed on	"55% win rate" on the train set	Walk-forward + DSR
Confirmation bias	Trader sees what they expect	Drawing a triangle to fit the desired thesis	Structured checklist before entry; mechanical disqualifiers
Over-leverage	Sizing model assumes stationary edge	Doubling size into a regime shift	Hard per-session loss cap; size = f(realized vol)
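
Several mitigations in the table are mechanical filters. As one example, the news blackout (5 minutes before to 15 minutes after a scheduled release) can be sketched as below; the function name is hypothetical, and the event calendar would have to come from an external source in the same time zone as the trading clock.

```python
from datetime import datetime, timedelta

def in_blackout(now, event_times,
                before=timedelta(minutes=5), after=timedelta(minutes=15)):
    """True if `now` falls inside [event - before, event + after]
    for any scheduled event."""
    return any(ev - before <= now <= ev + after for ev in event_times)

# Hypothetical 08:30 release.
events = [datetime(2024, 1, 5, 8, 30)]
print(in_blackout(datetime(2024, 1, 5, 8, 26), events))
print(in_blackout(datetime(2024, 1, 5, 8, 50), events))
```

Wiring this check in ahead of any setup logic makes the blackout a hard disqualifier rather than a judgment call made under pressure.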

§10. The integrated stack we recommend

Working back from the ranking above, the institutional stack we end up endorsing for a discretionary or hybrid futures trader looks like:

Session-aware reference levels (RTH PDH/PDL/PDC, IB, prior settlement, midnight open).
Volume profile and naked POC tracking for durable horizontal structure.
Anchored VWAP (session, week, prior settlement, event) for institutional fill bias.
Liquidity flags: equal highs/lows, round numbers, FVGs.
Order flow confirmation: CVD slope, footprint stacked imbalances, absorption at structure.
Regime composite: ATR percentile, BBW, Kaufman ER, ADX → one of {Trend-Vol, Trend-Calm, Range-Vol, Range-Calm, Squeeze}.
Open-type classifier as the daily playbook selector.
Cross-asset context: VIX, DXY, US10Y, correlated futures.
Mechanical risk constraints: per-trade %, per-session %, max contracts.
Trade journal with a structured tag schema for post-hoc statistical review.

This ordering reflects information value, not screen real-estate. Each layer adds incremental edge only if conditioned on the prior layers.

§11. Open empirical questions

Items where the existing literature is thin and we would benefit from running our own studies. These are also natural appendix targets for the book:

Naked-POC retest probability conditional on time-since-formation, distance-from-current-price, and intervening volume, bucketed and plotted.
AVWAP magnetism: when an AVWAP is approached from above versus below, what is the conditional distribution of touch-and-reverse versus break-and-extend?
Footprint imbalance stack length (3, 4, 5+) versus reversal probability, conditional on level confluence.
Sweep-of-equal-highs reversal probability in NQ, conditioned on (a) regime, (b) time-of-day, (c) presence of CVD divergence.
Open-type classifier accuracy: how reliably does an Open Drive resolve into a trend day in 2024–2026 data?

These are the experiments that distinguish a research compendium from a textbook. The book's appendix space is reserved for them.

§12. What we explicitly are not claiming

That any combination of the above generates a positive expectancy without disciplined risk management.
That the statistical effects we cite are stable through regime change.
That practitioner texts (Dalton, Steidlmayer, ICT) are peer-reviewed; they are not, and their claims should be treated as hypotheses to validate.
That "institutional intent" is readable from price. It is not. We can read patterns consistent with institutional behaviour. The distinction matters.
That this material can replace screen time, journaling, and structured post-hoc review. It cannot.

The honest pitch: everything here is conditioning information. It tightens the distribution of the trades you take. It does not, by itself, make you a profitable trader.