The edge was never in the model
A machine-learning system can be genuinely right about Bitcoin's next move and still lose almost everything, because the cost of acting on each small correct call exceeds what that call is worth.
There is a vast academic literature in which someone trains a model to predict the next move in a market, runs a backtest, and reports a dazzling return — hundreds of percent a year, spectacular Sharpe ratios. A huge fraction of it shares one quiet, fatal omission: it ignores what it actually costs to trade. This paper takes machine-learning trading systems and asks the unglamorous question almost nobody asks properly. What survives once you pay realistic costs? The answer completely relocates where the edge lives.
A predictable return is not a profitable trade
That single sentence is the whole argument. Every time you buy or sell, you pay — exchange fees, the bid-ask spread you cross, slippage as your order moves the price. Small per trade, but a model that trades constantly pays them constantly.
The setup is rigorous, and the rigour is part of the story. The models forecast the next hour's Bitcoin return: up or down over the next sixty minutes, and by how much. Hourly crypto returns are notoriously noisy — a thin thread of signal buried in a great deal of randomness. That low signal-to-noise ratio is the battlefield.
To evaluate honestly, the authors use walk-forward testing. Train on a window of past data, tune on the next slice, test on a further slice the model has never seen — strictly later in time — then slide the whole window forward and repeat. They do this twenty-seven times, marching across the data in sequence. A single lucky train-test split can make a mediocre model look brilliant if it happens to land in one favourable regime; rolling forward forces it to prove itself across bull markets, bear markets, and the flat, grinding sideways stretches. They explicitly forbid ordinary random cross-validation — the default in most machine learning — because shuffling time-series data leaks information from the future into the past.
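The rolling scheme described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name and the window lengths are hypothetical, since the text does not give the actual train, tune, and test sizes.

```python
from typing import Iterator, NamedTuple


class Fold(NamedTuple):
    train: range  # indices used to fit the model
    tune: range   # later indices used to pick hyperparameters
    test: range   # strictly later indices, never seen during fitting


def walk_forward_folds(n_obs: int, train_len: int, tune_len: int,
                       test_len: int) -> Iterator[Fold]:
    """Yield rolling train/tune/test index ranges in time order."""
    start = 0
    while start + train_len + tune_len + test_len <= n_obs:
        tune_start = start + train_len
        test_start = tune_start + tune_len
        yield Fold(
            train=range(start, tune_start),
            tune=range(tune_start, test_start),
            test=range(test_start, test_start + test_len),
        )
        # Slide the whole window forward by one test block, so that
        # consecutive test slices never overlap.
        start += test_len


# Hypothetical sizes, chosen only to keep the example small.
folds = list(walk_forward_folds(n_obs=1000, train_len=500,
                                tune_len=100, test_len=100))
for i, fold in enumerate(folds):
    print(i, fold.train, fold.tune, fold.test)
```

Nothing is ever shuffled: every test index is later than every index the model was fitted or tuned on, which is exactly the property random cross-validation breaks.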
Costs are imposed at ten basis points per trade, a tenth of one percent, as an all-in proxy for fees plus spread plus slippage, set deliberately a touch above the cheapest exchange fees to be realistic rather than flattering. The bar for genuine quality is a Sharpe ratio above one. The Sharpe ratio measures return per unit of risk, and a value above one is high enough to suggest skill rather than lucky volatility.
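To make the two yardsticks concrete, here is a rough sketch of how a flat per-trade cost and a Sharpe ratio might be computed from hourly data. Several things here are assumptions rather than details from the paper: the cost is charged per unit of position change (so a long-to-short flip pays twice), the risk-free rate is taken as zero, and the ratio is annualised with 24 × 365 hours because crypto trades around the clock.

```python
import math
import statistics

COST_PER_TRADE = 0.0010     # 10 basis points, as in the text
HOURS_PER_YEAR = 24 * 365   # assumption: used only for annualising


def net_returns(positions, asset_returns, cost=COST_PER_TRADE):
    """Hourly strategy returns after charging `cost` per unit of position change."""
    out = []
    prev = 0.0  # start flat
    for pos, r in zip(positions, asset_returns):
        turnover = abs(pos - prev)  # 1 to enter or exit, 2 for a long-to-short flip
        out.append(pos * r - cost * turnover)
        prev = pos
    return out


def annualised_sharpe(returns, periods_per_year=HOURS_PER_YEAR):
    """Mean over sample standard deviation of per-period returns, scaled to a year."""
    return (statistics.mean(returns) / statistics.stdev(returns)
            * math.sqrt(periods_per_year))


# Toy numbers, invented for illustration only.
positions = [1, 1, 0, 1]
hourly = [0.002, -0.001, 0.003, 0.001]
net = net_returns(positions, hourly)
print(net)
```

In this toy run the strategy enters, exits, and re-enters, so three of the four hours carry a ten-basis-point charge.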
The gut-punch
The data is about seventy thousand hours of Bitcoin from Binance, spanning roughly 2018 to 2026 — a period in which the price ranged from around three thousand dollars to over a hundred and twenty-six thousand, with every crash and rally in between. Three families of model are thrown at it: XGBoost, which builds ensembles of decision trees; an LSTM recurrent network; and an attention-based iTransformer. They trade the forecasts two ways — long-only, and long-short.
Look at the gross returns, before costs, and the models look phenomenal. XGBoost, long-only, returns over seventy-three percent a year. The iTransformer, long-short, returns over a hundred and eighty percent, with a gross Sharpe approaching three. Stop reading there and you'd think you'd found a money machine.
Now switch on the ten-basis-point cost. XGBoost's seventy-three percent collapses to minus sixty-four percent. The iTransformer's spectacular hundred and eighty percent becomes minus ninety-nine percent. Every single strategy in the study, long-only and long-short alike, flips from impressive gains to catastrophic loss. Most draw down nearly a hundred percent: wiped out.
Why? Turnover. The XGBoost strategy placed over ten thousand six hundred trades across the test period; the iTransformer placed nearly eighteen thousand. The cost drag worked out to somewhere between one and a half and six basis points per hour — bleeding, every hour, for years. The crucial diagnostic detail is that the models were not wrong. Their gross signals were genuinely positive; they had real predictive edge about direction. They didn't die from bad predictions. They died from trading too much. For contrast, a simple buy-and-hold — buy Bitcoin once and sit on it — barely trades, so costs barely touch it, and it returned around fifty-six percent over the same window. Before costs, the fancy models beat it. After costs, it obliterated them.
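A back-of-the-envelope calculation shows why the turnover is fatal. The sketch below simply multiplies the trade counts quoted above by the ten-basis-point cost. It is a simplification under stated assumptions: the trade counts are the rounded figures from the text, every trade is charged on full notional, and the costs are summed rather than compounded.

```python
COST_BPS = 10  # all-in cost per trade, from the text


def cumulative_cost_pct(n_trades: int, cost_bps: float = COST_BPS) -> float:
    """Simple (non-compounded) sum of per-trade costs, in percent of notional."""
    return n_trades * cost_bps / 100.0


# Rounded trade counts as quoted in the text.
xgboost_cost = cumulative_cost_pct(10_600)
itransformer_cost = cumulative_cost_pct(18_000)

print(f"XGBoost: about {xgboost_cost:,.0f} percent paid in costs")
print(f"iTransformer: about {itransformer_cost:,.0f} percent paid in costs")
```

That comes to roughly 1,060 and 1,800 percent of traded notional over the test period. Even a strong gross signal cannot carry a toll of that size.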
Being right on average is simply not the same thing as making money after the house takes its cut.
Picture a coupon that saves you a tenth of a percent — a genuine discount. But to redeem it you have to drive across town. Do that ten thousand times and you spend vastly more on petrol than you ever save. The forecast wins on paper; the act of acting on it loses in reality.
The boring knob that dominates
So is machine-learning trading hopeless once costs are real? No — and this is the constructive heart of the paper. The fix is a cost-aware execution filter, and it is almost embarrassingly simple.
The naive strategy trades on the sign of the forecast: predict a hair above zero, go long; a hair below, flip. It reacts to every tiny flicker around zero, churning constantly. The cost-aware filter instead trades only when the forecast is large enough to be worth the cost it will incur — with their settings, the model must predict a move bigger than about two-tenths of a percent before the strategy will bother to act. If the predicted edge can't clear the toll booth, you don't make the trip.
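The difference between the two rules is easy to see in code. What follows is one plausible reading of the filter, not the authors' implementation: the function names are hypothetical, the strategy is long-only, and the position is held unchanged whenever the forecast sits inside the band. The threshold of 0.002 is the "about two-tenths of a percent" from the text, which happens to equal two ten-basis-point charges, one to get in and one to get out.

```python
def sign_positions(forecasts):
    """Naive long-only rule: hold the asset whenever the forecast is positive."""
    return [1 if f > 0 else 0 for f in forecasts]


def filtered_positions(forecasts, threshold=0.002):
    """Cost-aware rule: change position only when the forecast clears the threshold."""
    pos, out = 0, []
    for f in forecasts:
        if f > threshold:
            pos = 1        # predicted gain is big enough to pay for the trade
        elif f < -threshold:
            pos = 0        # predicted loss is big enough to justify exiting
        # Otherwise keep the current position: no trade, no cost.
        out.append(pos)
    return out


def count_trades(positions):
    """Number of position changes, starting from flat."""
    prev, n = 0, 0
    for p in positions:
        if p != prev:
            n += 1
        prev = p
    return n


# Invented forecasts that flicker around zero, with two larger moves.
forecasts = [0.0003, -0.0002, 0.0031, 0.0004, -0.0001, -0.0027, 0.0002]
naive_trades = count_trades(sign_positions(forecasts))
filtered_trades = count_trades(filtered_positions(forecasts))
print(naive_trades, filtered_trades)
```

On these seven made-up forecasts the sign rule trades five times and the filtered rule twice, using exactly the same predictions.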
The effect is staggering, because nothing about the model changes — same XGBoost, same forecasts, same data. Only the decision of when to act changes. That XGBoost long-only strategy that lost sixty-four percent now returns positive sixty-five percent, with a Sharpe of one point zero nine. And the number of trades falls from over ten thousand six hundred down to two hundred and fifty-one. Same information, same prediction, but patience instead of twitchiness — and the result swings by well over a hundred percentage points. Their broader tests point the same way: the things researchers obsess over — architecture, loss function, the exact feature set — mattered far less, and far less reliably, than the simple execution rule. The boring knob dominated the exciting ones.
The honest punchline
The authors are scrupulous about what this does and does not show. Does the restored, profitable, Sharpe-above-one strategy actually beat buying and holding Bitcoin? Statistically — no. Under rigorous bootstrap significance tests, the cost-aware strategy matches buy-and-hold but does not significantly beat it in risk-adjusted terms, and its performance is uneven across market regimes. The achievement is not "we beat the market." It is the demonstration itself: that execution design, not model choice, was the binding constraint all along.
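The text does not say which bootstrap the authors use, so the following is only a sketch of one common approach: a paired circular block bootstrap of the difference in Sharpe ratios. The block length, the number of resamples, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def sharpe(returns):
    """Per-period Sharpe ratio with a zero risk-free rate."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var) if var > 0 else 0.0


def block_bootstrap_sharpe_diff(strategy, benchmark, block_len=24,
                                n_boot=200, seed=0):
    """Return the observed Sharpe difference and a 95% percentile interval.

    Both series are resampled with the same blocks of consecutive hours,
    which keeps them paired and preserves short-range time dependence.
    """
    if len(strategy) != len(benchmark):
        raise ValueError("series must have the same length")
    rng = random.Random(seed)
    n = len(strategy)
    observed = sharpe(strategy) - sharpe(benchmark)
    diffs = []
    for _ in range(n_boot):
        idx = []
        while len(idx) < n:
            start = rng.randrange(n)
            idx.extend((start + k) % n for k in range(block_len))  # wrap around
        idx = idx[:n]
        s = [strategy[i] for i in idx]
        b = [benchmark[i] for i in idx]
        diffs.append(sharpe(s) - sharpe(b))
    diffs.sort()
    low = diffs[n_boot * 25 // 1000]
    high = diffs[n_boot * 975 // 1000 - 1]
    return observed, (low, high)


# Synthetic returns, invented purely to exercise the function.
rng = random.Random(42)
benchmark = [rng.gauss(0.0002, 0.01) for _ in range(500)]
strategy = [0.5 * r + rng.gauss(0.0, 0.003) for r in benchmark]
observed, (low, high) = block_bootstrap_sharpe_diff(strategy, benchmark)
beats_benchmark = low > 0.0
print(observed, low, high, beats_benchmark)
```

If the interval includes zero, the data cannot distinguish the strategy's Sharpe ratio from the benchmark's, which is the kind of verdict described above: a match, not a win.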
Be fair about the limits. The backtest models cost as a flat, fixed percentage; it does not simulate the fine grain of real execution, such as order-book depth, partial fills, or spreads that widen in a panic. The study covers a single asset, so we shouldn't over-generalise. Performance depends heavily on regime, hourly trading invites overfitting, and the filter and selection choices add their own researcher degrees of freedom: many wins hold only in selected configurations. And, again, the best the strategy manages is to match passive holding. Finally, this is a preprint, so the results have not yet been through peer review.
Why it matters
The lesson, in the authors' own framing, is that the economic value of a forecast depends as much on execution discipline as on the forecasting model itself. We pour enormous effort into prediction (better architectures, bigger models, cleverer features), and this is a cold reminder that prediction is necessary but nowhere near sufficient. The identical gross signal went from minus sixty-four percent to plus sixty-five percent purely by trading two hundred and fifty-one times instead of more than ten thousand six hundred, without changing the model at all. The edge didn't live in the model. It lived in the restraint.
So when you next see a machine-learning trading result with a jaw-dropping return, ask two questions before anything else. Are transaction costs in there, realistically? And how much does it trade? A model can be genuinely, measurably right about the future and still lose money hand over fist, because acting on each small correct call costs more than the call is worth. The frontier of profitable systems isn't only sharper prediction. It's the discipline of knowing when a prediction is worth acting on — and when the smartest move is to sit still.