
REGIME DETECTION

Part 5: Bull, Bear, or Sideways — Let the Models Decide

Omega Arena • February 2026 • IN PROGRESS

97 ASSETS · 233K ROWS · 203 FEATURES · 3 MODELS
Abstract. Part 4 trained models to predict price direction. Part 5 takes a different approach: regime detection. Instead of asking "will price go up tomorrow?", the question becomes "what kind of market is this?" Three models—an unsupervised HMM, a supervised Random Forest, and a BiLSTM with Attention—work together to classify market conditions as Bull, Bear, or Sideways. With 100% hindsight-accurate labels and 203 features, these models learn from perfect historical truth.

TABLE OF CONTENTS

1. Why Regime Detection?
2. The Dataset: Enterprise-Grade
3. The Labels: 100% Hindsight Accuracy
4. Model 1: Hidden Markov Model
5. Model 2: Random Forest
6. Model 3: Bidirectional LSTM
7. Data Pipeline: 10 Checks
8. Infrastructure
9. Current Status
10. Next Steps

1. WHY REGIME DETECTION?

Part 4's price prediction models achieved AUC scores around 0.52-0.57. Better than random, but modest. The problem? Markets behave differently in different conditions.

A strategy that works in a bull market may fail catastrophically in a bear market. The same signal that means "buy the dip" in an uptrend means "catch a falling knife" in a downtrend.

Regime detection provides context. Instead of one model trying to predict everything, specialized models first identify the market state. Then other models can adapt their behavior accordingly.

The Three Regimes

Regime   | Daily Label | Weekly/Monthly Label | Definition
BULL     | UP          | BULL                 | Positive % change
BEAR     | DOWN        | BEAR                 | Negative % change
SIDEWAYS | SAME        | SIDEWAYS             | Zero % change

2. THE DATASET: ENTERPRISE-GRADE

No corners were cut. This dataset represents months of data engineering.

Data Sources

Source  | Features | Description
Level 1 | 43       | RSI, MACD, Bollinger Bands, ATR, ADX, Ichimoku, etc.
Level 2 | 18       | Sharpe, Sortino, VaR, CVaR, Max Drawdown, etc.
Level 3 | 52       | Volatility regimes, trend strength, fear/greed, exhaustion
Level 5 | 19       | VIX, DXY, SPY correlation, macro signals
Level 6 | 21       | COT data, yield curve, credit spreads
Level 7 | 15       | Context-aware: ATH %, 52-week range, halving cycle
FRED    | 13       | Economic indicators from Federal Reserve
TOTAL   | 203      | After processing: 235 (with one-hot encoding)

Temporal Split

Split      | Date Range               | Rows    | Purpose
Train      | 2014-09-17 → 2023-12-31  | 163,574 | Model learning
Validation | 2024-01-01 → 2024-12-31  | 34,402  | Hyperparameter tuning
Test       | 2025-01-01 → 2026-01-26  | 35,531  | Final evaluation

No data leakage. Strict temporal ordering ensures models never see future data during training. Test data is truly unseen—from a year the models have never encountered.
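The cutoffs above can be enforced with a date-based split. A minimal sketch in pandas; the `date` column name and `temporal_split` helper are assumptions, not the project's actual code:

```python
import pandas as pd

def temporal_split(df: pd.DataFrame, date_col: str = "date"):
    """Split a feature table into train/val/test by strict date cutoffs.

    Cutoff dates follow the article's split. Rows are assigned by date
    alone, so no future row can leak into an earlier split.
    """
    d = pd.to_datetime(df[date_col])
    train = df[d <= "2023-12-31"]
    val = df[(d >= "2024-01-01") & (d <= "2024-12-31")]
    test = df[d >= "2025-01-01"]
    # Sanity check: every row lands in exactly one split.
    assert len(train) + len(val) + len(test) == len(df)
    return train, val, test
```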

Why Level 7 Context Features?

Standard technical indicators (Levels 1-3) capture short-term patterns: RSI, MACD, Bollinger Bands operate on windows of 14-50 days. But regime detection requires longer-term context. A price at $50K means something completely different when it's the all-time high versus when it's 50% below ATH.

The problem: Existing features couldn't answer questions like "Where is the market in the bigger picture?" Level 7 was engineered specifically to provide this missing context for regime classification.

Feature Category | What It Captures                                   | Why It Matters for Regimes
ATH Context      | Distance from all-time high, days since ATH        | Bull markets push ATHs; bear markets drift away from them
52-Week Range    | Position within yearly high/low range              | Near yearly lows = possible accumulation; near highs = possible distribution
Period Returns   | YTD return, yearly return, multi-period momentum   | Regime persistence: bull years stay bullish, bear years stay bearish
Seasonality      | Day of week, month, quarter, quarter-end flags     | Historical patterns: "Sell in May", Q4 rallies, weekend effects
Halving Cycle    | Days since/until Bitcoin halving, cycle position % | Crypto-specific: halvings historically correlate with bull market onsets

Without Level 7, models would see identical feature patterns in completely different market contexts. With Level 7, a -5% daily drop at the all-time high looks different from a -5% drop at yearly lows—because it is different.
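As a concrete illustration, two ATH-context features of the kind described above can be derived from a close-price series alone. This is a sketch under assumed names (`ath_context`, `pct_from_ath`, `days_since_ath`), not the project's actual feature code:

```python
import pandas as pd

def ath_context(close: pd.Series) -> pd.DataFrame:
    """Two Level-7-style context features from a close-price series:
    percent distance from the running all-time high, and days since it."""
    ath = close.cummax()                       # running all-time high
    pct_from_ath = (close / ath - 1.0) * 100   # 0 at a new ATH, negative below it
    is_ath = close >= ath
    # Count rows since the most recent new high: each new ATH starts a group.
    grp = is_ath.cumsum()
    days_since_ath = close.groupby(grp).cumcount()
    return pd.DataFrame({"pct_from_ath": pct_from_ath,
                         "days_since_ath": days_since_ath})
```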

3. THE LABELS: 100% HINDSIGHT ACCURACY

This is the key insight that makes regime detection different from price prediction.

Hindsight is 20/20. When labeling historical data, exactly what happened is known. If the price went up, it's UP. If it went down, it's DOWN. No thresholds. No guessing. The model's job is to learn which feature patterns correspond to which outcomes.
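The labeling rule is literally the sign of the realized move. A minimal sketch covering both the daily and the weekly/monthly label vocabularies from the table in Section 1:

```python
def daily_label(pct_change: float) -> str:
    """Daily direction label: pure hindsight, no thresholds."""
    if pct_change > 0:
        return "UP"
    if pct_change < 0:
        return "DOWN"
    return "SAME"

def regime_label(pct_change: float) -> str:
    """Weekly/monthly regime label from the period's realized % change."""
    if pct_change > 0:
        return "BULL"
    if pct_change < 0:
        return "BEAR"
    return "SIDEWAYS"
```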

Label Distribution (Train Set)

Label Type      | UP/BULL | DOWN/BEAR | SAME/SIDEWAYS
Daily Direction | ~50%    | ~49%      | ~1%
Weekly Regime   | ~51%    | ~48%      | ~1%
Monthly Regime  | ~52%    | ~47%      | ~1%

4. MODEL 1: HIDDEN MARKOV MODEL (HMM)

Unsupervised learning. HMM doesn't see the labels. It discovers hidden states purely from the feature patterns.

Why HMM?

A regime is, by construction, a hidden state: it is never observed directly, only through the features it produces. An HMM models exactly that, inferring a set of discrete states and the transition probabilities between them from feature patterns alone, with no labels required.

Configuration

Parameter       | Value              | Reason
Features        | 219 (numeric only) | HMM requires continuous features
States to try   | 2, 3, 4, 5         | Find optimal number via BIC
Covariance      | Diagonal           | Numerical stability with many features
Iterations      | 300                | Ensure convergence
Initializations | 10                 | Avoid local minima
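Model selection via BIC means penalizing log-likelihood by parameter count. A sketch of the criterion for a diagonal-covariance Gaussian HMM; the `hmm_bic` helper is hypothetical, and in practice the log-likelihood would come from a fitted model such as hmmlearn's `GaussianHMM(n_components=k, covariance_type="diag").score(X)`:

```python
import numpy as np

def hmm_bic(log_likelihood: float, n_states: int,
            n_features: int, n_obs: int) -> float:
    """BIC for a Gaussian HMM with diagonal covariance.

    Free parameters: initial probabilities (n-1), transition matrix
    n*(n-1), plus one mean and one variance per state per feature.
    Lower BIC is better: fit is rewarded, model size is penalized.
    """
    n_params = ((n_states - 1)
                + n_states * (n_states - 1)
                + 2 * n_states * n_features)
    return n_params * np.log(n_obs) - 2.0 * log_likelihood
```

The state count (2, 3, 4, or 5) with the lowest BIC across the ten initializations would be kept.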

HMM Status

Step               | Status
Data preparation   | DONE
Training script    | DONE
Training execution | PENDING
State analysis     | PENDING

5. MODEL 2: RANDOM FOREST

Supervised learning. Random Forest sees the hindsight-accurate labels and learns to predict them from features.

Why Random Forest?

It handles hundreds of mixed-scale features without normalization, resists overfitting through bagging, and exposes feature importances, which makes it possible to see which of the 203 features actually drive regime classification.

Configuration

Parameter         | Search Range
n_estimators      | 500 - 1,250
max_depth         | 15 - 35, None
min_samples_split | 2 - 15
min_samples_leaf  | 1 - 6
max_features      | sqrt, log2, 0.3, 0.5
class_weight      | balanced, balanced_subsample
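The search ranges map directly onto scikit-learn parameters. A sketch, assuming the `PARAM_SPACE` dict and `make_rf` helper (both hypothetical); candidates would be scored on the 2024 validation year rather than by shuffled cross-validation, to preserve the temporal split:

```python
from sklearn.ensemble import RandomForestClassifier

# Candidate values mirroring the table above (discretized where the
# article gives a range).
PARAM_SPACE = {
    "n_estimators": [500, 750, 1000, 1250],
    "max_depth": [15, 25, 35, None],
    "min_samples_split": [2, 5, 10, 15],
    "min_samples_leaf": [1, 3, 6],
    "max_features": ["sqrt", "log2", 0.3, 0.5],
    "class_weight": ["balanced", "balanced_subsample"],
}

def make_rf(params: dict) -> RandomForestClassifier:
    """One candidate model for the search; fit on train, score on val."""
    return RandomForestClassifier(random_state=42, n_jobs=-1, **params)
```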

Training Approach

Hyperparameters are sampled from the ranges above, and each candidate is scored on the 2024 validation year; the 2025-2026 test year is never touched during tuning.

Random Forest Status

Step                        | Status
Data preparation            | DONE
Training script             | DONE
Training execution          | PENDING
Feature importance analysis | PENDING

6. MODEL 3: BIDIRECTIONAL LSTM + ATTENTION

Deep learning for sequences. LSTM processes 90 consecutive days and predicts the next day's regime.

Why LSTM with Attention?

A regime is a property of a stretch of time, not of a single day. The bidirectional LSTM reads each 90-day window in both directions, and the attention layer learns which days within the window matter most for the classification.

Architecture

Component    | Configuration
Input        | (batch, 90 days, 235 features)
LayerNorm    | Normalize inputs
BiLSTM       | 3 layers, hidden_size=256
Attention    | Learn important timesteps
Shared Dense | 256 → 128 with dropout
Output Heads | 3 separate heads (daily/weekly/monthly)
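A PyTorch sketch of this stack; layer sizes follow the table, while the additive-attention form, the head names, and the class name `RegimeBiLSTM` are assumptions:

```python
import torch
import torch.nn as nn

class RegimeBiLSTM(nn.Module):
    """LayerNorm -> 3-layer BiLSTM -> attention over timesteps ->
    shared dense (-> 256 -> 128) -> three heads, 3 regimes each."""
    def __init__(self, n_features=235, hidden=256, n_layers=3, dropout=0.3):
        super().__init__()
        self.norm = nn.LayerNorm(n_features)
        self.lstm = nn.LSTM(n_features, hidden, num_layers=n_layers,
                            batch_first=True, bidirectional=True,
                            dropout=dropout)
        self.attn = nn.Linear(2 * hidden, 1)   # score each timestep
        self.shared = nn.Sequential(
            nn.Linear(2 * hidden, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(dropout),
        )
        self.heads = nn.ModuleDict({h: nn.Linear(128, 3)
                                    for h in ("daily", "weekly", "monthly")})

    def forward(self, x):                       # x: (batch, seq, features)
        out, _ = self.lstm(self.norm(x))        # (batch, seq, 2*hidden)
        w = torch.softmax(self.attn(out), dim=1)  # attention weights over time
        ctx = (w * out).sum(dim=1)              # weighted sum: (batch, 2*hidden)
        z = self.shared(ctx)
        return {h: head(z) for h, head in self.heads.items()}
```

Each head emits 3 logits (Bull/Bear/Sideways), so one forward pass serves all three horizons.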

Training Configuration

Parameter               | Value
Sequence length         | 90 days
Batch size              | 128
Epochs                  | 150 (with early stopping)
Learning rate           | 1e-3
Dropout                 | 0.3
Early stopping patience | 20 epochs
Optimizer               | AdamW with weight decay
Scheduler               | ReduceLROnPlateau
Class weighting         | Inverse frequency
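Inverse-frequency class weighting compensates for the rarity of the SAME/SIDEWAYS class (~1% of rows). A sketch using the "balanced" convention, weight_c = n_total / (n_classes * n_c); the helper name is hypothetical:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency,
    so the rare SIDEWAYS class is not drowned out in the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```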

LSTM Data

Split      | Sequences | Shape           | Size
Train      | 154,844   | (154K, 90, 235) | 13.1 GB
Validation | 25,942    | (26K, 90, 235)  | 2.2 GB
Test       | 27,020    | (27K, 90, 235)  | 2.3 GB
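The sequence tensors above come from sliding a 90-day window over each asset's rows and labeling each window with the day that follows it. A sketch; `make_sequences` is a hypothetical helper and must be applied per asset so no window spans two assets:

```python
import numpy as np

def make_sequences(features: np.ndarray, labels: np.ndarray, seq_len: int = 90):
    """Build (n_windows, seq_len, n_features) inputs for the LSTM.

    Window i covers days i .. i+seq_len-1 and is labeled with the
    regime of day i+seq_len (the next day after the window).
    """
    n = len(features) - seq_len
    X = np.stack([features[i:i + seq_len] for i in range(n)])
    y = labels[seq_len:]
    return X, y
```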

LSTM Status

Step                                | Status
Data preparation (90-day sequences) | DONE
Model architecture                  | DONE
Training script                     | DONE
Training execution (GPU)            | PENDING

7. DATA PIPELINE: 10 VERIFICATION CHECKS

Enterprise-grade data preparation means verifying everything multiple times.

#  | Check                                | Result
1  | DB columns match dataset             | ✓ PASS
2  | All table features present           | ✓ PASS
3  | Sample data matches DB               | ✓ PASS
4  | Row counts match                     | ✓ PASS
5  | No empty columns                     | ✓ PASS
6  | Data types correct                   | ✓ PASS
7  | No duplicates, price integrity       | ✓ PASS
8  | Date continuity                      | ✓ PASS
9  | Random sample cross-validation       | ✓ PASS
10 | Label consistency (sign = direction) | ✓ PASS

10/10 checks passed. The dataset is verified, cleaned, and ready for training.
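Three of the ten checks, sketched in pandas; the column names (`asset`, `date`, `daily_pct_change`, `daily_label`) and the `run_basic_checks` helper are assumptions about the schema, not the project's actual pipeline:

```python
import pandas as pd

def run_basic_checks(df: pd.DataFrame) -> dict:
    """Duplicates (check 7), empty columns (check 5), and label
    consistency: the label must match the sign of the move (check 10)."""
    checks = {}
    checks["no_duplicates"] = not df.duplicated(["asset", "date"]).any()
    checks["no_empty_columns"] = not df.isna().all().any()
    sign = df["daily_pct_change"]
    expected = pd.Series("SAME", index=df.index)
    expected[sign > 0] = "UP"
    expected[sign < 0] = "DOWN"
    checks["label_consistency"] = bool((expected == df["daily_label"]).all())
    return checks
```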

8. INFRASTRUCTURE

Training requires significant compute resources.

Model         | Compute         | Memory          | Est. Time
HMM           | CPU             | ~4 GB RAM       | 1-2 hours
Random Forest | CPU (all cores) | ~8-16 GB RAM    | 2-4 hours
LSTM          | GPU (A100/L40)  | ~30-40 GB VRAM  | 4-8 hours

9. CURRENT STATUS

Component         | Status  | Notes
Dataset           | DONE    | 233K rows × 203 features × 97 assets
ML Final Datasets | DONE    | train/val/test splits, ~18 GB total
HMM Data          | DONE    | 219 numeric features, 137 MB
RF Data           | DONE    | 235 features, 212 MB
LSTM Data         | DONE    | 90-day sequences, 17.6 GB
HMM Training      | PENDING | Ready to run on RunPod CPU
RF Training       | PENDING | Ready to run on RunPod CPU
LSTM Training     | PENDING | Ready to run on RunPod GPU
Ensemble          | PENDING | After individual models complete

10. NEXT STEPS

  1. Upload 18 GB dataset to RunPod network volume
  2. Train HMM (CPU pod, ~2 hours)
  3. Train Random Forest (CPU pod, ~4 hours)
  4. Train LSTM (GPU pod A100/L40, ~6 hours)
  5. Analyze results and compare models
  6. Build ensemble voting system
  7. Integrate with Part 6 (Claude Opus 4.5 decision layer)
Part 5 Status: IN PROGRESS
Data preparation complete. Training scripts ready. Awaiting execution on RunPod infrastructure.

© 2026 Omega Arena