Baseline Models

What are baseline models?

In machine learning, a baseline model is the initial model created for a particular task; an untuned XGBoost model, for example, can serve as a baseline. It sets a reference level of performance that later models have to beat. You can then refine these models and adapt them to your specific needs.

Baseline models for GeneAlpha

Here are the models GeneAlpha has opted to use, and why.

Baseline model

Tool

Why

ARIMA (AutoRegressive Integrated Moving Average)

Python statsmodels

ARIMA is the classic statistical benchmark for univariate time series. It captures autoregression (AR), differencing (I), and moving‐average (MA) components, making it a solid baseline to compare against more complex models. It’s fast to fit and interpretable, which helps when diagnosing whether more sophisticated models are actually adding value.
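To make the AR and I components concrete, here is a minimal sketch that differences a series once and fits an AR(1) term on the differences by ordinary least squares. It is an illustration only: it leaves out the MA component, it is not how statsmodels estimates ARIMA, and the price series and function names are made up for the example.

```python
# Illustration only: the "I" (differencing) and "AR" (autoregression) parts of ARIMA.
# The MA part is omitted, and the price series below is invented for the example.

def difference(series):
    """First-order differencing: the 'I' step with d = 1."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares fit of diffs[t] = c + phi * diffs[t-1]."""
    x = diffs[:-1]
    y = diffs[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series, steps):
    """Forecast future levels by rolling the AR(1) model forward on the differences."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level = series[-1]
    last_diff = diffs[-1]
    predictions = []
    for _ in range(steps):
        last_diff = c + phi * last_diff   # predicted next change
        level += last_diff                # undo the differencing
        predictions.append(level)
    return predictions


prices = [100.0, 101.5, 103.2, 104.1, 106.0, 107.2, 109.1, 110.0, 112.3, 113.1]
print(forecast(prices, steps=3))
```

In practice the statsmodels implementation named above would be the tool of choice; the sketch only shows what the order parameters mean.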

Facebook Prophet

prophet

Prophet is designed for business time series with multiple seasonalities (daily/weekly/annual) and holiday effects. It automatically handles missing data and trend changepoints, which is valuable in crypto where regime shifts (e.g., bull/bear markets) happen frequently.
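A trend changepoint can be pictured as a point in time where the slope of the trend changes while the curve itself stays continuous. The sketch below follows the piecewise-linear trend idea described in Prophet's documentation, but it is a hand-written illustration, not Prophet's code. The function name, parameters, and numbers are assumptions chosen for the example.

```python
# Illustration only: a continuous piecewise-linear trend with slope changes
# at given changepoints. All names and values here are hypothetical.

def piecewise_linear_trend(t, base_slope, base_offset, changepoints, slope_deltas):
    """Evaluate the trend at time t.

    Each changepoint s adds `delta` to the slope from s onward; the offset is
    adjusted by -s * delta so the line stays continuous at s.
    """
    slope = base_slope
    offset = base_offset
    for s, delta in zip(changepoints, slope_deltas):
        if t >= s:
            slope += delta
            offset -= s * delta
    return slope * t + offset


# A rising trend that turns downward at t = 5 (a "bull to bear" regime shift).
for t in range(0, 9):
    print(t, piecewise_linear_trend(t, 1.0, 0.0, [5], [-2.0]))
```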

XGBoost/LightGBM (Gradient-Boosted Trees)

xgboost/lightgbm Python API

Although gradient-boosted trees are not sequence models, XGBoost and LightGBM can excel on lag‐feature formulations of time series. They are robust to outliers, handle non-linear relationships, and often outperform linear models when you engineer features like rolling statistics, momentum indicators, and volume–price interactions.
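A lag-feature formulation turns a time series into an ordinary table of rows and targets. The sketch below shows one way to build such a table with lagged prices, a rolling mean, a simple momentum value, and a volume-price interaction. The function name, the feature choices, and the sample data are assumptions for illustration, not GeneAlpha's actual feature set.

```python
# Illustration only: build (features, target) rows from aligned price and volume series.
# Each row for time t uses only values observed up to t-1, so no future data leaks in.

def make_lag_features(prices, volumes, n_lags=3, window=3):
    rows, targets = [], []
    start = max(n_lags, window)
    for t in range(start, len(prices)):
        lags = [prices[t - k] for k in range(1, n_lags + 1)]   # most recent first
        recent = prices[t - window:t]
        rolling_mean = sum(recent) / window                    # rolling statistic
        momentum = prices[t - 1] - prices[t - window]          # change across the window
        volume_price = volumes[t - 1] * prices[t - 1]          # volume-price interaction
        rows.append(lags + [rolling_mean, momentum, volume_price])
        targets.append(prices[t])
    return rows, targets


prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]
volumes = [10.0, 12.0, 9.0, 15.0, 14.0, 11.0]
X, y = make_lag_features(prices, volumes)
for features, target in zip(X, y):
    print(features, "->", target)
```

The resulting rows and targets could then be passed to a tree-based regressor, for example the `fit(X, y)` method of `xgboost.XGBRegressor`.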

LSTM (Long Short-Term Memory network)

TensorFlow/Keras or PyTorch

LSTM is the workhorse RNN for sequence data, able to learn long-range dependencies in price histories. It’s a natural step up from classical models, capturing patterns that span many time steps (e.g., multi-day trends or cycles) while smoothing out noise.

Temporal Fusion Transformer (TFT)

PyTorch + pytorch-forecasting

TFT combines attention mechanisms with gated skip connections and variable selection networks, letting it focus on the most relevant inputs at each time step. It handles both static and time-varying covariates, making it well suited to setups that include exogenous factors (on-chain metrics, sentiment scores, macro indicators).

N-BEATS (Neural Basis Expansion Analysis)

PyTorch or pytorch-forecasting

N-BEATS is a purely deep-learning approach that builds interpretable basis expansions for trend and seasonality. Its authors reported state-of-the-art results on the M3, M4, and TOURISM forecasting benchmarks, and it can be trained end-to-end without heavy feature engineering.
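The idea of a basis expansion can be sketched without a neural network: a forecast is a weighted sum of fixed basis curves, polynomial for trend and sine/cosine for seasonality. In N-BEATS the weights come from the network; in the sketch below they are hand-picked, and all function names and values are assumptions for illustration.

```python
import math

# Illustration only: forecast = weighted sum of fixed basis curves.
# In N-BEATS the weights (theta) are produced by the network; here they are hand-picked.

def trend_basis(horizon, degree):
    """Polynomial basis 1, t, t^2, ... evaluated on t = 0, 1/H, ..., (H-1)/H."""
    ts = [h / horizon for h in range(horizon)]
    return [[t ** p for t in ts] for p in range(degree + 1)]


def seasonality_basis(horizon, harmonics):
    """Fourier basis: a cosine and a sine curve per harmonic."""
    ts = [h / horizon for h in range(horizon)]
    basis = []
    for i in range(1, harmonics + 1):
        basis.append([math.cos(2 * math.pi * i * t) for t in ts])
        basis.append([math.sin(2 * math.pi * i * t) for t in ts])
    return basis


def expand(theta, basis):
    """Combine basis curves with weights theta into one curve over the horizon."""
    horizon = len(basis[0])
    return [sum(w * curve[h] for w, curve in zip(theta, basis)) for h in range(horizon)]


horizon = 8
trend = expand([100.0, 5.0], trend_basis(horizon, degree=1))           # level plus slope
season = expand([2.0, 0.0], seasonality_basis(horizon, harmonics=1))   # one cosine cycle
forecast = [a + b for a, b in zip(trend, season)]
print(forecast)
```

Because each weight belongs to a named curve, the trend and seasonality parts of the forecast can be inspected separately, which is what makes this form interpretable.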

These models are only a starting point; we will keep adding new ones over time.
