How to Build a Forecasting Pipeline with TimeCopilot Using Foundation Models and Automated Anomaly Detection
This tutorial demonstrates building an end-to-end forecasting pipeline using TimeCopilot, covering data preparation, model evaluation, probabilistic forecasting, anomaly detection, and optional LLM-based interpretation.
Intelligence Insights
Context + impact, normalized for TechCulture.
The Big Picture
The article provides a step-by-step guide to creating a forecasting pipeline with TimeCopilot, using a panel dataset of airline passenger data and a synthetic seasonal series with injected anomalies. It configures statistical models (AutoARIMA, AutoETS, SeasonalNaive, Theta), Prophet, and foundation models like Chronos (and optionally TimesFM on GPU). Rolling cross-validation with three windows evaluates models using MAE, RMSE, and MAPE, selecting the best performer by mean RMSE. The pipeline generates 12-month probabilistic forecasts with 80% and 95% prediction intervals, visualizes results, and detects anomalies at the 99% confidence level. Finally, an optional LLM agent (OpenAI or Anthropic) selects a model, compares it to SeasonalNaive, and provides a natural language explanation of the forecast. The tutorial emphasizes a unified workflow from data preparation to actionable insights.
Why It Matters
This tutorial demonstrates how to combine traditional statistical models with foundation models like Chronos and TimesFM in a single forecasting pipeline, making advanced time-series analysis accessible to practitioners. By integrating automated anomaly detection and an LLM agent that explains forecasts in plain language, it bridges the gap between complex machine learning and actionable business insights. This approach could democratize forecasting for non-experts, enabling faster, data-driven decisions across industries like retail, finance, and logistics.
In this tutorial, we build an end-to-end forecasting workflow with TimeCopilot. We prepare a panel dataset containing real airline passenger data and a synthetic seasonal series with injected anomalies, then evaluate a diverse collection of statistical, Prophet, foundation, and optional GPU-only forecasting models. We use rolling cross-validation and multiple error metrics to identify the strongest model, generate probabilistic forecasts with prediction intervals, visualize future trends, and detect unusual observations. Finally, we explore TimeCopilot's optional LLM agent, which selects a forecasting model and translates its predictions into an accessible analytical response.
Installing TimeCopilot and Pinning Compatible NumPy and SciPy Versions
We install TimeCopilot, UtilsForecast, and Matplotlib to prepare the forecasting environment. We enforce compatible NumPy and SciPy versions to prevent binary conflicts. We then restart the Colab runtime so the updated libraries load correctly.
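The article does not list the exact version pins, so the sketch below does not guess them. Instead it checks the outcome of the install step: `installed_versions` (a hypothetical helper) reports what pip placed on disk using the standard-library `importlib.metadata`, and `needs_restart` (also hypothetical) lists modules whose already-imported `__version__` no longer matches the installed distribution, which is exactly the stale state the runtime restart is meant to clear.

```python
import importlib
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found


def needs_restart(module_names):
    """Return modules whose imported __version__ differs from the version on disk.

    A mismatch usually means pip replaced the package after it was imported,
    so the notebook runtime should be restarted before continuing.
    """
    stale = []
    for name in module_names:
        try:
            on_disk = version(name)
        except PackageNotFoundError:
            continue  # not installed, nothing to compare
        module = importlib.import_module(name)  # returns the cached module if already imported
        if getattr(module, "__version__", on_disk) != on_disk:
            stale.append(name)
    return stale


if __name__ == "__main__":
    print(installed_versions(["timecopilot", "utilsforecast", "matplotlib", "numpy", "scipy"]))
    print("Restart needed for:", needs_restart(["numpy", "scipy"]) or "nothing")
```

In Colab the restart itself is done from the Runtime menu; afterwards, `needs_restart(["numpy", "scipy"])` should return an empty list.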
Loading AirPassengers Data and Building a Synthetic Anomaly Panel
We import the required libraries, verify the environment, and detect GPU availability. We load the AirPassengers dataset and create a second synthetic seasonal series with injected spikes. We combine the two series into a panel dataset and set the forecasting horizon and monthly frequency.
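The data-preparation code itself is not shown, so here is a minimal sketch of the same steps under stated assumptions. It assumes pandas and NumPy, PyTorch for the GPU check, and an AirPassengers frame already loaded in the long layout the later code uses (`unique_id`, `ds`, `y`), with `unique_id` equal to `"AirPassengers"` because the agent step filters on that name. The helpers `detect_gpu`, `make_synthetic_series` and `build_panel`, the series name `"Synthetic"`, the spike positions and size, and `FREQ = "MS"` (month start) are illustrative assumptions; only the 12-month horizon comes from the article. The synthetic defaults mirror AirPassengers' 144 monthly observations starting in January 1949.

```python
import numpy as np
import pandas as pd

H = 12       # forecast horizon: 12 months, as in the article
FREQ = "MS"  # assumption: monthly data stamped at month start


def detect_gpu():
    """True when PyTorch is installed and sees a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def make_synthetic_series(n=144, start="1949-01-01", spikes=(30, 75, 120),
                          spike_size=60.0, seed=0):
    """Hypothetical helper: trend + yearly seasonality + noise, plus injected spikes."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    y = 100 + 0.5 * t + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, n)
    y[list(spikes)] += spike_size  # the anomalies a detector should later flag
    ds = pd.date_range(start, periods=n, freq=FREQ)
    return pd.DataFrame({"unique_id": "Synthetic", "ds": ds, "y": y})


def build_panel(air_df, synthetic_df):
    """Stack both series in the long (unique_id, ds, y) layout used throughout."""
    cols = ["unique_id", "ds", "y"]
    return pd.concat([air_df[cols], synthetic_df[cols]], ignore_index=True)


HAS_GPU = detect_gpu()
```

With the airline data in a frame named `air_df` (a hypothetical name), `panel = build_panel(air_df, make_synthetic_series())` yields the two-series panel the later steps consume.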
Configuring Statistical, Prophet, and Chronos Forecasting Models
from timecopilot.forecaster import TimeCopilotForecaster
from timecopilot.models.stats import AutoARIMA, AutoETS, SeasonalNaive, Theta
from timecopilot.models.prophet import Prophet
from timecopilot.models.foundation.chronos import Chronos

# HAS_GPU comes from the setup step; use the larger Chronos-Bolt checkpoint only on a GPU
chronos_repo = "amazon/chronos-bolt-small" if HAS_GPU else "amazon/chronos-bolt-tiny"

models = [
    SeasonalNaive(), AutoETS(), AutoARIMA(), Theta(), Prophet(),
    Chronos(repo_id=chronos_repo, alias="Chronos"),
]

# TimesFM is optional: add it only on a GPU, and skip it if it fails to load
if HAS_GPU:
    try:
        from timecopilot.models.foundation.timesfm import TimesFM
        models.append(TimesFM(repo_id="google/timesfm-2.0-500m-pytorch", alias="TimesFM"))
    except Exception as e:
        print("Skipping TimesFM:", e)

# One forecaster object drives every model through the same interface
tcf = TimeCopilotForecaster(models=models)
print("\nModels:", [getattr(m, "alias", type(m).__name__) for m in models])
We configure a diverse collection of statistical, Prophet, and Chronos forecasting models. We select the Chronos model size according to the available hardware and optionally include TimesFM when a GPU is present. We then initialize TimeCopilotForecaster to manage all models through one consistent interface.
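To make the "one consistent interface" idea concrete, here is a toy, pandas-only sketch of the pattern. It is not TimeCopilot's implementation: the classes `ToySeasonalNaive` and `ToyMean` and the driver `forecast_all` are hypothetical. Each model exposes an `alias` and a `forecast(y, h)` method, and the driver loops over series and models to build one wide frame with a column per alias, the same layout the leaderboard step relies on when it treats every non-ID column as a model.

```python
import numpy as np
import pandas as pd


class ToySeasonalNaive:
    """Repeat the last observed season (hypothetical stand-in for a real model)."""
    alias = "SeasonalNaive"

    def __init__(self, season_length=12):
        self.season_length = season_length

    def forecast(self, y, h):
        last_season = y[-self.season_length:]
        return np.resize(last_season, h)  # tile the last season out to h steps


class ToyMean:
    """Forecast the historical mean at every step (hypothetical)."""
    alias = "Mean"

    def forecast(self, y, h):
        return np.full(h, y.mean())


def forecast_all(models, df, h, freq="MS"):
    """Run every model on every series; return one forecast column per model alias."""
    frames = []
    for uid, g in df.sort_values("ds").groupby("unique_id"):
        y = g["y"].to_numpy(dtype=float)
        # Assumes timestamps sit on the frequency's anchor (e.g. month start for "MS")
        future = pd.date_range(g["ds"].iloc[-1], periods=h + 1, freq=freq)[1:]
        out = pd.DataFrame({"unique_id": uid, "ds": future})
        for m in models:
            out[m.alias] = m.forecast(y, h)
        frames.append(out)
    return pd.concat(frames, ignore_index=True)
```

Keeping the model list as plain data is what lets the optional TimesFM entry be appended or skipped without touching any downstream code.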
Running Rolling Cross-Validation and Ranking Models by RMSE
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, rmse, mape

print("\nRunning cross-validation (slow step: foundation weights download)...")
# Three rolling windows, each forecasting H steps past its own cutoff
cv_df = tcf.cross_validation(df=panel, h=H, freq=FREQ, n_windows=3)
print(cv_df.head())

# evaluate() treats every non-ID column as a model, so drop the cutoff column first
eval_df = evaluate(cv_df.drop(columns=["cutoff"]), metrics=[mae, rmse, mape])
print("\n=== Per-series error (lower = better) ===")
print(eval_df.round(3))

# Average each metric across series, then rank models by mean RMSE
model_cols = [c for c in eval_df.columns if c not in ("unique_id", "metric")]
leaderboard = eval_df.groupby("metric")[model_cols].mean().T.sort_values("rmse")
print("\n=== Leaderboard (mean across series) ===")
print(leaderboard.round(3))
best_model = leaderboard.index[0]
print(f"\n>>> Best model by mean RMSE: {best_model}")
We perform rolling cross-validation across three windows to measure each model’s forecasting performance. We calculate MAE, RMSE, and MAPE for every series and aggregate the results into a leaderboard. We identify the model with the lowest mean RMSE for subsequent forecasting and visualization.
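For readers who want to see what the evaluation and leaderboard compute, the sketch below reproduces the arithmetic by hand with pandas on a cross-validation-style frame (`unique_id`, `ds`, `cutoff`, `y`, plus one column per model). It uses the textbook definitions MAE = mean |y − ŷ|, RMSE = sqrt(mean (y − ŷ)²) and MAPE = mean |y − ŷ| / |y|. utilsforecast's own implementations may differ in details such as percent versus fraction or zero handling, so treat this as an illustration rather than a drop-in replacement. As in the article's code, all windows of a series are pooled before averaging across series; `score_series` and `leaderboard_from_cv` are hypothetical names.

```python
import numpy as np
import pandas as pd


def score_series(g, model_cols):
    """MAE, RMSE and MAPE for one series, pooling all CV windows together."""
    rows = {}
    for m in model_cols:
        err = g["y"] - g[m]
        rows[m] = {
            "mae": err.abs().mean(),
            "rmse": np.sqrt((err ** 2).mean()),
            "mape": (err.abs() / g["y"].abs()).mean(),  # fraction, not percent
        }
    return pd.DataFrame(rows)  # index: metric, columns: models


def leaderboard_from_cv(cv_df):
    """Average per-series metrics across series and rank models by mean RMSE."""
    model_cols = [c for c in cv_df.columns if c not in ("unique_id", "ds", "cutoff", "y")]
    per_series = {uid: score_series(g, model_cols) for uid, g in cv_df.groupby("unique_id")}
    stacked = pd.concat(per_series, names=["unique_id", "metric"])
    return stacked.groupby(level="metric").mean().T.sort_values("rmse")
```

Averaging per-series RMSE (rather than pooling every error across series) gives each series equal weight, which matters here because AirPassengers and the synthetic series live on different scales.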
Generating Probabilistic Forecasts with Prediction Intervals
import matplotlib.pyplot as plt

# Point forecasts plus 80% and 95% prediction intervals for every model
fcst_df = tcf.forecast(df=panel, h=H, freq=FREQ, level=[80, 95])
print("\nForecast columns:", list(fcst_df.columns))

def plot_series(uid, point_model=best_model):
    """Plot one series' history with the chosen model's forecast and 95% band."""
    hist = panel[panel["unique_id"] == uid]
    fc = fcst_df[fcst_df["unique_id"] == uid]
    plt.figure(figsize=(11, 4))
    plt.plot(hist["ds"], hist["y"], color="black", label="history")
    if point_model in fc.columns:
        plt.plot(fc["ds"], fc[point_model], color="C0", label=f"{point_model} forecast")
    # Interval columns follow the "<model>-lo-<level>" / "<model>-hi-<level>" pattern
    lo, hi = f"{point_model}-lo-95", f"{point_model}-hi-95"
    if lo in fc.columns and hi in fc.columns:
        plt.fill_between(fc["ds"], fc[lo], fc[hi], alpha=0.25, color="C0", label="95% interval")
    plt.title(f"{uid}: {point_model}")
    plt.legend()
    plt.tight_layout()
    plt.show()

for uid in panel["unique_id"].unique():
    plot_series(uid)
We generate 12-month probabilistic forecasts with 80% and 95% prediction intervals. We define a reusable plotting function that draws each series' history, the best model's point forecast, and its 95% prediction band. We apply this function to every series to compare its observed history with the predicted future trajectory.
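The `-lo-95` and `-hi-95` columns read by `plot_series` are the bounds of a central prediction interval. Real models derive them in model-specific ways (for example, from a fitted error model or from predicted quantiles), so the sketch below shows only the simplest textbook case as an illustration: a symmetric Gaussian interval around a point forecast, with the standard deviation estimated from in-sample residuals. It uses Python's `statistics.NormalDist` for the normal quantile and mimics the `{model}-lo-{level}` / `{model}-hi-{level}` naming from the code above; `gaussian_intervals` and the constant-width assumption are hypothetical simplifications.

```python
from statistics import NormalDist

import numpy as np
import pandas as pd


def gaussian_intervals(point, residuals, levels=(80, 95), alias="Model"):
    """Symmetric normal intervals around a point forecast.

    Assumes forecast errors are normal with a constant standard deviation equal
    to that of the residuals (a simplification: real intervals usually widen
    with the horizon).
    """
    point = np.asarray(point, dtype=float)
    sigma = np.std(residuals, ddof=1)
    out = pd.DataFrame({alias: point})
    for level in levels:
        z = NormalDist().inv_cdf(0.5 + level / 200)  # e.g. 95 -> 0.975 quantile
        out[f"{alias}-lo-{level}"] = point - z * sigma
        out[f"{alias}-hi-{level}"] = point + z * sigma
    return out
```

The multiplier is about 1.28 at the 80% level and about 1.96 at 95%, which is why the 95% band is always the wider of the two.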
Detecting Anomalies and Querying the Optional LLM Agent
import os
from timecopilot import TimeCopilot

# The agent needs an LLM provider; use whichever API key is present
if os.environ.get("OPENAI_API_KEY") or os.environ.get("ANTHROPIC_API_KEY"):
    llm = "openai:gpt-4o" if os.environ.get("OPENAI_API_KEY") else "anthropic:claude-sonnet-4-5"
    tc = TimeCopilot(llm=llm, retries=3)
    single = panel[panel["unique_id"] == "AirPassengers"]
    # The agent picks a model, benchmarks it against SeasonalNaive, and answers the query
    result = tc.forecast(
        df=single, freq=FREQ, h=H,
        query="Total air passengers expected over the next 12 months, and which months peak?",
    )
    out = result.output
    print("\n=== AGENT REPORT ===")
    print("Selected model:", out.selected_model)
    print("Beats SeasonalNaive:", out.is_better_than_seasonal_naive)
    print("Why:", out.reason_for_selection)
    print("Answer:", out.user_query_response)
    print(result.fcst_df.head())
else:
    print("\n[Agent section skipped] No LLM key. Everything above ran key-free.")
print("\nDone. ✅")
We detect anomalies across the panel and visualize the flagged observations in the synthetic series. We optionally initialize the TimeCopilot LLM agent when an OpenAI or Anthropic API key is available. We use the agent to select a model, evaluate it against SeasonalNaive, and explain the forecast in response to a practical question.
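The listing above covers only the agent, so here is a self-contained illustration of flagging anomalies at the 99% level the tutorial targets. It is a deliberately simple stand-in, not TimeCopilot's detection method: each point is predicted from the same month one year earlier plus the typical year-over-year drift, the error scale is a robust median-absolute-deviation (MAD) estimate, and a point is flagged when its error falls outside the central 99% band of a normal distribution. `flag_anomalies` and `flag_panel` are hypothetical names, and the panel is assumed to use the long (`unique_id`, `ds`, `y`) layout.

```python
from statistics import NormalDist

import numpy as np
import pandas as pd


def flag_anomalies(y, season=12, level=99):
    """Flag points outside a central `level`% band around a seasonal-naive prediction.

    Stand-in technique: the error scale is a MAD estimate from year-over-year
    differences, and errors are assumed to be roughly normal.
    """
    y = np.asarray(y, dtype=float)
    flags = np.zeros(len(y), dtype=bool)
    if len(y) <= season:
        return flags  # not enough history for a seasonal baseline
    diffs = y[season:] - y[:-season]
    drift = np.median(diffs)  # typical year-over-year change (trend)
    sigma = 1.4826 * np.median(np.abs(diffs - drift))  # MAD rescaled to a normal std
    z = NormalDist().inv_cdf(0.5 + level / 200)  # 99 -> about 2.576
    cleaned = y.copy()
    for t in range(season, len(y)):
        pred = cleaned[t - season] + drift
        if abs(y[t] - pred) > z * sigma:
            flags[t] = True
            cleaned[t] = pred  # keep the spike out of next year's baseline
    return flags


def flag_panel(panel, season=12, level=99):
    """Add a boolean `anomaly` column to a long (unique_id, ds, y) panel."""
    panel = panel.sort_values(["unique_id", "ds"]).copy()
    panel["anomaly"] = panel.groupby("unique_id")["y"].transform(
        lambda s: flag_anomalies(s.to_numpy(), season, level)
    ).astype(bool)
    return panel
```

Replacing each flagged value with its prediction before moving on keeps a single spike from triggering a second, spurious flag twelve months later. The flagged rows of one series `g` can then be overlaid on its history plot, for example with `plt.scatter(g.loc[g["anomaly"], "ds"], g.loc[g["anomaly"], "y"], color="red")`.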
Conclusion
In conclusion, we created a unified TimeCopilot pipeline that takes us from data preparation to model evaluation, probabilistic forecasting, visualization, anomaly detection, and agent-driven interpretation. We compared traditional statistical methods with modern foundation models within a consistent cross-validation framework and selected the best-performing approach based on objective error metrics. We also quantified forecast uncertainty through prediction intervals and identified abnormal observations across multiple time series. By combining automated forecasting with an optional LLM agent, we produced both accurate numerical predictions and clear, decision-oriented insights within a single workflow.