Skip to content

ADR-0004 — Deep learning: tested, not assumed

  • Status: Accepted
  • Date: 2026-06-03
  • Author: Maxime GOURGUECHON
  • Related: ADR-0002, reports/dl_vs_ml.md, reports/model_benchmark.md

Context

The brief asks for a deep-learning model and a justification of its use versus classical ML. On 50k tabular rows, gradient-boosted trees are the well-evidenced default; deep nets typically need more data and tuning to compete. We also know (ADR-0002) the data is signal-free, which caps achievable performance for any model at "no skill".

Decision

Implement a regularised PyTorch tabular MLP (BatchNorm + dropout + early stopping) and benchmark it head-to-head against the tuned XGBoost/LightGBM/ CatBoost models, using the identical preprocessing and splits. Let the held-out metrics, not preference, decide.

Result (held-out)

Task Tabular MLP Best booster
Regression R² ≈ −0.005 ≈ −0.001
Classification ROC-AUC ≈ 0.49 ≈ 0.51

Both sit at no-skill. The MLP's early stopping firing after ~10 epochs confirms there is no structure to fit.

Rationale for the final choice

  • Boosting is the right default for tabular data of this size: equal (null) accuracy here, but cheaper to train, more robust, and directly explainable with tree SHAP (used in the app).
  • DL is not justified for this dataset. We keep the implementation in the repo as evidence of the comparison, but the production model is a booster.
  • Reproducibility: a Windows/Anaconda OpenMP clash (libiomp5md.dll) is neutralised in-code (KMP_DUPLICATE_LIB_OK) so the DL path runs everywhere.

Consequences

  • + The DL decision is defensible with numbers, not hand-waving.
  • + Demonstrates breadth (PyTorch training loop, early stopping) without over-engineering the deployed artefact.
  • Carries a torch dev dependency; kept out of the slim runtime image.