ADR-0004 — Deep learning: tested, not assumed¶
- Status: Accepted
- Date: 2026-06-03
- Author: Maxime GOURGUECHON
- Related: ADR-0002,
reports/dl_vs_ml.md,reports/model_benchmark.md
Context¶
The brief asks for a deep-learning model and a justification of its use versus classical ML. On 50k tabular rows, gradient-boosted trees are the well-evidenced default; deep nets typically need more data and tuning to compete. We also know (ADR-0002) the data is signal-free, which caps achievable performance for any model at "no skill".
Decision¶
Implement a regularised PyTorch tabular MLP (BatchNorm + dropout + early stopping) and benchmark it head-to-head against the tuned XGBoost/LightGBM/ CatBoost models, using the identical preprocessing and splits. Let the held-out metrics, not preference, decide.
Result (held-out)¶
| Task | Tabular MLP | Best booster |
|---|---|---|
| Regression R² | ≈ −0.005 | ≈ −0.001 |
| Classification ROC-AUC | ≈ 0.49 | ≈ 0.51 |
Both sit at no-skill. The MLP's early stopping firing after ~10 epochs confirms there is no structure to fit.
Rationale for the final choice¶
- Boosting is the right default for tabular data of this size: equal (null) accuracy here, but cheaper to train, more robust, and directly explainable with tree SHAP (used in the app).
- DL is not justified for this dataset. We keep the implementation in the repo as evidence of the comparison, but the production model is a booster.
- Reproducibility: a Windows/Anaconda OpenMP clash (
libiomp5md.dll) is neutralised in-code (KMP_DUPLICATE_LIB_OK) so the DL path runs everywhere.
Consequences¶
- + The DL decision is defensible with numbers, not hand-waving.
- + Demonstrates breadth (PyTorch training loop, early stopping) without over-engineering the deployed artefact.
- − Carries a
torchdev dependency; kept out of the slim runtime image.