Comparing two regression models using stambo

V1: © Aleksei Tiulpin, PhD, 2024

This notebook shows an end-to-end example on how one can take a dataset, train two machine learning models, and conduct a statistical test to assess whether the two models are different.

Import of necessary libraries

[1]:
import numpy as np
import stambo

from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error

SEED = 2024

Loading the diabetes dataset and creating train-test split

[14]:
X, y = load_diabetes(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=SEED)

scaler = StandardScaler()
scaler.fit(Xtr)

Xtr = scaler.transform(Xtr)
Xte = scaler.transform(Xte)
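As a side note, the fit-on-train / transform-both pattern above can also be written with scikit-learn's Pipeline, which guarantees the scaler never sees the test set. A small self-contained sketch (equivalent to the cell above, not part of the original notebook):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=2024)

# The pipeline fits the scaler on the training data only,
# then applies the same transform before the model at predict time.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(Xtr, ytr)
preds = pipe.predict(Xte)
print(preds.shape)
```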

Training the models

We train a kNN and a linear regression. Here, we can see that the linear regression outperforms the kNN.

[16]:
model = KNeighborsRegressor(n_neighbors=3)
model.fit(Xtr, ytr)
preds_knn = model.predict(Xte)

model = LinearRegression()
model.fit(Xtr, ytr)
preds_lr = model.predict(Xte)


mae_knn, mae_lr = mean_absolute_error(yte, preds_knn), mean_absolute_error(yte, preds_lr)
print(f"kNN MAE: {mae_knn:.4f} / LR MAE: {mae_lr:.4f}")
kNN MAE: 51.2489 / LR MAE: 44.3217

Statistical testing

As stated in the documentation, the testing routine returns a dict of tuples. The keys of the dict are the metric tags, and the values are tuples that store the data in the following format:

  • p-value (\(H_0: \mathrm{model}_1 = \mathrm{model}_2\))

  • Empirical value (model 1)

  • CI low (model 1)

  • CI high (model 1)

  • Empirical value (model 2)

  • CI low (model 2)

  • CI high (model 2)

If you launch the code in Binder, decrease the number of bootstrap iterations (10000 by default).

Important to note: regression metrics are errors, which means that a lower value is better (contrary to classification metrics). Therefore, we actually ask whether the kNN has a larger MAE than the linear regression. So, model 1 here is actually the improved model.
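To make this one-sided question concrete, below is a minimal paired-bootstrap sketch in plain NumPy on toy data. This is not stambo's actual implementation; the resampling scheme, iteration count, and all names here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2024)

# Toy test set: model 1 has smaller prediction noise than model 2
y_true = rng.normal(150.0, 50.0, size=200)
preds_1 = y_true + rng.normal(0.0, 40.0, size=200)  # "improved" model
preds_2 = y_true + rng.normal(0.0, 55.0, size=200)  # weaker model

abs_err_1 = np.abs(y_true - preds_1)
abs_err_2 = np.abs(y_true - preds_2)
observed_diff = abs_err_2.mean() - abs_err_1.mean()  # MAE(2) - MAE(1)

# Paired bootstrap: resample test cases with replacement and count
# how often model 1 fails to have the smaller MAE (diff <= 0).
n_boot = 5000
n = len(y_true)
count = 0
for _ in range(n_boot):
    idx = rng.integers(0, n, size=n)
    diff = abs_err_2[idx].mean() - abs_err_1[idx].mean()
    if diff <= 0:
        count += 1
p_value = count / n_boot
print(f"Observed MAE difference: {observed_diff:.2f}, p ~ {p_value:.4f}")
```

Because the errors are paired per test case, the same resampled indices are applied to both models; this is what makes the comparison a paired test rather than two independent ones.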

[17]:
testing_result = stambo.compare_models(yte, preds_lr, preds_knn, metrics=("MAE", "MSE"), seed=SEED)
Bootstrapping: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:03<00:00, 2571.56it/s]

If we want to inspect the testing results, they are available as a dict in the format described above:

[18]:
testing_result
[18]:
{'MAE': (0.0007999200079992001,
  44.321658137655405,
  40.17458217663324,
  48.62883263591205,
  51.248868778280546,
  46.417496229260934,
  56.1538838612368),
 'MSE': (0.0008999100089991,
  3020.4335055268534,
  2508.771399983571,
  3583.767801911658,
  3978.893916540975,
  3293.007629462041,
  4723.732478632479)}
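Since each entry is a plain tuple, the individual numbers can be unpacked directly. A small standalone sketch, hardcoding the MAE tuple from the output above (variable names are ours, not stambo's):

```python
# The MAE entry of the result dict, in the documented order:
# p-value, then (estimate, CI low, CI high) for model 1 and model 2.
testing_result = {
    "MAE": (0.0007999200079992001,
            44.321658137655405, 40.17458217663324, 48.62883263591205,
            51.248868778280546, 46.417496229260934, 56.1538838612368),
}
p_value, lr_mae, lr_lo, lr_hi, knn_mae, knn_lo, knn_hi = testing_result["MAE"]
print(f"LR MAE: {lr_mae:.2f} [{lr_lo:.2f}-{lr_hi:.2f}] / "
      f"kNN MAE: {knn_mae:.2f} [{knn_lo:.2f}-{knn_hi:.2f}] / p={p_value:.4f}")
```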

Most often, though, we want to present the results in a report, paper, or presentation. For that, we can use the to_latex function and get a copy-and-paste tabular. To use it in a LaTeX document, do not forget to import booktabs:

[19]:
print(stambo.to_latex(testing_result, m2_name="kNN", m1_name="LR"))
% \usepackage{booktabs} <-- do not for get to have this imported.
\begin{tabular}{lll} \\
\toprule
\textbf{Model} & \textbf{MAE} & \textbf{MSE} \\
\midrule
LR & $44.32$ [$40.17$-$48.63$] & $3020.43$ [$2508.77$-$3583.77$] \\
kNN & $51.25$ [$46.42$-$56.15$] & $3978.89$ [$3293.01$-$4723.73$] \\
\midrule
$p$-value & $0.00$ & $0.00$ \\
\bottomrule
\end{tabular}