Comparing two regression models using stambo
V1: © Aleksei Tiulpin, PhD, 2024
This notebook shows an end-to-end example of how to take a dataset, train two machine learning models, and conduct a statistical test to assess whether the two models differ.
Import of necessary libraries
[1]:
import numpy as np
import stambo
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
SEED = 2024
Loading the diabetes dataset and creating train-test split
[14]:
X, y = load_diabetes(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=SEED)
scaler = StandardScaler()
scaler.fit(Xtr)
Xtr = scaler.transform(Xtr)
Xte = scaler.transform(Xte)
Training the models
We train a kNN and a linear regression. Here, we can see that the linear regression outperforms the kNN.
[16]:
model = KNeighborsRegressor(n_neighbors=3)
model.fit(Xtr, ytr)
preds_knn = model.predict(Xte)
model = LinearRegression()
model.fit(Xtr, ytr)
preds_lr = model.predict(Xte)
mae_knn, mae_lr = mean_absolute_error(yte, preds_knn), mean_absolute_error(yte, preds_lr)
print(f"kNN MAE: {mae_knn:.4f} / LR MAE: {mae_lr:.4f}")
kNN MAE: 51.2489 / LR MAE: 44.3217
Statistical testing
As stated in the documentation, the testing routine returns a dict of tuples. The keys of the dict are the metric tags, and the values are tuples that store the data in the following format:
1. p-value (\(H_0: \text{model}_1 = \text{model}_2\))
2. Empirical value (model 1)
3. CI low (model 1)
4. CI high (model 1)
5. Empirical value (model 2)
6. CI low (model 2)
7. CI high (model 2)
If you launch the code in Binder, decrease the number of bootstrap iterations (10000 by default).
Important to note: regression metrics are errors, which means that a lower value is better (contrary to classification metrics). Therefore, we actually ask whether the kNN has a larger MAE than the linear regression. Here, model 1 (the linear regression) is the improved model.
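To build intuition for what such a comparison involves, here is a minimal sketch of a paired bootstrap test for the difference in MAE. This is an illustration of the general idea only, not stambo's actual implementation; the function name and the add-one p-value smoothing are our own choices.

```python
import numpy as np

def paired_bootstrap_pvalue(y_true, preds_a, preds_b, n_boot=2000, seed=2024):
    """Sketch of a paired bootstrap test on the MAE difference.

    Counts how often, across resamples of the test set, model A's MAE
    is not lower than model B's. A small value suggests A (model 1)
    genuinely has a lower error than B (model 2).
    """
    rng = np.random.default_rng(seed)
    n = len(y_true)
    count = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample test cases with replacement
        mae_a = np.mean(np.abs(y_true[idx] - preds_a[idx]))
        mae_b = np.mean(np.abs(y_true[idx] - preds_b[idx]))
        if mae_a >= mae_b:
            count += 1
    return (count + 1) / (n_boot + 1)  # add-one smoothing avoids p = 0

# Tiny synthetic example: preds_a is clearly more accurate than preds_b
rng = np.random.default_rng(0)
y = rng.normal(size=200)
p = paired_bootstrap_pvalue(y, y + rng.normal(scale=0.1, size=200),
                            y + rng.normal(scale=1.0, size=200))
print(p)
```

Because both models are evaluated on the same resampled indices, the test accounts for the pairing between predictions, which is exactly why the predictions of both models must come from the same test set.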
[17]:
testing_result = stambo.compare_models(yte, preds_lr, preds_knn, metrics=("MAE", "MSE"), seed=SEED)
Bootstrapping: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:03<00:00, 2571.56it/s]
If we want to visualize the testing results, they are available in a dict in the format we have described above:
[18]:
testing_result
[18]:
{'MAE': (0.0007999200079992001,
44.321658137655405,
40.17458217663324,
48.62883263591205,
51.248868778280546,
46.417496229260934,
56.1538838612368),
'MSE': (0.0008999100089991,
3020.4335055268534,
2508.771399983571,
3583.767801911658,
3978.893916540975,
3293.007629462041,
4723.732478632479)}
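Since each tuple follows the fixed order described above, it can be unpacked into named fields for programmatic use. A small helper for that, applied to the MAE tuple printed above (the field names are our own, not part of stambo's API):

```python
from collections import namedtuple

# Field order matches the documented tuple layout: p-value, then
# empirical value / CI low / CI high for model 1 and model 2.
TestResult = namedtuple(
    "TestResult",
    ["p_value", "m1_value", "m1_ci_low", "m1_ci_high",
     "m2_value", "m2_ci_low", "m2_ci_high"],
)

# The MAE tuple from the output above
mae = TestResult(0.0007999200079992001,
                 44.321658137655405, 40.17458217663324, 48.62883263591205,
                 51.248868778280546, 46.417496229260934, 56.1538838612368)

print(f"LR MAE: {mae.m1_value:.2f} [{mae.m1_ci_low:.2f}-{mae.m1_ci_high:.2f}], "
      f"p={mae.p_value:.4f}")
```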
Most commonly, though, we want to present the results in a report, paper, or presentation. For that, we can use the function to_latex and get a ready-to-paste tabular. To use it in a LaTeX document, do not forget to import booktabs.
[19]:
print(stambo.to_latex(testing_result, m2_name="kNN", m1_name="LR"))
% \usepackage{booktabs} <-- do not for get to have this imported.
\begin{tabular}{lll} \\
\toprule
\textbf{Model} & \textbf{MAE} & \textbf{MSE} \\
\midrule
LR & $44.32$ [$40.17$-$48.63$] & $3020.43$ [$2508.77$-$3583.77$] \\
kNN & $51.25$ [$46.42$-$56.15$] & $3978.89$ [$3293.01$-$4723.73$] \\
\midrule
$p$-value & $0.00$ & $0.00$ \\
\bottomrule
\end{tabular}