Comparing two samples using stambo
V1.1: © Aleksei Tiulpin, PhD, 2024
There are many cases when we develop models other than classifiers or regressors, and we want to compute a score per datapoint and then take the mean of those scores. Often we simply want to compare two samples of such measurements, and stambo makes this easy too.
This example shows how to conduct a simple two-sample test. The example is synthetic: we generate two Gaussian samples and assess whether the mean of the second sample is greater than the mean of the first.
Importing the libraries
[1]:
import numpy as np
import stambo
SEED = 2024
Data generation
[2]:
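# Fix the seed so that the samples are reproducible
np.random.seed(SEED)
n_samples = 100
# Two unit-variance Gaussian samples with true means 0.5 and 0.7
sample_1 = np.random.randn(n_samples) + 0.5
sample_2 = np.random.randn(n_samples) + 0.7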
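Before running the test, it can help to see how large the expected difference is relative to the sampling noise. The sketch below regenerates the same two samples and compares the observed difference in means with the theoretical standard error for two independent unit-variance samples. The names `observed_diff` and `theoretical_se` are introduced here for illustration only and are not part of stambo.

```python
import numpy as np

SEED = 2024
n_samples = 100

# Regenerate the two samples exactly as in the data generation cell.
np.random.seed(SEED)
sample_1 = np.random.randn(n_samples) + 0.5
sample_2 = np.random.randn(n_samples) + 0.7

# The true means differ by 0.7 - 0.5 = 0.2; the empirical gap is a noisy estimate of it.
observed_diff = sample_2.mean() - sample_1.mean()

# Standard error of the difference of two independent means,
# each computed from n_samples draws with variance 1.
theoretical_se = np.sqrt(1.0 / n_samples + 1.0 / n_samples)

print(f"mean of sample 1: {sample_1.mean():.2f}")
print(f"mean of sample 2: {sample_2.mean():.2f}")
print(f"observed difference: {observed_diff:.2f}")
print(f"theoretical standard error: {theoretical_se:.3f}")
```

With 100 points per sample, the true difference of 0.2 is only about 1.4 standard errors, so the outcome of the test depends noticeably on the random draw.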
Sample comparison
Note that for a two-sample test, stambo does not require the statistic of choice to be a machine learning metric that subclasses stambo.metrics.Metric. A plain callable, such as the lambda used below, is enough.
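To illustrate what such a statistic can look like, here is a minimal sketch of a dictionary that maps report names to plain callables, each taking a NumPy array and returning a single number. The "Median" and "Std" entries are hypothetical additions for illustration. The sketch only evaluates the callables directly with NumPy; whether stambo accepts several entries in a single call is an assumption suggested by the dictionary argument, not something shown in this example.

```python
import numpy as np

# Each statistic is an ordinary callable: array in, scalar out.
# "Median" and "Std" are illustrative; only "Mean" is used in this example.
statistics = {
    "Mean": lambda x: x.mean(),
    "Median": lambda x: np.median(x),
    "Std": lambda x: x.std(ddof=1),
}

# A stand-in sample, generated independently of the tutorial data.
rng = np.random.default_rng(2024)
sample = rng.normal(loc=0.5, scale=1.0, size=100)

# Evaluate every statistic on the sample.
values = {name: float(fn(sample)) for name, fn in statistics.items()}
for name, value in values.items():
    print(f"{name}: {value:.3f}")
```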
[3]:
res = stambo.two_sample_test(sample_1, sample_2, statistics={"Mean": lambda x: x.mean()})
Bootstrapping: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 82516.31it/s]
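For intuition about what a resampling-based two-sample test computes, the following is a minimal sketch of a generic one-sided permutation test for the difference in means, written in plain NumPy. It is not claimed to be the procedure stambo runs internally. The function `permutation_test_mean` and its parameters are hypothetical names chosen for this illustration.

```python
import numpy as np


def permutation_test_mean(x, y, n_perm=10000, seed=2024):
    """One-sided permutation test of H1: mean(y) > mean(x)."""
    rng = np.random.default_rng(seed)
    observed = y.mean() - x.mean()
    pooled = np.concatenate([x, y])
    n_x = len(x)
    count = 0
    for _ in range(n_perm):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled data and split it again.
        perm = rng.permutation(pooled)
        diff = perm[n_x:].mean() - perm[:n_x].mean()
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (count + 1) / (n_perm + 1)


# Stand-in data with the same means as in the tutorial,
# drawn from a separate generator.
rng = np.random.default_rng(0)
a = rng.normal(loc=0.5, scale=1.0, size=100)
b = rng.normal(loc=0.7, scale=1.0, size=100)

p_value = permutation_test_mean(a, b)
print(f"one-sided p-value: {p_value:.3f}")
```

A small p-value means that a difference at least as large as the observed one rarely arises when the group labels are shuffled at random.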
LaTeX report
[4]:
print(stambo.to_latex(res, m1_name="Sample 1", m2_name="Sample 2"))
% \usepackage{booktabs} <-- do not for get to have this imported.
\begin{tabular}{ll} \\
\toprule
\textbf{Model} & \textbf{Mean} \\
\midrule
Sample 1 & $0.53$ [$0.34$-$0.73$] \\
Sample 2 & $0.78$ [$0.58$-$0.99$] \\
\midrule
$p$-value & $0.02$ \\
\bottomrule
\end{tabular}
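The report lists each sample mean together with an interval in brackets. Assuming these are 95% confidence intervals obtained by bootstrapping (an assumption; the exact method is not stated here), an interval of this kind can be sketched with a percentile bootstrap in plain NumPy. The function `bootstrap_ci_mean` is a hypothetical helper written for this illustration and is not part of stambo.

```python
import numpy as np


def bootstrap_ci_mean(x, n_boot=10000, alpha=0.05, seed=2024):
    """Percentile bootstrap confidence interval for the mean of x."""
    rng = np.random.default_rng(seed)
    # Draw n_boot resamples of x with replacement, one per row.
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    boot_means = x[idx].mean(axis=1)
    # The interval bounds are the alpha/2 and 1 - alpha/2 quantiles.
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(x.mean()), float(lower), float(upper)


# Stand-in sample drawn from a separate generator.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.5, scale=1.0, size=100)

estimate, lower, upper = bootstrap_ci_mean(sample)
print(f"mean: {estimate:.2f} [{lower:.2f}-{upper:.2f}]")
```

For 100 unit-variance points the interval is roughly 0.4 wide, which matches the order of magnitude of the intervals in the table above.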