Comparing two samples using `stambo`

V1.1: © Aleksei Tiulpin, PhD, 2024

There are many cases when we develop models other than classifiers or regressors, compute a score per datapoint, and then report the mean of those scores. In such cases we often want to compare just two samples of measurements, and stambo makes this easy too.

This example shows how to conduct a simple two-sample test. The example is synthetic: we generate two Gaussian samples and assess whether the mean of the second sample is greater than the mean of the first.

Importing the libraries

[1]:

import numpy as np
import stambo

SEED = 2024

Data generation

[2]:

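np.random.seed(SEED)  # fix the seed so the samples are reproducible
n_samples = 100
# Two unit-variance Gaussian samples with true means 0.5 and 0.7
sample_1 = np.random.randn(n_samples) + 0.5
sample_2 = np.random.randn(n_samples) + 0.7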

Sample comparison

Note that for a two-sample test, stambo does not require the statistic of choice to be a machine learning metric that subclasses stambo.metrics.Metric. A plain callable that maps a sample to a scalar, such as the lambda used here, is enough.
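Since a statistic here is just a function from an array to a number, it may help to see a few of them side by side. The sketch below builds a hypothetical dictionary of statistics (the "Median" and "Trimmed mean" entries are illustrative additions, not part of the original example) and evaluates each on a synthetic sample. Passing more than one entry to stambo.two_sample_test is an assumption based on the dictionary-shaped argument, so that call is left as a comment.

```python
import numpy as np

def trimmed_mean(x, fraction=0.1):
    """Mean after dropping the lowest and highest `fraction` of the values."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(len(x) * fraction)
    return x[k:len(x) - k].mean()

# Each statistic is a plain callable: array in, scalar out.
statistics = {
    "Mean": lambda x: x.mean(),
    "Median": lambda x: np.median(x),
    "Trimmed mean": trimmed_mean,
}

rng = np.random.default_rng(2024)
sample = rng.standard_normal(100) + 0.5

# Evaluate every statistic on the same sample.
values = {name: float(fn(sample)) for name, fn in statistics.items()}
for name, value in values.items():
    print(f"{name}: {value:.3f}")

# Assumption: the same dictionary could be handed to stambo, e.g.
# res = stambo.two_sample_test(sample_1, sample_2, statistics=statistics)
```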

[3]:

res = stambo.two_sample_test(sample_1, sample_2, statistics={"Mean": lambda x: x.mean()})

Bootstrapping: 100%|██████████| 10000/10000 [00:00<00:00, 82516.31it/s]
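To give some intuition for what a one-sided two-sample test of the mean involves, here is a minimal NumPy-only sketch. It uses a permutation test, which is a different resampling scheme from the bootstrap that stambo runs, so it should be read as an illustration of the hypothesis being tested and not as stambo's implementation. The function name and its parameters are hypothetical.

```python
import numpy as np

def permutation_test_mean_greater(sample_1, sample_2, n_permutations=10000, seed=2024):
    """One-sided permutation test; H1: mean(sample_2) > mean(sample_1)."""
    rng = np.random.default_rng(seed)
    sample_1 = np.asarray(sample_1, dtype=float)
    sample_2 = np.asarray(sample_2, dtype=float)

    # Observed difference in means (second minus first).
    observed = sample_2.mean() - sample_1.mean()

    # Under H0 the group labels are exchangeable, so we shuffle the pooled data.
    pooled = np.concatenate([sample_1, sample_2])
    n_1 = sample_1.size
    count = 0
    for _ in range(n_permutations):
        permuted = rng.permutation(pooled)
        diff = permuted[n_1:].mean() - permuted[:n_1].mean()
        count += int(diff >= observed)

    # The add-one correction keeps the p-value strictly positive.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value

rng = np.random.default_rng(2024)
a = rng.normal(0.5, 1.0, 100)
b = rng.normal(0.7, 1.0, 100)
observed, p_value = permutation_test_mean_greater(a, b)
print(f"difference in means: {observed:.3f}, p-value: {p_value:.4f}")
```

A small p-value suggests that the second sample's mean is larger than would be expected if both samples came from the same distribution. The exact number will differ from the one stambo reports, because both the resampling scheme and the random number generator differ.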

LaTeX report

[4]:

print(stambo.to_latex(res, m1_name="Sample 1", m2_name="Sample 2"))

% \usepackage{booktabs} <-- do not for get to have this imported.
\begin{tabular}{ll} \\
\toprule
\textbf{Model} & \textbf{Mean} \\
\midrule
Sample 1 & $0.53$ [$0.34$-$0.73$] \\
Sample 2 & $0.78$ [$0.58$-$0.99$] \\
\midrule
$p$-value & $0.02$ \\
\bottomrule
\end{tabular}
