<- function(data, row_numbers, method = "pearson")
cor_boot
{ # Obtain the bootstrap sample:
<- data[row_numbers,]
sample
# Compute and return the statistic for the bootstrap sample:
return(cor(sample[[1]], sample[[2]], method = method))
}
The boot
package is one of the workhorses of the R statistics suite. Computing bootstrap confidence intervals is straightforward using its boot.ci
function, but computing p-values used to be a little trickier, especially as some care had to be taken to make sure that the resampling is done under the null hypothesis. I created the boot.pval
package with the aim of making it much easier to run bootstrap tests in R. In this blog post, I’ll give an example of how it can be used to run a bootstrap test using a custom test statistic.
Background
p-values can be computed by inverting the corresponding confidence intervals, as described in Section 14.2 of Thulin (2024) and Section 3.12 in Hall (1992). The boot.pval
package contains functions for computing bootstrap p-values in this way. The approach relies on the fact that:
- The p-value of the two-sided test for the parameter theta is the smallest alpha such that theta is not contained in the corresponding 1-alpha confidence interval,
- For a test of the parameter theta with significance level alpha, the set of values of theta that aren’t rejected by the two-sided test (when used as the null hypothesis) is a 1-alpha confidence interval for theta.
The package contains convenience functions for computing p-values and confidence intervals for coefficients of regression model and tests of location in a single line of code. But it also allows you to create custom bootstrap tests. The procedure is fairly straightforward if you’re familiar with boot
. For those of you who aren’t, I’ll give a complete example of how this can be done below.
Creating custom hypothesis tests
This example, based in part on code from Section 7.4.2 of Modern Statistics with R, shows how to implement a bootstrap correlation test.
First, we create a function for computing the correlation given a bivariate dataset and a vector containing row numbers (the indices of the bootstrap sample). We also include a method
option to allow the user to choose which type of correlation is used:
Let’s say that we wish to test the correlation between the variables hp
and drat
in the mtcars
data:
|> plot(hp ~ drat, data = _) mtcars
We subset the data (below I use the base R function subset
for this; see my post on base R replacements of tidyverse
functions for more on this) and then use boot
to draw 999 bootstrap samples:
library(boot)
<- mtcars |>
boot_res subset(select = c(hp, drat)) |>
boot(statistic = cor_boot,
R = 999)
If we like, we can plot the bootstrap distribution:
plot(boot_res)
And compute confidence intervals:
boot.ci(boot_res)
Warning in boot.ci(boot_res): bootstrap variances needed for studentized
intervals
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 999 bootstrap replicates
CALL :
boot.ci(boot.out = boot_res)
Intervals :
Level Normal Basic
95% (-0.7289, -0.1479 ) (-0.7599, -0.1645 )
Level Percentile BCa
95% (-0.7330, -0.1376 ) (-0.6819, -0.0458 )
Calculations and Intervals on Original Scale
Some BCa intervals may be unstable
To compute the p-value corresponding to the different intervals, we can now use boot.pval
:
# Compute the bootstrap p-value based on the percentile interval:
library(boot.pval)
boot.pval(boot_res, type = "perc")
[1] 0.008008008
For studentized intervals, we also need to estimate the variance of the statistic, e.g. using an inner bootstrap. We create a new function for this, run a new resampling, and compute the studentized p-value:
<- function(data, row_numbers, method = "pearson")
cor_boot_student
{ <- data[row_numbers,]
sample
<- cor(sample[[1]], sample[[2]], method = method)
correlation
<- boot(sample, cor_boot, 100)
inner_boot <- var(inner_boot$t)
variance
return(c(correlation, variance))
}
# Run the resampling:
<- mtcars |>
boot_res subset(select = c(hp, drat)) |>
boot(cor_boot_student, 999)
# Compute the bootstrap p-value based on the studentized interval:
boot.pval(boot_res, type = "stud")
[1] 0.06306306
That’s all there is to it! Using boot.pval
is completely analogous to how we use boot.ci
from boot
, and hopefully makes it easier to perform bootstrap tests in R.