rbtt (Robust bootstrap-based t-test)
Overview
rbtt is an alternative bootstrap-based t-test that aims to reduce type-I error for non-negative, zero-inflated data.
Tu & Zhou (1999) showed that comparing the means of populations whose data-generating distributions are non-negative with excess zero observations is a problem of great importance in the analysis of medical cost data. In the same study, Tu & Zhou note that it can be difficult to control the type-I error rates of general-purpose statistical tests when comparing the means of such data sets. This package lets users perform a modified bootstrap-based t-test that aims to better control type-I error rates in these situations.
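For intuition about what a bootstrap-based t-test does, here is a minimal sketch of a generic two-sample bootstrap-t test. It is not the modified procedure implemented by rbtt or described by Tu & Zhou. The helper name boot_t_test, the Welch-type statistic, and the centering step are assumptions made purely for illustration.

```r
# Generic two-sample bootstrap-t test (illustration only; NOT the rbtt algorithm).
# boot_t_test is a hypothetical helper name, not part of the rbtt package.
boot_t_test <- function(x, y, n.boot = 999) {
  # Welch-type t statistic for the difference in means
  t_stat <- function(a, b) {
    (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))
  }
  t_obs <- t_stat(x, y)

  # Impose the null hypothesis of equal means by centering each sample
  x0 <- x - mean(x)
  y0 <- y - mean(y)

  # Resample within each group and recompute the statistic
  t_boot <- replicate(n.boot, {
    t_stat(sample(x0, replace = TRUE), sample(y0, replace = TRUE))
  })
  t_boot <- t_boot[is.finite(t_boot)]  # drop degenerate resamples (zero variance)

  # Two-sided bootstrap p-value with the usual +1 correction
  p_value <- (1 + sum(abs(t_boot) >= abs(t_obs))) / (length(t_boot) + 1)
  list(statistic = t_obs, p.value = p_value)
}

set.seed(1)
x <- rbinom(50, 1, 0.5) * rlnorm(50, 0, 1)
y <- rbinom(150, 1, 0.3) * rlnorm(150, 2, 1)
res <- boot_t_test(x, y)
res
```

The sketch only shows the general resampling idea. For zero-inflated data, use rbtt itself, which applies the modification intended to improve type-I error control.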
Usage
Let’s say we have some non-negative data with clumping at zero:
x <- rbinom(50, 1, 0.5) * rlnorm(50, 0, 1)
y <- rbinom(150, 1, 0.3) * rlnorm(150, 2, 1)
Then we may compute rbtt-based t-tests to compare the means:
# Use 'method = 1' for a two-sample, two-sided rbtt under the equal variance assumption
rbtt(x, y, n.boot=999, method = 1)
##
## Two-sided robust bootstrapped t-test assuming equal variance
##
## data: x and y
## t = -2.3, p-value = 0.03
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -7.2644 -0.1393
## sample estimates:
## mean of x mean of y
## 0.8163 5.1450
# Use 'method = 2' for a two-sample, one-sided rbtt without the equal variance assumption
rbtt(x, y, n.boot=999, method = 2)
##
## One-sided robust bootstrapped t-test not assuming equal variance
##
## data: x and y
## t = -4, p-value <2e-16
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -6.932 -1.726
## sample estimates:
## mean of x mean of y
## 0.8163 5.1450
Alternatively, you can specify method = "both" to perform both methods simultaneously (this is also the default).
Parallelize rbtt
# Compare speed when using single-core versus multiple-core rbtt on 99999 bootstrap resamples
system.time(rbtt(x, y, n.boot = 99999, method = 1, n.cores = 1))
## user system elapsed
## 8.150 0.011 8.198
system.time(rbtt(x, y, n.boot = 99999, method = 1, n.cores = 3))
## user system elapsed
## 6.617 0.078 3.782
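The value passed to n.cores depends on the machine. The sketch below uses detectCores() from the base parallel package to choose one. Leaving one core free is only a convention assumed here, not a recommendation from the package. The rbtt call is guarded so the snippet also runs where rbtt is not installed.

```r
library(parallel)

# Number of cores reported by the system (NA if it cannot be determined)
available <- detectCores()

# Assumed convention: leave one core free, but always use at least one
n.cores <- if (is.na(available)) 1 else max(1, available - 1)
n.cores

# Run the test only if the rbtt package is installed
if (requireNamespace("rbtt", quietly = TRUE)) {
  x <- rbinom(50, 1, 0.5) * rlnorm(50, 0, 1)
  y <- rbinom(150, 1, 0.3) * rlnorm(150, 2, 1)
  print(rbtt::rbtt(x, y, n.boot = 999, method = 1, n.cores = n.cores))
}
```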
Comparison between rbtt and t.test
First, we perform some simulations.
library(parallel)  # provides mclapply
n.sim <- 999
t.test.results <- numeric(n.sim)
rbtt.results <- numeric(n.sim)
pval.table.list <- mclapply(1:n.sim, function(i)
{
  # True means are equal
  x <- rbinom(50, 1, 0.5) * rlnorm(50, 1.15, 1)
  y <- rbinom(150, 1, 0.5) * rlnorm(150, 1.15, 1)
  t.test.result <- t.test(x, y)$p.value
  rbtt.result <- rbtt(x, y, n.boot = 999, method = 1)$p.value
  return(c(t.test.result, rbtt.result))
}, mc.cores = 4)
pval.table <- do.call(rbind, pval.table.list)
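The comment "True means are equal" in the simulation above can be checked directly. If the Bernoulli indicator and the log-normal draw are independent, their product has mean p * exp(meanlog + sdlog^2 / 2). The sketch below computes this value and compares it with a large simulated sample. The helper name zi_lnorm_mean and the alternative parameter pair are hypothetical, chosen only for illustration.

```r
# Mean of Bernoulli(p) * LogNormal(meanlog, sdlog), assuming independence:
# E[B * L] = p * exp(meanlog + sdlog^2 / 2)
# zi_lnorm_mean is a hypothetical helper name.
zi_lnorm_mean <- function(p, meanlog, sdlog) {
  p * exp(meanlog + sdlog^2 / 2)
}

# Parameters used for x and y in the simulation
mean_x <- zi_lnorm_mean(0.5, 1.15, 1)
mean_y <- zi_lnorm_mean(0.5, 1.15, 1)
all.equal(mean_x, mean_y)

# A different (hypothetical) parameter pair with the same mean:
# halving p is offset by adding log(2) to meanlog
mean_alt <- zi_lnorm_mean(0.25, 1.15 + log(2), 1)
all.equal(mean_x, mean_alt)

# Monte Carlo check of the formula
set.seed(42)
draws <- rbinom(1e6, 1, 0.5) * rlnorm(1e6, 1.15, 1)
c(theoretical = mean_x, simulated = mean(draws))
```

The same formula can be used to design null scenarios in which the two groups differ in their proportion of zeros but still share a common mean.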
Now, let’s evaluate the type-I error of these simulations using a significance level of 0.05.
# t.test type-I error with significance level of 0.05:
sum(pval.table[,1] < 0.05) / n.sim
## [1] 0.06006
# rbtt type-I error with significance level of 0.05:
sum(pval.table[,2] < 0.05) / n.sim
## [1] 0.05005
More accurate p-values and type-I error estimates can be obtained by increasing n.boot and n.sim, respectively.
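To judge how precise a simulated type-I error estimate is, it helps to attach a Monte Carlo standard error to it. The sketch below treats each simulation as an independent Bernoulli trial, so the standard error of the rejection rate is sqrt(rate * (1 - rate) / n.sim). The helper name type1_summary and the normal-approximation interval are assumptions for illustration, not part of rbtt.

```r
# Estimated rejection rate with its Monte Carlo standard error.
# type1_summary is a hypothetical helper name, not part of the rbtt package.
type1_summary <- function(pvals, alpha = 0.05) {
  n.sim <- length(pvals)
  rate <- mean(pvals < alpha)             # proportion of rejections
  se <- sqrt(rate * (1 - rate) / n.sim)   # binomial standard error
  c(rate = rate, se = se,
    lower = rate - 1.96 * se,             # approximate 95% interval
    upper = rate + 1.96 * se)
}

# Example with uniform p-values, which is what an exactly calibrated test produces
set.seed(123)
type1_summary(runif(999))
```

With the simulation results, the same helper could be applied as type1_summary(pval.table[, 1]) for t.test and type1_summary(pval.table[, 2]) for rbtt. Because the standard error shrinks in proportion to 1 / sqrt(n.sim), quadrupling n.sim roughly halves it.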
Contributors

Ian Waudby-Smith (University of Waterloo)

Dr. Pengfei Li (University of Waterloo)