Estimation of a Two-component Mixture Model with Applications to Multiple Testing.

In this article, we discuss the compuatation of the confidence lower bound (see Section 4) and estimator (see Section 3) developed in Patra and Sen (2016). We also discuss the appropriate choice for the tuning parameter involved. The codes used here can be downloaded from here.

We first genrate an i.i.d sample of size \(n= 5000\) from \[F = \alpha_0 * \text{Beta}(1,10) + (1- \alpha_0 ) \text{Unif}(0,1),\] where \(\alpha_0 =.1.\)

source("Source/density_est.R")
source("Source/dist_calc.R")

## Iso 0.0-17

source("Source/est_mix_model.R")
source("Source/cv_mixmodel.R")

## Loading required package: lattice

## Loading required package: robustbase

data.gen<- function( n, alpha){
    ind<- rbinom(n, 1, alpha)
    data<- ind* rbeta(n, 1,5) + (1-ind)* runif(n,0,1)
    return(data)
}
data.1<- data.gen(n = 5000, alpha = .1)

For all computation in the above mixture model, we will use the R function named est.mix.model. Different values for the variable method lead to computation of different quantties.

Lower confidence bound for \(\alpha_0\)

The following code computes the \(95\%\) confidence lower bound for \(\alpha_0\)

est.lwr.bnd <- est.mix.model(data.1, method = "lwr.bnd", gridsize = 600)
print(est.lwr.bnd)

## Call:est.mix.model(data = data.1, method = "lwr.bnd", gridsize = 600)

## [1] "The  '95%' lower confidence for alp_0 is  0.0683333333333333"

plot(est.lwr.bnd)

Estimation of \(\alpha_0\)

Pre-specified choice of \(c_n\)

The following code computes an estimate of \(\alpha_0\) using the default choice of the tuning parameter, i.e., \(c_n = 0.1 \log(\log(n))\)

est.default <- est.mix.model(data.1, method = "fixed", gridsize = 600)

## Warning in est.mix.model(data.1, method = "fixed", gridsize = 600): 'c.n'
## is not given. Fixing it to be '0.1*log(log(n))

print(est.default)

## Call:est.mix.model(data = data.1, method = "fixed", gridsize = 600)
## [1] "Estimate of alp is 0.0966666666666667"
## [1] " The chosen value c_n is 0.214208684893753"

plot(est.default)

The follwoing code gives an estimator of \(\alpha_0\) when \(c_n\) is chosen to be \(0.05 \log\log(n)\)

est.fixed <- est.mix.model(data.1, method = "fixed", c.n = .05*log(log(length(data.1))), gridsize = 600)
print(est.fixed)
plot(est.fixed)

Cross validated choice of \(c_n\)

We can use the same function as above to compute the esimate of \(\alpha_0\) when \(c_n\) is chosen via cross-validation. The only difference being we have set method to be cv.

est.cv <- est.mix.model(data.1, method = "cv", gridsize = 600)

## Warning in cv.mixmodel(data, folds = folds, reps = reps, cn.s = cn.s,
## cn.length = cn.length, : Make sure that data is transformed such that F_b
## is Uniform(0,1)

## [1] "Expected time for completion"
## Time difference of 31.47532 secs

print(est.cv)

## Call:est.mix.model(data = data.1, method = "cv", gridsize = 600)
## [1] "Estimate of alp is 0.11"
## [1] " The chosen value c_n is 0.11839898219582"

plot(est.cv)

References

Patra, Rohit Kumar, and Bodhisattva Sen. 2016. “Estimation of a Two-Component Mixture Model with Applications to Multiple Testing.” J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 (4): 869–93. doi:10.1111/rssb.12148.