R Code for Nonparametric Notion of Residual and Test for Conditional Independence

In this article, we discuss the R implementation of the testing procedure for the test of conditional independence of \(X, Y\) given \(Z\). The codes provided in this page can only be used when \(X\) and \(Y\) are real valued and \(Z \in \mathbb{R}^d\) for \(d\geq 1\).

In the following, we assume \(d=2\) and generate \(n=100\) i.i.d. copies of \((X,Y,Z)\) from the simulation scenario used in Section 4 of Patra, Sen, and Székely (2015). We assume that \(Z\sim N(0, \sigma_z^2 \textbf{I}_{d\times d})\), \(X=W+Z_1+\epsilon\), and \(X=W+Z_1+\epsilon'\), where \(Z_1\) is the first coordinate of \(Z\) and \(\epsilon, \epsilon^\prime,\) and \(W\) are three independent mean zero Gaussian random variables. Moreover, we assume that \(\epsilon, \epsilon^\prime,\) and \(W\) are independent of \(Z,\) and \(var(\epsilon)=var(\epsilon^\prime)=\sigma^2_E,\) and \(var(W)= \sigma^2_W\). Note that \(X\) and \(Y\) are conditionally independent of \(Z\) only when \(\sigma_W=0\). In the following, we fixed \(\sigma_W\) to be \(0.1\).

n <-100
d <- 2
sigma.Z <- 0.3
sigma.W <- 0.1
sigma.E <- 0.2
Z <- matrix(rnorm(n*d,0,sigma.Z),nr=n, nc=d)
W <- rnorm(n,0,sigma.W)
eps <- rnorm(n,0,sigma.E)
eps.prime <- rnorm(n,0,sigma.E)
X <- W + Z[,1] +eps
Y <- W + Z[,1] +eps.prime
Data <- cbind(X,Y,Z)
colnames(Data) <- c('X', 'Y', paste('Z.', 1:d,sep=''))
head(Data)

##               X           Y        Z.1          Z.2
## [1,] -0.6209561 -0.35011790 -0.5453813 -0.008216948
## [2,] -0.5213963 -0.43784900 -0.3721414  0.085239240
## [3,]  0.1374818  0.08060723  0.2268859 -0.131199999
## [4,] -0.5639182 -0.27728940 -0.3826270 -0.123249106
## [5,] -0.8627586 -0.51166254 -0.3830763  0.281173962
## [6,] -0.1749498 -0.37170336 -0.2509368 -0.111071919

The following block of code computes the test statistic \(\hat{\mathcal{E}}_n\); see (3.7) of Patra, Sen, and Székely (2015).

source('Npres_Fucntions.R')
test.stat <- npresid.statistics(Data,d)

Recall we reject the null hypothesis of conditional independence for large values of \(\hat{\mathcal{E}}_n\). However, limiting behavior of \(\hat{\mathcal{E}}_n\) is unknown. In Section 3.2.1 of Patra, Sen, and Székely (2015) we proposed a model based bootstrap procedure to approximate the asymptotic distribution and evaluate the \(p\)-value for the test. In the following ``boot.replic" denotes the number of bootstrap replications. We recommend using a bootstrap replication of size \(1000.\)

out <- npresid.boot(Data,d,boot.replic=50)

## [1] "Starting bootstrap"
## [1] "50 bootstrap samples obtained"
## [1] "At bootstrap iteration 25 of 50"
## [1] "At bootstrap iteration 50 of 50"

str(out, max.level = 1)

## List of 7
##  $ statistic            : num 2.13
##  $ p.value              : num 0.88
##  $ method               : chr "Cond Indep test: p-values by inverting F_hat to get bootstrap samples"
##  $ bandwidth.method     : chr "least-squares cross-validation, see \"np\" package"
##  $ data.desrip          : chr "dimension of Z is 2, sample size 100, dimension of (X,Y,Z) is 4, bootstrap replication number 50"
##  $ bootstrap.stat.values: num [1:50] 2.03 7.15 4.13 6.62 2.18 ...
##  $ cond.dist.obj        :List of 6

Here ``cond.dist.obj’’ is the the list containing estimators of \(F_{X|Z}\), \(F_{Y|Z}\), and \(F_Z\) evaluated at the data points (denoted by F.x_z, F.y_z, and F.z_hat) and the bandwidth used (denoted by Fbw.x_z, Fbw.y_z, and Fbw.z_z) to evaluate the conditional distribution functions. Note that we use the functions available in the “np’’ package ( see Hayfield and Racine (2008)) to compute the optimal bandwidths as well as the estimates of the conditional distribution functions.

str(out$cond.dist.obj, max.level = 1)

## List of 6
##  $ F.x_z  : num [1:100] 0.2588 0.3201 0.3829 0.271 0.0433 ...
##  $ F.y_z  : num [1:100] 0.739 0.362 0.444 0.683 0.242 ...
##  $ Fbw.x_z:List of 64
##   ..- attr(*, "class")= chr "condbandwidth"
##  $ Fbw.y_z:List of 64
##   ..- attr(*, "class")= chr "condbandwidth"
##  $ Fbw.z_z:List of 2
##  $ F.z_hat: num [1:100, 1:2] 0.0589 0.1595 0.7888 0.1512 0.1508 ...

The estimated \(p\)-value of the test procedure is given through

p.value <- out$p.value

The required R file ``Npres_Fucntions.R’’ can be downloaded here. Note the functions require the following R-packages: boot, data.table, energy, np, and stats.

# References

Hayfield, Tristen, and Jeffrey S. Racine. 2008. “Nonparametric Econometrics: The Np Package.” Journal of Statistical Software 27 (5). http://www.jstatsoft.org/v27/i05/.

Patra, Rohit K., Bodhisattva Sen, and Gábor Székely. 2015. “A Consistent Bootstrap Procedure for the Maximum Score Estimator.” Statist. Probab. Lett. (to Appear). http://arxiv.org/abs/1409.3886.

R Code for Conditional Independence Test Proposed in Patra, Sen, and Székely (2015)