Last updated: 2018-05-15

Code version: 388e65e

Introduction

Now we move on to handling the \(t\) likelihood. Suppose we have observations \[ (\hat\beta_1, \hat s_1, \nu_1), \ldots, (\hat\beta_n, \hat s_n, \nu_n), \] where \((\hat\beta_i, \hat s_i)\) are the summary statistics and \(\nu_i\) is the degrees of freedom used to estimate \(\hat s_i\). We now consider \(\hat\beta_j\) and \(\hat s_j\) as jointly generated by the following data-generating mechanism.

\[ \begin{array}{c} \hat\beta_j |\beta_j, s_j \sim N(\beta_j, s_j^2) \\ \hat s_j^2 / s_j^2 |s_j, \nu_j \sim \chi_{\nu_j}^2 / \nu_j\\ \hat\beta_j \perp \hat s_j | \beta_j, s_j, \nu_j \end{array} \Rightarrow \begin{array}{c} (\hat\beta_j - \beta_j) / \hat s_j | \beta_j\sim t_{\nu_j}\\ \hat\beta_j / \hat s_j | \beta_j, s_j \sim t_{\nu_j}(\beta_j / s_j) \end{array} \] where “\(\perp\)” means “conditionally independent of,” and \(t_{\nu}(\mu)\) is the noncentral \(t\) distribution with \(\nu\) degrees of freedom and noncentrality parameter \(\mu\); \(t_{\nu}(0) = t_\nu\) is the standard (central) \(t\) distribution.
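
As a quick sanity check (not part of the derivation above), here is a small R sketch that simulates from this data-generating mechanism with arbitrary parameter values and verifies that the pivot \((\hat\beta_j - \beta_j)/\hat s_j\) indeed behaves like a \(t_{\nu_j}\) random variable.

set.seed(777)
n = 1e5
nu = 5                      # degrees of freedom for estimating s_j
beta = 1; s = 2             # arbitrary true effect and true standard deviation
# betahat | beta, s ~ N(beta, s^2); sehat^2 / s^2 ~ chi^2_nu / nu, independently
betahat = rnorm(n, mean = beta, sd = s)
sehat = s * sqrt(rchisq(n, df = nu) / nu)
# The pivot (betahat - beta) / sehat should follow t_nu;
# compare simulated and theoretical quantiles.
pivot = (betahat - beta) / sehat
rbind(simulated   = quantile(pivot, c(0.05, 0.5, 0.95)),
      theoretical = qt(c(0.05, 0.5, 0.95), df = nu))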

Models for \(t\) likelihood

Since we now take into account the randomness of both \(\hat\beta_j\) and \(\hat s_j\), or at least the fact that \(\hat s_j\) is not a precise measurement of \(s_j\), the model becomes trickier. Ideally we would hope to use the distribution

\[ (\hat\beta_j - \beta_j) / \hat s_j | \beta_j\sim t_{\nu_j} \]

but it’s not clear how to use it. We need a likelihood for the data, either for \(\hat\beta_j\) alone or for \((\hat\beta_j, \hat s_j)\) jointly, but the aforementioned distribution doesn’t give us such a likelihood directly, as it specifies neither \(p(\hat\beta_j | \beta_j, \hat s_j)\) nor \(p(\hat\beta_j, \hat s_j | \beta_j)\).

1. Current implementation in ashr

The current implementation in ashr uses a simplification

\[ \hat\beta_j | \beta_j, \hat s_j \sim \beta_j + \hat s_j t_{\nu_j}. \] As Matthew noted, this is different from \((\hat\beta_j - \beta_j) / \hat s_j | \beta_j\sim t_{\nu_j}\). For one thing, under this approximation

\[ \hat\beta_j / \hat s_j | \beta_j, \hat s_j \sim \beta_j / \hat s_j + t_{\nu_j} \]

should be unimodal and symmetric, whereas the “true” distribution

\[ \hat\beta_j / \hat s_j | \beta_j, s_j \sim t_{\nu_j}(\beta_j / s_j) \] is a noncentral \(t\), which is not symmetric. This simplification might therefore cause problems when moving to truncash, where we need to consider the probability that \(|\hat\beta_j / \hat s_j|\) falls below some threshold (more on that below). However, it works satisfactorily with ashr in practice, and there seem to be no obviously better alternatives, as detailed below.
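
To see the difference concretely, the following sketch (with illustrative values of my choosing, and taking \(\hat s_j = s_j\) so the two densities are directly comparable) plots the density of \(\hat\beta_j / \hat s_j\) under the ashr approximation, a central \(t_{\nu_j}\) shifted by \(\beta_j / \hat s_j\), against the noncentral \(t_{\nu_j}(\beta_j / s_j)\) density.

nu = 3; beta = 2; s = 1     # illustrative values; sehat taken equal to s
z = seq(-2, 8, by = 0.01)
# ashr approximation: betahat / sehat | beta, sehat ~ beta / sehat + t_nu (symmetric)
dens.approx = dt(z - beta / s, df = nu)
# "true" distribution: betahat / sehat | beta, s ~ noncentral t_nu(beta / s) (asymmetric)
dens.true = dt(z, df = nu, ncp = beta / s)
matplot(z, cbind(dens.approx, dens.true), type = "l", lty = 1:2, col = 1,
        ylab = "density", main = "shifted central t vs noncentral t")
legend("topright", c("shifted central t", "noncentral t"), lty = 1:2)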

2. Pivotal likelihood

Using the fact that

\[ (\hat\beta_j - \beta_j) / \hat s_j | \beta_j\sim t_{\nu_j} \] and assuming \(\beta_j\) follows a mixture of uniforms

\[ \beta_j \sim \sum_k\pi_k \text{Unif}[a_k, b_k], \] we can integrate out \(\beta_j\) by convolving \(t_{\nu_j}\) with each uniform component \(\text{Unif}[a_k, b_k]\):

\[ \begin{array}{rl} &\int p((\hat\beta_j - \beta_j) / \hat s_j | \beta_j) p(\beta_j)\text{d}\beta_j \\ =&\int p((\hat\beta_j - \beta_j) / \hat s_j | \beta_j) p(\beta_j|\beta_j\sim \sum_k\pi_k\text{Unif}[a_k, b_k])\text{d}\beta_j\\ =&\sum_k\pi_k\int p((\hat\beta_j - \beta_j) / \hat s_j | \beta_j) p(\beta_j|\beta_j\sim \text{Unif}[a_k, b_k])\text{d}\beta_j\\ =&\sum_k\pi_k \begin{cases} \frac{\hat s_j}{b_k - a_k}\left(F_{t_{\nu_j}}((\hat\beta_j - a_k) / \hat s_j) - F_{t_{\nu_j}}((\hat\beta_j - b_k) / \hat s_j)\right), & a_k < b_k \\ f_{t_{\nu_j}}((\hat\beta_j - a_k) / \hat s_j), & a_k = b_k. \end{cases} \end{array} \] This is mathematically feasible, yet I cannot quite pin down the meaning of this “expected” probability density \(\int p((\hat\beta_j - \beta_j) / \hat s_j | \beta_j) p(\beta_j)\text{d}\beta_j\) or how to use it. It turns out to be related to the pivotal likelihood idea, yet as Matthew put it, “ultimately we did not find it satisfying. It is not a well established concept and it is not clear to me that it ends up being a good idea.”
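
For concreteness, here is a minimal R sketch of this convolution for a single observation; the function name and interface are my own illustration, not part of truncash or ashr.

# "Expected" pivotal density under a mixture-of-uniforms prior on beta:
# sum_k pi_k * (component term), with a point-mass component when a_k = b_k.
pivotal.lik = function(betahat, sehat, nu, pi, a, b) {
  comp = numeric(length(pi))
  pm = (a == b)   # point-mass components
  comp[pm] = dt((betahat - a[pm]) / sehat, df = nu)
  comp[!pm] = sehat / (b[!pm] - a[!pm]) *
    (pt((betahat - a[!pm]) / sehat, df = nu) -
     pt((betahat - b[!pm]) / sehat, df = nu))
  sum(pi * comp)
}
# Example: three components, the first a point mass at 0.
pivotal.lik(betahat = 1.5, sehat = 1, nu = 5,
            pi = c(0.5, 0.3, 0.2), a = c(0, -1, -3), b = c(0, 1, 3))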

3. Joint ash: Jointly modeling \(\hat\beta\) and \(\hat s\)

Taking advantage of the conditional independence of \(\hat\beta\) and \(\hat s\) given \(\beta\) and \(s\), we can write the model as

\[ \begin{array}{c} p(\hat\beta_j, \hat s_j|\beta_j, s_j, \nu_j) = p(\hat\beta_j|\beta_j, s_j)p(\hat s_j|s_j, \nu_j)\\ \hat\beta_j|\beta_j, s_j \sim N(\beta_j, s_j^2)\\ \hat s_j^2|s_j, \nu_j \sim s_j^2\chi_{\nu_j}^2 /\nu_j\\ \beta_j \sim \sum_k\pi_kg_k^\beta\\ s_j \sim \sum_l\rho_lg_l^s \end{array} \]

This line of “joint ash” is being done by Mengyin.
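
For illustration only, simulating from this joint model might look like the sketch below; the mixture components standing in for \(g^\beta\) and \(g^s\) are arbitrary choices of mine, not anything fitted.

set.seed(777)
n = 1000; nu = 5
# beta_j from an arbitrary sparse mixture (stand-in for g^beta);
# s_j from an arbitrary log-normal distribution (stand-in for g^s).
beta = rnorm(n, mean = 0, sd = 2) * rbinom(n, 1, 0.2)
s = exp(rnorm(n, mean = 0, sd = 0.5))
# Conditionally independent observations given (beta_j, s_j, nu_j).
betahat = rnorm(n, mean = beta, sd = s)
sehat = s * sqrt(rchisq(n, df = nu) / nu)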

4. Sequential joint ash

The “better” approach is the one that Mengyin now takes: first apply vash to shrink the \(\hat s\), and then apply ashr with its currently implemented \(t\) likelihood (taking \(\hat s\) as given), using the moderated \(\hat s\) (and moderated df). As Matthew noted, this approach can be formally justified, although the justification is not obvious. Presumably the reason is that, since \(\hat\beta\) and \(\hat s\) are conditionally independent given \(\beta\) and \(s\), we can model them separately and sequentially.
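
A rough sketch of this sequential idea follows. I am assuming vash() from Mengyin's vashr package takes the estimated standard errors and degrees of freedom and returns moderated standard errors in a field called sd.post; that field name (and whether moderated df are also returned) is an assumption to be checked against the package documentation. The second step is then plain ashr with its \(t\) likelihood.

# Simulated data for illustration only.
set.seed(777)
n = 1000; nu = 5
beta = rnorm(n, mean = 0, sd = 2) * rbinom(n, 1, 0.2)
s = exp(rnorm(n, mean = 0, sd = 0.5))
betahat = rnorm(n, mean = beta, sd = s)
sehat = s * sqrt(rchisq(n, df = nu) / nu)
# Step 1: shrink the standard errors with vash.
# NOTE: the output field name sd.post is an assumption; check ?vashr::vash.
fit.vash = vashr::vash(sehat = sehat, df = nu)
sehat.mod = fit.vash$sd.post
# Step 2: plain ashr with the t likelihood, taking the moderated standard errors
# (and, if vash provides them, moderated df) as given.
fit.ash = ashr::ash(betahat, sehat.mod, df = nu, mixcompdist = "uniform")
ashr::get_pi0(fit.ash)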

5. Matthew’s recommendation

So the bottom line is that for truncash I think it suffices to implement the same \(t\) approach as ashr, since then we can use the same trick as in 4. Of course, for testing the implementation you will want to simulate from the assumed model

\[ \hat\beta_j | \beta_j, \hat s_j \sim \beta_j + \hat s_j t_{\nu_j} \]

Moving to truncash

Problem setting

As in the normal likelihood case, suppose we have \(m + n\) observations of \((\hat\beta_j, \hat s_j, \nu_j)\) in two groups such that, with a pre-specified threshold \(t_j\) (related to \(\nu_j\)) for each observation,

\[ \text{Group 1: }(\hat\beta_1, \hat s_1), \ldots, (\hat\beta_m, \hat s_m), \text{with } |\hat\beta_j/\hat s_j| \leq t_j \]

\[ \text{Group 2: }(\hat\beta_{m+1}, \hat s_{m+1}), \ldots, (\hat\beta_{m+n}, \hat s_{m+n}), \text{with } |\hat\beta_j/\hat s_j| > t_j \]

For Group 1, we’ll only use the information that for each one, \(|\hat\beta_j/\hat s_j| \leq t_j\); that is, they are moderate observations. For Group 2, we’ll use the full observation \((\hat\beta_j, \hat s_j, \nu_j)\).

The extreme group: business as usual

Now for Group 2, the extreme group where we observe the complete summary statistics, we have the usual ASH with an approximate \(t_{\nu_j}\) likelihood and a uniform mixture prior

\[ \begin{array}{c} \hat\beta_j | \beta_j, \hat s_j \sim \beta_j + \hat s_j t_{\nu_j}\\ \beta_j \sim \sum_k\pi_k \text{Unif}[a_k, b_k] \end{array} \]

The moderate group: two possible ways

For Group 1, the moderate group where the only relevant information is \(|\hat \beta_j / \hat s_j| \leq t_j\), we still use the same uniform mixture prior

\[ \beta_j \sim \sum_k\pi_k \text{Unif}[a_k, b_k], \] yet there are two possible likelihoods. One comes from the current ashr implementation,

\[ \hat\beta_j | \beta_j, \hat s_j \sim \beta_j + \hat s_j t_{\nu_j} \Rightarrow \hat\beta_j / \hat s_j | \beta_j, \hat s_j \sim \beta_j / \hat s_j + t_{\nu_j} \] based on a standard \(t\). The other approach comes from the fact

\[ \hat\beta_j / \hat s_j | \beta_j, s_j \sim t_{\nu_j}(\beta_j / s_j) \approx t_{\nu_j}(\beta_j / \hat s_j) \]

based on a noncentral \(t\). Both involve some simplification and approximation, and presumably the choice shouldn't make much difference in practice.
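
To get a sense of how close the two likelihoods are for the moderate group, the sketch below computes \(P(|\hat\beta_j/\hat s_j| \leq t_j \mid \beta_j)\) under both, for a few illustrative values of \(\beta_j\) (an illustration of mine, not truncash code).

nu = 5
t.j = qt(0.975, df = nu)   # threshold on |betahat / sehat|
sehat = 1
beta = seq(0, 3, by = 0.5)
# (a) shifted central t: betahat / sehat ~ beta / sehat + t_nu
p.central = pt(t.j - beta / sehat, df = nu) - pt(-t.j - beta / sehat, df = nu)
# (b) noncentral t: betahat / sehat ~ t_nu(ncp = beta / sehat)
p.noncentral = pt(t.j, df = nu, ncp = beta / sehat) -
               pt(-t.j, df = nu, ncp = beta / sehat)
round(cbind(beta, p.central, p.noncentral), 4)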

The rest: back to business as usual

Then we put both groups together and estimate the \(\pi_k\) by maximizing the marginal probability (likelihood) of the data from both groups.
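
Schematically, if we collect into one matrix the per-component marginal likelihoods for the extreme group and the per-component moderation probabilities \(P(|\hat\beta_j/\hat s_j| \leq t_j \mid \beta_j \sim \text{Unif}[a_k, b_k])\) for the moderate group, the mixture proportions can be estimated by maximizing the combined log-likelihood, for instance with a plain EM update as in the generic sketch below (truncash's actual optimizer may differ).

# L: (m + n) x K matrix; row j holds, for each prior component k, either the
# marginal likelihood of the full observation (extreme group) or the
# probability of being moderate (moderate group).
estimate.pi = function(L, maxiter = 1000, tol = 1e-8) {
  K = ncol(L)
  pi.hat = rep(1 / K, K)
  for (i in seq_len(maxiter)) {
    W = sweep(L, 2, pi.hat, "*")   # E-step: component responsibilities
    W = W / rowSums(W)
    pi.new = colMeans(W)           # M-step: update mixture proportions
    if (max(abs(pi.new - pi.hat)) < tol) break
    pi.hat = pi.new
  }
  pi.hat
}
# Toy example: 5 observations, 2 components.
estimate.pi(matrix(runif(10, 0.1, 1), nrow = 5))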

Code

Simulation

source("../code/truncash.t.R")
source("../code/truncash.R")
betahat = rt(100, df = 3)
sebetahat = rep(1, 100)
fit.normal.original = truncash(betahat, sebetahat, t = qnorm(0.975))
get_pi0(fit.normal)
fit.normal.t = truncash.t(betahat, sebetahat, pval.thresh = 0.05, df = rep(2, 100), method = "fdr", mixcompdist = "uniform")
ashr::ash.workhorse(betahat, sebetahat, fixg = TRUE, g = fit.normal.t)

Session Information

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] workflowr_1.0.1   Rcpp_0.12.16      digest_0.6.15    
 [4] rprojroot_1.3-2   R.methodsS3_1.7.1 backports_1.1.2  
 [7] git2r_0.21.0      magrittr_1.5      evaluate_0.10.1  
[10] stringi_1.1.6     whisker_0.3-2     R.oo_1.21.0      
[13] R.utils_2.6.0     rmarkdown_1.9     tools_3.4.3      
[16] stringr_1.3.0     yaml_2.1.18       compiler_3.4.3   
[19] htmltools_0.3.6   knitr_1.20       

This reproducible R Markdown analysis was created with workflowr 1.0.1