Rmosek: Regularization
Last updated: 2018-05-15
The \(w\) optimization problem we hope to solve has two constraints. Without them, the optimization is likely to be unstable, especially when we don't know \(L\) for sure beforehand. However, these two constraints are hard to convert directly and strictly into convex, or even tractable, forms.
\[ \begin{array}{rl} \min\limits_{w} & -\sum\limits_{j = 1}^n\log\left(\sum\limits_{l=1}^Lw_l\left(\sum\limits_{k = 1}^K\hat\pi_k f_{jkl}\right) + \sum\limits_{k = 1}^K\hat\pi_kf_{jk0}\right) + \sum\limits_{l = 1}^L\lambda_l^w \phi \left(\left|w_l\right|\right) \\ \text{subject to} & \sum\limits_{l=1}^L w_l \varphi^{(l)}\left(z\right) + \varphi\left(z\right) \geq 0, \forall z\in \mathbb{R}\\ & w_l \text{ decay reasonably fast.} \end{array} \]
One observation is that without the constraints, the optimization becomes unstable as the entries of \(\hat w\) grow uncontrollably large. Therefore, a natural way to replace these two constraints is to bound, penalize, or regularize \(w\), to prevent the coefficients from becoming too large.
On the other hand, as indicated by the theory and by the examples of good fits with Gaussian derivatives, \(w_l\) gets smaller as \(l\) increases. Moreover, oftentimes very small \(w_l\) can still make a difference and thus be indispensable. Therefore, although we need to stop \(\hat w\) from getting too large, we certainly don't want to shrink the coefficients unnecessarily and unwarrantedly to zero.
Ideally, then, the goal of the regularization term \(\sum\limits_{l = 1}^L\lambda_l^w \phi \left(\left|w_l\right|\right)\) should be to keep \(\hat w\) from growing too large without shrinking small but genuinely needed \(w_l\) to zero.
Recall that, in theory, if we write the random empirical density of correlated, marginally \(N\left(0, 1\right)\) samples as
\[ f_0\left(z\right) = \varphi\left(z\right) + \sum\limits_{l = 1}^\infty W_l\varphi^{(l)}\left(z\right) \ , \] then
\[ \text{var}\left(W_l\right) = \frac{\alpha_l}{l!} \ , \] where
\[ \alpha_l = \frac{1}{\frac{n\left(n - 1\right)}{2}}\sum\limits_{i < j}\rho_{ij}^l := \bar{\rho_{ij}^l} \ . \]
Therefore, \(w_l\) should naturally decay roughly on the order of
\[ \left|w_l\right| \leadsto \sqrt{\text{var}\left(W_l\right)} = \frac{\sqrt{\bar{\rho_{ij}^l}}}{\sqrt{l!}} \ . \]
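To get a rough sense of how quickly this scale decays, here is a small illustration that assumes, purely for concreteness, a constant pairwise correlation \(\rho_{ij} \equiv \rho = 0.7\), so that \(\bar{\rho_{ij}^l} = \rho^l\); the numbers are only meant to show the speed of decay, not any particular data set.

```r
## Illustration only: assume a constant pairwise correlation rho_ij = rho,
## so that the pairwise average of rho_ij^l is simply rho^l.
rho <- 0.7
l <- 1:9

## Suggested decay scale for |w_l|: sqrt(rho^l / l!)
scale_w <- sqrt(rho^l / factorial(l))

## Normalized scale |w_l^n| = sqrt(l!) |w_l|, which behaves like sqrt(rho^l)
scale_wn <- sqrt(rho^l)

round(rbind(order = l, raw = scale_w, normalized = scale_wn), 4)
```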
The order of decay suggests that we should work with normalized coefficients, so that
\[ \left|w_l^n\right| = \sqrt{l!}\left|w_l\right| \leadsto \sqrt{\bar{\rho_{ij}^l}} \ . \] This piece of information on the order of magnitude of the normalized \(w_l^n\) provides a hint to determine \(\lambda_l^w\), for example,
Odd orders of Gaussian derivatives are generally associated with mean shift and skewness of \(f_0\), so their coefficients are generally very small; but when they are not exactly zero, they are important however small they are.
Meanwhile, when \(l\) is odd, \(\bar{\rho_{ij}^l}\) could be very small, and not on the order of \(\rho^l\) for some \(\rho \in \left(0, 1\right)\), so it is very difficult to come up with a good \(\lambda_l^w\) for odd \(l\).
Therefore, it might be better to leave the odd orders alone and regularize the even orders only.
Generally speaking, \(l_1\) regularization imposes sparsity by shrinking small estimates exactly to zero, which is both good and bad for our case, whereas \(l_2\) regularization penalizes large estimates severely but does not force exact sparsity, which can also be good or bad.
We’ll implement both and see what happens.
\(\lambda_l^w\) is determined to help ensure
\[ \left|w_l^n\right| = \sqrt{l!}\left|w_l\right| \leadsto \sqrt{\bar{\rho_{ij}^l}} \ , \]
Given \(\rho \in \left(0, 1\right)\), for \(l_1\) regularization, \(\lambda_l^n \sim \lambda / \sqrt{\rho^l}\); for \(l_2\) regularization, \(\lambda_l^n \sim \lambda / \rho^l\).
So far, \(L = 9\) has been able to handle all the examples we've tried. Without constraints or regularization, \(L = 9\) is usually too large for cases where, say, \(L = 4\) is enough, and a larger-than-necessary \(L\) can lead to numerical instability.
Hence, we'll experiment with \(L = 12\), assuming it's enough for all the correlation-induced distortions in practice, and assuming the regularization can keep the optimization from going crazy.
The \(w\) optimization problem becomes
\[ \begin{array}{rl} \min\limits_{w, g} & -\sum\limits_{j = 1}^n \log\left(g_j\right) + \text{Regularization} \\ \text{subject to} & Aw + a = g \\ & g \geq 0\ , \end{array} \] where \(A_{n \times L} = \left[a_1,\ldots, a_L\right]\) and \(a\) are computed with normalized Gaussian derivatives and Hermite polynomials.
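As a sketch of how \(A\) and \(a\) might be assembled, the snippet below assumes the simplest setting in which \(f_0\) is fit directly to the observed \(z\)-scores (no ash mixture components \(\hat\pi_k\), \(f_{jkl}\)), so that \(a_j = \varphi(z_j)\) and the \(l\)-th column of \(A\) is the normalized derivative \(\varphi^{(l)}(z_j)/\sqrt{l!}\), computed via \(\varphi^{(l)}(z) = (-1)^l h_l(z)\varphi(z)\) with the probabilists' Hermite polynomials \(h_l\). The helper name `make_A_a` is made up for this sketch.

```r
## A minimal sketch, assuming f_0 is fit directly to observed z-scores z:
##   a_j  = phi(z_j)
##   A_jl = phi^{(l)}(z_j) / sqrt(l!)   (normalized Gaussian derivatives)
## using phi^{(l)}(z) = (-1)^l h_l(z) phi(z), h_l = probabilists' Hermite.
make_A_a <- function(z, L) {
  n <- length(z)
  ## Hermite recursion: h_1(z) = z, h_2(z) = z^2 - 1,
  ## h_l(z) = z h_{l-1}(z) - (l - 1) h_{l-2}(z)
  H <- matrix(0, n, L)
  H[, 1] <- z
  if (L >= 2) H[, 2] <- z^2 - 1
  if (L >= 3) {
    for (l in 3:L) {
      H[, l] <- z * H[, l - 1] - (l - 1) * H[, l - 2]
    }
  }
  a <- dnorm(z)
  A <- sapply(1:L, function(l) (-1)^l * H[, l] * a / sqrt(factorial(l)))
  list(A = A, a = a)
}

set.seed(777)
Aa <- make_A_a(rnorm(1e4), L = 12)
dim(Aa$A)  # 10000 x 12
```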
Let \(\lambda > 0\) and \(\rho \in \left(0, 1\right)\). The regularization term takes one of the two forms below: first the \(l_1\) penalty, then the \(l_2\) penalty.
\[ \begin{array}{l} \sum\limits_{l = 1}^L\lambda_l^{w^s}\left|w_l^s\right| \ ,\\ \lambda_l^{w^s} = \begin{cases} 0, & l \text{ is odd;} \\ \lambda / \rho^{l/2}, & l \text{ is even.} \end{cases} \end{array} \]
\[ \begin{array}{l} \sum\limits_{l = 1}^L \lambda_l^{w^s}{w_l^s}^2 \ ,\\ \lambda_l^{w^s} = \begin{cases} 0, & l \text{ is odd;} \\ \lambda / \rho^{l}, & l \text{ is even.} \end{cases} \end{array} \]
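As a concrete sketch, the penalty weights \(\lambda_l^{w^s}\) under both schemes can be assembled as follows; they are zero on odd orders and grow geometrically on even orders. The helper name `penalty_weights` is made up here.

```r
## Penalty weights lambda_l: zero on odd orders, geometric growth on even orders.
penalty_weights <- function(L, lambda, rho, type = c("l1", "l2")) {
  type <- match.arg(type)
  l <- 1:L
  even_weight <- if (type == "l1") lambda / rho^(l / 2) else lambda / rho^l
  ifelse(l %% 2 == 1, 0, even_weight)
}

penalty_weights(L = 12, lambda = 0.1, rho = 0.9, type = "l1")
penalty_weights(L = 12, lambda = 0.1, rho = 0.9, type = "l2")
```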
For \(l_1\) regularization, the primal form is
\[ \begin{array}{rl} \min\limits_{w, g} & -\sum\limits_{j = 1}^n \log\left(g_j\right) + \sum\limits_{l = 1}^L\lambda_l^{w^s}\left|w_l^s\right| \\ \text{subject to} & Aw + a = g \\ & g \geq 0\ , \end{array} \] The dual form is
\[ \begin{array}{rl} \min\limits_{v} & -\sum\limits_{j = 1}^n \log\left(v_j\right) + a^Tv - n \\ \text{subject to} & -\lambda \leq A^Tv \leq \lambda \\ & v \geq 0\ . \end{array} \]
For \(l_2\) regularization, the primal form is
\[ \begin{array}{rl} \min\limits_{w, g} & -\sum\limits_{j = 1}^n \log\left(g_j\right) + \sum\limits_{l = 1}^L\lambda_l^{w^s}{w_l^s}^2 \\ \text{subject to} & Aw + a = g \\ & g \geq 0\ , \end{array} \] The dual form is
\[ \begin{array}{rl} \min\limits_{v} & -\sum\limits_{j = 1}^n \log\left(v_j\right) + a^Tv - n + \frac14 v^T \left(A \Lambda^{-1} A^T\right)v \\ \text{subject to} & a_j^Tv = 0 \ \text{ if }\lambda_j = 0 \ ,\\ & v \geq 0\ , \end{array} \] where \(\Lambda = \begin{bmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_L \end{bmatrix}\), and \({\Lambda^{-1}}_{jj} = 0\) when \(\lambda_j = 0\).
Let’s start with \(\rho = 0.9\) and \(\lambda = 0.1\).
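Before turning to the Rmosek fits, here is a minimal sketch of the \(l_1\)-regularized primal problem written with CVXR, purely to make the problem structure concrete; the analysis itself is done with Rmosek. It reuses the made-up helpers `make_A_a()` and `penalty_weights()` sketched above, with \(L = 12\), \(\rho = 0.9\), \(\lambda = 0.1\), and an independent-normal sample standing in for correlated \(z\)-scores.

```r
## A sketch only: the l1-regularized primal via CVXR, for illustration;
## the actual fits use Rmosek. Relies on the hypothetical helpers
## make_A_a() and penalty_weights() defined in the sketches above.
library(CVXR)

set.seed(777)
z <- rnorm(1e3)  # stand-in for a vector of correlated z-scores
Aa <- make_A_a(z, L = 12)
lam <- penalty_weights(L = 12, lambda = 0.1, rho = 0.9, type = "l1")

w <- Variable(12)
g <- Aa$A %*% w + Aa$a
obj <- Minimize(-sum(log(g)) + sum(lam * abs(w)))
fit <- solve(Problem(obj))  # g >= 0 is implied by the domain of log
round(fit$getValue(w), 4)
```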