Commit 5d1db82b by Ying-Qiu Zheng

@@ -2,7 +2,7 @@
 ### Graphical models
 ![diagram1](/figs/2021JUL01/diagram-20210630-2.png)
 ### Panel A - (the most basic) model formulation (with classical ARD priors)
-The model for high quality data classification follows a regression form with ARD priors. The low-quality model is trained on the posterior distribution of the high quality coefficients $\mathbf{w}^{H}$ to give a set of low-quality coefficients (with ARD priors likewise).
+The model for high quality data classification follows a regression form with ARD priors. The low-quality model is trained by marginalising over the posterior distribution of the high quality coefficients $\mathbf{w}^{H}$ to give (the distribution of) a set of low-quality coefficients (with ARD priors likewise).
 #### On the high quality data.
 - Suppose $\mathbf{X}^{H}$ is the $v\times d$ feature matrix (e.g. connectivity profiles of $v$ voxels), $\mathbf{t}$ is the $v\times 1$ vector of labels (0-1 variables), $\mathbf{w}$ is the $d\times 1$ vector of coefficients, and $\mathbf{y}=\sigma(\mathbf{X}^{H}\mathbf{w})$ gives the class probability for each voxel.
 - Here we adopt the Relevance Vector Machine (RVM) with ARD prior to find $\mathbf{w}$. Suppose $\mathbf{w}$ has a prior distribution $\mathcal{N}(\mathbf{0}, \text{diag}(\alpha_{i}^{-1}))$. We expect $\alpha_{i}$ to be driven to infinity when the associated feature is useless for prediction.
@@ -87,7 +87,13 @@ And we compared three methods:
 When $d \gg n$, Lasso appears superior to the others.
 ### Panel B - with structured ARD priors (in progress).
-Instead of using ARD priors $\mathbf{w}\sim\mathcal{N}(\mathbf{0}, \text{diag}(\alpha_1^{-1},\dots,\alpha_d^{-1}))$, we assume the hyperparameters have an underlying structure, e.g., $\mathbf{w}\sim\mathcal{N}(\mathbf{0}, \text{diag}(\exp(\mathbf{u})))$, where $\mathbf{u}$ is a Gaussian process $\mathbf{u}\sim\mathcal{N}(\mathbf{0}, \mathbf{C}_{\Theta})$ such that neighbouring features (i.e., adjoining/co-activating voxels) share similar sparsity.
+#### On the high quality data
+Instead of using ARD priors $\mathbf{w}\sim\mathcal{N}(\mathbf{0}, \text{diag}(\alpha_1^{-1},\dots,\alpha_d^{-1}))$, we assume the hyperparameters have an underlying structure, e.g., $\mathbf{w}\sim\mathcal{N}(\mathbf{0}, \text{diag}(\exp(\mathbf{u})))$, where $\mathbf{u}$ is a Gaussian Process $\mathbf{u}\sim\mathcal{N}(\mathbf{0}, \mathbf{C}_{\Theta})$ such that neighbouring features (i.e., adjoining/co-activating voxels) share similar sparsity. [ref](https://proceedings.neurips.cc/paper/2014/file/f9a40a4780f5e1306c46f1c8daecee3b-Paper.pdf)
+#### On the low quality data
+The low-quality coefficients have similar structured ARD priors (the exponential of a Gaussian Process), which need not share hyperparameters with the high-quality coefficients' priors. We seek to solve for the hyperparameters of the low-quality classification model, marginalising over the posteriors of the high-quality model.
 ### Panel C - with structured spike-and-slab priors (in progress).
-Similarly, instead of using ARD priors, we assume the coefficients have spike-and-slab priors with latent variables $\gamma_i^{H}, i=1,2,\dots,d$, where $\gamma_i^{H}\sim\text{Bernoulli}(\sigma(\theta))$. The hyperparameter $\theta$ can be a Gaussian Process.
+#### On the high quality data
+Similarly, instead of using ARD priors, we assume the coefficients have spike-and-slab priors with latent variables $\gamma_i^{H}, i=1,2,\dots,d$, where $\gamma_i^{H}\sim\text{Bernoulli}(\sigma(\theta))$. The hyperparameter $\theta$ can be a Gaussian Process. [ref](https://ohbm.sparklespace.net/srh-2591/)
+#### On the low quality data
+The low quality coefficients have similar priors to enforce sparsity.
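
To make the structured priors of Panels B and C concrete, the following is a minimal sketch that draws one sample from each prior. It is an illustration under stated assumptions, not the implementation behind this commit: $\mathbf{C}_{\Theta}$ is taken to be a squared-exponential kernel over 1-D feature positions, $\theta$ is given one value per feature, the slab is a zero-mean Gaussian, and all function names and parameters (`rbf_covariance`, `length_scale`, `slab_std`, ...) are hypothetical. Only the Python standard library is used.

```python
import math
import random


def rbf_covariance(coords, length_scale=2.0, variance=1.0, jitter=1e-6):
    """Assumed form of C_Theta: squared-exponential kernel on 1-D positions."""
    d = len(coords)
    cov = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            sq_dist = (coords[i] - coords[j]) ** 2
            cov[i][j] = variance * math.exp(-0.5 * sq_dist / length_scale ** 2)
        cov[i][i] += jitter  # keeps the Cholesky factorisation numerically stable
    return cov


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_gp(cov, rng):
    """Draw u ~ N(0, cov) as u = L z with z ~ N(0, I)."""
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in cov]
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(cov))]


def sample_structured_ard(cov, rng):
    """Panel B prior: w_i ~ N(0, exp(u_i)) with u ~ N(0, C_Theta)."""
    u = sample_gp(cov, rng)
    # exp(u_i) is a variance, so the standard deviation is exp(u_i / 2).
    w = [rng.gauss(0.0, math.exp(0.5 * u_i)) for u_i in u]
    return u, w


def sample_structured_spike_slab(cov, rng, slab_std=1.0):
    """Panel C prior (assumed per-feature theta_i): gamma_i ~ Bernoulli(sigmoid(theta_i))."""
    theta = sample_gp(cov, rng)
    gamma = [1 if rng.random() < 1.0 / (1.0 + math.exp(-t)) else 0 for t in theta]
    # The spike is a point mass at zero; the slab is an assumed Gaussian.
    w = [g * rng.gauss(0.0, slab_std) for g in gamma]
    return gamma, w


rng = random.Random(0)
cov = rbf_covariance(list(range(20)))  # 20 features on a 1-D grid
u, w_ard = sample_structured_ard(cov, rng)
gamma, w_ss = sample_structured_spike_slab(cov, rng)
```

Because neighbouring entries of `u` (or `theta`) are correlated through the kernel, adjacent features tend to be shrunk or switched off together, which is the "similar sparsity" behaviour the notes describe. A shorter `length_scale` weakens that coupling and approaches independent per-feature priors.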