Ying-Qiu Zheng committed Jul 21, 2021

## Fusion of high-quality and low-quality data - GMM-based implementation

![diagram](/figs/2021JUL21/diagram-20210721.png)

### Model formulation

Suppose $\mathbf{X}^{H}, \mathbf{X}^{L}$ are $N \times V$ feature matrices (e.g. connectivity between $N$ thalamus voxels and $V$ whole-brain voxels). Note that they can have different dimensions in practice; to keep the notation uncluttered, we suppose the numbers of voxels in the high- and low-quality images are the same for a given subject. We now assume $\mathbf{X}^{H}$ and $\mathbf{X}^{L}$ share the same latent variable $\mathbf{Y}$, an $N \times K$ binary matrix representing the voxels' classes.

For a single voxel, suppose $\mathbf{y}_{n} \sim \text{multinomial}(\mathbf{\pi})$ and $p(\mathbf{x}^{L}_{n}|y_{nk}=1) = \mathcal{N}(\mathbf{x}^{L}_{n}|\mathbf{\mu}_{k}, \mathbf{\Sigma}_{k}^{L})$. To use the high-quality data to inform the inference on the low-quality data, we assume $p(\mathbf{x}^{H}_{n}|y_{nk}=1, \mathbf{U}) = \mathcal{N}(\mathbf{U}\mathbf{x}^{H}_{n}|\mathbf{\mu}_{k}, \mathbf{\Sigma}_{k}^{H})$, where $\mathbf{U}^{T}\mathbf{U} = \mathbf{I}$. The complete-data likelihood of a single voxel can be written as

```math
p(\mathbf{x}^{H}_{n}, \mathbf{x}^{L}_{n}, \mathbf{y}_{n}|\mathbf{U}, \mathbf{\pi}, \mathbf{\mu}_{1:K}, \mathbf{\Sigma}^{H}_{1:K}, \mathbf{\Sigma}^{L}_{1:K}) = \prod_{k=1}^{K}\left(\pi_{k}\,\mathcal{N}(\mathbf{x}^{L}_{n}|\mathbf{\mu}_{k},\mathbf{\Sigma}_{k}^{L})\,\mathcal{N}(\mathbf{U}\mathbf{x}^{H}_{n}|\mathbf{\mu}_{k},\mathbf{\Sigma}_{k}^{H})\right)^{y_{nk}}
```

The marginal distribution of $\mathbf{x}_{n}^{L}, \mathbf{x}_{n}^{H}$ is

```math
p(\mathbf{x}^{H}_{n}, \mathbf{x}^{L}_{n}|\mathbf{U}, \mathbf{\pi}, \mathbf{\mu}_{1:K}, \mathbf{\Sigma}^{H}_{1:K}, \mathbf{\Sigma}^{L}_{1:K}) = \sum_{k=1}^{K}\pi_{k}\,\mathcal{N}(\mathbf{x}^{L}_{n}|\mathbf{\mu}_{k},\mathbf{\Sigma}_{k}^{L})\,\mathcal{N}(\mathbf{U}\mathbf{x}^{H}_{n}|\mathbf{\mu}_{k},\mathbf{\Sigma}_{k}^{H})
```

In summary, in addition to finding the hyper-parameters $\mathbf{\pi}, \mathbf{\mu}_{k}, \mathbf{\Sigma}_{k}^{H}, \mathbf{\Sigma}^{L}_{k}$, we want to estimate a transformation matrix $\mathbf{U}$ such that $\mathbf{U}\mathbf{X}^{H}$ is as close to $\mathbf{X}^{L}$ as possible (or vice versa).

### Pseudo code - Algorithm 1. EM for the Fusion of GMMs

1. Run K-means clustering on the high-quality data to generate the assignment of the voxels $R^{(0)}$.
2. Initialise the means $\mu_{k}^{L}$, $\mu_{k}^{H}$, covariances $\Sigma_{k}^{L}$, $\Sigma_{k}^{H}$, and mixing coefficients $\pi_k$ using the K-means assignment $R^{(0)}$, and evaluate the initial likelihood.
3. Initialise the transformation matrix $\mathbf{U} = \mathbf{M}\mathbf{N}^{T}$, where $\mathbf{M}\mathbf{D}\mathbf{N}^{T}$ is the SVD of $\sum_{k=1}^{K}\mu_{k}^{L}(\mu_{k}^{H})^{T}$.
4. For iteration $= 1, 2, \ldots$, do
   - **E-step.** Evaluate the responsibilities using the current parameter values
     - $\gamma(y_{nk}) = \frac{\pi_{k}\mathcal{N}(\mathbf{x}^{L}_{n} | \mu_{k}^{L}, \Sigma_{k}^{L})\mathcal{N}(\mathbf{U}\mathbf{x}^{H}_{n} | \mu_{k}^{L}, \Sigma_{k}^{H})}{\sum_{j=1}^{K}\pi_{j}\mathcal{N}(\mathbf{x}^{L}_{n} | \mu_{j}^{L}, \Sigma_{j}^{L})\mathcal{N}(\mathbf{U}\mathbf{x}^{H}_{n} | \mu_{j}^{L}, \Sigma_{j}^{H})}$
   - **M-step.** Re-estimate the parameters using the current responsibilities by setting the derivatives to zero, with $N_{k} = \sum_{n=1}^{N}\gamma(y_{nk})$
     - $\mu_{k}^{L} = \frac{1}{N_{k}}\left((\Sigma^{H}_{k})^{-1} + (\Sigma^{L}_{k})^{-1}\right)^{-1}\sum_{n=1}^{N}\gamma(y_{nk})\left((\Sigma_{k}^{H})^{-1}\mathbf{U}\mathbf{x}^{H}_{n} + (\Sigma_{k}^{L})^{-1}\mathbf{x}_{n}^{L}\right)$
     - $\Sigma_{k}^{L} = \frac{1}{N_{k}}\sum_{n=1}^{N}\gamma(y_{nk})(\mathbf{x}^{L}_{n} - \mathbf{\mu}_{k}^{L})(\mathbf{x}^{L}_{n} - \mathbf{\mu}_{k}^{L})^{T}$
     - $\Sigma_{k}^{H} = \frac{1}{N_{k}}\sum_{n=1}^{N}\gamma(y_{nk})(\mathbf{U}\mathbf{x}^{H}_{n} - \mathbf{\mu}_{k}^{L})(\mathbf{U}\mathbf{x}^{H}_{n} - \mathbf{\mu}_{k}^{L})^{T}$
     - $\pi_k = \frac{N_{k}}{N}$
     - $\mathbf{U}=\mathbf{M}\mathbf{N}^{T}$, where $\mathbf{M}\mathbf{D}\mathbf{N}^{T}$ is the SVD of $\sum_{k=1}^{K}\sum_{n=1}^{N}\gamma(y_{nk})\mu_{k}^{L}(\mathbf{x}_{n}^{H})^{T}$
   - Evaluate the likelihood and check for convergence.
5. Use $\mu_{k}^{L}, \Sigma_{k}^{L}, \pi_{k}$ to assign unseen low-quality data points.
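The E-step and the orthogonal update of $\mathbf{U}$ above can be sketched in standard-library Julia. This is a minimal sketch under simplifying assumptions: the helper names (`logmvnpdf`, `estep`, `update_U`) are ours, responsibilities are computed in log-space for stability, and the $\mathbf{U}$ update is the plain Procrustes step, i.e. it ignores the covariance weighting.

```julia
using LinearAlgebra

# log-density of N(x | μ, Σ) — helper, not from the original note
function logmvnpdf(x, μ, Σ)
    r = x .- μ
    -0.5 * (length(x) * log(2π) + logdet(Σ) + dot(r, Σ \ r))
end

# E-step: responsibilities γ[n, k] ∝ π_k N(x_n^L | μ_k, Σ_k^L) N(U x_n^H | μ_k, Σ_k^H)
function estep(XL, XH, U, πk, μ, ΣL, ΣH)
    N, K = size(XL, 2), length(πk)
    logγ = [log(πk[k]) + logmvnpdf(XL[:, n], μ[k], ΣL[k]) +
            logmvnpdf(U * XH[:, n], μ[k], ΣH[k]) for n in 1:N, k in 1:K]
    γ = exp.(logγ .- maximum(logγ, dims = 2))   # subtract row max for stability
    γ ./ sum(γ, dims = 2)
end

# M-step for U: Procrustes-style update U = M Nᵀ,
# where M D Nᵀ is the SVD of Σ_{n,k} γ(y_nk) μ_k (x_n^H)ᵀ
function update_U(XH, γ, μ)
    C = sum(γ[n, k] .* μ[k] * XH[:, n]' for n in 1:size(XH, 2), k in eachindex(μ))
    F = svd(C)
    F.U * F.Vt   # orthogonal by construction
end
```

Because `F.U` and `F.Vt` are orthogonal, the returned `U` always satisfies $\mathbf{U}^{T}\mathbf{U}=\mathbf{I}$, as the model requires.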
### Simulation results

#### We considered three scenarios

##### I. Low-quality data noisier than the high-quality data

We simulate the case where the features of the low-quality data are noisier than those of the high-quality data. The number of informative features remains the same, however.

```julia
using Distributions, LinearAlgebra

noise_level = 10
d = 3
# the high- and low-quality data share the same cluster centroids
# there are two clusters, and d features
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)
n_samples = 1000
# there are two clusters
class = rand(1:2, n_samples) # this is ytrain
# pre-allocate
XHtrain, XLtrain = [Matrix{Float32}(undef, d, n_samples) for _ ∈ 1:2]
# XLtrain is noisier by a factor of noise_level
for c ∈ [1, 2]
    XHtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XHmean[:, c], 0.05f0 * I), count(x -> x == c, class))
    XLtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XLmean[:, c], 0.05f0 * noise_level * I), count(x -> x == c, class))
end
```
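As a sanity check on this draw, the same simulation can be sketched with Base's `randn` alone (no Distributions.jl; the `draw` helper and the variance-ratio check are ours): since $\mathcal{N}(\mu, \sigma^{2}\mathbf{I})$ has per-feature variance $\sigma^{2}$, the within-cluster variance of `XLtrain` should exceed that of `XHtrain` by a factor of roughly `noise_level`.

```julia
using LinearAlgebra, Random, Statistics

Random.seed!(0)
d, n_samples, noise_level = 3, 1000, 10
XHmean = hcat(randn(Float32, d), randn(Float32, d))
class = rand(1:2, n_samples)

# draw from N(μ, s2 * I) using randn: x = μ + sqrt(s2) * z
draw(μ, s2) = μ .+ sqrt(s2) .* randn(Float32, length(μ))
XHtrain = reduce(hcat, [draw(XHmean[:, c], 0.05f0) for c in class])
XLtrain = reduce(hcat, [draw(XHmean[:, c], 0.05f0 * noise_level) for c in class])

# empirical within-cluster variance ratio, expected ≈ noise_level
vH = var(XHtrain[:, class .== 1], dims = 2)
vL = var(XLtrain[:, class .== 1], dims = 2)
ratio = mean(vL ./ vH)
```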
##### II. Low-quality data noisier than the high-quality data with less informative features

In this scenario, the low-quality data has fewer informative features.

```julia
using Random

# the high- and low-quality data share the same cluster centroids
# there are two clusters, and d features
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)
# 50% of the original features are non-informative (zeroed without replacement)
XLmean[randperm(d)[1:Int(round(d * 0.5))], :] .= 0.0f0
n_samples = 1000
# there are two clusters
class = rand(1:2, n_samples) # this is ytrain
# pre-allocate
XHtrain, XLtrain = [Matrix{Float32}(undef, d, n_samples) for _ ∈ 1:2]
# XLtrain is noisier by a factor of noise_level
for c ∈ [1, 2]
    XHtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XHmean[:, c], 0.05f0 * I), count(x -> x == c, class))
    XLtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XLmean[:, c], 0.05f0 * noise_level * I), count(x -> x == c, class))
end
```
##### III. Low-quality data noisier than the high-quality data with 10% outliers

```julia
# the high- and low-quality data share the same cluster centroids
# there are two clusters, and d features
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)
n_samples = 1000
# there are two clusters
class = rand(1:2, n_samples) # this is ytrain
# pre-allocate
XHtrain, XLtrain = [Matrix{Float32}(undef, d, n_samples) for _ ∈ 1:2]
# XLtrain is noisier by a factor of noise_level
for c ∈ [1, 2]
    XHtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XHmean[:, c], 0.05f0 * I), count(x -> x == c, class))
    XLtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XLmean[:, c], 0.05f0 * noise_level * I), count(x -> x == c, class))
end
# 10% of the samples are outliers
XLtrain[:, rand(1:n_samples, Int(round(n_samples / 10)))] .= randn(Float32, d, Int(round(n_samples / 10))) .* 2
```

#### Results

Note that "High+Low" and "Low" are initialised from the same points.

- d = 3

![res1](/figs/2021JUL21/d3.svg)

- d = 10

![res2](/figs/2021JUL21/d10.svg)
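Finally, step 5 of Algorithm 1 — assigning unseen low-quality data points with the learned $\pi_{k}, \mu_{k}^{L}, \Sigma_{k}^{L}$ — amounts to taking the posterior argmax under the low-quality component densities. A standard-library sketch (the `lognormal` and `assign` helper names are ours, not from the note):

```julia
using LinearAlgebra

# log-density of N(x | μ, Σ)
lognormal(x, μ, Σ) = -0.5 * (length(x) * log(2π) + logdet(Σ) + dot(x .- μ, Σ \ (x .- μ)))

# assign each column of Xnew to argmax_k { log π_k + log N(x | μ_k^L, Σ_k^L) }
function assign(Xnew, πk, μL, ΣL)
    [argmax([log(πk[k]) + lognormal(Xnew[:, n], μL[k], ΣL[k]) for k in eachindex(πk)])
     for n in 1:size(Xnew, 2)]
end
```

Only the low-quality parameters appear here, matching the intended use case: the high-quality data (and $\mathbf{U}$) shape the fit at training time, but test-time assignment sees low-quality features alone.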