2021JUL21.md 6.9 KB
Ying-Qiu Zheng committed
## Fusion of high-quality and low-quality data - GMM-based implementation
![diagram](/figs/2021JUL21/diagram-20210721.png)

### Model formulation
Suppose $`\mathbf{X}^{H}, \mathbf{X}^{L}`$ are $`N \times V`$ feature matrices (e.g. connectivity between $`N`$ thalamus voxels and $`V`$ whole-brain voxels). In practice they can have different dimensions; to keep notation uncluttered, we suppose the number of voxels in the high- and low-quality images is the same for a given subject. We now assume $`\mathbf{X}^{H}, \mathbf{X}^{L}`$ share the same latent variable $`\mathbf{Y}`$, an $`N \times K`$ binary matrix representing the voxels' class assignments.

For a single voxel, suppose $`\mathbf{y}_{n} \sim \text{multinomial}(\mathbf{\pi})`$ and $`p(\mathbf{x}^{L}_{n}|y_{nk}=1) = \mathcal{N}(\mathbf{x}^{L}_{n}|\mu_{k}, \Sigma_{k}^{L})`$. To let the high-quality data inform the inference on the low-quality data, we assume $`p(\mathbf{x}^{H}_{n}|y_{nk}=1, \mathbf{U}) = \mathcal{N}(\mathbf{U}\mathbf{x}^{H}_{n}|\mu_{k}, \Sigma_{k}^{H})`$, where $`\mathbf{U}^{T}\mathbf{U} = \mathbf{I}`$. The complete-data log-likelihood can be written as

```math
\log p(\mathbf{x}^{H}_{n}, \mathbf{x}^{L}_{n}, \mathbf{y}_{n} \,|\, \mathbf{U}, \mathbf{\pi}, \{\mu_{k}\}, \{\Sigma^{H}_{k}\}, \{\Sigma^{L}_{k}\}) = \sum_{k=1}^{K} y_{nk}\left(\log\pi_{k} + \log\mathcal{N}(\mathbf{x}^{L}_{n}|\mu_{k},\Sigma_{k}^{L}) + \log\mathcal{N}(\mathbf{Ux}^{H}_{n}|\mu_{k},\Sigma_{k}^{H})\right)
```

The marginal distribution of $`\mathbf{x}_{n}^{L}, \mathbf{x}_{n}^{H}`$ is
```math
p(\mathbf{x}^{H}_{n}, \mathbf{x}^{L}_{n} \,|\, \mathbf{U}, \mathbf{\pi}, \{\mu_{k}\}, \{\Sigma^{H}_{k}\}, \{\Sigma^{L}_{k}\})=\sum_{k=1}^{K}\pi_{k}\mathcal{N}(\mathbf{x}^{L}_{n}|\mu_{k},\Sigma_{k}^{L})\mathcal{N}(\mathbf{Ux}^{H}_{n}|\mu_{k},\Sigma_{k}^{H})
```
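
In EM terms, the E-step computes the responsibilities $`\gamma(y_{nk}) = \mathbb{E}[y_{nk}]`$, and the M-step maximises the expected complete-data log-likelihood under those responsibilities:

```math
Q = \sum_{n=1}^{N}\sum_{k=1}^{K}\gamma(y_{nk})\left(\log\pi_{k} + \log\mathcal{N}(\mathbf{x}^{L}_{n}|\mu_{k},\Sigma_{k}^{L}) + \log\mathcal{N}(\mathbf{Ux}^{H}_{n}|\mu_{k},\Sigma_{k}^{H})\right)
```

Setting the derivatives of $`Q`$ with respect to each parameter to zero yields the M-step updates.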

In summary, in addition to finding the parameters $`\pi_{k}, \mu_{k}, \Sigma_{k}^{H}, \Sigma^{L}_{k}`$, we want to estimate an orthogonal transformation matrix $`\mathbf{U}`$ such that $`\mathbf{UX}^{H}`$ is as close to $`\mathbf{X}^{L}`$ as possible (or vice versa).
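
Because $`\mathbf{U}`$ is constrained to be orthogonal, fitting $`\mathbf{UX}^{H} \approx \mathbf{X}^{L}`$ is an orthogonal Procrustes problem with a closed-form SVD solution. A minimal sketch (the function name `procrustes` is ours, for illustration only):

```julia
using LinearAlgebra

# Orthogonal Procrustes: find U with U'U = I minimising ||U*A - B||_F.
# The maximiser of tr(U' * B * A') is U = M*N', where M*D*N' is the SVD of B*A'.
function procrustes(A::AbstractMatrix, B::AbstractMatrix)
    F = svd(B * A')
    return F.U * F.Vt
end

# sanity check: recover a known planar rotation
θ = 0.3
R = [cos(θ) -sin(θ); sin(θ) cos(θ)]
A = randn(2, 100)
U = procrustes(A, R * A)   # U ≈ R up to floating-point error
```

The same SVD step is used both to initialise $`\mathbf{U}`$ from the cluster means and to re-estimate it in each M-step.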

### Pseudo code - Algorithm 1. EM for the Fusion of GMMs
 1. Run K-means clustering on the high-quality data to generate the assignment of the voxels $`R^{(0)}`$.
 2. Initialise the means $`\mu_{k}^{L}`$, $`\mu_{k}^{H}`$, covariances $`\Sigma_{k}^{L}`$, $`\Sigma_{k}^{H}`$, and mixing coefficients $`\pi_k`$ using the K-means assignment $`R^{(0)}`$, and evaluate the initial likelihood.
 3. Initialise the transformation matrix $`\mathbf{U} = \mathbf{MN}^{T}`$, where $`\mathbf{MDN}^{T}`$ is the SVD of $`\sum_{k=1}^{K}\mu_{k}^{H}(\mu_{k}^{L})^{T}`$.
 4. For iteration = $`1, 2, ...`$, do
    - **E-step.** Evaluate the responsibilities using the current parameter values
       - $`\gamma(y_{nk}) = \frac{\pi_{k}\mathcal{N}(\mathbf{x}^{L}_{n} | \mu_{k}^{L}, \Sigma_{k}^{L})\mathcal{N}(\mathbf{Ux}^{H}_{n} | \mu_{k}^{L}, \Sigma_{k}^{H})}{\sum_{j=1}^{K}\pi_{j}\mathcal{N}(\mathbf{x}^{L}_{n} | \mu_{j}^{L}, \Sigma_{j}^{L})\mathcal{N}(\mathbf{Ux}^{H}_{n} | \mu_{j}^{L}, \Sigma_{j}^{H})}`$
    - **M-step.** Re-estimate the parameters using the current responsibilities by setting the derivatives to zero
       - $`\mu_{k}^{L} = \frac{1}{N_{k}}((\Sigma^{H}_{k})^{-1} + (\Sigma^{L}_{k})^{-1} )^{-1}\sum_{n=1}^{N}\gamma(y_{nk})((\Sigma_{k}^{H})^{-1}\mathbf{Ux}^{H}_{n} + (\Sigma_{k}^{L})^{-1}\mathbf{x}_{n}^{L} )`$
       - $`\Sigma_{k}^{L} = \frac{1}{N_{k}}\sum_{n=1}^{N}\gamma(y_{nk})(\mathbf{x}^{L}_{n} - \mathbf{\mu}_{k}^{L})(\mathbf{x}^{L}_{n} - \mathbf{\mu}_{k}^{L})^{T}`$
       - $`\Sigma_{k}^{H} = \frac{1}{N_{k}}\sum_{n=1}^{N}\gamma(y_{nk})(\mathbf{Ux}^{H}_{n} - \mathbf{\mu}_{k}^{L})(\mathbf{Ux}^{H}_{n} - \mathbf{\mu}_{k}^{L})^{T}`$
       - $`\pi_k = \frac{N_{k}}{N}`$, where $`N_{k} = \sum_{n=1}^{N}\gamma(y_{nk})`$
       - $`\mathbf{U}=\mathbf{MN}^{T}`$, where $`\mathbf{MDN}^{T}`$ is the SVD of $`\sum_{n=1}^{N}\sum_{k=1}^{K}\gamma(y_{nk})\mu_{k}^{L}(\mathbf{x}_{n}^{H})^{T}`$ (the orthogonal Procrustes solution under the constraint $`\mathbf{U}^{T}\mathbf{U}=\mathbf{I}`$)
    - Evaluate the likelihood and check for convergence.
 5. Use $`\mu_{k}^{L}, \Sigma_{k}^{L}, \pi_{k}`$ to assign unseen low-quality data points.
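
The steps above can be sketched in Julia as follows. This is a minimal, loop-based illustration, not the code used for the experiments: the K-means initialisation of step 1 is replaced by random data points as initial means, a small ridge keeps the covariances invertible, and the names `fusion_gmm`, `assign_low`, and `logpdf_mvn` are ours.

```julia
using LinearAlgebra, Random

# log N(x | mu, S) for a full covariance S
function logpdf_mvn(x::AbstractVector, mu::AbstractVector, S::AbstractMatrix)
    r = x .- mu
    return -0.5 * (length(x) * log(2π) + logdet(S) + dot(r, S \ r))
end

# EM for the fused GMM: XH, XL are d × N feature matrices, K clusters
function fusion_gmm(XH::AbstractMatrix, XL::AbstractMatrix, K::Integer; iters::Integer=50)
    d, N = size(XL)
    # steps 1-3, simplified: random data points as means, identity covariances,
    # uniform mixing weights, U = I
    mu = [Vector{Float64}(XL[:, rand(1:N)]) for _ in 1:K]
    SL = [Matrix(1.0I, d, d) for _ in 1:K]
    SH = [Matrix(1.0I, d, d) for _ in 1:K]
    pik = fill(1.0 / K, K)
    U = Matrix(1.0I, d, d)
    g = zeros(N, K)                      # responsibilities γ(y_nk)
    for _ in 1:iters
        # E-step: responsibilities combine both modalities
        for n in 1:N
            for k in 1:K
                g[n, k] = log(pik[k]) + logpdf_mvn(XL[:, n], mu[k], SL[k]) +
                          logpdf_mvn(U * XH[:, n], mu[k], SH[k])
            end
            g[n, :] .= exp.(g[n, :] .- maximum(g[n, :]))   # log-sum-exp trick
            g[n, :] ./= sum(g[n, :])
        end
        # M-step
        Nk = vec(sum(g, dims=1))
        for k in 1:K
            prec = inv(SH[k]) + inv(SL[k])
            s = zeros(d)
            for n in 1:N
                s .+= g[n, k] .* (SH[k] \ (U * XH[:, n]) .+ SL[k] \ XL[:, n])
            end
            mu[k] = (Nk[k] .* prec) \ s
            A = zeros(d, d); B = zeros(d, d)
            for n in 1:N
                rL = XL[:, n] .- mu[k]
                rH = U * XH[:, n] .- mu[k]
                A .+= g[n, k] .* (rL * rL')
                B .+= g[n, k] .* (rH * rH')
            end
            SL[k] = A ./ Nk[k] + 1e-6I   # ridge for numerical stability
            SH[k] = B ./ Nk[k] + 1e-6I
            pik[k] = Nk[k] / N
        end
        # orthogonal Procrustes update of U
        C = zeros(d, d)
        for n in 1:N, k in 1:K
            C .+= g[n, k] .* (mu[k] * XH[:, n]')
        end
        F = svd(C)
        U = F.U * F.Vt
    end
    return g, mu, SL, pik, U
end

# step 5: assign an unseen low-quality point using the low-quality marginal only
function assign_low(x::AbstractVector, mu, SL, pik)
    return argmax([log(pik[k]) + logpdf_mvn(x, mu[k], SL[k]) for k in eachindex(pik)])
end
```

In this sketch $`\mathbf{U}`$ is re-estimated after the covariance updates; the ordering of the M-step updates is a free choice and can matter in practice.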

### Simulation results
#### We considered three scenarios
##### I. Low-quality data noisier than the high-quality data
We simulate the case where the features of the low-quality data are noisier than those of the high-quality data. The number of informative features, however, remains the same.
```julia
using Distributions, LinearAlgebra

noise_level = 10
d = 3
# the high- and low-quality data share the same cluster centroids
# there are two clusters, and d features
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)
n_samples = 1000
# there are two clusters
class = rand(1:2, n_samples) # this is ytrain
# pre-allocate
XHtrain, XLtrain = [Matrix{Float32}(undef, d, n_samples) for _ in 1:2]
# XLtrain is noisier by a factor of noise_level
for c in [1, 2]
    XHtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XHmean[:, c], 0.05f0 * I), count(x -> x == c, class))
    XLtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XLmean[:, c], 0.05f0 * noise_level * I), count(x -> x == c, class))
end
```
##### II. Low-quality data noisier than the high-quality data with fewer informative features
In this scenario, the low-quality data has fewer informative features:
```julia
using Random # for randperm

# the high- and low-quality data share the same cluster centroids
# there are two clusters, and d features
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)
# 50% of the features are non-informative in the low-quality data
# (randperm avoids selecting the same feature twice)
XLmean[randperm(d)[1:Int(round(d * 0.5))], :] .= 0.0f0
n_samples = 1000
# there are two clusters
class = rand(1:2, n_samples) # this is ytrain
# pre-allocate
XHtrain, XLtrain = [Matrix{Float32}(undef, d, n_samples) for _ in 1:2]
# XLtrain is noisier by a factor of noise_level
for c in [1, 2]
    XHtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XHmean[:, c], 0.05f0 * I), count(x -> x == c, class))
    XLtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XLmean[:, c], 0.05f0 * noise_level * I), count(x -> x == c, class))
end
```
##### III. Low-quality data noisier than the high-quality data with 10% outliers
```julia
# the high- and low-quality data share the same cluster centroids
# there are two clusters, and d features
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)
n_samples = 1000
# there are two clusters
class = rand(1:2, n_samples) # this is ytrain
# pre-allocate
XHtrain, XLtrain = [Matrix{Float32}(undef, d, n_samples) for _ in 1:2]
# XLtrain is noisier by a factor of noise_level
for c in [1, 2]
    XHtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XHmean[:, c], 0.05f0 * I), count(x -> x == c, class))
    XLtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XLmean[:, c], 0.05f0 * noise_level * I), count(x -> x == c, class))
end
# 10% of the samples are outliers (randperm avoids duplicate indices)
n_outliers = Int(round(n_samples / 10))
XLtrain[:, randperm(n_samples)[1:n_outliers]] .= randn(Float32, d, n_outliers) .* 2
```
#### Results
Note that "High+Low" and "Low" are initialised from the same points.

 - d = 3

![res1](/figs/2021JUL21/d3.svg)

 - d = 10

![res2](/figs/2021JUL21/d10.svg)