Commit 5db9fb32 authored by Ying-Qiu Zheng

Update 2021JUL21.md

parent 7f969360
......@@ -18,12 +18,14 @@ The marginal distribution of $`\mathbf{x}_{n}^{L}, \mathbf{x}_{n}^{H}`$ is
In summary, in addition to finding the hyper-parameters $`\pi, \mu, \Sigma_{k}^{H}, \Sigma^{L}_{k}`$, we want to estimate a transformation matrix $`\mathbf{U}`$ such that $`\mathbf{UX}^{H}`$ is as close as possible to $`\mathbf{X}^{L}`$ (or vice versa).
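To make the role of $`\mathbf{U}`$ concrete, here is a minimal sketch of the simplest possible estimate: a plain least-squares fit of $`\mathbf{U}`$ from paired samples, minimising $`\lVert \mathbf{U}\mathbf{X}^{H} - \mathbf{X}^{L} \rVert_F^2`$. This is a swapped-in baseline, not the model-based estimate described above (it ignores the cluster structure and the hyper-parameters entirely). The function name `fit_transform_ls` and the toy data are hypothetical.

```julia
using LinearAlgebra, Random

# Plain least-squares fit of a linear map U such that U * XH ≈ XL.
# XH and XL are d × n matrices whose columns are paired samples.
# Minimises ‖U XH - XL‖_F² by solving XHᵀ Uᵀ = XLᵀ with backslash
# (QR-based least squares for the tall n × d system).
function fit_transform_ls(XH::AbstractMatrix, XL::AbstractMatrix)
    size(XH, 2) == size(XL, 2) ||
        throw(DimensionMismatch("XH and XL must have the same number of samples"))
    return permutedims(permutedims(XH) \ permutedims(XL))
end

# Toy check: XL is a noisy linear transform of XH.
rng = MersenneTwister(1)
d, n = 3, 1000
XH = randn(rng, d, n)
U_true = randn(rng, d, d)
XL = U_true * XH .+ 0.1 .* randn(rng, d, n)
U_hat = fit_transform_ls(XH, XL)
println("‖U_hat - U_true‖ = ", norm(U_hat - U_true))
```

The "vice versa" direction is just `fit_transform_ls(XL, XH)`. With noise only in $`\mathbf{X}^{L}`$ this baseline is reasonable; when both modalities are noisy, ordinary least squares is biased towards zero, which is one motivation for a joint probabilistic model.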
### Simulation results
#### Methods
#### We considered three scenarios
##### I. Low-quality data noisier than the high-quality data
We simulate the case where the features of the low-quality data are noisier than those of the high-quality data, while the number of informative features is the same in both.
```julia
noise_level = 10
d = 3
# the high- and low-quality data share the same cluster centroids
# there are two clusters and d features
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)
n_samples = 1000
......@@ -37,11 +39,11 @@ for c ∈ [1, 2]
XLtrain[:, findall(x -> x==c, class)] .= rand(MvNormal(XLmean[:, c], 0.05f0 * noise_level * I), count(x -> x==c, class))
end
```
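Because the class assignment and the high-quality noise level fall in the elided part of the diff, the following is only a standard-library stand-in for scenario I, not the original script. Equiprobable labels and a high-quality variance `var_high = 0.05` are assumptions, and `simulate_scenario1` is a hypothetical name. It also makes explicit that `MvNormal(μ, Σ)` takes a covariance, so `0.05f0 * noise_level * I` with `noise_level = 10` means a per-feature variance of 0.5 (standard deviation ≈ 0.71).

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical stand-in for scenario I: two shared cluster centroids,
# only the isotropic noise variance differs between the two modalities.
function simulate_scenario1(; d=3, n_samples=1000, noise_level=10,
                            var_high=0.05f0, rng=Random.default_rng())
    centroids = randn(rng, Float32, d, 2)   # column c = centroid of cluster c
    class = rand(rng, 1:2, n_samples)       # assumed: equiprobable labels
    var_low = 0.05f0 * noise_level          # same covariance scale as XLtrain above
    XH = Matrix{Float32}(undef, d, n_samples)
    XL = Matrix{Float32}(undef, d, n_samples)
    for c in 1:2
        idx = findall(==(c), class)
        # x ~ N(μ, σ² I) is the same as x = μ + σ z with z ~ N(0, I), σ = sqrt(variance)
        XH[:, idx] .= centroids[:, c] .+ sqrt(var_high) .* randn(rng, Float32, d, length(idx))
        XL[:, idx] .= centroids[:, c] .+ sqrt(var_low) .* randn(rng, Float32, d, length(idx))
    end
    return XH, XL, class, centroids
end

XH, XL, class, centroids = simulate_scenario1(rng=MersenneTwister(0))
resid_H = XH .- centroids[:, class]   # noise around the true centroid
resid_L = XL .- centroids[:, class]
println("empirical variances: ", var(resid_H), " vs ", var(resid_L))
```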
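##### II. Low-quality data noisier than the high-quality data with fewer informative features
In this scenario, the low-quality data are again noisier and, in addition, have fewer informative features: 50% of the original features carry no cluster information.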
```julia
noise_level = 10
# the high- and low-quality data share the same cluster centroids
# there are two clusters and d features
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)
# 50% of the original features are non-informative
......@@ -57,10 +59,10 @@ for c ∈ [1, 2]
XLtrain[:, findall(x -> x==c, class)] .= rand(MvNormal(XLmean[:, c], 0.05f0 * noise_level * I), count(x -> x==c, class))
end
```
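The lines that actually make half of the features non-informative are elided in the diff. One possible construction, sketched here as an assumption rather than the original code, is to give both clusters the same mean in the chosen coordinates, so those features no longer separate the clusters and only noise remains. The helper `make_noninformative` is hypothetical.

```julia
using Random, Statistics

# Hypothetical construction of non-informative features: in the chosen rows,
# both cluster means are replaced by their average, so the feature no longer
# separates the clusters.
function make_noninformative(Xmean::AbstractMatrix; frac=0.5, rng=Random.default_rng())
    d = size(Xmean, 1)
    k = round(Int, frac * d)                # number of features to neutralise
    rows = sort(randperm(rng, d)[1:k])      # distinct feature indices
    Xnew = copy(Xmean)
    Xnew[rows, :] .= mean(Xmean[rows, :], dims=2)   # same value in every cluster column
    return Xnew, rows
end

XHmean = randn(MersenneTwister(1), Float32, 10, 2)
XLmean, uninformative = make_noninformative(XHmean; rng=MersenneTwister(2))
```

Replacing the neutralised rows by zeros would also remove the class signal; averaging keeps the overall feature location unchanged, so only the between-cluster separation is lost.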
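##### III. Low-quality data noisier than the high-quality data with 10% outliers
As in scenario I, the low-quality features are noisier than the high-quality ones. In addition, about 10% of the low-quality samples are overwritten with outliers drawn from $`\mathcal{N}(\mathbf{0}, 4\mathbf{I})`$, irrespective of their cluster.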
```julia
noise_level = 10
# the high- and low-quality share the same cluster centroid
# there are two clusters, and d features
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)
n_samples = 1000
......@@ -77,5 +79,9 @@ end
XLtrain[:, rand(1:n_samples, Int(round(n_samples / 10)))] .= randn(d, Int(round(n_samples / 10))) .* 2
```
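The outlier line above picks columns with `rand(1:n_samples, k)`, i.e. with replacement, so with `n_samples = 1000` and `k = 100` only about 95 distinct samples are replaced on average. If exactly 10% distinct outliers are intended, drawing the indices with `randperm` avoids the duplicates. The helper `add_outliers!` below is a hypothetical sketch of that variant, keeping the same $`\mathcal{N}(\mathbf{0}, 2^2\mathbf{I})`$ outlier distribution.

```julia
using Random

# Replace a fixed fraction of samples (columns of X) by outliers from N(0, scale² I).
# randperm yields distinct indices, so exactly round(frac * n) columns are replaced.
function add_outliers!(X::AbstractMatrix; frac=0.1, scale=2, rng=Random.default_rng())
    d, n = size(X)
    k = round(Int, frac * n)
    outlier_idx = randperm(rng, n)[1:k]
    X[:, outlier_idx] .= scale .* randn(rng, eltype(X), d, k)   # keeps the element type (e.g. Float32)
    return outlier_idx
end

XLtrain = zeros(Float32, 3, 1000)
outlier_idx = add_outliers!(XLtrain; rng=MersenneTwister(3))
```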
#### Results
- d = 3
![res1](/figs/2021JUL21/d3.svg)
- d = 10
![res2](/figs/2021JUL21/d10.svg)