Commit 7f969360 authored by Ying-Qiu Zheng's avatar Ying-Qiu Zheng

Update 2021JUL21.md

parent 4e4259d8
In summary, in addition to finding the hyper-parameters $`\pi, \mu, \Sigma_{k}^{H}, \Sigma^{L}_{k}`$, we want to estimate a transformation matrix $`\mathbf{U}`$ such that $`\mathbf{UX}^{H}`$ is as close to $`\mathbf{X}^{L}`$ as possible (or vice versa).
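Viewed in isolation (outside the mixture model), the transformation step is an ordinary least-squares problem: with paired samples, the $`\mathbf{U}`$ minimizing $`\|\mathbf{UX}^{H} - \mathbf{X}^{L}\|_F`$ is given by right division in Julia. A minimal sketch, where `Utrue`, the dimensions, and the noise scale are illustrative assumptions, not part of the model above:

```julia
using LinearAlgebra

d, n = 5, 200
XH = randn(d, n)
Utrue = randn(d, d)                   # hypothetical ground-truth transformation
XL = Utrue * XH + 0.01 * randn(d, n)  # low-quality view: transformed plus small noise

# least-squares solution of U * XH ≈ XL (Julia's right division)
U = XL / XH

norm(U - Utrue)  # small when the noise is small
```

In the full model, $`\mathbf{U}`$ would instead be updated jointly with the mixture hyper-parameters, but this recovers the closed-form core of the objective.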
### Simulation results
#### Methods
##### Low-quality data noisier than the high-quality data
We simulate the case where the features of the low-quality data are noisier than those of the high-quality data. The number of informative features, however, remains the same.
```julia
using Distributions, LinearAlgebra

noise_level = 10
d = 20  # feature dimension (choose as needed)
# the high- and low-quality data share the same cluster centroids
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)
n_samples = 1000
# there are two clusters
class = rand(1:2, n_samples) # this is ytrain
# pre-allocate
XHtrain, XLtrain = [Matrix{Float32}(undef, d, n_samples) for _ in 1:2]
# XLtrain is noisier by a factor of noise_level
for c in [1, 2]
    XHtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XHmean[:, c], 0.05f0 * I), count(x -> x == c, class))
    XLtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XLmean[:, c], 0.05f0 * noise_level * I), count(x -> x == c, class))
end
```
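Note that `MvNormal`'s second argument is a covariance, not a standard deviation, so `noise_level` scales the per-feature variances directly. A self-contained sanity check of that scaling (the dimension and sample count here are illustrative, not the ones used above):

```julia
using Distributions, LinearAlgebra, Statistics

noise_level = 10
d, n = 20, 5000
μ = randn(Float32, d)
# same centroid, covariances 0.05*I vs. 0.05*noise_level*I as in the simulation
XH = rand(MvNormal(μ, 0.05f0 * I), n)
XL = rand(MvNormal(μ, 0.05f0 * noise_level * I), n)
# average per-feature variance ratio should be close to noise_level
ratio = mean(var(XL, dims=2)) / mean(var(XH, dims=2))
```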
##### Low-quality data noisier than the high-quality data, with fewer informative features
In this scenario, the low-quality data has fewer informative features:
```julia
using Distributions, LinearAlgebra, Random

noise_level = 10
d = 20  # feature dimension (choose as needed)
# the high- and low-quality data share the same cluster centroids
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)
# 50% of the original features are non-informative in the low-quality data
# (randperm avoids selecting the same feature twice)
XLmean[randperm(d)[1:round(Int, d * 0.5)], :] .= 0.0f0
n_samples = 1000
# there are two clusters
class = rand(1:2, n_samples) # this is ytrain
# pre-allocate
XHtrain, XLtrain = [Matrix{Float32}(undef, d, n_samples) for _ in 1:2]
# XLtrain is noisier by a factor of noise_level
for c in [1, 2]
    XHtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XHmean[:, c], 0.05f0 * I), count(x -> x == c, class))
    XLtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XLmean[:, c], 0.05f0 * noise_level * I), count(x -> x == c, class))
end
```
##### Low-quality data noisier than the high-quality data, with 10% outliers
```julia
using Distributions, LinearAlgebra, Random

noise_level = 10
d = 20  # feature dimension (choose as needed)
# the high- and low-quality data share the same cluster centroids
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)
n_samples = 1000
# there are two clusters
class = rand(1:2, n_samples) # this is ytrain
# pre-allocate
XHtrain, XLtrain = [Matrix{Float32}(undef, d, n_samples) for _ in 1:2]
# XLtrain is noisier by a factor of noise_level
for c in [1, 2]
    XHtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XHmean[:, c], 0.05f0 * I), count(x -> x == c, class))
    XLtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XLmean[:, c], 0.05f0 * noise_level * I), count(x -> x == c, class))
end
# 10% of the samples are outliers (randperm avoids replacing the same column twice)
outliers = randperm(n_samples)[1:round(Int, n_samples / 10)]
XLtrain[:, outliers] .= 2f0 .* randn(Float32, d, length(outliers))
```
#### Results