Commit 7f969360 by Ying-Qiu Zheng

### Update 2021JUL21.md

parent 4e4259d8
@@ -18,8 +18,64 @@ The marginal distribution of $\mathbf{x}_{n}^{L}, \mathbf{x}_{n}^{H}$ is

In summary, in addition to finding the hyper-parameters $\pi, \mu, \Sigma_{k}^{H}, \Sigma^{L}_{k}$, we want to estimate a transformation matrix $\mathbf{U}$ such that $\mathbf{UX}^{H}$ is as close to $\mathbf{X}^{L}$ as possible (or vice versa).

### Simulation results

#### Methods

##### Low-quality data noisier than the high-quality data

We simulate the case where the features of the low-quality data are noisier than those of the high-quality data. The number of informative features remains the same, however.

```julia
using Distributions   # MvNormal
using LinearAlgebra   # I (UniformScaling)
# d (the number of features) is assumed to be defined earlier

noise_level = 10

# the high- and low-quality data share the same cluster centroids
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)

n_samples = 1000
# there are two clusters
class = rand(1:2, n_samples)  # this is ytrain

# pre-allocate
XHtrain, XLtrain = [Matrix{Float32}(undef, d, n_samples) for _ ∈ 1:2]

# XLtrain is noisier by a factor of noise_level
for c ∈ [1, 2]
    XHtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XHmean[:, c], 0.05f0 * I), count(x -> x == c, class))
    XLtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XLmean[:, c], 0.05f0 * noise_level * I), count(x -> x == c, class))
end
```

##### Low-quality data noisier than the high-quality data with fewer informative features

In this scenario, the low-quality data has fewer informative features.

```julia
using Distributions   # MvNormal
using LinearAlgebra   # I (UniformScaling)
using Random          # randperm

noise_level = 10

# the high- and low-quality data share the same cluster centroids
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)

# 50% of the original features are non-informative
# (randperm avoids drawing duplicate indices, so exactly 50% of the rows are zeroed)
XLmean[randperm(d)[1:Int(round(d * 0.5))], :] .= 0.0f0

n_samples = 1000
# there are two clusters
class = rand(1:2, n_samples)  # this is ytrain

# pre-allocate
XHtrain, XLtrain = [Matrix{Float32}(undef, d, n_samples) for _ ∈ 1:2]

# XLtrain is noisier by a factor of noise_level
for c ∈ [1, 2]
    XHtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XHmean[:, c], 0.05f0 * I), count(x -> x == c, class))
    XLtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XLmean[:, c], 0.05f0 * noise_level * I), count(x -> x == c, class))
end
```

##### Low-quality data noisier than the high-quality data with 10% outliers

```julia
using Distributions   # MvNormal
using LinearAlgebra   # I (UniformScaling)
using Random          # randperm

noise_level = 10

# the high- and low-quality data share the same cluster centroids
XHmean = hcat(randn(Float32, d), randn(Float32, d))
XLmean = copy(XHmean)

n_samples = 1000
# there are two clusters
class = rand(1:2, n_samples)  # this is ytrain

# pre-allocate
XHtrain, XLtrain = [Matrix{Float32}(undef, d, n_samples) for _ ∈ 1:2]

# XLtrain is noisier by a factor of noise_level
for c ∈ [1, 2]
    XHtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XHmean[:, c], 0.05f0 * I), count(x -> x == c, class))
    XLtrain[:, findall(x -> x == c, class)] .= rand(MvNormal(XLmean[:, c], 0.05f0 * noise_level * I), count(x -> x == c, class))
end

# 10% of the samples are outliers
# (randperm avoids drawing duplicate indices, so exactly 10% are replaced)
n_outliers = Int(round(n_samples / 10))
XLtrain[:, randperm(n_samples)[1:n_outliers]] .= randn(Float32, d, n_outliers) .* 2f0
```

#### Results
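The fitting procedure itself is not shown in this note. As a point of reference, if the cluster structure were ignored, the $\mathbf{U}$ minimizing $\|\mathbf{UX}^{H} - \mathbf{X}^{L}\|_F$ has a least-squares closed form via the pseudoinverse. A minimal sketch under that assumption (the helper name `estimate_U` is ours, for illustration; it is not the estimator used in the model above):

```julia
using LinearAlgebra

# Hypothetical helper: least-squares estimate of U minimizing ‖U * XH − XL‖_F,
# where XH and XL are d × n matrices. Closed form: U = XL * pinv(XH).
estimate_U(XH::AbstractMatrix, XL::AbstractMatrix) = XL * pinv(XH)

# usage with the simulated data above:
# U = estimate_U(XHtrain, XLtrain)
```

When `XH` has full row rank (as with `n_samples` ≫ `d` random draws), this recovers the exact linear map relating the two views up to the simulated noise.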