We first evaluated the effect of dimensions (i.e., number of bases) on model accuracy.

* [Figure 2](figs/ukb_bases_comparison.png) shows the model accuracy in predicting task activation on 100 UKB subjects at dimension 25 (left panel), 100 (middle panel) and 200 (right panel) for ICA (green), PCA (orange) and Laplacian Eigenmaps (blue). Increasing the bases dimension tends to decrease model accuracy, possibly due to overfitting. Meanwhile, ICA dual-regression maps outperform the other two approaches in reconstructing the three contrast maps. It is also interesting to note that the model-accuracy boxplots across subjects have a smaller interquartile range at higher dimensions (which is also true on HCP data, although less obvious).
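The accuracy measure above can be sketched as follows. This assumes (as a simplification, since the fitting step is not spelled out here) that a subject's task map is regressed onto their spatial bases by least squares, and that accuracy is the spatial (Pearson) correlation between the actual and reconstructed maps. All names and shapes are illustrative:

```python
import numpy as np

def predict_activation(bases, task_map):
    # Least-squares regression of the task map onto the subject's bases,
    # then reconstruction from the fitted coefficients.
    # bases: (n_voxels, n_dims); task_map: (n_voxels,)
    coef, *_ = np.linalg.lstsq(bases, task_map, rcond=None)
    return bases @ coef

def model_accuracy(actual, predicted):
    # Spatial (Pearson) correlation between actual and predicted maps.
    return float(np.corrcoef(actual, predicted)[0, 1])

# Toy data standing in for one subject (hypothetical sizes).
rng = np.random.default_rng(0)
n_voxels, n_dims = 5000, 25            # e.g. a dimension-25 basis set
bases = rng.standard_normal((n_voxels, n_dims))
task_map = bases @ rng.standard_normal(n_dims) + 0.5 * rng.standard_normal(n_voxels)

pred = predict_activation(bases, task_map)
acc = model_accuracy(task_map, pred)
```

With more bases than the signal needs, the regression can also fit noise, which is one way the overfitting trend at higher dimensions could arise.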

### 2. Effects of averaging coefficients of most-closely matched subjects.

We next matched the bases of each subject to the rest in the pool (HCP: 967 subjects; UKB: 1529 subjects) to find a subset of 100 "other" subjects that most closely resemble the chosen one. Each subject pair has a measure of "matchness", defined as the average of the spatial correlations of the matched bases. To reconstruct the task activation maps of a new subject, we took a weighted average of the coefficients of the 100 best-matched subjects, benchmarked against 100 randomly chosen subjects from the pool and the 100 subjects whose bases match worst.
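A rough sketch of this step, assuming the bases have already been paired across subjects (the pairing algorithm itself is not specified here): "matchness" is the mean spatial correlation over paired bases, and reconstruction averages the pool subjects' coefficients with weights proportional to matchness. Function names and the exact weighting scheme are assumptions:

```python
import numpy as np

def matchness(bases_a, bases_b):
    # Mean spatial correlation over corresponding (already paired) bases;
    # each input is (n_voxels, n_dims).
    corrs = [np.corrcoef(bases_a[:, k], bases_b[:, k])[0, 1]
             for k in range(bases_a.shape[1])]
    return float(np.mean(corrs))

def weighted_coefficients(match_scores, pool_coefs, n_best=100):
    # Weighted average of the coefficients of the n_best best-matched
    # pool subjects, weights proportional to their matchness scores.
    order = np.argsort(match_scores)[::-1][:n_best]
    w = np.asarray(match_scores, dtype=float)[order]
    w /= w.sum()
    return (w[:, None] * np.asarray(pool_coefs)[order]).sum(axis=0)

# Tiny illustration: three pool subjects, two coefficients each.
scores = np.array([0.9, 0.5, 0.1])
pool_coefs = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
avg = weighted_coefficients(scores, pool_coefs, n_best=2)
```

Swapping `n_best` best-matched subjects for a random or worst-matched subset gives the two benchmarks described above.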

* Results are shown in [Figure 3](figs/ukb_match_subjects.png) (UKB subjects) and [Figure 4](figs/hcp_match_subjects.png) (HCP subjects). As expected, reconstruction based on the 100 best-matched subjects has higher model accuracy than using 100 unmatched subjects, suggesting that a careful selection of subjects can further boost model accuracy in predicting how a new subject responds to tasks (green: 100 best-matched subjects; orange: 100 randomly chosen subjects; blue: 100 least-matched subjects).

To investigate why there is less improvement from using best-matched subjects on HCP data, for each subject we averaged its "matchness" with all other subjects to obtain a scalar value, which roughly measures how much the subject's bases differ from the others'. We plotted the distribution of this measure for each dataset ([Figure 5](figs/ukb_matchness.png): UKB data; [Figure 6](figs/hcp_matchness.png): HCP data). The relative range of matchness on the UKB dataset is much wider than on the HCP dataset (the far-right matchness is almost seven times larger than the far-left on UKB), providing a possible explanation for why selecting better-matched subjects yields more improvement on the UKB dataset.
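The per-subject summary described above can be sketched directly from a pairwise matchness matrix; the function name and the max/min ratio used to illustrate the "relative range" are assumptions for this sketch:

```python
import numpy as np

def average_matchness(pairwise):
    # pairwise: symmetric (n_subjects, n_subjects) matchness matrix.
    # Returns each subject's mean matchness with all *other* subjects
    # (the diagonal, a subject matched to itself, is excluded).
    n = pairwise.shape[0]
    off_diag = pairwise.copy()
    np.fill_diagonal(off_diag, 0.0)
    return off_diag.sum(axis=1) / (n - 1)

# Tiny illustration with three subjects.
pairwise = np.array([[1.0, 0.8, 0.2],
                     [0.8, 1.0, 0.4],
                     [0.2, 0.4, 1.0]])
avg = average_matchness(pairwise)
relative_range = avg.max() / avg.min()   # the kind of ratio compared across datasets
```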

Correlation matrices between actual and predicted activation were constructed for each dataset ([Figure 7](figs/ukb_corr_mat.png): UKB data; [Figure 8](figs/HCP_corr_mat.png): HCP data). The diagonals of the matrices show higher correlations for all tasks, indicating that each subject's reconstructed map matched that subject's own activation more closely than it matched other subjects'.
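Such a matrix can be sketched as follows, assuming entry (i, j) is the spatial correlation between subject i's actual map and subject j's predicted map (function names and the diagonal-dominance summary are illustrative):

```python
import numpy as np

def correlation_matrix(actual_maps, predicted_maps):
    # Entry (i, j): spatial correlation between subject i's actual map
    # and subject j's predicted map; inputs are (n_subjects, n_voxels).
    a = actual_maps - actual_maps.mean(axis=1, keepdims=True)
    p = predicted_maps - predicted_maps.mean(axis=1, keepdims=True)
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    p /= np.linalg.norm(p, axis=1, keepdims=True)
    return a @ p.T

def diagonal_dominance(C):
    # Fraction of subjects whose own prediction matches them best.
    return float(np.mean(np.argmax(C, axis=1) == np.arange(C.shape[0])))

# Toy check: predictions that are noisy copies of the actual maps
# should put the largest correlations on the diagonal.
rng = np.random.default_rng(0)
actual = rng.standard_normal((20, 1000))
predicted = actual + 0.3 * rng.standard_normal((20, 1000))
C = correlation_matrix(actual, predicted)
```

A strongly dominant diagonal is the matrix-level counterpart of the observation above: the reconstruction is subject-specific, not just a group-average map.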

### 3. Comparison of prediction using residual bases and original bases