### 1. Comparison of bases in reconstructing task activation maps

We first evaluated the effect of dimensionality (i.e., the number of bases) on model accuracy in predicting task activations (i.e., the Pearson correlation between reconstructed and actual task activation maps). For each method (PCA, ICA, or Laplacian Eigenmaps), the individual bases were derived by running dual regression at a given dimension separately for each subject and then regressing out the group-level spatial maps. Likewise, group-level activations were regressed out from the individual activation maps, so that model accuracy is dominated by individual variability. On HCP data, we compared the bases at dimensions 15, 25, 50, and 100; on UKB data, we compared dimensions 25, 100, and 200. The sets of residual bases were then regressed against the residual task activation maps to obtain reconstruction coefficients for each subject. To predict the (residual) task activation maps of a new subject, we used the subject's own bases together with the reconstruction coefficients averaged across 100 randomly chosen unseen subjects.
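As a concrete illustration, the coefficient fitting and prediction steps above can be sketched with ordinary least squares on toy data. The array sizes and the random stand-ins for bases and activation maps below are assumptions for illustration only, not the actual HCP/UKB data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_bases = 1000, 25  # toy sizes; real maps have far more voxels

# Toy stand-ins: residual bases (voxels x bases) and a residual activation map
bases = rng.standard_normal((n_voxels, n_bases))
activation = rng.standard_normal(n_voxels)

# Reconstruction coefficients: regress the subject's residual bases
# against the subject's residual task activation map (ordinary least squares)
coef, *_ = np.linalg.lstsq(bases, activation, rcond=None)

# Prediction for a new subject: that subject's own bases combined with
# coefficients averaged across 100 unseen subjects (simulated here)
held_out_coefs = rng.standard_normal((100, n_bases))
avg_coef = held_out_coefs.mean(axis=0)
predicted = bases @ avg_coef

# Model accuracy: Pearson correlation between predicted and actual maps
accuracy = np.corrcoef(predicted, activation)[0, 1]
```

On real data, `bases` and `activation` would be the residuals after regressing out the group-level maps, and the correlation would be computed per subject to build the boxplots.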

* [Figure 1](figs/hcp_bases_comparison.png) shows boxplots of model accuracy (correlation) across 100 HCP subjects at four dimensions (left to right: 15, 25, 50, 100) for each set of bases (ICA: green; PCA: orange; Laplacian Eigenmaps: blue). Overall, the choice of bases and dimensions has only minor effects on model accuracy in predicting task activation maps.

* [Figure 2](figs/ukb_bases_comparison.png) shows model accuracy in predicting task activation across 100 UKB subjects at dimensions 25 (left panel), 100 (middle), and 200 (right panel) for ICA (green), PCA (orange), and Laplacian Eigenmaps (blue). Increasing the basis dimension tends to decrease model accuracy, possibly due to overfitting. Meanwhile, the ICA bases outperformed the other two approaches in predicting the three contrast maps (faces, shapes, faces-shapes). It is also interesting to note that the boxplots of model accuracy across subjects have a smaller interquartile range at higher dimensions (also true on HCP data, though less pronounced).

### 2. Effects of averaging coefficients across matched subjects

We next matched the bases of each subject against the rest of the pool (HCP: 967 subjects; UKB: 1529 subjects) to find a subset of 100 "other" subjects that most closely resemble the chosen one. Each subject pair has a measure of "matchness", defined as the average of the spatial correlations of the matched bases. To reconstruct the task activation maps of a new subject, we took the subject's own bases and a weighted average of the coefficients of the 100 best-matched subjects, benchmarked against 100 randomly chosen subjects from the pool and the 100 subjects with the worst match.
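A minimal sketch of the matching step, assuming the bases of different subjects are already in corresponding order (in practice the bases would first be paired up, e.g. by maximal spatial correlation). All sizes and data are toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels, n_bases, n_pool = 500, 10, 200  # toy sizes

# Toy pool: each subject has a (voxels x bases) set of bases and a
# coefficient vector; corresponding bases are assumed already aligned
pool_bases = rng.standard_normal((n_pool, n_voxels, n_bases))
pool_coefs = rng.standard_normal((n_pool, n_bases))
new_bases = rng.standard_normal((n_voxels, n_bases))

def matchness(a, b):
    """Average spatial correlation between corresponding bases."""
    corrs = [np.corrcoef(a[:, k], b[:, k])[0, 1] for k in range(a.shape[1])]
    return float(np.mean(corrs))

scores = np.array([matchness(new_bases, pool_bases[s]) for s in range(n_pool)])

# Take the 100 best-matched subjects and weight their coefficients by score
best = np.argsort(scores)[::-1][:100]
weights = scores[best] - scores[best].min() + 1e-9  # keep weights positive
weights /= weights.sum()
avg_coef = weights @ pool_coefs[best]
predicted = new_bases @ avg_coef
```

The shift-to-positive weighting scheme here is one plausible choice; the exact weighting used in the analysis is not specified above.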

* Results are shown in [Figure 3](figs/ukb_match_subjects.png) (UKB subjects) and [Figure 4](figs/hcp_match_subjects.png) (HCP subjects). As expected, reconstruction based on the 100 best-matched subjects yields higher model accuracy than using unmatched subjects, suggesting that a careful selection of subjects can further boost accuracy in predicting a new subject's task activations (green: 100 best-matched subjects; orange: 100 randomly chosen subjects; blue: 100 least-matched subjects).

To investigate why using the best-matched subjects brought less improvement on HCP data, for each subject we averaged its "matchness" with all other subjects to obtain a scalar value, which roughly measures how much the subject's bases differ from the others'. We plotted the distribution of this measure for each dataset ([Figure 5](figs/ukb_matchness.png): UKB data; [Figure 6](figs/hcp_matchness.png): HCP data). The relative range of matchness is much wider on the UKB dataset than on the HCP dataset (on UKB, the matchness at the far right of the distribution is almost seven times that at the far left), providing a possible explanation for why selecting better-matched subjects achieves more improvement on UKB.

### 3. Reconstructed task activation maps versus actual task activation maps

Correlation matrices between actual and predicted activations were constructed for each dataset ([Figure 7](figs/ukb_corr_mat.png): UKB data; [Figure 8](figs/hcp_corr_mat.png): HCP data). The diagonals of the matrices show higher correlations for all tasks, indicating that each reconstructed map matches the same subject more closely than it matches other subjects. (The correlation matrices are not normalised, showing the actual correlation values.)
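The subject-by-subject correlation matrix underlying these figures can be computed as follows. The data are toy stand-ins with a planted same-subject signal, so the diagonal dominates by construction:

```python
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_voxels = 20, 300  # toy sizes

actual = rng.standard_normal((n_subjects, n_voxels))
# Simulated predictions that resemble the matching subject's map
predicted = 0.7 * actual + 0.3 * rng.standard_normal((n_subjects, n_voxels))

def rowwise_corr(A, B):
    """Correlation of every row of A with every row of B (subjects x subjects)."""
    A = (A - A.mean(axis=1, keepdims=True)) / A.std(axis=1, keepdims=True)
    B = (B - B.mean(axis=1, keepdims=True)) / B.std(axis=1, keepdims=True)
    return A @ B.T / A.shape[1]

corr_mat = rowwise_corr(actual, predicted)
# Diagonal (same-subject) correlations should exceed off-diagonal ones
diag_mean = corr_mat.diagonal().mean()
offdiag_mean = corr_mat[~np.eye(n_subjects, dtype=bool)].mean()
```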

### 4. Comparison of prediction using residual bases and original bases

All of the above results are based on using residual bases to predict residual task activations. To investigate whether this improves overall model accuracy compared to using the original bases (to predict the original task activation maps), we added the group-level task activation effects back to the residual maps (both actual and predicted) so that the two approaches can be compared on the same scale.
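The add-back step is a simple linear combination of the group map and the residual. A sketch with toy maps (the group beta and all maps below are simulated, not taken from the datasets):

```python
import numpy as np

rng = np.random.default_rng(3)
n_voxels = 400  # toy size

group_map = rng.standard_normal(n_voxels)  # group-level task activation map
group_beta = 1.3                           # subject's loading on the group map (toy value)
residual_actual = rng.standard_normal(n_voxels)
residual_pred = 0.5 * residual_actual + rng.standard_normal(n_voxels)

# Add the group-level effect back to both maps so that the residual-based
# model can be compared with the original-bases model on the same scale
actual_full = group_beta * group_map + residual_actual
pred_full = group_beta * group_map + residual_pred

accuracy = np.corrcoef(actual_full, pred_full)[0, 1]
```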

* We saw that using residual bases to predict task activation improves model accuracy ([Figure 9](figs/ukb_addback.png): UKB data; [Figure 10](figs/hcp_addback.png): HCP data). Yellow boxes: correlation between actual and predicted task activations (using residual bases) with group-level effects added back; blue boxes: correlation between actual and predicted task activations (using original bases).

* To quantify how the diagonals of the correlation matrices differ from the off-diagonal elements, we calculated the Kolmogorov–Smirnov test statistic as a measure of distance between the distributions of diagonal and off-diagonal elements (for a given sample size this statistic provides a comparable distance metric). We found that using residual bases to make predictions further enhances the diagonal correlations ([Figure 11](figs/ukb_diag.png): UKB data; [Figure 12](figs/hcp_diag.png): HCP data), suggesting an added advantage in capturing the individual variability in how subjects respond to tasks.
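The diagonal-versus-off-diagonal comparison can be sketched with `scipy.stats.ks_2samp` on a toy correlation matrix (the matrix values below are simulated with an elevated diagonal, standing in for the real same-subject correlations):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)
n = 50  # toy number of subjects

# Toy correlation matrix with elevated diagonal (same-subject correlations)
corr_mat = 0.1 * rng.standard_normal((n, n))
np.fill_diagonal(corr_mat, 0.6 + 0.05 * rng.standard_normal(n))

diag = corr_mat.diagonal()
offdiag = corr_mat[~np.eye(n, dtype=bool)]

# KS statistic: distance between the diagonal and off-diagonal distributions
ks_stat, p_value = ks_2samp(diag, offdiag)
```

A larger `ks_stat` indicates a cleaner separation between same-subject and cross-subject correlations.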

### 5. Prediction of amplitude of group-level activation maps

We also investigated whether the amplitudes of the group activation maps can be predicted from the amplitudes of the bases across subjects (UKB: 1529 subjects; HCP: 967 subjects). Amplitudes of the group-level activation maps are the effect sizes (betas) of the group-level contrast maps in explaining the individual task activation maps, while the amplitudes of the bases are the standard deviations of the individual time courses calculated in dual regression. 80% of the subjects were taken as training data, and the remaining 20% were used to evaluate the prediction. The process was repeated 1000 times for both the UKB ([Figure 13](figs/ukb_amplitude_prediction.png)) and HCP ([Figure 14](figs/hcp_amplitude_prediction.png)) datasets to obtain confidence intervals.
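The repeated 80/20 split can be sketched as follows, with a planted linear relationship between basis amplitudes and group-map amplitudes standing in for the real data (all sizes and values are toy assumptions; fewer repeats are used here than the 1000 in the analysis):

```python
import numpy as np

rng = np.random.default_rng(5)
n_subjects, n_bases = 500, 25  # toy sizes

# Toy data: basis amplitudes (std of dual-regression time courses) and
# group-map amplitudes (betas) with a planted linear relationship
X = rng.standard_normal((n_subjects, n_bases))
true_w = rng.standard_normal(n_bases)
y = X @ true_w + 0.5 * rng.standard_normal(n_subjects)

n_repeats, accuracies = 100, []
for _ in range(n_repeats):
    # Random 80/20 split of subjects into train and test sets
    perm = rng.permutation(n_subjects)
    train, test = perm[: int(0.8 * n_subjects)], perm[int(0.8 * n_subjects):]
    w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    pred = X[test] @ w
    accuracies.append(np.corrcoef(pred, y[test])[0, 1])

# The spread across repeats gives a confidence interval for accuracy
ci = np.percentile(accuracies, [2.5, 97.5])
```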

### 6. Reconstruction coefficients across task domains

The reconstruction coefficients (i.e., the betas of the residual bases in explaining the residual task activations) are relatively consistent across task domains ([Figure 15](figs/ukb_betas.png): UKB data; [Figure 16](figs/hcp_betas.png): HCP data). This suggests that the reconstruction coefficient matrix is low-rank, and that the prediction of a specific task may be further improved by borrowing information from other task domains.