# LD scores and matrices

## Overview

We computed in-sample dosage-based LD matrices and scores for each of the six ancestry groups in UKBB. LD matrices are available in Hail's BlockMatrix format on Amazon AWS (see details here). LD scores are available in LDSC-compatible flat files (`.l2.ldscore.gz` and `.M_5_50`) here. For large-scale analyses, a full LD score Hail Table (not restricted to the HapMap3 variants) is also available on Amazon AWS (see details here).

Technical details of the LD computation are given below. All the code is also publicly available here. Detailed instructions for running LD score regression are available on LDSC's website.

## LD matrices

- The dosage-based genotype matrix $X$ was column-wise mean-centered and normalized.
- We applied the same variant QC filter used for the Pan-UKB GWAS (INFO > 0.8, MAC > 20 in each population; see details here).
- For covariate correction, the residuals from the regression of $genotype \sim covariates$ were obtained via $X_{adj} = M_cX$ where $M_c = I - C(C^TC)^{-1}C^T$, the residual-maker matrix, and $C$ is the matrix of covariates.
- We used the same covariates used for the Pan-UKB GWAS, namely $age$, $sex$, $age*sex$, $age^2$, $age^2*sex$, and the first 10 PCs of the genotype matrix (see details here).
- We then computed the LD matrix $R$ via $R = \frac{X_{adj}^TX_{adj}}{n}$ with a radius of __10 Mb__. Each element $\hat{r}_{jk}$ of $R$ represents the Pearson correlation coefficient of genotypes between variants $j$ and $k$.
- For the X chromosome, we computed an LD matrix jointly using both males and females, where male genotypes are coded 0/1 and female genotypes are coded 0/1/2.
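
The steps above can be illustrated with a small dense NumPy sketch. This is not the Pan-UKB Hail code: the function name `ld_matrix`, its arguments, and the toy data are hypothetical, and it assumes that everything fits in memory, that no variant is monomorphic, and that $C$ is used exactly as passed in (no intercept column is added). Instead of forming the $n \times n$ matrix $M_c$, it obtains the same residuals from a least-squares fit.

```python
import numpy as np

def ld_matrix(dosages, covariates, positions, radius_bp=10_000_000):
    """Dense sketch of a covariate-adjusted LD matrix for one chromosome.

    dosages:    (n samples, m variants) array of genotype dosages
    covariates: (n samples, c covariates) array C, used as given
    positions:  (m,) array of base-pair positions
    """
    n = dosages.shape[0]
    # Column-wise mean-center and scale each variant to unit variance
    # (assumes no monomorphic variant, otherwise this divides by zero).
    X = dosages - dosages.mean(axis=0)
    X = X / X.std(axis=0)
    # X_adj = M_c X = X - C (C^T C)^{-1} C^T X, computed via least squares
    # so that the n x n residual-maker matrix is never materialized.
    beta, *_ = np.linalg.lstsq(covariates, X, rcond=None)
    X_adj = X - covariates @ beta
    # R = X_adj^T X_adj / n
    R = X_adj.T @ X_adj / n
    # Keep only pairs of variants within the radius; zero out the rest.
    dist = np.abs(positions[:, None] - positions[None, :])
    R[dist > radius_bp] = 0.0
    return R

# Toy data (made up for illustration only).
rng = np.random.default_rng(0)
n, m = 500, 6
dosages = rng.uniform(0.0, 2.0, size=(n, m))
covariates = rng.normal(size=(n, 3))
positions = np.array([1_000, 2_000, 5_000_000, 9_000_000, 12_000_000, 30_000_000])
R = ld_matrix(dosages, covariates, positions)
```

At biobank scale the same banded structure is what makes a block-sparse representation such as Hail's BlockMatrix practical, since only entries within the radius need to be stored.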

## LD scores

- To account for the upward bias of the standard estimator of the squared Pearson correlation coefficient, we applied a bias adjustment to $\hat{r}^2_{jk}$ using $\tilde{r}^2_{jk} = \frac{n-1}{n-2}\hat{r}^2_{jk} - \frac{1}{n-2}$.
- LD scores for variant $j$ were subsequently computed via $l_j = \sum_k \tilde{r}^2_{jk}$ with a radius of __1 Mb__.
- For LDSC-compatible flat files, we only exported LD scores of high-quality HapMap3 variants that are 1) in autosomes, 2) not in the MHC region, 3) biallelic SNPs, 4) with INFO > 0.9, and 5) with MAF > 1% in UKB and gnomAD genome/exome (if available).
- We note that, since we applied covariate adjustment above, these LD scores are equivalent to the covariate-adjusted LD scores described in Luo, Y. & Li, X. et al., 2020.
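
As a worked illustration of the bias adjustment and the windowed sum, here is a minimal NumPy sketch. The function name `ld_scores` and the toy inputs are hypothetical, and the sketch assumes that the sum over $k$ includes the variant itself ($k = j$), as in standard LD score definitions; the text above does not state this explicitly.

```python
import numpy as np

def ld_scores(R, positions, n, radius_bp=1_000_000):
    """Sketch: bias-adjusted LD scores from a dense LD matrix R.

    R:         (m, m) array of correlations r_hat_jk
    positions: (m,) array of base-pair positions
    n:         number of samples used to estimate R
    """
    r2 = R ** 2
    # r2_adj = (n - 1) / (n - 2) * r2 - 1 / (n - 2)
    r2_adj = (n - 1) / (n - 2) * r2 - 1.0 / (n - 2)
    # Sum only over variants k within the radius of variant j
    # (assumption: the variant itself, k = j, is included).
    in_window = np.abs(positions[:, None] - positions[None, :]) <= radius_bp
    return np.where(in_window, r2_adj, 0.0).sum(axis=1)

# Two variants 10 kb apart with r_hat = 0.5, estimated from n = 102 samples:
# adjusted r^2 = (101 / 100) * 0.25 - 1 / 100 = 0.2425, so l_j = 1 + 0.2425.
R_toy = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
scores = ld_scores(R_toy, np.array([100_000, 110_000]), n=102)
```

Note that the adjustment leaves a perfect correlation unchanged ($\hat{r}^2 = 1$ maps to $\tilde{r}^2 = 1$) and maps $\hat{r}^2 = 0$ to a slightly negative value, so individual adjusted terms can be below zero.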