LD scores and matrices release
We are excited to announce the release of genome-wide LD scores and matrices from the Pan-UK Biobank resource, which contains LD scores for (non-partitioned) LD score regression analysis and full in-sample LD matrices for more extensive analyses such as fine-mapping.
Computing large-scale LD matrices is always challenging. Thanks to the Hail team's continuous development and support, we used Hail's block-distributed matrix implementation (BlockMatrix) to enable massively scalable linear algebra on the genotype matrix. With BlockMatrix, we computed genome-wide in-sample LD matrices (10 Mb radius) and scores for all six ancestry groups in approximately 16 hours (wall-clock) using 500 preemptible workers (n1-standard-8) on Google Cloud Dataproc (~64,000 CPU-hours).
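To make the two quantities concrete, here is a minimal in-memory sketch in NumPy of a windowed LD matrix and the LD scores derived from it. It is not the Hail BlockMatrix pipeline used for this release: the function names (windowed_ld, ld_scores) and the toy data are hypothetical, and the sketch leaves out covariate adjustment and any bias correction of r^2.

```python
import numpy as np

def windowed_ld(genotypes, positions, radius):
    """Return in-sample LD (Pearson r) for variant pairs within `radius` bp.

    genotypes: (n_samples, n_variants) array of dosages.
    positions: (n_variants,) base-pair positions on a single chromosome.
    Pairs farther apart than `radius` are set to 0.
    """
    g = np.asarray(genotypes, dtype=float)
    pos = np.asarray(positions)
    n = g.shape[0]
    # Standardize each variant to mean 0, variance 1
    # (assumes no monomorphic variants, which would divide by zero).
    x = (g - g.mean(axis=0)) / g.std(axis=0)
    r = x.T @ x / n
    # Keep only pairs whose distance is within the window radius.
    within = np.abs(pos[:, None] - pos[None, :]) <= radius
    return np.where(within, r, 0.0)

def ld_scores(r):
    """LD score of variant j: sum of r^2 over variants in its window (itself included)."""
    return (r ** 2).sum(axis=1)

# Toy data: 200 samples, 5 variants; variant 1 is a perfect proxy of variant 0,
# and the last two variants sit far outside a 10 Mb window around the first three.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(200, 5)).astype(float)
genotypes[:, 1] = genotypes[:, 0]
positions = np.array([100, 200, 300, 50_000_000, 50_000_100])

r = windowed_ld(genotypes, positions, radius=10_000_000)
scores = ld_scores(r)
```

At biobank scale the dense product x.T @ x does not fit in memory, which is why a block-distributed implementation is needed.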
Since our analysis included ancestry groups that are admixed and have more complex population structure, we needed to account for this structure in downstream analyses such as LD score regression and fine-mapping. To this end, we applied covariate adjustment to the genotype matrix before LD computation, using the first 10 PCs as well as the other covariates used in the Pan-UKB GWAS. Please find more details in our technical documentation.
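One common way to implement this kind of adjustment is to regress each variant on the covariates and compute LD from the residuals. The sketch below illustrates that idea in NumPy under simplifying assumptions: the function names (residualize, adjusted_ld) are hypothetical, a single population indicator stands in for the PCs and other covariates, and the exact procedure used for the release is the one described in the technical documentation, not this code.

```python
import numpy as np

def residualize(genotypes, covariates):
    """Remove the least-squares fit of the covariates (plus intercept) from each variant."""
    g = np.asarray(genotypes, dtype=float)
    c = np.asarray(covariates, dtype=float)
    design = np.column_stack([np.ones(g.shape[0]), c])
    beta, *_ = np.linalg.lstsq(design, g, rcond=None)
    return g - design @ beta

def adjusted_ld(genotypes, covariates):
    """Correlation matrix of the covariate-adjusted genotypes."""
    res = residualize(genotypes, covariates)
    # Residuals have mean zero (intercept included), so scaling each column
    # to unit norm turns the cross-product into a correlation matrix.
    res = res / np.sqrt((res ** 2).sum(axis=0))
    return res.T @ res

# Toy example: two independent variants whose allele frequencies differ
# between two populations, which induces spurious correlation when pooled.
rng = np.random.default_rng(1)
n = 4000
pop = np.repeat([0.0, 1.0], n // 2)       # stand-in for a PC (assumption)
freq = np.where(pop == 0, 0.1, 0.9)
genotypes = np.column_stack([rng.binomial(2, freq), rng.binomial(2, freq)])

raw_r = np.corrcoef(genotypes, rowvar=False)[0, 1]   # inflated by structure
adj_r = adjusted_ld(genotypes, pop[:, None])[0, 1]   # close to zero after adjustment
```

In this toy setting the pooled correlation is large even though the variants are independent within each population, and the adjusted value is near zero.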
Another potential concern is the use of imputation-based LD scores for LD score regression, which is generally not recommended by the method's authors (Bulik-Sullivan, BK. et al., 2015). We consider our approach justified because not every ancestry group in our analysis has a suitable sequence-based LD score available. We also compared our UKBB LD scores (imputation-based) with gnomAD LD scores (sequence-based) for the populations available in gnomAD, and observed high concordance between the two. Please see the blog post by Rahul for details.
With these LD resources, we will continue to further analyze our GWAS results, including:
- LD score regression analysis to estimate heritabilities
- Fine-mapping analysis to identify causal variants of well-powered complex traits
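As a rough illustration of the first item, LD score regression (Bulik-Sullivan et al., 2015) models the expected chi-square statistic of variant j as 1 + N*a + N*h2*l_j/M, where l_j is the LD score, N the sample size, and M the number of variants, so the slope of chi-square on LD score gives an estimate of h2. The sketch below fits that line with plain unweighted least squares on simulated data. This is a simplification: the function name ldsc_h2 and all data are hypothetical, and the actual method uses regression weights and a block jackknife for standard errors.

```python
import numpy as np

def ldsc_h2(chi2, ld_score, n_samples, n_variants):
    """Unweighted LD score regression; returns (h2 estimate, intercept)."""
    chi2 = np.asarray(chi2, dtype=float)
    ld = np.asarray(ld_score, dtype=float)
    design = np.column_stack([np.ones_like(ld), ld])
    (intercept, slope), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    # slope = N * h2 / M, so h2 = slope * M / N
    return slope * n_variants / n_samples, intercept

# Simulated inputs (assumptions for illustration only).
rng = np.random.default_rng(2)
n_samples, n_variants, true_h2 = 50_000, 10_000, 0.4
ld = rng.uniform(1.0, 200.0, size=n_variants)
expected_chi2 = 1.0 + n_samples * true_h2 * ld / n_variants
# Each observed statistic is its expectation times a chi-square(1) draw.
chi2 = expected_chi2 * rng.chisquare(1, size=n_variants)

h2_hat, intercept_hat = ldsc_h2(chi2, ld, n_samples, n_variants)
```

With population-matched in-sample LD scores in place of the simulated ld array, the same regression applies per ancestry group.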
We hope this additional release will encourage researchers to analyze diverse ancestry groups and/or develop statistical methods using population-matched in-sample LD.