Journal of Urology, JU Forum, 1 Mar 2024

Comparison of Genomic Inflation Estimates in Genome-Wide Association Studies Using Genetically Identified Ancestry vs Self-Identified Race/Ethnicity in Prostate Cancer Patients in ELLIPSE Cohort

    Prostate cancer is one of the most heritable cancers.1 Genome-wide association studies (GWAS) are observational studies of genetic variants among patients with a defined trait or disease, such as prostate cancer. Single nucleotide polymorphisms (SNPs) implicated by prostate cancer GWAS are used to build polygenic risk scores (PRS) which have proven useful for identifying individuals at higher risk of aggressive prostate cancer,2-4 and can complement PSA measurements to reduce overdiagnosis.5 The rates of incidence and mortality for prostate cancer can differ across ancestry groups and are highest in African American men. While socioeconomic factors are a significant driver of health disparities, there is also a recognized need to better understand the role of genetics in prostate cancer and how this could be modified by ancestral background. Most PRS are based on GWAS in European ancestry groups; however, constructing prostate cancer PRS applicable to diverse ancestral groups requires tailored genetic studies which incorporate ancestry information.2,3,6

    Ancestry can be determined from SNP information; however, clinically, self-identified ancestry is often used. Self-identified ancestry can capture cultural and social distinctions that influence prostate cancer risk, while genetic ancestry can shed light on inherited, ancestry-related risk factors. These categories are further complicated by admixture. Understanding how the two measures of ancestry relate to each other and influence downstream genetic analysis is important. We show through genetic ancestry inference that self-identified race/ethnicity aligned relatively well with genetic ancestry; however, use of genetic ancestry groups better controlled for genomic inflation, or false associations, compared to race/ethnicity. These findings are valuable as genetic risk scores applicable to diverse patient populations are developed.


    The ELLIPSE (Elucidating Loci Involved in Prostate Cancer Susceptibility) Consortium is an international collaboration that pools genetic data to discover risk loci for patients with prostate cancer.7 The ELLIPSE database contains information about self-identified race/ethnicity (European, African American, Hispanic, Asian) as well as a patient’s genomic information.

    SNPs were imputed with the Michigan Imputation Server8 using the 1000 Genomes Project as the reference population. Genetic ancestry analysis was conducted using a merged identity-by-state matrix (ELLIPSE and International HapMap Project) and principal component analysis (PCA). A k-means clustering model trained on HapMap phase 3 data was used to determine that the ELLIPSE cohort best fit with 11 ancestral groups from the HapMap data. These groups are African ancestry in the Southwestern United States (ASW); Utah, United States residents with Northern and Western European ancestry (CEU); Han Chinese in Beijing, China (CHB); Chinese in Metropolitan Denver, Colorado (CHD); Gujarati Indians in Houston, Texas (GIH); Japanese in Tokyo, Japan (JPT); Luhya in Webuye, Kenya (LWK); Mexican ancestry in Los Angeles, California (MEX); Toscani in Italy (TSI); Maasai in Kinyawa, Kenya (MKK); and Yoruba in Ibadan, Nigeria (YRI). Ancestral groups were further grouped into European, African, Mexican, Asian, and South Asian ancestry as follows: CHB, JPT, CHD: Asian; LWK, MKK, ASW, YRI: African; CEU, TSI: European; MEX: Mexican; GIH: South Asian.
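    The cluster-assignment step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes each sample has already been projected onto two principal components, uses a plain Lloyd's k-means seeded with one reference sample per population, and names each cluster after the most common reference label in it. The toy coordinates, the function names, and the use of `MKK` as the code for the Maasai sample (the standard HapMap label, not spelled out in the text) are assumptions for illustration.

```python
import math
from collections import Counter

# Broader grouping as listed in the text; "MKK" (Maasai) is an assumed code.
SUPERGROUP = {
    "CHB": "Asian", "JPT": "Asian", "CHD": "Asian",
    "LWK": "African", "MKK": "African", "ASW": "African", "YRI": "African",
    "CEU": "European", "TSI": "European",
    "MEX": "Mexican", "GIH": "South Asian",
}

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, init_centroids, n_iter=50):
    """Plain Lloyd's algorithm starting from the given centroids."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        new = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids

def label_clusters(points, labels, centroids):
    """Name each cluster after the most common reference population in it."""
    votes = [Counter() for _ in centroids]
    for p, lab in zip(points, labels):
        votes[nearest(p, centroids)][lab] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

# Toy reference panel: hypothetical PC coordinates with known labels.
ref_points = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (10.0, 0.1), (10.1, 0.0)]
ref_labels = ["CEU", "CEU", "YRI", "YRI", "CHB", "CHB"]

# Seed with one sample per population so the toy run is deterministic.
seeds = [ref_points[0], ref_points[2], ref_points[4]]
centroids = kmeans(ref_points, seeds)
cluster_names = label_clusters(ref_points, ref_labels, centroids)

def assign_ancestry(sample_pcs):
    """Assign a cohort sample to a reference population and its broader group."""
    pop = cluster_names[nearest(sample_pcs, centroids)]
    return pop, SUPERGROUP[pop]

print(assign_ancestry((4.8, 5.3)))
```

    In practice more than two components and a full reference panel would be used; the nearest-centroid rule is simply what a trained k-means model applies to new samples.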

    Self-identified and genetically identified ancestral groups were compared through PCA. Genomic inflation estimates were calculated from association analyses run within self-identified and HapMap-identified groups, using either age alone or age plus the top 10 principal components as covariates. Associations were considered significant at the genome-wide threshold of P < 5 × 10⁻⁸.
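    For readers unfamiliar with the genomic inflation factor, a common way to estimate it is the ratio of the median observed 1-degree-of-freedom chi-square statistic to its expected median under the null (about 0.455). The sketch below assumes that definition and that two-sided P values from the association tests are available; the function names are illustrative, and the text does not state which software the authors used.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
EXPECTED_MEDIAN_CHI2 = _STD_NORMAL.inv_cdf(0.75) ** 2

def p_to_chi2(p):
    """Convert a two-sided P value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail quantile; the sign is irrelevant
    return z * z

def genomic_inflation(p_values):
    """Lambda: median observed chi-square over the expected median under the null."""
    return median(p_to_chi2(p) for p in p_values) / EXPECTED_MEDIAN_CHI2

# Evenly spaced P values mimic a well-calibrated (null) analysis.
n = 1001
null_p = [(i + 0.5) / n for i in range(n)]
# Squaring shrinks every P value, mimicking systematic inflation.
inflated_p = [p * p for p in null_p]

print(round(genomic_inflation(null_p), 3))
print(genomic_inflation(inflated_p) > 1.0)
```

    A value near 1 indicates test statistics that follow the expected null distribution; values above 1 indicate inflation.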


    The ELLIPSE prostate cancer cohort consisted of 91,644 patients.7 Each individual reported self-identified race/ethnicity, resulting in 82,247 European (89.7%), 6516 African (7.1%), 1761 Latinx/Hispanic (1.9%), and 1120 Asian (1.2%) men. Each self-identified race/ethnicity was broken down into 9 genetic ancestry groups, as defined by cluster analysis with the HapMap Project phase 3 cohort. Supplemental Figure 1A shows a scatterplot of principal components 1 and 3 (PC1 and PC3), the components that explain the most variance, from the PCA of genotypes for the ELLIPSE Consortium, labeled with self-identified ancestry. Supplemental Figure 1B shows the same scatterplot of PC1 and PC3, now labeled with the k-means cluster estimation (k = 11) of HapMap group for the same set of ELLIPSE Consortium individuals.

    Most patients who self-identified as African were identified through genetic ancestry analysis as ASW (n = 3716, 77.72%), YRI (n = 1735, 36.29%), LWK (n = 872, 18.24%), or TSI (n = 116, 2.43%). Fisher’s exact testing demonstrated higher odds of being self-identified African if the genetic ancestry group was ASW, YRI, or LWK (P < 10⁻¹⁶) compared to TSI (odds ratio = 0.25). Among the patients who self-identified as Asian, genetic ancestry groups were CHB, CHD, JPT (n = 516, 46.06%), MEX (n = 297, 26.52%), and GIH (n = 269, 24.02%). Among patients who self-identified as European, the largest subgroups were CEU (n = 76,586, 93.12%) and TSI (n = 5600, 6.81%). Lastly, among the patients who self-identified as Latinx/Hispanic, the largest subgroups were MEX (n = 1558, 88.47%) and TSI (n = 139, 7.89%; Supplemental Table 1). These results suggest that overall self-identified ancestry aligns well with genetic ancestral groups.

    We then tested for association with prostate cancer status, controlling for age or for age plus principal components 1 to 10, within HapMap-identified or self-identified ancestral groups (Figure). We compared lambda values, which quantify genomic inflation, for each chromosome in these analyses. Using only age as a covariate, genomic inflation was significantly higher in self-identified African (P < 1 × 10⁻⁴), Latinx/Hispanic (P < 7 × 10⁻⁹), and Asian (P < 6 × 10⁻¹⁰) analyses compared to HapMap-identified African, Mexican, and East Asian groups, respectively. When controlling for the top 10 principal components, genomic inflation remained significantly higher for self-identified Latinx/Hispanic and Asian analyses compared to HapMap-identified Mexican and Asian groups. These results suggest that analyses using HapMap-identified groups better controlled for the effects of population stratification and reduced genomic inflation compared to self-identified race/ethnicity groups.

    Figure. Genomic inflation factors in European, African, Mexican, East Asian, and South Asian men in ELLIPSE (Elucidating Loci Involved in Prostate Cancer Susceptibility) cohort based on genetic or self-identified ancestry.

    GWAS analyses within HapMap-identified groups identified 19,970, 181, and 1 significant associations for the European, African, and Mexican groups, respectively. For self-identified race/ethnicity, 19,492, 190, and 5 significant associations were identified for the European, African, and Asian groups, respectively. Associations identified for the Asian race/ethnicity group did not pass the suggestive GWAS threshold (P < 1 × 10⁻⁵) in the HapMap-identified Asian ancestry group. Associations identified for the African race/ethnicity group all passed the suggestive threshold in the African HapMap-identified group. Generally, significant associations identified using self-identified and HapMap-identified groups were consistent, although increased variability in associations was seen in groups with fewer individuals.
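    The cross-check described above, in which associations that are genome-wide significant in one grouping are tested against the suggestive threshold in the other, might be sketched as follows. The SNP identifiers and P values are hypothetical, the function name is illustrative, and the suggestive threshold is taken to be the conventional 1 × 10⁻⁵, which is an assumption about the value intended in the text.

```python
GENOME_WIDE = 5e-8  # genome-wide significance threshold stated in the text
SUGGESTIVE = 1e-5   # assumed: the conventional suggestive threshold

def replication_summary(discovery, comparison):
    """For SNPs genome-wide significant in `discovery`, report whether each
    reaches at least the suggestive threshold in `comparison`."""
    summary = {}
    for snp, p in discovery.items():
        if p < GENOME_WIDE:
            p_other = comparison.get(snp)
            summary[snp] = p_other is not None and p_other < SUGGESTIVE
    return summary

# Hypothetical SNP identifiers and P values, for illustration only.
self_identified = {"snp_a": 3e-9, "snp_b": 4e-8, "snp_c": 2e-6}
hapmap_identified = {"snp_a": 1e-7, "snp_b": 3e-4, "snp_c": 5e-9}

print(replication_summary(self_identified, hapmap_identified))
```

    In this toy example `snp_a` is supported at the suggestive level in the comparison analysis, `snp_b` is not, and `snp_c` is skipped because it is not genome-wide significant in the discovery analysis.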


    In this analysis of the ELLIPSE cohort consisting of men who were diagnosed with prostate cancer, we assessed concordance between self-identified race/ethnicity and HapMap-identified ancestry groups and effects on downstream analyses. We found the self-identified race/ethnicity and HapMap-identified ancestry were generally concordant; however, analyses within HapMap-identified ancestry groups had significantly less genomic inflation, which can confound association analyses, compared to self-identified race/ethnicity.

    The findings of this study draw attention to the ongoing debate in the literature about the use of self-identified race/ethnicity in medical research. Race and ethnicity are usually assigned, whereas genetic ancestry is based on an individual’s genetic makeup. Notably, race and ethnicity can be shaped by a variety of factors, including cultural, socioeconomic, and geographic factors, while genetic ancestry describes a person’s lineage as inferred from genetic variation. It is important to note that these categorizations are distinct, but both are valuable to study.

    GWAS aim to study the genetic basis of complex diseases, such as prostate cancer. Although a proportion of prostate cancer risk may be mediated by high-effect variants, most risk variants may be modifiers and can likely be influenced by both ancestry and race/ethnicity. The genomic inflation factor measures overdispersion, that is, deviation of the distribution of the observed test statistics from the distribution expected under the null hypothesis. Often, high genomic inflation is due to population stratification, which leads to spurious associations. Lower genomic inflation in HapMap-identified groups likely reflects better control of population structure. Thus, associations identified within HapMap-identified groups are more likely to reflect true associations with prostate cancer. Although significant associations with prostate cancer for European and African ancestry groups did not vary greatly, significant associations identified in the self-identified Asian group did not meet the significance threshold in the HapMap-identified group. Combined with the higher genomic inflation factors found in the self-identified Asian group, these results suggest that these are likely false-positive associations.

    Understanding the differences between self-identified and genetic ancestry is important to clinical research and practice as more PRS are developed and used to guide clinical decision-making.2,4 First, understanding the ancestral background of the GWAS used for PRS construction is critical. Most PRS are based on GWAS of European populations and may have different efficacy in diverse ancestry groups. Second, no significant differences in genomic inflation factors were seen in European populations based on self-identified vs genetic ancestry. However, differences in genomic inflation factors when using self-identified vs genetic ancestry in African, Hispanic, and East Asian ancestry groups suggest the presence of confounding factors with self-identified race/ethnicity. These confounding factors may be cultural, socioeconomic, or occupational. Understanding these factors is critical as they may influence genetic association studies. Developing a score which incorporates genetic risk determined through GWAS and other social determinants of cancer risk may be necessary to better guide clinical decision-making. The importance of these factors is seen in studies of low-risk prostate cancer patients where there were no differences in prostate cancer–specific mortality when comparing Black patients to non-Hispanic White patients.9 Lastly, self-identified race/ethnicity is often used clinically; however, genetic ancestry–specific associations in African American men have been identified and can boost PRS performance.10 Thus, a holistic approach to assess cancer risk using both genetic and self-identified ancestry can improve future clinical practice.


    Analyses of prostate cancer patients stratified by genetically identified ancestry showed significantly less genomic inflation than analyses stratified by self-identified race/ethnicity. Generally, results from using genetically identified ancestry and self-identified race/ethnicity were concordant in the ELLIPSE Consortium. However, our findings suggest that self-identified race/ethnicity, which is commonly used for prostate cancer research, may introduce confounding factors within genetic studies, especially for groups with fewer individuals. Our findings suggest that when self-identified race/ethnicity is used in prostate cancer research, conclusions of these studies likely do not accurately reflect the individual’s genomic risk, but rather reflect disparities in care.


    1. Familial risk and heritability of cancer among twins in Nordic countries. JAMA. 2016;315(1):68-76.
    2. Trans-ancestry genome-wide association meta-analysis of prostate cancer identifies new susceptibility loci and informs genetic risk prediction. Nat Genet. 2021;53(1):65-75.
    3. Polygenic risk of any, metastatic, and fatal prostate cancer in the Million Veteran Program. J Natl Cancer Inst. 2023;115(2):190-199.
    4. Prostate cancer risk stratification improvement across multiple ancestries with new polygenic hazard score. Prostate Cancer Prostatic Dis. 2022;25(4):755-761.
    5. Implications of polygenic risk-stratified screening for prostate cancer on overdiagnosis. Genet Med. 2015;17(10):789-795.
    6. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019;51(4):584-591.
    7. A meta-analysis of 87,040 individuals identifies 23 new susceptibility loci for prostate cancer. Nat Genet. 2014;46(10):1103-1109.
    8. Next-generation genotype imputation service and methods. Nat Genet. 2016;48(10):1284-1287.
    9. Association between African American race and clinical outcomes in men treated for low-risk prostate cancer with active surveillance. JAMA. 2020;324(17):1747-1754.
    10. PRState: incorporating genetic ancestry in prostate cancer risk scores for men of African ancestry. BMC Cancer. 2022;22(1):1289.

    Recusal: Dr Javier-DesLoges is an AUA publications online content editor and an early career editor for The Journal of Urology® and was recused from the editorial and peer review processes.

    Support: This work was supported in part by the 2022 Urology Care Foundation™ Research Scholar Award Program and Bristol Myers Squibb (J.F.J.-D.), and the National Institutes of Health (M.S.P.).

    Conflict of Interest Disclosures: The Authors have no conflicts of interest to disclose.

    Ethics Statement: This study was deemed exempt from Institutional Review Board review.