Substituting genetic ancestry for race in research? Not so fast

Race, widely used as a variable in biomedical research and medicine, is an appropriate proxy for racism, but not for anything biological. Proposals to use genetic ancestry instead of race risk perpetuating the same problems.

Dozens of algorithms widely used in clinical care include an adjustment factor for a patient’s race. For example, estimating kidney function returns different results depending on whether the patient’s race is entered as Black or Non-Black, although at least for kidney function, the use of race is questioned. Some drugs have only been approved for those in certain self-identified racial groups. Researchers now routinely consider the race of participants at almost every step of the research process—from recruitment to analysis to interpretation of results.

Racial disparities in health have reinvigorated the debate about whether this use of race is appropriate and whether it is linked to racism.


Of course, race is an important variable to track in order to understand the social drivers of health, including the impact of racism. But it is a highly problematic proxy for anything biological.

A common suggestion for better capturing potentially relevant biological differences between groups is to turn to concepts from genetics, in particular genetic ancestry.


But using genetic ancestry risks perpetuating the same problems as invoking race, several colleagues and I argue in a Policy Forum essay in the journal Science. We argue that genetic ancestry may be part of the solution to understanding our differential risks for disease development and response to therapies, but only if an appropriately complex conceptualization of it is adopted.

The danger of turning to genetic ancestry comes from the most prevalent way ancestry is currently used in genetics: as continental categories like African ancestry, European ancestry, and the like. These categories are easily conflated with racial categories. For example, European ancestry is equated with “white” race. As a result, a socio-political term is confused with a biological one. This well-intentioned “solution” ends up perpetuating the same problem inherent in racial categories: the belief that people can be divided into a small number of types based on their biology. Such beliefs are the source of great harm. This creates an ethical imperative to move away from the use of continental ancestry categories.

There is also a scientific need to move away from their use. Here’s what genetic ancestry means: a person’s genetic ancestry is the set of pathways through their family tree by which they inherited each segment of their DNA. Population categories are not an integral part of this definition; imposing any set of categories is a decision that researchers must make and justify.

There are good reasons not to impose continental ancestry categories.

Continental ancestry categories do not adequately capture human diversity. Reassembled datasets, as referenced in the Science essay, highlight that there are no clear-cut categories of genetic variation, only fuzzy continuities. Recent high-profile studies in statistical genetics have shown that in many cases where the use of population categories was previously deemed necessary, categories can be avoided entirely. If basic and translational researchers can avoid categories, they should.

Continental ancestry categories also give a very incomplete picture of our ancestry. Each of us has ancestors from every point in our species’ past. A set of ancestry categories reflects only one point in that past, so referring to only one set of categories flattens this multidimensional historical picture.

New data are increasingly enabling us to examine different time slices. For example, around 50,000 years ago, the human species was interbreeding with Neanderthals. The best model suggests that 5,000 years ago, three distinct human groups in Europe mixed to forge modern-day Europeans. And 500 years ago, waves of migration and the trade in enslaved peoples created new patterns of genetic diversity in the Americas. These different time slices can be medically relevant. For example, one of the key genetic variants linked to the severity of Covid-19 was later traced to a genomic region that humans inherited from Neanderthals. When researchers attempt to understand the relevance of human genetic background, they should routinely consider multiple sets of categories representing multiple time slices.

A consideration of the values, ethics, and purpose of human biology research should compel researchers, and those who apply the findings of that research, to move away from simple categorization and embrace a more complex version of genetic ancestry, one that reflects the continuous nature of genetic variation and its historical depth. Change is never easy. To achieve it, the research community needs, at a minimum, new and widely available software tools that enable the use of categories representing multiple time slices, as well as educational materials for researchers, scientists, and clinicians. Publishers and funders also need to consider what types of work they will publish and fund.

The willingness of academic and healthcare institutions to re-examine their use of race presents an opportunity to move away from the use of race as a biological variable. To make the most of this opportunity, they must embrace a complex conceptualization of genetic ancestry and not allow continental categories, which reaffirm old racial groupings under supposedly race-neutral language, to become the new standard.

Anna CF Lewis is a Research Fellow at the EJ Safra Center for Ethics at Harvard University.