How to Measure Diversity

Aww, I thought John Doerr’s joke was funny. Many people didn’t know it was a joke, but I think that was Kim-Mai’s intention.

Doerr’s comment surfaces a real issue, however. In Silicon Valley, diversity has become a pissing match.

    We’re so diverse, we just hired eight women.
    Oh yeah? We’re so diverse, we hired two illegal immigrants.
    Well we’re so diverse, we added Caitlyn Jenner to our advisory board.
    Please. We’re so diverse, we hired an amputee, seven dwarves, and a donkey.
    Well we’re so diverse, I can’t pronounce any of my coworkers’ names!

We’ve lost sight of what “diversity” even means.

Let’s bring it back to science.

The evolutionary relationships between ethnic groups. The numbers denote bootstrap values, or confidence levels for each branch.

First, we examine genetic diversity, which can be measured in terms of the genetic distance between population groups. In many genetic variation studies, 100 distinct Alu elements, or genomic mutations, are quantified [1]. Descendants from the same continent have about 86% of these mutations in common, but descendants of different continents have only 10% in common.

An appropriate diversity metric would be the variance in genetic distance using Alu elements.

Geographic distance is plotted against genetic distance for pairwise comparisons between population groups.

Genetic distance in the above chart is calculated using Nei’s standard genetic distance [2].

Let jX be the probability that two members of population X carry the same Alu element at a particular locus, and let jY be the corresponding probability in population Y. Also, let jXY be the probability that a member of X and a member of Y carry the same Alu element at that locus. Now let JX, JY, and JXY represent the arithmetic means of jX, jY, and jXY over the 100 Alu elements.

Then, distance is expressed as:

$$D = -\ln\left(\frac{J_{XY}}{\sqrt{J_X J_Y}}\right)$$

And variance in genetic distance is:

[Equation image not recovered: variance in genetic distance]

where N is the number of employees.
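For the curious, here is a rough sketch of how the distance and its variance might be computed. It assumes each Alu locus is biallelic (insertion present or absent), so a population is just a list of insertion frequencies, one per locus. The `groups` data, the three-locus panel, and the choice to take the variance over pairwise distances between groups are all made up for illustration; none of it comes from the cited studies.

```python
import math
from itertools import combinations
from statistics import pvariance


def nei_distance(freq_x, freq_y):
    """Nei's standard genetic distance from per-locus insertion frequencies."""
    n = len(freq_x)
    # Biallelic locus: insertion present with frequency p, absent with 1 - p.
    # Probability that two random members of X match at a locus, averaged over loci.
    j_x = sum(p * p + (1 - p) ** 2 for p in freq_x) / n
    j_y = sum(q * q + (1 - q) ** 2 for q in freq_y) / n
    # Probability that one member of X and one member of Y match, averaged over loci.
    j_xy = sum(p * q + (1 - p) * (1 - q) for p, q in zip(freq_x, freq_y)) / n
    return -math.log(j_xy / math.sqrt(j_x * j_y))


# Hypothetical insertion frequencies at three loci for three population groups.
groups = {
    "A": [0.2, 0.5, 0.9],
    "B": [0.3, 0.4, 0.8],
    "C": [0.7, 0.6, 0.1],
}

# Assumption: the "variance in genetic distance" is taken over all pairwise distances.
distances = [
    nei_distance(groups[a], groups[b]) for a, b in combinations(groups, 2)
]
distance_variance = pvariance(distances)

print(distances)
print(distance_variance)
```

A group compared with itself gets a distance of zero, and groups with very different insertion frequencies (A and C here) end up much farther apart than similar ones (A and B).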

A note on gender: The human genome is estimated to contain 20,000-25,000 genes. The Y chromosome has 231. Because X chromosomes are matching pairs, any genetic difference between sexes would be isolated to the Y chromosome, for a less than 1.2% difference. A European male is more genetically similar to a European female than he is to an Asian male.
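The 1.2% bound is just a ratio, so it is easy to check. A minimal sketch using only the gene counts quoted above (the variable names are mine):

```python
Y_CHROMOSOME_GENES = 231
GENOME_ESTIMATES = (20_000, 25_000)  # low and high estimates of total human genes

# Share of all genes that sit on the Y chromosome, for each genome-size estimate.
y_share = {total: Y_CHROMOSOME_GENES / total for total in GENOME_ESTIMATES}

for total, share in y_share.items():
    print(f"{total} genes: Y chromosome share = {share:.4f}")
```

The smaller genome estimate gives the larger share, and even that one stays under 0.012.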

Of course, we cannot rely on genetic variance alone; otherwise, we would soon see tech companies trying to hire employees with extra chromosomes to increase diversity.

We must account for environmental factors.

Assume that every trait with phenotype value P is determined by genetic factors (G), environmental factors (E), and their interaction (G x E), expressed as [3]:

$$P = G + E + G \times E$$

Then the true variance should be calculated as

$$\sigma_P^2 = \sigma_G^2 + \sigma_E^2 + \sigma_{G \times E}^2$$
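One caveat worth making concrete: splitting the phenotypic variance into a genetic part, an environmental part, and an interaction part only adds up cleanly when those parts are uncorrelated. The sketch below uses made-up numbers and, purely as an assumption, models the interaction as the product of G and E. It shows that the variance of P equals the sum of the three variances plus twice their covariances.

```python
def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def covariance(a, b):
    """Population covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


# Hypothetical per-employee genetic and environmental values.
G = [1.0, 2.0, 3.0, 4.0]
E = [0.5, -0.5, 0.5, -0.5]
# Assumption: the interaction term is modeled as a simple product.
GxE = [g * e for g, e in zip(G, E)]
P = [g + e + i for g, e, i in zip(G, E, GxE)]

var_p = variance(P)
sum_of_parts = variance(G) + variance(E) + variance(GxE)
cross_terms = 2 * (covariance(G, E) + covariance(G, GxE) + covariance(E, GxE))

print(var_p, sum_of_parts, cross_terms)
```

Whenever `cross_terms` is not zero, adding up the three variances on their own will misstate the total.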

A simplified representation of the environmental factor (E) could be the year in which a population group was first allowed to vote: 1828 for Jewish Americans, 1856 for poor people, 1870 for African Americans, 1910 for the illiterate, 1920 for women, 1924 for Native Americans, 1952 for Asian Americans, and never for convicted felons.

Finally, we note that a group of entirely Native American engineers is no more diverse than a group of entirely white engineers; however, only one is worth advertising.

Thus we conclude that when we say “diversity”, we really mean “check your privilege.” So just do that and everything will be cool.

References:
1. Watkins WS, Rogers AR, Ostler CT, et al. Genetic Variation Among World Populations: Inferences From 100 Alu Insertion Polymorphisms. Genome Research. 2003;13(7):1607-1618. doi:10.1101/gr.894603
2. Nei M. Genetic Distance Between Populations. The American Naturalist. 1972;106(949):283-292.
3. Wu R, Ma CX, Casella G. Statistical Genetics of Quantitative Traits: Linkage, Maps and QTL. Springer; 2007.

Note: I am neither a geneticist nor a statistician. If any of this sounds like crap, please call me out.