How to Measure Diversity

Aww, I thought John Doerr’s joke was funny. Many people didn’t know it was a joke, but I think that was Kim-Mai’s intention.

Doerr’s comment surfaces a real issue, however. In Silicon Valley, diversity has become a pissing match.

    We’re so diverse, we just hired eight women.
    Oh yeah? We’re so diverse, we hired two illegal immigrants.
    Well we’re so diverse, we added Caitlyn Jenner to our advisory board.
    Please. We’re so diverse, we hired an amputee, seven dwarves, and a donkey.
    Well we’re so diverse, I can’t pronounce any of my coworkers’ names!

We’ve lost sight of what “diversity” even means.

Let’s bring it back to science.

[Figure: The evolutionary relationships between ethnic groups. The numbers denote bootstrap values, or confidence levels for each branch.]

First, we examine genetic diversity. Genetic diversity can be measured in terms of genetic distance between population groups. In many genetic variation studies, 100 distinct Alu elements, or genomic mutations, are quantified [1]. Descendants from the same continent have about 86% of these mutations in common, but descendants of different continents have only 10% in common.

An appropriate diversity metric would be the variance in genetic distance using Alu elements.

[Figure: Geographic distance is plotted against genetic distance for pairwise comparisons between population groups.]

Genetic distance in the above chart is calculated using Nei’s standard genetic distance [2].

Let jX be the probability that two members of population X have the same Alu element at a particular locus, and jY be the corresponding probability in population Y. Also, let jXY be the probability that a member of X and a member of Y have the same Alu element at that locus. Now let JX, JY, and JXY represent the arithmetic means of jX, jY, and jXY over the 100 Alu elements.

Then, distance is expressed as:

$$D = -\ln\left(\frac{J_{XY}}{\sqrt{J_X J_Y}}\right)$$
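
Here is a rough sketch of how Nei's standard distance could be computed. It assumes each Alu locus is biallelic (insertion present or absent) and that each population is summarized by its insertion frequency at each locus. Under that assumption, two members of the same population match at a locus with probability p² + (1 − p)², and a member of X matches a member of Y with probability pq + (1 − p)(1 − q). The function name and its inputs are my own illustration, not something taken from [1] or [2].

```python
import math

def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between populations X and Y.

    freqs_x[k] and freqs_y[k] are the (assumed) insertion frequencies of
    Alu locus k in each population, as numbers between 0 and 1.
    """
    if not freqs_x or len(freqs_x) != len(freqs_y):
        raise ValueError("need the same non-empty list of loci for both populations")
    n = len(freqs_x)
    # J_X, J_Y: mean probability that two members of the same population match.
    J_x = sum(p * p + (1 - p) * (1 - p) for p in freqs_x) / n
    J_y = sum(q * q + (1 - q) * (1 - q) for q in freqs_y) / n
    # J_XY: mean probability that one member of X and one member of Y match.
    J_xy = sum(p * q + (1 - p) * (1 - q) for p, q in zip(freqs_x, freqs_y)) / n
    if J_xy == 0:
        return math.inf  # the two populations never match at any locus
    return -math.log(J_xy / math.sqrt(J_x * J_y))
```

Two populations with identical frequencies come out at a distance of zero, and the distance grows as their frequencies drift apart.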


And variance in genetic distance is:

where N is the number of employees.
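
One possible reading of "variance in genetic distance" for a team of N employees is the population variance of the distance over every pair of employees. That reading is an assumption on my part, as are the function name and the toy numbers below. The sketch takes a precomputed N x N distance matrix, so it works with any distance measure.

```python
from itertools import combinations
from statistics import pvariance

def distance_variance(dist):
    """Population variance of genetic distance over all pairs of employees.

    dist is an N x N symmetric matrix; dist[i][j] is the genetic distance
    between the population groups of employees i and j.
    """
    n = len(dist)
    if n < 2:
        raise ValueError("need at least two employees to form a pair")
    # Each unordered pair (i, j) with i < j is counted once.
    pairs = [dist[i][j] for i, j in combinations(range(n), 2)]
    return pvariance(pairs)

# Hypothetical distances for a three-person team.
team = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 3.0, 0.0],
]
print(distance_variance(team))
```

A team drawn entirely from one population group has every pairwise distance equal to zero, so its variance is zero as well.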

A note on gender: The human genome is estimated to contain 20,000-25,000 genes. The Y chromosome has 231. Because X chromosomes are matching pairs, any genetic difference between the sexes would be isolated to the Y chromosome, for a difference of less than 1.2% (231 out of 20,000 is about 1.16%). A European male is more genetically similar to a European female than he is to an Asian male.

Of course, we cannot rely on genetic variance alone. Otherwise we would soon see tech companies trying to hire employees with extra chromosomes to increase diversity.

We must account for environmental factors.

Assume that every trait with phenotype value P is determined by genetic factors (G), environmental factors (E), and their interaction (G × E), expressed as [3]:

$$P = G + E + G \times E$$

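Then, treating G and E as uncorrelated, the true variance should be calculated as

$$\sigma_P^2 = \sigma_G^2 + \sigma_E^2 + \sigma_{G \times E}^2$$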
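
Below is a small worked sketch of splitting phenotypic variance into genetic, environmental, and interaction parts. It assumes a balanced table with exactly one phenotype value for every combination of genotype and environment, in which case the three parts add up to the total exactly. The function name and the toy table are hypothetical.

```python
from statistics import fmean

def decompose_variance(table):
    """Split phenotypic variance into G, E, and G x E components.

    table[i][j] is the phenotype of genotype i in environment j.
    Returns (var_p, var_g, var_e, var_ge).
    """
    n_g, n_e = len(table), len(table[0])
    cells = [(i, j) for i in range(n_g) for j in range(n_e)]
    mu = fmean(table[i][j] for i, j in cells)
    # Main effects: how far each row or column mean sits from the grand mean.
    g = [fmean(row) - mu for row in table]
    e = [fmean(table[i][j] for i in range(n_g)) - mu for j in range(n_e)]
    var_g = fmean(x * x for x in g)
    var_e = fmean(x * x for x in e)
    # Interaction: whatever is left in each cell after both main effects.
    var_ge = fmean((table[i][j] - mu - g[i] - e[j]) ** 2 for i, j in cells)
    var_p = fmean((table[i][j] - mu) ** 2 for i, j in cells)
    return var_p, var_g, var_e, var_ge

# Two genotypes (rows) measured in three environments (columns).
var_p, var_g, var_e, var_ge = decompose_variance([[1, 2, 3], [2, 4, 9]])
print(var_p, var_g + var_e + var_ge)
```

With unbalanced data, or when genes and environment are correlated, covariance terms appear and this simple sum no longer holds.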

A simplified representation of the environmental factor (E) could be the year in which each population group was allowed to vote: 1828 for Jewish Americans, 1856 for poor people, 1870 for African Americans, 1910 for the illiterate, 1920 for women, 1924 for Native Americans, 1952 for Asian Americans, and never for convicted felons.

Finally, we note that a group of entirely Native American engineers is no more diverse than a group of entirely white engineers; however, only one is worth advertising.

Thus we conclude that when we say “diversity”, we really mean “check your privilege.” So just do that and everything will be cool.

1. Watkins WS, Rogers AR, Ostler CT, et al. Genetic Variation Among World Populations: Inferences From 100 Alu Insertion Polymorphisms. Genome Research. 2003;13(7):1607-1618. doi:10.1101/gr.894603
2. Nei M. Genetic Distance between Populations. The American Naturalist. 1972;106(949):283-292.
3. Wu R, Ma CX, Casella G. Statistical Genetics of Quantitative Traits: Linkage, Maps and QTL. Springer; 2007.

Note: I am neither a geneticist nor a statistician. If any of this sounds like crap, please call me out.

Anne Frank: Too Hot For TechCrunch Disrupt

Our TechCrunch Disrupt Hackathon entry was blocked from the submission gallery.

Here’s what we made:

Build-a-Buddy: Create Your Own Virtual Friend with a Custom Personality

We characterized over a hundred different personalities using the autobiographies of famous people from Anne Frank to Donald Trump.

When a user creates a new buddy profile, we match it to the closest existing personality in our database, meaning the one with the shortest Euclidean distance to the profile across all the personality traits.
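
The matching step can be sketched roughly like this, assuming every personality is stored as a list of numeric trait scores in the same order. The trait names, scores, and personality labels below are invented for illustration and are not our actual data.

```python
import math

def closest_personality(profile, database):
    """Return the name of the stored personality nearest to profile.

    profile is a list of trait scores; database maps a personality name to
    a list of scores for the same traits in the same order.
    """
    if not database:
        raise ValueError("database is empty")
    # math.dist computes the Euclidean distance between two points.
    return min(database, key=lambda name: math.dist(profile, database[name]))

# Made-up trait scores (say openness, agreeableness, extraversion), 0 to 1.
database = {
    "personality_a": [0.9, 0.8, 0.3],
    "personality_b": [0.2, 0.1, 0.9],
    "personality_c": [0.6, 0.5, 0.5],
}
print(closest_personality([0.8, 0.7, 0.4], database))
```

A linear scan like this is plenty for a database of around a hundred personalities.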

Then, we load the dialog file corresponding to the matching personality. We enable conversation using the IBM Watson Dialog Service API.

We had chosen to include Anne Frank to illustrate diversity. There’s no shortage of autobiographical material to profile powerful white males. With Anne Frank, we gain the personality of a persecuted 13-year-old girl.

But fine whatever. We had been up for over 24 hours and had no desire to argue. I replaced Anne Frank with Ben Franklin, a noncontroversial Anglo-Saxon slaveowner.

The Diary of Anne Frank was on my fourth grade reading list. It tells the story of six million genocide victims through the voice of a child.

When it came time for presentations, three separate hackathon organizers approached our team to ensure we would not include or display anything about “Anne Frank”. A TechCrunch editor stopped us (and only us) to review the app before we could be allowed on stage.

How did one of the most important cultural figures of the 20th century become grounds for controversy?

I’m sad about the state of Silicon Valley. I’m sad that an event that awards $5000 to a Donald Trump drinking game finds Anne Frank “potentially offensive”. I’m sad that an industry that bills itself as “disruptive” needs to police its public image.

And most of all I’m sad that writers for a leading tech publication can’t even spell “Anne Frank”.
