Q&A | Algorithmic monoculture in employment

“Algorithmic Monoculture in Hiring,” by Rishi Bommasani, Sarah Bana, Kathleen A. Creel, Dan Jurafsky, and Percy Liang, examines how automated systems built by the same few algorithm vendors can reject the same applicants over and over again, and highlights “glaring racial disparities.”

The study’s position-by-position analysis is an eye-opener for both job seekers and employers looking for the best candidate for the job. We spoke to the authors about what they discovered.

What is “algorithmic monoculture”?

Sarah Bana: To me, algorithmic monoculture refers to any situation where algorithms produce similar results. There are many simple algorithms, such as requiring a college degree or three years of experience to get a job. However, more complex algorithms now entering the labor market produce similar results through more complex processes. These machine learning tools are built to characterize opaque factors such as “fit” and produce similar results across the enterprise, albeit with less interpretability.

Kathleen Creel: Algorithmic monoculture occurs when the same algorithms dominate a sector, or in a weaker but more typical form, when there are algorithms that are created in a similar way using similar data and make similar decisions.

Could you explain the research methodology?

Sarah: We looked at 4 million applications from 3 million applicants, all screened by the vendor Pymetrics. We performed two main sets of analyses: one on bias and the other on homogenization.

For the bias analysis, we examined applicants who provided racial data, at the position level. US employment law flags a position if one group’s recommendation rate is less than 80% of the rate for the most recommended group. This is the “four-fifths rule.” We therefore calculated the recommendation rate for each group and tested whether any group had a recommendation rate that was both statistically significantly different from that of the highest-passing group and less than 80% of it.
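The position-level check described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names, the 0.05 significance level, and the choice of a pooled two-proportion z-test are assumptions, since the interview does not say which statistical test was used.

```python
import math


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (assumed test)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # identical, degenerate rates: no evidence of a difference
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def flag_groups(counts, threshold=0.8, alpha=0.05):
    """Flag groups at one position under the four-fifths rule.

    counts maps group -> (recommended, applicants), with applicants > 0.
    A group is flagged when its recommendation rate is below `threshold`
    times the top group's rate AND differs significantly from it.
    Returns {group: (impact_ratio, p_value)}.
    """
    rates = {g: rec / n for g, (rec, n) in counts.items()}
    top = max(rates, key=rates.get)
    if rates[top] == 0:
        return {}  # nobody was recommended, so no ratio can be formed
    top_rec, top_n = counts[top]
    flagged = {}
    for group, (rec, n) in counts.items():
        if group == top:
            continue
        ratio = rates[group] / rates[top]
        p_value = two_proportion_p_value(rec, n, top_rec, top_n)
        if ratio < threshold and p_value < alpha:
            flagged[group] = (ratio, p_value)
    return flagged


# Hypothetical counts for a single position (not data from the study).
example = {"A": (60, 100), "B": (40, 100), "C": (55, 100)}
print(flag_groups(example))
```

In this made-up example, group B is recommended at two-thirds the rate of group A, so it falls below the 80% threshold, while group C does not.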

The homogenization analysis began by looking at the number of models used across the company. This number was 42.

We also looked at the probability of systematic rejection, that is, the probability of being rejected from every position you applied to. In our other studies, we established the concept of a baseline, which tells you what the rates would be if the models were independent. In the Pymetrics data, 10 percent of applicants who applied to four positions were systematically rejected, and there is a large discrepancy between that observed rate and the baseline rate.
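To make the comparison concrete, here is a rough sketch of an observed systematic rejection rate next to an independence baseline. It assumes a simple baseline in which each position rejects applicants at its own marginal rate, independently of every other position; the baseline defined in the authors' other work may differ, and the function name and data layout are hypothetical.

```python
from collections import defaultdict
from math import prod


def systematic_rejection(outcomes):
    """Return (observed_rate, independence_baseline).

    outcomes is a list of (applicant, position, rejected) tuples,
    where rejected is a bool and each pair appears at most once.
    """
    by_applicant = defaultdict(list)
    by_position = defaultdict(list)
    for applicant, position, rejected in outcomes:
        by_applicant[applicant].append((position, rejected))
        by_position[position].append(rejected)

    # Marginal rejection rate of each position.
    reject_rate = {p: sum(r) / len(r) for p, r in by_position.items()}

    n = len(by_applicant)
    # Observed: share of applicants rejected by every position they applied to.
    observed = sum(
        all(rej for _, rej in apps) for apps in by_applicant.values()
    ) / n
    # Baseline: the same share if positions decided independently.
    baseline = sum(
        prod(reject_rate[p] for p, _ in apps) for apps in by_applicant.values()
    ) / n
    return observed, baseline


# Hypothetical data: two positions whose decisions are perfectly correlated.
data = [
    (a, p, a in ("a1", "a2"))
    for a in ("a1", "a2", "a3", "a4")
    for p in ("p1", "p2")
]
print(systematic_rejection(data))
```

An observed rate well above the baseline, as in this toy case, is the kind of gap that points to correlated rather than independent decisions.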

However, when we applied our methodology to the largest prior study of hiring decisions, involving 83,000 applications to 108 Fortune 500 companies, we found that the systematic rejection rates observed in that data were predicted very accurately by a model of employers making statistically independent decisions. This means that, even over similar time periods, something different is happening in the algorithmically mediated data.

What do you think are the main points?

Kathleen: In previous research, we have hypothesized that when many companies rely on the same AI vendor to screen job applicants, some applicants may miss out on interviews. However, this study was the first to demonstrate this effect using actual recruitment data.

Sarah: I think the most important result of our study is how much bias we found in this algorithmic recruitment system. Vendors have published aggregate audits reporting that their tools show no measurable bias. In that sense, I was surprised, because I expected their algorithm to be an example of best practice. When we read that something we are purchasing has been audited, we tend to take the results at face value, but that may be only part of what is actually going on.

Before we get into the results, could you explain the Pymetrics platform at the heart of your research?

Sarah: Of course. Pymetrics is a response to a long tradition in industrial-organizational (I/O) psychology aimed at improving the work performance of individuals and organizations. Previous generations of job selection tools used computer-administered personality tests. For example, Autor and Scarborough (2008) analyzed a test built on the five-factor model (conscientiousness, agreeableness, extraversion, openness, and neuroticism).

The founders of Pymetrics argued that there were better ways to measure personality than asking people about themselves. The games candidates are asked to play (short gamified tasks based on neuroscience) are designed to reveal those traits through behavior rather than self-report. In many ways, Pymetrics seeks to improve a system that has historically been highly biased. People often have a hard time getting jobs because of what’s on their resume (or lack thereof), but a resume-agnostic process could, in principle, remove that barrier.

But our actions also encode who we are. One of the Pymetrics games involves popping balloons to measure risk tolerance, and a friend recently pointed out to me that the risk aversion of someone on the poverty line is likely to be very different from that of someone who has never missed a meal.

Ideally, the future of this type of assessment will include a simulation of the actual work as part of the interview. I have a lot of sympathy for companies looking to hire. I recently went through an interview process that took over 5 months from start to finish, and it had far fewer applicants than the median position in the Pymetrics data. AI makes hiring faster and cheaper, but the price is that we need to be proactive about measurement and auditing.

Which groups are most negatively affected?

Sarah: We see that many Black and Asian applicants are negatively affected. There’s no evidence of causation here, but my guess is that the behavior being detected by the games is acting as a proxy for race. This is the type of bias that is difficult to remove without explicitly adjusting the trained model.

There are also structural parts that cannot be observed with the data we have access to. The models are trained on current employees in specific roles at each company, which may not be very diverse to begin with.


