Stanford researchers argue for transparency and independent testing
AI algorithms exhibit racial bias when screening job applicants, and people who apply for multiple jobs at different companies that use the same algorithm are more likely to be rejected by all of them, according to Stanford University-led researchers.
The boffins evaluated algorithmic hiring decisions made for multiple employers that use the same hiring vendor. The resulting algorithmic monoculture is problematic, they say.
The vendor in this case was pymetrics, a talent platform acquired by Harver in 2022. Harver did not immediately respond to a request for comment.
The researchers – Rishi Bommasani, Sarah H. Bana, Kathleen A. Creel, Dan Jurafsky, and Percy Liang – acquired a pymetrics dataset spanning the period from December 2018 to December 2022. It included 4,197,168 job applications submitted by 3,372,132 applicants for 1,746 vacancies.
The dataset details hiring recommendations provided to 156 employers with total annual revenue of $225 billion. It spans 11 industries including finance, manufacturing, and warehousing.
When people applied for jobs at these companies, they were directed to pymetrics’ machine learning platform to play a series of assessment games. The platform’s algorithms measure gameplay performance and, on average, recommend 58.2 percent of applicants for each position. Employers decide who gets interviewed, but candidates the platform does not recommend are usually rejected.
Researchers claim the pymetrics algorithm is unfair.
The researchers said they “found substantial evidence of racial disparities in AI-based candidate screening.”
They reached that conclusion by applying the U.S. Equal Employment Opportunity Commission’s “four-fifths rule.” At least on paper, the rule draws the agency’s attention when a group’s selection rate is less than 80 percent of the rate for the most-selected group of applicants.
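A minimal sketch of the four-fifths check as described above, assuming per-group counts of applicants and recommended candidates for a single job. The function name, group labels, and counts are hypothetical; it illustrates the ratio itself, not the researchers’ full analysis.

```python
def four_fifths_check(
    applicants: dict[str, int], selected: dict[str, int]
) -> dict[str, tuple[float, bool]]:
    """Return each group's impact ratio and whether it falls below four-fifths."""
    # Selection rate per group: recommended / applied
    rates = {g: selected[g] / applicants[g] for g in applicants}
    # Compare every group against the group with the highest selection rate
    top = max(rates.values())
    return {g: (r / top, r / top < 0.8) for g, r in rates.items()}


# Hypothetical counts for one job
result = four_fifths_check({"A": 1000, "B": 1000}, {"A": 600, "B": 420})
flagged = [g for g, (_, below) in result.items() if below]
print(flagged)  # ['B']
```

With these made-up numbers, group B is selected at 42 percent against group A’s 60 percent, an impact ratio of 0.7, which falls below the 0.8 threshold.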
“We found that 26% of black applicants and 15% of Asian applicants applied for jobs where their racial group was discriminated against by AI systems,” the researchers said.
If these black and Asian candidates progressed through their job applications at the same rate as the most advantaged groups (usually white applicants), about 40,000 more candidates would move on to the next round of selection.
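The article does not spell out how the 40,000 figure was computed. One plausible way to form such a counterfactual, sketched here as an assumption rather than the paper’s method: in each job where a group fails the four-fifths check, multiply that group’s applicant count by the gap between its selection rate and the top group’s rate, then sum across jobs. The class, function, group labels, and counts below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    applicants: int
    recommended: int


def extra_advancements(job: dict[str, GroupStats], threshold: float = 0.8) -> float:
    """Extra candidates who would advance if every group failing the
    four-fifths check were recommended at the top group's rate."""
    rates = {g: s.recommended / s.applicants for g, s in job.items()}
    top = max(rates.values())
    gaps = (
        job[g].applicants * (top - r)
        for g, r in rates.items()
        if r / top < threshold  # only groups below four-fifths of the top rate
    )
    return sum(gaps, 0.0)


# Hypothetical counts for two jobs
jobs = [
    {"white": GroupStats(500, 300), "black": GroupStats(200, 80), "asian": GroupStats(100, 50)},
    {"white": GroupStats(400, 200), "black": GroupStats(100, 45), "asian": GroupStats(100, 30)},
]
total = sum(extra_advancements(job) for job in jobs)
print(round(total))  # 60
```

In the first job only black applicants fall below four-fifths (40 extra); in the second only Asian applicants do (20 extra).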
Additionally, the report’s authors say that when people submit multiple applications to different companies that use the same hiring algorithm, they are more likely to be rejected everywhere than if the companies use different hiring technologies. They found that 10 percent of job seekers who submitted four applications were rejected from all the places they applied.
They say this pattern does not appear in studies of hiring that does not involve AI, where rejection rates are in line with what would be expected if each company made its own decisions independently rather than relying on a shared algorithm.
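A rough back-of-envelope baseline for the “rejected everywhere” figure, assuming (hypothetically) that each of four jobs recommends the article’s average of 58.2 percent of applicants and that the four decisions are independent. The paper’s own baseline may well be computed differently.

```python
import math


def p_rejected_everywhere(recommend_rates: list[float]) -> float:
    """Chance of being rejected by every employer if each decides independently."""
    return math.prod(1.0 - r for r in recommend_rates)


# Assumption: four jobs, each recommending the article's average 58.2 percent
baseline = p_rejected_everywhere([0.582] * 4)
print(f"{baseline:.1%}")  # 3.1%
```

Under those simplifying assumptions, independent screening would reject about 3 percent of four-time applicants everywhere, well below the 10 percent the researchers report. Correlated decisions, such as those produced by a shared algorithm, are one way to get a higher figure.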
In their paper, “Algorithmic Monoculture in Hiring” [PDF], the study authors note that previous research has documented patterns of discrimination when decisions are based on an applicant’s resume (for example, when a name or activity is more common among certain groups).
The gameplay approach used by pymetrics may not expose such demographic signals, but the researchers say the disparities appear despite pymetrics’ efforts to strip out demographic details and reduce bias.
They say this supports previous research showing that AI can have a discriminatory impact even without access to demographic data, because models can pick up on variables that act as proxies for it (for example, when a demographic group is overrepresented in a particular zip code or at a particular school).
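A toy illustration of the proxy effect, assuming a screening rule that only ever sees a zip code. The rule, zip codes, group labels, and counts are invented for illustration and say nothing about how pymetrics’ gameplay model actually works.

```python
from collections import Counter

# Hypothetical applicants as (group, zip_code) pairs. Group X lives mostly in
# 10001, group Y mostly in 20002.
applicants = (
    [("X", "10001")] * 80 + [("X", "20002")] * 20
    + [("Y", "10001")] * 20 + [("Y", "20002")] * 80
)


def screen(zip_code: str) -> bool:
    """Toy stand-in for a model that learned to favour one zip code.
    It never sees the applicant's group."""
    return zip_code == "10001"


def selection_rates(rows: list[tuple[str, str]]) -> dict[str, float]:
    """Share of each group that passes the screen."""
    totals = Counter(group for group, _ in rows)
    passed = Counter(group for group, zip_code in rows if screen(zip_code))
    return {g: passed[g] / totals[g] for g in totals}


rates = selection_rates(applicants)
print(rates)  # {'X': 0.8, 'Y': 0.2}
```

Group Y’s impact ratio here is 0.2 / 0.8 = 0.25, far below four-fifths, even though group membership is never an input to `screen`.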
Pymetrics researchers investigated the impact of AI on employment in a 2022 paper and found that their algorithm did not violate EEOC standards. They argue that fair hiring is complex and that candidate selection was also problematic before the advent of AI.
“[W]hile it is true that machine learning has the potential to cause harm in the form of codifying bias and concealing discrimination, these effects are already prevalent due to the widespread use of traditional assessments in many industries,” the authors of the pymetrics study said.
The Stanford group attributes the difference in findings to pymetrics’ approach of pooling recommendations across all jobs and evaluating them in aggregate, which lets discrimination average out of view. The authors argue that each job needs to be analyzed separately.
“For example, imagine that an AI tool frequently recommends black applicants for warehouse jobs, but rarely for finance jobs,” they explain. “If we averaged across all jobs, these two patterns would cancel each other out, making it appear that discrimination does not exist. The global average masks the discrimination that is actually occurring on a job-by-job basis.” ®
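The cancellation the researchers describe can be reproduced with a few hypothetical counts: two jobs where the recommendation pattern for black and white applicants is reversed. The counts and helper names are made up for illustration; the impact ratio is the same selection-rate comparison used by the four-fifths rule.

```python
from collections import defaultdict

# Hypothetical (job, group) -> (applicants, recommended) counts
counts = {
    ("warehouse", "black"): (100, 80),
    ("warehouse", "white"): (100, 40),
    ("finance", "black"): (100, 40),
    ("finance", "white"): (100, 80),
}


def impact_ratios(cells: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Each group's selection rate divided by the highest group's rate."""
    rates = {g: rec / n for g, (n, rec) in cells.items()}
    top = max(rates.values())
    return {g: r / top for g, r in rates.items()}


def pooled(counts: dict[tuple[str, str], tuple[int, int]]) -> dict[str, tuple[int, int]]:
    """Sum applicants and recommendations per group across all jobs."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for (_job, group), (n, rec) in counts.items():
        totals[group][0] += n
        totals[group][1] += rec
    return {g: (n, rec) for g, (n, rec) in totals.items()}


def per_job(counts: dict[tuple[str, str], tuple[int, int]], job: str) -> dict[str, tuple[int, int]]:
    """Restrict the counts to a single job."""
    return {g: v for (j, g), v in counts.items() if j == job}


print("pooled:", impact_ratios(pooled(counts)))
for job in ("warehouse", "finance"):
    print(f"{job}:", impact_ratios(per_job(counts, job)))
```

Pooled, both groups get an impact ratio of 1.0 and nothing is flagged; job by job, white applicants fall to 0.5 in the warehouse role and black applicants fall to 0.5 in the finance role, both well below four-fifths.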
