Training large AI models is energy-intensive by design. Data centers consume gigawatts of electricity, and AI companies now publish carbon reports. But while the energy is measured and part of the conversation, the metals in AI chips remain largely unexamined.
A new study takes the chip apart and analyzes it element by element to arrive at a number the industry doesn’t want to calculate.
To find out what AI hardware actually contains, a team at the University of Bonn took apart an Nvidia A100 (the chip that powered the initial boom in AI chatbots) and analyzed it in a chemistry lab.
Sophia Falk, a researcher at Bonn’s Sustainable AI Lab and lead author of the study, worked with colleagues to catalog every element in the device. They found 32.
Approximately 90% of the chip’s mass consists of heavy metals. Copper alone accounts for about 3 pounds (1.4 kilograms) per chip, with iron, tin, silicon, and nickel rounding out the top five. Gold, silver, platinum, and palladium are present only in trace amounts.
A toxic mixture
Of the 32 elements the research team cataloged, a surprising number are classified as hazardous, including arsenic, mercury, lead, cadmium, chromium, zinc, nickel, antimony, cobalt, and beryllium.
On a mass basis, approximately 93% of a single A100 is composed of elements with documented toxic properties. These materials are sealed inside the device, so a technician can slot one into a server rack without any danger.
The danger does not lie in the chips inside the servers. It’s in the ground where those metals are mined and in the e-waste piles where old hardware ends up. Another paper from the same group documents the entire cradle-to-grave cycle.
How many chips do you actually need for one training run? It depends on two factors: how hard each chip is pushed (utilization) and how long it lasts before it fails or is retired (lifespan).
Under what the team calls the most plausible baseline (35% utilization, two-year lifespan), one round of GPT-4 training consumes the equivalent of approximately 2,515 A100 chips.
Extending the lifespan to three years reduces that to about 1,676 chips. Conversely, at low utilization and short lifetimes, a single training run can consume up to 8,800 GPUs.
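The arithmetic behind these scenarios is simple: the number of chips consumed scales inversely with both utilization and lifespan. Here is a minimal Python sketch that reproduces the reported figures; the inverse-proportional model and its calibration constant are our simplification of the paper’s numbers, not the authors’ exact method.

```python
# Chips "consumed" by one training run, assuming consumption scales
# inversely with utilization and hardware lifespan (our simplification,
# calibrated to the paper's baseline of 2,515 A100s at 35% / 2 years).

BASELINE_CHIPS = 2515
BASELINE_UTILIZATION = 0.35
BASELINE_LIFESPAN_YEARS = 2.0

# Implied compute demand, in GPU-years at 100% utilization (~1,760).
COMPUTE_GPU_YEARS = BASELINE_CHIPS * BASELINE_UTILIZATION * BASELINE_LIFESPAN_YEARS

def chips_consumed(utilization: float, lifespan_years: float) -> float:
    """Equivalent number of chips consumed by one GPT-4-scale training run."""
    return COMPUTE_GPU_YEARS / (utilization * lifespan_years)

print(round(chips_consumed(0.35, 2)))  # 2515 -- the baseline
print(round(chips_consumed(0.35, 3)))  # 1677 -- the paper's ~1,676
print(round(chips_consumed(0.20, 1)))  # 8802 -- the paper's ~8,800 worst case
```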
In any case, that’s thousands of devices for one model. The researchers estimate that each GPT-4 training run requires the extraction of approximately 4 tons (3.6 metric tons) of material.
Declining benefits of AI
The most striking numbers in the paper aren’t about a single model; they come from the jump between two. OpenAI’s move from GPT-3.5 to GPT-4 required approximately 31 times more GPU resources, an increase of more than 3,000% in compute.
Performance returns were uneven. GPT-4 jumped 61% over the previous generation on hard math benchmarks and 39% on coding, but commonsense reasoning improved by only 14%.
“Architectural innovations and training methodologies can yield more effective performance gains than simply scaling raw resources,” Falk and co-authors wrote. The paper argues that big and smart are not the same thing.
An expensive place
At the chip level, the numbers tell one story and the geography tells another. The metal in an A100 comes from mines and refineries far from the data centers where the chips end up.
Across the nine models the team analyzed, the most plausible scenario puts total extraction at about 7 tons (6.4 metric tons) of material, almost all of it classified as hazardous. In the worst-case scenario, the figure rises to nearly 22 tons (20 metric tons).
Most of that environmental impact does not occur near the data centers that use the chips. It occurs near the mines, in regions with less environmental oversight than the cities buying the computing power.
Two levers: utilization and lifespan
The researchers write that two things could significantly change these numbers: run the chips harder while they are in service, and make them last longer. The effects compound.
Increasing utilization from 20% to 60% reduces the number of GPUs needed for a given training job by about two-thirds. Extending the hardware lifespan from one to three years yields a similar reduction.
Combine both approaches and the savings multiply: running A100s for five years at 60% utilization instead of one year at 20% drops the chips required to train GPT-4 from 8,800 to 587, a 93% reduction, as the calculation below shows.
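Using the same chips_consumed sketch from above (again our back-of-envelope model, not the authors’ exact method), the compounding falls out directly:

```python
# Compounding both levers, reusing chips_consumed() from the sketch above.
worst = chips_consumed(0.20, 1)  # one year at 20% -> ~8,800 chips
best = chips_consumed(0.60, 5)   # five years at 60% -> ~587 chips

# Each lever scales the count by its own factor: utilization 0.20 -> 0.60
# is a 3x cut, lifespan 1 -> 5 years is a 5x cut, 15x combined.
reduction = 1 - best / worst
print(f"{reduction:.0%}")  # -> 93%
```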
A separate analysis projects that AI workloads will account for nearly 70% of total data center demand by 2030, raising the stakes if these efficiency gains are not realized.
What does this change?
The central contribution of this study is the bridge it builds. It was already known that GPUs contain heavy metals, and that AI training is hungry for those chips. What no one had done was connect the two facts to quantify the material cost of training a particular model.
Now a baseline exists for GPT-4: each training run uses thousands of chips and tons of mined material, most of it toxic. Policymakers, AI developers, and chip manufacturers have something concrete to address.
Energy and water no longer describe the full footprint of training large-scale AI models. Metals, many of them toxic, belong in the same category.
Falk’s group is calling on AI labs to disclose their training hardware as part of standard sustainability reporting, so that the next model’s footprint doesn’t have to be pieced together by outsiders from leaked spec sheets.
The research will be published in the journal Communications Earth & Environment.
