CERN is quite different from today’s AI jockeys, who rely primarily on pre-trained weights and general-purpose TPUs and GPUs to generate slop. CERN bakes custom nanosecond-speed AI into the silicon itself, just to throw away redundant data.
CERN’s Thea Aarrestad gave a presentation at the Virtual Monster Scale Summit earlier this month about meeting an ultra-strict set of requirements that most of her machine-learning colleagues have never come close to experiencing.
Aarrestad is an assistant professor of particle physics at ETH Zurich. CERN (the European Organization for Nuclear Research) uses machine learning to optimize data collection from the Large Hadron Collider (LHC). Her specialty is anomaly detection, a core component of any good observability system.
Each year, the LHC generates 40,000 EB of unfiltered sensor data alone, which Aarrestad estimates is about a quarter of the size of the entire Internet. CERN cannot store all of that. As a result, “we need to reduce that data in real time to the amount we can keep.”
And “real time” here means extreme real time. The LHC detection systems process data at rates of up to hundreds of terabytes per second. That is far more than Google or Netflix handle, and their latency requirements are far easier to meet.
“The algorithms that process this data have to be very fast,” Aarrestad said, so fast that the decisions have to be baked into the chip design itself.
Smash burger
The LHC is housed in a 27-kilometer ring buried 100 meters underground beneath the border between Switzerland and France, smashing subatomic particles together at near-light speed. The resulting collisions are expected to produce new types of matter that fill out the Standard Model of particle physics, our best understanding of the universe’s operating system.
At any given time, about 2,800 bunches of protons fly around the ring at nearly the speed of light, spaced 25 nanoseconds apart. Just before they reach one of the four underground detectors, special magnets squeeze these bunches to increase the chance of interaction. Even so, direct collisions are incredibly rare: of the billions of protons in each bunch, only about 60 pairs actually collide during a crossing.
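The 25-nanosecond spacing sets the detector’s clock rate directly; a quick sanity check of the arithmetic (purely illustrative):

```python
# Bunch crossings arrive every 25 nanoseconds, so each detector
# sees crossings at a rate of 1 / 25 ns = 40 million per second.
bunch_spacing_s = 25e-9                # 25 ns between bunches
crossing_rate_hz = 1 / bunch_spacing_s

print(f"{crossing_rate_hz:.0e} crossings/s")   # 4e+07, i.e. 40 MHz
```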
When particles do collide, their energy is converted into the mass of a spray of new outgoing particles (E = mc² in action). These new particles “shower” through CERN’s detectors, and “we’re trying to reconstruct it” to identify any new particles produced in the ensuing scuffle, she said.
Each collision generates a megabyte or two of data, and with approximately 1 billion collisions per second, that comes to roughly a petabyte every second, about the size of the entire Netflix library.
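The arithmetic behind that figure, assuming roughly one megabyte per collision (the per-collision size is an assumption for illustration):

```python
# Back-of-the-envelope throughput: ~1 MB per collision at about
# 1 billion collisions per second is a petabyte every second.
bytes_per_collision = 1e6              # assume ~1 MB per collision
collisions_per_second = 1e9
throughput_bytes = bytes_per_collision * collisions_per_second

petabyte = 1e15
print(throughput_bytes / petabyte, "PB/s")     # 1.0 PB/s
```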
Rather than trying to haul all this data to the surface, CERN decided it was more practical to build a monster-sized edge computing system that sorts out the interesting bits at the detector itself.
Huge Edge Computing
“If we had infinite computing, we could look at all of that,” Aarrestad said. In practice, however, less than 0.02% of the data is actually stored and analyzed. It is up to the detector itself to spot the action scenes.
The detector’s readout, built on ASICs, buffers the acquired data for up to 4 microseconds, after which the data “falls off a cliff” and is lost forever if not saved.
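At the LHC’s 25-nanosecond bunch spacing, that 4-microsecond buffer is shallower than it sounds:

```python
# A 4 microsecond on-detector buffer, with one bunch crossing every
# 25 nanoseconds, holds only 160 crossings before data is lost.
buffer_window_ns = 4000          # 4 microseconds, in nanoseconds
bunch_spacing_ns = 25
crossings_buffered = buffer_window_ns // bunch_spacing_ns

print(crossings_buffered)        # 160 crossings in flight at once
```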
That decision is made by the Level 1 trigger, a bank of approximately 1,000 FPGAs. The trigger digitally reconstructs each event from a reduced summary streamed off the detector at approximately 10 TB/s over fiber optic lines, and it produces a single bit: “accept” (1) or “reject” (0).
It is the job of the anomaly detection algorithm to decide whether a collision is kept. It is incredibly selective, rejecting over 99.7% of its input outright. The algorithm, affectionately named AXOL1TL, is trained on “background,” the realm of the Standard Model that has already been thoroughly explored. It knows the typical topology of ordinary collisions and can instantly flag events that are out of bounds. It’s looking for “unusual physics,” as Aarrestad puts it.
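AXOL1TL itself is a trained neural network squeezed into the trigger hardware; the toy sketch below shows only the general shape of the idea, scoring events by their distance from typical background and calibrating a cut so that more than 99.7% of ordinary events are rejected (all names and numbers here are illustrative, not CERN’s):

```python
import random

random.seed(0)

# Toy stand-in for trigger-level anomaly detection: score each event
# by its squared distance from the typical "background" event, then
# accept only the rare outliers.
def anomaly_score(event, mean):
    return sum((x - m) ** 2 for x, m in zip(event, mean))

FEATURES = 8                      # pretend each event is 8 numbers
background_mean = [0.0] * FEATURES
background = [[random.gauss(0, 1) for _ in range(FEATURES)]
              for _ in range(10_000)]

# Calibrate the cut on background so only the top ~0.3% of ordinary
# events would fire the trigger.
scores = sorted(anomaly_score(e, background_mean) for e in background)
threshold = scores[int(0.997 * len(scores))]

def l1_accept(event):             # the trigger's single-bit verdict
    return 1 if anomaly_score(event, background_mean) > threshold else 0

rate = sum(l1_accept(e) for e in background) / len(background)
print(rate)                       # roughly 0.003: almost all rejected
```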
The algorithm must make its decision within 50 nanoseconds. Only about 0.02% of all collision data, roughly 110,000 events per second, survives the cut to be stored and sent to the surface. Even at this streamlined rate, terabytes per second flow up to the surface servers.
Once the data reaches the surface, a second round of filtering, the “high-level trigger,” takes over. It again discards the majority of the captured collisions, keeping only about 1,000 interesting ones out of the 100,000 events per second coming through the pipe. The system has 25,600 CPUs and 400 GPUs to reconstruct the original collisions and analyze the results, producing on the order of a petabyte per day.
“This is the data that we actually analyze,” Aarrestad said.
From there, the data is replicated to 170 sites in 42 countries, where it is available for analysis by researchers around the world, drawing on a combined 1.4 million computer cores.
Greenhouse environment for AI
The LHC detector is a greenhouse environment the likes of which AI rarely encounters, so CERN’s engineers had to build their own toolbox.
To be sure, there are already real-time machine learning benchmarks aimed at consumer applications such as noise-cancelling headphones, namely MLPerf Mobile and MLPerf Tiny. But they fall far short of the streaming data rates and ultra-low latency CERN requires.
So CERN trained its machine learning models “to be small from the beginning,” she said: quantized, pruned, parallelized, and distilled down to only the essential knowledge. Every operation on the FPGA is quantized, and each parameter gets its own bit width. The bit widths themselves are defined to be differentiable, so they can be optimized with gradient descent alongside the weights.
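A minimal sketch of what a single quantized parameter looks like, assuming plain signed fixed-point rounding (the real training loop, with differentiable bit widths, is far more involved):

```python
# Signed fixed-point quantization: `bits` total bits, `frac` of them
# fractional. On the FPGA, every parameter can get its own widths.
def quantize(x, bits, frac):
    scale = 2 ** frac
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = max(lo, min(hi, round(x * scale)))   # clamp, then round
    return q / scale

w = 0.30078                  # an example weight value
print(quantize(w, 8, 6))     # 8-bit version: 0.296875
print(quantize(w, 4, 2))     # 4-bit version: 0.25 (cheaper, coarser)
```

Shaving bits this way is a direct trade of accuracy for silicon area and latency, which is the whole game at trigger level.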
The engineering team developed hls4ml, a transpiler that converts models into C++ code targeted at a specific platform. This lets a model run on accelerators, systems-on-chip, or custom FPGAs, or even be used to “print” the model into the silicon of an ASIC.
The detector architecture departs from the traditional von Neumann model of memory, processor, and I/O. Nothing runs sequentially. Rather, she said, everything is driven by “data availability”: “As soon as this data is available, the next process will begin.”
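In software terms, the style resembles a chain of streaming stages that each fire as soon as input arrives. A loose analogy using Python generators (the stage names and numbers are invented for illustration; the real hardware uses streaming logic, not software):

```python
# Dataflow rather than von Neumann: each stage is a generator that
# starts work the moment its input is available, instead of waiting
# on a central program counter.
def digitize(samples):
    for s in samples:
        yield round(s * 16) / 16          # coarse ADC step

def threshold(stream, cut=0.5):
    for s in stream:
        yield s if abs(s) > cut else 0.0  # zero-suppress quiet channels

raw = [0.1, 0.9, -0.7, 0.2]
pipeline = threshold(digitize(raw))
print(list(pipeline))                     # only loud channels survive
```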
Most importantly, decisions must be made on-chip. Nothing can leave the chip, not even to very fast memory. All the hardware is tailored to a specific model, and the decisions are fixed at design time. Each layer of the network becomes its own computational unit on the FPGA.
A significant portion of the on-chip silicon is given over to precomputation, saving work on every new run: the outputs for all possible inputs are simply read from a lookup table.
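The lookup-table trick, sketched in Python: precompute an activation function over every possible 8-bit input once, then replace run-time math with an index (the encoding here is an assumption for illustration):

```python
import math

# Precompute an activation over every possible 8-bit input once,
# then evaluate at run time with a single table lookup - no
# arithmetic on the critical path.
BITS = 8
LUT = [math.tanh((code - 128) / 32) for code in range(2 ** BITS)]

def tanh_lut(code):               # code: raw 8-bit value, 0..255
    return LUT[code]

print(tanh_lut(128))              # tanh(0) = 0.0
print(round(tanh_lut(160), 4))    # tanh(1.0), about 0.7616
```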
Naturally, huge models cannot fit on these slices of silicon. There is no room here for the deep learning models that are transformative at scale. Instead, CERN found that tree-based models are significantly more efficient than deep learning models.
CERN’s experience shows that tree-based models deliver the same performance as deep learning models at a fraction of the cost. That is not surprising, considering that the Standard Model’s output can be thought of as tabular data: for each collision, the LHC spits out a structured set of discrete measurements.
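Part of the hardware appeal is that a decision tree is nothing but comparisons. A hypothetical single tree over two tabular features (the feature names and thresholds are invented for illustration):

```python
# A decision tree is just nested comparisons - no multipliers -
# which is one reason tree ensembles map so cheaply onto FPGA
# fabric. Features and thresholds below are made up.
def tree_score(pt, eta):
    if pt > 20.0:                      # high transverse momentum?
        return 0.9 if abs(eta) < 2.4 else 0.4
    return 0.6 if pt > 5.0 else 0.1

# A boosted ensemble would sum many such trees; one stands in here.
print(tree_score(35.0, 1.1))   # energetic, central: 0.9
print(tree_score(3.0, 0.2))    # soft debris: 0.1
```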
More data, please
CERN seeks to measure the parameters of each collision to the five-sigma level, a confidence of about 99.99997 percent, the gold standard for claiming a discovery. The Higgs boson was discovered to this standard.
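For the record, the five-sigma threshold can be computed directly from the Gaussian tail probability:

```python
import math

# "Five sigma": the one-sided tail probability of a standard normal
# beyond 5 standard deviations - about a 1-in-3.5-million chance
# that a bump this big is a statistical fluke.
def one_sided_p(sigma):
    return 0.5 * math.erfc(sigma / math.sqrt(2))

p = one_sided_p(5)
print(f"{p:.2e}")                 # 2.87e-07
print(round(1 / p / 1e6, 1))      # 3.5 (million-to-one odds)
```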
The LHC has discovered at least 80 other hadrons, particles held together by the strong nuclear force, including one just last week.
The search continues for new processes that occur in less than one in a trillion collisions.
At the end of this year, the LHC will shut down to make way for the High-Luminosity LHC, which is scheduled to come online in 2031. It will provide even more of the fascinating data that particle physicists crave.
More powerful magnets will focus the beam into an even smaller spot, and the number of protons per bunch will double (“thus making it more likely that those protons will interact with each other”).
This means a significant increase in collisions and a 10x jump in data, with far denser “event complexity.” Event size grows from 2 MB to 8 MB, and the resulting data footprint climbs from 4 Tb/sec to 63 Tb/sec.
The detectors are being upgraded to disentangle each collision and trace every particle back to its original impact point, all within microseconds.
While cutting-edge AI labs build ever-larger models, CERN is in many ways headed in the opposite direction, using aggressive anomaly detection, heterogeneous quantization, and other tricks to make AI smaller and faster than ever. In deepening our understanding of the universe, it sometimes pays to know what information to throw away. ®
