WHY THIS MATTERS IN BRIEF
You may not have heard of either of these companies, but both are revolutionary in their field …
Cerebras, the Artificial Intelligence (AI) chip business whose mega chips are literally the size of a dinner plate, has unveiled its newest supercomputer, nicknamed Condor Galaxy: a distributed and federated supercomputing cluster that, when complete, will span nine sites and be capable of a staggering 36 exaFLOPS of combined FP16 performance. In layman’s terms, that’s a supercomputing cluster capable of 36 quintillion (36 followed by 18 zeros) floating point operations … per second!
The company revealed the first phase of the system Thursday. It was built for G42, one of the world’s fastest growing and most underrated AI companies, a UAE-based multinational conglomerate with an interest in AI research and development that will use Cerebras’s CS-2 accelerators to power the development of its future AI models.
Cerebras’s accelerators aren’t like the GPUs or AI accelerators that you’ll find in most AI clusters today. They don’t come as PCIe cards or SXM modules like Nvidia’s mega powerful and mega popular H100.
Instead, the company’s WSE-2 chips are massive, dinner-plate sized affairs, each of which houses 850,000 cores and 40GB of SRAM capable of 20PBps of bandwidth. That’s an order of magnitude faster than the HBM typical of other accelerators. Each of these wafers packs a dozen 100Gbps interfaces that allow a cluster to be scaled up to 192 systems.
In its current form, Condor Galaxy 1 (CG-1) spans 32 racks, each of which is equipped with one of the chipmaker’s wafer-scale CS-2 accelerators, making it twice the size of the Andromeda system that Cerebras built last year.
For the moment, CG-1 has 32 of these systems, which are fed by 36,352 AMD Epyc cores. Assuming Cerebras has stuck with AMD’s 64-core CPUs, that works out to 568 sockets.
Put together, the machine packs 41TB of memory – though the WSE-2 wafer’s SRAM only accounts for 1.28TB of that – 194 Tbps of internal bandwidth, and peak performance of two exaFLOPS. But before you get too excited, we’ll remind you that these aren’t the same exaFLOPS we expect to see from Argonne’s newly completed Aurora supercomputer.
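The socket count and the SRAM total quoted above are simple arithmetic, and they are easy to check. Here is a minimal sketch in Python, assuming (as the article does) that every Epyc CPU is a 64-core part, and assuming decimal units where 1TB is 1,000GB. The constant and function names are purely illustrative.

```python
# Back-of-the-envelope check of the CG-1 phase-one figures quoted above.
# Assumption (from the article): every AMD Epyc CPU is a 64-core part.
# Assumption (ours): decimal units, so 1 TB = 1,000 GB.

CS2_SYSTEMS = 32            # CS-2 systems installed in CG-1 today
EPYC_CORES_TOTAL = 36_352   # AMD Epyc cores feeding those systems
CORES_PER_SOCKET = 64       # assumed core count per CPU
SRAM_PER_WSE2_GB = 40       # on-wafer SRAM per WSE-2
TOTAL_MEMORY_TB = 41        # total memory quoted for the machine


def socket_count(total_cores: int, cores_per_socket: int) -> int:
    """Return the number of CPU sockets, failing if cores do not divide evenly."""
    sockets, remainder = divmod(total_cores, cores_per_socket)
    if remainder:
        raise ValueError("core count is not a whole number of sockets")
    return sockets


def total_sram_tb(systems: int, sram_per_system_gb: int) -> float:
    """Return the aggregate on-wafer SRAM in terabytes (decimal)."""
    return systems * sram_per_system_gb / 1000


sockets = socket_count(EPYC_CORES_TOTAL, CORES_PER_SOCKET)
sram_tb = total_sram_tb(CS2_SYSTEMS, SRAM_PER_WSE2_GB)
sram_share = sram_tb / TOTAL_MEMORY_TB

print(f"CPU sockets: {sockets}")
print(f"Aggregate WSE-2 SRAM: {sram_tb:.2f} TB")
print(f"SRAM share of total memory: {sram_share:.1%}")
```

Run as written, this reproduces the 568 sockets and 1.28TB of SRAM mentioned above, and shows that the on-wafer SRAM is only about 3 percent of the machine’s 41TB of memory.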
HPC systems are measured in double precision (FP64), often using the LINPACK benchmark. AI systems, on the other hand, don’t benefit from this level of precision and can get away with FP32, FP16, FP8, and sometimes even INT8 calculations. In this case, Cerebras’s systems achieve their most flattering figures in FP16 with sparsity.
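To get a feel for what dropping from FP64 to FP16 actually gives up, here is a small illustrative sketch using only Python’s standard library. It stores a value in IEEE 754 single and half precision and reads it back. It says nothing about sparsity or about how Cerebras’s hardware handles numbers internally; the helper names are hypothetical and exist only for this example.

```python
import struct

# Python floats are FP64. The struct module can store them in narrower
# IEEE 754 formats: "f" is single precision (FP32), "e" is half (FP16).


def round_trip_fp32(value: float) -> float:
    """Store a value as FP32 and read it back as a Python float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def round_trip_fp16(value: float) -> float:
    """Store a value as FP16 and read it back as a Python float."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


for value in (0.1, 2049.0):
    as_fp32 = round_trip_fp32(value)
    as_fp16 = round_trip_fp16(value)
    print(f"FP64 {value!r} -> FP32 {as_fp32!r} -> FP16 {as_fp16!r}")
    print(f"  FP16 absolute error: {abs(as_fp16 - value):.3g}")
```

FP16 only has 11 bits of significand, so it can’t represent every whole number above 2,048, which is why 2049.0 comes back as 2048.0. Neural network training tends to tolerate that kind of rounding, whereas traditional scientific simulations generally don’t.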
While two exaFLOPS of FP16 is impressive on its own, this is only half the setup. When complete, the roughly $100 million system will span 64 racks each with a CS-2 accelerator.
I’m told the system should scale linearly so that the complete cluster will deliver four exaFLOPS of sparse FP16 performance – four times that of Andromeda. Cerebras expects to complete installation of the final 32 racks within the next three months.
Of course, four exaFLOPS of AI performance commands a substantial amount of power and thermal management. Assuming linear scaling from Andromeda, we estimate the system is capable of drawing upwards of two megawatts.
Because of this, Cerebras is housing the system at Colovore’s Santa Clara facility. The colocation provider specializes in high-performance compute and AI/ML applications, and recently revealed racks capable of cooling up to 250 kilowatts.
“This is the first of three US-based massive supercomputers that we will build with them in the next year,” said Cerebras CEO Andrew Feldman.
Using CG-1 as a template, two more US-based sites will be built, one in Asheville, North Carolina (CG-2), and another in Austin, Texas (CG-3), with completion slated for the first half of 2024. These systems will then be networked to allow the distribution of models across sites, which Feldman insists is possible for certain large, latency-tolerant workloads.
“Latency is a problem for some problems, not all. In the high-performance compute world, it’s a giant problem,” he said. “I think there are many AI workloads for which it’s not a problem. There are some that we won’t distribute. I think we’ll do this thoughtfully and carefully.”
The chipmaker is also careful to note that the system will be operated under US law and will not be made available to adversary states. This is likely a reference to US trade policy governing the export of AI chips to certain countries including Russia, China, and North Korea, among others.
However, Feldman claims the decision to build the systems in the US was motivated by a desire to move quickly.
“I think standing the first three in the US was a function of a desire for time to market,” he said. “I think it was a desire for G42 to expand beyond the Middle East.”
The final stage will see Cerebras construct an additional six sites, the locations of which have not yet been disclosed, using CG-1 as a template. The complete Condor Galaxy system will feature 576 CS-2 accelerators capable of a claimed 36 exaFLOPS of sparse FP16 performance, though we don’t expect to see many, if any, workloads spanning the entire nine-site constellation. Cerebras aims to complete installation of all nine sites by the end of 2024.
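The headline numbers all follow from the same linear scaling assumption: 32 CS-2 systems deliver a claimed two exaFLOPS of sparse FP16, so each system contributes 1/16 of an exaFLOP. A rough sketch of that projection in Python follows. It assumes perfectly linear scaling and 64 systems per completed site, both taken from the claims in the article rather than from any measurement, and the names are illustrative only.

```python
# Projecting Condor Galaxy's claimed sparse FP16 performance.
# Assumption: perfectly linear scaling, as Cerebras claims.
# Assumption: every completed site holds 64 CS-2 systems, like CG-1.

PHASE1_SYSTEMS = 32          # CS-2 systems in CG-1 today
PHASE1_EXAFLOPS = 2.0        # claimed sparse FP16 peak for those 32 systems
SYSTEMS_PER_SITE = 64        # CS-2 systems in a completed site
OPS_PER_EXAFLOP = 10**18     # one exaFLOP is 10^18 operations per second

EXAFLOPS_PER_SYSTEM = PHASE1_EXAFLOPS / PHASE1_SYSTEMS


def projected_exaflops(systems: int) -> float:
    """Return the projected peak in exaFLOPS for a given number of CS-2 systems."""
    return systems * EXAFLOPS_PER_SYSTEM


phases = {
    "CG-1 today": PHASE1_SYSTEMS,
    "CG-1 complete": SYSTEMS_PER_SITE,
    "Three US sites": 3 * SYSTEMS_PER_SITE,
    "All nine sites": 9 * SYSTEMS_PER_SITE,
}

for name, systems in phases.items():
    exaflops = projected_exaflops(systems)
    ops_per_second = int(exaflops * OPS_PER_EXAFLOP)
    print(f"{name}: {systems} systems, {exaflops:g} exaFLOPS, "
          f"{ops_per_second:,} operations per second")
```

Under those assumptions the full nine-site build comes to 576 systems and 36 exaFLOPS, matching the figures Cerebras quotes. In practice, the cross-site latency Feldman describes means few workloads are likely to see anything close to that aggregate number.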
While Cerebras will operate and manage the systems, they’re owned by G42, which plans to use the systems for its internal workloads. Specifically, Cerebras says it is working with three of the multinational’s divisions, including G42 Cloud, the International Institute for AI (IIAI), and G42 Health.
“They partnered with us because we could build and manage big supercomputers, that we could implement massive generative AI models, and that we had a lot of experience cleaning and manipulating very, very large datasets,” Feldman said. “They have vast internal demand for compute among their portfolio companies. But with very big models, with very big compute, there’s a bin packing problem. There’s always an opportunity to slide in other workloads.”
And this means that any leftover resources not consumed by G42 will be made available to both G42 and Cerebras’s customers. For Cerebras, this is critical as Feldman notes that the company’s cloud is already at capacity.
For Feldman and his company, the collaboration with G42 is an opportunity to expose more people to Cerebras’s architecture and compete more aggressively with Nvidia, which holds an outsized share of the market for AI accelerators.
“Nobody buys your stuff without jumping on your cloud and testing and showing and demonstrating,” Feldman added.
The post Cerebras and G42 launch the world’s largest AI supercomputer with 36 Exaflops appeared first on Matthew Griffin | Keynote Speaker & Master Futurist.