Amazon Is Building a Mega AI Supercomputer With Anthropic

https://www.wired.com/story/amazon-reinvent-anthropic-supercomputer/

Amazon is building one of the world’s most powerful artificial intelligence supercomputers in collaboration with Anthropic, an OpenAI rival working to push the frontier of what is possible with AI. When completed, it will be five times larger than the cluster used to build Anthropic’s current most powerful model. Amazon says it expects the supercomputer, which will feature hundreds of thousands of its latest AI training chips, Trainium 2, to be the largest reported AI machine in the world when finished.

Matt Garman, the CEO of Amazon Web Services, revealed the supercomputer plans, dubbed Project Rainier, at the company’s Re:Invent conference in Las Vegas today, along with a host of other announcements cementing Amazon’s rising dark-horse status in the world of generative AI.

Garman also announced that Trainium 2 will be made generally available in so-called Trn2 UltraServer clusters specialized for training frontier AI. Many companies already use Amazon’s cloud to build and train custom AI models, often in tandem with GPUs from Nvidia. But Garman said that the new AWS clusters are 30 to 40 percent cheaper than those that feature Nvidia’s GPUs.

Amazon is the world’s biggest cloud computing provider, but until recently, it might have been considered a laggard in generative AI compared to rivals like Microsoft and Google. This year, however, the company has poured $8 billion into Anthropic, and it has quietly pushed out a range of tools through an AWS platform called Bedrock to help companies harness and wrangle generative AI.

At Re:Invent, Amazon also showcased its next-generation training chip, Trainium 3, which it says will offer four times the performance of its current chip. It will be available to customers in late 2025.

“The numbers are pretty astounding” for the next-generation chip, says Patrick Moorhead, CEO and chief analyst at Moor Insights & Strategy. Moorhead says that Trainium 3 appears to have received a significant performance boost from an improvement in the so-called interconnect between chips. Interconnects are critical in developing very large AI models, as they enable the rapid transfer of data between chips, a factor AWS seems to have optimized for in its latest designs.

Nvidia may remain the dominant player in AI training for a while, Moorhead says, but it will face increasing competition in the next few years. Amazon’s innovation “shows that Nvidia is not the only game in town for training,” he says.

