
Meta unveils its newest custom AI chip as it races to catch up

Meta, determined to catch up with its competitors in the field of generative AI, is spending billions on its own AI efforts. Part of these billions is dedicated to recruiting AI researchers. But an even larger portion is devoted to developing hardware, particularly chips to run and train Meta’s AI models.

Meta revealed the latest fruit of its chip development efforts today, a day after Intel announced its latest AI accelerator hardware. Called the “next generation” Meta Training and Inference Accelerator (MTIA), the successor to last year’s MTIA v1, the chip runs models for ranking and recommending ads on Meta properties (e.g., Facebook).

Compared to MTIA v1, which was built on a 7nm process, the next-generation MTIA is built on a 5nm process. (In chip manufacturing, “process” refers to the size of the smallest component that can be built on the chip.) The next-generation MTIA is a physically larger design, boasting more processing cores than its predecessor. And while it draws more power – 90W versus 25W – it also has more internal memory (128MB versus 64MB) and runs at a higher average clock speed (1.35GHz versus 800MHz).

Meta claims that the next-generation MTIA is currently operational in 16 of its data center regions and delivers up to 3x better overall performance than MTIA v1. If that “3x” claim seems a little vague, you’re not wrong – we think so too. Meta would only say that the figure came from testing the performance of “four key models” on the two chips.

“Because we control the entire stack, we can achieve greater efficiency compared to commercially available GPUs,” Meta writes in a blog post shared with TechCrunch.

Meta’s hardware presentation — which comes just 24 hours after a press conference on the company’s various ongoing generative AI initiatives — is unusual for several reasons.

First, Meta reveals in the blog post that it is not using the next-generation MTIA for generative AI training workloads at this time, although the company says it has “multiple programs in progress” to explore this. Second, Meta admits that the next-generation MTIA will not replace GPUs for running or training models, but will instead complement them.

Reading between the lines, Meta is moving slowly – perhaps more slowly than it would like.

Meta’s AI teams are almost certainly under pressure to reduce costs. The company is expected to spend around $18 billion by the end of 2024 on GPUs to train and run generative AI models, and – with training costs for cutting-edge generative models running into the tens of millions of dollars – the internal hardware presents an interesting alternative.

And while Meta’s hardware lags, competitors are pulling ahead – much to the dismay of Meta executives, I suspect.

Google this week made its fifth-generation custom chip for training AI models, TPU v5p, generally available to Google Cloud customers, and unveiled its first chip dedicated to running models, Axion. Amazon has several families of custom AI chips under its belt. And last year, Microsoft entered the fray with the Azure Maia AI accelerator and Azure Cobalt 100 processor.

In the blog post, Meta claims that it took less than nine months to “go from first silicon to production models” of the next-generation MTIA, which, to be fair, is shorter than the typical window between Google TPU generations. But Meta has a lot of catching up to do if it hopes to achieve some independence from third-party GPUs – and compete with its tough competition.

TechCrunch
