New AI-optimised Arm technologies to redefine mobile
As AI models continue to evolve rapidly, software is starting to outpace hardware, which means additional innovation is required at all levels of the compute stack. To meet these growing demands, Arm says it is evolving its solution offering to gain the maximum benefit from leading process nodes and is announcing its newest compute solution for AI smartphones and PCs – Arm Compute Subsystems (CSS) for Client.
Arm CSS for Client is said to provide the performance, efficiency and accessibility needed to deliver leading AI-based experiences, and to make it easier for the company's silicon partners to build Arm-based solutions and get them to market quickly.
Arm says CSS for Client provides the foundational computing elements for flagship SoCs and features the latest Armv9.2 CPUs and Immortalis GPUs, production-ready physical implementations for CPU and GPU on 3nm, and the latest CoreLink System Interconnect and System Memory Management Units (SMMUs).
Chris Bergey, SVP and GM, Client Line of Business, Arm, says: "CSS for Client delivers a step change in platform capabilities to continue pushing the boundaries of premium mobile experiences.
"This is the fastest Arm compute platform addressing demanding real-life Android workloads with greater than 30 per cent increase on compute and graphics performance and 59 per cent faster AI inference for broader AI/ML and computer vision (CV) workloads.
"At the heart of CSS for Client is Arm’s most performant, efficient and versatile CPU cluster ever for maximum performance and power efficiency. The new Arm Cortex-X925 delivers the highest year-on-year performance uplift in the history of Cortex-X.
"Taking advantage of the leading-edge 3nm process nodes, assuming a 3.8GHz clock rate and maximum cache size, the result is a massive 36 per cent increase in single-thread performance when comparing to 2023 smartphone flagship 4nm SoCs.
"For AI, Cortex-X925 provides an incredible 41 per cent performance uplift to dramatically improve the responsiveness of on-device generative AI, like large language models (LLMs).
"The push for leading-edge performance is combined with leading-edge efficiency through our new Arm Cortex-A725 CPU, which delivers a 35 per cent improvement in performance efficiency to target AI and mobile gaming use cases.
"This is supported by a refreshed Arm Cortex-A520 CPU and an updated DSU-120 that provide power efficiency and scalability improvements for consumer devices that adopt the latest Armv9 CPU clusters."
Bergey says Arm is "relentlessly focused on millions of developers worldwide, ensuring they have access to the performance, tools and software libraries required to create the next wave of AI-enabled applications."
To enable developers to land these innovations quickly at the highest performance, the company is introducing Arm Kleidi, which includes KleidiAI for AI workloads and KleidiCV for computer vision applications.
KleidiAI is a set of compute kernels for developers of AI frameworks, providing them with frictionless access to the best performance possible on Arm CPUs, across a wide range of devices, with support for key Arm architectural features such as NEON, SVE2 and SME2.
KleidiAI integrates with popular AI frameworks such as PyTorch, TensorFlow, MediaPipe and Meta Llama 3, and is also backwards and forwards compatible to ensure Arm is 'future fit' as it brings additional technologies to market.