Naren Nayak – Sr. Director, Application Engineering
Today’s cloud is built on commodity hardware and open source software; this potent combination has enabled massive economies of scale for the biggest cloud service providers (CSPs) like AWS and Alibaba. CPUs are among the most expensive components in the bill of materials (BOM) of a datacenter server. In this blog, I will discuss the industry needs and transitions that led Ampere to focus on disrupting the cloud computing market, followed by five traits of a CPU designed with a ‘clean slate’ approach for the cloud.
Datacenter CPUs continue to be designed with the broader computing market in mind. x86 CPUs, for example, run applications from Adobe Photoshop and Microsoft Word to Apache Hadoop and Oracle. In addition, they must bear the burden of legacy: code that ran on an Intel 8086 decades ago must still run on the newest Intel processors. This broad market applicability, attractive as it is, comes with one unavoidable tradeoff: jack of all trades, master of none. That tradeoff, coupled with the scale of the infrastructure CSPs operate, has led some of them to design custom silicon for specific usages, whether it is inference accelerators like Google’s Tensor Processing Units (TPUs) or AWS’ Nitro, which offloads common datacenter taxes to a purpose-built ASIC. Additionally, software architectures today closely mirror the usages of their target markets. Cloud-native software, with paradigms like containerization, orchestration, and CI/CD, looks very different from code written for the enterprise or for laptops. In today’s world, why would anyone continue to design CPUs that attempt to do everything?
Designing a CPU for one specific segment, especially a (relatively) lower-volume market like the cloud, was not possible a decade ago. The higher margins from selling a few hundred thousand datacenter CPUs have historically depended on running fabrication plants at capacity while manufacturing hundreds of millions of client CPUs. The recent rise of “fabless manufacturing,” in which the fabrication of hardware devices is outsourced to a foundry with a diversified portfolio of customers, has lowered the barrier to entry and made it possible for agile semiconductor companies like Ampere to succeed while focusing solely on design, validation, and software enablement.
With that out of the way, if one were to design a CPU purpose-built for cloud usages, what would it look like? Which features should be prioritized? What compromises would be made? Based on the cloud software stacks we have studied extensively over the last few years, and building on industry research from the past decades, here are five traits that we believe make for a good CPU for cloud workloads:
1. Scale out capabilities
Most pre-cloud software applications rely on higher CPU frequencies, larger memory capacities, and micro-architectural optimizations to improve from one generation to the next. This kind of vertical scaling was historically enabled by Dennard scaling. Those ‘scale up’ improvements have diminished over the last decade (as Dennard scaling has broken down), and generational improvements in CPUs today are typically achieved through innovations like higher core counts. Cloud software design has embraced this ‘scale out’ philosophy, allowing for elasticity, built-in redundancy, and the ability to essentially treat the datacenter as one large computer. A high core count enables greater datacenter density, letting CSPs extract even more revenue from their CapEx while lowering total cost of ownership (TCO). The Ampere Altra and the recently announced Altra Max provide up to 128 cores per socket for extreme scale-out capability.
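To see why high per-socket core counts pay off only when software scales out cleanly, here is a minimal sketch using Amdahl’s law; the parallel fractions below are illustrative assumptions, not measurements of any real workload:

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Ideal speedup on `cores` cores when `parallel_fraction` of the
    work parallelizes perfectly (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even a 1% serial portion roughly halves the benefit of 128 cores,
# which is why shared-nothing, cloud-native designs matter.
for p in (1.00, 0.99, 0.95):
    speedups = [amdahl_speedup(n, p) for n in (32, 64, 128)]
    print(p, [round(s, 1) for s in speedups])
```

Cloud serving workloads with independent requests sit near the top of this range, which is exactly the regime where 128 cores per socket translate into proportionally more throughput.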
2. High throughput & predictable scaling under SLA constraints
In the 2000s, when applications were monolithic and designed for a bare-metal world, performance was measured in terms of throughput, occasionally with an expected average latency as an SLA. Since then, it has become clear that average latencies hide outliers and that the long tail of the latency distribution is what really matters for cloud workloads, especially microservice architectures and the high levels of inter-process communication they bring. Cloud-native CPUs like the Ampere Altra have been designed for predictable frequency scaling, with features like single-threaded cores that lessen the impact of problems like noisy neighbors.
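A small sketch, using entirely synthetic latency numbers, of why averages hide the tail and why the tail compounds under microservice fan-out:

```python
import random

random.seed(42)
# Synthetic latencies: 99% of requests are fast, 1% hit a slow path
# (e.g., a noisy neighbor or a frequency dip). All numbers are made up.
latencies = [random.gauss(2.0, 0.2) for _ in range(9900)]   # ~2 ms typical
latencies += [random.gauss(50.0, 5.0) for _ in range(100)]  # ~50 ms outliers

def percentile(samples, q):
    s = sorted(samples)
    return s[min(len(s) - 1, int(q * len(s)))]

mean = sum(latencies) / len(latencies)
p50 = percentile(latencies, 0.50)
p99 = percentile(latencies, 0.99)
print(f"mean={mean:.2f} ms  p50={p50:.2f} ms  p99={p99:.2f} ms")

# Fan-out makes the tail dominate: if one user request touches 100
# services, the chance that at least one call lands in its p99 tail is
p_tail = 1 - 0.99 ** 100  # ~0.63
print(f"P(at least one slow call) = {p_tail:.2f}")
```

The mean looks healthy while the p99 is an order of magnitude worse, and with 100-way fan-out roughly two thirds of user requests experience at least one tail-latency call.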
3. Built on an Open Architecture
As Moore’s Law has slowed down, architectural innovations have come from customizations catered to specific workloads and markets. The Arm architecture allows extreme flexibility for a company like Ampere to customize CPUs to meet customer workload requirements. For our customers, the ecosystem’s openness translates to rapid integration of complementary IP blocks and devices. This flexibility and openness have helped make Arm the dominant microprocessor architecture, with billions of Arm-based chips shipping every year in devices from toasters and mobile phones to laptops and now, cloud CPUs.
4. Great out-of-the-box (OOTB) performance
As general-purpose application processors have evolved, an increasing amount of valuable die space has been devoted to new functionality for specific workloads, like the wider vector units that HPC applications benefit from. These features may lack broader applicability, leading to ‘one-off’ solutions, and some require a fair bit of system and software tuning to exploit fully. Given a choice between spending resources to tune and optimize platforms for good performance and focusing on product functionality and features, most developers would opt for the latter. So would infrastructure operators! Great OOTB performance using industry-standard compilers, SDKs, and open source software stacks is a promise that has not yet been delivered in the cloud. At Ampere, we strive for competitive OOTB performance while treating the upside from performance tuning as icing on the cake.
5. Superior energy efficiency
With the cloud computing market growing at double-digit rates every year, power consumption and energy efficiency are no longer ‘good to have’ metrics. CSPs now start with rack-level power and space as their primary constraints when designing datacenters. Given Arm’s roots in the mobile segment, energy efficiency has always been a first-class metric. With our collective industry experience and focus, the newly designed Ampere Altra product family delivers the highest energy efficiency of any datacenter CPU while making no compromises on raw performance.
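Under a fixed rack power budget, performance per watt, not peak per-socket performance, sets rack-level throughput. A back-of-the-envelope sketch with entirely hypothetical numbers:

```python
def rack_performance(rack_budget_w: int, socket_w: int,
                     perf_per_socket: float) -> float:
    """Aggregate performance of a power-capped rack (ignoring cooling,
    memory, and other platform power for simplicity)."""
    sockets = rack_budget_w // socket_w  # how many sockets fit the budget
    return sockets * perf_per_socket

BUDGET = 12_000  # hypothetical usable watts per rack

# Two made-up CPUs: A is faster per socket but power hungry;
# B is 20% slower per socket yet far more efficient.
a = rack_performance(BUDGET, 400, 100.0)  # 30 sockets
b = rack_performance(BUDGET, 250, 80.0)   # 48 sockets
print(a, b)
```

In this sketch the more efficient CPU delivers roughly 28% more aggregate performance per rack despite being slower per socket, which is why perf/watt is the metric that drives CSP economics.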
At Ampere, we are proud of the work we have done with our Ampere Altra product family to build CPUs from the ground up to address these must-have requirements. To learn more about the Altra product family and to participate in our early access program to try our systems, visit https://cascade-stage-572021.amperecomputing.com/altra/