Delving into the APEX Cloud Platform for Microsoft Azure

June 14th at 5:26pm Kenny Lowe

At Dell Technologies World 2023, Dell and Microsoft together announced the APEX Cloud Platform for Microsoft Azure, and ever since I have been deluged with requests for more information from people keen to delve deeper. Following Scott Hanselman's principle of conserving keystrokes, this blog aims to lay out clearly the why and the what of this new platform.

Before we can delve into the new though, we first need to level-set on where we are and how we got here. The APEX Cloud Platform (ACP) for Azure builds on a rich heritage of collaboration and innovation between Dell and Microsoft which started with the Cloud Platform System (CPS) in 2014, and has evolved many times since: through the Dell Hybrid Cloud System (DHCS), Storage Spaces Direct (S2D) reference architectures, Ready Nodes, a Validated Solution, and finally the current in-market offering, the Dell Integrated System for Microsoft Azure.

Through this near decade of experience, we have always worked extremely hard to continually improve the customer experience within these platforms - every evolution and next step forward needs to enhance the deployment, day-to-day management, and support experience. We're proud of where we've come from and what we've achieved: today we deliver the most deployed Azure Stack HCI Integrated System in the market.

The Integrated System program has been the premier way to deliver Azure Stack HCI capabilities, requiring not insignificant investments from OEMs including implementation of full stack lifecycle management capabilities, Windows Admin Center integrations, joint support motions and ticketing capabilities with Microsoft, and a bunch more. Because not all OEMs are prepared to make the investment required to deliver those capabilities, there's a secondary program available as well, the Validated Node program, which requires validation of the Azure Stack HCI OS on the hardware, plus a guarantee of continued hardware support for the product.

In these programs today there are 4 OEMs who provide Integrated Systems, and a further 29 providing Validated Nodes.

We are moving a leap beyond what these programs can deliver.

In a blog published on May 30th, Microsoft noted:

With one of our key partners, we recently announced the Dell APEX Cloud Platform for Microsoft Azure, the first Azure Stack HCI solution of its kind, delivering integration and capabilities beyond the Validated Nodes and Integrated Systems in the Azure Stack HCI portfolio today.

This is a powerful statement from Microsoft, and indeed they go on to note:

Dell APEX Cloud Platform for Azure is the result of extensive engineering collaboration between Microsoft and Dell. It natively integrates with Azure Arc and Azure Stack HCI to provide a turnkey experience to customers, including simplified deployment, seamless management, and orchestration capabilities for hyperconverged infrastructure deployments. Building on Dell’s track record of delivering market-leading integrated infrastructure solutions with extensive software-driven management and orchestration (M&O) automation, this comprehensive solution enables IT admins and operators to focus less on managing the day-to-day operational tasks and more on innovation and achieving desired business outcomes.

This is and always has been our overriding goal - to deliver a solution which best frees up IT admins to focus on the workloads they can run on the platform and the outcomes they can achieve from it, not on the day-to-day management. With the APEX Cloud Platform we have built on years of existing Dell innovation across multiple ecosystems, and brought it to bear in one place. In this article I'm going to focus on three specific areas: Deployment, Management, and Storage.

Deployment

No one invests in a platform primarily because of its deployment experience - this is a one and done activity, right?

Well, yes and no. The initial deployment process and experience is actually super critical, because it's what ensures you are set up for future success... or not. Delivering a robust, consistent, repeatable, and automated deployment experience which is guaranteed to be the same every time is the way to achieve this, and this is something we have a huge amount of experience with in another of our platforms - VxRail.

VxRail is the single most deployed HCI platform in the world today across all OEMs, platforms, and ecosystems. Throughout VxRail's history, a huge amount of engineering effort has gone into ensuring the deployment experience is as described above - robust, repeatable, automated, consistent. With the APEX Cloud Platform we are bringing that deployment experience (in all senses) to the Azure Stack HCI world for the first time. Deployment is now an inbox experience, with a deployment wizard built into the platform which you can connect to via IP and walk through from first boot. The deployment experience leverages the best of Dell deployment capability coupled with new Microsoft deployment tooling in a seamless and integrated manner to provide the most straightforward and automatable Azure Stack HCI deployment to date. I say new Microsoft deployment capabilities, but actually if you delve into them they are built from the heritage Azure Stack Hub deployment bits, which is awesome because that is also an automated, repeatable, robust deployment experience.

The deployment experience can either be walked through in the inbox GUI, or you can upload a prepopulated JSON template and hit go. Either way, deployment automation will take you to a defined known good state whereupon you can dive in and start using the platform, deploying workloads, getting value.
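To make the template route a little more concrete, here is a minimal Python sketch of the kind of sanity check you might run on a prepopulated JSON file before uploading it. The key names (cluster_name, nodes, management_network, management_ip) and the example values are invented purely for illustration; they are assumptions, not the platform's actual template schema.

```python
import json

# Hypothetical schema: these key names are illustrative assumptions only,
# not the real APEX Cloud Platform deployment template format.
REQUIRED_KEYS = {"cluster_name", "nodes", "management_network"}

EXAMPLE_TEMPLATE = """
{
  "cluster_name": "acp-cluster-01",
  "management_network": "192.168.10.0/24",
  "nodes": [
    {"hostname": "node01", "management_ip": "192.168.10.11"},
    {"hostname": "node02", "management_ip": "192.168.10.12"}
  ]
}
"""


def validate_template(template: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = [
        f"missing required key: {key}"
        for key in sorted(REQUIRED_KEYS - template.keys())
    ]

    nodes = template.get("nodes", [])
    addresses = []
    for index, node in enumerate(nodes):
        address = node.get("management_ip")
        if address is None:
            problems.append(f"node {index} has no management_ip")
        else:
            addresses.append(address)

    # Two nodes sharing one management address would break a repeatable deploy.
    if len(addresses) != len(set(addresses)):
        problems.append("duplicate management_ip values")
    return problems


if __name__ == "__main__":
    template = json.loads(EXAMPLE_TEMPLATE)
    problems = validate_template(template)
    print("template looks OK" if not problems else "\n".join(problems))
```

Checking the file up front like this is in the same spirit as the deployment itself: the same input should always produce the same known good result.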

Step by step you are walked through the deployment, leveraging the latest and greatest capabilities such as Network ATC for intent-based network deployment, until the deployment is complete and you can log in to Windows Admin Center to manage your freshly deployed cluster.

Management

Once the initial deployment is done, the day-to-day operations rear their head - how do I handle lifecycle management, how do I perform node expansion, how do I monitor and manage the underlying physical infrastructure? Again, all of these elements are what I'd term solved problems in the VxRail ecosystem, and we are again bringing that innovation to bear in the APEX Cloud Platform world. Within VxRail there's a VM which runs what we call VxRail HCI System Software. This VM hosts a whole bunch of software elements that Dell has created to integrate and automate the underlying platform, the ESXi host OS, the management plane through vSphere, and other external Dell properties like CloudIQ.

We are bringing these same software elements to the Azure Stack HCI world, and in the APEX Cloud Platform, the software is called 'APEX Cloud Platform Foundations Software'. This will leverage the same code and capabilities to provide experiences customers are already used to and rave about into the Azure Stack HCI ecosystem for the first time, integrating and automating the platform, and surfacing its capabilities into Windows Admin Center via a new WAC Extension.

We will delve deeper into these capabilities in a future blog, but for now, just from a lifecycle management perspective, we've significantly advanced the experience and capabilities. Intelligent lifecycle management functionality can now automatically update a cluster with prevalidated, pretested Dell and Microsoft software and firmware updates, ensuring that the APEX Cloud Platform first updates to a known good state, and then remains in a continuously validated state.

Once the update is complete, the APEX Cloud Platform Foundations Software will continuously monitor the state of the cluster, and if it deviates from its known good state, it will flag up a compliance warning in Windows Admin Center to let you know that a drift in configuration has been detected, allowing you to remediate it.

Storage

Storage in Azure Stack HCI is pretty awesome. The inbox Storage Spaces Direct (S2D) functionality is provided at no additional cost (no per TB or per IO charges), and delivers extremely high performance. We've previously written about performance here and here, where we show that we can breach 1M 4K 100% read IOPS on a 2 node, single socket, half depth cluster with just 8 SAS SSDs per node. Amazing.

It's also true however that there are some scenarios that we can't cover with S2D. For example where you have a small compute and large storage requirement - a hyperconverged infrastructure doesn't allow you to scale compute and storage independently, so to date you've had to buy unnecessary compute to meet the storage needs. Similarly there are scale boundaries within Azure Stack HCI - a 16 node cluster is the maximum size, and while this can fit a big chunk of storage, we've seen instances where it just isn't enough.
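A quick back-of-the-envelope calculation shows why this matters. The Python sketch below uses invented numbers (cores and usable TB per node, and the workload requirements) and ignores resiliency overhead; only the 16 node cluster limit comes from the text above.

```python
import math

MAX_CLUSTER_NODES = 16  # maximum cluster size mentioned above


def hci_nodes_needed(
    cores_needed: int, tb_needed: float, cores_per_node: int, tb_per_node: float
) -> tuple[int, int, int]:
    """Return (nodes_to_buy, nodes_for_compute, nodes_for_storage).

    In a hyperconverged cluster every node brings both compute and storage,
    so the larger of the two requirements dictates the node count.
    """
    for_compute = math.ceil(cores_needed / cores_per_node)
    for_storage = math.ceil(tb_needed / tb_per_node)
    return max(for_compute, for_storage), for_compute, for_storage


if __name__ == "__main__":
    # Hypothetical workload: small compute, large storage.
    total, for_compute, for_storage = hci_nodes_needed(
        cores_needed=64, tb_needed=400, cores_per_node=32, tb_per_node=40
    )
    print(f"compute needs {for_compute} nodes, storage needs {for_storage}")
    print(f"nodes bought only for their disks: {total - for_compute}")
    print(f"fits in one cluster: {total <= MAX_CLUSTER_NODES}")
```

With these example figures, the workload needs 2 nodes' worth of compute but 10 nodes' worth of storage, so 8 nodes are bought for their disks alone. Double the storage requirement and the count reaches 20, which no longer fits in a single 16 node cluster.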

As a core part of the close co-engineering effort referenced in the Microsoft blog I quoted earlier, Dell and Microsoft have worked together to enable the fully supported addition of Dell Software Defined Storage in a disaggregated setup to extend the storage capabilities of Azure Stack HCI. This isn't just for VM workloads: there is a CSI driver validated for CBL-Mariner for AKS workloads as well. This I think shows some of the depth and closeness of collaboration between Dell and Microsoft here, where we together identify boundaries within the platform, and then leverage Dell platform capabilities with Microsoft software to bridge the gap.

I'm super excited to launch this new platform later this year, and if what I've touched on so far sounds good to you, know that we're just getting started, and there is far more awesomeness to come around ecosystem integrations, hardware form factors, and a whole bunch more.

Kenny Lowe