Thursday, September 10, 2015

Cloud Foundry Setup on Nutanix

After 3 months at Nutanix, I’ve already seen customers realizing the value of consolidating their hardware stack. They want to focus on their platform of choice and spend less time chasing the exponential problem of aligning the perfect hardware and software matrices. Now, what that platform of choice, or Platform-as-a-Service, looks like can vary widely.

Consistent with other technical doctrines, there is still a lot of separation in how customers regard and evaluate what actually constitutes a PaaS. I wholly agree with the view that customers fall into two categories, i.e. the “Structured and Unstructured PaaS” dichotomy I first saw published by Brian Gracely.

Choosing either type, a structured or turnkey PaaS versus a build-and-customize PaaS, indicates a desire to spend more time on development than on ops. I spoke about operationalizing containers both at the Hadoop Summit in San Jose this summer, with Mesos and Myriad:
https://www.youtube.com/watch?v=FAxmal6ozLY
and at VMworld last month in session CNA4725, where I covered Mesos with Marathon (and Docker) as another potential platform. Replays should be available from vmworld.com, but will require a login. In future articles I will walk through deploying Mesos, Kubernetes, and other potential developer platforms on a Nutanix cluster.

The quickest way to get started deploying a PaaS in your Nutanix environment is to download and set up Pivotal Cloud Foundry, which I will walk through below. PCF is arguably the best example of a turnkey PaaS today, as it comes with the Ops Manager tool for minimal, straightforward deployment and configuration of the Pivotal Elastic Runtime (the primary PaaS environment), as well as supplemental services for SQL and NoSQL, all available from https://pivotal.io/platform.

Just like the storage and management layers are made ubiquitous and IO-accelerated across the cluster by Nutanix for simplicity and scalability, the communication, scheduling, load-balancing, and logging of app services are handled by the PaaS management layer.

For the quickest out-of-the-box experience today, setup of Pivotal Cloud Foundry is really easy:
·      Make sure your Nutanix cluster is imaged with vSphere 5.5 or 6.0.
·      Upload the vCenter Server Appliance (directions for 5.5 and 6.0) to one of the nodes and initialize it, or if you already have vCenter up and running, you can go straight to the next step.
·      Download the Pivotal Ops Manager ova and Elastic Runtime from http://network.pivotal.io. You may also download additional service components for later like Datastax Cassandra or MySQL. (Pivotal account required, but does not require purchase to evaluate.)

·      Upload the Pivotal Ops Manager ova to vCenter and give it a name, a cluster to be deployed on, and network address settings.
·      Log into the Ops Manager IP in a web browser and give the admin user a name and password.
·      Run the Ops Manager configuration. It will ask for vSphere admin credentials, the datacenter name, and the cluster name. You’ll also need a VM network port group name and a range of IP addresses to include or exclude for the individual VMs. More detailed requirements here: http://docs.pivotal.io/pivotalcf/customizing/requirements.html

·      You will need at least one wildcard DNS domain (two recommended) to assign to the environment, one for apps and one for system, so that these resolve to the HAproxy IP address(es). The method will depend on your DNS server of choice, but essentially any *.apps.yourdomain.com or *.system.yourdomain.com subdomain should resolve to the load balancer of choice (HAproxy by default), where it can then be routed internally by Cloud Foundry; see the sketch after these bullets. If the records are not created before you configure the Elastic Runtime piece, you will get an error and the installation will likely fail around the smoke tests run for validation.
·      Upload and configure the Pivotal Elastic Runtime. At a minimum, the Cloud Controller and Security line items will need additional configuration. You may configure the HAproxy or custom load-balancer piece for your environment if you prefer Nginx or something else.
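As a minimal sketch of the DNS piece, assuming BIND-style records and a hypothetical HAproxy address of 10.4.16.10 (substitute your own domain and load-balancer IP):

# Wildcard records in your zone:
#   *.apps.yourdomain.com.    IN A 10.4.16.10
#   *.system.yourdomain.com.  IN A 10.4.16.10
# Verify resolution from any workstation before installing the Elastic Runtime:
dig +short anything.apps.yourdomain.com
dig +short uaa.system.yourdomain.com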

·      After the installation and validation are complete, you should have everything you need to start playing around with Pivotal Cloud Foundry on your Nutanix cluster. You may also upload additional services for your apps, like Cassandra or MySQL:


·      To log in interactively, you can copy the admin credentials from within the Ops Manager UI: click on the Elastic Runtime component, open the Credentials tab, then scroll to the UAA heading and the Admin row for its current password.


·      From a command prompt, you can use the cf login command and push your first app; a minimal sketch follows below. A helpful blog on using these commands is here:
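As a rough sketch (the API endpoint, password, and app name below are placeholders for your own; --skip-ssl-validation is only appropriate for a lab with self-signed certificates):

cf login -a https://api.system.yourdomain.com -u admin -p <uaa-admin-password> --skip-ssl-validation
cf push my-first-app -p ./my-first-app    # point -p at your app code or build artifact
cf apps                                   # confirm the app is running and note its route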

From there, in Cloud Foundry, you can create more orgs and spaces, set quotas (a quick sketch below), and focus on deploying apps that scale on your Nutanix infrastructure. Another interesting project to play with in your deployment is Chaos Lemur, the Cloud Foundry version of Netflix's Chaos Monkey, which simulates targeted failures to determine the resiliency and availability of the platform in your environment.
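Carving up the new environment looks something like this (the org, space, and quota names below are made up):

cf create-quota dev-quota -m 20G -r 50 -s 20    # memory, route, and service-instance limits
cf create-org engineering
cf set-quota engineering dev-quota
cf create-space sandbox -o engineering
cf target -o engineering -s sandbox             # subsequent cf push commands land here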

In the next part of this series, I will be working on how to deploy Cloud Foundry on the Nutanix Acropolis environment.


Tuesday, September 8, 2015

An Introduction to Next-Gen Apps on Nutanix


I spent the first decade of my career doing managed and professional IT services around SAN and NAS for EMC, and I remember rigorously checking the EMC compatibility matrix to ensure an environment was ready to go before it was even built in the datacenter. But, did that actually guarantee no issues?

Of course not. There were still plenty of support calls filed, from lack of consistency in the environment, to firmware issues, to independent hardware failures that still incurred faults in other parts of the solution. Part of a project sign-off involved getting a HEAT report, a scripted check against the EMC support matrix, that didn’t show any mismatches or configuration issues. Then came E-Lab Advisor and many other iterations trying to solve the interoperability problem, but they were fundamentally unable to outpace the exponential growth of an HCL for a best-of-breed approach. Opposite this perspective, you have the undeniable acceleration of public cloud providers, where you only pay for a virtual form factor. The underlying hardware is (and should be) irrelevant to what you, the customer, concentrate on: the software you want to build.

Customers have an abundance of software stacks to deliver, from traditional web/app/database platforms to more loosely coupled platform components designed for rapid iteration. The expectation of quick and constant evolution in any given constituent component at any given time is, in my opinion, the defining characteristic of the next generation of app environments, or “cloud-native apps”. For a far more rigorous rubric and definition, see http://12factor.net/. I’ve seen this firsthand with Hadoop and HPC environments as customers evaluate virtualization and try to decide whether to go with a siloed bare-metal approach, internal virtualization, or a service provider.

Take the evolution of Hadoop and Big Data as an example: traditionally, product management, marketing, or R&D business units would provide input for a data warehouse with arbitrary expectations set a year or two in the future, and the DBAs would design for that without the step-wise insight that you only get with experience. Compare that to HPC programmers, who may be building and tuning code for hardware that hasn’t even hit a datacenter floor yet, trying to optimize compilers for potentially theoretical working sets and hardware-accelerated solutions. In both HPC and Hadoop, it has been very exciting to witness a shift in perspective: customers are able to learn and scale their approach constantly. This gives them more options to experiment and grow along the way, because their business goals and technical roadblocks are always evolving as well.

Nutanix aims to be more than Yet-Another-Hyperconverged-Vendor by giving these environment owners more time to focus on their specialty and less on infrastructure, through:
·      A distributed management layer across the cluster for resiliency and durability of meta-data. This also becomes the distributed endpoint for API calls and stacking of higher-level services. A quickly changing environment means a lot of API interaction, so this by necessity is fault-tolerant and without bottlenecks.
·      A distributed logical storage space for performance, availability, and durability. At the same time the storage pool is a singular abstraction for transient and persistent data across any VMs, containers, or applications (or app-building platform).


While simplifying the management and storage layers, Nutanix still lets customers choose:
·      Their virtualization hypervisor and the tooling available for it.
·      Their hardware form factors, from Nutanix, Supermicro, and Dell.

Holistically, the Nutanix platform is designed to support all of these ideals, to minimize bespoke architectural designs, and to provide straightforward manageability and scalability. In the next post in this series, I review deploying Pivotal Cloud Foundry on Nutanix, here:
http://virtual-hiking.blogspot.com/2015/09/cloud-foundry-setup-on-nutanix.html

Saturday, August 29, 2015

My VMworld 2015 Schedule

I am heading to San Francisco in a few hours for VMworld 2015. I will only be presenting at a single session this year:

  • CNA4725 - Scalable Cloud Native Apps with Docker and Mesos. Weds 8:30am-9:30am
After initially thinking I had been rejected for all of my sessions, I consider myself fortunate to have gotten any speaking slot at all, and I'm happy that I get to speak about something I've been spending a lot of time on recently. Two other sessions from Nutanix employees:

  • SDDC6827 - Nutanix Industry Panel including Hallmark Business Connections
  • STP6311 - Datacenter Battles: Hyperconvergence vs 3-Tier Infrastructure
I would also highly recommend some High Performance Computing sessions. The first is run by my former vHPC partner in VMware's Office of the CTO, Josh Simons, and the second by Mark Achtemichuk, the performance guru for VMware's Tech Marketing group:
  • CTO6454 - Delivering Maximum Performance for Scale-out Applications With ESX 6
  • VAPP5724 - Extreme Performance Series: BCA High Performance Panel
If anyone wants to talk Hadoop, HPC, or cloud-native platforms, please come find me at the Nutanix booth, #1729.

Thursday, August 6, 2015

Platforms for CI/CD: Cloud Foundry, Mesos, and Kubernetes

Working with customers on next-gen platforms, and watching the container ecosystem evolve, I have been able to see what gets attention and what is ubiquitous across the platforms. My talk at Hadoop Summit, where I advocated Mesos as a platform for building platforms, was recently published:
https://www.youtube.com/watch?v=FAxmal6ozLY

Spending more time with customers since then, I have seen the arguments evolve around Mesos, or Kubernetes, or Kubernetes on Mesos, or something else, driven by proponents of each side. This kind of debate is important, as everyone shares the same goal of encapsulating and scheduling the next generation of decoupled yet collaborative app architecture (or, to use the overloaded term, microservices). I have had to get into more nuanced conversations about where each platform differentiates itself and what parts of each are most conducive to a modern data pipeline. Extra complexity and layers only make a complicated system more unreliable.

A modern data pipeline is, in my mind, a very complex system in itself, and just like the data flowing through it, it must constantly adapt and evolve to drive useful results. Shown below is a slide from Chris Mutchler's and my presentation at VMworld 2014 that gives an (albeit very busy) illustration of the different components that could comprise a modern data pipeline.

So besides providing the largest, most flexible resource pool, which of these platforms supports the most straightforward method of change and injection of new updates to a running service? Specifically, how does each platform choose to endorse continuous integration and/or continuous delivery of new updates?

A model for this story, as for most cloud-native developments, begins with Netflix:
http://highscalability.com/blog/2011/12/12/netflix-developing-deploying-and-supporting-software-accordi.html

Some key principles from the Netflix article include:

  • Launch new "canary" instances and evaluate health
  • Every component is behind a load-balancer
  • Facilitate rolling upgrades and tear-down of old running components
Fast-forward a few years with Docker and containers as the new "unit of work". Developers ship self-contained code, effectively compiled and built together with its environment, as a container. This by itself is more reliable and scalable since the local app dependencies are baked in rather than assumed to exist in the broader operating environment. A minimal example follows below.
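For instance, a minimal (hypothetical) Dockerfile bakes the app and its dependencies into one shippable image, which is then built and run the same way everywhere:

# Contents of a minimal Dockerfile (app path, base image, and entry point are made up):
#   FROM ubuntu:14.04
#   RUN apt-get update && apt-get install -y python
#   COPY ./app /opt/app
#   CMD ["python", "/opt/app/server.py"]
docker build -t myorg/my-service:1.0 .           # the image now carries the app plus its dependencies
docker run -d -p 8080:8080 myorg/my-service:1.0  # runs identically on a laptop or in the cluster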

Cloud Foundry facilitates blue-green deployment, reminiscent of Netflix's approach with load balancing and canary instances:
https://docs.cloudfoundry.org/devguide/deploy-apps/blue-green.html
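The gist, per the doc above (the app names, domain, and hostname here are placeholders), is to run both versions side by side and then shift the route:

cf push Green -n my-app-temp                   # deploy the new version on a temporary route
cf map-route Green yourdomain.com -n my-app    # add the new version to the live route
cf unmap-route Blue yourdomain.com -n my-app   # remove the old version from the live route
cf delete Blue                                 # retire the old version once satisfied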

Mesos by itself isn't actually doing this, since it leaves rolling updates to its constituent frameworks. A good demonstration, in my opinion, of the granularity and abstraction of job updates is the Aurora scheduler:
https://github.com/apache/aurora/blob/master/docs/client-commands.md#updating-a-job
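As a very rough sketch with the Aurora client (the cluster, role, and job names are placeholders, and the exact command syntax varies between client versions, so treat the linked doc as authoritative):

aurora update start devcluster/www-data/prod/hello hello_world.aurora   # roll the running job to the new config in batches
# the in-flight update can then be paused, resumed, or aborted from the same client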

Last but not least, Kubernetes uses the replication controller to abstract updates to a set of pods when a new Docker image is incorporated. This innate replication service handles the ongoing orchestration of pushing out new pod templates and cleaning up the old instances:
https://github.com/GoogleCloudPlatform/kubernetes/tree/master/docs/user-guide/update-demo
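At the time of writing this is exposed through the kubectl rolling-update command (the controller and image names below are made up):

kubectl rolling-update my-service-v1 my-service-v2 --image=myorg/my-service:2.0   # replace pods one at a time with the new image
kubectl rolling-update my-service-v1 -f my-service-v2-rc.yaml                     # or roll out an entirely new replication controller spec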

I'm sure there are more examples out there, and the number of container platforms will probably continue to grow, but the flexibility and granularity of updates will be key differentiators in my opinion. Because scaling out and perpetual updates appear to be a given for these new cloud-native apps, the scheduling patterns that system builders find most adaptive and reliable should determine adoption.

Monday, March 9, 2015

How Can Virtual Performance Beat Native Performance?


Since the inception of virtualization, it has been accepted that some amount of overhead gets added to any workload running in a virtual machine. Over time, VMware's focus has expanded from consolidating workloads to handling mission-critical or tier-1 apps, and then on to high-performance apps. When developing high-performance apps in any context, it is key for the workload to leverage native hardware acceleration wherever possible. Scroll down for my "Virtualizing HPC and Latency-Sensitive Apps on vSphere Primer".

It is important to size the workload appropriately for the physical platform: matching physical to virtual NUMA node awareness and alignment for processes and memory, matching threads to cores, and leveraging offloads of processing in the IO pipeline wherever possible, and that's just to start. Beyond that, for the performance of a virtual machine to approach or beat a workload running on bare metal, it is key for the virtualization platform to expose as much of those feature sets as it can. Perhaps most importantly, the hypervisor should get out of the way whenever possible to let the VM's processes run without interruption.

However, how much you want the hypervisor to get out of the way depends on whether you are optimizing for a latency-sensitive workload or a high-throughput workload. For high throughput, you may want to let the VMs run as long as possible without interrupts, but this can add latency to the IO path. For a latency-sensitive workload, you may want to disable interrupt coalescing, but then you are deliberately servicing IO instead of focusing on compute. Remember that since you are trading off throughput and parallelization for latency, the settings and recommendations below should be evaluated and tested thoroughly to confirm they fit the workload. If you have a workload that demands both high throughput and low latency, you may have no choice but to adopt VMDirectPath or SR-IOV, which have their own set of tradeoffs listed in the docs here: http://pubs.vmware.com/vsphere-55/index.jsp?topic=%2Fcom.vmware.vsphere.networking.doc%2FGUID-BF2770C3-39ED-4BC5-A8EF-77D55EFE924C.html
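As an illustrative sketch only (these are per-VM .vmx options described in VMware's latency-tuning papers; validate them against your ESXi version and test before relying on them):

sched.cpu.latencySensitivity = "high"      # equivalent of the vSphere 5.5 latency-sensitivity checkbox; pair with full CPU and memory reservations
ethernet0.coalescingScheme = "disabled"    # disable virtual interrupt coalescing on this VMXNET3 vNIC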

Over the course of VMware's hypervisor development, there have been plenty of milestones that contributed to its performance-honed characteristics and features. A good, though not definitive, list:
·      VMware's first product was Workstation, but ESX 1.0 was its first Type 1 hypervisor
·      ESX 3 introduced a service console VM, where previously ESX had to statically assign IO devices
·      ESX 4.0 introduced VMDirectPath
·      ESXi 4.1 eliminated the service console VM
·      ESXi 5 saw the hypervisor rewritten to become the best platform to run Cloud Foundry, an entirely new set of requirements around very fast provisioning and power-on of large numbers of virtual machines. Arguably this is where ESXi really learned how to get out of the way of a workload, delivering near-native performance in most cases.
·      ESXi 5.1 introduced some of the latency-sensitivity tuning primitives, but required advanced vmkernel options to set them
·      ESXi 5.5 built on that latency-sensitivity tuning and hypervisor granularity to include a simple checkbox marking a VM as latency-sensitive

In addition, each major and minor version of ESX(i) has added support for the latest and greatest chipsets from Intel and AMD, as well as NICs and storage adapters. These advancements were accompanied by updates to the virtual hardware of a VM and to VMware Tools, the in-guest set of drivers recommended for best performance and manageability.

By providing the best translation of the native functionality and offloads of the underlying hardware, ESXi gets VMs to near-native performance for most throughput-driven workloads. However, there are cases where benchmarks show that virtualized workloads can exceed the performance of their native equivalents. So how is this possible? To put it another way, can the hypervisor be a better translation, management, and scheduling engine for the hardware than an OS kernel itself? Why not keep the workload physical?

A workload running on bare metal will of course have direct access to all the hardware on that server; however, the sizing you have to accept at that point is the total CPU and memory of that server, and that sizing is static. With distributed systems, cloud-native apps, or platform 3 apps, it is rarely about a single server. It is about the aggregate performance across tens, hundreds, or for some even thousands of servers, instead of the one. In a discrete and multi-tenant (or "microservices", if you want) architecture, the requirement for dynamic and flexible sizing in aggregate is a natural fit for virtualization.

Virtualizing HPC and Latency-Sensitive Apps on vSphere Primer

Understanding the associated workload is critical, of course, in order to size the VM optimally. For traditional IT workloads, the more likely problem was oversized VMs. For high-performance apps, however, esxtop can help determine whether the VM is constrained by CPU, memory, storage, or network; a quick batch-mode example follows below. My esxtop checklist is this kb article from VMware: http://kb.vmware.com/kb/2001003.
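For example, capturing esxtop data for later analysis is easiest in batch mode (the interval, sample count, and file name here are arbitrary):

esxtop -b -d 5 -n 120 > vm-perf.csv    # 5-second samples for 10 minutes; import into perfmon or esxplot
# Interactively, watch %RDY and %CSTP on the CPU screen for signs of scheduling contention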

For a platform configuration checklist for high-performance workloads, see http://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf:
·      Make sure the BIOS is updated and set for maximum performance. Your mileage may always vary due to the BIOS and firmware configuration of different components in the hardware. Even virtualized, these issues can still cause performance to lag.
·      C-states support should be disabled.
·      Power management in the BIOS should be disabled.
·      Use the latest stable ESXi version that you can; see the ESXi generation improvements above. The caveat is that drivers for the hardware may differ between ESXi versions, which can cause poor or inconsistent performance results. Throughput testing when adding new drivers is definitely recommended.
·      Size VMs to fit within a NUMA node of the chipset. This will depend on the processor generation; for example, see Dell's recommendations for Haswell (http://en.community.dell.com/techcenter/high-performance-computing/b/general_hpc/archive/2014/09/23/bios-tuning-for-hpc-on-13th-generation-haswell-server) and an older but still relevant article describing the NUMA affinity and migration techniques of the vmkernel (http://blogs.vmware.com/vsphere/2012/02/vspherenuma-loadbalancing.html).
·      When sizing VMs, do not plan on overprovisioning any hardware component as any bottleneck will typically determine the overall performance and the aggregate performance will be less than optimal.
·      Choose the latest stable OS generation that you can. Later OS versions typically have more optimized hardware interrupt handling mechanisms. For example, see the performance tuning recommendations for RHEL 7 (support subscription required) or RHEL 6.
·      Use the latest version of VMware Tools which includes the latest paravirtualized drivers such as PVSCSI and VMXNET3.
·      Use PVSCSI when you can, but be careful of high-throughput issues with the default queue settings; see the sketch after this checklist. For example, this kb article describes where the default queue depth for PVSCSI may be insufficient: http://kb.vmware.com/kb/2053145
·      Use VMXNET3 when you can, and pay attention to how much you can offload to the hardware with regards to LRO, RSS, multiqueue, and other NIC-specific optimizations. Some relevant VMware resources: http://kb.vmware.com/kb/2020567, http://kb.vmware.com/kb/1027511, and http://www.vmware.com/files/pdf/VMware-vSphere-PNICs-perf.pdf
·      Low throughput on UDP in a Windows VM is another case to consider if your application IO depends on it. You may need to modify the vNIC settings: http://kb.vmware.com/kb/2040065
·      Sizing network capacity without overprovisioning can be significantly trickier than sizing for CPU and memory, especially if using NFS for storage network traffic. It's key to understand whether storage will be accessed by the vmkernel from SAN or NAS sources, or by the VM itself. Use Network IO Control to enable fairer sharing of network bandwidth. However, understand that this may cause more interrupts, resulting in more overhead switching between VMs, so if these are high-throughput (compute and memory) VMs, consider placing them on separate hosts: http://www.vmware.com/files/pdf/techpaper/Network-IOC-vSphere6-Performance-Evaluation.pdf
·      If you have a particularly latency-sensitive workload, consider using SR-IOV or VMDirectPath. See the latest benchmarks for Infiniband and RDMA here: http://www.vmware.com/files/pdf/techpaper/latency-sensitive-perf-vsphere55.pdf
·      For Infiniband workloads, plan on using SR-IOV or VMDirectPath. The latest benchmarks are here: http://blogs.vmware.com/cto/running-hpc-applications-vsphere-using-infiniband/ and http://blogs.vmware.com/cto/hpc-update/
·      For Nvidia GPGPU (general purpose, or non-VDI) workloads, plan on using VMDirectPath. With vSphere 6, Nvidia vGRID (think of SR-IOV for GPUs instead of NICs) will be supported by VMware Horizon View, so hopefully vGRID support for GPGPU workloads will be available soon. More details from Nvidia here: http://blogs.nvidia.com/blog/2015/02/03/vmware-nvidia-gpu-sharing/
·      For Intel Xeon Phi, there is no support today on vSphere 5.5; the MIC ("Many Integrated Core") coprocessor is simply ignored by the hypervisor.
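To make a couple of the items above concrete, here is a sketch of the Linux-guest PVSCSI queue-depth change described in kb 2053145, plus a quick in-guest check of VMXNET3 offloads (module parameters and interface names are examples; confirm against the kb articles for your OS version):

# Append to the kernel boot parameters (per kb 2053145), then reboot:
#   vmw_pvscsi.cmd_per_lun=254 vmw_pvscsi.ring_pages=32
cat /proc/cmdline                      # confirm the parameters took effect after reboot
ethtool -k eth0 | grep -i offload      # check which offloads (LRO, GRO, checksum) are active on the VMXNET3 vNIC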

You'll also need to review in-guest OS settings and performance tuning variables which I won't detail in this post. And finally, you'll need to consider the application-specific tuning and optimizations. Given all of that, it should be possible to achieve better than native performance for certain high-performance workloads. For specific examples, see the excellent write-ups by VMware's performance team.

Hadoop on vSphere 6.0:
Redis on vSphere 6.0:

Links:
http://kb.vmware.com/kb/2020567 RSS MultiQueue Linux