By Dan Sylvester, Manager, Technical Pre-Sales, Western Canada.
A while back I was in San Francisco for VMware Partner Exchange (PEX) and attended a session called “Software Defined Storage,” which was all about VMware’s vSAN product, recently moved from beta to General Availability. vSAN is a new VMware product and should not be confused with VMware’s Virtual Storage Appliance (VSA). While the VSA will continue to be a product for the time being, my understanding is that vSAN will eventually replace it completely. However, we were told that upgrade pricing will be available to move from a VSA to a vSAN.
So, what is a vSAN? In a nutshell, it’s software that combines a bunch of disks in your VMware servers in a highly available manner, so that you don’t need a SAN or NAS to store your VMs. The idea is to distribute your storage workload across the VMware nodes the same way you distribute your CPU/memory workload across the nodes today. VMware invented vSAN to overcome some of the complexity involved with having a dedicated storage appliance, to allow VMware customers to solve some of their storage performance challenges in a cost-effective manner, and to lay the foundation for future VMware feature enhancements that depend on storage.
The reality in the initial release is that VMware recommends vSAN mainly for three workloads:
- Virtual Desktop Infrastructure (VDI)
- Tier 2/3 Virtualized Workloads such as Test/Dev
- Backup and/or as a Disaster Recovery (DR) Target
In order to properly support Backup or DR, vSAN would have to have integration and compatibility with VMware Data Protection, VMware Site Recovery Manager (SRM), and vSphere Replication – which it does. vSAN also works with other VMware features such as vMotion, High Availability (HA), Distributed Resource Scheduler (DRS), and VMware View. However, there are exceptions to what it supports. Integration with, or support for, the following features and products is absent:
- Storage I/O Control (SIOC)
- Storage DRS
- Distributed Power Management
- vCenter Operations Manager (vCOPS)
- vCloud Automation Center (vCAC)
It should be pointed out that the lack of support for SIOC and Storage DRS is due to the fact that those features are not relevant to vSAN.
So, that’s great – now you know what it does and doesn’t do. But how does it actually work?
vSAN exists and is enabled at the VMware cluster level, just like HA, DRS, or any other cluster property. Each node that participates in it must have, at a minimum, one spinning HDD AND one SSD. There is, however, the concept of a compute-only member: a server in the cluster that doesn’t have the minimum of 1 HDD and 1 SSD installed, and therefore contributes compute but no storage. The maximum disk configuration today is 30 HDDs + 5 SSDs (35 disks total) per host, and a maximum of 16 hosts, so a vSAN can be pretty huge.
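To make those membership rules concrete, here is a minimal Python sketch. The limits (one HDD plus one SSD minimum for a storage-contributing host, 30 HDDs + 5 SSDs maximum per host, 16 hosts per cluster) are the figures quoted above, not values read from any VMware API, and the function names are purely illustrative.

```python
def host_role(hdds: int, ssds: int) -> str:
    """Classify a cluster member by its local disks (limits from the article)."""
    if hdds > 30 or ssds > 5:
        raise ValueError("exceeds the per-host maximum of 30 HDDs + 5 SSDs")
    if hdds >= 1 and ssds >= 1:
        return "storage-contributing"  # meets the 1 HDD + 1 SSD minimum
    return "compute-only"              # runs VMs, contributes no local storage

def classify_cluster(hosts):
    """hosts: list of (hdd_count, ssd_count) tuples, one per node."""
    if len(hosts) > 16:
        raise ValueError("a vSAN cluster supports a maximum of 16 hosts today")
    return [host_role(h, s) for h, s in hosts]
```

For example, `classify_cluster([(7, 1), (7, 1), (0, 0)])` flags the third node as compute-only, since it lacks both the HDD and the SSD.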
The next logical question for a storage-savvy person would be “What does it use on the back end? iSCSI? NFS? Fibre Channel? InfiniBand?!” The answer is none of those. vSAN doesn’t use any open-standard file or block protocol; it uses a proprietary protocol over IP, communicating between the nodes via a VMkernel port. That means we require Ethernet and IP connectivity for vSAN to work, preferably 10Gb Ethernet. 1Gb is supported, though it may have problems with maintenance operations such as drive rebuilds. Either way, we definitely want to separate this traffic from our regular production server traffic into a dedicated VLAN. For anyone curious, jumbo frames are supported and offer a nominal performance increase – VMware is recommending jumbo frames on vSAN for greenfield installs.
The next logical question is: how do we combine all of these disks into something usable by VMware? The answer is that all of the disks across all of the servers are combined into a single large pool of storage, accessible from every host in the cluster. This type of topology is called a Redundant Array of Independent Nodes (RAIN). The good thing is that we share the performance of all of the HDDs and SSDs in the cluster in a simple way; the bad thing is that we can only make a SINGLE pool of storage. There is no way for us to physically isolate how many spindles a particular VM has access to. The SSDs are all treated as cache across the nodes, and VMware does provide a cache reservation feature to tune specific workloads – but as with any VMware resource reservation setting, it should be used sparingly and only to resolve specific issues.
Instead of choosing which datastore a VM lives in, we assign a storage policy to the VM. If we change the policy assigned to a VM, all of the attributes assigned to the VM are changed on the back end. For example: if we had a VM in a storage policy called “Silver” where the “number of failures to tolerate” was set to 1, and we decided to move that VM into another policy called “Gold” where the number of failures to tolerate was 2, vSAN would create a third copy of the VM’s data across the cluster. Why three? Because tolerating N failures requires N+1 full copies of the data across the cluster (N+1 redundancy).
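The copy count follows a one-line rule, sketched here in Python (the “Silver” and “Gold” policies are just the hypothetical examples from the text):

```python
def copies_required(failures_to_tolerate: int) -> int:
    """Tolerating N host failures requires N + 1 full copies of the data."""
    return failures_to_tolerate + 1

# Moving a VM from "Silver" (tolerate 1 failure) to "Gold" (tolerate 2)
# raises the copy count from 2 to 3 -- hence the third copy created above.
```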
Obviously this type of redundancy setting can have huge capacity implications across the cluster. For example, if we had 30TB of raw capacity in the storage pool and had to tolerate at least 1 host failure, we really only have 15TB of usable capacity (minus any other vSAN overhead). If we had to tolerate at least 2 host failures, we’d be down to 10TB usable (minus any other vSAN overhead). The maximum today is to tolerate 3 host failures.
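That arithmetic can be written down directly. This sketch reproduces the figures above; it assumes pure mirroring and ignores any other vSAN overhead, as the text does.

```python
def usable_capacity_tb(raw_tb: float, failures_to_tolerate: int) -> float:
    """Raw pool capacity divided by the number of data copies kept.

    Ignores any other vSAN overhead; the current release caps the
    number of host failures to tolerate at 3.
    """
    if not 0 <= failures_to_tolerate <= 3:
        raise ValueError("the maximum today is to tolerate 3 host failures")
    return raw_tb / (failures_to_tolerate + 1)

# With 30 TB raw: tolerate 1 failure -> 15.0 TB, tolerate 2 -> 10.0 TB
```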
That being said, sizing a vSAN solution is still as complicated as sizing any storage solution today. We need to factor in hosts, number of disks, number of VMs, number of failures to tolerate, and other metrics I didn’t even get into in this article, such as the number of vSAN disk groups, number of disks per disk group, number of members per VM, etc. There are a few limitations imposed by vSAN that you should be aware of:
- vSAN was designed to use up to 10% of the host CPU, but could be more in production today as feedback from the field comes in
- Virtual disks are limited to 2TB today
- The recommended SSD:HDD ratio is 10% SSD of the projected used (as opposed to usable) capacity in the pool. This is a change from 10% SSD of the usable capacity in the beta
- 100 VMs / node is a recommended (soft) limit from VMware
- You do not have to add disks symmetrically across all the nodes; you can have different numbers of SSDs/HDDs in each node if you choose to
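As a back-of-envelope starting point, the rules of thumb above can be combined into a first-pass sizing helper. The 10% SSD ratio and the N+1 copy overhead come from this article; the function name, inputs, and everything else here are illustrative assumptions, not a substitute for a proper sizing exercise.

```python
def first_pass_sizing(projected_used_tb: float, failures_to_tolerate: int):
    """Back-of-envelope vSAN sizing from the article's rules of thumb."""
    # N failures to tolerate means N + 1 copies of every byte stored
    raw_hdd_tb = projected_used_tb * (failures_to_tolerate + 1)
    # SSD cache: 10% of projected *used* (not usable) capacity
    ssd_cache_tb = projected_used_tb * 0.10
    return {"raw_hdd_tb": raw_hdd_tb, "ssd_cache_tb": ssd_cache_tb}

# 20 TB of projected used data, tolerating 1 host failure:
# first_pass_sizing(20, 1) -> {'raw_hdd_tb': 40, 'ssd_cache_tb': 2.0}
```

Even this toy version shows why the redundancy setting dominates the bill of materials: each extra failure tolerated adds another full copy of the used capacity in raw HDD.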
I would encourage anyone looking into this technology to reach out to me or any of the Scalar vSAN experts to help properly size a solution for their environment.