
Windows Server 2012 File Server Tip: Run the File Services Best Practices Analyzer (BPA)


Windows Server 2012 includes a built-in mechanism called the Best Practices Analyzer (BPA) to check your configuration and make sure everything is set to the proper values. These rules, which come in a specific set for each role you install, can be run through Server Manager or via PowerShell.

For the Windows Server 2012 File Services role, the BPA includes a large number of rules, 99 of them for SMB. Here are some of the rules included in the SMB portion of the File Services BPA:

  • Scoped shares should be continuously available
  • Scaleout shares shouldn't have FolderEnumerationMode = AccessBased
  • Scaleout shares should have CachingMode = None
  • Shares should have CATimeout value >= 25
  • SMB server configuration should be consistent across cluster nodes

You should definitely run the BPA when you have a chance. Here is a series of PowerShell examples showing how to do it:

1) Find which BPA Models are available:

PS C:\> Get-BpaModel | Select Id

Id
--
Microsoft/Windows/ADRMS
Microsoft/Windows/CertificateServices
Microsoft/Windows/ClusterAwareUpdating
Microsoft/Windows/DHCPServer
Microsoft/Windows/DirectoryServices
Microsoft/Windows/DNSServer
Microsoft/Windows/FileServices
Microsoft/Windows/Hyper-V
Microsoft/Windows/LightweightDirectoryServices
Microsoft/Windows/NPAS
Microsoft/Windows/RemoteAccessServer
Microsoft/Windows/TerminalServices
Microsoft/Windows/UpdateServices
Microsoft/Windows/VolumeActivation
Microsoft/Windows/WebServer

2) Run the File Services BPA:

PS C:\> Invoke-BpaModel Microsoft/Windows/FileServices

ModelId           : Microsoft/Windows/FileServices
SubModelId        :
Success           : True
ScanTime          : 11/15/2012 10:48:02 PM
ScanTimeUtcOffset : -08:00:00
Detail            : {FST2-FS1, FST2-FS1}

3) View a summary of the BPA results by Severity:

PS C:\> Get-BpaResult Microsoft/Windows/FileServices | Group Severity

Count Name                      Group
----- ----                      ----- 
   96 Information               {Microsoft.BestPractices.CoreInterface.Result, Microsoft.BestPractices.CoreInterface... 
    3 Warning                   {Microsoft.BestPractices.CoreInterface.Result, Microsoft.BestPractices.CoreInterface...

4) View the details for all results with “Warning” severity level:

PS C:\> Get-BpaResult Microsoft/Windows/FileServices | ? Severity -eq "Warning"

ResultNumber : 3
ResultId     : 1041159855
ModelId      : Microsoft/Windows/FileServices
SubModelId   : SMB
RuleId       : 3
ComputerName : fst2-fs1
Context      : FileServices
Source       : fst2-fs1
Severity     : Warning
Category     : Configuration
Title        : Short file name creation should be disabled
Problem      : In addition to the normal file names, the server is creating short, eight-character file names with a
               three-character file extension (8.3 file names) for all files.
Impact       : Creating short file names in addition to the normal, long file names can significantly decrease file
               server performance.
Resolution   : Disable short file name creation unless short file names are required by legacy applications.
Compliance   :
Help         :
http://go.microsoft.com/fwlink/?LinkId=165013
Excluded     : False

ResultNumber : 86
ResultId     : 2816034575
ModelId      : Microsoft/Windows/FileServices
SubModelId   : SMB
RuleId       : 86
ComputerName : fst2-fs1
Context      : FileServices
Source       : fst2-fs1
Severity     : Warning
Category     : Configuration
Title        : Scaleout shares should have CachingMode = None
Problem      : At least one scale out share doesn't have CachingMode = None.
Impact       : Scale out shares having a CachingMode value other than 'None' isn't supported.
Resolution   : Set scale out shares' CachingMode to None.
Compliance   :
Help         :
http://go.microsoft.com/fwlink/?LinkId=248013
Excluded     : False

ResultNumber : 92
ResultId     : 4167795643
ModelId      : Microsoft/Windows/FileServices
SubModelId   : SMB
RuleId       : 92
ComputerName : fst2-fs1
Context      : FileServices
Source       : fst2-fs1
Severity     : Warning
Category     : Configuration
Title        : Enable Checksum Offload on a network adapter
Problem      : Some network adapters are capable of Checksum Offload, but the capability is disabled.
Impact       : Windows system performance may be degraded since TCP/IP checksum calculations are not being offloaded
               from the CPU to the network adapter.
Resolution   : Enable Checksum Offload with PowerShell cmdlet: Enable-NetAdapterChecksumOffload, or in the network
               adapter Advanced Properties.
Compliance   :
Help         :
http://go.microsoft.com/fwlink/p/?LinkId=243160
Excluded     : False

Note: Some of the extended help on the web for the new Windows Server 2012 BPA rules is still being published. For some rules, you might currently be redirected to a page saying “Windows Server Future Resources”.
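
One more thing: if you want to narrow the output further, or hide a result you have already reviewed and accepted (note the Excluded field in the output above), something along these lines should work. This is just a sketch; the rule title used in the filter is only an example.

# Show only SMB results that need attention
Get-BpaResult Microsoft/Windows/FileServices |
    Where-Object { $_.SubModelId -eq "SMB" -and $_.Severity -ne "Information" } |
    Format-List Title, Severity, Resolution

# Mark a result as excluded so it no longer shows up in the results
Get-BpaResult Microsoft/Windows/FileServices |
    Where-Object { $_.Title -like "*Short file name*" } |
    Set-BpaResult -Exclude $true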


Windows Server 2012 File Server Tip: Use PowerShell to find the free space on the volume behind an SMB file share


A while back, I showed how to use PowerShell v2 and the old SMB WMIv1 objects to find the free space behind a file share (essentially the free space of the volume that contains the file share). That post is available at http://blogs.technet.com/b/josebda/archive/2010/04/08/using-powershell-v2-to-gather-info-on-free-space-on-the-volumes-of-your-remote-file-server.aspx. While it was a good example of how to build a more elaborate solution using PowerShell, it was a little complicated :-).

Now, with Windows Server 2012, PowerShell v3 and SMB PowerShell, things get much simpler. I can do essentially the same thing with a simple one-liner.

For instance, to see the free space on the volume behind a specific share named TEST, you can use:

Get-Volume -Id (Get-SmbShare TEST).Volume

To list the volumes for all shares on a specific server, you can use:

Get-SmbShare | ? Volume -ne $null | % { $_ | FL ; Get-Volume -Id $_.Volume | FL }

Note that you can also execute those remotely, pointing to the server where the shares are located:

Get-Volume  -CimSession Server1 -Id (Get-SmbShare TEST -CimSession Server1).Volume

Get-SmbShare -CimSession Server1 | ? Volume  -ne $null | % { $_ | FL ; Get-Volume -CimSession Server1 -Id $_.Volume | FL  }

Here is a complete example, with output. First, a simple query to find the information for the volume behind a share:

PS C:\> Get-Volume -Id (Get-SmbShare VMS1).Volume

DriveLetter       FileSystemLabel  FileSystem       DriveType        HealthStatus        SizeRemaining             Size
-----------       ---------------  ----------       ---------        ------------        -------------             ----
I                                  NTFS             Fixed            Healthy                  78.85 GB           100 GB

PS C:\> Get-Volume -Id (Get-SmbShare Projects).Volume | Select *

HealthStatus          : Healthy
DriveType             : Fixed
DriveLetter           :
FileSystem            : CSVFS
FileSystemLabel       :
ObjectId              : \\?\Volume{20795fea-b7da-43dd-81d7-4d346c337a73}\
Path                  : \\?\Volume{20795fea-b7da-43dd-81d7-4d346c337a73}\
Size                  : 107372081152
SizeRemaining         : 85759995904
PSComputerName        :
CimClass              : ROOT/Microsoft/Windows/Storage:MSFT_Volume
CimInstanceProperties : {DriveLetter, DriveType, FileSystem, FileSystemLabel...}
CimSystemProperties   : Microsoft.Management.Infrastructure.CimSystemProperties

Now a more complete query, showing all shares starting with VMS and information on the volume behind them:

PS C:\> Get-SmbShare VMS* | ? Volume -ne $null | % { $_ | FL ; Get-Volume -Id $_.Volume | FL }

Name        : VMS1
ScopeName   : FST2-FS
Path        : I:\VMS
Description :

DriveLetter     : I
DriveType       : Fixed
FileSystem      : NTFS
FileSystemLabel :
HealthStatus    : Healthy
ObjectId        : \\?\Volume{b02c4ba7-e6f1-11e1-93eb-0008a1c0ef0d}\
Path            : \\?\Volume{b02c4ba7-e6f1-11e1-93eb-0008a1c0ef0d}\
Size            : 107372081152
SizeRemaining   : 84665225216
PSComputerName  :

Name        : VMS2
ScopeName   : FST2-FS
Path        : J:\VMS
Description :

DriveLetter     : J
DriveType       : Fixed
FileSystem      : NTFS
FileSystemLabel :
HealthStatus    : Healthy
ObjectId        : \\?\Volume{b02c4bb1-e6f1-11e1-93eb-0008a1c0ef0d}\
Path            : \\?\Volume{b02c4bb1-e6f1-11e1-93eb-0008a1c0ef0d}\
Size            : 107372081152
SizeRemaining   : 84665225216
PSComputerName  :

Name        : VMS3
ScopeName   : FST2-SO
Path        : C:\ClusterStorage\Volume1\VMS
Description :

DriveLetter     :
DriveType       : Fixed
FileSystem      : CSVFS
FileSystemLabel :
HealthStatus    : Healthy
ObjectId        : \\?\Volume{20795fea-b7da-43dd-81d7-4d346c337a73}\
Path            : \\?\Volume{20795fea-b7da-43dd-81d7-4d346c337a73}\
Size            : 107372081152
SizeRemaining   : 85759995904
PSComputerName  :

Name        : VMS4
ScopeName   : FST2-SO
Path        : C:\ClusterStorage\Volume2\VMS
Description :

DriveLetter     :
DriveType       : Fixed
FileSystem      : CSVFS
FileSystemLabel :
HealthStatus    : Healthy
ObjectId        : \\?\Volume{fb69e20a-5d6a-4dc6-a0e9-750291644165}\
Path            : \\?\Volume{fb69e20a-5d6a-4dc6-a0e9-750291644165}\
Size            : 107372081152
SizeRemaining   : 84665225216
PSComputerName  :

Name        : VMS5
ScopeName   : *
Path        : D:\VMS
Description :

DriveLetter     : D
DriveType       : Fixed
FileSystem      : NTFS
FileSystemLabel : LocalFS1
HealthStatus    : Healthy
ObjectId        : \\?\Volume{58a38e4e-e2fd-11e1-93e8-806e6f6e6963}\
Path            : \\?\Volume{58a38e4e-e2fd-11e1-93e8-806e6f6e6963}\
Size            : 181336535040
SizeRemaining   : 136311140352
PSComputerName  :

Finally, not for the faint of heart, here is a more complex query against a remote file server that creates a custom result combining share information and volume information:

PS C:\> Get-SmbShare -CimSession FST2-FS2 | ? Volume  -ne $null | % { $R = "" | Select Share, Path, Size, Free; $R.Share=$_.Name; $R.Path=$_.Path; Get-Volume -CimSession FST2-FS2 -Id $_.Volume | % { $R.Size=$_.Size; $R.Free=$_.SizeRemaining; $R | FL }}


Share : ADMIN$
Path  : C:\Windows
Size  : 68352471040
Free  : 44692242432

Share : C$
Path  : C:\
Size  : 68352471040
Free  : 44692242432

Share : ClusterStorage$
Path  : C:\ClusterStorage
Size  : 68352471040
Free  : 44692242432

Share : D$
Path  : D:\
Size  : 181336535040
Free  : 177907777536

Share : Projects
Path  : C:\ClusterStorage\Volume1\SHARES\PROJECTS
Size  : 107372081152
Free  : 85759995904

Share : VMFiles
Path  : C:\ClusterStorage\Volume1\VMFiles
Size  : 107372081152
Free  : 85759995904

Share : VMS3
Path  : C:\ClusterStorage\Volume1\VMS
Size  : 107372081152
Free  : 85759995904

Share : VMS4
Path  : C:\ClusterStorage\Volume2\VMS
Size  : 107372081152
Free  : 84665225216

Share : Witness
Path  : C:\ClusterStorage\Volume1\Witness
Size  : 107372081152
Free  : 85759995904
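
If you prefer that output in gigabytes with a percent-free column, a small variation on the same idea works (a sketch; adjust the server name for your environment):

Get-SmbShare -CimSession FST2-FS2 | ? Volume -ne $null | % {
    $v = Get-Volume -CimSession FST2-FS2 -Id $_.Volume
    [PSCustomObject]@{
        Share      = $_.Name
        Path       = $_.Path
        "Size(GB)" = [math]::Round($v.Size / 1GB, 1)
        "Free(GB)" = [math]::Round($v.SizeRemaining / 1GB, 1)
        "Free(%)"  = [math]::Round(100 * $v.SizeRemaining / $v.Size, 1)
    }
} | Format-Table -AutoSize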

Windows Server 2012 File Server Tip: New per-share SMB client performance counters provide great insight


Windows Server 2012 and Windows 8 include a new set of performance counters that can greatly help understand the performance of the SMB file protocol. These include new counters on both the server side and the client side.

In this post, I wanted to call your attention to the new client-side counters that show the traffic going to a specific file share. In the example below, we see the performance counters on a computer called FST2-HV1, which is accessing some files on the VMS1 share on a file server cluster called FST2-FS. This is actually a Hyper-V over SMB scenario, where a VM is storing its VHDX file on a remote file share. I started a light copy inside the VM to generate some activity.

 

[Image: Performance Monitor on FST2-HV1 showing the SMB Client Shares counters for the VMS1 share on the FST2-FS cluster]

 

This simple set of counters offers a lot of information. If you know how to interpret it, you can get information on IOPS (data requests per second), throughput (data bytes per second), latency (average seconds per request), IO size (average bytes per request) and outstanding IOs (data queue length). You can also get all of this separately for read and write operations. There’s a lot you can tell from these data points. You can see we’re doing about 910 IOPS, with more writes than reads (648 write IOPS vs. 262 read IOPS). However, we’re doing larger reads than writes (on average, 104KB reads vs. 64KB writes), and reads are faster than writes (5 milliseconds vs. 8 milliseconds). In case you’re wondering, the workload running here was a simple copy of lots of fairly large files from one folder to another inside the VM.

If you’re familiar with the regular disk counters in Windows, you might notice a certain resemblance. That’s not by accident: the SMB Client Shares performance counters were designed to exactly match the disk counters, so you can easily reuse any guidance on application disk performance tuning you currently have. Here’s a little table comparing the two sets:

Type          Disk Object Counter             SMB Client Shares Counter
IOPS          Disk Transfers/sec              Data Requests/sec
              Disk Reads/sec                  Read Requests/sec
              Disk Writes/sec                 Write Requests/sec
Latency       Avg. Disk sec/Transfer          Avg. sec/Data Request
              Avg. Disk sec/Read              Avg. sec/Read
              Avg. Disk sec/Write             Avg. sec/Write
IO Size       Avg. Disk Bytes/Transfer        Avg. Bytes/Data Request
              Avg. Disk Bytes/Read            Avg. Bytes/Read
              Avg. Disk Bytes/Write           Avg. Bytes/Write
Throughput    Disk Bytes/sec                  Data Bytes/sec
              Disk Read Bytes/sec             Read Bytes/sec
              Disk Write Bytes/sec            Write Bytes/sec
Queue Length  Avg. Disk Read Queue Length     Avg. Read Queue Length
              Avg. Disk Write Queue Length    Avg. Write Queue Length
              Avg. Disk Queue Length          Avg. Data Queue Length

I would encourage you to understand your application storage performance and capture/save some samples after you tune your applications. This way you can build a library of baseline performance counters that you can later use for troubleshooting performance issues.
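
If you want to script that capture, here is a minimal sketch using Get-Counter with the counter names from the table above (the output folder is just an example and must already exist):

# Sample the per-share SMB client counters every 5 seconds for one minute
$counters = "\SMB Client Shares(*)\Data Requests/sec",
            "\SMB Client Shares(*)\Data Bytes/sec",
            "\SMB Client Shares(*)\Avg. sec/Data Request",
            "\SMB Client Shares(*)\Avg. Data Queue Length"

Get-Counter -Counter $counters -SampleInterval 5 -MaxSamples 12 |
    Export-Counter -Path C:\Baselines\smb-client-baseline.blg -FileFormat BLG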

Updated Links on Windows Server 2012 File Server and SMB 3.0


In this post, I'm providing a reference to the most relevant content related to the Windows Server 2012 File Server, the SMB 3.0 features and the associated scenarios like Hyper-V over SMB and SQL Server over SMB. It's obviously not a complete reference (there are new blog posts every day), but hopefully this is a useful collection of links for Windows Server 2012 users.

Summaries of SMB 3.0 features in Windows Server 2012

Articles on File Storage for Application Servers (Hyper-V over SMB, SQL Server over SMB)

Articles on SMB Transparent Failover and SMB Scale-Out

Articles on SMB Direct (SMB over RDMA) and SMB Multichannel

Articles on Failover Clustering related to File Server Clusters

Articles on other SMB 3.0 features and capabilities

Windows Server File Server Tips

Private Cloud Solution Architecture

TechNet Radio (includes Video) with Bob Hunt and Jose Barreto

Knowledge Base articles (Support KBs) about Windows Server 2012 SMB 3.0

Protocol Documentation

Older posts and videos

-------

Change tracking:

  • 04/24/2012: Original post
  • 05/01/2012: Update: Added links to two SNW Spring 2012 presentations
  • 05/03/2012: Update: Added links to protocol documentation, blog post on SMB Encryption and private cloud blog post
  • 05/18/2012: Update: Added links to SDC presentations, plus blogs on basics of SMB PowerShell and SMB PowerShell
  • 06/13/2012: Update: Added 3 new blog posts, one new KB article, one new video link
  • 08/02/2012: Update: Additional blog posts and links to TechEd recordings
  • 08/26/2012: Update: Two additional blog posts
  • 11/27/2012: Update: Added 10 file server tips, SNW Fall 2012 presentations, TechNet Radio links. Moved older posts down.

TechNet Radio series covers Windows Server 2012 File Server and SMB 3.0 scenarios


I have been working with Bob Hunt at the TechNet Radio team to provide a series of webcasts with information about SMB 3.0 and the File Server role in Windows Server 2012.

These are fairly informal conversations, but Bob is really good at posing interesting questions, clarifying the different scenarios and teasing out relevant details on SMB 3.0.

By the way, don’t be fooled by the “Radio” in the name. These are available as both video and audio, typically including at least one demo for each episode.

Here is a list of the TechNet Radio episodes Bob and I recorded (including one with Claus Joergensen, another PM in the SMB team), in the order they were published:

1) Windows Server 2012 Hyper-V over SMB (August 31st)

Summary: Bob Hunt and Jose Barreto join us for today’s show as they discuss Windows Server 2012 Hyper-V support for remote file storage using SMB 3.0. Tune in as they discuss the basic requirements for Hyper-V over SMB, as well as its latest enhancements and why this solution is an easy and affordable file storage alternative.

Link: http://channel9.msdn.com/Shows/TechNet+Radio/TechNet-Radio-Windows-Server-2012-Hyper-V-over-SMB

2) Windows Server 2012 - How to Scale-Out a File Server and use it for Hyper-V (September 5th)

Summary: Bob Hunt and Jose Barreto are back for today’s episode where they show us how to scale-out a file server in Windows Server 2012 and how to use it with Hyper-V. Tune in as they discuss the advancements made to file servers in terms of scale, storage, virtual processors, support for VMs per host and per cluster as well as demoing a classic vs. scaled out file server.

Link: http://channel9.msdn.com/Shows/TechNet+Radio/TechNet-Radio-Windows-Server-2012-How-to-Scale-Out-a-File-Server-using-Hyper-V

3) Hyper-V over SMB: Step-by-Step Installation using PowerShell (September 16th)

Summary: Bob Hunt and Jose Barreto continue their Hyper-V over SMB for Windows Server 2012 series, and in today’s episode they discuss how you can configure this installation using PowerShell. Tune in as they take a deep dive into how you can leverage all the features of SMB 3.0 as they go through this extensive step-by-step walkthrough.

Link: http://channel9.msdn.com/Shows/TechNet+Radio/TechNet-Radio-Hyper-V-over-SMB-Step-by-Step-Installation-using-PowerShell

4) SMB Multi-channel Basics for Windows Server 2012 and SMB 3.0 (October 8th)

Summary: Bob Hunt and Jose Barreto continue their SMB 3.0 for Windows Server 2012 series, and in today’s episode they discuss the recent improvements made around networking capabilities found within SMB Multichannel which can help increase network performance and availability for File Servers.

Link: http://channel9.msdn.com/Shows/TechNet+Radio/TechNet-Radio-SMB-Multi-channel-Basics-for-Windows-Server-2012-and-SMB-30

5) SQL Server over SMB 3.0 Overview (October 23rd)

Summary: Principal Program Manager from the Windows File Server team, Claus Joergensen joins Bob Hunt and Jose Barreto as they discuss how and why you would want to implement SQL Server 2012 over SMB 3.0. Tune in as they chat about the benefits, how to set it up as well as address any potential concerns you may have such as performance issues.

Link: http://channel9.msdn.com/Shows/TechNet+Radio/TechNet-Radio-SQL-Server-over-SMB-30-Overview

6) SMB 3.0 Encryption Overview (November 26th)

Summary: Bob Hunt and Jose Barreto continue their SMB series for Windows Server 2012 and in today’s episode they chat about SMB Encryption. Tune in as they discuss this security component: what it is, why it’s important, and how it is implemented and configured in your environment, with a quick demo of how to do this via the GUI and in PowerShell.

Link: http://channel9.msdn.com/Shows/TechNet+Radio/TechNet-Radio-SMB-30-Encryption-Overview

7) SMB 3.0 Deployment Scenarios (December 6th)

Summary: Bob Hunt and Jose Barreto continue their SMB series for Windows Server 2012 and in today’s episode they chat about deployment scenarios and ways in which you can implement all of the new features in SMB 3.0. Tune in as they dive deep into various deployment strategies for SMB.

Link: http://channel9.msdn.com/Shows/TechNet+Radio/TechNet-Radio-SMB-30-Deployment-Scenarios

New ESG Lab Validation Report shows Performance of Windows Server 2012 Storage and Networking


There is a new ESG report out that shows the Storage and Networking performance of Windows Server 2012.

It highlights the findings around a few key new features, including:

  • Storage Spaces
  • SMB 3.0 File Servers
  • Deduplication
  • CHKDSK Online Scanning
  • Offloaded Data Transfers (ODX)

The numbers speak for themselves and the report provides plenty of tables, charts and configuration details.

Check it out at http://download.microsoft.com/download/8/0/F/80FCCBEF-BC4D-4B84-950B-07FBE31022B4/ESG-Lab-Validation-Windows-Server-Storage.pdf

How does New-SmbShare know whether the new share should be standalone, clustered or scale-out?


I got a question the other day about one of the scripts I published as part of a step-by-step for Hyper-V over SMB. Here's the relevant line from that script:

New-SmbShare -Name VMS3 -Path C:\ClusterStorage\Volume1\VMS -FullAccess FST2.Test\Administrator, FST2.Test\FST2-HV1$, FST2.Test\FST2-HV2$, FST2.Test\FST2-HVC$

The question was how New-SmbShare knows to create the share on the cluster as a continuously available share; nothing in the cmdlet or its parameters tells it that. The puzzled reader wanted to know what he had missed: it worked, but he could not figure out how.

The answer is quite simple, although it's not obvious: this is done automatically based on where the folder (specified by the -Path parameter) lives.

Here are the rules:

  • If the path is on a local, non-clustered disk, New-SmbShare creates a standalone share.
  • If the path is on a classic cluster disk, New-SmbShare creates a classic clustered file share on the group that owns that disk.
  • If the path is on a Cluster Shared Volume (CSV), New-SmbShare creates a scale-out file share.

There is actually a -ScopeName parameter for New-SmbShare, which can be used to specify the cluster name (either the netname for a classic cluster or the DNN for a Scale-Out cluster), but in most cases this is entirely optional.

There is also a -ContinuouslyAvailable parameter, but it defaults to $true if the share is on a cluster, so it is also optional (unless you want to create a non-CA share on a cluster, which is not a good idea anyway).
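
A quick way to confirm what New-SmbShare decided is to look at the resulting share. Here is a small sketch, using the VMS3 share from the line above:

# Check how the share was scoped and whether it is continuously available
Get-SmbShare VMS3 | Format-List Name, ScopeName, Path, ContinuouslyAvailable, CachingMode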

You can read more about these automatic behaviors in SMB 3.0 at http://blogs.technet.com/b/josebda/archive/2012/10/08/windows-server-2012-file-servers-and-smb-3-0-simpler-and-easier-by-design.aspx

For more details about SMB PowerShell cmdlets, check out http://blogs.technet.com/b/josebda/archive/2012/06/27/the-basics-of-smb-powershell-a-feature-of-windows-server-2012-and-smb-3-0.aspx?Redirected=true

How to use the new SMB 3.0 WMI classes in Windows Server 2012 and Windows 8 (from PowerShell)


If you're an IT Administrator, you're likely to use the new SMB PowerShell cmdlets to manage your SMB 3.0 file shares. You can find details about those at http://blogs.technet.com/b/josebda/archive/2012/06/27/the-basics-of-smb-powershell-a-feature-of-windows-server-2012-and-smb-3-0.aspx

However, if you're a developer, you might be interested in learning about the WMI v2 classes that are behind those PowerShell cmdlets. They are easy to use and exactly match the PowerShell functionality. In fact, you can test them via PowerShell using the Get-WMIObject cmdlet. These WMIv2 classes are available for both Windows 8 and Windows Server 2012.

What is sometimes a little harder to figure out is how to find detailed information about these classes if you don't know where to look. The key piece of information you need is the namespace. In the case of SMB, the namespace is Root\Microsoft\Windows\SMB. Here is a sample PowerShell command to list the SMB WMI classes:

PS C:\Windows\system32> Get-WMIObject -Namespace "root\Microsoft\Windows\SMB" -List "MSFT_*"

   NameSpace: ROOT\Microsoft\Windows\SMB

Name                                Methods              Properties
----                                -------              ----------
MSFT_SmbShare                       {CreateShare, Gra... {AvailabilityType, CachingMode, ...
MSFT_SmbShareAccessControlEntry     {}                   {AccessControlType, AccessRight,...
MSFT_WmiError                       {}                   {CIMStatusCode, CIMStatusCodeDes...
MSFT_ExtendedStatus                 {}                   {CIMStatusCode, CIMStatusCodeDes...
MSFT_SmbClientNetworkInterface      {}                   {FriendlyName, InterfaceIndex, I...
MSFT_SmbServerNetworkInterface      {}                   {FriendlyName, InterfaceIndex, I...
MSFT_SmbConnection                  {}                   {ContinuouslyAvailable, Credenti...
MSFT_SmbOpenFile                    {ForceClose}         {ClientComputerName, ClientUserN...
MSFT_SmbMultichannelConnection      {Refresh}            {ClientInterfaceFriendlyName, Cl...
MSFT_SmbClientConfiguration         {GetConfiguration... {ConnectionCountPerRssNetworkInt...
MSFT_SmbShareChangeEvent            {}                   {EventType, Share}
MSFT_SmbServerConfiguration         {GetConfiguration... {AnnounceComment, AnnounceServer...
MSFT_SmbSession                     {ForceClose}         {ClientComputerName, ClientUserN...
MSFT_SmbMapping                     {Remove, Create}     {LocalPath, RemotePath, Status}
MSFT_SmbMultichannelConstraint      {CreateConstraint}   {InterfaceAlias, InterfaceGuid, ...

To test one of these via PowerShell, you can use the same Get-WMIObject cmdlet. Here's a sample (with the traditional PowerShell and the WMI equivalent):

PS C:\Windows\system32> Get-SmbShare | Select Name, Path

Name                                                        Path
----                                                        ----
ADMIN$                                                      C:\Windows
C$                                                          C:\
IPC$
Users                                                       C:\Users

 

PS C:\Windows\system32> Get-WMIObject -Namespace "root\Microsoft\Windows\SMB" MSFT_SmbShare | Select Name, Path

Name                                                        Path
----                                                        ----
ADMIN$                                                      C:\Windows
C$                                                          C:\
IPC$
Users                                                       C:\Users

Obviously the two outputs are exactly the same and the PowerShell version is much simpler. You'll only use WMI if you're really running this from an application, where using WMI classes might be simpler than invoking PowerShell.

The WMI side could get even more complex if you have to filter things or invoke a method of the WMI class (instead of simply getting properties of the returned object).
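
For the filtering case, you can pass a WQL filter to Get-WMIObject. Here is a minimal sketch (the share name is just an example):

# WMI equivalent of: Get-SmbShare -Name Users
Get-WMIObject -Namespace "root\Microsoft\Windows\SMB" -Class MSFT_SmbShare -Filter "Name='Users'" |
    Select Name, Path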

For instance, here's how you would use WMI to invoke the GetConfiguration method of the MSFT_SmbClientConfiguration class, which would be the equivalent of using the Get-SmbClientConfiguration PowerShell cmdlet:

PS C:\Windows\system32> Get-SmbClientConfiguration

ConnectionCountPerRssNetworkInterface : 4
DirectoryCacheEntriesMax              : 16
DirectoryCacheEntrySizeMax            : 65536
DirectoryCacheLifetime                : 10
EnableBandwidthThrottling             : True
EnableByteRangeLockingOnReadOnlyFiles : True
EnableLargeMtu                        : True
EnableMultiChannel                    : True
DormantFileLimit                      : 1023
EnableSecuritySignature               : True
ExtendedSessionTimeout                : 1000
FileInfoCacheEntriesMax               : 64
FileInfoCacheLifetime                 : 10
FileNotFoundCacheEntriesMax           : 128
FileNotFoundCacheLifetime             : 5
KeepConn                              : 600
MaxCmds                               : 50
MaximumConnectionCountPerServer       : 32
OplocksDisabled                       : False
RequireSecuritySignature              : False
SessionTimeout                        : 60
UseOpportunisticLocking               : True
WindowSizeThreshold                   : 8 

PS C:\Windows\system32> $cc = Invoke-WMIMethod -Namespace "root\Microsoft\Windows\SMB" -Class MSFT_SmbClientConfiguration -Name GetConfiguration
PS C:\Windows\system32> $cc

__GENUS          : 1
__CLASS          : __PARAMETERS
__SUPERCLASS     :
__DYNASTY        : __PARAMETERS
__RELPATH        : __PARAMETERS
__PROPERTY_COUNT : 2
__DERIVATION     : {}
__SERVER         : <REMOVED>
__NAMESPACE      : ROOT\Microsoft\Windows\Smb
__PATH           : \\<REMOVED>\ROOT\Microsoft\Windows\Smb:__PARAMETERS
Output           : System.Management.ManagementBaseObject
ReturnValue      : 0
PSComputerName   : <REMOVED>

PS C:\Windows\system32> $cc.Output

__GENUS                               : 2
__CLASS                               : MSFT_SmbClientConfiguration
__SUPERCLASS                          :
__DYNASTY                             : MSFT_SmbClientConfiguration
__RELPATH                             :
__PROPERTY_COUNT                      : 23
__DERIVATION                          : {}
__SERVER                              : <REMOVED>
__NAMESPACE                           : ROOT\Microsoft\Windows\SMB
__PATH                                :
ConnectionCountPerRssNetworkInterface : 4
DirectoryCacheEntriesMax              : 16
DirectoryCacheEntrySizeMax            : 65536
DirectoryCacheLifetime                : 10
DormantFileLimit                      : 1023
EnableBandwidthThrottling             : True
EnableByteRangeLockingOnReadOnlyFiles : True
EnableLargeMtu                        : True
EnableMultiChannel                    : True
EnableSecuritySignature               : True
ExtendedSessionTimeout                : 1000
FileInfoCacheEntriesMax               : 64
FileInfoCacheLifetime                 : 10
FileNotFoundCacheEntriesMax           : 128
FileNotFoundCacheLifetime             : 5
KeepConn                              : 600
MaxCmds                               : 50
MaximumConnectionCountPerServer       : 32
OplocksDisabled                       : False
RequireSecuritySignature              : False
SessionTimeout                        : 60
UseOpportunisticLocking               : True
WindowSizeThreshold                   : 8
PSComputerName                        : <REMOVED>

You can find more information about these SMB WMI classes at http://msdn.microsoft.com/en-us/library/windows/desktop/hh830479.aspx


Is accessing files via a loopback share the same as using a local path?


Question from a user (paraphrased): When we access a local file via a loopback UNC path, is this the same as accessing it via the local path? I mean, is "C:\myfolder\a.txt" equal to "\\myserver\myshare\a.txt", or will I be using TCP/IP in some way?

Answer from SMB developer: When accessing files over loopback, the initial connect and the metadata operations (open, query info, query directory, etc.) are sent over the loopback connection. However, once a file is open, we detect it and forward reads/writes directly to the file system so that TCP/IP is not used. Thus there is some difference for metadata operations, but data operations (where the majority of the data is transferred) behave just like local access.
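
If you want to observe this yourself, a quick (and admittedly unscientific) check is to time the same copy over both paths and to note that the loopback access still shows up as an SMB session. This is just a sketch; the folder, share and file names are hypothetical:

# Time the same copy through the local path and through the loopback UNC path
Measure-Command { Copy-Item C:\myfolder\a.txt C:\myfolder\copy1.txt }
Measure-Command { Copy-Item \\myserver\myshare\a.txt \\myserver\myshare\copy2.txt }

# The loopback access still creates an SMB session on the server side
Get-SmbSession | Select ClientComputerName, ClientUserName, NumOpens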

How much traffic needs to pass between the SMB Client and Server before Multichannel actually starts?


One smart MVP was doing some testing and noticed that SMB Multichannel did not trigger immediately after an SMB session was established. So, he asked: How much traffic needs to pass between the SMB Client and Server before Multichannel actually starts?

Well... SMB Multichannel works slightly differently in that regard depending on whether the client is running Windows 8 or Windows Server 2012.

On Windows Server 2012, SMB Multichannel starts whenever an SMB read or SMB write is issued on the session (but not other operations). For servers, network fault tolerance is a key priority and sessions are typically long lasting, so we set up the extra channels as soon as we detect any read or write.

SMB Multichannel in Windows 8 will only engage if there are a few IOs in flight at the same time (technically, when the SMB window size gets to a certain point). The default for this WindowSizeThreshold setting is 8 (meaning that there are at least 8 packets asynchronously in flight). That requires some level of activity on the SMB client, so a single small file copy won't trigger it. We wanted to avoid starting Multichannel for every connection from a client, especially if it's just doing a small amount of work. You can query this client setting via "Get-SmbClientConfiguration" and change it with "Set-SmbClientConfiguration -WindowSizeThreshold n". You can set it to 1, for instance, to get behavior similar to Windows Server 2012.
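
Put together, the relevant commands look like this (a minimal sketch, to be run on the Windows 8 client):

# Check the current threshold (the default on Windows 8 is 8)
Get-SmbClientConfiguration | Select WindowSizeThreshold

# Make Multichannel engage as eagerly as it does on Windows Server 2012
Set-SmbClientConfiguration -WindowSizeThreshold 1

# After generating some read/write traffic, verify the extra connections
Get-SmbMultichannelConnection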

Even after SMB Multichannel kicks in, the extra connections might take a few seconds to actually get established. This is because the process involves querying the server for interface information, there's some thinking involved about which paths to use, and SMB does this as a low-priority activity. However, SMB traffic continues to use the initial connection and does not wait for additional connections to be established. Once the extra connections are set up, they won't be torn down even if the activity level drops. If the session ends and is later restarted, though, the whole process starts again from scratch.

You can learn more about SMB Multichannel at http://blogs.technet.com/b/josebda/archive/2012/06/28/the-basics-of-smb-multichannel-a-feature-of-windows-server-2012-and-smb-3-0.aspx

Minimum version of Mellanox firmware required for running SMB Direct in Windows Server 2012


There are two blog posts explaining in great detail what you need to do to use Mellanox ConnectX-2 or ConnectX-3 cards to implement RDMA networking for SMB 3.0 (using SMB Direct). You can find them at:

However, I commonly get questions where the SMB cmdlets report a Mellanox NIC as not being RDMA-capable. Over time, I learned that the most common cause is outdated firmware. Windows Server 2012 comes with an inbox driver for these Mellanox adapters, but it is possible that the firmware on the adapter itself is old. This will cause the NIC to not use RDMA.

To be clear, your Mellanox NIC must have firmware version 2.9.8350 or higher to work with SMB Direct. The driver actually checks the firmware version on startup and logs a message if the firmware does not meet this requirement: "The firmware version that is burned on the device <device name> does not support Network Direct functionality. This may affect the File Transfer (SMB) performance. The current firmware version is <current version> while we recommend using firmware version 2.9.8350 or higher. Please burn a newer firmware and restart the Mellanox ConnectX device. For more details about firmware burning process please refer to Support information on http://mellanox.com".

However, since the NIC actually works fine without RDMA (at reduced performance and higher CPU utilization), some administrators might fail to identify this issue. If they are following the steps outlined in the links above, the verification steps will point to the fact that RDMA is actually not being used and the NIC is running only with TCP/IP.
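
The kind of checks I have in mind look like this (a quick sketch; adapter names and output will vary with your hardware):

# Does the OS see the adapter as RDMA-capable, and is RDMA enabled on it?
Get-NetAdapterRdma

# Does the SMB client consider the interface RDMA-capable?
Get-SmbClientNetworkInterface | Where-Object RdmaCapable -eq $true

# Are the SMB connections actually using multiple channels (and RDMA where available)?
Get-SmbMultichannelConnection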

The solution is obviously to download the firmware update tools from the Mellanox site and fix it. It will also come with the latest driver version, which is newer than the inbox driver. The direct link to that Mellanox page is http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=32&menu_section=34. You need to select the “Windows Server 2012” tab at the bottom of that page and download the "MLNX WinOF VPI for x64 platforms" package, shown in the picture below.

[Image: Mellanox download page showing the Windows Server 2012 tab and the "MLNX WinOF VPI for x64 platforms" package]

Sample PowerShell Scripts for Storage Spaces, standalone Hyper-V over SMB and SQLIO testing


These are some PowerShell snippets to configure a specific set of systems for Hyper-V over SMB testing.
Posting it here mainly for my own reference, but maybe someone else out there is configuring a server with 48 disks split into 6 pools of 8 disks.
These systems do not support SES (SCSI Enclosure Services) so I could not use slot numbers.
This setup includes only two computers: a file server (ES-FS2) and a Hyper-V host (ES-HV1).
Obviously a standalone setup. I'm using those mainly for some SMB Direct performance testing.

Storage Spaces - Create pools

$s = Get-StorageSubSystem -FriendlyName *Spaces*

$d = Get-PhysicalDisk | ? { ";1;2;3;4;5;6;7;8;".Contains(";"+$_.DeviceID+";") }
New-StoragePool -FriendlyName Pool1 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks $d

$d = Get-PhysicalDisk | ? { ";33;34;35;36;37;38;39;40;".Contains(";"+$_.DeviceID+";") }
New-StoragePool -FriendlyName Pool2 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks $d

$d = Get-PhysicalDisk | ? { ";25;26;27;28;29;30;31;32;".Contains(";"+$_.DeviceID+";") }
New-StoragePool -FriendlyName Pool3 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks $d

$d = Get-PhysicalDisk | ? { ";17;18;19;20;21;22;23;24;".Contains(";"+$_.DeviceID+";") }
New-StoragePool -FriendlyName Pool4 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks $d

$d = Get-PhysicalDisk | ? { ";9;10;11;12;13;14;15;16;".Contains(";"+$_.DeviceID+";") }
New-StoragePool -FriendlyName Pool5 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks $d

$d = Get-PhysicalDisk | ? { ";41;42;43;44;45;46;47;48;".Contains(";"+$_.DeviceID+";") }
New-StoragePool -FriendlyName Pool6 -StorageSubSystemFriendlyName $s.FriendlyName -PhysicalDisks $d

Storage Spaces - Create Spaces (Virtual Disks)

1..6 | % {
Set-ResiliencySetting -Name Mirror -NumberofColumnsDefault 4 -StoragePool  ( Get-StoragePool -FriendlyName Pool$_ )
New-VirtualDisk -FriendlyName Space$_ -StoragePoolFriendlyName Pool$_ -ResiliencySettingName Mirror -UseMaximumSize
}

Initialize disks, partitions and volumes

1..6 | % {
$c = Get-VirtualDisk -FriendlyName Space$_ | Get-Disk
Set-Disk -Number $c.Number -IsReadOnly 0
Set-Disk -Number $c.Number -IsOffline 0
Initialize-Disk -Number $c.Number -PartitionStyle GPT
$L = "DEFGHI"[$_-1]
New-Partition -DiskNumber $c.Number -DriveLetter $L -UseMaximumSize
Initialize-Volume -DriveLetter $L -FileSystem NTFS -Confirm:$false
}

Confirm everything is OK

Get-StoragePool Pool* | sort FriendlyName | % { $_ ; ($_ | Get-PhysicalDisk).Count }
Get-VirtualDisk | Sort FriendlyName
Get-VirtualDisk Space* | % { $_ | FT FriendlyName, Size, OperationalStatus, HealthStatus ; $_ | Get-PhysicalDisk | FT DeviceId, Usage, BusType, Model }
Get-Disk
Get-Volume | Sort DriveLetter

Verify SMB Multichannel configuration

Get-SmbServerNetworkInterface -CimSession ES-FS2
Get-SmbClientNetworkInterface -CimSession ES-HV1 | ? LinkSpeed -gt 1
Get-SmbMultichannelConnection -CimSession ES-HV1

On the local system, create files and run SQLIO

1..6 | % {
   $d = "EFGHIJ"[$_-1]
   $f = $d + ":\testfile.dat"
   fsutil file createnew $f (256GB)
   fsutil file setvaliddata $f (256GB)
}

c:\sqlio\sqlio2.exe -s10 -T100 -t4 -o16 -b512 -BN -LS -fsequential -dEFGHIJ testfile.dat
c:\sqlio\sqlio2.exe -s10 -T100 -t16 -o16 -b8 -BN -LS -frandom -dEFGHIJ testfile.dat

Create the SMB Shares for use by ES-HV1

1..6 | % {
   $p = "EFGHIJ"[$_-1] + ":\"
   $s = "Share"+$_
   New-SmbShare -Name $s -Path $p -FullAccess ES\User, ES\ES-HV1$
   (Get-SmbShare -Name $s).PresetPathAcl | Set-Acl
}

On remote system, map the drives and run SQLIO again:

1..6 | % {
   $l = "EFGHIJ"[$_-1] + ":"
   $r = "\\ES-FS2\Share"+$_
   New-SmbMapping -LocalPath $l -RemotePath $r
}

c:\sqlio\sqlio2.exe -s10 -T100 -t4 -o16 -b512 -BN -LS -fsequential -dEFGHIJ testfile.dat
c:\sqlio\sqlio2.exe -s10 -T100 -t16 -o16 -b8 -BN -LS -frandom -dEFGHIJ testfile.dat

Creating the VM BASE

New-VM -Name VMBASE -VHDPath C:\VMS\BASE.VHDX -Memory 8GB
Start-VM VMBASE
Remove-VM VMBASE

Set up VMs - Option 1 - from a BASE and empty shares

1..6 | % {
   Copy C:\VMS\Base.VHDX \\ES-FS2\Share$_\VM$_.VHDX
   New-VHD -Path \\ES-FS2\Share$_\Data$_.VHDX -Fixed -Size 256GB
   New-VM -Name VM$_ -VHDPath \\ES-FS2\Share$_\VM$_.VHDX -Path \\ES-FS2\Share$_ -Memory 8GB
   Set-VM -Name VM$_ -ProcessorCount 8
   Add-VMHardDiskDrive -VMName VM$_ -Path \\ES-FS2\Share$_\Data$_.VHDX
   Add-VMNetworkAdapter -VMName VM$_ -SwitchName Internal
}

Set up VMS - Option 2 - when files are already in place

1..6 | % {
   New-VM -Name VM$_ -VHDPath \\ES-FS2\Share$_\VM$_.VHDX -Path \\ES-FS2\Share$_ -Memory 8GB
   Set-VM -Name VM$_ -ProcessorCount 8
   Add-VMHardDiskDrive -VMName VM$_ -Path \\ES-FS2\Share$_\Data$_.VHDX
   Add-VMNetworkAdapter -VMName VM$_ -SwitchName Internal
}

Setting up E: data disk inside each VM

Set-Disk -Number 1 -IsReadOnly 0
Set-Disk -Number 1 -IsOffline 0
Initialize-Disk -Number 1 -PartitionStyle GPT
New-Partition -DiskNumber 1 -DriveLetter E -UseMaximumSize
Initialize-Volume -DriveLetter E -FileSystem NTFS -Confirm:$false

fsutil file createnew E:\testfile.dat (250GB)
fsutil file setvaliddata E:\testfile.dat (250GB)

Script to run inside the VMs

PowerShell script sqlioloop.ps1 on a shared X: drive (mapped SMB share) run from each VM
Node identified by the last byte of its IPv4 address.
Empty file go.go works as a flag to start running the workload on several VMs at once.
Also saves SQLIO output to a text file on the shared X: drive
Using separate batch files to run SQLIO itself so they can be easily tuned even while the script is running on all VMs

CD X:\SQLIO
$node = (Get-NetIPAddress | ? IPaddress -like 192.168.99.*).IPAddress.Split(".")[3]
while ($true)
{
  if ((dir x:\sqlio\go.go).count -gt 0)
  {
     "Starting large IO..."
     .\sqliolarge.bat >large$node.txt
     "Pausing 10 seconds..."
      start-sleep 10
     "Starting small IO..."
      .\sqliosmall.bat >small$node.txt
     "Pausing 10 seconds..."
      start-sleep 10
   }
   "node "+$node+" is waiting..."
   start-sleep 1
}

PowerShell script above uses file sqliolarge.bat

.\sqlio2.exe -s20 -T100 -t2 -o16 -b512 -BN -LS -fsequential -dE testfile.dat

Also uses sqliosmall.bat

.\sqlio2.exe -s20 -T100 -t4 -o16 -b8 -BN -LS -frandom -dE testfile.dat

Hyper-V over SMB – Sample Configurations


This post describes a few different Hyper-V over SMB sample configurations with increasing levels of availability. Not all configurations are recommended for production deployment, since some do not provide continuous availability. The goal of the post is to show how one can add redundancy, Storage Spaces and Failover Clustering in different ways to provide additional fault tolerance to the configuration.

 

1 – All Standalone

 

[Diagram: Configuration 1 – All Standalone]

 

Hyper-V

  • Standalone, shares used for VHD storage

File Server

  • Standalone, Local Storage

Configuration highlights

  • Flexibility (Migration, shared storage)
  • Simplicity (File Shares, permissions)
  • Low acquisition and operations cost

Configuration lowlights

  • Storage not fault tolerant
  • File server not continuously available
  • Hyper-V VMs not highly available
  • Hardware setup and OS install by IT Pro

 

2 – All Standalone + Storage Spaces

 

[Diagram: Configuration 2 – All Standalone + Storage Spaces]

 

Hyper-V

  • Standalone, shares used for VHD storage

File Server

  • Standalone, Storage Spaces

Configuration highlights

  • Flexibility (Migration, shared storage)
  • Simplicity (File Shares, permissions)
  • Low acquisition and operations cost
  • Storage is Fault Tolerant

Configuration lowlights

  • File server not continuously available
  • Hyper-V VMs not highly available
  • Hardware setup and OS install by IT Pro

 

3 – Standalone File Server, Clustered Hyper-V

 

[Diagram: Configuration 3 – Standalone File Server, Clustered Hyper-V]

 

Hyper-V

  • Clustered, shares used for VHD storage

File Server

  • Standalone, Storage Spaces

Configuration highlights

  • Flexibility (Migration, shared storage)
  • Simplicity (File Shares, permissions)
  • Low acquisition and operations cost
  • Storage is Fault Tolerant
  • Hyper-V VMs are highly available

Configuration lowlights

  • File server not continuously available
  • Hardware setup and OS install by IT Pro

 

4 – Clustered File Server, Standalone Hyper-V

 

[Diagram: Configuration 4 – Clustered File Server, Standalone Hyper-V]

 

Hyper-V

  • Standalone, shares used for VHD storage

File Server

  • Clustered, Storage Spaces

Configuration highlights

  • Flexibility (Migration, shared storage)
  • Simplicity (File Shares, permissions)
  • Low acquisition and operations cost
  • Storage is Fault Tolerant
  • File Server is Continuously Available

Configuration lowlights

  • Hyper-V VMs not highly available
  • Hardware setup and OS install by IT Pro

 

5 – All Clustered

 

[Diagram: Configuration 5 – All Clustered]

 

Hyper-V

  • Clustered, shares used for VHD storage

File Server

  • Clustered, Storage Spaces

Configuration highlights

  • Flexibility (Migration, shared storage)
  • Simplicity (File Shares, permissions)
  • Low acquisition and operations cost
  • Storage is Fault Tolerant
  • Hyper-V VMs are highly available
  • File Server is Continuously Available

Configuration lowlights

  • Hardware setup and OS install by IT Pro

 

6 – Cluster-in-a-box

 

[Diagram: Configuration 6 – Cluster-in-a-box]

 

Hyper-V

  • Clustered, shares used for VHD storage

File Server

  • Cluster-in-a-box

Configuration highlights

  • Flexibility (Migration, shared storage)
  • Simplicity (File Shares, permissions)
  • Low acquisition and operations cost
  • Storage is Fault Tolerant
  • File Server is Continuously Available
  • Hardware and OS pre-configured by the OEM

 

More details

 

You can find additional details on these configurations in this TechNet Radio show: http://channel9.msdn.com/Shows/TechNet+Radio/TechNet-Radio-SMB-30-Deployment-Scenarios

You can also find more information about the Hyper-V over SMB scenario in this TechEd video recording: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/VIR306

Hyper-V over SMB – Performance considerations


1. Introduction

 

If you follow this blog, you probably already had a chance to review the “Hyper-V over SMB” overview talk that I delivered at TechEd 2012 and other conferences. Now I am working on a new version of that talk that still covers the basics, but adds segments focused on end-to-end performance and sample configurations. This post looks at the end-to-end performance portion.

 

2. Typical Hyper-V over SMB configuration

 

End-to-end performance starts by drawing an end-to-end configuration. The diagram below shows a typical Hyper-V over SMB configuration including:

  • Clients that access virtual machines
  • Nodes in a Hyper-V Cluster
  • Nodes in a File Server Cluster
  • SAS JBODs acting as shared storage for the File Server Cluster

 

[Diagram: Typical Hyper-V over SMB configuration, showing clients, the Hyper-V cluster, the File Server cluster and the SAS JBODs]

 

The main highlights of the diagram above include the redundancy in all layers and the different types of network connecting the layers.

 

3. Performance considerations

 

With the above configuration in mind, you can then start to consider the many different options at each layer that can affect the end-to-end performance of the solution. The diagram below highlights a few of the items, in the different layers, that would have a significant impact.

 

[Diagram: Items at each layer of the configuration that affect end-to-end performance]

 

These items include:

  • Clients
    • Number of clients
    • Speed of the client NICs
  • Virtual Machines
    • VMs per host
    • Virtual processors and RAM per VM
  • Hyper-V Hosts
    • Number of Hyper-V hosts
    • Cores and RAM per Hyper-V host
    • NICs per Hyper-V host (connecting to clients) and the speed of those NICs
    • RDMA NICs (R-NICs) per Hyper-V host (connecting to file servers) and the speed of those NICs
  • File Servers
    • Number of File Servers (typically 2)
    • RAM per File Server, plus how much is used for CSV caching
    • Storage Spaces configuration, including number of spaces, resiliency settings and number of columns per space
    • RDMA NICs (R-NICs) per File Server (connecting to Hyper-V hosts) and the speed of those NICs
    • SAS HBAs per File Server (connecting to the JBODs) and speed of those HBAs
  • JBODs
    • SAS ports per module and the speed of those ports
    • Disks per JBOD, plus the speed of the disks and of their SAS connections

It’s also important to note that the goal is not to achieve the highest performance possible, but to find a balanced configuration that delivers the performance required by the workload at the best possible cost.

 

4. Sample configuration

 

To make things a bit more concrete, you can look at a sample VDI workload.

Suppose you need to create a solution to host 500 VDI VMs. Here are some steps to consider when planning:

  • Workload, disks, JBODs, hosts
    • Assume this is the agreed upon workload: 500 VDI VMs, 2GB RAM, 1 virtual processor, ~50GB per VM, ~30 IOPS per VM, ~64KB per IO
    • Assume we decided to use this specific type of disks: 900 GB HDD at 10,000 rpm, around 140 IOPS
    • And this type of JBOD: SAS JBOD with dual SAS modules, two 4-lane 6Gbps port per module, up to 60 disks per JBOD
    • Finally, this is the agreed upon spec for the Hyper-V host: 16 cores, 128GB RAM
  • Storage
    • Number of disks required based on IOPS: 30 * 500 /140 = ~107 disks
    • Number of disks required based on capacity: 50GB * 2 * 500 / 900 = ~56 disks.
    • Some additional capacity is required for snapshots and backups.
    • So IOPS drives the disk count: we need 107 disks to fulfill both the IOPS and capacity requirements
    • We can then conclude we need 2 JBODs with 60 disks each (that would give us 120 disks, including some spares)
  • Hyper-V hosts
    • 2GB per VM / 128GB per host = ~50 VMs per host, leaving some RAM for the host
    • 50 VMs * 1 virtual processor / 16 cores = ~3:1 ratio between virtual and physical processors
    • 500 VMs / 50 VMs per host = ~10 hosts – we could use 11 hosts, fulfilling the requirements plus one as a spare
  • Networking
    • 500 VMs * 30 IOPS * 64KB = ~937 MBps required – this works well with a single 10GbE NIC, which can deliver ~1100 MBps. Use 2 for fault tolerance.
    • A single 4-lane SAS connection at 6Gbps delivers ~2200 MBps. Use 2 for fault tolerance. You could actually use 3Gbps SAS HBAs here if you wanted.
  • File Server
    • 500 * 25 IOPS = 12,500 IOPS. A single file server can deliver that without any problem. Use 2 for fault tolerance.
    • RAM = 64GB, good size that allows for some CSV caching (up to 20% of RAM)

Please note that this is simply an example, since your specific workload requirements may vary. There’s also no general industry agreement on exactly what a VDI workload looks like, which kind of disk should be used or how much RAM works best for the Hyper-V hosts. So, take this example with a grain of salt :-)
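
If you prefer to keep these back-of-the-envelope numbers in a script so you can tweak the assumptions, here is a small sketch that mirrors the calculation above (all values are the assumptions listed earlier):

$vms = 500; $iopsPerVM = 30; $gbPerVM = 50; $ioSizeKB = 64   # workload assumptions
$diskIops = 140; $diskGB = 900                               # 900GB 10,000 rpm disks

# Disks needed for IOPS vs. capacity (capacity doubled for mirroring)
$disksForIops     = [math]::Round($vms * $iopsPerVM / $diskIops)    # ~107
$disksForCapacity = [math]::Round($vms * $gbPerVM * 2 / $diskGB)    # ~56

# Required throughput in MB/sec (fits in a single ~1100 MB/sec 10GbE link)
$throughputMBps = $vms * $iopsPerVM * $ioSizeKB / 1024              # 937.5

"$disksForIops disks for IOPS, $disksForCapacity disks for capacity, $throughputMBps MB/sec of throughput"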

With all that in mind, let’s draw this out:

 

[Diagram: Sample configuration for the 500-VM VDI workload]

 

Now it’s up to you to work out the specific details of your own workload and hardware options.

 

5. Configuration Variations

 

It’s also important to notice that there are several potential configuration variations for the Hyper-V over SMB scenario, including:

  • Using regular Ethernet NICs instead of RDMA NICs between the Hyper-V hosts and the File Servers
  • Using a third-party SMB 3.0 NAS instead of a Windows File Server
  • Using Fibre Channel or iSCSI instead of SAS, along with a traditional SAN instead of JBODs and Storage Spaces

 

6. Speeds and feeds

 

In order to make some of these calculations, you need to understand the maximum theoretical throughput of the interfaces involved. For instance, it helps to know that a 10GbE NIC cannot deliver more than about 1.1 GBytes per second, or that a single SAS HBA sitting on an 8-lane PCIe Gen2 slot cannot deliver more than about 3.4 GBytes per second. Here are some tables to help out with that portion:

 

NIC                        Throughput
1Gb Ethernet               ~0.1 GB/sec
10Gb Ethernet              ~1.1 GB/sec
40Gb Ethernet              ~4.5 GB/sec
32Gb InfiniBand (QDR)      ~3.8 GB/sec
56Gb InfiniBand (FDR)      ~6.5 GB/sec

 

HBA                        Throughput
3Gb SAS x4                 ~1.1 GB/sec
6Gb SAS x4                 ~2.2 GB/sec
4Gb FC                     ~0.4 GB/sec
8Gb FC                     ~0.8 GB/sec
16Gb FC                    ~1.5 GB/sec

 

Bus Slot                   Throughput
PCIe Gen2 x4               ~1.7 GB/sec
PCIe Gen2 x8               ~3.4 GB/sec
PCIe Gen2 x16              ~6.8 GB/sec
PCIe Gen3 x4               ~3.3 GB/sec
PCIe Gen3 x8               ~6.7 GB/sec
PCIe Gen3 x16              ~13.5 GB/sec

 

Intel QPI                  Throughput
4.8 GT/s                   ~9.8 GB/sec
5.86 GT/s                  ~12.0 GB/sec
6.4 GT/s                   ~13.0 GB/sec
7.2 GT/s                   ~14.7 GB/sec
8.0 GT/s                   ~16.4 GB/sec

 

Memory                     Throughput
DDR2-400 (PC2-3200)        ~3.4 GB/sec
DDR2-667 (PC2-5300)        ~5.7 GB/sec
DDR2-1066 (PC2-8500)       ~9.1 GB/sec
DDR3-800 (PC3-6400)        ~6.8 GB/sec
DDR3-1333 (PC3-10600)      ~11.4 GB/sec
DDR3-1600 (PC3-12800)      ~13.7 GB/sec
DDR3-2133 (PC3-17000)      ~18.3 GB/sec

 

Also, here is some fine print on those tables:

  • Only a few common configurations listed.
  • All numbers are rough approximations.
  • Actual throughput in real life will be lower than these theoretical maximums.
  • Numbers provided are for one way traffic only (you should double for full duplex).
  • Numbers are for one interface and one port only.
  • Numbers use base 10 (1 GB/sec = 1,000,000,000 bytes per second)
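
As an example of where these numbers come from, here is the arithmetic for a 10GbE NIC. This is just a sketch; the ~10% protocol overhead used to get from raw line rate to the figure in the table is an assumption, not an official number:

$lineRateGbps   = 10
$rawBytesPerSec = $lineRateGbps * 1e9 / 8       # 1.25 GB/sec of raw line rate (base 10)
$usableEstimate = $rawBytesPerSec * 0.9         # roughly 1.1 GB/sec after assumed protocol overhead

"{0:N2} GB/sec raw, ~{1:N1} GB/sec usable" -f ($rawBytesPerSec / 1e9), ($usableEstimate / 1e9)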

 

7. Conclusion

 

I’m still working out the details of this new Hyper-V over SMB presentation, but this post summarizes the portion related to end-to-end performance.

I plan to deliver this talk to an internal Microsoft audience this week and also during the MVP Summit later this month. I am also considering submissions for MMS 2013 and TechEd 2013.

You can get a preview of this portion of the talk by watching this recent TechNet Radio show I recorded with Bob Hunt: Hyper-V over SMB 3.0 Performance Considerations.

Demo: Hyper-V over SMB at high throughput with SMB Direct and SMB Multichannel


Overview

 

I delivered a new demo of Hyper-V over SMB this week that’s an evolution of a demo I did back at the Windows Server 2012 launch and also in a TechNet Radio session.

Back then, I showed two physical servers running a SQLIO simulation: one played the role of the File Server and the other worked as the SQL Server.

This time around I’m using 12 VMs accessing a File Server at the same time, so this is a Hyper-V over SMB demo rather than SQL Server over SMB.

 

Hardware

 

The diagram below shows the details of the configuration.

You have an EchoStreams FlacheSAN2 working as the File Server, with 2 Intel CPUs at 2.40 GHz and 64GB of RAM. It includes 6 LSI SAS adapters and 48 Intel SSDs attached directly to the server. This is an impressively packed 2U unit.

The Hyper-V Server is a Dell PowerEdge R720 with 2 Intel CPUs at 2.70 GHz and 128GB of RAM. There are 12 VMs configured in the Hyper-V host, each with 4 virtual processors and 8GB of RAM. 

Both the File Server and the Hyper-V host use three 56Gbps Mellanox ConnectX-3 network interfaces sitting on PCIe Gen3 x8 slots.

 

[Diagram: Demo hardware configuration, showing the File Server, the Hyper-V host and the three 56Gbps RDMA links]

 

Results

 

The demo showcases two workloads: SQLIO with 512KB IOs and SQLIO with 32KB IOs. For each one, the results are shown for a physical host (a single instance of SQLIO running over SMB, without Hyper-V) and with virtualization (12 Hyper-V VMs running SQLIO simultaneously over SMB). See the details below.

 

[Image: Performance Monitor screenshots for the two workloads, physical and virtualized]

 

The first workload (using 512KB IOs) shows very high throughput from the VMs (around 15GBytes/sec combined from all 12 VMs). That’s roughly the equivalent of fourteen 10Gbps Ethernet ports combined or around nineteen 8Gbps Fibre Channel ports. And look at that low CPU utilization...

The second workload shows high IOPS (around 300,000 IOPS of 32KB each). That IO size is definitely larger than in most high-IOPS demos you’ve seen before. This also delivers throughput of around 10GBytes/sec. It’s important to note that this demo accomplishes this on 2-socket/16-core servers, even though this specific workload is fairly CPU-intensive.

Notes:

  • The screenshots above show an instant snapshot of a running workload using Performance Monitor. I also ran each workload for only 20 seconds. Ideally you would run the workload multiple times with a longer duration and average things out.
  • Some of the 6 SAS HBAs on the File Server are sitting on a x4 PCIe slot, since not every one of the 9 slots on the server is x8. For this reason, some of the HBAs perform better than others.
  • Using 4 virtual processors for each of the 12 VMs appears to be less than ideal. I'm planning to experiment with using more virtual processors per VM to potentially improve the performance a bit.

 

Conclusion

 

This is yet another example of how SMB Direct and SMB Multichannel can be combined to produce a high performance File Server for Hyper-V Storage.

This specific configuration pushes the limits of this box with 9 PCIe Gen3 slots in use (six for SAS HBAs and three for RDMA NICs).

I am planning to showcase this setup in a presentation planned for the MMS 2013 conference. If you’re planning to attend, I look forward to seeing you there.


Hardware options for highly available Windows Server 2012 systems using shared, directly-attached storage


Highly available Windows Server 2012 systems using shared, directly-attached storage can be built using either Storage Spaces or a validated clustered RAID controller.

 

Option 1 – Storage Spaces

You can build a highly available shared SAS system today using Storage Spaces.

Storage Spaces works well in a standalone PC, but it is also capable of working in a Windows Server Failover Clustering environment. 

For implementing Clustered Storage Spaces, you will need the following Windows Server 2012 certified hardware:

  • Any SAS Host Bus Adapter or HBA (as long as it’s SAS and not a RAID controller, you should be fine)
  • SAS JBODs or disk enclosures (listed under the “Storage Spaces” category on the Server catalog)
  • SAS disks (there’s a wide variety of those, including capacity HDDs, performance HDDs and SSDs)

You can find instructions on how to configure a Clustered Storage Space in Windows Server 2012 at http://blogs.msdn.com/b/clustering/archive/2012/06/02/10314262.aspx.
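
To give you a flavor of what those instructions boil down to, here is a minimal PowerShell sketch. The pool and space names, the 1TB size and the subsystem name wildcard are placeholders of mine, so treat the step-by-step guide linked above as the authoritative reference.

# Find the shared SAS disks that are eligible for pooling
$disks = Get-PhysicalDisk -CanPool $true

# Create a pool on the Storage Spaces subsystem (check Get-StorageSubSystem for the exact name on your cluster)
New-StoragePool -FriendlyName Pool1 -StorageSubSystemFriendlyName "*Storage Spaces*" -PhysicalDisks $disks

# Carve a mirrored space out of the pool and prepare a volume on it
New-VirtualDisk -StoragePoolFriendlyName Pool1 -FriendlyName Space1 -ResiliencySettingName Mirror -Size 1TB
Get-VirtualDisk -FriendlyName Space1 | Get-Disk | Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -UseMaximumSize -AssignDriveLetter | Format-Volume

# Make the new disk available to the cluster
Get-ClusterAvailableDisk | Add-ClusterDisk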

A good overview of Storage Spaces and its capabilities can be found at http://social.technet.microsoft.com/wiki/contents/articles/15198.storage-spaces-overview.aspx

There's also an excellent presentation from TechEd that covers Storage Spaces at http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/WSV315

 

Option 2 – Clustered RAID Controllers

The second option is to build a highly available shared storage system using RAID Controllers that are designed to work in a Windows Server Failover Cluster configuration.

The main distinction between these RAID controllers and the ones we used before is that they work in sets (typically a pair) and coordinate their actions against the shared disks.

Here are some examples:

  • The HP StoreEasy 5000 cluster-in-a-box uses Clustered RAID controllers that HP sources and certifies. You can find details at the HP StoreEasy product page.
  • LSI is working on a Clustered RAID controller with Windows Server 2012 support. This new line of SAS RAID Controllers is scheduled for later this year. You can get details on availability dates from LSI.

 

Both options work great for all kinds of Windows Server 2012 Clusters, including Hyper-V Clusters, SQL Server Clusters, Classic File Server Clusters and Scale-Out File Servers.

You can learn more about these solutions in this TechEd presentation: http://channel9.msdn.com/Events/TechEd/Europe/2012/WSV310

Increasing Availability – The REAP Principles (Redundancy, Entanglement, Awareness and Persistence)


Introduction

 

Increasing availability is a key concern with computer systems. With all the consolidation and virtualization efforts under way, you need to make sure your services are always up and running, even when some components fail. However, it’s usually hard to understand the details of what it takes to make systems highly available (or continuously available). And there are so many options…

In this blog post, I will describe four principles that cover the different requirements for Availability: Redundancy, Entanglement, Awareness and Persistence. They apply to different types of services and I’ll provide some examples related to the most common server roles, including DHCP, DNS, Active Directory, Hyper-V, IIS, Remote Desktop Services, SQL Server, Exchange Server, and obviously File Services (I am in the “File Server and Clustering” team, after all). Every service employs different strategies to implement these “REAP Principles” but they all must implement them in some fashion to increase availability.

Note: A certain familiarity with common Windows Server roles and services is assumed here. If you are not familiar with the meaning of DHCP, DNS or Active Directory, this post is not intended for you. If that’s the case, you might want to do some reading on those topics before moving forward here.

 

Redundancy – There is more than one of everything

 

Availability starts with redundancy. In order to provide the ability to survive failures, you must have multiple instances of everything that can possibly fail in that system. That means multiple servers, multiple networks, multiple power supplies, multiple storage devices. You should be seeing everything (at least) doubled in your configuration. Whatever is not redundant is commonly labeled a “Single Point of Failure”.

Redundancy is not cheap, though. By definition, it will increase the cost of your infrastructure. So it’s an investment that can only be justified when there is understanding of the risks and needs associated with service disruption, which should be balanced with the cost of higher availability. Sadly, that understanding sometimes only comes after a catastrophic event (such as data loss or an extended outage).

Ideally, you would have a redundant instance that is as capable as your primary one. That would make your system work as well after the failure as it did before. It might be acceptable, though, to have a redundant component that is less capable. In that case, you’ll be in a degraded (although functional) state after a failure, while the original part is being replaced. Also keep in mind that, these days, redundancy in the cloud might be a viable option.

For this principle, there’s really not much variance per type of Windows Server role. You basically need to make sure that you have multiple servers providing the service, and make sure the other principles are applied.

 

Entanglement – Achieving shared state via spooky action at a distance


Having redundant equipment is required but certainly not sufficient to provide increased availability. Once any meaningful computer system is up and running, it is constantly gathering information and keeping track of it. If you have multiple instances running, they must be “entangled” somehow. That means that the current state of the system should be shared across the multiple instances so it can survive the loss of any individual component without losing that state. It will typically include some complex “spooky action at a distance”, as Einstein famously said of Quantum Mechanics.

A common way to do it is using a database (like SQL Server) to store your state. Every transaction performed by a set of web servers, for instance, could be stored in a common database and any web server can be quickly reprovisioned and connected to the database again. In a similar fashion, you can use Active Directory as a data store, as it’s done by services like DFS Namespaces and Exchange Server (for user mailbox information). Even a file server could serve a similar purpose, providing a location to store files that can be changed at any time and accessed by a set of web servers. If you lose a web server, you can quickly reprovision it and point it to the shared file server.

If using SQL Server to store the shared state, you must also abide by the Redundancy principle by using multiple SQL Servers, which must be entangled as well. One common way to do it is using shared storage. You can wire these servers to a Fibre Channel SAN or an iSCSI SAN or even a file server to store the data. Failover clustering in Windows Server (used by certain deployments of Hyper-V, File Servers and SQL Server, just to name a few) leverages shared storage as a common mechanism for entanglement.

Peeling the onion further, you will need multiple heads of those storage systems and they must also be entangled. Redundancy at the storage layer is commonly achieved by sharing physical disks and writing the data to multiple places. Most SANs have the option of using dual controllers that are connected to a shared set of disks. Every piece of data is stored synchronously to at least two disks (sometimes more). These SANs can tolerate the failure of individual controllers or disks, preserving their shared state without any disruption. In Windows Server 2012, Clustered Storage Spaces provides a simple solution for shared storage for a set of Windows Servers using only Shared SAS disks, without the need for a SAN.

There are other strategies for Entanglement that do not require shared storage, depending on how much and how frequently the state changes. If you have a web site with only static files, you could maintain shared state by simply provisioning multiple IIS servers with the exact same files. Whenever you lose one, simply replace it. For instance, Windows Azure and Virtual Machine Manager provide mechanisms to quickly add/remove instances of web servers in this fashion through the use of a service template.

If the shared state changes, which is often the case for most web sites, you could go up a notch by regularly copying updated files to the servers. You could have a central location with the current version of the shared state (a remote file server, for instance) plus a process to regularly send full updates to any of the nodes every day (either pushed from the central store or pulled by the servers). This is not very efficient for large amounts of data updated frequently, but could be enough if the total amount of data is small or it changes very infrequently. Examples of this strategy include SQL Server Snapshot Replication, DNS full zone transfers or a simple script using ROBOCOPY to copy files on a daily schedule.
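
As a trivial example of that last approach, a nightly scheduled task could run something like the line below (a minimal sketch; the central share and destination folder are hypothetical).

# Mirror the central content store to this web server's local content folder.
# /MIR keeps the copy identical to the source; short retry settings keep a locked file from stalling the run.
ROBOCOPY.EXE \\CENTRAL1\WebContent D:\WebContent /MIR /R:2 /W:5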

In most cases, however, it’s best to employ a mechanism that can cope with more frequently changing state. Going up the scale you could have a system that sends data to its peers every hour or every few minutes, being careful to send only the data that has changed instead of the full set. That is the case for DNS incremental zone transfers, Active Directory Replication, many types of SQL Server Replication, SQL Server Log Shipping, Asynchronous SQL Server Mirroring (High-Performance Mode), SQL Server AlwaysOn Availability Groups (asynchronous-commit mode), DFS Replication and Hyper-V Replica. These models provide systems that are loosely converging, but do not achieve up-to-the-second coherent shared state. However, that is good enough for some scenarios.

At the high end of replication and right before actual shared storage, you have synchronous replication. This provides the ability to update the information on every entangled system before considering the shared state actually changed. This might slow down the overall performance of the system, especially when the connectivity between the peers suffers from latency. However, there’s something to be said of just having a set of nodes with local storage that achieve a coherent shared state using only software. Common examples here include a few types of SAN replication, Exchange Server (Database Availability Groups), Synchronous SQL Mirroring (High Safety Mode) and SQL Server AlwaysOn Availability Groups (synchronous-commit mode).

As you can see, the Entanglement principle can be addressed in a number of different ways depending on the service. Many services, like File Server and SQL Server, provide multiple mechanisms to deal with it, with varying degrees of cost, complexity, performance and coherence.

 

Awareness – Telling if Schrödinger's servers are alive or not


Your work is not done after you have a redundant, entangled system. In order to provide clients with seamless access to your service, you must implement some method to find one of the many sources for the service. The Awareness principle refers to how your clients will discover the location of the access points for your service, ideally with a mechanism to do it quickly while avoiding any failed instances. There are a few different ways to achieve it, including manual configuration, broadcast, DNS, load balancers, or a service-specific method.

One simple method is to statically configure each client with the name or IP address of two or more instances of the service. This method is effective if the configuration of the service is not expected to change. If it ever does change, you would need to reconfigure each client. A common example here is how static DNS is configured: you simply specify the IP address of your preferred DNS server and also the IP address of an alternate DNS server in case the preferred one fails.
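
On Windows 8 and Windows Server 2012, that preferred/alternate pair can be set with the DnsClient PowerShell cmdlets. A minimal sketch (the interface alias and server addresses are placeholders):

# Configure a preferred and an alternate DNS server on the "Ethernet" interface
Set-DnsClientServerAddress -InterfaceAlias "Ethernet" -ServerAddresses ("10.1.1.10","10.1.1.11")

# Verify the configuration
Get-DnsClientServerAddress -InterfaceAlias "Ethernet" -AddressFamily IPv4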

Another common mechanism is to broadcast a request for the service and wait for a response. This mechanism works only if there’s someone in your local network capable of providing an answer. There’s also a concern about the legitimacy of the response, since a rogue system on the network might be used to provide a malicious version of the service. Common examples here include DHCP service requests and Wireless Access Point discovery. It is fairly common to use one service to provide awareness for others. For instance, once you access your Wireless Access Point, you get DHCP service. Once you get DHCP service, you get your DNS configuration from it.

As you know, the most common use for a DNS server is to map a network name to an IP address (using an A, AAAA or CNAME DNS record). That in itself implements a certain level of this awareness principle. DNS can also associate multiple IP addresses with a single name, effectively providing a mechanism to give you a list of servers that provide a specific service. That list is provided by the DNS server in a round robin fashion, so it even includes a certain level of load balancing as part of it. Clients looking for Web Servers and File Servers commonly use this mechanism alone for finding the many devices providing a service.

DNS also provides a different type of record specifically designed for providing service awareness. This is implemented as SRV (Service) records, which not only offer the name and IP address of a host providing a service, but can decorate it with information about priority, weight and port number where the service is provided. This is a simple but remarkably effective way to provide service awareness through DNS, which is effectively a mandatory infrastructure service these days. Active Directory is the best example of using SRV records, using DNS to allow clients to learn information about the location of Domain Controllers and services provided by them, including details about Active Directory site topology.
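
You can see those SRV records for yourself with Resolve-DnsName, available in Windows 8 and Windows Server 2012 (the domain name below is a placeholder):

# Ask DNS which domain controllers offer LDAP for the contoso.com domain
Resolve-DnsName -Name _ldap._tcp.dc._msdcs.contoso.com -Type SRV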

Windows Server failover clustering includes the ability to perform dynamic DNS registrations when creating clustered services. Each cluster role (formerly known as a cluster group) can include a Network Name resource which is registered with DNS when the service is started. Multiple IP addresses can be registered for a given cluster Network  Name if the server has multiple interfaces. In Windows Server 2012, a single cluster role can be active on multiple nodes (that’s the case of a Scale-Out File Server) and the new Distributed Network Name implements this as a DNS name with multiple IP addresses (at least one from each node).
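
If you are curious about which names your cluster is publishing, a quick way to check is to list the Network Name resources from a cluster node. A minimal sketch using the FailoverClusters module:

# List classic and distributed Network Name resources, their owning group and state
Get-ClusterResource | Where-Object { $_.ResourceType -like "*Network Name*" } |
    Format-Table Name, ResourceType, OwnerGroup, State -AutoSize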

DNS does have a few limitations. The main one is the fact that the clients will cache the name/IP information for some time, as specified in the TTL (time to live) for the record. If the service is reconfigured and new addresses or service records are published, DNS clients might take some time to become aware of the change. You can reduce the TTL, but that has a performance impact, causing DNS clients to query the server more frequently. There is no mechanism in DNS to have a server proactively tell a client that a published record has changed. Another issue with DNS is that it provides no method to tell if the service is actually being provided at the moment or even if the server ever functioned properly. It is up to the client to attempt communication and handle failures. Last but not least, DNS cannot help with intelligently balancing clients based on the current load of a server.

Load balancers are the next step in providing awareness. These are network devices that function as an intelligent router of traffic based on a set of rules. If you point your clients to the IP address of the load balancer, that device can intelligently forward the requests to a set of servers. As the name implies, load balancers typically distribute the clients across the servers and can even detect if a certain server is unresponsive, dynamically taking it out of the list. Another concern here is affinity, which is an optimization that consistently forwards a given client to the same server. Since these devices can become a single point of failure, the redundancy principle must be applied here. The most common solution is to have two load balancers in combination with two records in DNS.

SQL Server again uses multiple mechanisms for implementing this principle. DNS name resolution is common, either statically or dynamically using failover clustering Network Name resources. That name is then used as part of the client configuration known as a “Connection String”. Typically, this string will provide the name of a single server providing the SQL Service, along with the database name and authentication details. For instance, a typical connection string would be: "Server=SQLSERV1A; Database=DB301; Integrated Security=True;". For SQL Mirroring, there is a mechanism to provide a second server name in the connection string itself. Here’s an example: "Server=SQLSERV1A; Failover_Partner=SQLSRV1B; Database=DB301; Integrated Security=True;".

Other services provide a specific layer of Awareness, implementing a broker service or client access layer. This is the case of DFS (Distributed File System), which simplifies access to multiple file servers using a unified namespace mechanism. In a similar way, SharePoint web front end servers will abstract the fact that multiple content databases live behind a specific SharePoint farm or site collection. SharePoint Server 2013 goes one step further by implementing a Request Manager service that can even be configured as a Web Server farm placed in front of the main SharePoint web front end farm, with the purpose of routing and throttling incoming requests to improve both performance and availability.

Exchange Server Client Access Servers will query Active Directory to find which Mailbox Server or Database Availability Group contains the mailbox for an incoming client. Remote Desktop Connection Broker (formerly known as Terminal Services Session Broker) is used to provide users with access to Remote Desktop services across a set of servers. All these broker services can typically handle a fair amount of load balancing and be aware of the state of the services behind them. Since they can become single points of failure, they are typically placed behind DNS round robin and/or load balancers.

 

Persistence – The one that is the most adaptable to change will survive


Now that you have redundant, entangled services and clients are aware of them, here comes the greatest challenge in availability: persisting the service in the event of a failure. There are three basic steps to make it happen: server failure detection, failing over to a surviving server (if required) and client reconnection (if required).

Detecting the failure is the first step. It requires a mechanism for aliveness checks, which can be performed by the servers themselves, by a witness service, by the clients accessing the services or a combination of these. For instance, Windows Server failover clustering makes cluster nodes check each other (through network checks), in an effort to determine when a node becomes unresponsive.

Once a failure is detected, for services that work in an active/passive fashion (only one server provides the service and the other remains on standby), a failover is required. This can only be safely achieved automatically if the entanglement is done via Shared Storage or Synchronous Replication, which means that the data from the server that is lost is properly persisted. If using other entanglement methods (like backups or asynchronous replication), an IT Administrator typically has to manually intervene to make sure the proper state is restored before failing over the service. For all active/active solutions, with multiple servers providing the same service all the time, a failover is not required.

Finally, the client might need to reconnect to the service. If the server being used by the client has failed, many services will lose their connections and require intervention. In an ideal scenario, the client will automatically detect (or be notified of) the server failure. Then, because it is aware of other instances of the service, it will automatically connect to a surviving instance, restoring the exact same client state as before the failure. This is how Windows Server 2012 implements failover of file servers through a process called SMB 3.0 Continuous Availability, available for both Classic and Scale-Out file server clusters. The file server cluster goes one step further, providing a Witness Service that will proactively notify SMB 3.0 clients of a server failure and point them to an alternate server, even before current pending requests to the failed server time out.
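
On the file server cluster side, the SMB cmdlets in Windows Server 2012 let you check which clients are currently registered with the Witness Service (a minimal sketch, run on one of the cluster nodes):

# Show the SMB 3.0 clients registered for witness notifications on this cluster
Get-SmbWitnessClient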

File servers might also leverage a combination of DFS Namespaces and DFS Replication that will automatically recover from a failed server situation, with some potential side effects. While the file client will find an alternative file server via DFS Namespaces, the connection state will be lost and need to be reestablished. Another persistence mechanism in the file server is the Offline Files option (also known as Client Side Caching) commonly used with the Folder Redirection feature. This allows you to keep working on local storage while your file server is unavailable, synchronizing again when the server comes back.

For other services, like SQL Server, the client will surface an error to the application indicating that a failover has occurred and the connection has been lost. If the application is properly coded to handle that situation, the end user will be shielded from error messages because the application will simply reconnect to the SQL Server using either the same name (in the case of another server taking over that name), a Failover Partner name (in the case of SQL Server Mirroring) or another instance of SQL Server (in the case of more complex log shipping or replication scenarios).

Clients of Web Servers and other load balanced workloads without any persistent state might be able to simply retry an operation in case of a failure. This might happen automatically or require the end-user to retry the operation manually. This might also be the case of a web front end layer that communicates with a web services layer. Again a savvy programmer could code that front end server to automatically retry web services requests, if they are idempotent.

Another interesting example of client persistence is provided by an Outlook client connecting to an Exchange Server. As we mentioned, Exchange Servers implement a sophisticated method of synchronous replication of mailbox databases between servers, plus a Client Access layer that brokers connections to the right set of mailbox servers. On top of that, the Outlook client will simply continue to work from its cache (using only local storage) if for any reason the server becomes unavailable. Whenever the server comes back online, the client will transparently reconnect and synchronize. The entire process is automated, without any action required during or after the failure from either end users or IT Administrators.

 

Samples of how services implement the REAP principles

 

Now that you have the principles down, let’s look at how the main services we mentioned implement them.

Service | Redundancy | Entanglement | Awareness | Persistence
DHCP, using split scopes | Multiple standalone DHCP Servers | Each server uses its own set of scopes, no replication | Active/Active. Clients find DHCP servers via broadcast (whichever responds first) | DHCP responses are cached. Upon failure, only surviving servers will respond to the broadcast
DHCP, using failover cluster | Multiple DHCP Servers in a failover cluster | Shared block storage (FC, iSCSI, SAS) | Active/Passive. Clients find DHCP servers via broadcast | DHCP responses are cached. Upon failure, failover occurs and a new server responds to broadcasts
DNS, using zone transfers | Multiple standalone DNS Servers | Zone transfers between DNS Servers at regular intervals | Active/Active. Clients configured with IP addresses of Primary and Alternate servers (static or via DHCP) | DNS responses are cached. If a query to the primary DNS server fails, the alternate DNS server is used
DNS, using Active Directory integration | Multiple DNS Servers in a Domain | Active Directory Replication | Active/Active. Clients configured with IP addresses of Primary and Alternate servers (static or via DHCP) | DNS responses are cached. If a query to the primary DNS server fails, the alternate DNS server is used
Active Directory | Multiple Domain Controllers in a Domain | Active Directory Replication | Active/Active. DC Locator service finds the closest Domain Controller using DNS service records | Upon failure, DC Locator service finds a new Domain Controller
File Server, using DFS (Distributed File System) | Multiple file servers, linked through DFS. Multiple DFS servers. | DFS Replication maintains file server data consistency. DFS Namespace links stored in Active Directory. | Active/Active. DFS Namespace used to translate namespace targets into the closest file server | Upon failure of a file server, the client uses an alternate file server target. Upon DFS Namespace failure, an alternate is used
File Server for general use, using failover cluster | Multiple File Servers in a failover cluster | Shared Storage (FC, iSCSI, SAS) | Active/Passive. Name and IP address resources, published to DNS | Failover, SMB Continuous Availability, Witness Service
File Server, using Scale-Out Cluster | Multiple File Servers in a failover cluster | Shared Storage, Cluster Shared Volume (FC, iSCSI, SAS) | Active/Active. Name resource published to DNS (Distributed Network Name) | SMB Continuous Availability, Witness Service
Web Server, static content | Multiple Web Servers | Initial copy only | Active/Active. DNS round robin, load balancer or combination | Client retry
Web Server, file server back-end | Multiple Web Servers | Shared file server back end | Active/Active. DNS round robin, load balancer or combination | Client retry
Web Server, SQL Server back-end | Multiple Web Servers | SQL Server database | Active/Active. DNS round robin, load balancer or combination | Client retry
Hyper-V, failover cluster | Multiple servers in a cluster | Shared Storage (FC, iSCSI, SAS, SMB File Share) | Active/Passive. Clients connect to the IP exposed by the VM | VM restarted upon failure
Hyper-V, Replica | Multiple servers | Replication, per VM | Active/Passive. Clients connect to the IP exposed by the VM | Manual failover (test option available)
SQL Server, Replication | Multiple servers | Replication, per database (several methods) | Active/Active. Clients connect by server name | Application may detect failures and switch servers
SQL Server, Log Shipping | Multiple servers | Log shipping, per database | Active/Passive. Clients connect by server name | Manual failover
SQL Server, Mirroring | Multiple servers, optional witness | Mirroring, per database | Active/Passive. Failover Partner specified in connection string | Automatic failover if synchronous, with witness. Application needs to reconnect
SQL Server, AlwaysOn Failover Cluster Instances | Multiple servers in a cluster | Shared Storage (FC, iSCSI, SAS, SMB File Share) | Active/Passive. Name and IP address resources, published to DNS | Automatic failover. Application needs to reconnect
SQL Server, AlwaysOn Availability Groups | Multiple servers in a cluster | Mirroring, per availability group | Active/Passive. Availability Group listener with a Name and IP address, published to DNS | Automatic failover if using synchronous-commit mode. Application needs to reconnect
SharePoint Server (web front end) | Multiple Servers | SQL Server storage | Active/Active. DNS round robin, load balancer or combination | Client retry
SharePoint Server (request manager) | Multiple Servers | SQL Server storage | Active/Active. Request Manager combined with a load balancer | Client retry
Exchange Server (DAG) with Outlook | Multiple Servers in a cluster | Database Availability Groups (synchronous replication) | Active/Active. Client Access Point (uses AD for Mailbox/DAG information). Names published to DNS | Outlook client goes into cached mode, reconnects

 

Conclusion

 

I hope this post helped you understand the principles behind increasing server availability.

As a final note, please take into consideration that not all services require the highest possible level of availability. This might be an easier decision for certain services like DHCP, DNS and Active Directory, where the additional cost is relatively small and the benefits are sizable. You might want to think twice when increasing the availability of a large backup server, where some hours of down time might be acceptable and the cost of duplicating the infrastructure is significantly higher.

Depending on how much availability your service level agreement states, you might need different types of solutions. We generally measure availability in “nines”, as described in the table below:

Nines | % Availability | Downtime per year | Downtime per week
1 | 90% | ~ 36 days | ~ 16 hours
2 | 99% | ~ 3.6 days | ~ 90 minutes
3 | 99.9% | ~ 8 hours | ~ 10 minutes
4 | 99.99% | ~ 52 minutes | ~ 1 minute
5 | 99.999% | ~ 5 minutes | ~ 6 seconds

You should consider your overall requirements and the related infrastructure investments that would give you the most “nines” per dollar.

SQLIO, PowerShell and storage performance: measuring IOPs, throughput and latency for both local disks and SMB file shares


1. Introduction

 

I have been doing storage-related demos and publishing blogs with some storage performance numbers for a while, and I commonly get questions such as “How do you run these tests?” or “What tools do you use to generate IOs for your demos?”. While it’s always best to use a real workload to test storage, sometimes that is not convenient. So, I very frequently use a free tool from Microsoft to simulate IOs called SQLIO. It’s a small and simple tool that simulates several types of workloads, including common SQL Server ones. And you can apply it to several configurations, from a physical host or virtual machine, using a local disk, a LUN on a SAN, a Storage Space or an SMB file share.

 

2. Download the tool

 

To get started, you need to download and install the SQLIO tool. You can get it from http://www.microsoft.com/en-us/download/details.aspx?id=20163. The setup will install the tool in a folder of your choice. In the end, you really only need one file: SQLIO.EXE. You can copy it to any folder and it runs in pretty much every Windows version, client or server. In this blog post, I assume that you installed SQLIO on the C:\SQLIO folder.

 

3. Prepare a test file

 

Next, you need to create a file in the disk or file share that you will be using for your demo or test.

Ideally, you should create a file as big as possible, so that you can exercise the entire disk. For hard disks, creating a small file causes the head movement to be restricted to a portion of the disk. Unless you’re willing to use only a fraction of the hard disk capacity, these numbers show unrealistically high random IO performance. Storage professionals call this technique “short stroking”. For SANs, small files might end up being entirely cached in the controller RAM, again giving you great numbers that won’t hold true for real deployments. You can actually use SQLIO to measure the difference between using a large file and a small file for your specific configuration.

To create a large file for your test, the easiest way is using the FSUTIL.EXE tool, which is included with all versions of Windows.

For instance, to create a 1TB file on the X: drive, use the following commands from a PowerShell prompt:

FSUTIL.EXE file createnew X:\testfile.dat (1TB)
FSUTIL.EXE file setvaliddata X:\testfile.dat (1TB)

Note 1: You must do this from PowerShell, in order to use the convenient (1TB) notation. If you run this from an old command prompt, you need to calculate 1 terabyte in bytes, which is 1099511627776 (2^40). Before you Storage professionals rush to correct me, I know this is technically incorrect. One terabyte is 10^12 (1000000000000) and 2^40 is actually one Tebibyte (1TiB). However, since both PowerShell and SQLIO use the TB/GB/MB/KB notation when referring to powers of 2, I will ask you to give me a pass here.

Note 2: The “set valid data” command lets you move the “end of file” marker, avoiding a lengthy initialization of the file. This is much faster than writing over the entire file. However, there are security implications for “set valid data” (it might expose leftover data on the disk if you don’t properly initialize the file) and you must be an administrator on the machine to use it.

Here’s another example, with output, using a smaller file size:

PS C:\> FSUTIL.EXE File CreateNew X:\TestFile.DAT (40GB)
File X:\TestFile.DAT is created
PS C:\> FSUTIL.EXE File SetValidData X:\TestFile.DAT (40GB)
Valid data length is changed

 

4. Run the tool

 

With the tool installed and the test file created, you can start running SQLIO.

You also want to make sure there’s nothing else running on the computer, so that other running processes don’t interfere with your results by putting additional load on the CPU, network or storage. If the disk you are using is shared in any way (like a LUN on a SAN), you want to make sure that nothing else is competing with your testing. If you’re using any form of IP storage (iSCSI LUN, SMB file share), you want to make sure that you’re not running on a network congested with other kinds of traffic.

WARNING: You could be generating a whole lot of disk IO, network traffic and/or CPU load when you run SQLIO. If you’re in a shared environment, you might want to talk to your administrator and ask permission. This could generate a whole lot of load and disturb anyone else using other VMs in the same host, other LUNs on the same SAN or other traffic on the same network.

From an old command prompt or a PowerShell prompt, issue a single command line to start getting some performance results. Here is your first example, with output, generating random 8KB reads on that file we just created:

PS C:\> C:\SQLIO\SQLIO.EXE -s10 -kR -frandom -b8 -t8 -o16 -LS -BN X:\TestFile.DAT
sqlio v1.5.SG
using system counter for latency timings, 2337894 counts per second
8 threads reading for 10 secs from file X:\TestFile.DAT
        using 8KB random IOs
        enabling multiple I/Os per thread with 16 outstanding
        buffering set to not use file nor disk caches (as is SQL Server)
using current size: 40960 MB for file: X:\TestFile.DAT
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 36096.60
MBs/sec:   282.00
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 3
Max_Latency(ms): 55
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 30 19 12  8  6  5  4  3  3  2  2  2  1  1  1  1  0  0  0  0  0  0  0  0  0

So, for this specific disk (a simple Storage Space created from a pool of 3 SSDs), I am getting over 36,000 IOPs of 8KB each with an average of 3 milliseconds of  latency (time it takes for the operation to complete, from start to finish). Not bad in terms of IOPS, but the latency for 8KB IOs seems a little high for SSD-based storage. We’ll investigate that later.

Let’s try now another command using sequential 512KB reads on that same file:

PS C:\> C:\SQLIO\SQLIO.EXE -s10 -kR -fsequential -b512 -t2 -o16 -LS -BN X:\TestFile.DAT
sqlio v1.5.SG
using system counter for latency timings, 2337894 counts per second
2 threads reading for 10 secs from file X:\TestFile.DAT
        using 512KB sequential IOs
        enabling multiple I/Os per thread with 16 outstanding
        buffering set to not use file nor disk caches (as is SQL Server)
using current size: 40960 MB for file: X:\TestFile.DAT
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:  1376.09
MBs/sec:   688.04
latency metrics:
Min_Latency(ms): 6
Avg_Latency(ms): 22
Max_Latency(ms): 23
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 33 67  0

I got about 688 MB/sec with an average latency of 22 milliseconds per IO. Again, good throughput, but the latency looks high for SSDs. We’ll dig deeper in a moment.

 

5. Understand the parameters used

 

Now let’s inspect the parameters on those SQLIO command lines. I know it’s a bit overwhelming at first, so we’ll go slow. And keep in mind that, for SQLIO parameters, lowercase and uppercase mean different things, so be careful.

Here is the explanation for the parameters used above:

 

Parameter | Description | Notes
-s | Duration of the test, in seconds | You can use 10 seconds for a quick test. For any serious work, use at least 60 seconds.
-k | R=Read, W=Write | Be careful with using writes on SSDs for a long time. They can wear out the drive.
-f | Random or Sequential | Random is common for OLTP workloads. Sequential is common for Reporting, Data Warehousing.
-b | Size of the IO in KB | 8KB is the typical IO for OLTP workloads. 512KB is common for Reporting, Data Warehousing.
-t | Threads | For large IOs, just a couple is enough. Sometimes just one. For small IOs, you could need as many as the number of CPU cores.
-o | Outstanding IOs or queue depth | In RAID, SAN or Storage Spaces setups, a single disk can be made up of multiple physical disks. You can start with twice the number of physical disks used by the volume where the file sits. Using a higher number will increase your latency, but can get you more IOPs and throughput.
-LS | Capture latency information | Always important to know the average time to complete an IO, end-to-end.
-BN | Do not buffer | This asks for no hardware or software buffering. Buffering plus a small file size will give you performance of the memory, not the disks.

 

For OLTP workloads, I commonly start with 8KB random IOs, 8 threads, 16 outstanding. 8KB is the size of the page used by SQL Server for its data files. In parameter form, that would be: -frandom -b8 -t8 -o16. For reporting or OLAP workloads with large IO, I commonly start with 512KB IOs, 2 threads and 16 outstanding. 512KB is a common IO size when SQL Server loads a batch of 64 data pages when using the read-ahead technique for a table scan. In parameter form, that would be: -fsequential -b512 -t2 -o16. These numbers will need to be adjusted if your machine has many cores and/or if your volume is backed by a large number of physical disks.
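
Putting those two starting points together into full command lines (using the same C:\SQLIO folder and X:\TestFile.DAT test file as before, and a 60-second duration for more meaningful results):

# OLTP-style starting point: 8KB random reads, 8 threads, 16 outstanding IOs per thread
C:\SQLIO\SQLIO.EXE -s60 -kR -frandom -b8 -t8 -o16 -LS -BN X:\TestFile.DAT

# Reporting/Data Warehousing starting point: 512KB sequential reads, 2 threads, 16 outstanding IOs per thread
C:\SQLIO\SQLIO.EXE -s60 -kR -fsequential -b512 -t2 -o16 -LS -BN X:\TestFile.DAT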

If you’re curious, here are more details about parameters for SQLIO, coming from the tool’s help itself:

Usage: D:\sqlio\sqlio.exe [options] [<filename>...]
        [options] may include any of the following:
        -k<R|W>                 kind of IO (R=reads, W=writes)
        -t<threads>             number of threads
        -s<secs>                number of seconds to run
        -d<drv_A><drv_B>..      use same filename on each drive letter given
        -R<drv_A/0>,<drv_B/1>.. raw drive letters/number for I/O
        -f<stripe factor>       stripe size in blocks, random, or sequential
        -p[I]<cpu affinity>     cpu number for affinity (0 based)(I=ideal)
        -a[R[I]]<cpu mask>      cpu mask for (R=roundrobin (I=ideal)) affinity
        -o<#outstanding>        depth to use for completion routines
        -b<io size(KB)>         IO block size in KB
        -i<#IOs/run>            number of IOs per IO run
        -m<[C|S]><#sub-blks>    do multi blk IO (C=copy, S=scatter/gather)
        -L<[S|P][i|]>           latencies from (S=system, P=processor) timer
        -B<[N|Y|H|S]>           set buffering (N=none, Y=all, H=hdwr, S=sfwr)
        -S<#blocks>             start I/Os #blocks into file
        -v1.1.1                 I/Os runs use same blocks, as in version 1.1.1
        -F<paramfile>           read parameters from <paramfile>
Defaults:
        -kR -t1 -s30 -f64 -b2 -i64 -BN testfile.dat
Maximums:
        -t (threads):                   256
        no. of files, includes -d & -R: 256
        filename length:                256

 

6. Tune the parameters for large IO

 

Now that you have the basics down, we can spend some time looking at how you can refine the number of threads and queue depth for your specific configuration. This might help us figure out why we had those higher-than-expected latency numbers in the initial runs. You basically need to experiment with the -t and the -o parameters until you find the combination that gives you the best results.

Let’s start with queue depth. You first want to find out the latency for a given system with a small queue depth, like 1 or 2. For 512KB IOs, here’s what I get from my test disk with a queue depth of 1 and a thread count of 1:

PS C:\> C:\SQLIO\SQLIO.EXE -s10 -kR -fsequential -b512 -t1 -o1 -LS -BN X:\TestFile.DAT
sqlio v1.5.SG
using system counter for latency timings, 2337894 counts per second
1 thread reading for 10 secs from file X:\TestFile.DAT
        using 512KB sequential IOs
        enabling multiple I/Os per thread with 1 outstanding
        buffering set to not use file nor disk caches (as is SQL Server)
using current size: 40960 MB for file: X:\TestFile.DAT
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:   871.00
MBs/sec:   435.50
latency metrics:
Min_Latency(ms): 1
Avg_Latency(ms): 1
Max_Latency(ms): 1
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:  0 100  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

For large IOs, we typically look at the throughput (in MB/sec). With 1 outstanding IO, we are at 435 MB/sec with just 1 millisecond of latency per IO. However, if you don’t queue up some IO, we’re not extracting the full throughput of the disk, since we’ll be processing the data while the disk is idle waiting for more work. Let’s see what happens if we queue up more IOs:

PS C:\> C:\SQLIO\SQLIO.EXE -s10 -kR -fsequential -b512 -t1 -o2 -LS -BN X:\TestFile.DAT
sqlio v1.5.SG
using system counter for latency timings, 2337894 counts per second
1 thread reading for 10 secs from file X:\TestFile.DAT
        using 512KB sequential IOs
        enabling multiple I/Os per thread with 2 outstanding
        buffering set to not use file nor disk caches (as is SQL Server)
using current size: 40960 MB for file: X:\TestFile.DAT
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:  1377.70
MBs/sec:   688.85
latency metrics:
Min_Latency(ms): 1
Avg_Latency(ms): 1
Max_Latency(ms): 2
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:  0 100  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

OK. We are now up to 688 MB/sec with 2 outstanding IOs, and our average latency is still at the same 1 millisecond per IO. You can also see that we now have a max latency of 2 milliseconds, although the histogram shows that most IOs are still taking 1ms. Let’s double it up again to see what happens:

PS C:\> C:\SQLIO\SQLIO.EXE -s10 -kR -fsequential -b512 -t1 -o4 -LS -BN X:\TestFile.DAT
sqlio v1.5.SG
using system counter for latency timings, 2337894 counts per second
1 thread reading for 10 secs from file X:\TestFile.DAT
        using 512KB sequential IOs
        enabling multiple I/Os per thread with 4 outstanding
        buffering set to not use file nor disk caches (as is SQL Server)
using current size: 40960 MB for file: X:\TestFile.DAT
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:  1376.70
MBs/sec:   688.35
latency metrics:
Min_Latency(ms): 1
Avg_Latency(ms): 2
Max_Latency(ms): 3
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:  0  0 67 33  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Well, at a queue depth of 4, we gained nothing (we are still at 688 MB/sec), but our latency is now solid at 2 milliseconds, with 33% of the IOs taking 3 milliseconds. Let’s give it one more bump to see what happens. Trying now 8 outstanding IOs:

PS C:\> C:\SQLIO\SQLIO.EXE -s10 -kR -fsequential -b512 -t1 -o8 -LS -BN X:\TestFile.DAT
sqlio v1.5.SG
using system counter for latency timings, 2337894 counts per second
1 thread reading for 10 secs from file X:\TestFile.DAT
        using 512KB sequential IOs
        enabling multiple I/Os per thread with 8 outstanding
        buffering set to not use file nor disk caches (as is SQL Server)
using current size: 40960 MB for file: X:\TestFile.DAT
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:  1376.50
MBs/sec:   688.25
latency metrics:
Min_Latency(ms): 2
Avg_Latency(ms): 5
Max_Latency(ms): 6
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:  0  0  0  0  0 68 32  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

As you can see, increasing the –o parameter is not helping. After we doubled the queue depth from 4 to 8, there was no improvement in throughput. All we did was more than double our latency to an average of 5 milliseconds, with many IOs taking 6 milliseconds. That’s when you know you’re queueing up too much IO.

So, it seems like 2 outstanding IOs is a reasonable number for this disk. Now we can see if we can gain by spreading this across multiple threads. What we want to avoid here is bottlenecking on a single CPU core, which is very common when doing lots and lots of IO. A simple experiment is to double the number of threads while halving the queue depth. Let’s now try 2 threads with 1 outstanding IO each. This will give us the same 2 outstanding IOs total:

PS C:\> C:\SQLIO\SQLIO.EXE -s10 -kR -fsequential -b512 -t2 -o1 -LS -BN X:\TestFile.DAT
sqlio v1.5.SG
using system counter for latency timings, 2337894 counts per second
2 threads reading for 10 secs from file X:\TestFile.DAT
        using 512KB sequential IOs
        enabling multiple I/Os per thread with 1 outstanding
        buffering set to not use file nor disk caches (as is SQL Server)
using current size: 40960 MB for file: X:\TestFile.DAT
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:  1377.90
MBs/sec:   688.95
latency metrics:
Min_Latency(ms): 1
Avg_Latency(ms): 1
Max_Latency(ms): 2
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:  0 100  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Well, it seems like using two threads here did not buy us anything. We’re still at about the same throughput and latency. That pretty much proves that 1 thread was enough for this kind of configuration and workload. This is not surprising for large IO. However, for smaller IO size, the CPU is more taxed and we might hit a single core bottleneck. Just in case, I looked at the CPU via Task Manager and confirmed we were only using 7% of the CPU and obviously none of the 4 cores were too busy.

 

7. Tune queue depth for small IO

 

Performing the same tuning exercise for small IO is typically more interesting. For this one, we’ll automate things a bit using a little PowerShell scripting to run SQLIO in a loop and parse its output. This way we can try a lot of different options and see which one works best. This might take a while to run, though… Here’s a script that you can run from a PowerShell prompt, trying out many different queue depths:

# Try every queue depth from 1 to 64, running a 10-second SQLIO pass for each
1..64 | % { 
   $o = "-o $_"; 
   $r = C:\SQLIO\SQLIO.EXE -s10 -kR -frandom -b8 $o -t1 -LS -BN X:\testfile.dat
   # Pull IOs/sec, MBs/sec and Avg_Latency out of the SQLIO output
   $i = $r.Split("`n")[10].Split(":")[1].Trim()
   $m = $r.Split("`n")[11].Split(":")[1].Trim()
   $l = $r.Split("`n")[14].Split(":")[1].Trim()
   # One result line per queue depth
   $o + ", " + $i + " iops, " + $m + " MB/sec, " + $l + " ms"
}

The  script basically runs SQLIO 64 times, each time using a different queue depth, from 1 to 64. The results from SQLIO are stored in the $r variable and parsed to show IOPs, throughput and latency on a single line. There is some fun string parsing there, leveraging the Split() function to break the output by line and then again to break each line in half to get the actual numbers. Here’s the sample output from my system:

-o 1, 9446.79 iops, 73.80 MB/sec, 0 ms
-o 2, 15901.80 iops, 124.23 MB/sec, 0 ms
-o 3, 20758.20 iops, 162.17 MB/sec, 0 ms
-o 4, 24021.20 iops, 187.66 MB/sec, 0 ms
-o 5, 26047.90 iops, 203.49 MB/sec, 0 ms
-o 6, 27559.10 iops, 215.30 MB/sec, 0 ms
-o 7, 28666.40 iops, 223.95 MB/sec, 0 ms
-o 8, 29320.90 iops, 229.06 MB/sec, 0 ms
-o 9, 29733.70 iops, 232.29 MB/sec, 0 ms
-o 10, 30337.00 iops, 237.00 MB/sec, 0 ms
-o 11, 30407.50 iops, 237.55 MB/sec, 0 ms
-o 12, 30609.78 iops, 239.13 MB/sec, 0 ms
-o 13, 30843.40 iops, 240.96 MB/sec, 0 ms
-o 14, 31548.50 iops, 246.47 MB/sec, 0 ms
-o 15, 30692.10 iops, 239.78 MB/sec, 0 ms
-o 16, 30810.40 iops, 240.70 MB/sec, 0 ms
-o 17, 31815.00 iops, 248.55 MB/sec, 0 ms
-o 18, 33115.19 iops, 258.71 MB/sec, 0 ms
-o 19, 31290.40 iops, 244.45 MB/sec, 0 ms
-o 20, 32430.40 iops, 253.36 MB/sec, 0 ms
-o 21, 33345.60 iops, 260.51 MB/sec, 0 ms
-o 22, 31634.80 iops, 247.14 MB/sec, 0 ms
-o 23, 31330.50 iops, 244.76 MB/sec, 0 ms
-o 24, 32769.40 iops, 256.01 MB/sec, 0 ms
-o 25, 34264.30 iops, 267.68 MB/sec, 0 ms
-o 26, 31679.00 iops, 247.49 MB/sec, 0 ms
-o 27, 31501.60 iops, 246.10 MB/sec, 0 ms
-o 28, 33259.40 iops, 259.83 MB/sec, 0 ms
-o 29, 33882.30 iops, 264.70 MB/sec, 0 ms
-o 30, 32009.40 iops, 250.07 MB/sec, 0 ms
-o 31, 31518.10 iops, 246.23 MB/sec, 0 ms
-o 32, 33548.30 iops, 262.09 MB/sec, 0 ms
-o 33, 33912.19 iops, 264.93 MB/sec, 0 ms
-o 34, 32640.00 iops, 255.00 MB/sec, 0 ms
-o 35, 31529.30 iops, 246.32 MB/sec, 0 ms
-o 36, 33973.50 iops, 265.41 MB/sec, 0 ms
-o 37, 34174.62 iops, 266.98 MB/sec, 0 ms
-o 38, 32556.50 iops, 254.34 MB/sec, 0 ms
-o 39, 31521.00 iops, 246.25 MB/sec, 0 ms
-o 40, 34337.60 iops, 268.26 MB/sec, 0 ms
-o 41, 34455.00 iops, 269.17 MB/sec, 0 ms
-o 42, 32265.00 iops, 252.07 MB/sec, 0 ms
-o 43, 31681.80 iops, 247.51 MB/sec, 0 ms
-o 44, 34017.69 iops, 265.76 MB/sec, 0 ms
-o 45, 34433.80 iops, 269.01 MB/sec, 0 ms
-o 46, 33213.19 iops, 259.47 MB/sec, 0 ms
-o 47, 31475.20 iops, 245.90 MB/sec, 0 ms
-o 48, 34467.50 iops, 269.27 MB/sec, 0 ms
-o 49, 34529.69 iops, 269.76 MB/sec, 0 ms
-o 50, 33086.19 iops, 258.48 MB/sec, 0 ms
-o 51, 31157.90 iops, 243.42 MB/sec, 1 ms
-o 52, 34075.30 iops, 266.21 MB/sec, 1 ms
-o 53, 34475.90 iops, 269.34 MB/sec, 1 ms
-o 54, 33333.10 iops, 260.41 MB/sec, 1 ms
-o 55, 31437.60 iops, 245.60 MB/sec, 1 ms
-o 56, 34072.69 iops, 266.19 MB/sec, 1 ms
-o 57, 34352.80 iops, 268.38 MB/sec, 1 ms
-o 58, 33524.21 iops, 261.90 MB/sec, 1 ms
-o 59, 31426.10 iops, 245.51 MB/sec, 1 ms
-o 60, 34763.19 iops, 271.58 MB/sec, 1 ms
-o 61, 34418.10 iops, 268.89 MB/sec, 1 ms
-o 62, 33223.19 iops, 259.55 MB/sec, 1 ms
-o 63, 31959.30 iops, 249.68 MB/sec, 1 ms
-o 64, 34760.90 iops, 271.56 MB/sec, 1 ms

As you can see, for small IOs, we got consistently better performance as we increased the queue depth for the first few runs. After 14 outstanding IOs, adding more started giving us very little improvement until things flattened out completely. As we kept adding more queue depth, all we had was more latency with no additional benefit in IOPS or throughput. Here’s that same data on a chart:

[Chart: IOPS, throughput and latency as queue depth increases from 1 to 64]

So, in this setup, we seem to start losing steam at around 10 outstanding IOs. However, I noticed in Task Manager that one core was really busy and our overall CPU utilization was at 40%.

[Screenshot: Task Manager showing one core much busier than the others, with overall CPU utilization around 40%]

In this quad-core system, any overall utilization above 25% could mean there was a core bottleneck when using a single thread. Maybe we can do better with multiple threads. Let’s try increasing the number of threads with a matching reduction of queue depth so we end up with the same number of total outstanding IOs.

$o = 32
$t = 1
While ($o -ge 1) { 
   $pt = "-t $t"; 
   $po = "-o $o"; 
   $r = C:\SQLIO\SQLIO.EXE -s10 -kR -frandom -b8 $po $pt -LS -BN X:\testfile.dat
   $i = $r.Split("`n")[10].Split(":")[1].Trim()
   $m = $r.Split("`n")[11].Split(":")[1].Trim()
   $l = $r.Split("`n")[14].Split(":")[1].Trim()
   $pt + " " + $po + ", " + $i + " iops, " + $m + " MB/sec, " + $l + " ms"
   $o = $o / 2
   $t = $t * 2
}

Here’s the output:

-t 1 -o 32, 32859.30 iops, 256.71 MB/sec, 0 ms
-t 2 -o 16, 35946.30 iops, 280.83 MB/sec, 0 ms
-t 4 -o 8, 35734.80 iops, 279.17 MB/sec, 0 ms
-t 8 -o 4, 35470.69 iops, 277.11 MB/sec, 0 ms
-t 16 -o 2, 35418.60 iops, 276.70 MB/sec, 0 ms
-t 32 -o 1, 35273.60 iops, 275.57 MB/sec, 0 ms

As you can see, in my system, adding a second thread improved things by about 10%, reaching nearly 36,000 IOPS. It seems like we were a bit limited by the performance of a single core. We call that being “core bound”. See below the more even per-core CPU utilization when using 2 threads.

[Screenshot: Task Manager showing more even per-core CPU utilization with 2 threads]

However, 4 threads did not help and the overall CPU utilization was below 50% the whole time. Here’s the full SQLIO.EXE output with my final selected parameters for 8KB random IO in this configuration:

PS C:\> C:\SQLIO\SQLIO.EXE -s10 -kR -frandom -b8 -t2 -o16 -LS -BN X:\TestFile.DAT
sqlio v1.5.SG
using system counter for latency timings, 2337894 counts per second
2 threads reading for 10 secs from file X:\TestFile.DAT
        using 8KB random IOs
        enabling multiple I/Os per thread with 16 outstanding
        buffering set to not use file nor disk caches (as is SQL Server)
using current size: 40960 MB for file: X:\TestFile.DAT
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 35917.90
MBs/sec:   280.60
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 0
Max_Latency(ms): 4
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 66 26  7  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

For systems with more capable storage, it’s easier to get “core bound” and adding more threads can make a much more significant difference. As I mentioned, it’s important to monitor the per-core CPU utilization via Task Manager or Performance Monitor to look out for these bottlenecks.

 

8. Multiple runs are better than one

 

One thing you might have noticed with SQLIO (or any other tool like it) is that the results are not always the same given the same parameters. For instance, one of our “-b8 -t2 -o16” runs yielded 35,946 IOPs while another gave us 35,917 IOPs. How can you tell which one is right? Ideally, once you settle on a specific set of parameters, you should run SQLIO a few times and average out the results. Here’s a sample PowerShell script to do it, using the last set of parameters we used for the 8KB IOs:

$ti=0
$tm=0
$tl=0
$tr=10
1..$tr | % {
   $r = C:\SQLIO\SQLIO.EXE -s10 -kR -frandom -b8 -t2 -o16 -LS -BN X:\TestFile.DAT
   $i = $r.Split("`n")[10].Split(":")[1].Trim()
   $m = $r.Split("`n")[11].Split(":")[1].Trim()
   $l = $r.Split("`n")[14].Split(":")[1].Trim()
   "Run " + $_ + " = " + $i + " IOPs, " + $m + " MB/sec, " + $l + " ms"
   $ti = $ti + $i
   $tm = $tm + $m
   $tl = $tl + $l
}
$ai = $ti / $tr
$am = $tm / $tr
$al = $tl / $tr
"Average = " + $ai + " IOPs, " + $am + " MB/sec, " + $al + " ms"

The script essentially runs SQLIO that number of times, totalling the numbers for IOPs, throughput and latency, so it can show an average at the end. The $tr variable represents the total number of runs desired. Variables starting with $t hold the totals. Variables starting with $a hold averages. Here’s a sample output:

Run 1 = 36027.40 IOPs, 281.46 MB/sec, 0 ms
Run 2 = 35929.80 IOPs, 280.70 MB/sec, 0 ms
Run 3 = 35955.90 IOPs, 280.90 MB/sec, 0 ms
Run 4 = 35963.30 IOPs, 280.96 MB/sec, 0 ms
Run 5 = 35944.19 IOPs, 280.81 MB/sec, 0 ms
Run 6 = 35903.60 IOPs, 280.49 MB/sec, 0 ms
Run 7 = 35922.60 IOPs, 280.64 MB/sec, 0 ms
Run 8 = 35949.19 IOPs, 280.85 MB/sec, 0 ms
Run 9 = 35979.30 IOPs, 281.08 MB/sec, 0 ms
Run 10 = 35921.60 IOPs, 280.63 MB/sec, 0 ms
Average = 35949.688 IOPs, 280.852 MB/sec, 0 ms

As you can see, there’s a bit of variance there and it’s always a good idea to capture multiple runs. You might want to run each iteration for a longer time, like 60 seconds each.

 

9. Performance Monitor

 

Performance Monitor is a tool built into Windows (client and server) that shows specific performance information for several components of the system. For local storage, you can look into details about the performance of physical disks, logical disks and Hyper-V virtual disks. For remote storage you can inspect networking, SMB file shares and much more. In any case, you want to keep an eye on your processors, as a whole and per core.

Here are a few counters we can inspect, for instance, while running that random 8KB IO workload we just finished investigating:

 

Counter Set  | Counter                  | Instance                   | Notes
Logical Disk | Avg. Disk Bytes/Transfer | Specific disk and/or Total | Average IO size
Logical Disk | Avg. Disk Queue Length   | Specific disk and/or Total | Average queue depth
Logical Disk | Avg. Disk sec/Transfer   | Specific disk and/or Total | Average latency
Logical Disk | Disk Bytes/sec           | Specific disk and/or Total | Throughput
Logical Disk | Disk Transfers/sec       | Specific disk and/or Total | IOPs
Processor    | % Processor Time         | Specific core and/or Total | Total CPU utilization
Processor    | % Privileged Time        | Specific core and/or Total | CPU used by privileged system services
Processor    | % Interrupt Time         | Specific core and/or Total | CPU used to handle hardware interrupts
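If you prefer to capture these counters from a script instead of the Performance Monitor UI, a minimal Get-Counter sketch like the one below should do it. The X: instance name assumes the test volume used in the SQLIO examples; swap _Total for a core number if you want to watch individual cores:

# Sample the disk and processor counters from the table above while SQLIO runs
$counters = "\LogicalDisk(X:)\Avg. Disk Bytes/Transfer",
            "\LogicalDisk(X:)\Avg. Disk Queue Length",
            "\LogicalDisk(X:)\Avg. Disk sec/Transfer",
            "\LogicalDisk(X:)\Disk Bytes/sec",
            "\LogicalDisk(X:)\Disk Transfers/sec",
            "\Processor(_Total)\% Processor Time",
            "\Processor(_Total)\% Privileged Time",
            "\Processor(_Total)\% Interrupt Time"
Get-Counter -Counter $counters -SampleInterval 1 -MaxSamples 10 |
    ForEach-Object { $_.CounterSamples | Select-Object Path, CookedValue }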

 

Performance Monitor defaults to a line graph view, but I personally prefer to use the report view (you can get to it from the line chart view by pressing CTRL-G twice). Here’s an example of what I see for my test system while running “C:\SQLIO\SQLIO.EXE -s10 -kR -frandom -b8 -t2 -o16 -LS -BN X:\TestFile.DAT”.

[Screenshot: Performance Monitor report view while running the 8KB random IO workload]

Note 1: The disk counters here show plain byte counts, while SQLIO’s units are binary. What SQLIO calls 8KB shows here as 8,192 bytes (8 x 1,024), and the 282.49 MB/sec shows as 296,207,602 bytes/sec (roughly 282.49 x 1,048,576). So, for those concerned with the difference between a megabyte (MB) and a mebibyte (MiB), there’s some more food for thought and debate.

Note 2: Performance Monitor, by default, updates the information once every second and you will sometimes see numbers that are slightly higher or slightly lower than the SQLIO full run average.

 

10. SQLIO and SMB file shares

 

You can use SQLIO to get the same type of performance information for SMB file shares. It is as simple as mapping the file share to a drive letter using the old “NET USE” command or the new PowerShell cmdlet “New-SmbMapping”, or you can point SQLIO at a UNC path directly instead of using a drive letter.
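For reference, here is a minimal sketch of both mapping options. The Y: drive letter is just an arbitrary choice, and the \\FSC5-D\X$ server and share names are the ones used in the examples below:

# Map the share to a drive letter with the classic command...
NET USE Y: \\FSC5-D\X$

# ...or with the new SMB PowerShell cmdlet
New-SmbMapping -LocalPath Y: -RemotePath \\FSC5-D\X$

# Then point SQLIO at the mapped drive (or skip the mapping and use the UNC path)
C:\SQLIO\SQLIO.EXE -s10 -kR -frandom -b8 -t2 -o16 -LS -BN Y:\TestFile.DAT

Here are a couple of examples using the UNC path directly: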

PS C:\> C:\SQLIO\SQLIO.EXE -s10 -kR -fsequential -b512 -t1 -o3 -LS -BN \\FSC5-D\X$\TestFile.DAT
sqlio v1.5.SG
using system counter for latency timings, 2337892 counts per second
1 thread reading for 10 secs from file \\FSC5-D\X$\TestFile.DAT
        using 512KB sequential IOs
        enabling multiple I/Os per thread with 3 outstanding
        buffering set to not use file nor disk caches (as is SQL Server)
using current size: 40960 MB for file: \\FSC5-D\X$\TestFile.DAT
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:  1376.40
MBs/sec:   688.20
latency metrics:
Min_Latency(ms): 2
Avg_Latency(ms): 2
Max_Latency(ms): 3
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:  0  0 100  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Notice I bumped up the queue depth a bit to get the same throughput as we were getting on the local disk. We’re at 2 milliseconds of latency here. As you can probably tell, this SMB configuration is using an RDMA network interface.

PS C:\> C:\SQLIO\SQLIO.EXE -s10 -kR -frandom -b8 -t2 -o24 -LS -BN \\FSC5-D\X$\TestFile.DAT
sqlio v1.5.SG
using system counter for latency timings, 2337892 counts per second
2 threads reading for 10 secs from file \\FSC5-D\X$\TestFile.DAT
        using 8KB random IOs
        enabling multiple I/Os per thread with 24 outstanding
        buffering set to not use file nor disk caches (as is SQL Server)
using current size: 40960 MB for file: \\FSC5-D\X$\TestFile.DAT
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 34020.69
MBs/sec:   265.78
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 0
Max_Latency(ms): 6
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 44 33 15  6  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Again I increased the queue depth a bit to get the best IOPs in this configuration. This is also close to the local performance and average latency is still under 1 millisecond.

 

11. Performance Monitor and SMB shares

 

When using Performance Monitor to look at SMB Shares, you should use the “SMB Client Shares” set of performance counters. Here are the main counters to watch:

 

Counter Set       | Counter                 | Instance                    | Notes
SMB Client Shares | Avg. Data Bytes/Request | Specific share and/or Total | Average IO size
SMB Client Shares | Avg. Data Queue Length  | Specific share and/or Total | Average queue depth
SMB Client Shares | Avg. Sec/Data Request   | Specific share and/or Total | Average latency
SMB Client Shares | Data Bytes/sec          | Specific share and/or Total | Throughput
SMB Client Shares | Data Requests/sec       | Specific share and/or Total | IOPs
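As with the Logical Disk counters, these can be sampled from PowerShell as well. A minimal sketch follows; the wildcard picks up whichever share instances are active, including the \\FSC5-D\X$ share from the examples above:

# Sample the SMB Client Shares counters for all active share instances
$counters = "\SMB Client Shares(*)\Avg. Data Bytes/Request",
            "\SMB Client Shares(*)\Avg. Data Queue Length",
            "\SMB Client Shares(*)\Avg. Sec/Data Request",
            "\SMB Client Shares(*)\Data Bytes/sec",
            "\SMB Client Shares(*)\Data Requests/sec"
Get-Counter -Counter $counters -SampleInterval 1 -MaxSamples 10 |
    ForEach-Object { $_.CounterSamples | Select-Object Path, CookedValue }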

 

Also, here is a view of Performance Monitor while running the random 8KB workload shown above:

[Screenshot: Performance Monitor report view of the SMB Client Shares counters during the 8KB random IO run]

 

12. Conclusion

 

I hope you have learned how to use SQLIO to perform some storage testing of your own. I encourage you to use it to look at the performance of the new features in Windows Server 2012, like Storage Spaces and SMB 3.0. Let me know if you were able to try it out and feel free to share some of your experiments via blog comments.

Q and A: Is it possible to run SMB Direct from within a VM?


Question received via blog mail:

Jose-

I picked up a couple ConnectX-2 adapters and a cable off of ebay for cheap (about $300 for everything) to test out SMB Direct.  I followed your blog "Deploying Windows Server 2012 with SMB Direct (SMB over RDMA) and the Mellanox ConnectX-2/ConnectX-3 using InfiniBand – Step by Step" and got it working.  It sure is fast and easy to setup! 

Another technology I was looking to explore was SR-IOV in Hyper-V.  When I created the virtual switch using the HCA and enabled single-root IO on it, SMB Direct no longer worked from the host.  Are these two technologies (SMB Direct and SR-IOV) mutually exclusive?  The Get-SmbServerNetworkInterface does not show an RDMA capable interface after enabling the virtual switch. 

I was hoping SMB Direct would work from within a virtual machine.  More specifically I was hoping I'd be able to utilize network direct/NDKPI from within the VM and I was using SMB Direct to verify if this was possible or not. 

Long story short, is it possible to run SMB Direct from within a VM?

 

Answer:

Those are known limitations of RDMA and SMB Direct. If you enable SR-IOV for the NIC, you lose the RDMA capabilities. If you team the RDMA NICs, you lose the RDMA capabilities. If you connect the RDMA NIC to the virtual switch, you lose the RDMA capabilities.

Essentially, SMB needs to have a direct line of sight to the RDMA hardware to do its magic. If you include any additional layers in between, we can no longer program the NIC for RDMA.

If you want to use RDMA in your Hyper-V over SMB configuration, you need to have a NIC (or two for fault tolerance) used for RDMA and a NIC (or two for fault tolerance) that you connect to the virtual switch.
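To check whether SMB still has that direct line of sight after a configuration change, the SMB and networking cmdlets already mentioned in the question are a quick way to verify it. A minimal sketch:

# On the file server: does SMB see any RDMA-capable interfaces?
Get-SmbServerNetworkInterface | Where-Object RdmaCapable

# On the Hyper-V host (SMB client side): same check from the client's point of view
Get-SmbClientNetworkInterface | Where-Object RdmaCapable

# At the NIC level: is RDMA enabled and operational on each adapter?
Get-NetAdapterRdma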

More details at http://blogs.technet.com/b/josebda/archive/2013/02/04/hyper-v-over-smb-performance-considerations.aspx

Q and A: Can I use SMB3 storage without RDMA?


Question received via e-mail:

Is it practical to use SMB3 storage without RDMA? Is there a use case for it in production, rather than just development or test?
I thought RDMA would be essential for a production deployment of Hyper-V over SMB storage.

Answer:

RDMA is not a requirement for the Hyper-V over SMB scenario.
The most important things that RDMA can give you are lower latency and lower CPU utilization.

To give you an idea, without RDMA, I was able to keep a single 10GbE port busy in a 16-core/2-socket Romley system using a little over 10% of the CPU.
For many, using 10% of the CPU is OK in this case. With RDMA, it dropped to less than 5% of the CPU.
Those savings become much more important if you are using very high bandwidth, like multiple 10GbE ports, 40GbE (Ethernet) or 54Gb InfiniBand.
In those cases, without RDMA, you could end up using much more of your CPU just to do network IO. Not good.

To make a better estimate of your requirements, you need to consider:

  • Number of VMs per host
  • Number of virtual processors per VM
  • Average number of IOs per VM
  • Average size of the IOs from the VM
  • Number of physical cores and sockets per host
  • Physical network configuration (type/speed/count of ports)

With those numbers we can estimate the expected load on the CPU and on the network, and how important using RDMA would be.
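As a purely illustrative sketch (the VM count, IOPs and IO size below are made-up numbers, not recommendations), here is how those inputs translate into a rough estimate of the load on the SMB network:

# Hypothetical inputs - replace with your own numbers
$VMsPerHost  = 20
$IOsPerVM    = 500       # average IOs per second, per VM
$IOSizeBytes = 32KB      # average IO size

# Expected aggregate load generated by the host
$TotalIOPs  = $VMsPerHost * $IOsPerVM
$TotalBytes = $TotalIOPs * $IOSizeBytes
"{0:N0} IOPs, {1:N0} MB/sec, roughly {2:N1} Gbps on the wire" -f $TotalIOPs, ($TotalBytes / 1MB), ($TotalBytes * 8 / 1e9)

At these sample numbers, a single 10GbE port handles the load comfortably. Scale the VM count or IO sizes up and the case for RDMA (and for more or faster ports) gets stronger.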
 
