SQL Server parallelism optimization

The performances of an SQL Server database instance can often be improved by small tweaks. They might also be hindered by simple oversights. In fact, some SQL Server parallelism settings have suboptimal default values. Moreover, they need to be re-tuned after an hardware upgrade. Other issues might arise from naive hardware configurations, especially when SQL Server is hosted on a virtual machine (VM).

In particular, it is beneficial to customize these instance settings:
  • maximum degree of parallelism (MaxDoP)
  • cost threshold for parallelism (CTFP)

Starting from BigFix 10.0.2, you can use the BESAdmin command /checksqlserverparallelism to check if the MaxDoP and CTFP settings of your database instance are configured appropriately, and to detect other issues described below.

SQL Server recommended MaxDoP values

The latest Microsoft recommendations on MaxDoP are available here.

Server configuration Number of processors SQL Server 2008-2014 (10.x-12.x) SQL Server 2016 (13.x) and newer
Server with single NUMA node Less than or equal to 8 logical processors Keep MAXDOP at or below # of logical processor Keep MAXDOP at or below # of logical processors
Server with single NUMA node Bigger than 8 logical processors Keep MAXDOP at 8 Keep MAXDOP at 8
Server with multiple* NUMA nodes Less than or equal to 8 logical processors per NUMA node Keep MAXDOP at or below # of logical processors per NUMA node Keep MAXDOP at or below # of logical processors per NUMA node
Server with multiple* NUMA nodes Bigger than 8 logical processors per NUMA node Keep MAXDOP at 8 Keep MAXDOP at half the number of logical processors per NUMA node with a MAX value of 16
* The number of NUMA nodes refers to the quantity of software NUMA (soft-NUMA) nodes, if the soft-NUMA feature is enabled, and to the total hardware NUMA nodes otherwise.

On SQL Server 2016 (13.x) and higher versions, the soft-NUMA feature is enabled by default and set to automatically split hardware NUMA nodes with more than 8 logical processors into smaller soft-NUMA nodes.The soft-NUMA feature can be configured so that soft-NUMA nodes are created manually, or it can be completely disabled. More info here.

MaxDoP changes do not require a restart in SQL Server 2012 and 2019.

SQL Server recommended CTFP values

Microsoft does not provide official recommendations on setting the CTFP.

A common suggestion is setting it to a value between 15 and 50, with the understanding that the best value for it depends on the workload.

For the DB workload generated by the BigFix Server, we have verified that setting CTFP to 50 yields better performance than leaving it to 5, so 50 is our current recommendation.

CTFP changes do not require a restart in SQL Server 2012 and higher versions such as 2019.

Troubleshooting scenario 1: Under-utilization of licensed cores in a VM

Due to licensing restriction, SQL Server may not be able to use all available CPU cores.

In particular, the license of some SQL Server editions (Express, Web and Standard) is "limited to lesser of n sockets or m cores".

For example, a database instance of SQL Server 2019 Express Edition is "limited to lesser of 1 socket or 4 cores" for its maximum compute capacity, as stated here. Because of that, this edition of SQL Server can only use 4 cores on the same socket.

This limitation may lead to unexpected issues when SQL Server is installed on a virtual machine. In fact, a common VM configuration (i.e. using many virtual sockets with few cores per socket) can severely limit the number of cores that SQL Server can use, due to how the SQL Server license applies.

In our example, SQL Server 2019 Express can use up to 4 cores, but if it is installed on a VM with 4 cores and 4 sockets (1 core per socket), it will only be able to use a single core.

In another example, the SQL Server 2019 Web license lets you use "the lesser of 4 sockets or 16 cores".

If your VM has 16 (virtual) sockets and 1 core per socket, you can only use 4 cores out of 16. That is, you will use 4 sockets with 1 core each.

However, SQL Server will be able to use all cores if you change your VM CPU configuration to use, for example, 4 sockets and 4 cores per socket.

When creating a new VM, carefully carefully choose to the number of CPU sockets as well as the number of total CPU cores.

If you are using the VMware vSphere Client, when creating a VM or editing its settings, you can expand the "CPU" menu of the "Virtual Hardware" tab to configure the number of "Cores per Socket", which determines the number of sockets.

To detect if SQL Server is running in a VM, you can run this query and check if virtual_machine_type >= 1.
SELECT virtual_machine_type
FROM sys.dm_os_sys_info
To detect how many cores (logical processors) SQL Server can use, run this query.
select COUNT(*) AS sqlUsedLogicProcs 
from sys.dm_os_schedulers 
where status = 'VISIBLE ONLINE'
The total number of logical processors that SQL Server can see (but not necessarily use) is returned by this query
SELECT cpu_count AS LogicalCpuCount
FROM sys.dm_os_sys_info
To query the active SQL Server license (edition)
SELECT SERVERPROPERTY( 'edition' )

Here is the complete list of returned values (ignore the Azure ones).

Troubleshooting scenario 2: Uneven distribution of used cores

Even when SQL Server can use all licensed cores, performances might not be optimal.

Ideally, SQL Server should be licensed to use all cores on the (virtual or physical) hardware of the computer.

If SQL Server cannot use all cores on the computer, the impact on performance will be smaller when the cores it can use are evenly distributed among the hardware NUMA nodes of that computer. If SQL Server can only use a small portion of the available cores, which is far from ideal, it will only be able to use cores on the NUMA nodes hosting the n sockets that it is licensed to use.

On physical hardware, the number of hardware NUMA nodes usually matches the number of sockets, or, less commonly, is a multiple of it, for example a physical socket can contain one or more NUMA nodes.

On virtual hardware, the opposite can happen. The number of hardware NUMA nodes can be smaller than the number sockets, i.e. multiple sockets can be part of the same hardware NUMA node.

On Windows, you can use Resource monitor (resmon.exe) to check the number of hardware NUMA nodes on your computer.

In the "CPU" tab, the panel on the right should have a graph for each NUMA node and CPU processor.

If it only shows CPU graphs, that means there is only one NUMA node hosting all CPUs.

Alternatively, this query should return the total number of hardware NUMA nodes on the computer hosting SQL Server.
select COUNT( DISTINCT memory_node_id ) as hwNumaNodes 
from sys.dm_os_memory_nodes 
where memory_node_id <> 64

In SQL Server >= 2016, the automatic soft-NUMA feature will split (virtual or physical) hardware NUMA nodes with more than 8 cores into multiple soft-NUMA nodes. This does not necessarily solve the performance degradation caused by the used cores being unevenly assigned to different hardware NUMA nodes, and may in fact only mask it.

You can use this query to detect the number of logical processors used on the used (software or hardware) NUMA nodes.
select COUNT(*) as usedNumaNodes, 
MIN(online_scheduler_count) as minUsedLogicProcsPerNumaNode, 
MAX(online_scheduler_count) as maxUsedLogicProcsPerNumaNode 
from sys.dm_os_nodes 
where online_scheduler_count > 0 and node_state_desc not like '%DAC%'

An example scenario is SQL Server Web on a computer with 2 sockets and 20 cores (10 per socket).

Remember that SQL Server Web can use the lesser of 4 sockets or 16 cores.

In this setup, it will be able to use all the 16 _licensed_ cores, out of the total 20 provided by the system.

However, how the 16 used cores are chosen among the total 20 can make a difference in performances.

Assuming there is a NUMA node for each socket, the used cores could be unevenly distributed, like this:

  • 10 cores in NUMA node 0
  • 6 cores in NUMA node 1

A better distribution of the used cores would be:

  • 8 cores in NUMA node 0
  • 8 cores in NUMA node 1
The distribution of used cores between NUMA nodes depends on how the CPU affinity mask is set. It can be changed using the command
ALTER SERVER
CONFIGURATION SET PROCESS AFFINITY CPU

https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-server-configuration-transact-sql?view=sql-server-2017#Affinity

Gathering additional information

To gather additional information, you use BESAdmin and pass the /extrainfo flag to the /checksqlserverparallelism command.

Also, you can inspect the SQL Server logs to extract useful details.

If no output is returned, the desired log lines might have been removed by the log rotation.

This query inspects the logs looking for the total number of sockets and cores, and the quantity of cores used in accordance with the SQL Server license.
SET NOCOUNT ON; 
DECLARE @logData TABLE( LogDate DATETIME, ProcInfo NVARCHAR(64), LogText NVARCHAR(1024) ); 
INSERT INTO @logData 
EXEC sys.xp_readerrorlog 0, 1, N'SQL Server detected ', N' socket', null, null, N'DESC'; 
SELECT TOP 1 [LogText] 
FROM @logData;

Example output:

SQL Server detected 1 sockets with 2 cores per socket and 2 logical processors per socket, 2 total logical processors; using 2 logical processors based on SQL Server licensing. This is an informational message; no user action is required.

This query inspects the logs looking for the CPU mask used to choose the CPU cores to use on each NUMA node.
SET NOCOUNT ON; 
DECLARE @logData TABLE( LogDate DATETIME, ProcInfo NVARCHAR(64), LogText NVARCHAR(MAX) ); 
INSERT INTO @logData 
EXEC sys.xp_readerrorlog 0, 1, N'Node configuration: ', N' CPU mask: ', null, null, N'DESC'; 
SELECT [LogText] 
FROM @logData;
LogText

Example output:

Node configuration: node 0: CPU mask: 0x0000000000000003:0 Active CPU mask: 0x0000000000000003:0. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.

If the CPU mask was set manually, you can view it using SQL Server Management Studio.

Right-click on your DB instance, click Properties, click Processors.