SQL Server parallelism optimization

The performance of an SQL Server database instance can often be improved by small tweaks. Performance might also be hindered by simple oversights. In fact, some SQL Server parallelism settings have suboptimal default values. Moreover, they have to be re-tuned after an hardware upgrade. Other issues might arise from inadvertent hardware configurations, especially when SQL Server is hosted on a virtual machine (VM).

In particular, it is beneficial to customize these instance settings:
  • maximum degree of parallelism (MaxDoP)
  • cost threshold for parallelism (CTFP)

Starting from BigFix 10.0.2, you can use the /checksqlserverparallelism BESAdmin command to check if the MaxDoP and CTFP settings of your database instance are configured appropriately, and to detect other issues described later.

SQL Server MaxDoP values for best performance

Microsoft recommends MaxDoP settings in Configure the max degree of parallelism Server Configuration Option.

Server configuration Number of processors SQL Server 2008-2014 (10.x-12.x) SQL Server 2016 (13.x) and newer
Server with single NUMA node Fewer than or equal to 8 logical processors Keep MAXDOP at or below the number of logical processor Keep MAXDOP at or below the number of logical processors
Server with single NUMA node More than 8 logical processors Keep MAXDOP at 8 Keep MAXDOP at 8
Server with multiple* NUMA nodes Fewer than or equal to 8 logical processors per NUMA node Keep MAXDOP at or below the number of logical processors per NUMA node Keep MAXDOP at or below the number of logical processors per NUMA node
Server with multiple* NUMA nodes More than 8 logical processors per NUMA node Keep MAXDOP at 8 Keep MAXDOP at half the number of logical processors per NUMA node with a MAX value of 16
* The number of NUMA nodes refers to the quantity of software NUMA (soft-NUMA) nodes, if the soft-NUMA feature is enabled, and to the total hardware NUMA nodes otherwise.

On SQL Server 2016 (13.x) and later versions, the soft-NUMA feature is enabled by default and set to automatically split hardware NUMA nodes with more than 8 logical processors into smaller soft-NUMA nodes.The soft-NUMA feature can be configured so that soft-NUMA nodes are created manually, or it can be completely disabled. For more information, refer to Soft-NUMA (SQL Server).

MaxDoP changes do not require that you restart the SQL Server 2012 or 2019.

SQL Server CTFP values for best performance

Microsoft does not provide recommendations for setting the CTFP.

A common suggestion is setting it to a value between 15 and 50, with the understanding that the best value for it depends on the workload.

For the database workload that the BigFix Server generates, test results show that setting CTFP to 50 yields better performance than leaving it at 5.

CTFP changes do not require that you restart SQL Server 2012 and later versions such as 2019.

Troubleshooting scenario 1: Under-utilization of licensed cores in a VM

Because of a licensing restriction, SQL Server might not be able to use all available CPU cores.

In particular, the license of some SQL Server editions (Express, Web and Standard) is "limited to the lesser of n sockets or m cores."

For example, a database instance of SQL Server 2019 Express Edition is "limited to the lesser of 1 socket or 4 cores" for its maximum compute capacity. See the Scale Limits section of Editions and supported features of SQL Server 2019 (15.x). Because of that, this edition of SQL Server can use only 4 cores on the same socket.

This limitation might lead to unexpected issues when SQL Server is installed on a virtual machine. In fact, using a common VM configuration (that uses many virtual sockets with few cores per socket) can severely limit the number of cores that SQL Server can use, because of the SQL Server license limitations.

In the example, SQL Server 2019 Express can use up to 4 cores, but if it is installed on a VM with 4 cores and 4 sockets (1 core per socket), it can use a single core.

In another example, the SQL Server 2019 Web license lets you use "the lesser of 4 sockets or 16 cores".

If your VM has 16 (virtual) sockets and 1 core per socket, you can only use 4 cores out of 16. That is, you will use 4 sockets with 1 core each.

However, SQL Server can use all cores if you change your VM CPU configuration to use, for example, 4 sockets and 4 cores per socket.

When you create a new VM, carefully choose the number of CPU sockets and the number of total CPU cores.

If you are using the VMware vSphere Client, when creating a VM or editing its settings, you can expand the "CPU" menu of the "Virtual Hardware" tab to configure the number of "cores per socket", which determines the number of sockets.

To detect whether SQL Server is running in a VM, you can run this query and check for the following result:

virtual_machine_type >= 1.
SELECT virtual_machine_type
FROM sys.dm_os_sys_info
To detect how many cores (logical processors) SQL Server can use, run this query:
select COUNT(*) AS sqlUsedLogicProcs 
from sys.dm_os_schedulers 
where status = 'VISIBLE ONLINE'
The total number of logical processors that SQL Server can detect (but not necessarily use) is returned by this query:
SELECT cpu_count AS LogicalCpuCount
FROM sys.dm_os_sys_info
To find the active SQL Server license (edition), run this query:
SELECT SERVERPROPERTY( 'edition' )

Microsoft provides complete list of returned values in SERVERPROPERTY (Transact-SQL). Ignore the Azure values.

Troubleshooting scenario 2: Uneven distribution of used cores

Even when SQL Server can use all licensed cores, performances might not be optimal.

Ideally, SQL Server is licensed to use all cores on the (virtual or physical) hardware of the computer.

If SQL Server cannot use all cores on the computer, the impact on performance is smaller when the cores it can use are evenly distributed among the hardware NUMA nodes of that computer. If SQL Server can use only a small portion of the available cores, which is not ideal, it can use only the cores on the NUMA nodes that host the n sockets that it is licensed to use.

On physical hardware, the number of hardware NUMA nodes usually matches the number of sockets, or, less commonly, is a multiple of it. For example, a physical socket can contain one or more NUMA nodes.

On virtual hardware, the opposite can happen. The number of hardware NUMA nodes can be smaller than the number sockets. That is, multiple sockets can be part of the same hardware NUMA node.

On Windows, you can use Resource monitor (resmon.exe) to check the number of hardware NUMA nodes on your computer.

In the "CPU" tab, the panel on the right shows a graph for each NUMA node and CPU processor.

If the panel shows only CPU graphs, that means there is only one NUMA node that hosts all CPUs.

Alternatively, the following query returns the total number of hardware NUMA nodes on the computer that hosts SQL Server:
select COUNT( DISTINCT memory_node_id ) as hwNumaNodes 
from sys.dm_os_memory_nodes 
where memory_node_id <> 64

In SQL Server 2016 and later, the automatic soft-NUMA feature splits virtual or physical hardware NUMA nodes with more than 8 cores into multiple soft-NUMA nodes. This split does not necessarily solve the performance degradation that unevenly assigned cores to different hardware NUMA nodes causes; it might in fact only mask it.

You can use this query to detect the number of logical processors that are used on the software or hardware NUMA nodes in use:
select COUNT(*) as usedNumaNodes, 
MIN(online_scheduler_count) as minUsedLogicProcsPerNumaNode, 
MAX(online_scheduler_count) as maxUsedLogicProcsPerNumaNode 
from sys.dm_os_nodes 
where online_scheduler_count > 0 and node_state_desc not like '%DAC%'

An example scenario is SQL Server Web on a computer with 2 sockets and 20 cores (10 per socket).

Remember: SQL Server Web can use the lesser of 4 sockets or 16 cores.

In this setup, SQL Server Web can use all the 16 licensed cores out of the total 20 that the system provides.

However, how the 16 used cores are chosen among the total of 20 can make a difference in performance.

Assuming there is a NUMA node for each socket, the used cores might be unevenly distributed, like this example:

  • 10 cores in NUMA node 0
  • 6 cores in NUMA node 1

This example shows a better distribution of the used cores:

  • 8 cores in NUMA node 0
  • 8 cores in NUMA node 1
The distribution of used cores between NUMA nodes depends on how the CPU affinity mask is set. It can be changed using this command:
ALTER SERVER
CONFIGURATION SET PROCESS AFFINITY CPU

For more information, refer to the "Setting process affinity" section of this Microsoft article: ALTER SERVER CONFIGURATION (Transact-SQL)

Gathering additional information

To gather additional information, you use BESAdmin and pass the /extrainfo flag to the /checksqlserverparallelism command.

Also, you can inspect the SQL Server logs to extract useful details.

If no output is returned, the log lines of interest might be deleted by the log rotation.

This query inspects the logs and looks for the total number of sockets and cores and the quantity of cores that is used in accordance with the SQL Server license:
SET NOCOUNT ON; 
DECLARE @logData TABLE( LogDate DATETIME, ProcInfo NVARCHAR(64), LogText NVARCHAR(1024) ); 
INSERT INTO @logData 
EXEC sys.xp_readerrorlog 0, 1, N'SQL Server detected ', N' socket', null, null, N'DESC'; 
SELECT TOP 1 [LogText] 
FROM @logData;

Example output:

SQL Server detected 1 sockets with 2 cores per socket and 2 logical processors per socket, 2 total logical processors; using 2 logical processors based on SQL Server licensing. This is an informational message; no user action is required.

This query inspects the logs and looks for the CPU mask that is used to choose the CPU cores to use on each NUMA node:
SET NOCOUNT ON; 
DECLARE @logData TABLE( LogDate DATETIME, ProcInfo NVARCHAR(64), LogText NVARCHAR(MAX) ); 
INSERT INTO @logData 
EXEC sys.xp_readerrorlog 0, 1, N'Node configuration: ', N' CPU mask: ', null, null, N'DESC'; 
SELECT [LogText] 
FROM @logData;
LogText

Example output:

Node configuration: node 0: CPU mask: 0x0000000000000003:0 Active CPU mask: 0x0000000000000003:0. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.

If the CPU mask was set manually, you can view it by using SQL Server Management Studio.

Right-click your DB instance, click Properties, and then click Processors.