Note
Access to this page requires authorization. You can try signing in or .
Access to this page requires authorization. You can try .
Troubleshoot a slow query on a dedicated SQL pool
Applies to: Azure Synapse Analytics
This article helps you identify the reasons and apply mitigations for common performance issues with queries on an Azure Synapse Analytics dedicated SQL pool.
Follow the steps to troubleshoot the issue. The first three steps walk you through collecting telemetry, which describes the lifecycle of a query. The references at the end of the article help you analyze potential opportunities found in the data collected.
Important
Most of the reported performance issues are caused by:
- Outdated statistics
- Unhealthy clustered columnstore indexes (CCIs)
To save troubleshooting time, make sure that the statistics are created and up-to-date, and rebuild clustered columnstore indexes in the dedicated SQL pool.
Step 1: Identify the request_id (also known as QID)
The request_id of the slow query is required to research potential reasons for a slow query. Use the following script as a starting point for identifying the query you want to troubleshoot. Once the slow query is identified, note down the request_id value.
First, monitor the active queries. This query is ordered by the newest rows first.
-- Monitor active queries
SELECT *
FROM sys.dm_pdw_exec_requests
WHERE [status] NOT IN ('Completed', 'Failed', 'Cancelled')
AND session_id <> session_id()
-- AND [label] = '<YourLabel>'
-- AND resource_allocation_percentage is not NULL
ORDER BY submit_time DESC;
Then, find the active queries that have the longest run time, starting with the longest-running queries.
-- Find top 10 longest running queries
SELECT TOP 10 *
FROM sys.dm_pdw_exec_requests
ORDER BY total_elapsed_time DESC;
To better target the slow queries, use the following tips when you run the script:
- Sort by either
submit_time DESCortotal_elapsed_time DESCto have the longest-running queries present at the top of the result set. - Use
OPTION(LABEL='<YourLabel>')in your queries and then filter thelabelcolumn to identify them. - Consider filtering out any QIDs that don't have a value for
resource_allocation_percentagewhen you know the target statement is contained in a batch. Be cautious with this filter as it might also filter out some queries that are being blocked by other sessions.
Step 2: Determine where the query is taking time
Run the following script to find the step that might cause the performance issue of the query. Update the variables in the script with the values described in the following table. Change the @ShowActiveOnly value to 0 to get the full picture of the distributed plan. Take note of the StepIndex, Phase, and Description values of the slow step identified from the result set.
| Parameter | Description |
|---|---|
@QID |
The request_id value is obtained in Step 1. |
@ShowActiveOnly |
Setting the value to 0 shows all steps for the query.Setting the value to 1 shows only the currently active step. |
DECLARE @QID AS VARCHAR (16) = '<request_id>', @ShowActiveOnly AS BIT = 1;
-- Retrieve session_id of QID
DECLARE @session_id AS VARCHAR (16) = (SELECT session_id
FROM sys.dm_pdw_exec_requests
WHERE request_id = @QID);
-- Blocked by Compilation or Resource Allocation (Concurrency)
SELECT @session_id AS session_id, @QID AS request_id, -1 AS [StepIndex], 'Compilation' AS [Phase],
'Blocked waiting on '
+ MAX(CASE WHEN waiting.type = 'CompilationConcurrencyResourceType' THEN 'Compilation Concurrency'
WHEN waiting.type LIKE 'Shared-%' THEN ''
ELSE 'Resource Allocation (Concurrency)' END)
+ MAX(CASE WHEN waiting.type LIKE 'Shared-%' THEN ' for ' + REPLACE(waiting.type, 'Shared-', '')
ELSE '' END) AS [Description],
MAX(waiting.request_time) AS [StartTime], GETDATE() AS [EndTime],
DATEDIFF(ms, MAX(waiting.request_time), GETDATE())/1000.0 AS [Duration],
NULL AS [Status], NULL AS [EstimatedRowCount], NULL AS [ActualRowCount], NULL AS [TSQL]
FROM sys.dm_pdw_waits waiting
WHERE waiting.session_id = @session_id
AND ([type] LIKE 'Shared-%'
OR [type] IN ('ConcurrencyResourceType', 'UserConcurrencyResourceType', 'CompilationConcurrencyResourceType'))
AND [state] = 'Queued'
GROUP BY session_id
-- Blocked by another query
UNION ALL
SELECT @session_id AS session_id,
@QID AS request_id,
-1 AS [StepIndex],
'Compilation' AS [Phase],
'Blocked by ' + blocking.session_id + ':' + blocking.request_id + ' when requesting ' + waiting.type + ' on ' + QUOTENAME(waiting.object_type) + waiting.object_name AS [Description],
waiting.request_time AS [StartTime],
GETDATE() AS [EndTime],
DATEDIFF(ms, waiting.request_time, GETDATE()) / 1000.0 AS [Duration],
NULL AS [Status],
NULL AS [EstimatedRowCount],
NULL AS [ActualRowCount],
COALESCE (blocking_exec_request.command, blocking_exec_request.command2) AS [TSQL]
FROM sys.dm_pdw_waits AS waiting
INNER JOIN
sys.dm_pdw_waits AS blocking
ON waiting.object_type = blocking.object_type
AND waiting.object_name = blocking.object_name
INNER JOIN
sys.dm_pdw_exec_requests AS blocking_exec_request
ON blocking.request_id = blocking_exec_request.request_id
WHERE waiting.session_id = @session_id
AND waiting.state = 'Queued'
AND blocking.state = 'Granted'
AND waiting.type != 'Shared'
-- Request Steps
UNION ALL
SELECT @session_id AS session_id,
@QID AS request_id,
step_index AS [StepIndex],
'Execution' AS [Phase],
operation_type + ' (' + location_type + ')' AS [Description],
start_time AS [StartTime],
end_time AS [EndTime],
total_elapsed_time / 1000.0 AS [Duration],
[status] AS [Status],
CASE WHEN estimated_rows > -1 THEN estimated_rows END AS [EstimatedRowCount],
CASE WHEN row_count > -1 THEN row_count END AS [ActualRowCount],
command AS [TSQL]
FROM sys.dm_pdw_request_steps
WHERE request_id = @QID
AND [status] = CASE @ShowActiveOnly WHEN 1 THEN 'Running' ELSE [status] END
ORDER BY StepIndex;
Step 3: Review step details
Run the following script to review the details of the step identified in the previous step. Update the variables in the script with the values described in the following table. Change the @ShowActiveOnly value to 0 to compare all distribution timings. Take note of the wait_type value for the distribution that might cause the performance issue.
| Parameter | Description |
|---|---|
@QID |
The request_id value is obtained in Step 1. |
@StepIndex |
The StepIndex value is identified in Step 2. |
@ShowActiveOnly |
Setting the value to 0 shows all distributions for the given StepIndex value.Setting the value to 1 shows only the currently active distributions for the given StepIndex value. |
DECLARE @QID VARCHAR(16) = '<request_id>', @StepIndex INT = <StepIndex>, @ShowActiveOnly BIT = 1;
WITH dists
AS (SELECT request_id, step_index, 'sys.dm_pdw_sql_requests' AS source_dmv,
distribution_id, pdw_node_id, spid, 'NativeSQL' AS [type], [status],
start_time, end_time, total_elapsed_time, row_count
FROM sys.dm_pdw_sql_requests
WHERE request_id = @QID AND step_index = @StepIndex
UNION ALL
SELECT request_id, step_index, 'sys.dm_pdw_dms_workers' AS source_dmv,
distribution_id, pdw_node_id, sql_spid AS spid, [type],
[status], start_time, end_time, total_elapsed_time, rows_processed as row_count
FROM sys.dm_pdw_dms_workers
WHERE request_id = @QID AND step_index = @StepIndex
)
SELECT sr.step_index, sr.distribution_id, sr.pdw_node_id, sr.spid,
sr.type, sr.status, sr.start_time, sr.end_time,
sr.total_elapsed_time, sr.row_count, owt.wait_type, owt.wait_time
FROM dists sr
LEFT JOIN sys.dm_pdw_nodes_exec_requests owt
ON sr.pdw_node_id = owt.pdw_node_id
AND sr.spid = owt.session_id
AND ((sr.source_dmv = 'sys.dm_pdw_sql_requests'
AND sr.status = 'Running') -- sys.dm_pdw_sql_requests status
OR (sr.source_dmv = 'sys.dm_pdw_dms_requests'
AND sr.status not LIKE 'Step[CE]%')) -- sys.dm_pdw_dms_workers final statuses
WHERE sr.request_id = @QID
AND ((sr.source_dmv = 'sys.dm_pdw_sql_requests' AND sr.status =
CASE WHEN @ShowActiveOnly = 1 THEN 'Running' ELSE sr.status END)
OR (sr.source_dmv = 'sys.dm_pdw_dms_workers' AND sr.status NOT LIKE
CASE WHEN @ShowActiveOnly = 1 THEN 'Step[CE]%' ELSE '' END))
AND sr.step_index = @StepIndex
ORDER BY distribution_id
Step 4: Diagnose and mitigate
Compilation phase issues
According to the
Descriptionvalues obtained in Step 2, check the relevant section for more information from the following table.Description Common Cause Compilation ConcurrencyBlocked: Compilation Concurrency Resource Allocation (Concurrency)Blocked: resource allocation If the query is in "Running" status identified in Step 1, but there's no step information in Step 2, check the cause that best fits your scenario to get more information from the following table.
Scenario Common Cause Statement contains complex join-filter logic or performs joins in WHEREclauseComplex query or older JOIN syntax Statement is a long-running DROP TABLEorTRUNCATE TABLEstatementLong-running DROP TABLE or TRUNCATE TABLE CCIs have high percentage of deleted or open rows (see Optimizing clustered columnstore indexes) Unhealthy CCIs (generally) Analyze the result set in Step 1 for one or more
CREATE STATISTICSstatements executed immediately after the slow query submission. Check the cause that best fits your scenario from the following table.Scenario Common Cause Statistics created unexpectedly Delay from auto-create statistics Statistics creation failed after 5 minutes Auto-create statistics timeouts
Execution phase issues
Use the following table to analyze the result set in Step 2. Determine your scenario and check the common cause for detailed information and the possible mitigation steps.
Scenario Common Cause EstimatedRowCount/ActualRowCount< 25%Inaccurate estimates The Descriptionvalue indicatesBroadcastMoveOperationand the query references a replicated table.Uncached replicated tables 1. @ShowActiveOnly= 0
2. High or unexpected number of steps (step_index) is observed.
3. Data types of joiner columns aren't identical between tables.Mismatched data type/size 1. The Descriptionvalue indicatesHadoopBroadcastOperation,HadoopRoundRobinOperationorHadoopShuffleOperation.
2. Thetotal_elapsed_timevalue of a givenstep_indexis inconsistent between executions.Ad hoc external table queries Check the
total_elapsed_timevalue obtained in Step 3. If it's significantly higher in a few distributions in a given step, follow the these steps:Check the data distribution for every table referenced in the
TSQLfield for associatedstep_idby running the following command against each:DBCC PDW_SHOWSPACEUSED(<table>);If <minimum rows value>/<maximum rows value> > 0.1, go to Data skew (stored).
Otherwise, go to In-flight data skew.
Wait type issues
If none of the above common issues apply to your query, the Step 3 data affords the opportunity to determine which wait types (in wait_type and wait_time) are interfering with query processing for the longest-running step. There are a large number of wait types, and they're grouped into related categories due to similar mitigations. Follow these steps to locate the wait category of your query step:
- Identify the
wait_typein Step 3 that is taking the most time. - Locate the wait type in wait categories mapping table and identify the wait category it included in.
- Expand the section related to the wait category from the following list for recommended mitigations.
Feedback
Was this page helpful?
