Importance of Statistics in Query Performance


In this article, you will see how statistics affect SQL query optimization and indirectly affects query performance.

View Statistics :

Statistics are very important to optimize your queries and get faster results. However, a SQL Server will automatically create them for you. There are different ways to view the details of the statistics objects. Basically, AUTO_CREATE_STATISTICS is enabled by default for each database. Also, you can disable it with an ALTER DATABASE command. 

For example, as per below query, you can use DBCC SHOW_STATISTICS command which displays header, histogram and density vector based on data stored in the statistics object.

DBCC SHOW_STATISTICS("SalesOrderDetail", NCI_SalesOrderDetail_ProductID); 
Stats-Importance of Statistics in Query Performance
DBCC SHOW_STATISTICS Command Output

Let’s start understanding the Importance of Statistics in Query Performance with the following example.

  • First, run the below-seen query to create a new database. Then set its AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS properties to OFF. Due to these automatic statistics options, the creation and updating do not happen on the tables in this database. Then create a table “SalesOrderDetail” and load data from the “SalesOrderDetail” table of the “AdventureWorks” database.
CREATE DATABASE [Statistics]
GO
ALTER DATABASE [Statistics] SET AUTO_CREATE_STATISTICS OFF
ALTER DATABASE [Statistics] SET AUTO_UPDATE_STATISTICS OFF
ALTER DATABASE [Statistics] SET AUTO_UPDATE_STATISTICS_ASYNC OFF
GO  

USE [Statistics]
GO
CREATE TABLE [SalesOrderDetail](
        [OrderID] [int] NOT NULL,
        [OrderDetailID] [int] NOT NULL,
        [CarrierTrackingNumber] [nvarchar](25) NULL,
        [OrderQty] [smallint] NOT NULL,
        [ProductID] [int] NOT NULL,
        [SpecialOfferID] [int] NOT NULL,
        [UnitPrice] [money] NOT NULL,
        [UnitPriceDiscount] [money] NOT NULL,
        [LineTotal] money NOT NULL,  
[rowguid]     [uniqueidentifier] ROWGUIDCOL  NOT NULL,      [ModifiedDate] [datetime] NOT NULL,
) ON [PRIMARY]
GO
INSERT INTO [OrderDetail]
SELECT * FROM [AdventureWork].[Sales].[OrderDetail]
GO 10
  • Now let’s run these two queries and see their execution plan. See the yellow exclamation mark on the “Table Scan” operator which indicates the missing statistics. Then notice “Actual Number of Rows” and “Estimated Number of Rows” that there is a big difference. That means the execution plan used for query execution was not optimal.
select * from [dbo].[SalesOrderDetail] where   ProductID <= 800;
Query Results
select * from [dbo].[SalesOrderDetail] where   ProductID >= 800;
  • Now create an index on the ProductID column, which will also create statistics on the ProductID with the below query:
CREATE NONCLUSTERED INDEX   [NCI_OrderDetail_ProductID] ON   [dbo].[OrderDetail] ([ProductID]) GO
  • If you re-execute these queries, you will notice two things. First, there is no warning for missing statistics and second “Actual Number of Rows” and “Estimated Number of Rows” are the same or have closer value to each other.
select * from [dbo].[SalesOrderDetail] where   ProductID <= 800;
Query Results
select * from [dbo].[SalesOrderDetail] where   ProductID >= 800;

In addition, please check our other article on Database Statistics and Optimization. In this article, you will get definitions of Statistics and Optimizing their importance and functions respectively.

LEAVE A COMMENT

RELATED POSTS