Partitioned Table and Index Strategies Using SQL Server 2008
Summary: Table partitioning can make very large tables and indexes easier to manage, and improve the performance of appropriately filtered queries. This paper contains a thorough coverage of strategies for partitioning tables, including the required partition functions and partition schemes. The benefits of partition elimination, partition parallelism, and index partition maintenance are also explored. These strategies are applied to various partitioning scenarios such as the sliding window scenario, data archiving, and partition consolidation and movement.
Introduction
When a database table grows in size to the hundreds of gigabytes or more, it can become more difficult to load new data, remove old data, and maintain indexes. Just the sheer size of the table causes such operations to take much longer. Even the data that must be loaded or removed can be very sizable, making INSERT and DELETE operations on the table impractical. The Microsoft® SQL Server® 2008 database software provides table partitioning to make such operations more manageable.
Partitioning a large table divides the table and its indexes into smaller partitions, so that maintenance operations can be applied on a partition-by-partition basis, rather than on the entire table. In addition, the SQL Server optimizer can direct properly filtered queries to appropriate partitions rather than the entire table.
This paper covers strategies and best practices for using partitioned tables and indexes in SQL Server 2008. It is intended for database architects, developers, and administrators of both data warehouse and OLTP systems, and the material is presented at an intermediate to advanced level. For an introduction to partitioned tables, see "Partitioned Table and Index Concepts" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/ms190787.aspx.
Table Partitioning Overview
- One approach to partitioning data across multiple tables is called horizontal partitioning, where selected subsets of rows are placed in different tables. When a view is created over all the tables, and queries directed to the view, the result is a partitioned view. In a partitioned view, you have to manually apply the required constraints and operations, and maintenance can be complex and time-consuming. (For more information, see "Types of Views" in SQL Server 2008 Books Online at http://technet.microsoft.com/en-us/library/ms190426.aspx. For a comparison of partitioned views and table partitioning, see "Strategies for Partitioning Relational Data Warehouses in Microsoft SQL Server" at http://technet.microsoft.com/en-us/library/cc966457.aspx.)
- Another method is called vertical partitioning, where the columns of a very wide table are spread across multiple tables containing distinct subsets of the columns with the same number of rows. The result is multiple tables containing the same number of rows but different columns, usually with the same primary key column in each table. Often a view is defined across the multiple tables and queries directed against the view. SQL Server does not provide built-in support for vertical partitioning, but the new sparse columns feature of SQL Server 2008 can be a better solution for tables that require large numbers of columns. (For more information, see "Using Sparse Columns" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/cc280604.aspx.)
A Table Partitioning Example
Figure 1. Partitioned table dependencies
Figure 2. Before: for loading new data, a new empty partition is created and a staging table loaded
Here are the key points to observe from this example:
- The partitioned table stays online and remains available during the entire process, provided that the partition commands are not blocked by other activity on the table.
- Using the SWITCH option of the ALTER TABLE command involves a metadata-only operation and takes place immediately.
- Creating a new empty partition by using the SPLIT option to divide an empty partition, and removing an empty partition by merging two empty partitions, do not cause any data movement within the partitioned table and can take place almost immediately.
- The time-consuming steps are those involving loading or archiving the staging table.
Partitioned Table Operations and Concurrency
Benefits of Partitioned Tables and Indexes
- SQL Server automatically manages the placement of data in the proper partitions.
- A partitioned table and its indexes appear as a normal database table with indexes, even though the table might have numerous partitions.
- The table can be managed at the partition and filegroup level for ease of maintenance.
- Partitioned tables support easier and faster data loading, aging, and archiving, as illustrated in the example above.
- Application queries that are properly filtered on the partition column can perform better by making use of partition elimination and parallelism.
- In cases where partitioned data will not be modified, you can mark some or most of a partitioned table's filegroups as read-only, making management of the filegroups easier.
- In SQL Server 2008, you can compress individual partitions as well as control lock escalation at a partition level.
Partitioning large tables has some challenges:
- There is a maximum of 1,000 partitions for a table.
- You must manage filegroups and file placement if you place partitions on individual filegroups.
- The metadata-only operations (SWITCH, MERGE, and SPLIT) can be blocked by other DML actions on the table at the time, until a schema-modification lock can be obtained.
- Managing date or time-based data can be complex.
- You cannot rebuild a partitioned index with the ONLINE option set to ON, because the entire table will be locked during the rebuild.
- Automating changes to partitioned tables, as in a sliding window scenario, can be difficult, but Microsoft provides some tools to assist in automating the process. (See, for example, the Partition Management Utility on CodePlex at http://www.codeplex.com/sql2005partitionmgmt. Also see "Two Tools to Help Automate a Sliding Window" below.)
Additional Support for Table Partitioning in SQL Server 2008
SQL Server 2008 provides a number of enhancements to partitioned tables and indexes:
- There are new wizards, the Create Partition Wizard and the Manage Partition Wizard, in SQL Server Management Studio, for creating and managing partitions and sliding windows.
- There are enhanced parallel query operations on partitioned tables that may provide better CPU utilization and query performance for data warehouses and multiprocessor computers.
- You can implement data compression for partitioned tables and indexes, or on specified partitions of the table.
- The date data type makes defining partition columns easier, because time values are not stored using date.
- You can configure lock escalation settings at the partition level rather than the entire table.
- Transactional replication now supports the ALTER TABLE SWITCH operation on partitioned tables for both the Publisher and the Subscriber.
Planning for Table Partitioning
- Specify the partition boundaries in a partition function.
- Plan on how to store the partitions in filegroups using a partition scheme.
Choosing a Table to Partition
- Index maintenance on the table is costly or time-consuming and could benefit from reindexing partitions of the table rather than the whole table at once.
- Data must be aged out of the table periodically, and the delete process is currently too slow or blocks users trying to query the table.
- New data is loaded periodically into the table, the load process is too slow or interferes with queries on the table, and the table data lends itself to a partition column based on ascending date or time.
Choosing a Partition Column
- The partition column must be a single column in the table (either a single column or a computed column).
- If you have a combination of columns that form the best partition column, you can add a persisted computed column to the table that combines the values of the original columns and then partition on it. (For an example, see "The Rotating Window Scenario" below.) If you allow ad hoc queries against the partitioned table, those queries need to reference the computed column in their filters in order to take advantage of partition elimination.
- The partition column must also have an allowable data type. (For more information about the allowable data types, see "Creating Partitioned Tables and Indexes" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/ms188730.aspx.)
- In a clustered table, the partition column must be part of either the primary key or the clustered index. If the partition column is not part of the table's primary key, the partition column might allow NULL. Any data with a NULL in the partition column will reside in the leftmost partition.
- The partitioning column should reflect the best way to subdivide the target table. You should look for a relatively balanced distribution of data across the resulting partitions, though it may not be possible to know that in advance.
- You should also try to choose a partitioned column that will be used as a filter criterion in most of the queries run against the table. This enables partition elimination, where the query processor can eliminate inapplicable partitions from the query plan, and just access the partitions implied by the filter on the queries.
- Create or use an existing partition function that sets the correct range boundaries.
- Create or use an existing partition scheme tied to the partition function.
- Create the table using the partition scheme.
Table Partitioning Components
Figure 6. Dependencies of table partitioning components
Resources
- "Partitioned Tables and Indexes in SQL Server 2005" at http://msdn.microsoft.com/en-us/library/ms345146.aspx
- "Strategies for Partitioning Relational Data Warehouses in Microsoft SQL Server" at http://technet.microsoft.com/en-us/library/cc966457.aspx
- "Project REAL: Data Lifecycle – Partitioning" at http://technet.microsoft.com/en-us/library/cc966424.aspx
- "How to Implement an Automatic Sliding Window in a Partitioned Table on SQL Server 2005" at http://msdn.microsoft.com/en-us/library/aa964122(SQL.90).aspx
The Partition Function
Although user-created, partition functions differ from user-defined functions in a number of ways:
- Partition functions are not listed as database objects in the sys.all_objects or sys.objects system tables; instead, you can find them listed in sys.partition_functions.
- Partition functions are not contained by a database schema.
- Partition functions can be invoked interactively in Transact-SQL by using the $PARTITION function.
You create the partition function with the CREATE PARTITION FUNCTION command, specifying:
- An input parameter to the partition function declaring the data type of the partition boundaries.
- The input parameter must be compatible with subsequent table partition columns.
- A range type (either LEFT or RIGHT), which specifies how the boundary values of the partition function will be put into the resulting partitions.
- A set of constants as an initial list of boundary values. These mark the terminating points of partitions within the stated range. The boundary points must have the same data type as the input parameter. The boundary values can be modified later using ALTER PARTITION FUNCTION.
- The values are discrete and finite in number.
- The values are implicitly ordered from a minimum to a maximum.
- NULL is considered smaller than the minimum value of the data type's values.
A Single Integer Boundary Value
- The total number of resulting partitions is the number of partition function boundary values plus one.
- Any given possible value within the boundary value range can exist in only one partition.
- Partitions are numbered starting with 1.
Figure 7. A range of integers where the number 10 will become a boundary value
Let's start with a sample partition function that chooses RANGE LEFT.
CREATE PARTITION FUNCTION PF1_Left (int)
AS RANGE LEFT FOR VALUES (10);
Figure 8. A single integer boundary value with RANGE LEFT
- For RANGE LEFT, the boundary value specifies the upper bound of its partition: 10 is the highest allowable value in partition 1.
- All values in partition 1 must to be less than or equal to the upper boundary of partition 1.
- All values in partition 2 must be greater than partition 1's upper boundary.
Next, let's create a new partition function that specifies RANGE RIGHT.
CREATE PARTITION FUNCTION PF1_Right (int)
AS RANGE RIGHT FOR VALUES (10);
Figure 9. A single integer boundary value with RANGE RIGHT
- For RANGE RIGHT each boundary value specifies the lowest value of its partition: 10 is now the lowest allowable value in partition 2.
- All values in partition 1 must be less than the lower boundary 10 of partition 2.
- All values in partition 2 must be greater than or equal to partition 2's lower boundary.
Using the $PARTITION Function
The following example uses $PARTITION.
SELECT $PARTITION.PF1_Left(10);
SELECT $PARTITION.PF1_Right(10);
SELECT $PARTITION.pfn_OrderDateKeyRight(OrderDateKey) AS PartitionNumber
GROUP BY $PARTITION.pfn_OrderDateKeyRight(OrderDateKey)
For more examples, see "$PARTITION (Transact-SQL)" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/ms188071.aspx. You could also use the $PARTITION function to verify that a staging table's partition column values will all map to the correct partition.
Multiple Integer Boundary Values
- The boundary values in the CREATE PARTITION FUNCTION statement should be sorted from least to greatest. If they are not sorted, SQL Server will provide a warning and sort them.
- All the boundary values fall into partitions the same direction, left or right, based on the direction option of the RANGE clause.
- The boundary values must resolve to constants: You can use functions to determine the boundary values, so long as they result in constants.
- The maximum number of boundary values is 999, resulting in a maximum of 1,000 partitions.
- NULL can be a boundary value in a partition function. For RANGE LEFT, beginning a list of boundary values with NULL will place all and only data with NULLs in the partition column into partition 1. For RANGE RIGHT, beginning a list of boundary values with NULL will leave partition 1 permanently empty, and data with NULL in the partition column will fall into partition 2, along with all other values less than the next boundary value.
Now consider three integer boundary values: 0, 10, and 20, with each possible range direction.
Range Direction LEFT and Integer Boundary Values
For RANGE LEFT and boundary values (0, 10, 20):
CREATE PARTITION FUNCTION PF2_Left (int)
AS RANGE LEFT FOR VALUES (0, 10, 20);
The result for this RANGE LEFT assignment is:
{min … 0}, {1 … 10}, {11 … 20}, {21 … max}
Note the resulting limits for RANGE LEFT and (0, 10, 20):
Partition 1: values from the data type minimum to <= 0
Partition 2: values > 0 and <= 10
Partition 3: values > 10 and <= 20
Partition 4: values > 20 and <= data type maximum
We end up grouping the integers as min-0, 1-10, 11-20, 21-max.
CREATE PARTITION FUNCTION PF2_Left_Revised (int)
AS RANGE LEFT FOR VALUES (-1, 9, 19);
{min … -1}, {0 … 9}, {10 … 19}, {20 … max}
Range Direction RIGHT and Integer Boundary Values
Now let's use the same boundary values (0, 10, 20) but with RANGE RIGHT.
CREATE PARTITION FUNCTION PF2_Right (int)
AS RANGE RIGHT FOR VALUES (0, 10, 20);
{min … -1}, {0 … 9}, {10 … 19}, {20 … max}
Partition 1: values from the data type minimum to < 0
Partition 2: values >= 0 and < 10
Partition 3: values >= 10 and < 20
Partition 4: values >= 20 and <= data type maximum
Working with Boundary Data Types
Integer Boundary Values
{min … 1), {2}, {3}, {4 … max)
datetime Boundary Values
CREATE PARTITION FUNCTION [TransactionRangePF1] (datetime)
AS RANGE RIGHT FOR VALUES ('10/01/2003', '11/01/2003', '12/01/2003',
'1/01/2004', '2/01/2004', '3/01/2004', '4/01/2004',
'5/01/2004', '6/01/2004', '7/01/2004', '8/01/2004');
Partition 1: values from the data type minimum to < 10/01/2003 midnight
Partition 2: values >= 10/01/2003 midnight and < 11/01/2003 midnight
Partition 3: values >= 11/01/2003 and < 12/01/2003 midnight
Partition 12: values >= 8/01/2004 midnight and <= data type maximum
CREATE PARTITION FUNCTION [TransactionRangePF1_Left1] (datetime)
AS RANGE LEFT FOR VALUES ('10/01/2003 23:59:59.997',
'11/01/2003 11:59:59.997', '12/01/2003 23:59:59.997',
'1/01/2004 11:59:59.997', '2/01/2004 23:59:59.997',
'3/01/2004 11:59:59.997', '4/01/2004 23:59:59.997',
'5/01/2004 11:59:59.997', '6/01/2004 23:59:59.997',
'7/01/2004 11:59:59.997', '8/01/2004 23:59:59.997');
CREATE PARTITION FUNCTION [TransactionRangePF1_Left2] (datetime)
AS RANGE LEFT FOR VALUES ('10/31/2003', '11/30/2003', '12/31/2003',
'1/31/2004', '2/29/2004', '3/31/2004', '4/30/2004',
'5/31/2004', '6/30/2004', '7/31/2004', '8/31/2004');
DATE Boundary Values
CREATE PARTITION FUNCTION pfn_OrderDateKeyRight (integer)
20010701, 20010801, 20010901, 20011001, 20011101, 20011201,
20020101, 20020201, 20020301, 20020401, 20020501, 20020601,
20020701, 20020801, 20020901, 20021001, 20021101, 20021201,
20030101, 20030201, 20030301, 20030401, 20030501, 20030601,
20030701, 20030801, 20030901, 20031001, 20031101, 20031201,
20040101, 20040201, 20040301, 20040401, 20040501, 20040601,
CREATE PARTITION FUNCTION pfn_DateRight (date)
'20080101', '20080201', '20080301'
datetime2 and datetimeoffset Boundary Values
- When you create a partition function using datetime2, the resulting partition table column must use the same time fractional resolution.
- Again, RANGE RIGHT will be more convenient for forming boundaries based on date using datetime2.
Cautions Regarding date and datetime Strings
IF EXISTS(SELECT * FROM sys.partition_functions WHERE name = 'PF3_Date_Right_sp')
DROP PARTITION FUNCTION PF3_Date_Right_sp;
CREATE PARTITION FUNCTION PF3_Date_Right_sp (date)
AS RANGE RIGHT FOR VALUES ('01/01/2009',' 02/01/2009', '03/01/2009');
SELECT $PARTITION.PF3_Date_Right_sp('02/01/2009');
SELECT $PARTITION.PF3_Date_Right_sp('03/01/2009');
SELECT CAST('02/01/2009' AS Date);
SELECT CAST('03/01/2009' AS Date);
Note For safety, always specify date strings for all date data types so that they do not depend on language-or date format, such as YYYYMMDD, in your partition functions. (For more information, see "CREATE PARTITION FUNCTION" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/ms187802.aspx.)
TIME Boundary Values
CREATE PARTITION FUNCTION pfn_TimeRight (TIME(2))
'00:00:00', '01:00:00', '02:00:00', '03:00:00', '04:00:00', '05:00:00',
'06:00:00', '07:00:00', '08:00:00', '09:00:00', '10:00:00', '11:00:00',
'12:00:00', '13:00:00', '14:00:00', '15:00:00', '16:00:00', '17:00:00',
'18:00:00', '19:00:00', '20:00:00', '21:00:00', '22:00:00', '23:00:00'
Again, RANGE RIGHT makes the boundary value declaration much easier.
Other Data Types as Boundary Values
You might consider partitioning on a character (string) data type, but it can be tricky. The input parameter of the partition function must exactly match the data type of the table column (for example, if the data type of the table's partition column is varchar(20), the partition function's input parameter must also be varchar(20)). Further, the ranges of the character data in the resulting partitions will be based on the collation and sort order of the database. If you supply boundary values 'A', 'B', and 'C' for example, strings beginning with lower case letters will appear in the last partition and those beginning with a numeric character will appear in the first. For an example of partitioning on a character column, see "CREATE PARTITION FUNCTION" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/ms187802.aspx.
Altering a Partition Function: SPLIT and MERGE
Using SPLIT
You can add a single boundary value using the SPLIT clause of ALTER PARTITION FUNCTION:
- The new boundary value must be in the allowed range and distinct from existing boundary values.
- Adding a boundary value has the effect of splitting the table partition that the new boundary value falls in.
- You add the new boundary value using the SPLIT keyword in ALTER PARTITION FUNCTION.
- The new boundary value obeys the RANGE direction specified in CREATE PARTITION FUNCTION.
- The ALTER PARTITION FUNCTION SPLIT and MERGE options can reference variables or functions as boundary values, as long as they resolve to a constant of the same data type as the input parameter.
- If the partition function is used by a partitioned table and SPLIT results in partitions where both will contain data, SQL Server will move the data to the new partition. This data movement will cause transaction log growth due to inserts and deletes.
{min … -1), {0 … 9}, {10 … 19}, {20 … max}
Now let's add a new boundary value to split partition 3 at the value 15.
ALTER PARTITION FUNCTION PF2_RIGHT()
The result now is four boundary values (0, 10, 15, and 20) and therefore five partitions.
{min … -1}, {0 … 9}, {10 … 14}, {15 … 19}, {20 … max}
Using MERGE
Similarly, you can remove a single boundary value using MERGE:
- Removing a boundary value has the effect of merging two neighboring partitions into a single new partition.
- Removing a boundary value reduces the total number of boundary values by one, and therefore the number of partitions by one also.
{min … -1}, {0 … 9}, {10 … 14}, {15 … 19}, {20 … max}
The following command will remove the partition with lower boundary 10.
ALTER PARTITION FUNCTION PF2_RIGHT()
{min … -1}, {0 … 14}, {15 … 19}, {20 … max}
Avoiding Data Movement
The Partition Scheme
CREATE PARTITION SCHEME PS2_Right
TO (Filegroup1, Filegroup2, Filegroup3, Filegroup4);
If you want to place all the partitions on a single filegroup, you state it using the ALL keyword, and you can place all partitions on the database's primary filegroup using the PRIMARY keyword. (For more information, see "CREATE PARTITION SCHEME" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/ms179854.aspx.)
Altering a Partition Scheme
ALTER PARTITION SCHEME PS2_Right NEXT USED Filegroup5;
ALTER PARTITION SCHEME PS2_Right NEXT USED;
Using SPLIT with Partition Schemes
Suppose you start with the following:
{min … -1}, {0 … 9}, {10 … 19}, {20 … max}
Filegroup1 Filegroup2 Filegroup3 Filegroup4
Adding a SPLIT at 25 will produce:
{min … -1}, {0 … 9}, {10 … 19}, {20 … 24}, {25 … max}
Filegroup1 Filegroup2 Filegroup3 Filegroup4 Fillegroup5
Adding a SPLIT instead at 15 results in:
{min … -1}, {0 … 9}, {10 … 14}, {15 … 19}, {20 … max}
Filegroup1 Filegroup2 Filegroup3 Filegroup5 Fillegroup4
Using MERGE with Partition Schemes
For example, considering four partitions with four filegroups, using RANGE RIGHT:
{min … -1), {0 … 9}, {10 … 19}, {20 … max}
Filegroup1 Filegroup2 Filegroup3 Filegroup4
Partition 1 Partition 2 Partition 3 Partition 4
If you merge partitions 2 and 3 by removing the boundary value 10, the result will be:
{min … -1), {0 … 19}, {20 … max}
Filegroup1 Filegroup2 Fillegroup4
Partition 1 Partition 2 Partition 3
{min … 0), {1 … 10}, {11 … 20}, {21 … max}
Filegroup1 Filegroup2 Filegroup3 Filegroup4
Partition 1 Partition 2 Partition 3 Partition 4
If you MERGE partitions 2 and 3 by removing the boundary value 10, the result will be:
{min … 0), {1 … 20}, {21 … max}
Filegroup1 Filegroup3 Fillegroup4
Partition 1 Partition 2 Partition 3
Managing Filegroups with SPLIT and MERGE
For more information about table partitioning in relational data warehouse scenarios, see the TechNet white paper "Strategies for Partitioning Relational Data Warehouses in Microsoft SQL Server" at http://technet.microsoft.com/en-us/library/cc966457.aspx.
Partitioned Tables and Indexes
(PartitionColumnID int not null,
ON PS2_Right(PartitionColumnID);
CREATE CLUSTERED INDEX PT_CI ON PartitionedTable(PartitionColumnID)
If the table is partitioned and clustered on the primary key, you can use ALTER TABLE.
ALTER TABLE PartitionedTable DROP CONSTRAINT PT_PKC
Then re-create the secondary indexes so that they are aligned (see below).
Msg 2726, Level 16, State 1, Line 1
The Create Partition Wizard
Figure 10. Mapping partitions in the Create Partition Wizard
When you click Estimate storage, the Create Partition Wizard inspects the table and shows the number of rows and estimated storage for each partition. If your partition column has any of the date data types, you can set their boundaries by clicking Set boundaries. Finally, you have the choice of partitioning the table, scheduling it, or creating a script. For more information, see "Create Partition Wizard" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/cc280408.aspx.
Indexes and Constraints on a Partitioned Table
In a partitioned table, the partition column must be a part of:
You can reorganize partitioned indexes and specify individual partitions, lists of partitions, and even ranges of partitions. Reorganizing an index is always an online operation. (For more information, see "ALTER INDEX" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/ms188388.aspx.)
Using the ALTER TABLE SWITCH Option
SWITCH Requirements
- Switch from a partition of a partitioned table to an empty nonpartitioned staging table: In this case you alter the partitioned table.
- Switch from a staging table to an empty partition of the partitioned table: In this case you alter the staging table.
- Switch a partition from one partitioned table TO an empty partition in another partitioned table: For this option you ALTER the source partitioned table (that is, the partitioned table that has the data you want to move).
There are also some important requirements for indexes during a SWITCH operation:
Nonaligned indexes can be disabled for the duration of the SWITCH and re-enabled afterwards. There are additional requirements pertaining to indexes and partitioned tables, in particular XML and full-text indexes. For more information, see "Special Guidelines for Partitioned Indexes" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/ms187526.aspx.
Switching Data Out
To switch data out of a partitioned table:
- Create an empty staging table on the same filegroup as the partition to be switched out with the same structure (including the partition column).
- Create identical clustered and (optionally) nonclustered indexes on the empty staging table.
- Optionally, create a check constraint on the stand-alone table to ensure that all values of the partition column will be in the same range as the partition that will be switched out.
Figure 11 illustrates the SWITCH operation.
Figure 11. Switching a partition out to a staging table
ALTER TABLE dbo.PartitionedTable
SWITCH PARTITION 1 TO dbo.PartitionedTableAux PARTITION 1;
SWITCHING Data In
Also as mentioned above, the target partition in the partitioned table must be empty.
ALTER TABLE dbo.PartitionedTableAux
SWITCH TO PartitionedTable PARTITION 1;
Again, you can optionally add a PARTITION clause.
ALTER TABLE dbo.PartitionedTableAux
SWITCH PARTITION 1 TO PartitionedTable PARTITION 1;
Figure 12. Switching from a staging table into a partition
SWITCHING Between Two Partitioned Tables
ALTER TABLE [Production].[TransactionHistory]
TO [Production].[TransactionHistoryArchive] PARTITION 2;
Using the Manage Partition Wizard to Perform SWITCH Operations
- The staging table.
- The necessary clustered and nonclustered indexes.
- Foreign keys.
- A check constraint to preserve the partition column's range.
The script does not contain the ALTER TABLE command to actually make the switch.
Partition-Aligned Indexed Views
- The partitioned table and indexed view must share the same or equivalent partition functions (same number of and values of boundary values, same data type, and so on).
- The indexed view must contain the partition column in its SELECT list.
- Any GROUP BY statement must also include the partition column of the underlying table.
- An indexed view can only be partition-aligned with one partitioned table at a time.
You can find details about the requirements for implementing a partition-aligned indexed view, along with sample code for your own testing, in "Partition Switching When Indexed Views Are Defined" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/bb964715.aspx. The sample code contained in that entry switches a table's partition out to another partitioned table, along with the corresponding partition of the indexed view.
Partition Elimination and Query Performance
- A new partition-aware seek operation to determine partition elimination
- Two levels of the partition-aware seek, called a skip scan
- A Partitioned attribute for physical and logical operators that appears in the query plan
- Partition summary information, including the partition count and the partitions accessed
For limitations on other ways of displaying query plan, see "Displaying Partition Information by Using Other Showplan Methods" in "Query Processing Enhancements on Partitioned Tables and Indexes" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/ms345599.aspx. For more information about table partitioning and query performance in SQL Server 2008, see "Data Warehouse Query Performance" at http://technet.microsoft.com/en-us/magazine/2008.04.dwperformance.aspx.
Partition-Aware Seek Operation
WHERE OrderDateKey BETWEEN 20030402 AND 20030822
AND SalesOrderNumber =
'SO51922'
Figure 15. The Partition-Aware Seek property in the ToolTip view of a graphical query plan
Figure 17. Determining the number of partitions accessed in the operator's Properties dialog box
Skip-Scan: Seek Keys
Figure 18. Detecting the Seek Keys[2] skip scan
Join Collocation
For further explanation and an example of join collocation, see the "Interpreting Execution Plans for Collocated Joins" subtopic of "Query Processing Enhancements on Partitioned Tables and Indexes" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/ms345599.aspx. You can use the code sample at the end of the article to observe join collocation. The script generates a table called fact_sales. Just duplicate the code and generate a similar fact table called fact_sales2 and partition it using the same partition function, and create its clustered index using the same partition scheme as fact_sales1. When you join the two tables, as in the following sample query, join collocation will occur.
SUM(F2.quantity*F2.unit_price) AS total_price
AND F1.product_id = F2.product_id
WHERE F1.date_id BETWEEN 20080830 AND 20080902
Figure 19. Join collocation indicated by the Constant Scan operator
Partitioned Table Parallelism
SQL Server 2005 was optimized for queries filtered to one partition. For such queries, more than one available thread could be assigned to scan the partition in parallel. However, for a filtered query touching more than one partition, only one thread could be assigned to any given partition. In SQL Server 2008, if the number of available threads is greater than the partitions accessed by a filtered query, more than one thread can be assigned to access data in each partition. This can improve performance in filtered queries that access more than one partition. An explanation of the assignment of parallel threads to partitions is presented in the "Parallel Query Execution Strategy for Partitioned Objects" subtopic of the "Query Processing Enhancements on Partitioned Tables and Indexes" topic in SQL Server 2008 Books Online at http://technet.microsoft.com/en-us/library/ms345599.aspx. (See also "Data Warehouse Query Performance" at http://technet.microsoft.com/en-us/magazine/cc434693.aspx.)
Bitmap Filtering
SQL Server 2008 can enhance parallelized query plans on partitioned tables through use of a bitmap filtering. Bitmap filters were introduced in SQL Server 2005 and improved in SQL Server 2008: In SQL Server 2008, bitmap filters can be introduced dynamically by the query optimizer during query plan generation. This is particularly relevant for data warehouses that join large fact tables to dimension tables in a star schema. A table that is partitioned on an integer that simulates a date can take advantage of optimized bitmap filtering. For more information about bitmap filtering, see "Optimizing Data Warehouse Query Performance Through Bitmap Filtering" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/bb522541.aspx.
Data Compression
ALTER TABLE dbo.FactInternetSales
WITH (DATA_COMPRESSION = PAGE ON PARTITIONS (23,27) )
EXEC sp_estimate_data_compression_savings
, @object_name = 'FactInternetSales'
, @partition_number = $PARTITION.pfn_OrderDateKeyRight(20020301)
When you use SPLIT and MERGE on a partitioned table that is compressed, the resulting new or consolidated partitions will inherit the compression setting of the original or partner partition. When you switch a partition in or out, the compression settings between the staging table and the partition must match. For more details, see "How Compression Affects Partitioned Tables and Indexes" under "Creating Compressed Tables and Indexes" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/cc280449.aspx.
Additional Partitioning Support in SQL Server 2008
Many new features of SQL Server 2008 have uses or implications for table partitioning:
- Controlling lock escalation granularity within a partitioned table. In SQL Server 2008, you can control lock escalation at the partition level. By default, if a table is partitioned, setting the lock escalation value to AUTO causes locks to escalate to the partition level only, not the table level. This can reduce lock contention but it increases the number of locks taken.
- Partitioning a Table with a FILESTREAM Column. A table with a FILESTREAM column must have a unique constraint on a ROWGUID column, declared either as a constraint or as a unique secondary index. If the natural partitioning column is not a ROWGUID, it may be possible to create the ROWGUID value as a function of the natural partitioning column, such as a date, and then specify corresponding range values for the ROWGUID that match the desired date ranges. If you do not partition a table on the FILESTREAM's required ROWGUID column, you can maintain the uniqueness of the column based on an index and then disable the index (and thereby disable FILESTREAM operations) during any SWITCH operations. When the SWITCH is completed, you can enable (rebuild) the unique index, and when that is done, FILESTREAM operations can continue.
- Filtered indexes. In SQL Server 2008, you can place filters (limited WHERE clauses) on secondary indexes. Filtered indexes are fully compatible with table partitioning. Only secondary nonclustered indexes can be filtered. Just as with ordinary indexes, in order to enable the SWITCH operation you must either include the partition column as part of the index key, or disable the index when executing SWITCH operations. The filtered index must be aligned with the partitioned table.
- Change Tracking. Change tracking is used for maintaining synchronization between application data and SQL Server tables. You can apply change tracking to a partitioned table. However, the SWITCH command will fail if either table involved in the SWITCH has change tracking enabled. For more information, see "Configuring and Managing Change Tracking" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/bb964713.aspx.
- Change Data Capture. The Change Data Capture utility captures all INSERT, UPDATE, and DELETE activity against a partitioned or a nonpartitioned table. But if you want to use the SWITCH operation on the table, you must explicitly state so in the sys.sp_cdc_enable_table command. If you do not, any SWITCH will be prevented. For more information, see "sys.sp_cdc_enable_table" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/bb522475.aspx.
- Sparse columns. Sparse columns provide another way of more efficiently storing database data, by storing columns that are mostly NULL in an alternate location. However, A sparse column cannot be a part of a clustered index (or primary key); it also cannot be the partitioning key of a partitioned table, whether clustered or a heap. However, you can partition a secondary index on a sparse column. This could be useful if you want to partition just the secondary index.
- Full-text Search. Although you can have full-text indexes on a partitioned table, the SWITCH command will be prohibited. The full-text index is treated like a nonaligned index.
- Replication. Table partitioning is compatible with transactional replication, and even switching partitions in and out can be replicated, provided the partitioned tables are set up correctly on the Publisher and Subscriber tables, and the available partitioning options in replication are used. Merge replication does not support partition switching. For more information, see "Replicating Partitioned Tables and Indexes" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/cc280940.aspx.
The Sliding Window Scenario
The sliding window scenario is a method for aging old data out and bringing new data into a partitioned table on a periodic basis. Using a time-based or date-based partitioned table, the technique employs the SWITCH option of ALTER TABLE, along with the MERGE and SPLIT operations of ALTER PARTITION FUNCTION, to move the table's partitions forward in time one partition at a time, leaving the overall number of partitions the same. (For more information, see "Designing Partitions to Manage Subsets of Data" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/ms191174.aspx.)
Sliding Window Steps
In general, a sliding window consists of the following steps:
- Create a nonpartitioned archive table with the same structure, and a matching clustered index (if required). Optionally you can add matching secondary indexes and constraints. Place it on the same filegroup as the oldest partition.
- Use SWITCH to move the oldest partition from the partitioned table to the archive table.
- Remove the boundary value of the oldest partition using MERGE.
- Designate the NEXT USED filegroup.
- Create a new boundary value for the new partition using SPLIT (the best practice is to split an empty partition at the leading end of the table into two empty partitions to minimize data movement.).
- Create a staging table that has the same structure as the partitioned table on the target filegroup.
- Populate the staging table.
- Add indexes.
- Add a check constraint that matches the constraint of the new partition.
- Ensure that all indexes are aligned.
- Switch the newest data into the partitioned table (the staging table is now empty).
- Update statistics on the partitioned table.
- Minimize data movement by using SPLIT and MERGE on empty partitions. Switching out the oldest data partition and leaving an empty partition in its place allows you to merge an empty partition. The same is true when you use the SPLIT operation to create the partition for the newest data: It is initially an empty partition, so the SPLIT operation does not cause any data movement.
- Create the staging table for new data as initially a heap, load the data, and then build indexes (and indexed views, if any) as a preparation for switching.
- Update statistics on the table after every partition switching cycle (because statistics are kept at the table level, this operation can be time-consuming and resource intensive).
A Sliding Window Illustration
Figure 20. Switching out the oldest partition containing data
Figure 21. Result of removing the oldest data partition using MERGE
Now the SWITCH operation is performed, as shown in Figure 23.
Figure 23. A newest month's data is switched into the table
At this point the staging table is empty and can be dropped.
The Manage Partition Wizard
Sliding Window Between Two Partitioned Tables
SQL Server 2008 Books Online has an interesting variation on the sliding window using the AdventureWorks database. The Sliding.sql script provided in the Samples for SQL Server 2008 on CodePlex archives the oldest partition (September 2003) of the partitioned Production.TransactionHistory table to a new partition in the Production.TransactionHistoryArchive table. This is a good illustration of using SWITCH to move data from one partitioned table to another. (See also "Designing Partitions to Manage Subsets of Data" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/ms191174.aspx.)
Two Tools to Help Automate a Sliding Window
A general-purpose utility called the Partition Management Utility for SQL Server 2008 can be downloaded from CodePlex at http://www.codeplex.com/SQLPartitionMgmt. The Partition Management Utility (ManagePartition.exe) contains five basic operations. Although developed by the Microsoft SQL Server CAT team, this tool is not officially supported by Microsoft. The utility contains the following commands:
- ClearPartition clears a single partition from a table. This will switch a partition out to a staging table on the same filegroup, with a name of your choice, and optionally drop the staging table after the switch.
- CreateStagingFull creates an empty staging table with all necessary indexes, foreign keys, and check constraints matching a given partition.
- CreateStagingNoindex creates a staging table for a given partition with no indexes for fast loading.
- CreateStagingClustedIndex creates a staging table with only a clustered index.
- IndexStaging creates the indexes on a specified staging table that has been created using CreateStagingNoindex and loaded with data.
- Use ClearPartition on the oldest data partition, keeping a staging (archive) table with the name you supply.
- Remove the boundary value of the oldest partition using MERGE.
- Designate the NEXT USED filegroup.
- Create a new boundary value for the new partition using SPLIT.
- Use CreateStagingNoindex to create an empty staging table for data load on the NEXT USED filegroup.
- Populate the staging table with new data.
- Use IndexStaging to create the indexes and constraints on the staging table.
- Switch the staging (newest data) table into the partitioned table.
- Update statistics on the partitioned table.
http://msdn2.microsoft.com/en-us/library/aa964122.aspx. Although targeted for SQL Server 2005, the code operates with SQL Server 2008 as well. The article illustrates its technique using a sample partitioned table using RANGE RIGHT and daily data for 60 days, based on a datetime column. Two partitioned tables are created: a main partitioned table containing current data, and an auxiliary table containing archived data. To create the partition for a new day's worth of data at the end of the main partitioned table, they provide a stored procedure that automatically retrieves the next boundary ID, sets the NEXT USED filegroup, and then uses SPLIT to create the new partition. To archive the oldest partition, the article supplies a stored procedure that retrieves the boundary value for partition 2 of the table, adds a new partition to the auxiliary table, and then switches that data into the auxiliary table. However, you will have to add your own filegroup management and archiving techniques.
Other Scenarios
The Rotating Window Scenario
Instead of a window that moves forward in time by creating new partitions, you can use the partitions in a ring structure, so that new data rotates around the partition scheme one partition at a time.
For example, suppose you want to keep the last month's worth of data, partitioned by day. You could create 31 fixed partitions, where the partition number represents the day of the month. You could create a persisted calculated column that computes the day of the month based on a date column in the table, and partition on that computed column. For each new day's data, you switch out that day number's partition, truncate the staging table, load it with that day's data, and then switch the loaded staging table back into the partitioned table.
The major restriction on using such a scenario is that queries against the table must filter based on the partition number (the day of the month) in order to gain partition elimination. If users issue queries and filter by the original date column, the queries do not perform well because SQL Server's optimizer cannot eliminate inapplicable partitions.
So in such rotating windows, it is usually required that users issue queries through some kind of controlled mechanism: the application code or a stored procedure. When the user's query reaches SQL Server, any of the original date references are translated into a filter on the day of the month, and partition elimination occurs.
As a particular example, Microsoft Sales has a database that contains a table that contains the last 31 days worth of data, partitioned by hour. Each partition represents the hour of the month. The total number of partitions is 31*24 = 744 partitions. The partition function is fixed and no SPLIT or MERGE operations occur. The newest data is loaded into the current hour by staging data 30 seconds at a time (using the two staging tables scenario described below in "Continuous Loading".) At the top of the hour, the oldest partition is switched out and truncated (because downstream processes have already extracted their data) and becomes the newest hour's partition. If the time now is 2008-06-10 09:59:00, it will be the 225th hour of the month of June 2008, and we will be loading data into partition 225. Partition 226 is the oldest partition, containing data for the 226th hour of May (2008-05-10, 10:00:00). At the top of the hour, the data from partition 226 is switched out and the table truncated, and new data will be inserted into partition 226. (I am indebted to LeRoy Tuttle for this example.)
Data Loading Scenarios
The most efficient way of loading new data into a partitioned table is to load a staging table first and then use the SPLIT and SWITCH operations to switch the staging table into the partitioned table. (Partitioning can be key to improving data loads. For example, at the SQL Server 2008 Launch in February, SQL Server 2008 achieved the world record for ETL loading into a partitioned table. See "ETL World Record!" at http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx.)
Periodic Loading into a Relational Data Warehouse
For more information, see the TechNet white paper "Loading Bulk Data into a Partitioned Table" at http://technet.microsoft.com/en-us/library/cc966380.aspx.
Continuous Loading
Handling Large Numbers of Partitions
- Consolidate the oldest partitions periodically. You might decide to consolidate the older data into a partition time period that is longer than the standard. For example, you might partition the current year by day, the previous year by month, and so on. In that case you must incur some penalty of data movement, because as you age the oldest finer-grained partition into the larger partition, somehow you must get the data from one partition into the other. For example, you could switch out the oldest day's worth of data, and then directly insert it into the broader last year's partition. Assuming the data in the older partition is read-only or mostly read-only, the inserts might not cause contention with user queries on the data. Alternatively, you could switch both the last daily partition and last year's partition out, each into their own staging tables, and then copy the oldest day's data into last year's partition, and switch the result back into the table. The overall effect is to maintain a sliding window for the current year's data and maintain a wider partition for the older year's data.
- Partitioned views or federated servers. If you simply must maintain more than 1,000 partitions, you could maintain more than one partitioned table, with a partitioned view declared over the partitioned tables. If the two tables are kept in separate databases on separate servers, you could create a distributed partitioned view across two SQL Server instances. For more information, see the topics related to "Using Partitioned Views" and "Federated Database Servers" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/ms190019.aspx and http://msdn.microsoft.com/en-us/library/ms190381.aspx, respectively.
Consolidating Partitions
Maintaining Non-Time-Based Ranges
Archiving Data Out of a Partitioned Table
Archiving Within the Same Database
Some considerations are listed here:
- The most efficient way to delete old partitioned data from the database is to switch the data out to a staging table, optionally export the data for archival storage, and then truncate the table.
- If you are archiving to a different database or different server, you can still switch out the oldest partition to an archive table and then export the data from the archive table. However, the filegroup used by the archive table belongs to the original database, and it cannot be moved to another database. After the archive table is exported, you can truncate the table and use the filegroup again for other purposes.
- If you use dedicated filegroups for archived data and the archived data is read-only, you can mark the filegroup read-only and reduce the locks taken on the tables when archived data is queried. However, you cannot mark a filegroup as read-only with other users in the database.
- You can store archived data on slower disks, provided you move the filegroup and file: For more information, see the next section.
Moving Older Partitions to Slower Disks
- Copy the partition's data out of the table and onto the target filegroup on the slower disk set.
- Remove the partition on the faster disk set using MERGE.
- Add a new partition with the same constraint to the slower disk set using SPLIT.
- Move the data in using SWITCH.
Figure 24. A partitioned table spread across slower and faster disk sets
- First, prepare a new filegroup on the slower disk set, which will be the target location for the new partition 4.
- Next, build a staging table on the new filegroup and copy the data to it. This could be done by exporting the partition's data to disk and bulk loading into the staging table.
- After the copy, add the appropriate indexes and constraint on the staging table for partition 4's data.
- Create a temporary staging table on the current partition 4's filegroup to serve as a place to switch data out of partition 4.
These steps are illustrated in Figure 25.
Figure 25. Manually copying the staging table to a slower disk filegroup
Figure 26. The result of SWITCH and MERGE operations on the fast filegroup
- Assign the NEXT USED property to the SlowFG4 filegroup.
- Add the boundary value for the old partition 4 back into the partition function using SPLIT.
- Switch the data from the staging table into the new partition.
The result is shown in Figure 27.
Figure 27. Switching the staging table back into partition 4 on a slower-disk filegroup
Table Partitioning Best Practices
Table partitioning is useful on very large data tables for primarily two reasons.
- Rebuild and reorganize indexes by partition.
- Use partition-aligned indexed views in switching operations.
- Use a sliding window strategy for quickly bringing in new data and archiving old data.
- Ensure that indexes are aligned with the partitioned table, and that indexed views are partition-aligned.
- Ensure that queries against the partitioned tables have filters based on the partition column.
- On data warehouse joins, keep the join column simple (such as an integer or date) and explicit, so as to take advantage of bitmap filtering for star joins.
In general, to take full advantage of table partitioning, you should:
- Make sure that the configuration of max degree of parallelism is set sufficiently high to take advantage of parallel operations, or else add a MAXDOP query hint to fine-tune the degree of parallelism.
- Maintain an empty partition on both ends of the partitioned table and ensure that only empty partitions are split and merged in a sliding window scenario.
- Remember that RANGE RIGHT may be more convenient than RANGE LEFT in partition functions, especially when you are specifying date ranges.
- Use data types without fractional components as partition columns, such as a date or an integer.
- Always use a standard language-independent date format when specifying partition function boundary values.
- Use an integer-based date and date dimension in data warehouses.
- Use a single column of the table as the partitioned column whenever possible. If you must partition across more than one column, you can use a persisted computed column as the partitioning column. But then to achieve partition elimination, you must control queries to ensure they reference the partition column in their filters.
- Use SWITCH with MERGE to drop partition data: Switch out the partition and remove the partition's boundary value using MERGE.
- Use TRUNCATE TABLE to delete partition data by switching a partition out to a staging table and truncating the staging table.
- Check for partition elimination in query plans.
- Place read only data on read-only filegroups to reduce locking and simplify recovery for piecemeal restores.
- Spread filegroups across all disks for maximum I/O performance.
- Automate sliding window scenarios using available tools.
Even though the focus of the paper is on relational data warehouses, most of the recommendations are relevant for table partitioning in an OLTP context as well.
Conclusion
Table partitioning addresses the challenges of querying and managing large tables. When you partition a table or index, many operations can apply at the partition level rather than the table level. Partition elimination allows the query processor to eliminate inapplicable partitions from a query plan, and tables and indexes can often be managed at the partition level. Partitioning a table requires careful analysis to choose the appropriate partition column, boundary values for a partition function, and filegroup placement for the partition scheme. The result is a method for managing large data tables in both OLTP and relational data warehouse scenarios.
In SQL Server 2008, indexed views can be partition-aligned, so that they participate in the SWITCH operation. You can also use new features of SQL Server 2008, such as data compression and finer-grained lock escalation control, at the partition level. In addition, Microsoft provides tools to assist in automating changes to partitioned tables over time, in particular for the sliding window scenario.
Appendix: Inspecting Table Partitioning Objects
The following sections contain queries of table partitioning metadata that you can use to inspect partition functions, partition schemes, and partitioned tables.
Inspecting a Partition Function
You can inspect any partition function by using four catalog views. Figure A1 shows an Entity-Relationship diagram of the four catalog views, considered as entities.
Figure A1. Catalog views for partition functions
Full descriptions of these catalog views can be found in SQL Server 2008 Books Online. For purposes of querying these views, among the more useful columns are:
- Sys.partition_functions returns one row per partition function, containing function_id for joining with other catalog views, the partition function name, range direction (in boundary_value_on_right), and number of resulting partitions (in fanout).
- Sys.partition_parameters returns one row per partition function and parameter ID combination. (At the current time, a partition function can only have one input parameter.) Each row contains the data type ID (in system_type_id) of the input parameter, which you can join with the sys.types catalog view to get the data type name. It also contains:
- The length, precision, and scale of the input parameter (if required by the data type).
- The collation_name of a character-based data type.
- A user_type_id, implying future support for more than native SQL Server data types as partition function input parameters.
- The length, precision, and scale of the input parameter (if required by the data type).
- Sys.partition_range_values returns one row per boundary value of the partition function, assigning a boundary_id and value.
To extract general information about all partition functions, without the explicit boundary values, you can use the following steps:
- Select the desired columns from sys.partition_functions.
- Join with sys.partition_parameters to get the value for system_type_id.
- Join with sys.types to get the data type name of the input parameter.
The following code shows an example of these steps.
SELECT
PF.function_id
, PF.name
, PF.fanout AS NumPartitions
, CASE WHEN PF.boundary_value_on_right = 0
THEN 'LEFT' ELSE 'RIGHT' END AS RangeType
, PP.parameter_id
, CASE WHEN PP.system_type_id = PP.user_type_id
THEN T1.name ELSE T2.name END AS ParameterDataType
FROM sys.partition_functions AS PF
JOIN sys.partition_parameters AS PP
ON PF.function_id = PP.function_id
JOIN sys.types AS T1
ON T1.system_type_id = PP.system_type_id
JOIN sys.types AS T2
ON T2.user_type_id= PP.user_type_id;
To query for all information about a partition function, including partitions and boundaries, you can use the above steps and add the following:
- Join with partition_range_values to get each boundary values.
The following query shows the boundary values in addition to basic function information.
SELECT
PF.function_id
, PF.name
, PF.fanout AS NumPartitions
, CASE WHEN PF.boundary_value_on_right = 0
THEN 'LEFT' ELSE 'RIGHT' END AS RangeType
, PP.parameter_id
, CASE WHEN PP.system_type_id = PP.user_type_id
THEN T1.name ELSE T2.name END AS ParameterDataType
, PRV.boundary_id
, PRV.value
, CASE WHEN PF.boundary_value_on_right = 0
THEN PRV.boundary_id ELSE PRV.boundary_id + 1 END AS PartitionNumber
FROM sys.partition_functions AS PF
JOIN sys.partition_parameters AS PP
ON PF.function_id = PP.function_id
JOIN sys.types AS T1
ON T1.system_type_id = PP.system_type_id
JOIN sys.types AS T2
ON T2.user_type_id= PP.user_type_id
JOIN sys.partition_range_values AS PRV
ON PP.function_id = PRV.function_id
AND PP.parameter_id = PRV.parameter_id;
Partition functions are database-wide objects, meaning that they are stored within a database. You can script a partition function by using Object Explorer in SQL Server Management Studio, going to the Storage node of a database, expanding the Partition Functions node and then right-clicking the partition function. However, partition functions are not listed as database objects in the sys.all_objects or sys.objects system tables.
You can also script a partition function using the Windows PowerShell™ command-line interface from SQL Server Management Studio, as you can see in the following script. For instance, to script out the PF2_Right() partition function, start by right-clicking the database name in Object Explorer to invoke SQL Server PowerShell. In the resulting window, execute the following, one line at a time.
cd PartitionFunctions
$pso = New-Object Microsoft.SqlServer.Management.Smo.ScriptingOptions
(get-item PF2_Right).Script($pso)
Inspecting a Partition Scheme
You can view partition schemes using catalog views. The relevant catalog views are sys.partition_schemes, sys.destination_data_spaces, and sys.filegroups. Figure A2 shows these catalog views interpreted as entities.
Figure A2. Catalog views for partition schemes
Both sys.partition_schemes and sys.filegroups inherit columns from sys.data_spaces. sys.data_spaces includes information for both partition schemes and filegroups. sys.partition_functions adds function_id to identify the function, and it adds some additional filegroup information for sys.filegroups. sys.partition_schemes does not contain a partition scheme ID as such, but the data_space_id maps to the partition_scheme_id of sys.destination_data_spaces.
To query the catalog views:
- Get the partition scheme name from sys.partition_schemes.name.
- Join sys.partition_schemes with sys.destination_data_spaces on data_space_id and partition_scheme_id, respectively.
- Get the filegroup name from sys.filegroups.
- Join sys.filegroups with sys.destination_data_spaces on data_space_id.
The following code queries the above tables and returns information for the partition scheme PS_Right, with Filegroup5 designated as the NEXT USED filegroup.
SELECT
SPS.name AS PartitionSchemeName
, SDD.destination_id
, SF.name AS FileGroup
FROM sys.partition_schemes AS SPS
JOIN sys.destination_data_spaces AS SDD
ON SPS.data_space_id = SDD.partition_scheme_id
JOIN sys.filegroups AS SF
ON SF.data_space_id = SDD.data_space_id
WHERE SPS.name = 'PS2_Right';
If there is a NEXT USED filegroup, the marked filegroup gets the next ordinal number in destination_id.
The sys.destination_data_spaces destination_id stores the partition number, except for the NEXT USED filegroup. The filegroup marked as NEXT USED has a number that is one greater than the number of partitions defined by the partition function. Because the number of partitions is defined in the fanout column of sys.partition_functions, you can use it to determine what the NEXT USED filegroup is, as well as what the actual partition IDs are.
SELECT
SPS.name AS PartitionSchemeName
, SPF.name AS PartitionFunctionName
, CASE WHEN SDD.destination_id <= SPF.fanout THEN SDD.destination_id
ELSE NULL END AS PartitionID
, SF.name AS FileGroup
, CASE WHEN SDD.destination_id > SPF.fanout THEN 1
ELSE 0 END AS NextUsed
FROM sys.partition_schemes AS SPS
JOIN sys.partition_functions AS SPF
ON SPF.function_id = SPS.function_id
JOIN sys.destination_data_spaces AS SDD
ON SPS.data_space_id = SDD.partition_scheme_id
JOIN sys.filegroups AS SF
ON SF.data_space_id = SDD.data_space_id
WHERE SPS.name = 'PS2_Right';
You can complete the query by showing the actual boundary values defined by the partition function, so that you can match up the function's values with the filegroups that the data would be stored in.
SELECT
SPS.name AS PartitionSchemeName
, CASE WHEN SDD.destination_id <= SPF.fanout THEN SDD.destination_id
ELSE NULL END AS PartitionID
, SPF.name AS PartitionFunctionName
, SPRV.value AS BoundaryValue
, CASE WHEN SDD.destination_id > SPF.fanout THEN 1
ELSE 0 END AS NextUsed
, SF.name AS FileGroup
FROM sys.partition_schemes AS SPS
JOIN sys.partition_functions AS SPF
ON SPS.function_id = SPF.function_id
JOIN sys.destination_data_spaces AS SDD
ON SDD.partition_scheme_id = SPS.data_space_id
JOIN sys.filegroups AS SF
ON SF.data_space_id = SDD.data_space_id
LEFT JOIN sys.partition_range_values AS SPRV
ON SPRV.function_id = SPF.function_id
AND SDD.destination_id =
CASE WHEN SPF.boundary_value_on_right = 0 THEN SPRV.boundary_id
ELSE SPRV.boundary_id + 1 END
WHERE SPS.name = 'PS2_Right';
In the above query:
- The sys.partition_functions fanout column is used to get the number of partitions.
- If the sys.destination_data_spaces destination_id is less than or equal to the number of partitions, it is the partition ID.
- If the sys.destination_data_spaces destination_id is greater than the number of partitions, that is the NEXT USED filegroup.
- If the partition function is RANGE LEFT, the value of boundary_id from sys.partition_range_values is the same as the partition ID; for RANGE RIGHT, just add 1 to the boundary_id to get the partition ID.
You can script a partition scheme by using Object Explorer in SQL Server Management Studio, clicking the Storage node of a database, expanding the Partition Schemes node, and then right-clicking the partition scheme. You can also use SQL Server PowerShell to retrieve a partition scheme script by right-clicking the database name and issuing the following PS SQL commands.
cd PartitionSchemes
$pso = New-Object Microsoft.SqlServer.Management.Smo.ScriptingOptions
(get-item PS2_Right).Script($pso)
Inspecting a Partitioned Table
Every table in SQL Server 2005 and SQL Server 2008 has at least one row in the sys.partitions catalog view, so in an extended sense they could all be considered partitioned. But more exactly, we will only consider tables that are linked with a partition scheme as partitioned.
The key to determining whether a table is partitioned is the table (or index) data_space_id in the sys.indexes catalog view, and whether it has an associated partition scheme in the sys.data_spaces catalog view. All tables that are placed on a partition scheme will have 'PS' (for partition scheme) as the type for their data_space_id in sys.data_spaces.
Even if you have a partitioned table and remove all the boundary values of its partition function using the MERGE option, the result will be a table with just one partition. However, the table will still be defined using a partition scheme, and you can add boundaries and partitions back into it using SPLIT.
The quickest way to query whether a table has a link to a partition scheme is to inspect the sys.indexes catalog view for the row corresponding to the table (whether a heap or clustered), find its data_space_id, and then find a matching partition scheme in the sys.data_spaces table. Figure A3 shows the relationship of the metadata tables you can use to find partitioned table information.
Figure A3. Catalog views for partitioned tables
To just get a list of all partitioned tables (that is, tables that are created on a partition scheme), and their partition function, you can join the sys.partitions, sys.indexes, and sys.data_spaces catalog views.
SELECT
OBJECT_NAME(SI.object_id) AS PartitionedTable
, DS.name AS PartitionScheme
FROM sys.indexes AS SI
JOIN sys.data_spaces AS DS
ON DS.data_space_id = SI.data_space_id
WHERE DS.type = 'PS'
AND OBJECTPROPERTYEX(SI.object_id, 'BaseType') = 'U'
AND SI.index_id IN(0,1);
The table name is available from the object_id column of sys.indexes, and the name of the partition scheme from the name column of sys.data_spaces. If a row in sys.data_spaces has a type of 'PS', it is a partition scheme. The query returns data for just base tables (and not system tables) by using the OBJECTPROPERTYEX() system function. Finally, the sys.indexes catalog view also contains multiple rows for indexes related to the table, so limiting the query to an index_id of 0 or 1 returns information for just heap or clustered tables.
To add the partition function, just add joins to sys.partition_schemes and sys.partition_functions.
SELECT
OBJECT_NAME(SI.object_id) AS PartitionedTable
, DS.name AS PartitionScheme
, PF.name AS PartitionFunction
FROM sys.indexes AS SI
JOIN sys.data_spaces AS DS
ON DS.data_space_id = SI.data_space_id
JOIN sys.partition_schemes AS PS
ON PS.data_space_id = SI.data_space_id
JOIN sys.partition_functions AS PF
ON PF.function_id = PS.function_id
WHERE DS.type = 'PS'
AND OBJECTPROPERTYEX(SI.object_id, 'BaseType') = 'U'
AND SI.index_id IN(0,1);
Given a partitioned table, it is useful to know the filegroups that its partitions reside on. The following query starts with the sys.partitions catalog view to show individual table partitions with the table name, partition scheme, partition function, partition number, and approximate rows per partition.
SELECT
OBJECT_NAME(SI.object_id) AS PartitionedTable
, DS.name AS PartitionScheme
, PF.name AS PartitionFunction
, P.partition_number
, P.rows
FROM sys.partitions AS P
JOIN sys.indexes AS SI
ON P.object_id = SI.object_id AND P.index_id = SI.index_id
JOIN sys.data_spaces AS DS
ON DS.data_space_id = SI.data_space_id
JOIN sys.partition_schemes AS PS
ON PS.data_space_id = SI.data_space_id
JOIN sys.partition_functions AS PF
ON PF.function_id = PS.function_id
WHERE DS.type = 'PS'
AND OBJECTPROPERTYEX(SI.object_id, 'BaseType') = 'U'
AND SI.type IN(0,1);
The number of rows from sys.partitions is considered approximate. Additional columns are available in the sys.partitions catalog view for data compression information. (See "Querying Data and Metadata from Partitioned Tables and Indexes" in SQL Server 2008 Books Online at http://msdn.microsoft.com/en-us/library/ms187924.aspx.)
Next, expand the query to get the filegroups for each partition number in a partitioned table.
OBJECT_NAME(SI.object_id) AS PartitionedTable
, DS.name AS PartitionSchemeName
, PF.name AS PartitionFunction
, P.partition_number AS PartitionNumber
ON P.object_id = SI.object_id AND P.index_id = SI.index_id
ON DS.data_space_id = SI.data_space_id
JOIN sys.partition_schemes AS PS
ON PS.data_space_id = SI.data_space_id
JOIN sys.partition_functions AS PF
ON PF.function_id = PS.function_id
JOIN sys.destination_data_spaces AS DDS
ON DDS.partition_scheme_id = SI.data_space_id
AND DDS.destination_id = P.partition_number
ON DDS.data_space_id = FG.data_space_id
AND OBJECTPROPERTYEX(SI.object_id, 'BaseType') = 'U'
Inspecting Partitioned Indexes
OBJECT_NAME(SI.object_id) AS PartitionedTable
ON DS.data_space_id = SI.data_space_id
OBJECT_NAME(SI.object_id) AS PartitionedTable
ON DS.data_space_id = SI.data_space_id
WHERE OBJECT_NAME(SI.object_id) = 'PartitionedTable'
OBJECT_NAME(SI.object_id) AS PartitionedTable
ON DS.data_space_id = SI.data_space_id
0 comments:
Post a Comment