I recently had the chance to do a proof of concept project with a client looking to migrate from an unsupported version of Sybase IQ to SQL Server 2012. The client wanted to keep the database as similar as possible and simply point the existing BI reporting applications to the new SQL Server. The client also wanted proof that SQL Server could perform as well as Sybase IQ. Keeping in mind that Sybase IQ is one of the leading high-end column-oriented databases, storing data by column rather than by row, this seemed like a bit of a challenge. I thought it would be a nice opportunity to take a real-world look at SQL Server 2012's new columnstore index.
There is a comprehensive article by Eric N. Hanson covering the requirements and considerations when implementing a columnstore index. I suggest reading it before you get started so you have an idea of the memory requirements and can make an informed decision on the number and length of columns your server can support.
To give some context, here are the server and data characteristics so you can compare them to your own environment. The server used for the proof of concept was a virtualized server running Windows Server 2008 R2 (64-bit) with 2 cores and 8 GB of memory, and SQL Server was set up with a 7 GB memory cap. The main fact table used in the proof of concept was loaded with 25 million rows and contained almost 100 columns. The table came out to about 20 GB.
A columnstore index can be created in SQL Server Management Studio by right-clicking the Indexes folder under any table. You will see the option to create a traditional B-tree index as well as the new columnstore index, so the process should feel as familiar as creating any other index. The index can also be scripted out and executed later.
A few things I came across are worth mentioning. First, a table can only have one columnstore index. This is covered in all the documentation, but the implication is that you will need to put some thought into which columns to include, since putting every column in the columnstore index is not always possible due to the memory requirement. This is also covered in Eric's article, which gives the formula you can use to calculate the memory required.
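For illustration, here is a rough sketch of that estimate in T-SQL. The formula is quoted from memory of that article and the column counts are just example values, so verify the exact calculation against the article itself before relying on it.
-- Rough sketch of the memory grant estimate for building a columnstore index,
-- based on the formula as I recall it from Eric Hanson's article (verify there).
-- The counts below are illustrative values only, not the POC table's actual figures.
DECLARE @columns        INT = 22;  -- total columns in the columnstore index
DECLARE @string_columns INT = 6;   -- string (char/varchar) columns in the index
DECLARE @dop            INT = 2;   -- degree of parallelism used for the index build
SELECT ((4.2 * @columns) + 68) * @dop + (34 * @string_columns)
       AS estimated_memory_grant_mb;
GO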
Second, INSERTs, UPDATEs, and DELETEs are not allowed on a table once it has a columnstore index. The columnstore index must first be DISABLED, and after you are finished updating the table the index has to be REBUILT. Additionally, ALTER TABLE statements are not allowed on columns that are included in the columnstore index until you DISABLE the index. This seemed logical after thinking about it, but I was surprised when I got the error message, since this is a difference from a traditional B-tree index.
Third, you can verify whether a query is using the columnstore index by looking at the execution plan in SQL Server Management Studio. This technique is no different than with a traditional index.
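For example, you can confirm the index exists through the catalog views and then look for a Columnstore Index Scan operator in the actual execution plan of the report query. The query below is just a quick sanity check against the POC table, not something from the original project.
-- Quick sanity check that the columnstore index exists on the fact table;
-- the real verification is seeing a "Columnstore Index Scan" operator in the
-- actual execution plan of the report query.
SELECT i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'POC.invoice_line')
  AND i.type_desc = N'NONCLUSTERED COLUMNSTORE';
GO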
Fourth, as expected, the columnstore index is only available in Enterprise Edition.
It took about 5 minutes to rebuild the columnstore index, which seemed reasonable when considering the table size I was working with.
The SQL below was used to create the columnstore index. Every column used in the test reports was included in the columnstore index.
CREATE NONCLUSTERED COLUMNSTORE INDEX [invoice_line_IDX_CS] ON [POC].[invoice_line]
(
[year],
[period],
[item_no],
[cust_no],
[invoice_no],
[line_no],
[charge_cust_no],
[srep_code],
[quantity],
[selling_uom],
[ship_name],
[ship_addr1],
[ship_addr2],
[ship_to_city],
[ship_to_state],
[ship_to_zip],
[invoice_date],
[order_type],
[record_updated],
[total_pkg_qty_per_sku],
[total_product_quantity],
[net_sale_amt]
)WITH (DROP_EXISTING = OFF) ON [PRIMARY]
GO
Two Execute SQL Tasks were added to the SSIS package that loaded the main fact table: one just before the load to disable the columnstore index, and one just after the load to rebuild it.
ALTER INDEX [invoice_line_IDX_CS] ON [POC].[invoice_line] DISABLE
GO
--LOAD TABLE USING SSIS
ALTER INDEX [invoice_line_IDX_CS] ON [POC].[invoice_line] REBUILD
GO
Four basic reports representing common user requests in the environment were used to compare report performance across Sybase IQ, SQL Server with a traditional B-tree index, and SQL Server with a columnstore index. The reports were executed using two different BI reporting tools, represented in the chart below as "A" and "B". The time represents the total report execution time in seconds, which includes data retrieval and internal report processing by the BI reporting tool.
The reports used simple SQL; nothing fancy or complex here. Report 1 simply summed invoice amount by year across all years, while Report 2 filtered on a single sales zone and then summed invoice amount by year. Report 3 was similar to Report 1 (summing invoice amounts by year across all years) but was based on a view that included extra columns. Report 4 was also based on that view and was similar to Report 2 (filtering on a single sales zone and then summing invoice amount by year).
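To give a feel for the workload, the queries looked roughly like the sketches below. These are reconstructed from the description above; sales_zone is a hypothetical stand-in for whatever column the zone filter actually used.
-- Report 1 (sketch): sum invoice amount by year across all years.
SELECT [year], SUM([net_sale_amt]) AS total_sales
FROM [POC].[invoice_line]
GROUP BY [year]
ORDER BY [year];
GO
-- Report 2 (sketch): filter on a single sales zone, then sum by year.
-- [sales_zone] and 'EAST' are hypothetical stand-ins for the real filter.
SELECT [year], SUM([net_sale_amt]) AS total_sales
FROM [POC].[invoice_line]
WHERE [sales_zone] = 'EAST'
GROUP BY [year]
ORDER BY [year];
GO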
Obviously, SQL Server running on newer 64-bit hardware outperformed Sybase IQ running on older 32-bit hardware.
In most cases the columnstore index improved query time and was impressively fast, but it was not always faster than the B-tree index. Report 1b was actually slower and Report 2b was the same. This may have more to do with the processing time of the BI tool than the data retrieval time, but a deeper look is needed to understand this observation. Overall, I was pleased with the columnstore index, as it was easy to set up and offered a noticeable performance improvement for most report users.
A columnstore index will help reports that summarize data at an aggregate level (SUM, MIN, MAX, AVG). A traditional B-tree index will still have its place for retrieving a few rows using highly selective filters, since that is not really the strength of a columnstore index. This leads me to ask: as a BI practitioner, where does it make sense to use a columnstore index?
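As a contrast to the aggregate sketches above, a highly selective lookup like the one below (again just an illustration against the POC table, with a made-up invoice number) is the kind of query where a B-tree index on the filter columns would likely remain the better fit.
-- Sketch of a highly selective lookup where a B-tree index on (invoice_no, line_no)
-- would typically serve better than a columnstore index.
SELECT [invoice_no], [line_no], [item_no], [quantity], [net_sale_amt]
FROM [POC].[invoice_line]
WHERE [invoice_no] = 1234567   -- hypothetical invoice number
  AND [line_no] = 1;
GO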
Speeding up existing reports may be one idea. Some reports that were not previously considered candidates for online use could now be deployed in an interactive, online way. I would think that if this were truly the case you would have already created an SSAS cube. Interestingly enough, I did have a project where I had to create an SSAS cube on HR data for this very reason. The report needed to run embedded in a .Net web application with a 1-2 second response time. After spending time performance tuning and reindexing, I turned to creating an SSAS cube, which ended up giving under 2 second response times for the report queries. In a case like this, a columnstore index may have been something to consider, since it would have avoided having to create and process an SSAS cube.
A second idea is that a columnstore index could be used as a replacement for aggregate tables. Back in 1999, aggregate tables seemed to be more common; nowadays I honestly don't remember needing them on any recent projects. Today I would tend to rely on an SSAS or a Cognos cube instead, so I am not sure about this idea.
A third idea came to mind when thinking back on a previous client, where I was working on a medium-sized SSAS cube that contained quality data. The company had many poorly performing SSRS reports that used MDX against the SSAS cube. The company's IT staff was not comfortable with MDX or SSAS, but did have strong skills in SQL and SQL Server. A company like this may be an ideal candidate for a columnstore index: it can support their SSRS reports while taking advantage of their strengths in SQL and SQL Server. I would think there are cases where a columnstore index can meet the reporting needs while avoiding the complexities that come with MDX and SSAS.
I would be curious to hear if you have any other ideas for potential applications of the columnstore index.