Wednesday, December 5, 2012

Oracle Endeca Information Discovery – Partition Transformation

 

I have been working with the Oracle Endeca Information Discovery tool to join structured and unstructured data together in interesting business solutions. I wanted to share my experience with the Partition transformation and show how it can be used to conditionally split a pipeline in an Integrator (CloverETL) graph.


The transformation has properties called Ranges and Partition key. Setting the Partition key is easy. Just pick the fields from a list. Setting the Ranges is not so easy to understand.


Setting the Ranges property requires an awkward syntax: < and > include a boundary value, ( and ) exclude it, a comma separates the low end of a range from the high end, and a semicolon separates one range from the next (see the example below). Sounds confusing? It is. Fortunately, there is a much easier way using CTL2.

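To illustrate, a Ranges value following that syntax might look like the line below, where the first range includes 0 through 100 and the second excludes 100 but includes everything up to 200 (the boundary values are purely illustrative):

<0,100>;(100,200>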

For the easy way, click on the Partition attribute, which drops you into the CTL2 editor. You can refer to your input fields with a $ prefix; $DOC_TYPE is an input field defined on the input port. Then write a conditional statement. In my example below, all rows with $DOC_TYPE = “Operation Instructions” are sent to port 0, and all others are sent to port 1.

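A minimal sketch of that partition logic in CTL2, assuming a single string input field named DOC_TYPE as in my graph, looks roughly like this:

//#CTL2
// Return the output port number for each input record.
function integer getOutputPort() {
    if ($DOC_TYPE == "Operation Instructions") {
        return 0;  // operation instructions go to port 0
    }
    return 1;      // everything else goes to port 1
}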

Saturday, August 11, 2012

Authentic Communication

I believe that clients truly appreciate when consultants are genuine, authentic, and avoid technical jargon or industry clichés. To that end, I am going to strive to be more genuine and authentic in my communication style on future projects.

I will avoid pinging you or reaching out to you, but expect me to follow up with an IM, a call, an email, a face-to-face discussion, or a scheduled meeting.

I will avoid talking to others about my bandwidth capacity or what's on my plate, but expect me to tell you the first available time I CAN help.

I will not hit the ground running, but expect me to come as prepared as I can and to try to be productive as soon as possible.

I will not think outside the box, but expect me to use experience and common practices while looking for opportunities to be creative and innovative and to consider new methods and technologies when designing solutions.

I will not architect a solution, nor use the word architect as a verb, but you can expect me to design and implement solutions that solve business problems.

I will not carve out time, but expect me to look for the next available time in my calendar.

I will not leverage things, when I can simply use things.

I will avoid talking about drinking kool-aid, but will question when there seems to be a blind, uncritical acceptance of a business practice or technology.

I will not refer to other human beings as a resource. You can expect me to recognize the dignity of each person and treat all my business partners, colleagues, and clients as I would want anybody in my organization to treat our very best customer.

Wednesday, July 18, 2012

What Do You Prefer - Inmon or Kimball?

I was asked that question during an interview for a position on a client project using Microsoft BI tools. The client interviewer asked, “Which do you prefer - Inmon or Kimball?”

The question was easy for me to answer. I could talk about how the Kimball data modeling technique called for using fact and dimension tables, star schemas, and conformed dimensions to link different fact tables, relied on data marts, and was considered a bottom-up approach. The Inmon data modeling technique, on the other hand, called for a more top-down approach built around an enterprise-wide data warehouse of normalized tables.
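To make the contrast concrete, a minimal Kimball-style star schema might look something like this in T-SQL (the table and column names are illustrative, not from any particular project):

CREATE TABLE DimCustomer (
    CustomerKey   int IDENTITY PRIMARY KEY,
    CustomerName  varchar(100),
    Region        varchar(50)
);

CREATE TABLE FactSales (
    DateKey       int,
    CustomerKey   int REFERENCES DimCustomer (CustomerKey),  -- link to the conformed dimension
    SalesAmount   money
);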

The client interviewer pressed on.  “Which would you choose?”   

I thought to myself.  I never really considered one over the other.  They both seemed like acceptable approaches to accomplish the same thing. 

I was first exposed to dimensional modeling and the Kimball technique in graduate school and have used it on a number of projects since then. Building OLAP cubes in IBM Cognos Transformer and Microsoft SQL Server Analysis Services both seemed to favor the Kimball technique. On a few projects I used the Inmon technique; these were typically projects I inherited or ones with large, very complex data models.

I started off saying “It will depend on the project, the client......” and was abruptly cut off.  

“Which one do you prefer?” asked the client interviewer in a challenging tone.

I had to pick one. I said something like “I used both techniques in the past, but would favor the Kimball method”.

The client interviewer responded “Good!” and moved on to the next question.

Today in 2012, vendors have more products that use column-oriented database technology at a price point within reach of most companies. Microsoft has implemented the columnstore index in SQL Server 2012 and uses column store technology as the engine behind Power Pivot for Excel and SharePoint 2010 and SQL Server Analysis Services in tabular mode. Oracle has Endeca, which supports ultra-quick searches of structured and unstructured data; Endeca uses a column-oriented technology for its MDEX Engine.

With column store technology so available in today's products, and with CPU speed and memory size for servers continuing to grow at mind-numbing rates, it seems we are approaching the point where CPU and memory are not a primary constraint for many projects with a dataset size of less than 1 terabyte.

I began to think that I might not give the same response to the client interviewer if I were asked today. The Inmon technique seems to have merit when considering products that use column store technology. The Kimball star schema may not be entirely necessary with column store technology.

We may also be looking at a new approach entirely. Perhaps a simplified variation of the Inmon technique will evolve over time. I can see merit in modeling a few related entities as denormalized tables with repeated attributes. For data sets under 1 terabyte this approach may be something to consider, as it may be faster to develop and deliver a BI solution initially. It may also help a company be more agile by supporting quicker changes and enhancement requests.
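As a rough sketch of what I mean by a denormalized table with repeated attributes (again, illustrative names only):

CREATE TABLE SalesDetail (
    OrderDate     date,
    CustomerName  varchar(100),  -- repeated on every row instead of joining to a dimension
    Region        varchar(50),   -- repeated attribute
    ProductName   varchar(100),  -- repeated attribute
    SalesAmount   money
);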

Tuesday, July 17, 2012

What is Predictive Analytics?

I wanted to pass on four articles by Alex Guazzelli on the IBM DeveloperWorks site that give a nice overview of predictive analytics. I found them interesting and easy to read.

Part 1: What is predictive analytics?

Part 2: Predictive modeling techniques

Part 3: Create a predictive solution

Part 4: Put a predictive solution to work

Monday, June 18, 2012

Connecting Excel 2010 to SQL Server Analysis Services 2012 as a Domain User on a Non-Domain Laptop

I am using Windows 7 on my laptop and needed to test dimension-level security using Power Pivot/Excel 2010 connected to a SQL Server 2012 Analysis Services cube. The cube was set up to filter on a business unit dimension: users could only view measures for the business units they belonged to. I needed a way to connect as different domain users to verify that the dimensional security in SSAS was set up properly. I wanted to connect to the client domain, which I was not a member of, from my laptop, which belonged to my home office domain. Essentially, I wanted to connect as am\tom.puch while logged on to my laptop as pla\tpuch.

I came across an excellent posting by James Kovacs where he connects to the SQL Server database engine in SQL Server Management Studio as a different domain user than the one he was logged on to his local machine with. Devin Knight also had a nice posting where he used the EffectiveUserName property in the connection string. I wanted to demonstrate how James Kovacs' technique could work for Power Pivot/Excel 2010 when connecting to a SQL Server 2012 Analysis Services cube.
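For reference, the EffectiveUserName approach simply adds that property to the Analysis Services connection string, along these lines (the server and catalog names are made up, and the connecting account needs administrative rights on the SSAS instance):

Provider=MSOLAP;Data Source=MySSASServer;Initial Catalog=SalesCube;EffectiveUserName=AM\tom.puch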

I opened PowerShell and used the following command.

runas /netonly /user:AM\tom.puch "C:\Program Files (x86)\Microsoft Office\Office14\EXCEL.exe"


Excel started up. After confirming the data connection in Excel, I could see that I was connecting as am\tom.puch.


A quick check of SQL Server Profiler confirms that the correct account is used.


Friday, May 25, 2012

SQL Server Data Alerts - Email configuration has no 'server' or 'pickup directory'

I was setting up a SQL Server Reporting Services data alert for the first time and got an error.
I searched for the identifier listed in the status message in the log files under C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\LOGS. I found an entry that said “Email configration has no 'server' or 'pickup directory' specified”. My first thought was that I am the last person to nitpick others' spelling, but I did think “configration” should have been spelled “configuration”. My second thought was that I knew I had set up the Email Settings in the Reporting Services Configuration Manager. I double checked, and sure enough it was set up as I expected.
I went to SharePoint 2010 Central Administration, under Manage Service Applications, found the Reporting Services application, and clicked on it. I then clicked the Manage button on the toolbar.
I then clicked on Email Settings………
….and my Email Settings were empty.
I checked the Use SMTP server check box and filled in the Outbound SMTP server and From Address boxes. I went back to the report, right-clicked on it, selected Manage Data Alerts, then selected the data alert and selected Run. The status message now said “Last alert ran successfully and the alert was sent.”
I checked my inbox and sure enough the email was there.  Now I have no excuse for not filling out my timecard.

Thursday, May 24, 2012

List Price for SQL Server 2012

I always seem to need a quick reference for the list price for SQL Server 2012.  I thought others would appreciate this too.
Edition                 Unit of Measure   List Price*
Enterprise              per core          $6,874
Business Intelligence   per server        $8,592 (plus $209 per user)
Standard                per core          $1,793
Standard                per server        $898 (plus $209 per user)

Source: Redmond Channel Partner, as of March 23, 2012
* Prices do not include Software Assurance

Saturday, March 31, 2012

Profitability Analysis–SQL Server 2012 SSIS and SSAS meets JD Edwards

I am wrapping up a profitability analysis project for a global medical manufacturing company. The goal was to give the management team a Business Intelligence application to show profitability margins for customers, products, geographic locations, and production facilities. The solution provided OLAP reporting on profitability metrics while supporting currency conversion for every currency the company did business in. To deliver this solution I had to pull financial measures like revenue and expenses, as well as statistics on the manufacturing process, from Oracle's JD Edwards EnterpriseOne.
The idea for this project was to use SQL Server 2012, with SSIS for the ETL process and SSAS for the cube. The users would connect to the cube directly using Excel 2010. I am happy to report that I did not run across any issues related to using SQL Server 2012, even though I was using Release Candidate 3 for most of the development phase.
I came across a few things during the SSIS development phase of this project that I wanted to pass along to other business intelligence folks who may be pulling from a JD Edwards / Oracle database.
1.  Configure the 32-bit and 64-bit Oracle OLE DB providers following Greg Galloway's instructions. These instructions are fantastic. I can appreciate them, as I remember the wailing and gnashing of teeth that took place the first time I had to set up Oracle's OLE DB providers on a 64-bit server.
2.  Decide on using linked server (TSQL) or OLEDB Source (PL/SQL). My experience was that I got noticeably faster performance when using the OLEDB Source with PL/SQL rather than using the more convenient linked server.
3.  Use existing date and time conversions. Bryant Avey had two useful TSQL functions. One converts the JD Edwards date fields, which are represented in a Julian date format of CYYDDD (where C = century, YY = a 2-digit year, and DDD = the 3-digit day of the year, 1 through 365, or 366 on a leap year), to a Gregorian date. The function is called DateJ2G.
CREATE FUNCTION [dbo].[DateJ2G] (@JDEDATE int, @FORMAT int)
RETURNS varchar(20) AS
--Written by Bryant Avey, InterNuntius, Inc.
--Provided free "As Is" with no warranties or guarantees
--I just ask that you keep these comments in the function, if you use it.
--The complete article describing this function can be found at:
--http://wp.me/pBPqA-a

--This function takes a JDE Julian Date and returns
--a varchar date in the format style you specify
--To use, simply pass in the JDE date and the style code
--Style codes can be found at

--For Example: select dbo.DateJ2G(sddgj,101) from f4211
--would return the JDE date in the format of 02/29/2008.
--Select dbo.DateJ2G(108060, 1) = 02/29/08
--Select dbo.DateJ2G(109060, 107) = Mar 01, 2009

--Format codes are standard SQL 2005 Date Convert codes.
--Conversion codes can be found here: http://wp.me/pBPqA-a
BEGIN
DECLARE @sqldate datetime
set @sqldate =
 dateadd(day,cast((1900000 + @JDEDATE)%1000 as int)-1,(cast((
 cast((1900000 + @JDEDATE)/1000 as varchar(4)) + '-01-01')
 as datetime)))

RETURN (convert(varchar(20),@sqldate,@FORMAT))
END

The other function, DateG2J, converts a Gregorian date to a Julian Date. Again this was very useful.
CREATE FUNCTION [dbo].[DateG2J] (@Geogian_in datetime)
RETURNS int AS
--Written by Bryant Avey, InterNuntius, Inc.
--Provided free "As Is" with no warranties or guarantees
--I just ask that you keep these comments in the function, if you use it.
--The complete article describing this function can be found at:
--http://wp.me/pBPqA-a
--This function takes a varchar gregorian date and returns
--a Julian JDE Date
--To use simply pass in the string date
--For Example: select dbo.DateG2J('02/29/2008')
--would return the JDE integer date of 108060.
--Date input formats are standard SQL 2005 DateTime values.
--Any validly formatted date string will work, such as 'feb 29,2008' to get 108060.
BEGIN
declare @JulianDate_out INT
declare @Century INT
declare @YY INT
declare @DayofYear INT
Select @Century = case when datepart(yyyy,@Geogian_in) >= 2000
then 100000 else 0 end -- >= so the year 2000 itself maps to century digit 1
Select @YY = CAST((SUBSTRING(CAST(DATEPART(YYYY, @Geogian_in)
AS VARCHAR(4)), 3, 2)) AS INT)
select @DayOfYear = datepart(dayofyear, @Geogian_in)
SELECT @JulianDate_out = @Century + @YY * 1000 + @DayofYear
RETURN(@JulianDate_out)
END
I created a third function to validate and format the JD Edwards time fields into a valid time for SQL Server. My experience is that JD Edwards allows combinations of integers that do not necessarily represent a valid time in SQL Server (or anywhere else, for that matter). The ValidateJDETime function helps deal with this by checking the time and setting it to midnight if an invalid time is found.
CREATE FUNCTION [dbo].[ValidateJDETime] (@IN_Time VARCHAR(10) )
RETURNS CHAR(8)
AS
--Written by Thomas M. Puch
--Provided free "As Is" with no warranties or guarantees
--I just ask that you keep these comments in the function, if you use it.

----Usage:  SELECT dbo.ValidateJDETime ( '112233')
BEGIN
DECLARE @RV CHAR(8),
        @vt AS CHAR(6),
        @vh AS CHAR(2),
        @vm AS CHAR(2),
        @vs AS CHAR(2)

SET @vt = REPLACE(STR(@IN_Time, 6), SPACE(1), '0')

SET @vh = SUBSTRING(@vt, 1,2)
SET @vm = SUBSTRING(@vt, 3,2)
SET @vs = SUBSTRING(@vt, 5,2)

IF (@vh BETWEEN 0 AND 23 AND
    @vm BETWEEN 0 AND 59 AND
    @vs BETWEEN 0 AND 59)
  BEGIN
     SET @RV = @vh + ':' + @vm + ':' + @vs
  END
ELSE
  BEGIN
     SET @RV = '00:00:00'
  END
RETURN(@RV)
END
4.  Plan to deal with NCHAR and NVARCHAR2 early on. Most text fields in JD Edwards are NCHAR or NVARCHAR2 and are padded with trailing, or in some cases leading, spaces. Those spaces need to be accounted for when matching against other data with the SSIS Lookup transformation, especially if you are joining to data from text files or other source systems, which likely will not have trailing or leading spaces. This will cause the Lookup transformations to find no match, and it can be time consuming to troubleshoot because at first look the data seems to match and even joins together in a TSQL query. I would suggest using the TRIM function in Oracle on every text field in your PL/SQL query to JD Edwards, as shown below. TRIM removes trailing and leading spaces and will allow your SSIS Lookup transformations to find matches.
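A quick sketch of what such a source query might look like (the schema and column names are illustrative, apart from sddgj and F4211 mentioned earlier):

SELECT TRIM(sdlitm) AS item_number,    -- trim leading/trailing spaces from text fields
       TRIM(sddcto) AS order_type,
       sddgj        AS order_date_jde  -- JDE Julian date, converted later with DateJ2G
FROM   proddta.f4211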
5.  When using the Oracle OLE DB provider, I found that the ExecuteOutOfProcess property on the Execute Package Task had to be set to FALSE if the sub-package used the Oracle OLE DB provider. This was also pointed out in Greg Galloway's instructions. Like Greg, I did not really investigate why, but set the property and moved on.
SQL Server 2012 SSIS ExecuteOutOfProcess

Friday, February 3, 2012

Connecting R to Microsoft SQL Server


Connecting R to SQL Server to pull data from a SQL Server data warehouse or data mart is something you may want to consider if you need to do advanced statistical computing.

Assuming you have R already installed, there are two prerequisite steps. First, download and install the RODBC package; I did this using the menu options under the Packages menu in the RGui. Second, create an ODBC data source. I created a system data source for SQL Server, configured my server and default database, and made sure to point the default database to the database I wanted to connect to, which in this case was the CatchAll database. I did not feel very creative, so I called the ODBC connection the same name as the database.

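The script itself is only five lines. A minimal sketch, assuming the ODBC data source above and a table with a Price column (the table name is just a placeholder), looks like this:

library(RODBC)                                           # load the RODBC package
conn <- odbcConnect("CatchAll")                          # open the ODBC data source created above
dataframe <- sqlQuery(conn, "SELECT * FROM dbo.Sales")   # run a query; results land in a data frame
odbcClose(conn)                                          # close the connection
mean(dataframe$Price)                                    # average of the Price field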

The first line loads the RODBC package, which supports ODBC calls.

The second line creates the connection using the ODBC data source you created previously.

The third line runs your SQL query. You can query a table or a view.  The output is sent to a data frame called “dataframe”.

Finally, the fourth line closes the connection, and in the fifth line I display an average of the price field in my data frame to show that the data frame has in fact been populated.