We need to understand the tabular model – where the data comes from, how to edit the schema after you create it and how to create hierarchies. This section walks you through doing this in three of the main tables – Date, Team Projects and Work Items. This post will conclude with a discussion and walkthrough of the work item measures.
You can download the Team Foundation Server Tabular Model from the Visual Studio gallery.
Working with the Date table
In this section you will look at one of the core tables from the Tfs_Warehouse – DimDate which is the date dimension and DimToday which contains a single value, the current date. There are two ways to bring schema information into the model. You can use a SQL query (hand written) or just select the columns by checking each column. Please be aware that I’m covering this very lightly and that there are other options when pulling schema information in to create your model. For the DimDate table I elected to write a very simple select statement and then to change the table and column names afterwards. Obviously one of the options when writing a select statement to pull the model information is that you can change the name as part of the statement. I’ll show you an example of this when I cover another table.
To get to the column source, if you are in the Grid view select the Date tab, and if you are in the Diagram view select the Date table; then view the Properties (F4 or View > Properties Window from the main menu). These are the Properties from the Date table (Figure 1).
There are just a couple of things to note regarding these properties:
- Connection Name indicates where the table gets its data from – obviously this means that, as with PowerPivot, you can pull data from different sources. It’s a good idea to keep one connection string per unique data source; in this case the model only has one connection
- Hidden indicates that it is not visible to client tools (this applies to the whole table in this case but you can apply it to specific columns of data as you’ll see shortly)
- Partitions are not covered in this post (I will cover them in a later post), so for right now there are no partitions
- I do not cover the default field set or table behavior in this post – I may cover them in a later post
The Source Data represents just that, the source data. The Tabular Model takes its schema from the source data. If you click the Source Data ellipsis you’ll see the following select statement:
The important thing to note here is that the dates coming back are constrained to today’s date. The DimDate table in the warehouse actually contains about three months’ worth of future dates. You don’t want these dates because certain clients (such as Excel) will automatically include them in reports, and you’ll have to filter every report that uses dates, which is a pain.
Modifying the Date table and creating hierarchies
Once the date table was brought in I made the following changes in the tabular model:
- Rename the table Date
- Rename Year > Year_Original
- Rename Month > Month_Original
- Rename Week > Week_Original
- Rename YearString > Year
- Sort Year by Year_Original
- Rename MonthString > Month
- Sort Month by Month_Original
- Rename WeekString > Week
- Sort Week by WeekOfYear
- Hide all columns in the table except for Year, Month, Week, Date
- Create the Year Month Date hierarchy
- Create the Year Week Date hierarchy
- Mark the table as a date table
Don’t panic if you don’t know what some of these changes mean or why you would want to make them. It is critical to understand them so that you can modify the model without breaking it, so I will cover each of these in detail.
Why rename the columns if they already have the names that you want to show up in the client tools? Quite simply, because they contain the wrong data in the wrong data type. Let’s look at the default Year column as it comes in from DimDate. The Year column contains values such as “1/1/2012 12:00:00 AM”. But when you report on a year, with Excel for example, the year shows up as “2012”. So you don’t want the fully qualified date: you don’t want the month or day with the year, and you certainly don’t care about the time. There is, however, a column with the value we want – YearString. But in the client tools you use for reporting, you don’t want a user choosing “YearString”; you want them to choose “Year”. This is what led to the reshuffling of the column names, and the same reasoning applies to the other renames.
To rename a column, just select the column and in the Properties Window type in a new name for the Column Name property.
We’ve all run into this before – when you sort a string field it sorts alphabetically and not numerically. It’s a royal pain in the butt. So how do you fix it? In the tabular model you can select a column by which to sort. Figure 2 shows the properties for the Week column (after the renames which means this week column contains values such as “Week ending November 03”).
Notice that the Sort By Column is set to WeekOfYear. The WeekOfYear column is a numeric field which just has a number representing which week of the year this is (1 – 52). In this manner you can sort a textual column by a numeric column and have it come out right.
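The effect of the Sort By Column setting can be sketched outside the model. This Python snippet (with made-up week labels) shows why sorting the label strings alphabetically scrambles the timeline, while sorting by a parallel numeric key restores it:

```python
# Sketch of the Sort By Column idea with made-up week labels.
# Sorting the label strings alphabetically scrambles the order;
# sorting by a parallel numeric key (WeekOfYear) restores it.
weeks = [
    ("Week ending November 03", 44),
    ("Week ending February 10", 6),
    ("Week ending August 17", 33),
]

alphabetical = sorted(label for label, _ in weeks)
by_week_of_year = [label for label, key in sorted(weeks, key=lambda w: w[1])]

print(alphabetical[0])     # "Week ending August 17" - wrong for a timeline
print(by_week_of_year[0])  # "Week ending February 10" - correct
```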
It’s important to understand that this does have a problem in one situation. If you were (for some inexplicable reason, since it doesn’t make sense to do this) to add the Week column and then, for example, the work item count to a report in Excel, it would sort by weeks only, not by years and then weeks. So there are some reports that just don’t make sense unless you separate them out by higher-level time periods – I’ll cover this in just a moment when I talk about hierarchies.
Any column that is not hidden is automatically displayed in a client interface. Because the goal was to match this model as closely as possible to what is provided by the Tfs_Analysis cube, we want the Date dimension to look like what you see in Excel (Figure 3).
Ignoring the hierarchies for a moment, this leads us to hide all fields but the four shown here under More Fields.
To hide a column just select the column (you can select more than one at a time but they have to be contiguous), right-click and select Hide from Client Tools.
Mark the table as a date table
This allows you to do certain things with the table in the client tools. To mark a table as a date table (you can have more than one, but for TFS only one is needed), select the table, select Table > Date > Mark As Date Table, and then ensure that the DateSK column is set as the unique identifier.
As a quick aside, you will see two types of key columns in most tables in the data warehouse – the SK and BK columns. The SK column is the surrogate key and the BK column is the business key. Either can be used as a unique identifier for the table, but virtually all joins are done on the SK column, so use it if possible.
Creating the date hierarchies
Hierarchies allow for drill-downs and rollups depending on the client tool you are using (for example, hierarchies are supported in Excel but not in Reporting Services). They are generally very helpful and you will see a few different hierarchies in the Tfs_Analysis cube: the Date, Area, Iteration and File Path hierarchies. There is also a Collection/Team hierarchy, but it is only ever one level deep since teams are nested under collections and can’t be nested under each other.
To create a hierarchy you have to be in the Diagram view (to switch to the diagram view, click the Diagram button in the lower right corner of the Grid view in Visual Studio). The Date table is shown in Figure 4.
There are two ways to do this so it’s just a matter of preference:
- Click the Create Hierarchy button (shown in the above figure), rename the hierarchy and then drag each of the fields you want in the hierarchy and then arrange them in the right order or
- Ctrl + click each of the fields you want in the hierarchy, right-click one of the fields and select Create Hierarchy. Rename the hierarchy and arrange the fields
Here you can see the date table with Microsoft’s two standard date hierarchies – “Year Month Date” and “Year Week Date”. I say these are Microsoft’s standard because these two date hierarchies exist for every date in TFS. However, it is important to note that the fields required to create these hierarchies do not, by default, exist in any other table in the warehouse, because they are calculated from the values in the DimDate table when they are placed into the cube. I will show you how to do this yourself a little later.
This concludes creating the date table. It is one of the foundational tables of any analysis services model based on TFS and fortunately it is a relatively simple table. Now we’ll look at another one that is also relatively simple but has a self-referencing relationship that is not automatically handled in a tabular model.
Working with the Team Project dimension
The Team Project dimension is pulled in from the DimTeamProject table in the warehouse simply by checking the desired columns (in this case I selected all of them). After pulling the table in, I renamed it to “Team Project” and then did the following:
- Rename ProjectNodeTypeName > Project Node Type Name
- Rename IsDeleted > Is Deleted
- Rename ProjectPath > Project Path
- Rename ProjectNodeName > Name
- Change the Is Deleted field to the Whole Number data type (it comes in as a string)
- Add a new column called Collection so we can create a hierarchy (explained below)
- Hide all columns except Name, Project Node Type Name, Is Deleted and Project Path
- Create the Team Project Hierarchy – Collection / Name
Most of this is pretty straightforward with the exception of the Collection column. The main reason we need to create this column is that the table is self-referencing. Looking at the data, you can spot a collection in a few different ways: if a row has a Depth of 0, it is a collection; if it has a Project Node Type Name of “Team Project Collection”, it is a collection; if it has a ProjectNodeType of 1, it is a collection. So, you need to look at the ParentNodeSK to see which project belongs to which collection. The tabular model does not support a recursive relationship when creating hierarchies, so we have to flatten the table a little bit in order to create the Team Project Hierarchy.
When you add the Collection column it starts off as an empty column of type String. If you select the first row in the table beneath the column name, you can enter a formula in the formula bar just as you would in Excel. Expressions in PowerPivot and tabular models use DAX (for more information on DAX see this link on TechNet). The formula for the Collection column is this:
=LOOKUPVALUE('Team Project'[Name], 'Team Project'[ProjectNodeSK], 'Team Project'[ParentNodeSK])
This is a fairly standard lookup formula which just says: look up the Name in this table by searching the ProjectNodeSK column for a value that matches the given row’s ParentNodeSK. That’s it. Once you’ve done this you should see the parent collection for each project show up in the column. Now you can switch to the Diagram view and create the hierarchy.
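As a rough illustration of what LOOKUPVALUE is doing here, this Python sketch flattens a tiny self-referencing table; the rows and SK values are invented for the example:

```python
# Sketch of what the LOOKUPVALUE formula does, using made-up rows.
# Each project row carries the ParentNodeSK of its collection; the
# lookup resolves the parent's Name for every row.
rows = [
    {"Name": "DefaultCollection", "ProjectNodeSK": 1, "ParentNodeSK": 0},
    {"Name": "Tailspin Toys",     "ProjectNodeSK": 2, "ParentNodeSK": 1},
    {"Name": "Fabrikam Fiber",    "ProjectNodeSK": 3, "ParentNodeSK": 1},
]

# LOOKUPVALUE('Team Project'[Name], 'Team Project'[ProjectNodeSK],
#             'Team Project'[ParentNodeSK]) in DAX terms:
name_by_sk = {r["ProjectNodeSK"]: r["Name"] for r in rows}
for r in rows:
    r["Collection"] = name_by_sk.get(r["ParentNodeSK"], "")

print(rows[1]["Collection"])  # "DefaultCollection"
```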
The Work Item dimension
Finally we get to the heart of reporting in TFS – work items and the Work Item dimension. This table is a real pain to convert at first, but once you get the hang of it, it’s easy, and the same changes apply to almost every other table that has these fields. In addition, everything you have to do for this table applies to the Test Case table, the Work Item Linked table and the Current Work Item table.
It is also important to note that there are some tricks you can pull in MDX so that you don’t have to duplicate the contents of a table (you can essentially use a pointer to the original table with a filter). Either this can’t be done in a tabular model or I haven’t figured out how – if you know, I would love you to drop me a line so I can improve the model’s performance and size. Because of this, you do have to duplicate this work. Fortunately, since you are working from a pre-built model, this shouldn’t be an issue and is more informational. As you grow in your use of reporting, you may come across other things that you want to do or customizations that you want to make, and hopefully this guide will help you.
The source of the Work Items table is the vDimWorkItemOverlay table. You do not have to use this as the source – you can use the DimWorkItems table however there are some items that have been denormalized for us in this view so I chose to take advantage of it. The statement to import the schema and data is incredibly long because of the number of columns. I have reproduced sections of it here to walk you through very specific parts so you will see abbreviated SQL statements in the next few explanations.
First off, let’s start with the date columns. As I mentioned previously, only the date table includes all of the columns to create the two standard date hierarchies. That means you need to create the columns required on your own. The major reason for this is that it would require quite a bit more work to do this in the model than doing it with a SQL statement. So, for all of the date columns, the following portion of the select statement is used:
,cast(a.[System_ChangedDate] as date) as [Changed Date]
,(select [Year] from dimDate where DateSK = cast([System_ChangedDate] as date)) as [System_ChangedDateYear]
,(select [YearString] from dimDate where DateSK = cast([System_ChangedDate] as date)) as [System_ChangedDateYearString]
,(select [Month] from dimDate where DateSK = cast([System_ChangedDate] as date)) as [System_ChangedDateMonth]
,(select [MonthString] from dimDate where DateSK = cast([System_ChangedDate] as date)) as [System_ChangedDateMonthString]
,(select [WeekString] from dimDate where DateSK = cast([System_ChangedDate] as date)) as [System_ChangedDateWeekString]
,(select [WeekOfYear] from dimDate where DateSK = cast([System_ChangedDate] as date)) as [System_ChangedDateWeekOfYear]
,(select [DayOfYear] from dimDate where DateSK = cast([System_ChangedDate] as date)) as [System_ChangedDateDayOfYear]
This is really a pretty straightforward statement, it’s just long. The statement just selects the appropriate values out of the DimDate table so that we have all of the values needed to create the date hierarchies for each date. If you do not need to create a hierarchy I would highly recommend skipping this. However, SQL 2012 SP1 will support hierarchies in Power View which means people might actually want to do rollups based on dates if they use Power View. Think about it before you decide to exclude dates. This syntax can be repeated for each date by just changing the name of the date column.
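In fact, since the same seven sub-selects repeat for every date column, you could generate them with a small script rather than typing them out by hand. This Python sketch assumes the seven DimDate columns shown above; it is just a convenience for building the statement, not part of the model:

```python
# Generate the repeated DimDate sub-selects for a given date column.
# The seven part names are taken from the Changed Date example above;
# adjust them if your select statement pulls different columns.
DATE_PARTS = ["Year", "YearString", "Month", "MonthString",
              "WeekString", "WeekOfYear", "DayOfYear"]

def date_part_selects(column):
    return "\n".join(
        f",(select [{part}] from dimDate where DateSK = "
        f"cast([{column}] as date)) as [{column}{part}]"
        for part in DATE_PARTS
    )

print(date_part_selects("System_ChangedDate"))
```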
Another item to note is that each column has been renamed so the client tools will show a value more suited to business users. But you will also note that the columns brought in above have not had their names changed, because they are hidden from the client tools.
Follow this procedure for each date hierarchy that you want to create:
- For the YearString field, set the sort to the Year field
- For the MonthString field, set the sort to the Month field
- For the WeekString field, set the sort to the WeekOfYear field
- For the actual date field, set the sort to the DayOfYear field
- Create the Date Hierarchy by Month hierarchy
  - Select the YearString, MonthString and date fields and select Create Hierarchy
  - Name the hierarchy
  - Rename the hierarchy fields to Year, Month, Date
- Create the Date Hierarchy by Week hierarchy
  - Select the YearString, WeekString and date fields and select Create Hierarchy
  - Name the hierarchy
  - Rename the hierarchy fields to Year, Week, Date
Hide all of the date fields with the exception of the actual date.
This next snippet shows retrieving user names:
,c.[Name] as [Assigned To]
FROM [dbo].[vDimWorkItemOverlay] a
LEFT OUTER JOIN [dbo].[DimPerson] c on a.[System_AssignedTo__PersonSK] = c.[PersonSK]
This statement retrieves the person’s actual name from the DimPerson table. In fact, you don’t have to bring in the DimPerson table at all if you don’t want to (again, in the recreation of the Tfs_Analysis cube, I have not; you are certainly free to add this table to the model, but all of the information can be retrieved without it by using the above SQL statement). There is a good reason to retrieve people’s names this way instead of with joins or lookups in the model itself: the tabular model only lets you create one active relationship between two tables, even if multiple columns are related. To get around that you have to write DAX calculations (which affect load time but not query time, because the calculations are evaluated while the data is loaded). The calculation is pretty easy once you’ve done it once, but it takes time and I just felt this was a more straightforward approach. Part of the reason is that I could copy this SQL for all of the tables that had work items and didn’t have to play around with renames in the model, which can be very slow for reasons that I haven’t figured out yet but have reported to Microsoft.
So, you will see in the table source for the Work Items table, there are a number of joins to the DimPerson table to resolve all of these values.
Most of the rest of the SQL statement is just column renames for the sake of client tools. Once you have this select statement figured out you can handle the bulk of all of the other tables in the model.
Last but not least – the measures
Fact tables are akin to the Measures shown in Excel for the Tfs_Analysis cube, and they hold all of the data required to create the calculations. In fact, it was my examination of this data that led to the discovery of a bug in how the cube calculates the work item count of Test Case work items in TFS 2010. This has been fixed in TFS 2012.
First, you’ll need to rename the fact tables to something better. In this case I tried to create a Measure group for each fact table that corresponded to the correct data in the original model. Tabular models are not like MDX models and you can’t create Measure Groups independently of the tables on which the data is based. There has to be a table to hold the calculations for the measures.
Second, I’m only going to discuss the Work Item measures table. If you can do this you can do anything and you have the model to examine so you can take a close look at the other available measures.
Finally, I added the word “Measures” to the end of each measure table because in Power View the tables and measures are shown inline and are not separated. The “Measures” suffix keeps the names unique and makes them easily identifiable to end users. Hopefully Microsoft will eventually address this issue in Power View.
For a measures table, you will want to hide all of the columns from the client tools because the individual columns are not that useful on their own.
The measures are entered into the Measures Grid as shown in Figure 5.
To add a formula, select one of the cells in the Measures grid and enter a formula (either there or in the formula bar above the table). Starting with just the six standard measures:
Revision Count:=CALCULATE(SUM([RevisionCount]), DATESBETWEEN('Date'[DateSK], BLANK(), LASTDATE('Date'[DateSK])))

State Change Count:=CALCULATE(SUM([StateChangeCount]), DATESBETWEEN('Date'[DateSK], BLANK(), LASTDATE('Date'[DateSK])))

Work Item Count:=CALCULATE(SUM([RecordCount]), DATESBETWEEN('Date'[DateSK], BLANK(), LASTDATE('Date'[DateSK])))
I’m using the DAX CALCULATE function for the measure calculation (Marco Russo has a fantastic post on how the CALCULATE function works). What this function says is, “give me the sum of the named column for the dates between the earliest point in time (blank) and the last date in the Date table.” What we want to make explicit here is that when a date range filters this calculation, it is incorporated into the calculation. Marco Russo’s post explains how a user-selected range and this default scope are and-ed together to come up with the filter constraint. Most likely a simple sum would still work appropriately, but I thought it better to be explicit where possible.
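To see how the default scope and a user’s filter are and-ed together, here is a small Python sketch of the idea using made-up fact rows; it mimics the intersection of the two date filters rather than any real DAX engine behavior:

```python
from datetime import date

# Sketch of how the measure's date filter and a user's filter combine.
# The measure scopes the sum to everything up to the last date in the
# Date table; a user's slicer range is intersected (and-ed) with that.
facts = [
    (date(2012, 1, 5), 3),    # (date, RecordCount) - made-up rows
    (date(2012, 6, 1), 2),
    (date(2012, 11, 3), 4),
]
last_date = max(d for d, _ in facts)

def work_item_count(user_from=None, user_to=None):
    total = 0
    for d, count in facts:
        in_measure_scope = d <= last_date          # DATESBETWEEN(blank, last)
        in_user_scope = ((user_from is None or d >= user_from) and
                         (user_to is None or d <= user_to))
        if in_measure_scope and in_user_scope:     # the filters are and-ed
            total += count
    return total

print(work_item_count())                            # 9
print(work_item_count(user_from=date(2012, 6, 1)))  # 6
```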
Once you have a measure, you just have to select the measure and set the data type (in this case to the Number or Decimal Number format as needed).
Custom calculations – duration
Now, the last five measures are the fun ones – these are the measures I really wanted and could not get from the existing Tfs_Analysis cube. But we can do it here fairly simply. These last five measures report on duration, so you can easily run an effort-to-duration report. In lean, the difference between effort and duration is waste; to be able to eliminate waste you have to be able to measure it and report on progress. Well, now you can!
To add this type of calculation you first need to create a new column to hold the calculated value for each row, so I added a column called Active To Closed Duration. Its calculation is actually very simple because we’re calculating the total duration (including weekends):
ROUND(24. * (RELATED('Work Item'[OriginalClosedDate]) - RELATED('Work Item'[OriginalActivatedDate])), 1) < 0,
ROUND(24. * (RELATED('Work Item'[OriginalClosedDate]) - RELATED('Work Item'[OriginalActivatedDate])),
I’ve chopped this up to make it a little simpler.
- Starting with an if statement we check to see if the work item has even been activated (IsBlank) and if it hasn’t we just blank the field out because there is no calculation to be done.
- However, if OriginalActivatedDate is not blank, then we subtract the original activated date from the original closed date, multiply by 24 and round it. We then check whether this value is less than 0, which means the OriginalClosedDate is blank, and if so we set the field to blank.
- Finally, if the value is not blank, use the value.
So, what is the calculation? It’s:
ROUND(24. * (RELATED('Work Item'[OriginalClosedDate]) - RELATED('Work Item'[OriginalActivatedDate])), 1)
This works by looking in a related table (in this case the Work Item table, which is related to the Work Item Measures table on the WorkItemSK column) for the activated and closed dates and subtracting one from the other. The difference is then multiplied by 24 since there are 24 hours in a day.
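As a sanity check on the arithmetic, here is a Python sketch of the Active To Closed Duration logic with invented dates; the None checks stand in for the DAX ISBLANK and less-than-zero tests:

```python
from datetime import datetime

# Sketch of the Active To Closed Duration column: the date difference
# in days, times 24 hours, rounded to one decimal place. The sample
# dates are made up; in the model RELATED() pulls them from the
# Work Item table.
def active_to_closed_hours(activated, closed):
    if activated is None:
        return None                  # ISBLANK -> leave the row blank
    if closed is None or closed < activated:
        return None                  # a negative result means no close yet
    days = (closed - activated).total_seconds() / 86400
    return round(24 * days, 1)

activated = datetime(2012, 11, 1, 9, 0)
closed = datetime(2012, 11, 3, 15, 30)
print(active_to_closed_hours(activated, closed))  # 54.5
```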
Now, let’s take the next column, which is much more complicated – the Active To Closed Working Hours Duration. This is probably the information you want more than anything if you are trying to do this type of calculation and, as I said, this was my starting goal. This calculation relies on the previous calculation we just looked at:
ISBLANK([Active To Closed Duration]),
[Active To Closed Duration] -
('Date'[DateSK] > RELATED('Work Item'[Microsoft VSTS Common Activated Date])) &&
As with the prior calculation this can be read as follows:
- If the Active to Closed Duration is blank, just leave this column as blank also
- If it is not blank, then we subtract the non-working-hours duration from the total duration.
- We do this by taking the sum of the NonWorkingHours column in the Date table (this is a column I added and will explain in a moment) and we calculate this by filtering the date table to values after the activated date and before the closed date
When I was trying to determine how best to calculate the working hours, it became apparent after trying a number of different methods that the best way was to eliminate non-working hours, defined as 16 hours during a weekday and 24 hours during a weekend day. To facilitate this I added a column to the Date table to hold this information so I didn’t have to figure out which day was which. In addition, to make it more accurate you could add another column for your company’s holiday non-working hours to adjust the values I provided out of the box.
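Here is a Python sketch of the non-working-hours idea using invented dates: 16 non-working hours on a weekday, 24 on a weekend day, counted from the day after activation through the closed date, and subtracted from the total duration:

```python
from datetime import date, timedelta

# Sketch of the NonWorkingHours idea: 16 non-working hours on a
# weekday, 24 on a weekend day. Per the text, counting starts the day
# AFTER activation and runs through the closed date. Dates are made up.
def non_working_hours(day):
    return 24 if day.weekday() >= 5 else 16   # Sat/Sun vs Mon-Fri

def working_hours(total_hours, activated, closed):
    day = activated + timedelta(days=1)       # start the day after activation
    deducted = 0
    while day <= closed:                      # ...through the closed date
        deducted += non_working_hours(day)
        day += timedelta(days=1)
    return total_hours - deducted

# Activated Thu 2012-11-01, closed Mon 2012-11-05: Fri(16) + Sat(24)
# + Sun(24) + Mon(16) = 80 non-working hours out of 96 total hours.
print(working_hours(96, date(2012, 11, 1), date(2012, 11, 5)))  # 16
```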
This does result in one interesting situation where these numbers can be off by a few hours: what if you activate an item at the end of the day (i.e. you only work on it for one hour that day instead of eight) and you close it an hour after the next day starts, so again you have only worked on it for one hour instead of eight? To help reduce this effect, you will notice that the calculation starts counting non-working hours the day after the work item was activated, but it does count them through the closed date. This helps normalize the value somewhat, and I wasn’t looking for a down-to-the-minute calculation. However, I have two things to say about this:
- It is always critical to understand how a metric is calculated and to communicate it effectively so that everyone knows how the measurements are being made and what they are being measured against and
- I was looking for a good general number – when your team gets down to small differences like one hour between the duration and the effort then you can work on making this a more detailed calculation – it probably isn’t something you need to deal with right now
Take these into account before you start making changes. I’ve been asked to make changes for people working 4×10 schedules or 9×80 schedules – everyone has different schedules so this becomes something of a logistical nightmare. I was looking for a good solid baseline. If it’s off by a little it is at least consistently off. But now you have enough information to make the changes!
Some final notes
I mentioned in part 1 of this post that I would not be discussing partitions (and I still won’t); however, it is important to understand them. When the model is reloaded (i.e. a Full Process) it will attempt to reload all of the data in each table, and if you have a very large TFS instance this will take longer and longer. Partitions get you around that problem. I did not pre-create the partitions on purpose because they aren’t needed for smaller teams. When you do go to create partitions, because historical data does not change, create them on the LastUpdateDate column for each table and partition by year. That way only the current year is reloaded. If you have a lot of data, you can partition prior years by year and the current year by month. This is a lot of work for each table, so map out a plan before you do it. You’ll be much happier after you do it though.
I will be putting out some updates to the model. The first update will undoubtedly be to capture TFS performance data from the Tfs_Configuration database so teams can report on operational performance at a better level than they can now. Next I will be putting out a version with partitions so you don’t have to. And finally I will probably put out a PowerPivot version of this schema.
Other items that I want to work on are the many to many relationship involving requirements and test results – I’ll have another blog post on this shortly with the problem and some work-arounds. I will also be releasing some sample Power View reports illustrating the power of Power View and the tabular model and some more information on additional metrics that I’ll be adding (for instance, did you know that the test execution duration is actually reported to the data warehouse but not the cube? I’ll be adding this to the tabular model).
I truly hope this model will be beneficial to teams. I have many ideas for situations where it can be incredibly helpful. For instance, what if you have several different templates with different states? It becomes confusing for a team reporting out of the cube to see states that their template doesn’t have; they add them because they don’t know better, and they always get no values on those rows. Using this model you could create several cubes, one per template type, so teams don’t have to puzzle over a column they have never seen before. For teams that really want to keep their data separated you could create a PowerPivot model (yes, this tabular model can be used in PowerPivot, but it requires a bit of work to get it in there right now – I don’t know of a shortcut). If there is enough demand I may just put out a PowerPivot model so you don’t need to convert it.
I hope that you will enjoy playing with and using this model and that it helps you with your reporting endeavors. I will be putting out additional blog posts describing other aspects of this model over time. I hope you’ll stay tuned and definitely feel free to drop me a comment and I’ll get back to you when I can.