As promised in my previous post about working with EAV-data this post would about handling multiple timelines (or periods for that matter).
While this blog is specifically written for an EAV-environment, the technique explained in this post can be useful for other situations as well.
The problem I’m discussing in this article is about working with multiple, overlapping periods of time and the way they affect the outcome of PIVOT statements.
Because there is so little to find about this subject, I decided to make a write-up of the solution me and my colleagues came up with.
Before diving into the technique I would like to position the problem, so first of all I’d like start off with a simple example of what’s going wrong.
Take a look at the following tables:
They have a striking similarity with the tables I’ve used in my previous post, but this time the EAV_Data table is a bit more complex due to the added ValidFrom and ValidTo columns.
Because of these two columns, it is now possible to have multiple values  for different periods of time.
John is an Englishman who chose the green wetlands of Holland to make his new home. He liked it so much that he even acquired the Dutch nationality at some point.
His actions did not go unnoticed and John conquered the heart of the girl of his dreams, who he married a year later.
See underneath for a graphic representation of the situation:
Now let’s say we ignore the time-axe and just start pivoting, just like we did last time:
This query yields the following result:
This result introduces a few new problems.
First of all, the periods overlap each other.  This is a bad thing, because now I don’t have one version the truth.
Second of all, most columns are blank (NULL). This means that in order to get a correct selection we have to combine the right periods AND merge the right values to get a complete, correct visualization of the record on a specific moment.
But what is causing this behavior in the first place? Because the PIVOT operator now ‘rotates’ the data around the ID, ValidFrom and ValidTo columns, data is only presented in a row if it matches the exact same period. This is incorrect behavior, because the name (John) is valid during the entire period and therefore should be present in every row.
What we need, is what I’d like to call an Effective Timeline.
An effective timeline is a timeline that consist out of multiple consecutive periods, rather than multiple parallel/overlapping periods.
This way, a specific moment belongs to one period and one period only.
If we merge all the periods present in our example and remove the duplicates we get something like this:
See how this graph only has one line for every period?
But how do you achieve this programmatically?
In my project we came up with the following solution;
- We simply gather all unique start-and enddates of every period in one big collection.
- Next, we sort this collection from oldest to newest, ignoring the fact if the current record is a start- or enddate.
- Now we have a temporary collection with all possible dates, conviently ordered by date.
- Finally we join the collection with itself (one record shifted), combine all dates and effectively create a whole new set of consecutive periods.
See the sample UDF below for clarity:
Running this function will give the following result:
Now that we have an unigue, effective timeline, we can now join the rest of the data to get a complete overview. See the following example:
In this step the data is assigned to every period is has overlap with. Because some values are longer valid than others, it can happen that one value is assigned to multiple periods, as shown in the graph below:
Because the periods are now neatly aligned, the PIVOT statement will come up with the following result:
This results is a more accurate representation of the data. The NULL fields are gone and the data is now correctly divided amongst the periods.
With this technique you can pivot data with multiple timelines with little effort, it doesn’t matter if you have 3 periods or 3000: this technique will always calculate the effective timeline (albeit a bit slower ;)).
By making use of effective timelines working with EAV is a bit less complex.
And like I said in the beginning of this blog, calculating an effective timeline by merging periods can be useful in other situations as well. For example, if you have multiple periods that have overlap and you need to count the unique days over all periods, having a effective timeline can really speed up the process.