There are several scenarios where a developer may want to convert transactional history data into snapshot data. Sometimes an existing transactional solution has been storing transaction history data for years, requiring a migration to snapshot data for reporting purposes. Some customers may be paying a recurring service fee for disk storage and might want to keep disk space consumption down to a minimum (like in a cloud solution). (These are just a few of many possible scenarios)
I recently had the opportunity to convert a cloud-based table containing transactional history data into a snapshot dataset to better support a running total measurement. I was curious to find out what it would take to convert the history data to snapshot data using SQL. I couldn’t find an article with a clean solution, so I decided to create a prototype and write this blog to document the process.
This article explains the challenges and solutions faced with converting transactional history data into snapshot data. The solution presented had three main challenges to overcome. The first challenge involved multiple change records being captured daily for an Entity. The second challenge involved multiple records for an Entity with the same timestamp. The final challenge involved addressing gaps in dates when the data didn’t change. These needed to be filled so snapshot reporting could be performed (this transactional history data only captured changes when they occurred). There were gaps over one day, and some over one year, depending on the data.
The First Challenge
Multiple change records were logged each day for an attribute on a given Opportunity entity because CRM auditing was turned on in a Dynamics CRM environment. CRM Auditing will store a record every time an attribute is changed. Oftentimes, a CRM user would update an attribute, and then update it again seconds later. the snapshot only needed the last record per day.
-Using SQL Partitioning, partition the table by OpportunityID and Date and create a Row Number field.
-Order the Partition by Date in descending order (this allows us to make an assumption. Any record with a value of 1 will be the last record of the day.)
This solution assumes the date precision in the date field is great enough to allow every record to be unique when combined with an Opportunity. This worked for the majority of the data. However, the date column in this scenario didn’t always contain values with enough precision to ensure uniqueness on every record. A temporary solution was needed for this small dataset to move forward.
The Second Challenge
The CRM auditing records didn’t have a datetime stamp with enough precision to always be unique when combined with the Opportunity. Data analysis found that a very small percent of data (< 1/100,000 records) had the exact same datetime, but a different value. (One record was chosen for the purposes of this prototype.)
-Partition the table again by Opportunity and Date and create a Row Number field.
This generates a unique number, which will allow you to apply a longer term solution later.
The Third Challenge
There are gaps in transactional history data. The CRM auditing history data only logged records when the user made changes to the Opportunity. However, having gaps in the dataset makes it difficult to perform snapshot reporting. There were periods of time where users did not change the record. However, a specific measurement in a tabular model might require a snapshot dataset for all “open” records to show progression day over day. Additionally, I needed to know when to start & stop generating records across multiple days.
-Determine the start and stop of the gap and fill in the missing days by Partitioning and self joins.
The last record in the series might require some special filtering logic in your scenario to know when to stop filling in the gaps once the Opportunity is Closed.
Snapshot datasets allow reporting users to perform snapshot reporting on any given day, which allows them to see the state of the data back in time. This article helps frame up three challenges and solutions for converting transactional history data to snapshot data. One area to consider when converting to snapshot data is the granularity of the data you need. Additionally, data analysis should be performed to ensure data can be uniquely identified, makes sense and is accurately showing the changes overtime. Finally, the gaps should be filled in for days between changes when the Opportunity was open.
Please leave your questions/comments below! In my next blog post, I’ll provide sample code to further illustrate the technical details behind this solution to convert Transactional history to a snapshot using SQL set-based operations.