Robert Cox

Using data to find the solution

In this article, I want to look at a scenario from a previous project and focus on the outcomes that came from working back, as an analyst, to the core needs of our users and providing a solution based on data analysis.


A problem statement rarely reaches the team in its raw form. Often, in requirements gathering exercises and workshops, solutions are discussed and presented by the customer, making their way to teams in the form of “requirements” that are often just a chosen solution.


This in itself can be a great starting point for a conversation. When others have created a potential solution to their problem, digging into how it might work can be insightful for understanding the problem they're trying to solve, revealing more detailed needs than the initial ask.


More often than not, those you're gathering requirements from have been living with the need for some time. They may have implemented workarounds (processes you should be aware of) and done worthwhile analysis of their own to assist you in your journey of discovery.

It is the role of the analyst to consider this when elaborating a need, but also, where possible, to fight the corner for the delivery team, giving them a chance to use their own experience and expertise to come up with innovative solutions for the customer.


When we're operating as analysts, we should always try to understand the root cause of a problem before we begin working with the team to create solutions. Our job is to understand and solve our customers' problems, not to blindly implement whatever is proposed to us, despite the pressures we might hear, such as "we don't have ages to spend thinking about this...".


The Problem

We were developing a new solution for managing accounting differences, originating from business activities, with up to 100,000 new records expected daily. Each of these records would need to be rectified by a particular department or person via a variety of processes. The problem: how do you get each one of those 100,000 line items to sit with the very person or department that can deal with it?


Before this, teams would routinely assign line items to other teams; the other team would then receive a spreadsheet and update that line item back to the original team… leading to items bouncing back and forth with no ownership or visibility.


In essence, we were tasked with taking a manual spreadsheet/SharePoint-managed process and developing a complete system with a UI, auditing and reporting processes to make it easier, more accurate and ultimately quicker.


This article focuses on one mechanism: when an item was loaded, which department/team would it be defaulted to, so that it went onto their queue to be dealt with? This may seem like a very small part of the process, but it was fundamental to making the process efficient and timely; some teams may have thousands, or tens of thousands, of these to deal with on any given day, and sending them to the wrong team causes delays.


What do I mean by accounting differences?

If I sell you something, I would expect to receive payment. If I fail to receive the payment, I’d consider that a break (missing payment). If the opposite was to occur, I’d have money but no reason for that money to be in my possession (missing sale). In reconciliation, your end sum must be £0.

If you’re familiar with double-entry ledgers, this means one side of the ledger is missing and you’re out of balance.
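To make that concrete, here's a toy example; the figures and field names below are invented purely for illustration:

```python
# A "break" is a line where the two sides of the reconciliation don't net to £0.
# All figures and field names here are invented for illustration.
ledger = [
    {"ref": "INV-001", "sale": 250.00, "cash_received": 250.00},  # matched - nets to £0
    {"ref": "INV-002", "sale": 120.00, "cash_received": 0.00},    # missing payment
    {"ref": "INV-003", "sale": 0.00,   "cash_received": 80.00},   # missing sale
]

for entry in ledger:
    difference = entry["sale"] - entry["cash_received"]
    if difference != 0:
        print(f"{entry['ref']}: break of £{difference:.2f}")
```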

In a regulated environment, this is bad news. These breaks must be resolved within certain periods, otherwise the company can be fined by the FCA under the CASS regulations if you're dealing with client money.

For this article, I'll refer to these as "breaks".


Understanding the problem

We held a workshop with the business to understand the assignment of breaks. While discussing the mechanics of a break and its origins, we began to see a solution emerging:

  • A spreadsheet

  • Updated manually (by whom we didn’t discuss)

  • It would contain millions of permutations (exceeding what Excel could handle – probably ending up as a CSV)

  • Each combination would have a corresponding team associated

  • Somehow it would be used by the system to assign breaks to teams

With these emerging requirements (arriving via a solution), a few things became clear to me:

  • Humans can decide what type of break should go where

  • They do this by understanding the break's information

  • Currently, teams are sometimes assigning breaks to each other, to pass on responsibility or where there is general confusion about responsibility

  • If humans can do this, then what information allows them to do this?

The last question I decided to take away, knowing there were manual methods to theoretically work out who a break should sit with, but also that these were not absolute and there would be some edge cases. The good news? All of the management to date had been manual, with years of historical data available, day by day, allowing traceability from a break's information to its resolution and the team it sat with.


Proving the theory

Before analysing any data, I ran the idea by the rest of the team, along with the alternative of a spreadsheet… We had confidence that the idea itself was simple and easy to implement, and we reached a consensus that it was worth a shot at doing something different.


Be very careful about allowing assumptions that solution A is more expensive than solution B when they arise from outside the development team. Often people will stick with what they know, and that might not directly translate into a resilient and well-designed software system. In this scenario, the solution we went with was actually easier to define, test and maintain than a CSV upload process (once you consider things like validation handling and lifetime cost).

We knew we already had the Data Warehouse in place and the corresponding ETL processes defined for loading breaks into our tool for resolution… and the assignment operation fitted nicely into the SSIS package we had yet to build anyway for the reporting requirements linked to the overall solution.


The question we now needed to answer was: what is the business logic (algorithm) needed to solve this problem? Does one even exist?


Wrangling the data

The next step was finding the correct way to uniquely identify a break. The problem was finding a combination of break attributes that limited the number of permutations while getting a good enough percentage of breaks assigned to the right team first time, without continual re-assignment (due to edge cases).


Simply put, given a combination of N attributes, did you end up with a 1:1 relationship between break and resolution team?


Due to the volumes of data, Excel was not the best tool for carrying this out. Excel in its normal form supports a maximum of 1,048,576 rows, and my datasets were far larger than this. You can get around this by using add-ins like Power Query, which also allows relational data modelling; however, Power BI comes with this baked in.


Using Power BI, I was able to test which attributes of a break could be used to determine the team it should sit with, by simulating groupings and determining which combinations gave a good enough level of uniqueness.

While this may sound involved, all we're doing here is a simple GROUP BY ATTR1, ATTR2, ATTR3, ATTR4, ATTR5. By doing this, and comparing each group with a count of distinct resolution teams, I was able to determine the percentage of breaks whose attribute set mapped to exactly one resolution team.
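The original analysis was done in Power BI, but the same check can be sketched in a few lines of pandas; the column and file names here are purely illustrative, not the real attributes:

```python
import pandas as pd

# Historical resolved breaks, one row per break, with the team that finally owned it.
# "attr1".."attr5" and "resolution_team" are placeholder column names.
history = pd.read_csv("resolved_breaks.csv")

candidate_key = ["attr1", "attr2", "attr3", "attr4", "attr5"]

# For each attribute combination, count how many distinct teams have resolved
# breaks with that combination. A count of 1 means the combination maps cleanly
# to a single team.
teams_per_combo = (
    history.groupby(candidate_key)["resolution_team"]
    .nunique()
    .rename("distinct_teams")
    .reset_index()
)

# Join back so every historical break knows whether its combination is unambiguous,
# then work out what percentage would have been routed correctly first time.
scored = history.merge(teams_per_combo, on=candidate_key)
first_time_pct = (scored["distinct_teams"] == 1).mean() * 100

print(f"Mapping entries (permutations): {len(teams_per_combo)}")
print(f"Breaks with an unambiguous team: {first_time_pct:.1f}%")
```

Repeating that with different candidate keys is what let me compare how broad or narrow the composite key needed to be.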

Eventually, I was able to find a composite key of 5 attributes, which theoretically would give us a 70%+ first-time success rate, while keeping the number of permutations (mapping entries) low enough not to cause serious storage/performance implications over time.


These attributes were carefully chosen to be broad (such as transaction type and account type) rather than specific (like account numbers), as the research indicated that breaks on a single account number could end up with 12 different teams.
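At run time, the assignment then becomes a straightforward lookup on that composite key. The real implementation sat in the data warehouse ETL (SSIS), but a minimal pandas sketch of the shape of the join, with invented column and file names, looks like this:

```python
import pandas as pd

# Placeholder names: two of the real attributes (transaction type, account type)
# plus three stand-ins, and invented file/column names.
key = ["transaction_type", "account_type", "attr3", "attr4", "attr5"]

mapping = pd.read_csv("team_mapping.csv")    # key columns + "default_team"
incoming = pd.read_csv("todays_breaks.csv")  # the daily load of new breaks

# Left join: breaks whose attribute combination is known get a default team;
# anything unmatched falls through to a catch-all queue for manual triage.
assigned = incoming.merge(mapping[key + ["default_team"]], on=key, how="left")
assigned["assigned_team"] = assigned["default_team"].fillna("UNASSIGNED_QUEUE")
```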


Through this data analysis, we were also able to draw out key NFRs (non-functional requirements):

  • Average daily breaks

  • Peak daily volumes

  • Breaks existing at any one time (as they would not always be dealt with in a day)

All of these began feeding into the UI build, influencing decisions such as DB optimisations for Day 1 and performance testing boundaries for the testing team.


Selling the solution

We eventually got agreement from the business to proceed with the solution, after clarifying that it would need no ongoing human intervention and that it was cheaper than all the problems that came with the Excel solution.


Selling a solution like this was extremely challenging; ultimately, a few issues existed:

  • Getting the customer to understand the solution and to have confidence that it would work, when it had never been done or seen before

  • If the identifying key was too broad, the mapping table record would continuously flip-flop between different teams (during the learning update process). The algorithm would be useless, or well below optimal (60% would have been acceptable to the business).

  • If the identifying key was too narrow (i.e. down to specific identifiers like an account number) then you’d need millions and millions of learning records to reach an acceptable level of learning where you realised the benefits

  • Getting the customer to understand the overall cost, versus the perceived "simple" solution of a CSV

With more time and investment, I’m sure we could have demonstrated the algorithm working on baseline datasets, showing the impacts on efficiency when swapping in/out different key makeups, but this would have required significant amounts of modelling which at the time I'm not sure I could have done.


Going live

Before go-live, we were able to produce an initial mapping load from the data analysis. This meant we could go live with some learning already loaded in, the worst case being that if a mapping was wrong, it would self-correct.


Over the next few months, we measured the accuracy and realised we had achieved the goal. During the design phase, we worked in extra statistics on the mapping records, incrementing counts for reassignments and rule usages to measure the effectiveness of the system.
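Conceptually, the self-correction and the statistics amount to something like the sketch below; the structure and field names are my own illustration rather than the real schema:

```python
# Minimal sketch of a self-correcting mapping, keyed on the five attributes.
# Field names ("default_team", "uses", "reassignments") are invented for illustration.

def record_resolution(mapping: dict, key: tuple, resolving_team: str) -> None:
    """Update the mapping once a break has been resolved by a team."""
    entry = mapping.setdefault(
        key, {"default_team": resolving_team, "uses": 0, "reassignments": 0}
    )
    entry["uses"] += 1
    if entry["default_team"] != resolving_team:
        # The default was wrong: count the reassignment and learn the new team,
        # so the next similar break lands on the right queue.
        entry["reassignments"] += 1
        entry["default_team"] = resolving_team
```

The ratio of reassignments to uses per mapping record gives exactly the kind of effectiveness measure described above.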


We found that over 90% of breaks in the business were falling into the default rulesets or being captured by the learning, meaning the next time a similar break dropped in, it went to the right team!


As a side benefit, the data analysis that drove the testing requirements was put to the test within the first few weeks - we had our worst-case scenario of 200,000 breaks in a single load. Because of the fantastic work of the development and test teams, the system held without an issue, and the business gained instant confidence that we had built a stable product.


Years on, the system and algorithm are still working as intended. One of the key things no one will ever measure is not only the benefit over the manual system (in terms of accuracy), but also the time saved by not having staff maintain the assignments, and the inevitable support that would have been needed (due to human involvement).


All in all, by pushing back on customer-given solutions, we had the freedom to build something great and genuinely innovative. The customer got something that solved their problem, and we reduced the need for them to maintain it going forward (something often missed).


Providing solutions should be a team effort (the whole team) and when we work together, we can do some great things with all the experience we bring from different walks of life.


In Summary

  • Be careful of using single views or opinions when deciding how to solve a problem

  • Be open to hearing potential solutions offered from the business, and use them as a vehicle to gather more understanding of the problem

  • Don't be afraid to question the norm or the reuse of established patterns if there are better and newer ways to solve problems

  • Where you have data available, try to tell the story with your data and test any assumptions or requirements

  • By using data, you can tease out relevant NFRs for your product, far beyond what users can tell you. Be very cautious of using NFRs that are not justified by data (like performance requirements)

  • Don't be afraid to not give customers the solution they've asked for; focus on solving the problem (accept that sometimes the customer's solution might be the most optimal)

  • Be able to prove the solution you ship achieves the desired goals (by defining some measurements upfront)

  • Assess solutions carefully if you're introducing manual processes into the operation of a system

  • Consider the full lifetime cost of a system and solution (initial development cost, ongoing human cost, support requirements)
