Outlier Detection in Finance Data
This is an example of a failed data science project.
What Is an Outlier in Accounting?
We all intuitively understand what an outlier is when looking at a chart: something that stands out, something extraordinary. However, defining an outlier formally, and therefore determining how to identify one, is far from simple.
In accounting, opinions vary widely:
- “There are no outliers in our ledger: we can explain all deviations, so nothing is truly unusual.”
- “Our books are full of errors and unexplained transactions.”
Both statements can describe the same company and the same ledger, depending purely on one’s perspective.
Note that there are essentially two distinct outlier-detection tasks:
- Static detection, or finding outliers in the historical ledger. This is a relatively stable environment where many anomalies can be explained or corrected retrospectively.
- Dynamic (online) detection, or identifying anomalies in real time. Here, we analyse each new transaction as it arrives, comparing it to historical data without knowing what comes next.
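The online setting can be sketched with running statistics: each new amount is scored against the mean and standard deviation of everything seen before it, and only then added to the history. This is a minimal sketch, assuming a plain z-score threshold `k` and a hypothetical warm-up length `min_history`; it uses Welford's algorithm, so only three numbers need to be stored per series.

```python
import math


class OnlineZScoreDetector:
    """Scores each new value against the mean and standard deviation of the
    values seen before it, using Welford's running statistics."""

    def __init__(self, k: float = 3.0, min_history: int = 12):
        self.k = k
        # Hypothetical warm-up: score nothing until enough history exists
        # (at least 2 points are needed for a sample standard deviation).
        self.min_history = max(2, min_history)
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Return True if x is an outlier versus past data, then absorb x."""
        is_outlier = False
        if self.n >= self.min_history:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            is_outlier = std > 0 and abs(x - self.mean) > self.k * std
        # Welford update of the running mean and sum of squares
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_outlier


detector = OnlineZScoreDetector(k=3.0, min_history=12)
stream = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100, 250]
flags = [detector.update(x) for x in stream]
print(flags.index(True))  # 12: only the final value (250) is flagged
```

In this sketch a flagged value is still folded into the history; a stricter variant would skip or down-weight it so that a single spike does not widen the band for later transactions.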
How Do We Identify an Outlier?
Our project focused on static detection: searching for outliers in a historical financial dataset. There are many formal definitions, but let’s keep it simple and look at two common ones:
- An outlier is any value that falls outside the 3-sigma range (three standard deviations from the mean) for its category.
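- Alternatively, it can be defined using boundaries based on the interquartile range (IQR): for example, any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where Q1 and Q3 are the first and third quartiles. Because the quartiles need not sit symmetrically around the median, these boundaries can be asymmetric.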
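Both definitions fit in a few lines of Python. This is a minimal sketch using only the standard library; `k=3.0` and `k=1.5` are the conventional multipliers, and the quartile method (the default of `statistics.quantiles`) is an assumption, since other quartile conventions shift the fences slightly.

```python
import statistics


def sigma_outliers(values, k=3.0):
    """Indices of values more than k sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > k * std]


def iqr_outliers(values, k=1.5):
    """Indices of values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Twelve ordinary months followed by one spike
monthly = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100, 250]
print(sigma_outliers(monthly))  # [12]
print(iqr_outliers(monthly))    # [12]
```

Both rules catch the spike here, but the 3-sigma rule only just does: the spike inflates the standard deviation it is measured against. With n points, no value can lie more than (n - 1)/√n sample standard deviations from the mean, so with ten or fewer monthly observations the 3-sigma rule can never fire, and with twelve months the ceiling is about 3.18. The IQR fences are far less affected by the extreme value itself.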
In both cases, it’s often helpful to filter the data first, removing linear trends and seasonal patterns (for example, yearly cycles).
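Removing the trend and the yearly cycle before applying a rule can be sketched as follows: fit a straight line by ordinary least squares, subtract it, then subtract the average remaining value for each position in the cycle (month 1, month 2, and so on). This is a simple additive decomposition under stated assumptions (regularly spaced monthly data, no gaps, at least one full cycle); more robust methods such as STL exist. The helper `flag_3sigma` and the synthetic series are illustrative, not project code.

```python
import statistics


def flag_3sigma(values, k=3.0):
    """Indices of values more than k sample standard deviations from the mean."""
    mean, std = statistics.fmean(values), statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > k * std]


def deseasonalize(values, period=12):
    """Residuals after removing a linear trend and a repeating seasonal profile.

    Assumes a regularly spaced series with no gaps and at least `period`
    points; period=12 models a yearly cycle in monthly data.
    """
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = statistics.fmean(values)
    # Ordinary least-squares line values ~ intercept + slope * t, t = 0..n-1
    slope = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = v_mean - slope * t_mean
    detrended = [v - (intercept + slope * t) for t, v in enumerate(values)]
    # Seasonal profile: mean detrended value at each position within the cycle
    profile = [statistics.fmean(detrended[p::period]) for p in range(period)]
    return [d - profile[t % period] for t, d in enumerate(detrended)]


# Synthetic monthly series: upward trend, a December peak every year,
# and one genuine spike at index 17
values = [
    1000 + 10 * t + (300 if t % 12 == 11 else 0) + (150 if t == 17 else 0)
    for t in range(36)
]
print(flag_3sigma(values))                 # [35]
print(flag_3sigma(deseasonalize(values)))  # [17]
```

On the raw series, the rule flags index 35, which is just the third year's ordinary December peak sitting on top of the trend, and misses the injected spike; on the residuals, only the spike stands out.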
Analysts may also apply more advanced methods, such as neural networks or other machine learning models, to identify outliers.
(The illustration above came from https://medium.com/data-science/the-ultimate-guide-to-finding-outliers-in-your-time-series-data-part-3-0ff73ce28ca3)
With these more or less formal approaches in mind, let’s turn to the project itself. We worked with financial data from SAP, aggregated to the minimum level of detail required for reporting: typically monthly totals by business unit, geography, and financial statement (FS) item.
Our simple procedure scanned every <business> + <geo> + <FS item> group to flag potential outliers. We also built a convenient visualization tool to display, rank, and annotate these anomalies, and even to suggest corrections.
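A scan of this kind can be sketched as follows. The record layout `(business, geo, fs_item, period, amount)`, the function name `scan_groups`, the `min_points` cut-off, and the choice of IQR fences are all illustrative assumptions, not the project’s actual implementation. Ranking by how far a value sits beyond its fence, measured in IQRs, is one simple way to order anomalies for review.

```python
from collections import defaultdict
import statistics


def scan_groups(records, k=1.5, min_points=8):
    """Flag IQR outliers within every (business, geo, fs_item) group.

    `records` is an iterable of (business, geo, fs_item, period, amount)
    tuples (a hypothetical layout). Returns (score, key, period, amount)
    tuples ranked by distance beyond the fence in IQRs, largest first.
    """
    groups = defaultdict(list)
    for business, geo, fs_item, period, amount in records:
        groups[(business, geo, fs_item)].append((period, amount))

    flagged = []
    for key, rows in groups.items():
        if len(rows) < min_points:
            continue  # too little history to define "normal"
        amounts = [amount for _, amount in rows]
        q1, _, q3 = statistics.quantiles(amounts, n=4)
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        for period, amount in rows:
            if amount < low or amount > high:
                excess = (low - amount) if amount < low else (amount - high)
                score = excess / iqr if iqr > 0 else float("inf")
                flagged.append((score, key, period, amount))
    flagged.sort(key=lambda row: row[0], reverse=True)
    return flagged


revenue = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 250]
opex = [50, 52, 48, 51, 49, 50, 53, 47, 50, 51, 49, 40]
records = [("BU1", "DE", "Revenue", m, v) for m, v in enumerate(revenue, start=1)]
records += [("BU2", "FR", "Opex", m, v) for m, v in enumerate(opex, start=1)]
# The revenue spike ranks first; the milder opex dip ranks second
for score, key, period, amount in scan_groups(records):
    print(f"{score:6.1f}  {key}  month {period}: {amount}")
```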
Outliers Identified. What’s Next?
And here’s where the project stumbled. Despite multiple training sessions and enthusiastic demos, no one in Finance adopted the tool. Everyone said, “Wow, that’s interesting!”, but no one used it. No one even wanted to try.
Why did it fail? Because the project was initiated solely by the Data Science team, without alignment with Finance from the start. The impressive number of detected outliers fascinated us, but the Finance team saw no actionable value.
