The Importance of Data Quality in Financial Services

Data Society Blog | November 2022

Financial institutions no doubt glean tremendous benefits from their data resources. However, the business value they derive from data can be limited—or even compromised—if they’re using bad data. Poor data quality is often the source of problems ranging from inefficiency to misguided decision-making. In addition, finance companies can suffer from regulatory compliance failures due to bad data. This potential pitfall makes data quality crucial in the financial services industry. However, with skillful data governance and quality controls, banks and other finance companies can confidently leverage transformative data-driven tools.

Bad Data’s Costly Toll

Financial institutions are gifted with—and bedeviled by—a breathtaking volume of data from a broad range of sources. While the potential applications for these data reserves abound, they inevitably come with several quality issues, such as:

  • Incomplete or inconsistent data.
  • Duplicate records.
  • Missing values.
  • Data decay.
  • Ambiguous data. 
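
Issues like these can often be surfaced programmatically. As a minimal sketch using pandas, with hypothetical column names and made-up records, a few lines can flag exact duplicates, missing values, and inconsistent encodings:

```python
import pandas as pd

# Hypothetical customer transactions exhibiting typical quality flaws
df = pd.DataFrame({
    "account_id": ["A1", "A2", "A2", "A3", "A4"],
    "balance":    [100.0, 250.0, 250.0, None, -50.0],
    "currency":   ["USD", "usd", "usd", "USD", "USD"],
})

duplicates = df.duplicated().sum()           # exact duplicate records
missing = df["balance"].isna().sum()         # missing values
inconsistent = df["currency"].nunique() > 1  # mixed encodings ("USD" vs "usd")

print(duplicates, missing, inconsistent)  # prints: 1 1 True
```

In practice, checks like these would run as part of an automated pipeline rather than ad hoc, so flawed records are caught before they propagate downstream.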

These critical flaws can yield several unfavorable outcomes. For example, historical data with obsolete customer information can negatively impact customer relations. Also, aside from the time drain associated with manually correcting these errors, inaccurate data that goes undetected can spawn problems down the road, leading to:

  • Flawed analytics.
  • Unreliable forecasting and risk assessment.
  • Misguided decision-making.
  • Erroneous reporting.
  • Reputational damage due to public-facing errors. 

Additionally, and perhaps most costly, poor data quality can erode internal stakeholders’ trust in transformative technologies. Therefore, as AI/ML technologies become increasingly imperative for financial services companies to remain competitive, it is critical for organizations to have safeguards in place to guarantee that the data they use to train models and perform analytics produces trustworthy information.

According to Gartner’s 2017 Data Quality Market Survey, organizations estimated that poor data quality cost them an average of $15 million per year. Additionally, 77 percent of IT decision-makers surveyed by Vanson Bourne for SnapLogic reported that they don’t entirely trust their organizations’ data, and 91 percent believe work is needed to improve their organizations’ data quality. 

What Impacts Data Quality?

Data quality is measured by completeness, consistency, accuracy, timeliness, uniqueness, and validity. Common sources of poor data quality include:
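
To make these dimensions concrete, here is a minimal, hypothetical sketch that scores a small record set on three of them: completeness, uniqueness, and validity. The field names and the email regex are illustrative assumptions, not a standard:

```python
import re

# Hypothetical customer records; None marks a missing value
records = [
    {"id": "C001", "email": "a@bank.com"},
    {"id": "C002", "email": None},
    {"id": "C002", "email": "b@bank.com"},    # duplicate id
    {"id": "C003", "email": "not-an-email"},  # invalid format
]

total = len(records)
filled = sum(1 for r in records if r["email"] is not None)
completeness = filled / total           # share of non-missing emails

unique_ids = len({r["id"] for r in records})
uniqueness = unique_ids / total         # share of distinct ids

pattern = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
valid = sum(1 for r in records
            if r["email"] and pattern.match(r["email"]))
validity = valid / filled               # valid share of the filled emails

print(completeness, uniqueness, validity)
```

Real data quality platforms compute dimension scores like these continuously and alert when a score falls below an agreed threshold.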

Erroneous Data Entry - Inaccurate or misplaced values introduced when a record is first created can doom data quality from the start. However, if these erroneous records are abundant and left unchecked, they can amplify inaccuracies and quality issues as they move through the data pipeline and across the organization.

Data Migration and Integration - Data migration and integration initiatives commonly present increased threats to data integrity. Mismatched or rearranged fields and data loss are among the data quality issues that can arise when merging or migrating data.
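
One common safeguard is a reconciliation check run after a migration. This hypothetical sketch, with made-up tables and column names, compares row counts, column sets, and order-insensitive per-column value sets between source and target:

```python
# Hypothetical source and target tables after a migration,
# represented as lists of dicts keyed by column name
source = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.5}]
target = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.5}]

def reconcile(src, tgt):
    """Return a dict of basic integrity checks between two tables."""
    checks = {
        "row_count_match": len(src) == len(tgt),
        "columns_match": set(src[0]) == set(tgt[0]),
    }
    # Order-insensitive comparison per column catches dropped or altered values
    for col in src[0]:
        src_vals = sorted(row[col] for row in src)
        tgt_vals = sorted(row[col] for row in tgt)
        checks[f"{col}_values_match"] = src_vals == tgt_vals
    return checks

print(reconcile(source, target))
```

Any check that comes back False points to exactly the kind of mismatched fields or data loss described above, before the migrated data reaches production use.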

Data Silos - Isolating data sets from organization-wide maintenance and access results in data silos, leading to incomplete, outdated, and inconsistent data and even increasing exposure to security threats. Silos also present risks associated with poorly managed handoffs between teams and departments. 

Poorly Maintained Data Repositories - Failure to monitor and update data leaves companies vulnerable to errors and inaccuracies related to degraded or obsolete data. Data maintenance failures can also cause healthy data lakes to degenerate into the unorganized morasses known as data swamps. While bad data going into the lake can taint the data supply from the start, insufficient data monitoring compounds data quality problems as raw data deteriorates and accumulates. It is little surprise, then, that approximately 85 percent of data lake failures are attributed to substandard data.  

The Case for Prioritizing Data Quality in the Financial Services Industry

It makes sense that increased reliance on data-driven technologies, such as AI/ML-enabled tools, creates a more urgent need for reliable data. Moreover, when data informs operational strategy and trains the models that drive such critical functions as lending decisions, predictive analytics, and regulatory compliance, it becomes crucial for organizations to prioritize data quality and track data drift that could impact model accuracy. Still, O’Reilly’s AI Adoption in the Enterprise 2022 survey notes that only 49 percent of respondents with AI products reported having a governance plan in place for AI projects.
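
Tracking drift can start with something as simple as comparing feature statistics between training data and live production data. This minimal sketch, with made-up loan amounts and a hypothetical threshold, flags a feature whose mean has shifted by more than one training-set standard deviation:

```python
import statistics

# Hypothetical feature values: loan amounts at training time vs. in production
train = [10_000, 12_000, 11_500, 9_800, 10_700]
live = [15_000, 16_200, 14_800, 15_500, 15_900]

def mean_shift_drift(reference, current, threshold=1.0):
    """Flag drift when the mean moves more than `threshold` reference std-devs."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)
    shift = abs(statistics.mean(current) - ref_mean) / ref_std
    return shift > threshold, shift

drifted, shift = mean_shift_drift(train, live)
print(drifted)  # prints: True
```

Production monitoring typically uses richer distribution-level measures (such as the population stability index), but even a mean-shift alert like this one can prompt retraining before model accuracy quietly degrades.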

Before financial institutions realize data science’s full transformative potential, they must reckon with the data quality issue. The most effective approach to addressing this challenge is establishing a solid data governance strategy to oversee policies and procedures for continuously monitoring and maintaining data quality and compliance. Such protocols offer the reassurance many financial services companies need to develop and deploy innovative data-driven tools. In addition, with data science training that promotes organization-wide data literacy and data governance awareness among decision-makers, financial institutions can meet the future with confidence, trusting that the data-driven insights that guide them are based on complete, consistent, timely, and accurate data.  

