The Critical Role of Data Science in Detecting Potential Financial Crime

Scott Shimp
April 13, 2022

Outside of traditional financial risk, fraud and financial crime compliance are among the most prominent areas of public concern for a financial institution. Companies devote substantial resources to detecting and deterring fraud in order to avoid financial losses and reputational damage. They must also maintain programs to address potential financial crimes, such as money laundering, terrorist financing, and sanctions violations. Moreover, these programs must be robust enough to maintain compliance with federal legislation, rules, and international standards.

Like every business function, fraud and financial crime compliance programs rely on a combination of people, processes, and technology, with information and data flow integral to their success. As we advocated in our post on rethinking risk management, financial institutions should ensure that teams responsible for risk management are equipped with data knowledge and tools commensurate with their roles. Fraud and financial crime compliance teams are no exception, so let’s look at some of the tools and techniques relevant to these programs and their stakeholders.

Inputs and Outputs of Transaction Monitoring

Under the Bank Secrecy Act (BSA) and subsequent legislation in the United States, banks are obligated to file Suspicious Activity Reports (SARs) with the Financial Crimes Enforcement Network (FinCEN). These reports provide law enforcement with information they may use to pursue cases of potential criminal activity. However, regulators do not explicitly define what makes financial transactions “suspicious.” Although standards bodies and experts provide guidance, companies must determine what “suspicious” looks like for their business and products.

In concrete terms, this requires designing appropriate rules, models, and reporting to analyze transaction details such as volume, amount, type, and geography, along with thresholds for identifying potentially suspicious activity for review. For example, a company may want to generate an “alert” when a customer’s transactions exceed thresholds for volume or amount within a given period of time. Once these rules are established, companies must implement the transaction monitoring systems with the necessary data flows and teams of investigators to review the results.
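To make the idea of a threshold rule concrete, here is a minimal sketch in Python. The seven-day window, the limits, the tuple layout, and the function name `generate_alerts` are all hypothetical choices for illustration; a real program would tune them to its own customers and products.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical thresholds; real values would come from tuning and risk appetite.
MAX_COUNT = 3           # maximum transactions per customer per window
MAX_TOTAL = 10_000.0    # maximum total amount per customer per window
WINDOW = timedelta(days=7)


def generate_alerts(transactions, max_count=MAX_COUNT, max_total=MAX_TOTAL, window=WINDOW):
    """Return (customer_id, window_end, count, total) for each threshold breach.

    `transactions` is an iterable of (customer_id, date, amount) tuples.
    For each transaction, look back over the trailing window and raise an
    alert if the customer's count or total in that window exceeds a limit.
    """
    by_customer = defaultdict(list)
    for customer_id, tx_date, amount in transactions:
        by_customer[customer_id].append((tx_date, amount))

    alerts = []
    for customer_id, txs in by_customer.items():
        txs.sort()
        for i, (end_date, _) in enumerate(txs):
            # Amounts of this customer's transactions inside the trailing window.
            in_window = [amt for d, amt in txs[: i + 1] if end_date - d < window]
            count, total = len(in_window), sum(in_window)
            if count > max_count or total > max_total:
                alerts.append((customer_id, end_date, count, total))
    return alerts


sample = [
    ("C1", date(2022, 4, 1), 4_000.0),
    ("C1", date(2022, 4, 3), 4_500.0),
    ("C1", date(2022, 4, 5), 2_000.0),   # trailing 7-day total reaches 10,500
    ("C2", date(2022, 4, 1), 9_000.0),
    ("C2", date(2022, 4, 20), 9_000.0),  # falls outside the earlier window
]
alerts = generate_alerts(sample)
print(alerts)
```

In this sample, only customer C1 breaches the amount limit. C2 moves a larger total overall, but the two transactions are too far apart to fall in the same window.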

In many cases, financial institutions deploy commercially available solutions to monitor their transactions. However, off-the-shelf solutions can be a poor fit for detecting suspicious activity in the context of a specific business or for identifying novel forms of suspicious activity. In addition, using a commercially available tool does not eliminate accountability for ensuring good results; therefore, customizing and augmenting these solutions is critical to success. In particular, teams setting up and managing these systems must strike a balance between false positives (transactions flagged for review that turn out not to be suspicious) and false negatives (transactions that were not flagged but would have been considered suspicious).
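The trade-off between false positives and false negatives can be illustrated with a few lines of Python. This sketch assumes each transaction already has a risk score and a reviewed outcome; the scores, labels, and the function name `confusion_counts` are made up for illustration.

```python
def confusion_counts(scores, labels, threshold):
    """Count outcomes when every score at or above `threshold` is flagged."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for score, suspicious in zip(scores, labels):
        flagged = score >= threshold
        if flagged and suspicious:
            counts["tp"] += 1      # flagged and truly suspicious
        elif flagged:
            counts["fp"] += 1      # flagged but not suspicious (false positive)
        elif suspicious:
            counts["fn"] += 1      # suspicious but missed (false negative)
        else:
            counts["tn"] += 1      # correctly left alone
    return counts


# Hypothetical risk scores and investigator outcomes (True = suspicious).
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [True, True, False, True, False, False, True, False]

for threshold in (0.25, 0.50, 0.75):
    c = confusion_counts(scores, labels, threshold)
    print(f"threshold={threshold:.2f} false positives={c['fp']} false negatives={c['fn']}")
```

Raising the threshold from 0.25 to 0.75 removes all three false positives in this toy data, but the number of missed suspicious transactions rises from one to two. Where to settle depends on investigator capacity and risk appetite.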

Data Science Methods for Optimization

This task of optimization requires financial institutions to bring a variety of data science skills and methods to bear, both to tailor commercial solutions and to build their own tools for monitoring and reporting where appropriate. For example, because different types of customers may exhibit fundamentally different behaviors, it is vital to augment traditional sampling and testing with a segmentation or clustering model to set thresholds for monitoring rules. This approach uses unsupervised machine learning across multiple variables to identify distinct clusters of activity.
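As a rough illustration of segmentation, the sketch below clusters customers on two variables with a small, hand-written k-means and then derives a separate volume threshold for each segment. K-means is only one possible clustering method, and the customer data, the two features, and the "mean plus three standard deviations" rule are assumptions made for the example.

```python
import math
import statistics


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stdevs)) for row in rows]


def kmeans(points, k, iterations=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(statistics.mean(dim) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


# Hypothetical customers: (monthly transaction count, average amount).
customers = [
    (5, 200.0), (8, 150.0), (6, 300.0), (7, 250.0),
    (120, 5000.0), (150, 4500.0), (130, 6000.0), (140, 5500.0),
]
labels = kmeans(standardize(customers), k=2)

# One monthly-volume threshold per segment: mean + 3 standard deviations.
thresholds = {}
for segment in sorted(set(labels)):
    counts = [c for (c, _), lab in zip(customers, labels) if lab == segment]
    thresholds[segment] = statistics.mean(counts) + 3 * statistics.pstdev(counts)
print(labels, thresholds)
```

A single global threshold would either flood investigators with alerts on the high-volume segment or never fire for the low-volume one. Setting thresholds per segment avoids both problems.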

A monitoring outcome may also depend on determining that a set of transactions was executed by the same individual, even if the names or other transaction details are different. Data practitioners have a number of options for measuring the similarity of records, including text fields such as names. Complications such as incomplete names, nicknames, or variations in transliteration can lead to missed matches, while common names can lead to large numbers of incorrect matches. Setting an appropriate threshold for the selected similarity metric requires understanding the data being matched, the method of calculating the similarity, and the ensuing process of reviewing potential matches.
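One simple way to measure name similarity is Python's built-in `difflib.SequenceMatcher`. The sketch below is an illustration, not a recommendation of a particular metric: the normalization steps, the helper names, and the 0.85 threshold are assumptions that would need testing against real data.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, and sort tokens so word order is ignored."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 for two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_matches(name, candidates, threshold=0.85):
    """Return (candidate, score) pairs at or above the threshold, best first."""
    scored = [(c, similarity(name, c)) for c in candidates]
    return sorted((pair for pair in scored if pair[1] >= threshold),
                  key=lambda pair: pair[1], reverse=True)


candidates = ["Muhammad Al Rashid", "Maria Garcia", "Smith, John"]
print(find_matches("Mohammed Al-Rashid", candidates))
print(find_matches("John Smith", candidates))
```

Lowering the threshold catches more transliteration variants but also sends more unrelated names to reviewers, which is the same false positive and false negative balance discussed earlier.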

To assist with triage of a large volume of alerts from commercial tools, a program may also develop predictive models that indicate the probability of a particular alert from the monitoring system leading to a productive investigation. With the right team in place, a company can build machine learning models that directly generate tailored monitoring alerts, potentially providing a solid complement to even properly customized commercial solutions.
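To show what alert triage might look like, the sketch below trains a small logistic regression on past alerts and ranks new ones by predicted probability of a productive investigation. Logistic regression is only one candidate model, and the two features, the training data, and the alert IDs are invented for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            error = sigmoid(bias + sum(w * f for w, f in zip(weights, features))) - label
            for j, f in enumerate(features):
                grad_w[j] += error * f
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


def predict(weights, bias, features):
    """Estimated probability that an alert leads to a productive investigation."""
    return sigmoid(bias + sum(w * f for w, f in zip(weights, features)))


# Hypothetical features per past alert, both scaled to [0, 1]:
# (how far the amount exceeded its threshold, customer's past productive-alert rate)
X = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.85, 0.6),
     (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.15, 0.05)]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = investigation was productive

weights, bias = train_logistic(X, y)

# Rank new alerts so investigators see the most promising ones first.
new_alerts = {"A1": (0.2, 0.1), "A2": (0.9, 0.8), "A3": (0.5, 0.5)}
ranked = sorted(new_alerts, key=lambda a: predict(weights, bias, new_alerts[a]), reverse=True)
print(ranked)
```

A model like this only reorders the queue. Whether low-scoring alerts can be deprioritized or closed is a policy decision that investigators and compliance leadership would need to make.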

As the variety of these methods suggests, optimization of transaction monitoring systems requires a depth of knowledge in data science that domain experts in BSA/AML programs may not have. For this reason, many financial institutions have developed internal teams or engaged with trusted external partners with dedicated data and analytics skillsets. However, regardless of the team structure, both data practitioners and investigators need to be able to collaborate to produce good results and communicate outcomes and trends effectively.


A robust fraud and financial crime compliance program requires leadership, domain experts, and data practitioners with the skills and common vocabulary necessary to implement robust systems and make data-driven risk-based decisions. Data Society can help programs reach the next level of maturity in data science practices with instructor-led training and capstone projects focused on real compliance challenges. In addition, Data Society provides custom data science and AI/ML solutions to enhance risk management capabilities.

