Serving Data Justice when Predicting Recidivism

machine learning
John Nader
May 19, 2021
Data science, specifically predictive algorithms, has become widely used in everything from retail merchandising to medicine to finance. Applying data science to the United States court system is essential and laudable, especially when predictive analytics are used to understand recidivism. Typically, recidivism prediction is labor intensive and potentially susceptible to bias. Complementing it with applied data science can streamline the process and bolster goodwill, saving time and money while securing public trust. However, if practitioners are poorly trained or the data being used is incomplete or misapplied, potential hazards in this work abound.

AI vs. Human Analysis in Recidivism 

Several new findings in recidivism predictive analysis showed that robust statistical models that leverage machine learning and data science are far more accurate in predicting recidivism than human analysis. In February 2020, the peer-reviewed scientific journal Science Advances published a study titled "The Limits of Human Predictions of Recidivism," with the finding that "people can predict recidivism as well as statistical models if only a few simple predictive factors are specified as inputs." However, in real-world settings, with complex data available, the study demonstrated that robust statistical models predict recidivism far better.


A recent joint study conducted by researchers from Stanford University and the University of California, Berkeley, noted that machine learning and data science, when applied ethically and equitably and given the full complement of data available to parole administration, are more accurate than human analysis in predicting rates of recidivism.

In applications such as these, biased data is a genuine concern. A New York Times article published on February 6, 2020, titled "An Algorithm That Grants Freedom, or Takes It Away," cited an algorithm used during arraignment hearings in San Jose, California. As the algorithm leveraged only basic data points, groups like ProPublica raised substantial concerns that bias is inherent in these analyses. Questions such as "Which data points are weighted most heavily?" and "Are enough data points being evaluated to make a well-informed and accurate decision to mete out justice in a fair and equitable manner?" are vital when addressing potential bias.

Eliminating Biased Data from the Equation

The root issue lies in the fact that data sets are often skewed by prior gender, racial, or geographical biases. Algorithms that track only a handful of data points can be flawed when broadly applied to cases where nuance can alter the interpretation of the facts. There's also a subset of the public that views data science and algorithmic decisions through Hollywood's science fiction lens, leading some to fear a future like that portrayed in HBO's Westworld series, in which a compassionless AI predicts and controls the fate of every individual worldwide. More generally, predictive algorithms are questioned by those who don't understand data science and who may see it as a tool for furthering historical inequalities. Hence, it's easy to understand why watchdog and public policy organizations like the ACLU question the fairness of these practices at scale.

The bulwark against such concerns is a wary public and watchdog groups working with data science practitioners to establish true data science expertise, with an eye toward real-world application. This is a crucial reason Data Society's signature training programs work to demystify data science, doing justice to the truth in the data and unseating potential biases. For example, variance inflation factor, which we first teach in our module on regression, is but one way data scientists identify unsuspected patterns between variables – to see if elements expected to be unrelated are, in fact, collinear.
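
To make the variance inflation factor (VIF) idea concrete, here is a minimal sketch in Python using only NumPy. It regresses each predictor on the remaining ones and reports 1 / (1 - R²) for that regression. The data is synthetic, and the column names (age, prior_arrests, prior_convictions) are hypothetical placeholders chosen for illustration; they are not taken from any real recidivism data set or from Data Society's course material.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows = observations, columns = predictors)."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = []
    for j in range(n_cols):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Include an intercept so R^2 is measured around the mean of y.
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


# Synthetic, made-up data: hypothetical columns for illustration only.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(35, 10, n)
prior_arrests = rng.poisson(2, n).astype(float)
# Deliberately constructed to be nearly collinear with prior_arrests.
prior_convictions = 0.8 * prior_arrests + rng.normal(0, 0.3, n)

X = np.column_stack([age, prior_arrests, prior_convictions])
vifs = variance_inflation_factors(X)
for name, vif in zip(["age", "prior_arrests", "prior_convictions"], vifs):
    print(f"{name}: VIF = {vif:.2f}")
```

In this sketch the two deliberately related columns should show a much larger VIF than the independent one. A VIF near 1 suggests a predictor is largely unrelated to the others; cutoffs of roughly 5 or 10 are common rules of thumb for flagging collinearity, not fixed standards.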

Data Science Training for Equitable Outcomes

Tailored training in these methods can enable parole and court personnel to use the structured and unstructured datasets available to assess the effectiveness of various treatment, supervision, and monitoring programs funded by the federal probation and pretrial service offices. Following this, they'll be able to identify and isolate meaningful drivers of recidivism and separate them from the chaff, and do so in a way that is both scientifically sound and will withstand critique. 

Furthermore, having the right tools allows local criminal justice personnel to advance their mission to assist the federal courts in the fair administration of justice and bring about long-term positive change, setting a new standard for the fair and equitable application of the law to all citizens. With training and careful application of these approaches, a society powered by data can truly serve justice for all.

John Nader

Chief Operating Officer
