This raises the question: is there a way to predict which prisoners are more likely to become repeat offenders?
Recidivism prediction is important because it bears directly on the allocation of social services, policy-making, sentencing, probation and bail. From judges to social workers, all parties involved need to be able to work together and understand the risk posed by various individuals.
And if we can more accurately determine how likely someone who has just been released from prison is to commit another crime within a few years, we could potentially reduce crime rates and better allocate the money we spend on social services.
The criminal justice system has been using forecasting to make decisions since the 1920s, when parole boards used a mixture of factors such as age, race, prior offense history and school grades to determine whether an inmate should be paroled or not.
Much has changed since then, both in terms of the sheer quantity and quality of data at our fingertips and the ability to process all of that information quickly using machine learning methods that can produce accurate predictive models for recidivism. Machine learning methods are a form of artificial intelligence. They are computer algorithms that have the ability to learn over time, or in this case make better predictions as they acquire more data.
While these methods have a long history, there has been controversy as to whether they need to be very complicated with many inputs to be accurate or whether simple yet accurate “rules of thumb” exist for many prediction problems. Judges and prosecutors are less inclined to use a complicated (and incomprehensible) black box predictive model in which they can’t understand how the criminal history variables are used to predict recidivism.
In current work with colleagues Jiaming Zeng and Berk Ustun, we found that simple, transparent yet equally accurate predictive models often do exist for predicting recidivism. Such models would be more usable and defensible for all decision-making parties, and are created by machine-learning methods in a completely automated way using data.
As a data scientist, my aim is to build predictive models that assist people in making decisions, particularly in areas that are critical for the smooth operation of society such as energy grid reliability, health care and computational criminology. Using statistical models such as those intended to predict recidivism, we can drastically improve how we live and work.
Today most judges are using rudimentary, ad hoc models for predicting whether someone before them is likely to be a recidivist.
Essentially, they use a score sheet during sentencing with a standard set of risk assessment tools. It’s a combination of people making the (manual) choice of which risk factors to include and an ad hoc optimization scheme for determining what score someone receives for each factor.
As a society, we need to do more to optimize these processes. We don’t want to make poor decisions – decisions that literally are often a matter of life and death. We absolutely need to optimize how our social services are allocated to have the most impact in decreasing our recidivism rates, which, as you know from the beginning of this article, are currently abysmal.
To create better scoring systems, we used the largest publicly available data set on recidivism. Our data set was compiled as part of a national study, and contained criminal histories from over 33,700 individuals in 15 states released in the same year. These individuals constituted over two-thirds of the prisoners released nationwide that year.
We found several advantages of our models on these data. First, they are accurate simply because they are based on large amounts of data. Second, they are simple, understandable, accurate and customizable. The models are also small enough that they each fit on an index card. That is, these are not complicated formulas. A judge could calculate the prediction of recidivism for an individual in his or her head, without a computer. They need only add up the "points" for each risk factor (e.g., three points for one risk factor, five points for another factor, etc.).
The models are so simple-looking that they appear as if a person made them up, but that's not how they were developed. In fact, behind the scenes is a large data set, a sophisticated machine learning method and a lot of computational time on a powerful computer.
Because they are generated automatically, we were able to build a separate predictive model for each type of crime (violence, property, drugs, etc). Furthermore, the machine learning tools can be applied to data from different local areas, with differing populations; each jurisdiction could create its own models, which could potentially make the recidivism predictions much more accurate. Since the current models in use cannot be customized to the jurisdiction, they are "one size fits all" models, which might not be as relevant for some jurisdictions as for others. By drilling down to the local level, the tools can become increasingly accurate.
The machine learning models work by assigning points for various factors drawn from the prisoner's history. If the points add up to more than a certain threshold, the model predicts that the individual is likely to commit another crime within three years.
Our basic model used to predict arrest for any offense is a good example. If the individual was younger than 24 at the time of release, two points are assigned (younger people are more likely to commit violent crime). If there are at least five prior arrests, two points are assigned. If the person was over 40 when he or she was first confined, two points are deducted.
When all the points are tallied, if they add up to one or more, then the individual is likely to be arrested within three years. This is a very simple model, but we have found that even when we use state-of-the-art machine learning methods that use all of the features in the database, these methods do not perform any better than our simple model.
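To make the arithmetic concrete, here is a minimal sketch of the three-rule model described above, written in Python. The function name, input fields and exact comparisons are illustrative assumptions for this article, not the authors' actual implementation:

```python
def predicts_rearrest(age_at_release: int, prior_arrests: int,
                      age_at_first_confinement: int) -> bool:
    """Tally the points and compare against the threshold of one."""
    score = 0
    if age_at_release < 24:            # younger than 24 at release: +2
        score += 2
    if prior_arrests >= 5:             # at least five prior arrests: +2
        score += 2
    if age_at_first_confinement > 40:  # over 40 at first confinement: -2
        score -= 2
    return score >= 1                  # one or more points: predicted rearrest
```

For example, a 22-year-old with six prior arrests who was first confined at 18 scores four points and is predicted to be rearrested within three years, while a 45-year-old with two prior arrests first confined at 42 scores minus two and is not.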
The variables and points are determined entirely by the machine learning algorithm applied to the data and not by hand. Some of these models are going to seem obvious to judges or prosecutors, but that’s good – it means these models will bring everyone onto the same page. Hopefully, it will make it more difficult to make a bad decision.
That said, there are definitely weaknesses in our approach. In particular, our data set could be improved with more detail about the prisoners. However, since the data we used are publicly available and our software will also be public, people will be able to repeat and build on our work, and to use our code on their own data.
It’s also important to note that these models can be helpful or dangerous, depending on how you use them. This isn’t like Minority Report, where you are convicting someone of a specific crime they haven’t committed yet. Rather, these models simply quantify the fact that people who committed more crimes in the past are more likely to run afoul of the law in the future.
However, if the models aren’t used for the right purpose, then there is the risk of inadvertently using them for discriminatory punishment. For instance, you wouldn’t want to use race as a factor for a model that determines sentencing; we don’t want to punish someone longer because of their race.
My team chose not to include any explicit socio-demographic factors, and we specifically excluded race as a variable. We did test how much more accurate the model would be by including race, but we found that it was not particularly useful. The models were almost equally accurate with and without including race as an explicit factor.
There is no reason for people to design models by hand anymore because automated ones can be simpler, more transparent, easier to use and just as accurate. They can ensure that decisions are more reliable and useful, preserving our resources for the people who need them most.