Individual Submission Summary

Developing Modern Criminal Offense Classification Techniques

Fri, Nov 19, 9:30 to 10:50am, Burnham 1, 7th Floor

Abstract

This paper presents the design of an open-source library for multi-label text classification that assigns a standardized charge code to a given offense description. Text-tool Offense Classification (TOC) uses a hierarchical classification framework in which a multilayer perceptron classifier is trained for each local node. The dataset used to train and test the model contains 229,022 unique offense descriptions from 12 states and was provided by Measures for Justice (MFJ), a nonprofit organization that collects, standardizes, and publishes criminal justice data at the county level. The taxonomy used to build the hierarchy of classifiers is predefined in MFJ’s offense classification schema, which has three levels: offense type code, offense category code, and charge code. Generating training data with cluster-based sampling on originating state, offense description, and charge code showed little improvement in offense type code classification compared with sampling only on offense description and charge code. However, sampling that included originating state showed significant improvement in offense category code classification and, because of the hierarchical nature of the model, a corresponding improvement in charge code classification. As the state coverage of the training data improves, TOC provides a scalable approach to analyzing textual crime data for the public.
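
The hierarchical framework described above, with one classifier per local node, could be sketched roughly as follows. This is a minimal illustration, not TOC's actual code or API: the class names, the example offense descriptions, and the label values are all hypothetical, and a simple token-overlap scorer stands in for the multilayer perceptron so that the sketch runs with only the Python standard library.

```python
from collections import Counter, defaultdict


class TokenCountClassifier:
    """Hypothetical stand-in for a per-node multilayer perceptron.

    Scores each label by the share of its training tokens that match the input.
    """

    def __init__(self):
        self.token_counts = defaultdict(Counter)

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.lower().split())
        return self

    def predict(self, text):
        tokens = text.lower().split()

        def score(label):
            counts = self.token_counts[label]
            return sum(counts[t] for t in tokens) / sum(counts.values())

        # Sorting the labels makes tie-breaking deterministic.
        return max(sorted(self.token_counts), key=score)


class HierarchicalClassifier:
    """One local classifier per parent node: root -> type -> category -> charge."""

    def __init__(self, make_classifier=TokenCountClassifier):
        self.make_classifier = make_classifier
        self.nodes = {}  # maps a parent path (tuple of labels) to its classifier

    def fit(self, rows):
        # Each row: (description, offense_type, offense_category, charge_code).
        grouped = defaultdict(lambda: ([], []))
        for description, *path in rows:
            for depth, label in enumerate(path):
                texts, labels = grouped[tuple(path[:depth])]
                texts.append(description)
                labels.append(label)
        for parent, (texts, labels) in grouped.items():
            self.nodes[parent] = self.make_classifier().fit(texts, labels)
        return self

    def predict(self, description):
        # Walk down the hierarchy, letting each node pick the next level.
        path = ()
        while path in self.nodes:
            path += (self.nodes[path].predict(description),)
        return path


# Made-up training rows; the codes are illustrative, not MFJ's schema values.
rows = [
    ("theft of motor vehicle", "PROPERTY", "THEFT", "MVT"),
    ("petit theft from store", "PROPERTY", "THEFT", "SHOPLIFT"),
    ("burglary of dwelling", "PROPERTY", "BURGLARY", "BURG_RES"),
    ("simple assault", "PERSON", "ASSAULT", "ASSAULT_SIMPLE"),
    ("aggravated assault with weapon", "PERSON", "ASSAULT", "ASSAULT_AGG"),
]
model = HierarchicalClassifier().fit(rows)
print(model.predict("theft of vehicle"))
```

Because each level's prediction selects which classifier runs next, an error at the offense category level propagates to the charge code, which is consistent with the abstract's observation that better category classification also improves charge code classification.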

Authors