Predicting London Crime Rates Using Machine Learning
- by 7wData
Predicting the number and even the type of crimes being committed in the Greater London area each month is no easy task, but here’s how I cracked it, with Dataiku Data Science Studio (DSS).
This blog post was updated in February 2017 to include all 2016 data and make predictions for 2017.
In 2014, London police started trialling software designed by Accenture to identify gang members who were likely to commit violent crimes or reoffend. It began an unprecedented study drawing on five years of data, including previous crime rates and social media activity. Using big data to fight crime is clearly not entirely novel, but I wanted to take this further, especially with all the open data about crime that’s out there.
The Greater London police forces (the Metropolitan Police and the City of London Police) are doing a great job of fighting crime, providing data, and mapping the results, but what’s more interesting is trying to make predictions rather than just looking back at past data.
We might already have an idea of who is likely to commit a crime, but how many crimes would this result in, and what would be their nature? This was the kind of information I hoped to predict, so I tried two different predictive models: crime counts month by month at the LSOA level, and crime counts by type (burglary, bicycle theft, arson, etc.) month by month at the LSOA level.
About LSOA: an LSOA (Lower Layer Super Output Area) is a census area containing 1,000 to 3,000 people. Here’s the full definition from the ONS.
So, where to begin? I sourced the data from the open crime database on the UK police portal, selecting data from 2011 to 2016 pertaining to Greater London (central London and the surrounding metropolitan area).
The main source of data I used is available here - I selected the Metropolitan and London areas. I also used UK census information, Points of Interest (POIs), and the geographical locations of police stations.
I enriched the dataset with various open data sources, adding police station coordinates, postcodes, POIs, and the LSOA statistics.
To prepare the dataset for training the machine learning models, I created a geohash from the latitude and longitude coordinates. I also cleaned the data (recoding values, filling empty cells, and restructuring), which is super simple in Dataiku DSS.
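To give an idea of what geohashing does, here is a minimal pure-Python encoder. This is only a sketch of the technique - the actual transformation was done inside Dataiku DSS, and the precision level shown is an assumption, not the one used in the project:

```python
# Minimal geohash encoder: interleaves longitude/latitude bisection bits
# and packs them into a base-32 string. Nearby points share a prefix,
# so the hash can be used as a categorical area feature.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=7):
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, even = [], True  # even-indexed bits refine longitude, odd ones latitude
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1); lon_lo = mid
            else:
                bits.append(0); lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        even = not even
    # Pack every 5 bits into one base-32 character
    return "".join(
        BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
        for i in range(0, len(bits), 5)
    )

print(geohash_encode(51.5074, -0.1278, precision=6))  # a central London point
```

The key property is that truncating the hash gives progressively coarser areas, which makes it a convenient categorical feature at several spatial resolutions.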
I then clustered the LSOAs in order to define criminality profiles and their levels - three clusters and one outlier were found. The different datasets could then be joined.
I built two models: the first predicting crimes per LSOA per month, and the second predicting crimes per LSOA per month per crime type.
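Both targets boil down to counting incident records at different granularities. A minimal sketch of that aggregation, assuming hypothetical field names ("lsoa", "month", "crime_type") rather than the actual schema used in DSS:

```python
from collections import Counter

# Toy incident records standing in for the raw crime data
incidents = [
    {"lsoa": "E01000001", "month": "2015-01", "crime_type": "Burglary"},
    {"lsoa": "E01000001", "month": "2015-01", "crime_type": "Bicycle theft"},
    {"lsoa": "E01000001", "month": "2015-02", "crime_type": "Burglary"},
    {"lsoa": "E01000002", "month": "2015-01", "crime_type": "Arson"},
]

# Target 1: total crimes per LSOA per month
per_lsoa_month = Counter((i["lsoa"], i["month"]) for i in incidents)

# Target 2: crimes per LSOA per month per crime type
per_lsoa_month_type = Counter(
    (i["lsoa"], i["month"], i["crime_type"]) for i in incidents
)

print(per_lsoa_month[("E01000001", "2015-01")])                   # 2
print(per_lsoa_month_type[("E01000001", "2015-01", "Burglary")])  # 1
```

The second target is simply a finer-grained version of the first, which is why both models could be trained from the same prepared dataset.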
I collected the POIs, cleaned the data, created a geohash for each latitude/longitude coordinate, and loaded everything into an HPE Vertica database. I was then ready to collect the crimes from 2011 to 2016 and clean that data.
Here is an overview of the first data preparation step:
I developed a geohashing plugin for transforming the X/Y coordinates into categorical values. If you are not familiar with DSS plugins, you can find out more here - plugins are super useful for packaging a methodology and adding new functions to Dataiku DSS.
Let’s have a first look at the volume of crime data we collected. For this, I created a chart of the number of crimes by year with Dataiku DSS:
I decided to work with crime data from 2012 to 2015 and then predict for 2016. The second step was to predict the number of crimes in 2017 based on the 2016 model. The first pleasant surprise was seeing the number of crimes decreasing. I was less surprised by the re-categorization of crimes; this is often the case in other industries when, for operational reasons, a category is split or merged.
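The evaluation scheme above is a time-based split: train on history, hold out the most recent year. A sketch of that split with a naive last-year baseline - the records, field names, and baseline are illustrative assumptions, not the models built in DSS:

```python
# Toy per-LSOA monthly counts standing in for the prepared dataset
rows = [
    {"lsoa": "E01000001", "year": y, "month": m, "crimes": c}
    for y, m, c in [
        (2014, 1, 10), (2015, 1, 12), (2016, 1, 11),
        (2014, 6, 7),  (2015, 6, 8),  (2016, 6, 9),
    ]
]

# Train on 2012-2015, hold out 2016 for evaluation
train = [r for r in rows if 2012 <= r["year"] <= 2015]
test = [r for r in rows if r["year"] == 2016]

# Naive baseline: predict the same LSOA/month count as the previous year
lookup = {(r["lsoa"], r["year"], r["month"]): r["crimes"] for r in train}
preds = {
    (r["lsoa"], r["month"]): lookup.get((r["lsoa"], r["year"] - 1, r["month"]))
    for r in test
}
print(preds[("E01000001", 1)])  # 12, i.e. the 2015 count for that LSOA/month
```

Any real model has to beat this kind of baseline to be worth keeping; refitting through 2016 then yields the 2017 predictions in the same way.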