Predicting London Crime Rates Using Machine Learning

Predicting the number and even the type of crimes being committed in the Greater London area each month is no easy task, but here’s how I cracked it, with Dataiku Data Science Studio (DSS).

This blog post was updated in February 2017 to include all 2016 data and make predictions for 2017. 

In 2014, London police started trialing software designed by Accenture to identify gang members who were likely to commit violent crimes or reoffend. It began an unprecedented study drawing on five years of data that included previous crime rates and social media activity. Using Big Data to fight crime is clearly not entirely novel, but I wanted to take this further, especially with all the open data about crime that’s out there.

The Greater London police forces (the Metropolitan Police and the City of London Police) are doing a great job at fighting crime, providing data, and mapping the results. What interested me, though, was making predictions rather than just looking back at past data.

We might already have an idea of who is likely to commit a crime, but how many crimes would this result in, and what would be the nature of these crimes? This was the kind of information I hoped to predict, so I tried two different predictive models: crime month-by-month at the LSOA level and crime type (whether burglary, bicycle theft, arson, etc.) month-by-month at the LSOA level.

About LSOA: an LSOA (Lower Layer Super Output Area) is a census area containing 1,000 to 3,000 people. Here’s the full definition from ONS.

So, where to begin? I sourced the data from the open crime database on the UK police portal, selecting data from 2011 to 2016 pertaining to Greater London (central London and the surrounding metropolitan area).

The main source of data was the crime data from that portal, for which I selected the Metropolitan and London areas. I also used UK census information, Points of Interest (POIs), and the geographical locations of police stations.

I enriched the dataset with various open data sources, added the police station coordinates, and added postcodes. I also inputted POIs and the LSOA statistics.

To prepare the dataset for training the machine learning models, I created a geohash based on latitude and longitude coordinates. I cleaned the data for recoding, filling empty values and structuring, which is super simple in Dataiku DSS.
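To make the geohash step concrete, here is a minimal sketch of the standard geohash encoding in plain Python. It is not the DSS plugin used in this project; the function name `geohash_encode` and the default precision are assumptions chosen for illustration.

```python
# Standard geohash alphabet (base 32, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash string.

    Hypothetical helper for illustration; precision is the number
    of characters in the result (more characters = smaller cell).
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0        # bits collected for the current character
    value = 0       # integer value of the current 5-bit group
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)

# Approximate coordinates of central London.
london_cell = geohash_encode(51.5074, -0.1278, precision=4)
```

Nearby points share a geohash prefix, so the resulting string can be used as a categorical feature that groups crimes into grid cells.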

I then created clusters for the LSOA in order to define the criminality profiles and their levels - three clusters and one outlier were found. The different datasets could then be joined.

I built two models, the first for prediction per LSOA per month and the second for prediction per LSOA per month per crime type.
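The two models need two different targets built from the same crime records. As a rough sketch, assuming each record carries a month, an LSOA code, and a crime type (the sample rows and variable names below are hypothetical, not taken from the project), the targets could be computed like this:

```python
from collections import Counter

# Hypothetical sample records: (month "YYYY-MM", LSOA code, crime type).
crimes = [
    ("2015-01", "E01000001", "Burglary"),
    ("2015-01", "E01000001", "Bicycle theft"),
    ("2015-01", "E01000002", "Burglary"),
    ("2015-02", "E01000001", "Burglary"),
]

# Target for the first model: number of crimes per LSOA per month.
per_lsoa_month = Counter((lsoa, month) for month, lsoa, _ in crimes)

# Target for the second model: number of crimes per LSOA per month per type.
per_lsoa_month_type = Counter(
    (lsoa, month, crime_type) for month, lsoa, crime_type in crimes
)
```

The second target is much sparser than the first, since most LSOAs see only a few crime types in any given month.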

I collected the POIs, cleaned the data, created a geohash for each latitude/longitude coordinate, and then loaded them into an HPE Vertica database. Then I was ready to collect the crimes from 2011 to 2016 and to clean this data.

Here is an overview of the first data preparation step:

I have developed a geohashing plugin for transforming the XY coordinates into categorical values. If you are not familiar with DSS plugins, you can find out more here - plugins are super useful for packaging a methodology and adding new functions to Dataiku DSS.

Let’s have a first look at the volume of crime data we collected. For this, I created a chart of the number of crimes by year with Dataiku DSS:

I decided to work with crime data from 2012 to 2015 and then predict for 2016. The second step was to predict the number of crimes in 2017 based on the 2016 model. The first pleasant surprise was seeing the number of crimes decreasing. But I was less surprised by the re-categorization of crimes. This is often the case in other industries when, for operational reasons, a category is split or merged.
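Training on 2012 to 2015 and predicting 2016 amounts to a split by time rather than a random split. A minimal sketch of that idea, assuming monthly rows with a "YYYY-MM" month field (the helper `split_by_year`, the field names, and the sample values are all hypothetical):

```python
def split_by_year(rows, train_years, test_year):
    """Split monthly rows into training and test sets by calendar year."""
    train = [r for r in rows if int(r["month"][:4]) in train_years]
    test = [r for r in rows if int(r["month"][:4]) == test_year]
    return train, test

# Hypothetical monthly counts for a single LSOA.
rows = [
    {"month": "2011-06", "lsoa": "E01000001", "crimes": 12},
    {"month": "2012-03", "lsoa": "E01000001", "crimes": 10},
    {"month": "2015-12", "lsoa": "E01000001", "crimes": 9},
    {"month": "2016-01", "lsoa": "E01000001", "crimes": 8},
]

# Train on 2012-2015, hold out 2016; 2011 is left out of both sets.
train, test = split_by_year(rows, range(2012, 2016), 2016)
```

Holding out the most recent year keeps future information out of training, which a random split across months would not guarantee.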
