Predict mass shootings

Using the mass shooting data from Mother Jones website, I tried to forecast the number of mass shoottings and total fatalities in the rest of 2018. Some of the R scripts are from this website. Please let me know if I there is any error. I am not an expert on time series analysis. Twitter @StevenXGe

In recent years, mass shootings are becomming more frequent and more deadly. The models suggest for the rest of 2018, there probably be about 6 more shootings, killing about 59 people in the U.S.

#Data

url = "https://docs.google.com/spreadsheet/pub?key=0AswaDV9q95oZdG5fVGJTS25GQXhSTDFpZXE0RHhUdkE&output=csv"
x = read.csv(url)
colnames(x)
##  [1] "case"                               
##  [2] "location"                           
##  [3] "date"                               
##  [4] "year"                               
##  [5] "summary"                            
##  [6] "Fatalities"                         
##  [7] "Injured"                            
##  [8] "total_victims"                      
##  [9] "Venue"                              
## [10] "Prior.signs.of.mental.health.issues"
## [11] "Mental.health...details"            
## [12] "Weapons.obtained.legally"           
## [13] "Where.obtained"                     
## [14] "Type.of.weapons"                    
## [15] "Weapon.details"                     
## [16] "Race"                               
## [17] "Gender"                             
## [18] "Sources"                            
## [19] "Mental.Health.Sources"              
## [20] "latitude"                           
## [21] "longitude"                          
## [22] "Type"

#Data cleaning

x = x[,c("date","year","Fatalities","Race","Gender","Type")]
x$Race = gsub("white","White",as.character(x$Race) )
x$Race = as.factor(x$Race)
summary(x)
##         date         year        Fatalities          Race   
##           : 2   Min.   :1982   Min.   : 3.000   White  :56  
##  1/17/1989: 1   1st Qu.:1998   1st Qu.: 5.000   black  : 9  
##  1/28/18  : 1   Median :2009   Median : 6.000   Asian  : 8  
##  1/30/2006: 1   Mean   :2006   Mean   : 8.313   Black  : 7  
##  1/6/2017 : 1   3rd Qu.:2015   3rd Qu.: 8.000   Latino : 7  
##  1/8/2011 : 1   Max.   :2018   Max.   :58.000   Other  : 5  
##  (Other)  :94   NA's   :2      NA's   :2        (Other): 9  
##            Gender      Type   
##               : 2        : 2  
##  Female       : 2   Mass :85  
##  M            :26   Spree:14  
##  Male         :70             
##  Male & Female: 1             
##                               
## 
x = subset(x,x$year != 2018)
x$date = as.Date(x$date,"%m/%d/%Y")

The number of mass shootings per year

There is no mass shooting in 1983, 1985, and 2002.

yearly = table(x$year)
yearly = c(1,0, yearly)
names(yearly)[1:4]= 1982:1985
yearly[3]=2
yearly[4]=0 
yearly = c(yearly[1:20],0, yearly[21:35])
names(yearly)[21] = "2002"
barplot(yearly)

Clearly, the frequency of mass shootings is increasing in the last 6 years.

Time series model

series = ts(yearly,start=c(1982)) 
yforecast = HoltWinters(series, beta=FALSE, gamma=FALSE)
yforecast
## Holt-Winters exponential smoothing without trend and without seasonal component.
## 
## Call:
## HoltWinters(x = series, beta = FALSE, gamma = FALSE)
## 
## Smoothing parameters:
##  alpha: 0.5024872
##  beta : FALSE
##  gamma: FALSE
## 
## Coefficients:
##       [,1]
## a 8.446514
plot(yforecast, main="Number of mass shootings ")

The red line is probably the model.

Making forecast

# install.packages("forecast")
library(forecast)
shottings = forecast(yforecast, h = 1)
shottings
##      Point Forecast    Lo 80   Hi 80    Lo 95    Hi 95
## 2018       8.446514 6.239526 10.6535 5.071217 11.82181

The model predicts there would be about 8.4 mass shootings this year. Since 2 already happened. There probably would be 6 incidences of mass shooting for the rest of 2018. The 95% confidence interval is between 5 and 11. So it is very likely at least 3 more shootings will occur in 2018.

Total fatalities in 2018

Prepare data

tem = aggregate(x$Fatalities, by=list(x$year), FUN=sum)
tem2 = rbind( tem[1,], c(1983,0), tem[2,],c(1985,0),tem[3:18,], c(2002, 0), tem[19:33, ] ) 
fatalities = tem2[,2]
names(fatalities) = tem2[,1]
barplot(fatalities, las=2)

This data is noisy, but we still see an uptake in total fatalities in the last 10 years.

series = ts(fatalities,start=c(1982))
yforecast = HoltWinters(series, beta=FALSE, gamma=FALSE)
yforecast
## Holt-Winters exponential smoothing without trend and without seasonal component.
## 
## Call:
## HoltWinters(x = series, beta = FALSE, gamma = FALSE)
## 
## Smoothing parameters:
##  alpha: 0.4323295
##  beta : FALSE
##  gamma: FALSE
## 
## Coefficients:
##       [,1]
## a 79.90564
plot(yforecast, main="Total fatalities from mass shooting")

shottings = forecast(yforecast, h = 1)
shottings
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2018       79.90564 53.81795 105.9933 40.00795 119.8033

So this model predicts that about 80 people will be killed in mass shootings in 2018. Since there were 21 in Florida and Pennsylvania already. The rest of 2018 will see about 59 people getting killed. I hope these are not going to happen in schools.

comments powered by Disqus