Introduction

We investigate frequency-severity modeling using an automobile insurance claims dataset studied in Ferreira and Minikel (2010, 2012). These data, made public by the Massachusetts Executive Office of Energy and Environmental Affairs (EOEEA), summarize automobile insurance experience from the state of Massachusetts for the year 2006. The dataset consists of approximately 3.25 million policies representing over half a billion dollars of claims.

Because the dataset represents experience from several insurance carriers, it is not surprising that the amount of policyholder information is less than typically used by large carriers that employ advanced analytic techniques. Nonetheless, we do have basic ratemaking information that is common to all carriers, including primary driver characteristics and territory groupings. At the vehicle level, we also have mileage driven in a year, the focus of the Ferreira and Minikel study.

Data and Summary Statistics

From the Ferreira and Minikel (2010, 2012) data, we drew a random sample of 100,000 policyholders for our analysis. After importing the data with the following code and looking it over, you will see that many values of “TotLoss” are negative. We followed the recommendation of Ferreira and Minikel and forced all losses of less than 50 to equal 0.
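A minimal sketch of this preprocessing step (the file name is hypothetical, though the column name TotLoss comes from the text):

```r
# Read the policy-level data; the file name here is hypothetical
dat <- read.csv("MassAuto2006sample.csv")

# Inspect the loss variable; many values are negative
summary(dat$TotLoss)

# Following Ferreira and Minikel, set all losses below 50 to zero
dat$TotLoss[dat$TotLoss < 50] <- 0
```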

R Code to Bring in Data
Table 1: Number of Policies by Rating Group and Territory
Rating Group Terr 1 Terr 2 Terr 3 Terr 4 Terr 5 Terr 6
A 13905 14603 8600 15609 14722 9177
B 293 268 153 276 183 96
I 706 685 415 627 549 471
M 700 700 433 830 814 713
S 2806 3104 1644 2958 2653 1307

The code also creates a table that shows the distribution of number of policies by rating group and territory. The distribution of policies is reasonably level across territories. In contrast, the distribution by rating group is more uneven; for example, over three quarters of the policies are from the “Adult” group. The sparsest cell is business drivers in territory 6; the most heavily populated cell is territory 4 adult drivers.
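A two-way tabulation such as Table 1 can be sketched as follows, where cgroupF and tgroupF are the rating-group and territory factors (the data frame name dat is an assumption):

```r
# Cross-tabulate policy counts by rating group and territory
table(dat$cgroupF, dat$tgroupF)
```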

For this study, an insurance claim arises only from the bodily injury, property damage liability, and personal injury protection coverages. These are the compulsory, and thus fairly uniform, types of insurance coverage in Massachusetts; it is critical to have uniformity in reporting standards in an intercompany study such as Ferreira and Minikel (2010, 2012). As a result, the loss averages in Table 2 might appear lower than in other studies. This is because the total is over the three compulsory coverages and does not represent, for example, losses from the commonly available (and costlier) comprehensive coverage. The average total loss in Table 2 is 127.48. We also see important differences by rating group; average losses for inexperienced youthful drivers are more than three times those of adult drivers. We can think of this total loss as a pure premium.

Table 2 shows that the overall average claim number is 0.043 per policy. Specifically, of the 100,000 policies, 95,875 had zero claims, 3,942 had one claim, 176 had two claims, and 7 had three claims. The table also reports important differences by rating group; the average number of claims for inexperienced youthful drivers (0.099) is about 2.5 times that of adult drivers (0.040).
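The overall claim frequency can be recovered directly from the claim-count distribution quoted above:

```r
counts <- c(95875, 3942, 176, 7)  # policies with 0, 1, 2, 3 claims
n <- sum(counts)                  # 100,000 policies in total
avg <- sum((0:3) * counts) / n    # (0*95875 + 1*3942 + 2*176 + 3*7) / 100000
avg                               # 0.04315, about 0.043 claims per policy
```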

Table 2 also summarizes information on the earned exposure, defined here as the amount of time that the policy was in force during the study, and on annual mileage. Annual mileage was estimated by Ferreira and Minikel (2010, 2012) based on the Commonwealth of Massachusetts Registry of Motor Vehicles' mandatory annual safety checks, combined with automobile information from the vehicle identification number (VIN). Interestingly, Table 2 shows that only about 90% of our data possess valid information about the number of miles driven, so about 10% are missing.

Here is the R code used to produce Tables 2 and 3.

R Code for Tables 2 and 3
Table 2: Averages by Rating Group
Rating Group Number Total Loss Claim Number Earned Exposure Annual Miles
A 76616 115.95 0.040 0.871 12527.019
B 1269 159.67 0.055 0.894 14405.950
I 3453 354.68 0.099 0.764 12770.241
M 4190 187.27 0.065 0.800 13478.258
S 14472 114.14 0.038 0.914 7611.064
Table 3: Averages by Territory
Territory Number Total Loss Claim Number Earned Exposure Annual Miles
1 18410 98.24 0.032 0.882 12488.73
2 19360 94.02 0.036 0.876 12323.96
3 11245 112.21 0.037 0.870 12399.59
4 20300 126.70 0.044 0.875 11961.74
5 18921 155.62 0.051 0.866 10955.85
6 11764 198.95 0.066 0.842 10783.49

Table 3 provides similar information but by territory. Here, we see that the average total loss and number of claims for territory 6 is about twice that for territory 1.

There are 4,125 (= 100,000 - 95,875) policies with losses. To get a better handle on claim sizes, Figure 1 provides smooth histograms of the loss distribution. The left-hand panel is in the original (dollar) units, indicating a distribution that is right-skewed. The right-hand panel shows the same distribution on a logarithmic scale, where we see more symmetric behavior.

Do our rating factors affect claim size? To get some insights into this question, Figure 2 shows the logarithmic loss distribution by each factor. The left-hand panel shows the distribution by rating group, the right-hand panel shows the distribution by territory. Neither figure suggests that the rating factors have a strong influence on the size distribution.
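Smooth histograms such as those in Figures 1 and 2 can be sketched with kernel density estimates (again assuming a data frame dat with the TotLoss variable):

```r
# Keep only policies with positive losses
loss <- dat$TotLoss[dat$TotLoss > 0]

# Side-by-side density plots: dollar units and logarithmic scale
par(mfrow = c(1, 2))
plot(density(loss), main = "Loss")                   # right-skewed
plot(density(log(loss)), main = "Logarithmic Loss")  # roughly symmetric
```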

Here is the R code used to produce Figures 1 and 2.

R Code for Figures 1 and 2

Figure 1: Loss Distribution. The left-hand panel shows the distribution of loss, the right-hand panel shows the same distribution but on a (natural) logarithmic scale.


Figure 2: Logarithmic Loss Distribution by Factor. The left-hand panel shows the distribution by rating group, the right-hand panel shows the distribution by territory.

Model Fitting

We report three types of fitted models here: (1) frequency models, (2) a severity model, and (3) a pure premium model.

Model Fitting - Frequency

Table 4 summarizes the results from two frequency models, the Poisson and negative binomial regression models. For both models, we used a logarithmic link with logarithmic exposure as an offset variable. Focusing on the Poisson fit, we see that the \(t\)-statistics indicate strong statistical significance for several levels of each factor, rating group and territory. Additional tests confirm that both factors are statistically significant. Although not reported in Table 4, we also ran a model that included interactions among terms. The interaction terms were jointly insignificant (\(p\)-value = 0.303). Hence, we report the model without interactions, in part because of our desire for simplicity.

We also ran an analysis including annual mileage. This variable turned out to be strongly statistically significant with a \(t\)-statistic equal to 12.08. However, by including this variable, we also lost 9,869 observations due to missing values in annual mileage. Thus, we treat the potential inclusion of this variable as an interesting follow-up study.

For some audiences, analysts may wish to present the more flexible negative binomial regression model. Table 4 shows that there is little difference in the estimated coefficients for this data set, indicating that the simpler Poisson model is acceptable.
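The two frequency fits can be sketched as follows; the factor names follow the coefficient labels in Table 4, while ClaimCount and Exposure are assumed names for the claim count and earned exposure variables:

```r
library(MASS)  # for glm.nb

# Poisson regression: log link with log exposure as an offset
fit_pois <- glm(ClaimCount ~ cgroupF + tgroupF + offset(log(Exposure)),
                family = poisson(link = "log"), data = dat)

# Negative binomial regression with the same linear predictor and offset
fit_nb <- glm.nb(ClaimCount ~ cgroupF + tgroupF + offset(log(Exposure)),
                 data = dat)

summary(fit_pois)
summary(fit_nb)
```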

R Code for Table 4
Table 4: Comparison of Poisson and Negative Binomial Models
Poisson Estimate Poisson t-statistic Neg Bin Estimate Neg Bin t-statistic
(Intercept) -2.636 -70.921 -2.639 -69.671
cgroupFB 0.344 2.846 0.343 2.790
cgroupFI 1.043 18.271 1.038 17.638
cgroupFM 0.541 8.578 0.539 8.383
cgroupFS -0.069 -1.493 -0.069 -1.477
tgroupF1 -0.768 -14.022 -0.766 -13.785
tgroupF2 -0.641 -12.241 -0.640 -12.037
tgroupF3 -0.600 -9.867 -0.598 -9.704
tgroupF4 -0.433 -8.810 -0.432 -8.638
tgroupF5 -0.265 -5.491 -0.264 -5.375
Note: Both models use logarithmic exposure as an offset.
Estimated negative binomial dispersion parameter is 2.128.
Reference levels are ‘A’ for Rating Group and ‘6’ for Territory

Model Fitting - Severity

Table 5 summarizes the fit of a gamma regression severity model. As described earlier, we use total losses divided by the number of claims as the dependent variable and the reciprocal of the number of claims as the weight. We fit a gamma distribution with a logarithmic link and the two factors, rating group and territory. Table 5 shows small \(t\)-statistics associated with the levels of rating group and territory; only the “inexperienced” level is statistically significant. Additional tests indicate that the territory factor is not statistically significant and that the rating group factor is only marginally significant (\(p\)-value = 0.042). This is an interesting finding: the rating factors are important for claim frequency but appear to matter far less for claim severity.
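A sketch of this severity fit follows. Note the weight convention: R's glm prior weights divide the variance, so supplying the claim count as the weight makes the variance of an average severity proportional to the reciprocal of the number of claims, as described above. The variable names ClaimCount and AvgClaim are assumptions:

```r
# Severity model uses only policies with at least one claim
sev <- subset(dat, ClaimCount > 0)
sev$AvgClaim <- sev$TotLoss / sev$ClaimCount

# Gamma regression with log link; claim count enters as the prior weight
fit_sev <- glm(AvgClaim ~ cgroupF + tgroupF,
               family = Gamma(link = "log"),
               weights = ClaimCount, data = sev)
summary(fit_sev)
```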

R Code for Table 5
Table 5: Gamma Regression Models
Estimate \(t\)-statistic
(Intercept) 7.986 137.335
cgroupFB 0.014 0.076
cgroupFI 0.222 2.492
cgroupFM -0.013 -0.131
cgroupFS 0.036 0.500
tgroupF1 0.026 0.309
tgroupF2 -0.137 -1.673
tgroupF3 0.004 0.041
tgroupF4 -0.029 -0.376
tgroupF5 0.019 0.255

Model Fitting - Pure Premium

As an alternative to the frequency-severity approach, we also fit a model using “pure premiums,” that is, total losses, as the dependent variable. As in the frequency and severity models, we use a logarithmic link function with the factors rating group and territory. The Tweedie distribution was used. We approximated the Tweedie shape parameter \(p\) using profile likelihood and found that the value \(p = 1.5\) was acceptable; this was the value used in the final estimation.
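One way to carry this out is sketched below using the tweedie package (for the profile likelihood) and the statmod package (for the Tweedie GLM family). The exposure offset is an assumption here, mirroring the frequency models:

```r
library(tweedie)   # tweedie.profile() for profile likelihood of p
library(statmod)   # tweedie() family for glm

# Profile likelihood over a grid of candidate power parameters p
prof <- tweedie.profile(TotLoss ~ cgroupF + tgroupF + offset(log(Exposure)),
                        data = dat, p.vec = seq(1.2, 1.8, by = 0.1))

# Final fit with p fixed at 1.5; link.power = 0 gives the log link
fit_tw <- glm(TotLoss ~ cgroupF + tgroupF + offset(log(Exposure)),
              family = tweedie(var.power = 1.5, link.power = 0), data = dat)
summary(fit_tw)
exp(coef(fit_tw))  # relativities, as reported in Table 6
```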

Table 6 reports the fitted Tweedie regression model. The \(t\)-statistics associated with several levels of rating group and territory are statistically significant. This suggests, and was confirmed through additional testing, that both factors are statistically significant determinants of total loss. The table also reports the relativities (computed as the exponentiated parameter estimates). Interestingly, these relativities turn out to be close to those of the frequency model; this is not surprising given the lack of statistical significance associated with the factors in the severity model.

R Code for Table 6
Table 6: Tweedie Regression Model
Estimate \(t\)-statistic
(Intercept) 5.356 63.471
cgroupFB 0.340 1.277
cgroupFI 1.283 9.389
cgroupFM 0.474 3.221
cgroupFS -0.033 -0.358
tgroupF1 -0.743 -6.535
tgroupF2 -0.782 -6.916
tgroupF3 -0.552 -4.368
tgroupF4 -0.480 -4.438
tgroupF5 -0.269 -2.498

Application: Massachusetts Automobile Claims

Out-of-Sample Model Comparisons

To compare the frequency-severity and pure premium approaches, we examined a held-out validation sample. Specifically, from our original database, we drew a random sample of 100,000 policies and developed the models reported earlier. Then, we drew an (independent) sample of 50,000 policies. You do not have access to the original data files and thus cannot draw the random samples that we did. Nonetheless, you may find the following R code interesting to see how we did it.

R Code for Independent In- and Out-Samples

For the frequency-severity model, our predictions are based on the Poisson frequency coefficients in Table 4 (estimating \(\boldsymbol \beta_F\)), the severity coefficients in Table 5 (estimating \(\boldsymbol \beta_S\)), and the values of the explanatory variables from the held-out validation sample. The predictions for the Tweedie model followed similarly, using the coefficients reported in Table 6.
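A sketch of the scoring step, assuming fitted-model objects fit_pois, fit_sev, and fit_tw for the Poisson frequency, gamma severity, and Tweedie models, and a data frame holdout for the validation sample (all names are hypothetical):

```r
# Frequency-severity score: expected claim count times expected severity
pred_freq <- predict(fit_pois, newdata = holdout, type = "response")
pred_sev  <- predict(fit_sev,  newdata = holdout, type = "response")
score_fs  <- pred_freq * pred_sev

# Pure premium score from the Tweedie model
score_pp <- predict(fit_tw, newdata = holdout, type = "response")

# Compare the two sets of predictions, and each against held-out losses
cor(score_fs, score_pp)
cor(holdout$TotLoss, score_pp, method = "spearman")
```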

Figure 3 compares the predictions from the frequency-severity and pure premium models. The left-hand panel shows the distribution of our pure premium predictions. The right-hand panel shows the strong relationship between the two sets of predictions; it turns out that the correlation is approximately 0.999. Both models provided some ability to predict total losses; the (Spearman) correlation between held-out losses and (either) predictor turned out to be 8.2%.

Here is the R code to produce Figure 3.

R Code for Figure 3

Figure 3: Out-of-Sample Mean Performance. The left-hand panel shows the distribution of the out-of-sample predictions calculated using the pure premium, or Tweedie, model. The right-hand panel shows the strong relationship between the scores from the frequency-severity and the pure premium models.

Because the mean predictor did not provide a way of discriminating between the pure premium and frequency-severity models, we also looked at tail percentiles. Specifically, in the Tweedie regression model, we cited \(p = 1.5\) and \(\hat{\phi} = 2.371\) and described how to estimate \(\hat{\mu}_i\) for each observation \(i\). Then, we noted that one could use the qtweedie function from the R package tweedie to get quantiles, e.g., the 95th quantile. We did this for each held-out observation and compared it to the actual realized value. If the model is correct, we expect 95% of the observations to be less than the corresponding quantile.
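This check can be sketched as follows, using the \(p = 1.5\) and \(\hat{\phi} = 2.371\) values cited above (fit_tw and holdout are hypothetical names for the fitted Tweedie model and the validation sample):

```r
library(tweedie)

# Fitted Tweedie means for the held-out observations
mu_hat <- predict(fit_tw, newdata = holdout, type = "response")

# 95th percentile of the fitted Tweedie distribution, one per observation
q95 <- qtweedie(0.95, power = 1.5, mu = mu_hat, phi = 2.371)

# If the model is correct, roughly 95% of actual losses fall below q95
mean(holdout$TotLoss <= q95)
```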

On a separate note, actuaries might wish to use this set of quantiles, that varies by policy characteristics, when selecting a reinsurance treaty.

The procedure for Poisson frequency and gamma severity models is similar but a bit more complex. Earlier, we noted that a Poisson sum of gamma random variables has a Tweedie distribution. So, even though we estimate the frequency and severity parameters separately, they can still be combined when we look at the loss distribution. We showed explicitly how to get Tweedie parameters from the Poisson frequency and gamma severity models. Then, as with the Tweedie GLM, we can calculate, e.g., a 95th quantile for each observation and compare it to the actual realized value.
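For reference, the standard mapping is: if the claim count is Poisson with mean \(\lambda\) and claim sizes are gamma with shape \(\alpha\) and scale \(\theta\), then the aggregate loss follows a Tweedie distribution with parameters

\[
p = \frac{\alpha + 2}{\alpha + 1}, \qquad
\mu = \lambda \alpha \theta, \qquad
\phi = \frac{\lambda^{1-p} (\alpha\theta)^{2-p}}{2-p}.
\]

One can check that this reproduces the compound-sum moments: the mean is \(\lambda\alpha\theta\) and the variance is \(\phi\mu^{p} = \lambda\alpha(\alpha+1)\theta^{2}\).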

Table 7 provides the comparisons for selected percentiles. Both models provide disappointing results. On the one hand, this table suggests that fitting the tails of the distribution is a more complex problem that requires more refined data and sophisticated models. On the other hand, the similarity of results in Figure 3 when predicting the mean suggests a robustness of the GLM procedures that gives the analyst confidence when providing recommendations.

Here is the R code to produce Table 7. To test it out, set the variable ntest to something small, like 2500; the full run with ntest = 50000 takes considerably longer.

R Code for Table 7
Table 7: Out-of-Sample Quantile Performance
Percentile Pure Premium Frequency Severity
0.960 0.5144 0.4264
0.970 0.8568 0.7912
0.980 0.9404 0.8644
0.985 0.9712 0.9104
0.990 0.9904 0.9436
0.995 0.9948 0.9792
0.999 0.9972 0.9980