Chapter 11 Risk Classification

Chapter Preview. This chapter motivates the use of risk classification in insurance pricing and introduces readers to Poisson regression as a prominent example of risk classification. In Section 11.1 we explain why insurers need to incorporate various risk characteristics, or rating factors, of individual policyholders in pricing insurance contracts. In Section 11.2, we introduce Poisson regression as a pricing tool to achieve such premium differentials. The concept of exposure is also introduced in this section. As most rating factors are categorical, we show in Section 11.3 how the multiplicative tariff model can be incorporated into a Poisson regression model in practice, along with numerical examples for illustration.

11.1 Introduction


In this section, you learn:

  • Why premiums should vary across policyholders with different risk characteristics.
  • The meaning of the adverse selection spiral.
  • The need for risk classification.

Through insurance contracts, policyholders effectively transfer their risks to the insurer in exchange for premiums. For the insurer to stay in business, the premium income collected from a pool of policyholders must at least equal the benefit outgo. In general insurance products where a premium is charged for a single period, say annually, the gross insurance premium (the sum of expected losses, expenses, and profit on a policy) based on the equivalence principle is stated as \[ \text{Gross Premium = Expected Losses + Expected Expenses + Profit}. \] Thus, ignoring frictional costs such as administrative expenses and profit, the net or pure premium charged by the insurer should equal the expected losses arising from the risk that is transferred from the policyholder.

If all policyholders in the insurance pool have identical risk profiles, the insurer simply charges the same premium for all policyholders because they have the same expected loss. In reality, however, policyholders are hardly homogeneous. For example, mortality risk in life insurance depends on the characteristics of the policyholder, such as age, sex and lifestyle. In auto insurance, those characteristics may include age, occupation, the type or use of the car, and the area where the driver resides. Knowledge of these characteristics or variables enhances the ability to calculate fair premiums for individual policyholders, as they can be used to estimate or predict the expected losses more accurately.

Adverse Selection. Indeed, if the insurer does not differentiate the risk characteristics (the distinguishing features of a policy that help determine its expected loss) of individual policyholders and simply charges the same premium to all insureds based on the average loss in the portfolio, the insurer would face adverse selection, a situation where individuals with a higher chance of loss are attracted to the portfolio and low-risk individuals are repelled.

For example, consider health insurance where smoking status is an important risk factor for mortality and morbidity. Most health insurers in the market charge different premiums depending on smoking status, so smokers pay higher premiums than non-smokers, with other characteristics being identical. Now suppose that there is an insurer, which we will call EquitabAll, that offers the same premium to all insureds regardless of smoking status, unlike its competitors. The net premium of EquitabAll is naturally an average mortality loss accounting for both smokers and non-smokers. That is, the net premium is a weighted average of the losses with the weights being the proportions of smokers and non-smokers, respectively. Thus it is easy to see that a smoker would have a strong incentive to purchase insurance from EquitabAll rather than from other insurers, as the premium offered by EquitabAll is relatively lower. At the same time non-smokers would prefer buying insurance elsewhere, where lower premiums, computed from the non-smoker group only, are offered. As a result, there will be more smokers and fewer non-smokers in EquitabAll’s portfolio, which leads to larger-than-expected losses and hence a higher premium for insureds in the next period to cover the higher costs. With the raised premium in the next period, non-smokers in EquitabAll will have even greater incentives to switch insurers. As this cycle continues over time, EquitabAll would gradually retain more smokers and fewer non-smokers in its portfolio with the premium continually raised, eventually leading to business collapse.

In the literature, this phenomenon is known as the adverse selection spiral or death spiral: a book of business deteriorates as it attracts ever-riskier individuals when the insurer is forced to increase premiums due to losses. Therefore, incorporating and differentiating important risk characteristics of individuals in the insurance pricing process is essential both for determining fair premiums for individual policyholders and for the long-term sustainability of insurers.

Rating Factors. In order to incorporate relevant risk characteristics of policyholders in the pricing process, insurers maintain a classification system that assigns each policyholder to one of the risk classes based on a relatively small number of risk characteristics that are deemed most relevant. The characteristics used in the classification system are called rating factors (characteristics of a risk that help price the insurance contract). They are a priori variables in the sense that they are known to the insurer before the contract begins (e.g., sex, health status, and vehicle type are known during underwriting). All policyholders sharing identical risk factors are thus assigned to the same risk class, and are considered homogeneous from a pricing viewpoint; the insurer consequently charges them the same premium or rate.

Regarding the risk factors and premiums, the Actuarial Standard of Practice (ASOP) No. 12 of the Actuarial Standards Board (2018) states that the actuary should select risk characteristics that are related to expected outcomes, and that rates within a risk classification system would be considered equitable if differences in rates reflect material differences in expected cost for risk characteristics. In the process of choosing risk factors, ASOP also requires the actuary to consider the following: relationship of risk characteristics and expected outcomes, causality, objectivity, practicality, applicable law, industry practices, and business practices.

On the quantitative side, an important task for the actuary in building a risk classification framework is to construct a statistical model that can determine the expected loss given various rating factors of a policyholder. The standard approach is to adopt a regression model which produces the expected loss as the output when the relevant risk factors are given as the inputs. In this chapter we learn about Poisson regression, which can be used when the loss is a count variable (a discrete variable taking nonnegative integer values), as a prominent example of an insurance pricing tool.


11.2 Poisson Regression Model

The Poisson regression model has been successfully used in a wide range of applications and has the advantage of allowing closed-form expressions (mathematical expressions defined by a formula with a finite number of operations) for important quantities. In this section we introduce Poisson regression as a natural extension of the Poisson distribution.


In this section you will:

  • Understand Poisson regression as a convenient tool for combining individual Poisson distributions.
  • Sharpen your understanding of the concept of exposure and its importance.
  • Formally learn how to formulate a Poisson regression model using indicator variables when the explanatory variables are categorical.

11.2.1 Need for Poisson Regression

Poisson Distribution

To introduce Poisson regression, let us consider a hypothetical health insurance portfolio where all policyholders are of the same age and only one risk factor, smoking status, is relevant. Smoking status is thus a categorical variable, that is, a variable whose values are qualitative groups with either no natural ordering (nominal) or an ordering (ordinal). It has two levels, or possible outcomes: smoker and non-smoker. As there are two levels for smoking status, we may denote smoker and non-smoker by level 1 and 2, respectively. Here the numbering is arbitrary because smoking status is a nominal categorical variable, one whose categories have no natural order. (See Section 2.3.1 for an introduction to categorical and nominal variables.) Suppose now that we are interested in pricing a health insurance policy where the premium for each policyholder is determined by the number of outpatient visits to a doctor’s office during a year. The medical cost for each visit is assumed to be the same regardless of smoking status for simplicity. Thus if we believe that smoking status is a valid risk factor in this health insurance, it is natural to consider observations from smokers separately from non-smokers. In Table 11.1 we present data for this portfolio.

Table 11.1. Number of Visits to Doctor’s Office in Last Year

\[ {\small \begin{matrix} \begin{array}{cc|cc|cc} \hline \text{Smoker} & \text{(level 1)} & \text{Non-smoker}&\text{(level 2)} & & \text{Both}\\ \text{Count} & \text{Observed} & \text{Count} & \text{Observed} & \text{Count} & \text{Observed} \\ \hline 0 & 2213 & 0 & 6671 & 0 & 8884 \\ 1 & 178 & 1 & 430 & 1 & 608 \\ 2 & 11 & 2 & 25 & 2 & 36 \\ 3 & 6 & 3 & 9 & 3 & 15 \\ 4 & 0 & 4 & 4 & 4 & 4 \\ 5 & 1 & 5 & 2 & 5 & 3 \\ \hline \text{Total} & 2409 & \text{Total} & 7141 & \text{Total} & 9550 \\ \text{Mean} & 0.0926 & \text{Mean} & 0.0746 & \text{Mean} & 0.0792 \\ \hline \end{array} \end{matrix} } \]

As this dataset contains random counts, we try to fit a Poisson distribution for each level.

As introduced in Section 3.2.3, the probability mass function of the Poisson with mean \(\mu\) is given by \[\begin{equation} \Pr(Y=y)=\frac{\mu^y e^{-\mu}}{y!},\qquad y=0,1,2, \ldots \tag{11.1} \end{equation}\]
and \(\mathrm{E~}{(Y)}=\mathrm{Var~}{(Y)}=\mu\). In regression contexts, it is common to use \(\mu\) for mean parameters instead of the Poisson parameter \(\lambda\) although certainly both symbols are suitable. As we saw in Section 3.4.2, the maximum likelihood estimate (mle) of the Poisson mean is given by the sample mean. Thus if we denote the Poisson mean parameter for each level by \(\mu_{(1)}\) (smoker) and \(\mu_{(2)}\) (non-smoker), we see from Table 11.1 that \(\hat{\mu}_{(1)}=0.0926\) and \(\hat{\mu}_{(2)}=0.0746\). This simple example shows the basic idea of risk classification. Depending on smoking status, a policyholder will have a different risk characteristic that can be incorporated via varying Poisson mean parameters to compute the fair premium. In this example the ratio of expected loss frequencies is \(\hat{\mu}_{(1)}/\hat{\mu}_{(2)}=1.2402\), implying that smokers tend to visit a doctor’s office 24.02\(\%\) more frequently than non-smokers.

It is also informative to note that if the insurer charges the same premium to all policyholders regardless of smoking status, based on the average characteristic of the portfolio, as was the case for EquitabAll described in Section 11.1, the expected frequency (or premium) \(\hat{\mu}\) is 0.0792, obtained from the last column of Table 11.1. It can be verified that \[\begin{equation} \hat{\mu} = \left(\frac{n_1}{n_1+n_2}\right)\hat{\mu}_{(1)}+\left(\frac{n_2}{n_1+n_2}\right)\hat{\mu}_{(2)}=0.0792, \tag{11.2} \end{equation}\] where \(n_i\) is the number of observations in level \(i\). Clearly, this premium is a weighted average of the premiums for each level with the weight equal to the proportion of insureds in that level.
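The estimates above can be reproduced directly from the frequency counts in Table 11.1. The following Python sketch is illustrative only; the variable and function names are our own choices and are not part of the original text.

```python
# Frequency counts from Table 11.1: position c holds the number of
# policyholders with c visits (c = 0, 1, ..., 5).
smoker = [2213, 178, 11, 6, 0, 1]
non_smoker = [6671, 430, 25, 9, 4, 2]

def sample_mean(freq):
    """Return (Poisson mle, sample size): total visits / number of policyholders."""
    n = sum(freq)
    total_visits = sum(count * f for count, f in enumerate(freq))
    return total_visits / n, n

mu1, n1 = sample_mean(smoker)       # level 1 (smoker)
mu2, n2 = sample_mean(non_smoker)   # level 2 (non-smoker)

# Pooled premium as the weighted average in equation (11.2).
mu_pooled = (n1 * mu1 + n2 * mu2) / (n1 + n2)

print(round(mu1, 4), round(mu2, 4))   # 0.0926 0.0746
print(round(mu1 / mu2, 4))            # 1.2402
print(round(mu_pooled, 4))            # 0.0792
```

The pooled value coincides with the sample mean of the combined column, which is why a single flat premium sits between the two level-specific premiums.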

A simple Poisson regression
In the example above, we have fitted a Poisson distribution for each level separately, but we can actually combine them in a unified fashion so that a single Poisson model encompasses both smoking and non-smoking statuses. This can be done by relating the Poisson mean parameter to the risk factor. In other words, we make the Poisson mean, which is the expected loss frequency, respond to the change in the smoking status. The conventional approach to deal with a categorical variable is to adopt indicator or dummy variables, which take the value 1 or 0 to indicate the presence or absence of a categorical characteristic, so that we turn the switch on for one level and off for others. Therefore we may propose to use \[\begin{equation} \mu=\beta_0+\beta_1 x_1 \tag{11.3} \end{equation}\] or, more commonly, a log linear form, in which the natural log of the expected response is linear in the explanatory variables, \[\begin{equation} \log \mu=\beta_0+\beta_1 x_1, \tag{11.4} \end{equation}\] where \(x_1\) is an indicator variable with \[\begin{equation} x_1= \begin{cases} 1 & \text{if smoker}, \\ 0 & \text{otherwise}. \end{cases} \tag{11.5} \end{equation}\] We generally prefer the log linear relation in (11.4) to the linear one in (11.3) to prevent producing negative \(\mu\) values, which can happen when there are many different risk factors and levels. The setup in (11.4) and (11.5) then results in different Poisson frequency parameters depending on the level in the risk factor: \[\begin{equation} \log \mu= \begin{cases} \beta_0+\beta_1 \\ \beta_0 \end{cases} \quad \text{or equivalently,}\qquad \mu= \begin{cases} e^{\beta_0+\beta_1} & \text{if smoker (level 1)}, \\ e^{\beta_0} & \text{if non-smoker (level 2)} . \end{cases} \tag{11.6} \end{equation}\] This is the simplest form of Poisson regression. Note that we require a single indicator variable to model two levels in this case.
Alternatively, it is also possible to use two indicator variables through a different coding scheme. This scheme requires dropping the intercept term so that (11.4) is modified to \[\begin{equation} \log \mu=\beta_1 x_1+\beta_2 x_2, \tag{11.7} \end{equation}\] where \(x_2\) is the second indicator variable with \[ x_2= \begin{cases} 1 & \text{if non-smoker}, \\ 0 & \text{otherwise}. \end{cases} \] Then we have, from (11.7), \[\begin{equation} \log \mu= \begin{cases} \beta_1 \\ \beta_2 \end{cases} \quad \text{or}\qquad \mu= \begin{cases} e^{\beta_1} & \text{if smoker (level 1)}, \\ e^{\beta_2} & \text{if non-smoker (level 2)}. \end{cases} \tag{11.8} \end{equation}\] The two setups are numerically equivalent: they produce the same fitted means, with \(\beta_1\) and \(\beta_2\) in (11.8) playing the roles of \(\beta_0+\beta_1\) and \(\beta_0\) in (11.6). The former setup is more common in most texts, and we also adopt it here.

With this Poisson regression model we can readily understand how the coefficients \(\beta_0\) and \(\beta_1\) are linked to the expected loss frequency in each level. According to (11.6), the Poisson mean of the smokers, \(\mu_{(1)}\), is given by \[ \mu_{(1)}=e^{\beta_0+\beta_1}=\mu_{(2)} \,e^{\beta_1} \quad \text{or}\quad \mu_{(1)}/\mu_{(2)} =e^{\beta_1} \] where \(\mu_{(2)}\) is the Poisson mean for the non-smokers. This relation between the smokers and non-smokers suggests a useful way to compare the risks embedded in different levels of a given risk factor. That is, the proportional increase in the expected loss frequency of the smokers compared to that of the non-smokers is simply given by a multiplicative factor \(e^{\beta_1}\). Put another way, if we set the expected loss frequency of the non-smokers as the base value, the expected loss frequency of the smokers is obtained by applying \(e^{\beta_1}\) to the base value.
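As a small numerical check of this interpretation, note that with a single two-level factor the model (11.6) has one free mean per level, so its fitted means reproduce the two sample means from Table 11.1 and the coefficients can be read off directly. The sketch below assumes this shortcut rather than a general fitting routine; the names `beta0`, `beta1` and `poisson_mean` are our own.

```python
import math

# Sample means by level, computed from the counts in Table 11.1
# (total visits / number of policyholders).
mu_smoker = 223 / 2409        # level 1
mu_non_smoker = 533 / 7141    # level 2, the base level

# Coefficients implied by (11.6): e^{beta0} is the base mean and
# e^{beta1} is the multiplicative factor for smokers.
beta0 = math.log(mu_non_smoker)
beta1 = math.log(mu_smoker / mu_non_smoker)

def poisson_mean(x1):
    """Expected visit count for indicator x1 (1 = smoker, 0 = non-smoker)."""
    return math.exp(beta0 + beta1 * x1)

print(round(math.exp(beta1), 4))   # 1.2402, the smoker relativity
```

Here `poisson_mean(1)` and `poisson_mean(0)` return the smoker and non-smoker sample means, respectively.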

Dealing with multi-level case
We can readily extend the two-level case to a multi-level one where \(l\) different levels are involved for a single rating factor. For this we generally need \(l-1\) indicator variables to formulate \[\begin{equation} \log \mu=\beta_0+\beta_1 x_1+\cdots+\beta_{l-1} x_{l-1}, \tag{11.9} \end{equation}\] where \(x_k\) is an indicator variable that equals 1 if the policy belongs to level \(k\) and 0 otherwise, for \(k=1,2, \ldots, l-1\). By omitting the indicator variable associated with the last level in (11.9) we effectively choose level \(l\) as the base case, or reference level, that is, the level for which all indicator variables equal 0; this choice is arbitrary and does not affect the fitted Poisson means. The resulting Poisson parameter for policies in level \(k\) then becomes, from (11.9), \[ \mu= \begin{cases} e^{\beta_0+\beta_k} & \text{if the policy belongs to level } k \ (k=1,2, \ldots, l-1), \\ e^{\beta_0} & \text{if the policy belongs to level } l. \end{cases} \] Thus if we denote the Poisson parameter for policies in level \(k\) by \(\mu_{(k)}\), we can relate the Poisson parameters for different levels through \(\mu_{(k)}=\mu_{(l)}\, e^{\beta_k}\), \(k=1,2, \ldots, l-1\). This indicates that, just like the two-level case, the expected loss frequency of the \(k\)th level is obtained from the base value multiplied by the relative factor \(e^{\beta_k}\). This relative interpretation becomes more powerful when there are many risk factors with multiple levels, and leads us to a better understanding of the underlying risk and a more accurate prediction of future losses. Finally, we note that the varying Poisson mean is completely driven by the coefficient parameters \(\beta_k\), which are to be estimated from the dataset; the procedure of the parameter estimation will be discussed later in this chapter.
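The indicator coding in (11.9) can be sketched in a few lines of Python. The function names and the coefficient values below are hypothetical and chosen only for illustration; they are not estimates from any dataset in this chapter.

```python
import math

def indicator_row(level, n_levels):
    """Return [1, x_1, ..., x_{l-1}] for a policy in `level` (1-based).

    The last level (level == n_levels) is the base case: all indicators are 0.
    """
    if not 1 <= level <= n_levels:
        raise ValueError("level must be between 1 and n_levels")
    return [1] + [1 if level == k else 0 for k in range(1, n_levels)]

def poisson_mean(level, beta):
    """Poisson mean exp(beta_0 + beta_level) implied by (11.9)."""
    x = indicator_row(level, len(beta))   # len(beta) equals the number of levels l
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))

# Hypothetical coefficients for a factor with l = 3 levels:
# base mean 0.05, relativities 1.5 (level 1) and 1.2 (level 2).
beta = [math.log(0.05), math.log(1.5), math.log(1.2)]

print(indicator_row(1, 3))   # [1, 1, 0]
print(indicator_row(3, 3))   # [1, 0, 0]  (base level)
```

With these assumed values, the level means are the base mean 0.05 multiplied by the relativities 1.5 and 1.2, matching \(\mu_{(k)}=\mu_{(l)}\, e^{\beta_k}\).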

11.2.2 Poisson Regression

We now describe Poisson regression in a formal and more general setting. Let us assume that there are \(n\) independent policyholders with a set of rating factors characterized by a \(k\)-variate vector. The \(i\)th policyholder’s rating factor is thus denoted by vector \(\mathbf{ x}_i=(1, x_{i1}, \ldots, x_{ik})^{\prime}\), and the policyholder has recorded the loss count \(y_i \in \{0,1,2, \ldots \}\) from the last period of loss observation, for \(i=1, \ldots, n\). In the regression literature, the values \(x_{i1}, \ldots, x_{ik}\) are generally known as explanatory variables, as these are measurements providing information about the variable of interest \(y_i\). In essence, regression analysis is a method to quantify the relationship between a variable of interest and explanatory variables.

We also assume, for now, that all policyholders have the same one unit period for loss observation, or equal exposure of 1, to keep things simple; we will discuss more details regarding the exposure in the following subsection.

We describe Poisson regression through its mean function. For this we first let \(\mu_i\) denote the expected loss count of the \(i\)th policyholder under the Poisson specification (11.1): \[\begin{equation} \mu_i=\mathrm{E~}{(y_i|\mathbf{ x}_i)}, \qquad y_i \sim Pois(\mu_i), \, i=1, \ldots, n. \tag{11.10} \end{equation}\] The condition inside the expectation in equation (11.10) indicates that the loss frequency \(\mu_i\) is the model’s expected response to the given set of risk factors or explanatory variables. In principle the conditional mean \(\mathrm{E~}{(y_i|\mathbf{ x}_i)}\) in (11.10) can take different forms depending on how we specify the relationship between \(\mathbf{ x}\) and \(y\). The standard choice for Poisson regression is to adopt the exponential function, as we mentioned previously, so that \[\begin{equation} \mu_i=\mathrm{E~}{(y_i|\mathbf{ x}_i)}=e^{\mathbf{ x}^{\prime}_i \boldsymbol \beta}, \qquad y_i \sim Pois(\mu_i), \, i=1, \ldots, n. \tag{11.11} \end{equation}\] Here \(\boldsymbol \beta=(\beta_0, \ldots, \beta_k)^{\prime}\) is the vector of coefficients so that \(\mathbf{ x}^{\prime}_i \boldsymbol \beta=\beta_0+\beta_1x_{i1} +\ldots+\beta_k x_{ik}\). The exponential function in (11.11) ensures that \(\mu_i >0\) for any set of rating factors \(\mathbf{ x}_i\). Often (11.11) is rewritten as a log linear form \[\begin{equation} \log \mu_i=\log \mathrm{E~}{(y_i|\mathbf{ x}_i)}=\mathbf{ x}^{\prime}_i \boldsymbol \beta, \qquad y_i \sim Pois(\mu_i), \, i=1, \ldots, n \tag{11.12} \end{equation}\] to reveal the relationship when the right side is set as the linear form, \(\mathbf{ x}^{\prime}_i \boldsymbol \beta\). Again, we see that the mapping works well as both sides of (11.12), \(\log \mu_i\) and \(\mathbf{ x}^{\prime}_i \boldsymbol \beta\), can now cover all real values. This is the formulation of Poisson regression, assuming that all policyholders have the same unit period of exposure.
When the exposures differ among the policyholders, however, as is the case in most practical cases, we need to revise this formulation by adding an exposure component as an additional term in (11.12).

11.2.3 Incorporating Exposure

Concept of Exposure

We first saw the concept of exposures in Section 10.4. In order to determine the size of potential losses in any type of insurance, one must always know the corresponding exposure. The concept of exposure is an extremely important ingredient in insurance pricing, though we usually take it for granted. For example, when we say the expected claim frequency of a health insurance policy is 0.2, it does not mean much without the specification of the exposure such as, in this case, per month or per year. In fact, all premiums and losses need the exposure precisely specified and must be quoted accordingly; otherwise all subsequent statistical analyses and predictions will be distorted.

In the previous section we assumed the same unit of exposure across all policyholders, but this is hardly realistic in practice. In health insurance, for example, two policyholders with different lengths of insurance coverage (e.g., 3 months and 12 months, respectively) could have recorded the same number of claims. As the expected number of claims would be proportional to the length of coverage, we should not treat these two policyholders’ loss experiences identically in the modeling process. This motivates the need for the concept of exposure in Poisson regression.

The Poisson distribution in (11.1) is parametrized via its mean. To understand the exposure, we alternatively parametrize the Poisson pmf in terms of the rate parameter \(\lambda\), based on the definition of the Poisson process: \[\begin{equation} \Pr(Y=y)=\frac{(\lambda t)^y e^{-\lambda t}}{y!},\qquad y=0,1,2, \ldots \tag{11.13} \end{equation}\] with \(\mathrm{E~}{(Y)}=\mathrm{Var~}{(Y)}=\lambda t\). Here \(\lambda\) is known as the rate or intensity per unit period of the Poisson process and \(t\) represents the length of time or exposure, a known constant value. For given \(\lambda\) the Poisson distribution (11.13) produces a larger expected loss count as the exposure \(t\) gets larger. Clearly, (11.13) reduces to (11.1) when \(t=1\), which means that the mean and the rate become the same for an exposure of 1, the case we considered in the previous subsection.
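To see how the exposure \(t\) enters (11.13), the following Python sketch evaluates the pmf for the same rate under two coverage lengths. The rate of 0.2 claims per year and the 3-month and 12-month exposures are illustrative assumptions, and the function name is our own.

```python
import math

def poisson_pmf(y, rate, exposure=1.0):
    """Pr(Y = y) from (11.13), with mean equal to rate * exposure."""
    mean = rate * exposure
    return mean ** y * math.exp(-mean) / math.factorial(y)

rate = 0.2   # assumed claim rate per policy-year

# Probability of no claim for 3 months (t = 0.25) versus a full year (t = 1).
p0_quarter = poisson_pmf(0, rate, exposure=0.25)
p0_year = poisson_pmf(0, rate, exposure=1.0)

# Expected counts scale with exposure: lambda * t.
mean_quarter = sum(y * poisson_pmf(y, rate, 0.25) for y in range(50))
mean_year = sum(y * poisson_pmf(y, rate, 1.0) for y in range(50))
```

Under these assumptions `mean_quarter` is one quarter of `mean_year`, while the rate itself is unchanged, which is the separation between rate and exposure that the regression formulation needs to preserve.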

In principle, the exposure does not need to be measured in units of time and may represent different things depending on the problem at hand. For example:

  1. In health insurance, the rate may be the occurrence of a specific disease per 1,000 people and the exposure is the number of people considered in the unit of 1,000.
  2. In auto insurance, the rate may be the number of accidents per year of a driver and the exposure is the length of the observed period for the driver in the unit of year.
  3. For workers compensation, a no-fault insurance system that covers lost wages resulting from an employee’s work-related injury or illness, the rate may be the probability of injury in the course of employment per dollar and the exposure is the payroll amount in dollars.
  4. In marketing, the rate may be the number of customers who enter a store per hour and the exposure is the number of hours observed.
  5. In civil engineering, the rate may be the number of major cracks on the paved road per 10 kms and the exposure is the length of road considered in the unit of 10 kms.
  6. In credit risk modelling, the rate may be the number of default events per 1,000 firms and the exposure is the number of firms under consideration in the unit of 1,000.

Actuaries may be able to use different exposure bases (the unit of measurement chosen to represent the exposure for a particular risk) for a given insurable loss. For example, in auto insurance, both the number of kilometers driven and the number of months covered by insurance can be used as exposure bases. Here the former is more accurate and useful in modelling the losses from car accidents, but more difficult to measure and manage for insurers. Thus, a good exposure base may not be the theoretically best one due to various practical constraints. As a rule, an exposure base must be easy to determine, accurately measurable, legally and socially acceptable, and free from potential manipulation by policyholders.

Incorporating exposure in Poisson regression

As exposures affect the Poisson mean, constructing Poisson regressions requires us to carefully separate the rate and exposure in the modelling process. Focusing on the insurance context, let us denote the rate of the loss event of the \(i\)th policyholder by \(\lambda_i\), the known exposure (the length of coverage) by \(m_i\) and the expected loss count under the given exposure by \(\mu_i\). Then the Poisson regression formulation in (11.11) and (11.12) should be revised in light of (11.13) as \[\begin{equation} \mu_i=\mathrm{E~}{(y_i|\mathbf{ x}_i)}=m_i \,\lambda_i=m_i \, e^{\mathbf{ x}^{\prime}_i \boldsymbol \beta}, \qquad y_i \sim Pois(\mu_i), \, i=1, \ldots, n, \tag{11.14} \end{equation}\] which gives \[\begin{equation} \log \mu_i=\log m_i+\mathbf{ x}^{\prime}_i \boldsymbol \beta, \qquad y_i \sim Pois(\mu_i), \, i=1, \ldots, n. \tag{11.15} \end{equation}\] Adding \(\log m_i\) in (11.15) does not pose a problem in fitting: because it is a known constant, we can always specify it as an extra explanatory variable whose coefficient is fixed to 1. In the literature the log of exposure, \(\log m_i\), is commonly called the offset; it is added to a regression model to account for varying exposures.
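The role of the offset in (11.14) and (11.15) can be illustrated with a short Python sketch. The coefficients below are hypothetical (a base rate of 0.10 per year and a 25% loading for smokers) and the function name is our own; no fitting is performed here.

```python
import math

def expected_count(x, beta, exposure):
    """mu_i = m_i * exp(x_i' beta), computed via the offset form (11.15)."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    offset = math.log(exposure)          # enters with coefficient fixed to 1
    return math.exp(offset + linear_predictor)

# Hypothetical coefficients: x = (1, x1) with x1 = 1 for smokers.
beta = [math.log(0.10), math.log(1.25)]

short_cover = expected_count([1, 1], beta, exposure=0.25)   # smoker, 3 months
full_cover = expected_count([1, 1], beta, exposure=1.0)     # smoker, 12 months

print(round(short_cover, 5), round(full_cover, 5))   # 0.03125 0.125
```

Both policyholders share the same rate \(\lambda_i = 0.125\); only the expected count differs, in proportion to the exposure.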


11.3 Categorical Variables and Multiplicative Tariff


In this section you will learn:

  • The multiplicative tariff model when the rating factors are categorical.
  • How to construct a Poisson regression model based on the multiplicative tariff structure.

11.3.1 Rating Factors and Tariff

In practice most rating factors in insurance are categorical variables, meaning that they take one of a predetermined number of possible values. Examples of categorical variables include sex, type of car, the driver’s region of residence and occupation. Continuous variables, such as age or auto mileage, can also be grouped into bands and treated as categorical variables. Thus we can imagine that, with a small number of rating factors, there will be many policyholders falling into the same risk class, charged the same premium. For the remainder of this chapter we assume that all rating factors are categorical variables.

To illustrate how categorical variables are used in the pricing process, we consider a hypothetical auto insurance with only two rating factors:

  • Type of vehicle: Type A (personally owned) and B (owned by corporations). We use index \(j=1\) and \(2\) to respectively represent each level of this rating factor.
  • Age band of the driver: Young (age \(<\) 25), middle (25 \(\le\) age \(<\) 60) and old age (age \(\ge\) 60). We use index \(k=1, 2\) and \(3\), respectively, for this rating factor.

From this classification rule, we may create an organized table or list, such as the one shown in Table 11.2, collected from all policyholders. Clearly there are \(2 \times 3=6\) different risk classes in total. Each row of the table shows a combination of different risk characteristics of individual policyholders. Our goal is to compute six different premiums, one for each of these combinations. Once the premium for each row has been determined using the given exposure and claim counts, the insurer can replace the last two columns in Table 11.2 with a single column containing the computed premiums. This new table then can serve as a manual to determine the premium for a new policyholder given rating factors during the underwriting process. In non-life insurance, a table (or a set of tables) or list that contains each set of rating factors and the associated premium, along with other risk information, is referred to as a tariff. Each unique combination of the rating factors in a tariff is called a tariff cell; thus, in Table 11.2 the number of tariff cells is six, the same as the number of risk classes.

Table 11.2. Loss Record of the Illustrative Auto Insurer

\[ {\small \begin{matrix} \begin{array}{ccrr} \hline \text{Rating} &\text{factors} & \text{Exposure} & \text{Claim count} \\ \text{Type }(j) & \text{Age }(k) & \text{in year} & \text{observed}\\ \hline \hline 1 & 1 & 89.1 & 9\\ 1 & 2 & 208.5& 8\\ 1 & 3 & 155.2 & 6 \\ 2 & 1 & 19.3 & 1 \\ 2 & 2 & 360.4 & 13 \\ 2 & 3 & 276.7 & 6 \\ \hline \end{array} \end{matrix} } \]

Let us now look at the loss information in Table 11.2 more closely. The exposure in each row represents the sum of the lengths of insurance coverage, or in-force times (the time during which a policy is active and the insurer is bound by the contractual obligation), in years, of all the policyholders in that tariff cell. Similarly the claim count in each row is the number of claims in that cell. Naturally the exposures and claim counts vary due to the different number of drivers across the cells, as well as different in-force time periods among the drivers within each cell.

In light of the Poisson regression framework, we denote the exposure and claim count of cell \((j,k)\) as \(m_{jk}\) and \(y_{jk}\), respectively, and define the claim count per unit exposure as \[ z_{jk}= \frac{y_{jk}}{ m_{jk}}, \qquad j=1,2;\, k=1, 2,3. \] For example, \(z_{12}=8/208.5=0.03837\), meaning that a policyholder in tariff cell (1,2) would have 0.03837 accidents on average if insured for a full year. The set of \(z_{jk}\) values then corresponds to the rate parameter in the Poisson distribution (11.13) as they are the event occurrence rates per unit exposure. That is, we have \(z_{jk}=\hat{\lambda}_{jk}\) where \({\lambda}_{jk}\) is the Poisson rate parameter. Producing \(z_{jk}\) values however does not do much beyond comparing the average loss frequencies across risk classes. To fully exploit the dataset, we will construct a pricing model from Table 11.2 using Poisson regression in the remainder of the chapter.
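The cell-level rates can be computed directly from Table 11.2, as in the following Python sketch. The dictionary layout keyed by \((j,k)\) is our own choice for illustration.

```python
# Exposure (in years) and observed claim counts from Table 11.2,
# keyed by tariff cell (j, k) = (vehicle type, age band).
exposure = {(1, 1): 89.1, (1, 2): 208.5, (1, 3): 155.2,
            (2, 1): 19.3, (2, 2): 360.4, (2, 3): 276.7}
claims = {(1, 1): 9, (1, 2): 8, (1, 3): 6,
          (2, 1): 1, (2, 2): 13, (2, 3): 6}

# Claim count per unit exposure, z_jk = y_jk / m_jk.
z = {cell: claims[cell] / exposure[cell] for cell in exposure}

for cell in sorted(z):
    print(cell, round(z[cell], 5))
```

For cell (1,2) this gives 0.03837, in agreement with the value quoted above.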

We comment that actual loss records used by insurers typically include many more risk factors, in which case the number of cells grows exponentially. The tariff would then consist of a set of tables, instead of one, separated by some of the basic rating factors, such as sex or territory.

11.3.2 Multiplicative Tariff Model

In this subsection, we introduce the multiplicative tariff model, a popular pricing structure, in which the rate is a product of parameters associated with the rating factors, that can be naturally used within the Poisson regression framework. The developments here are based on Table 11.2. Recall that the loss count of a policyholder is described by a Poisson regression model with rate \(\lambda\) and the exposure \(m\), so that the expected loss count becomes \(m\lambda\). As \(m\) is a known constant, we are essentially concerned with modelling \(\lambda\), so that it responds to the change in rating factors. Among other possible functional forms (algebraic relationships between a dependent variable and explanatory variables), we commonly choose the multiplicative relation to model the Poisson rate \(\lambda_{jk}\) for cell (\(j,k\)): \[\begin{equation} \lambda_{jk}= f_0 \times f_{1j} \times f_{2k}, \qquad j=1,2;\, k=1, 2,3. \tag{11.16} \end{equation}\]

Here \(\{ f_{1j}, j=1,2\}\) are the parameters associated with the two levels of the first rating factor, car type, and \(\{ f_{2k}, k=1,2,3\}\) are those associated with the three levels of the second rating factor, age band. For instance, the Poisson rate for a mid-aged policyholder with a Type B vehicle is given by \(\lambda_{22}=f_0 \times f_{12} \times f_{22}\). The first term \(f_0\) is a base value to be discussed shortly. Thus these six parameters are understood as numerical representations of the levels within each rating factor, and are to be estimated from the dataset.

The multiplicative form (11.16) is easy to understand and use, because it clearly shows how the expected loss count (per unit exposure) changes as each rating factor varies. For example, if \(f_{11}=1\) and \(f_{12}=1.2\), then the expected loss count of a policyholder with a vehicle of type B would be 20\(\%\) larger than type A, when the other factors are the same. In non-life insurance, the parameters \(f_{1j}\) and \(f_{2k}\) are known as relativities, as they determine how much expected loss should change relative to the base value \(f_0\). The idea of relativity is quite convenient in practice, as we can decide the premium for a policyholder by simply multiplying a series of corresponding relativities to the base value.

Dropping an existing rating factor or adding a new one is also transparent with this multiplicative structure. In addition, the insurer may adjust the overall premium for all policyholders by controlling the base value \(f_0\) without changing individual relativities. However, by adopting the multiplicative form, we implicitly assume that there is no serious interaction among the risk factors.

When the multiplicative form is used we need to address an identification issue. That is, for any \(c>0\), we can write \[ \lambda_{jk}= f_0 \times \frac{f_{1j}}{c} \times c\,f_{2k}. \] By comparing with (11.16), we see that the identical rate parameter \(\lambda_{jk}\) can be obtained for very different individual relativities. This over-parametrization, meaning that many different sets of parameters arrive at an identical model, calls for some restriction on \(f_{1j}\) and \(f_{2k}\). The usual remedy is to set one relativity in each rating factor equal to one. In theory the level can be chosen arbitrarily, but the standard practice is to set the relativity of the most common class (the base class) equal to one. We will assume type A vehicles and young drivers to be the most common classes, that is, \(f_{11} = 1\) and \(f_{21} = 1\). This way all other relativities are uniquely determined. The tariff cell \((j,k)=(1,1)\) is then called the base tariff cell, where the rate simply becomes \(\lambda_{11}=f_0\), corresponding to the base value according to (11.16). Thus the base value \(f_0\) is generally interpreted as the Poisson rate of the base tariff cell.
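The multiplicative form (11.16) and the identification issue can be checked numerically. The following is a minimal Python sketch (the chapter's own scripts are in R); it plugs in the relativities reported later in Example 11.1, and the rescaling constant `c = 2.5` and all variable names are illustrative assumptions.

```python
# Relativities reported in Example 11.1; base levels f_11 and f_21 are fixed at 1.
f0 = 0.0967                          # base value: rate of the base cell (1,1)
f1 = {1: 1.0, 2: 0.7405}             # car type: A (base), B
f2 = {1: 1.0, 2: 0.4567, 3: 0.3445}  # age band: 1 (base), 2, 3

def rate(j, k, f0, f1, f2):
    """Poisson rate of tariff cell (j, k) under the multiplicative form (11.16)."""
    return f0 * f1[j] * f2[k]

rates = {(j, k): rate(j, k, f0, f1, f2) for j in f1 for k in f2}
for cell, lam in sorted(rates.items()):
    print(cell, round(lam, 4))

# Identification issue: dividing every f_1j by c and multiplying every f_2k by c
# changes the individual relativities but leaves each cell rate unchanged.
c = 2.5  # arbitrary positive constant, chosen only for illustration
f1_c = {j: v / c for j, v in f1.items()}
f2_c = {k: v * c for k, v in f2.items()}
rates_c = {(j, k): rate(j, k, f0, f1_c, f2_c) for j in f1 for k in f2}
same = all(abs(rates[cell] - rates_c[cell]) < 1e-12 for cell in rates)
print("rates unchanged after rescaling:", same)
```

Fixing \(f_{11}=f_{21}=1\) rules out such rescalings, which is why the base cell rate printed for \((1,1)\) coincides with \(f_0\).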

Again, (11.16) is log-transformed and rewritten as \[\begin{equation} \log \lambda_{jk}= \log f_0 + \log f_{1j} + \log f_{2k}, \tag{11.17} \end{equation}\] as it is easier to estimate, similar to (11.12). In this log linear form the log relativities of the base level in each rating factor are equal to zero, i.e., \(\log f_{11}=\log f_{21}=0\), which leads to the following alternative, more explicit expression for (11.17): \[\begin{equation} \small{ \log \lambda_{jk}=\begin{cases} \log f_0 + \quad 0 \quad \,\,+ \quad 0 \quad \,\,& \text{for a policy in cell }(1,1), \\ \log f_0+ \quad 0 \quad \,\,+\log f_{22}& \text{for a policy in cell }(1,2), \\ \log f_0+ \quad 0 \quad \,\,+\log f_{23}& \text{for a policy in cell }(1,3), \\ \log f_0+\log f_{12}+ \quad 0 \quad \,\,& \text{for a policy in cell }(2,1), \\ \log f_0+\log f_{12}+\log f_{22}& \text{for a policy in cell }(2,2), \\ \log f_0+\log f_{12}+\log f_{23}& \text{for a policy in cell }(2,3). \\ \end{cases} } \tag{11.18} \end{equation}\] This shows that the Poisson rate parameter \(\lambda\) varies across different tariff cells, with the same log linear form used in a Poisson regression framework. In fact the reader may see that (11.18) is an extended version of the earlier expression (11.6) with multiple risk factors and that the log relativities now play the role of \(\beta_i\) parameters. Therefore all the relativities can be readily estimated by fitting a Poisson regression with a suitably chosen set of indicator variables.

11.3.3 Poisson Regression for Multiplicative Tariff

Indicator Variables for Tariff Cells

We now explain how the relativities can be incorporated into Poisson regression. As seen earlier in this chapter, we use indicator variables to deal with categorical variables. For our illustrative auto insurer, therefore, we define an indicator variable for the first rating factor as \[ x_1= \begin{cases} 1 & \text{ for vehicle type B}, \\ 0 & \text{ otherwise}. \end{cases} \] For the second rating factor, we employ two indicator variables for the age band, that is, \[ x_2= \begin{cases} 1 & \text{for age band 2}, \\ 0 & \text{otherwise}, \end{cases} \] and \[ x_3= \begin{cases} 1 & \text{for age band 3}, \\ 0 & \text{otherwise}. \end{cases} \] The triple \((x_1, x_2, x_3)\) then uniquely determines each risk class. By observing that the indicator variables associated with Type A and Age band 1 are omitted, we see that tariff cell \((j,k)=(1,1)\) plays the role of the base cell. We emphasize that our choice of the three indicator variables above has been carefully made so that it is consistent with the choice of the base levels in the multiplicative tariff model in the previous subsection (i.e., \(f_{11}=1\) and \(f_{21}=1\)).

With the proposed indicator variables we can rewrite the log rate (11.17) as \[\begin{equation} \log \lambda= \log f_0+ \log f_{12} \times x_1 + \log f_{22} \times x_2 +\log f_{23} \times x_3, \tag{11.19} \end{equation}\] which is identical to (11.18) when each triple value is actually applied. For example, we can verify that the base tariff cell \((j,k)=(1,1)\) corresponds to \((x_1, x_2,x_3)=(0, 0, 0)\), and in turn produces \(\log \lambda=\log f_0\) or \(\lambda= f_0\) in (11.19) as required.
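The equivalence between (11.18) and (11.19) can be verified cell by cell. Below is a small Python sketch (not the chapter's R code) that encodes each tariff cell as \((x_1,x_2,x_3)\) and compares the two expressions; the log relativities are the coefficient estimates reported in Example 11.1, and the function names are hypothetical.

```python
import math

# Log relativities (log f_0, log f_12, log f_22, log f_23); the numbers are the
# coefficient estimates reported in Example 11.1.
log_f0, log_f12, log_f22, log_f23 = -2.3359, -0.3004, -0.7837, -1.0655

def indicators(j, k):
    """Map tariff cell (j, k) to (x1, x2, x3); cell (1, 1) is the base cell."""
    return (int(j == 2), int(k == 2), int(k == 3))

def log_rate_regression(j, k):
    """Right-hand side of (11.19), using the indicator variables."""
    x1, x2, x3 = indicators(j, k)
    return log_f0 + log_f12 * x1 + log_f22 * x2 + log_f23 * x3

def log_rate_cases(j, k):
    """Case-by-case form (11.18), with zero log relativity at the base levels."""
    car = {1: 0.0, 2: log_f12}
    age = {1: 0.0, 2: log_f22, 3: log_f23}
    return log_f0 + car[j] + age[k]

cells = [(j, k) for j in (1, 2) for k in (1, 2, 3)]
codes = {cell: indicators(*cell) for cell in cells}
agree = all(
    math.isclose(log_rate_regression(*cell), log_rate_cases(*cell)) for cell in cells
)
for cell in cells:
    print(cell, codes[cell], round(log_rate_regression(*cell), 4))
print("(11.18) and (11.19) agree on every cell:", agree)
```

Each of the six cells receives a distinct triple, so the three indicators are enough to identify the risk class.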

Poisson Regression for the Tariff Model

Under this specification, let us consider \(n\) policyholders in the portfolio with the \(i\)th policyholder’s risk characteristic given by a vector of explanatory variables \(\mathbf{ x}_i=(1, x_{i1}, x_{i2},x_{i3})^{\prime}\), for \(i=1, \ldots, n\). We then recognize (11.19) as \[ \log \lambda_{i}= \beta_0+ \beta_1 \, x_{i1} + \beta_{2} \, x_{i2} +\beta_3 \, x_{i3}=\mathbf{ x}^{\prime}_i \boldsymbol \beta, \qquad i=1, \ldots, n, \] where \(\beta_0, \ldots, \beta_3\) can be mapped to the corresponding log relativities in (11.19). This is exactly the same setup as in (11.15) except for the exposure component. Therefore, by incorporating the exposure in each risk class, a Poisson regression model for this multiplicative tariff model finally becomes \[ \begin{array}{ll} \log \mu_i &=\log \lambda_{i}+\log m_i= \log m_i+ \beta_0+ \beta_1 \, x_{i1} + \beta_{2} \, x_{i2} +\beta_3 \, x_{i3}\\ &=\log m_i+\mathbf{ x}^{\prime}_i \boldsymbol \beta, \end{array} \] for \(i=1, \ldots, n\). As a result, the relativities are given by \[\begin{equation} {f}_0=e^{\beta_0}, \quad {f}_{12}=e^{\beta_1}, \quad {f}_{22}=e^{\beta_2}, \quad \text{and}\quad {f}_{23}=e^{\beta_3}, \tag{11.20} \end{equation}\] with \(f_{11}=1\) and \(f_{21}=1\) from the original construction. For the actual dataset, \(\beta_i\), \(i=0,1, 2, 3\), is replaced with the mle \(b_i\) using the method in the technical supplement at the end of this chapter (Section 11.A).

11.3.4 Numerical Examples

We present two numerical examples of Poisson regression. In the first example we construct a Poisson regression model from Table 11.2, which is a dataset of a hypothetical auto insurer. The second example uses an actual industry dataset with more risk factors. As our purpose is to show how a Poisson regression model can be used under a given classification rule, we are not concerned with the quality of the Poisson model fit in this chapter.

Example 11.1: Poisson regression for the illustrative auto insurer. In the last few subsections we considered a dataset of a hypothetical auto insurer with two risk factors, as given in Table 11.2. We now apply a Poisson regression model to this dataset. As before, we set \((j,k)=(1,1)\) as the base tariff cell, so that \(f_{11}=f_{21}=1\). The regression gives the coefficient estimates \((b_0, b_1,b_2,b_3)=(-2.3359, -0.3004, -0.7837, -1.0655)\), which in turn produce the corresponding estimated relativities \[ {f}_0=0.0967, \quad {f}_{12}= 0.7405, \quad {f}_{22}=0.4567 \quad \text{and}\quad {f}_{23}=0.3445, \] from the relation given in (11.20). The R script and the output are as follows.

Show R Code
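As a complement to the R output, the mapping (11.20) from coefficients to relativities can be reproduced by hand. The Python sketch below is only an illustration under stated inputs: it exponentiates the rounded coefficients quoted in Example 11.1 and then evaluates the fitted count \(m\,e^{\mathbf{x}'\mathbf{b}}\) for tariff cell (1,2), using the exposure 208.5 and observed count 8 quoted in Section 11.3.1. Because the coefficients are rounded to four decimals, the last digit of a relativity may differ slightly from the values in the text.

```python
import math

# Coefficient estimates (b0, b1, b2, b3) quoted in Example 11.1 (rounded).
b = [-2.3359, -0.3004, -0.7837, -1.0655]

# Relation (11.20): each relativity is the exponential of its coefficient.
f0, f12, f22, f23 = (math.exp(v) for v in b)
print("f0, f12, f22, f23 =", [round(v, 4) for v in (f0, f12, f22, f23)])

# Fitted claim count for tariff cell (1,2): type A, age band 2.
m_12, y_12 = 208.5, 8   # exposure and observed count quoted in Section 11.3.1
x = (0, 1, 0)           # (x1, x2, x3) for cell (1,2)
log_rate = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
mu_12 = m_12 * math.exp(log_rate)
print("observed count:", y_12, " fitted count:", round(mu_12, 2))
```

The fitted count for this cell is close to, but not equal to, the observed count, since the multiplicative model uses four parameters to describe six cells.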

Example 11.2. Poisson regression for Singapore insurance claims data. This actual dataset is a subset of the data used by Frees and Valdez (2008). The data are from the General Insurance Association of Singapore, an organization consisting of non-life insurers in Singapore. These data contain the number of car accidents for \(n=7,483\) auto insurance policies with several categorical explanatory variables and the exposure for each policy. The explanatory variables include four risk factors: the type of the vehicle insured (either automobile (A) or other (O), denoted by Vtype), the age of the vehicle in years (Vage), gender of the policyholder (Sex) and the age of the policyholder (in years, grouped into seven categories, denoted Age).

Based on the data description, there are several things to consider before constructing a model. First, there are 3,842 policies with vehicle type A (automobile) and 3,641 policies with other vehicle types. However, age and sex information is available for the policies of vehicle type A only; the drivers of all other types of vehicles are recorded to be aged 21 or less with sex unspecified, except for one policy, indicating that no driver information has been collected for non-automobile vehicles. Second, type A vehicles are all classified as private vehicles and all the other types are not.

When we include these risk factors, we assume all unspecified sex to be male. As the age information is only applicable to type A vehicles, we set the model accordingly. That is, we apply the age variable only to vehicles of type A. Also we use five vehicle age bands, simplifying the original seven bands, by combining vehicle ages 0, 1 and 2; the combined band is marked as level 2 in the data file. Thus our Poisson model has the following explicit form: \[\begin{align*} \log \mu_i= \mathbf{ x}^{\prime}_i\beta+&\log m_i=\beta_0+\beta_1 I(Sex_i=M)+ \sum_{t=2}^6 \beta_t\, I(Vage_i=t) \\ &+ \sum_{t=7}^{13} \beta_t \,I(Vtype_i=A)\times I(Age_i=t-7)+\log m_i. \end{align*}\]

The fitting result is given in Table 11.3, for which we have several comments.

  • The claim frequency is higher for males by 17.3%, when other rating factors are held fixed. However, this may have been affected by the fact that all unspecified sex has been assigned to male.
  • Regarding the vehicle age, the claim frequency gradually decreases as the vehicle age increases, when other rating factors are held fixed. The level starts from 2 for this variable but, again, the numbering is nominal and does not affect the numerical result.
  • The policyholder age variable only applies to type A (automobile) vehicles, and there are no policies in the first age band. We may speculate that drivers aged 21 or less drive their parents’ cars rather than having their own because of high insurance premiums or related regulations. The missing relativity may be estimated by some extrapolation or the professional judgement of the actuary. The claim frequency is the lowest for age bands 3 and 4, but gets substantially higher for older age bands, a reasonable pattern seen in many auto insurance loss datasets.

We also note that there is no base level in the policyholder age variable, in the sense that no relativity is equal to 1. This is because the variable is only applicable to vehicle type A. This does not cause a problem numerically, but one may set the base relativity as follows if necessary for other purposes. Since there is no policy in age band 0, we consider band 1 as the base case. Specifically, we treat its relativity as a product of 0.918 and 1, where the former is the common relativity (that is, the common premium reduction) applied to all policies with vehicle type A and the latter is the base value for age band 1. Then the relativity of age band 2 can be seen as \(0.917=0.918 \times 0.999\), where 0.999 is understood as the relativity for age band 2. The remaining age bands can be treated similarly.
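The rebasing described above amounts to dividing every policyholder age relativity by the band 1 value. A short Python sketch, using the relativities from Table 11.3 (variable names are illustrative), carries this out for all six bands:

```python
# Policyholder age relativities from Table 11.3 (bands 1 to 6, vehicle type A only).
age_rel = {1: 0.918, 2: 0.917, 3: 0.758, 4: 0.632, 5: 1.102, 6: 1.179}

# Treat band 1 as the base: 0.918 becomes the common factor for all type A
# policies, and each band's relativity is expressed relative to band 1.
common = age_rel[1]
rebased = {band: round(rel / common, 3) for band, rel in age_rel.items()}

for band in sorted(rebased):
    print(band, age_rel[band], "=", common, "x", rebased[band])
```

Multiplying the common factor by a rebased relativity recovers the tabulated value up to rounding, so the tariff itself is unchanged by this re-expression.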

Table 11.3. Singapore Insurance Claims Data

\[ {\small \begin{matrix} \begin{array}{clcc} \hline \text{Rating factor} & \text{Level} & \text{Relativity in the tariff} & \text{Note}\\ \hline\hline \text{Base value} & & 0.167 & f_0\\ \hline \text{Sex} & 1 (F) & 1.000 & \text{Base level}\\ & 2 (M) & 1.173 &\\\hline \text{Vehicle age} & 2 (0-2\text{ yrs}) & 1.000 & \text{Base level}\\ & 3 (3-5\text{ yrs}) & 0.843 \\ & 4 (6-10\text{ yrs}) & 0.553 \\ & 5 (11-15\text{ yrs}) & 0.269 \\ & 6 (16+\text{ yrs}) & 0.189 &\\\hline \text{Policyholder age} & 0 (0-21) & \text{N/A} & \text{No policy} \\ \text{(Only applicable to} & 1 (22-25) & 0.918 \\ \text{vehicle type A)} & 2 (26-35) & 0.917 \\ & 3 (36-45) & 0.758 \\ & 4 (46-55) & 0.632 \\ & 5 (56-65) & 1.102\\ & 6 (65+) & 1.179\\ \hline \hline \end{array} \end{matrix} } \]

Let us try several examples based on Table 11.3. Consider a male policyholder aged 40 who owns a 7-year-old vehicle of type A. The expected claim frequency for this policyholder is then given by \[ \lambda=0.167 \times 1.173 \times 0.553 \times 0.758 = 0.082. \] As another example, consider a female policyholder aged 60 who owns a 3-year-old vehicle of type O. The expected claim frequency for this policyholder is \[ \lambda=0.167 \times 1 \times 0.843 = 0.141. \] Note that for this policy the age band variable is not used as the vehicle type is not A. The R script is given as follows.

Show R Code
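The two worked examples can also be reproduced with a small lookup over Table 11.3. The Python sketch below is an illustration rather than the chapter's script: the function and table names are hypothetical, and the handling of a policyholder aged exactly 65 (assigned here to band 5) is an assumption, since the table lists both "56-65" and "65+".

```python
# Relativities from Table 11.3.
BASE = 0.167
SEX = {"F": 1.000, "M": 1.173}
# (upper bound of the band in years, relativity)
VEHICLE_AGE = [(2, 1.000), (5, 0.843), (10, 0.553), (15, 0.269), (float("inf"), 0.189)]
DRIVER_AGE = [(25, 0.918), (35, 0.917), (45, 0.758), (55, 0.632),
              (65, 1.102), (float("inf"), 1.179)]  # age 65 -> band 5 (assumption)

def lookup(bands, value):
    """Return the relativity of the first band whose upper bound covers value."""
    for upper, rel in bands:
        if value <= upper:
            return rel

def claim_frequency(sex, vehicle_age, vehicle_type, driver_age):
    """Expected claim frequency per unit exposure under the fitted tariff."""
    lam = BASE * SEX[sex] * lookup(VEHICLE_AGE, vehicle_age)
    if vehicle_type == "A":
        # The policyholder age factor applies to type A vehicles only.
        if driver_age <= 21:
            raise ValueError("no relativity is available for age band 0")
        lam *= lookup(DRIVER_AGE, driver_age)
    return lam

print(round(claim_frequency("M", 7, "A", 40), 3))  # male, 40, 7-year-old type A
print(round(claim_frequency("F", 3, "O", 60), 3))  # female, 60, 3-year-old type O
```

Raising an error for age band 0 mirrors the "N/A" entry in the table; in practice that relativity would be supplied by extrapolation or actuarial judgement, as noted above.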

As a concluding remark, we comment that Poisson regression is not the only possible count regression model. Actually, the Poisson distribution can be restrictive in the sense that it has a single parameter and its mean and variance are always equal. There are other count regression models that allow a more flexible distributional structure, such as negative binomial regressions and zero-inflated (ZI) regressions; details of these alternative regressions can be found in the texts listed in Section 11.6.

11.4 Risk Classification vs Discrimination

We have so far developed a quantitative model to deal with risk classification. There are, however, important qualitative aspects of risk classification as well, which have moral, regulatory and legal implications. We briefly survey various issues related to risk classification in this section; see Frees and Huang (2023) for a more comprehensive treatment.

We start by acknowledging that risk classification, by definition, differentiates or discriminates among insureds or potential buyers based on a wide variety of attributes. That is, insurers divide individuals into subgroups and charge different premiums on the ground that each subgroup, when suitably formed, exhibits a different risk profile and thus produces insurance events (such as medical claims or car accidents) that are different in number and size. In this sense discrimination, of which the meaning is simply treating subgroups differently, is an essential element in insurance business.

Insurers can discriminate among customers at various stages. For example, they may decide not to insure potential customers at the marketing or underwriting stage by excluding particular subgroups intentionally, an issue known as redlining. Insurers may also refuse to renew existing customers, or restrict the insurance coverage. Another form of discrimination is charging different prices to different subgroups. This price discrimination is a standard practice in insurance and is not an issue provided that the price differences are based on the underlying risk level of each subgroup. However, non-risk price discrimination is more problematic in that the price differs for the identical product and coverage. Such non-risk rating factors tend to be prohibited in many jurisdictions.

11.4.1 Economic Commodity versus Social Good

While economic arguments view insurance as an economic commodity and thus support insurers’ risk-based discrimination, others such as consumer advocates perceive insurance as a social good that should benefit the general public, and thus argue that discrimination must be avoided, especially for disadvantaged groups. These two opposing views can be understood as two extremes of a continuous spectrum of fairness when implemented in the real world. To give an idea, let us consider the following examples:

  • A stock insurance company is located at one end of the spectrum. Here the company issues individual contracts, and insurance is viewed as a collection of separate agreements rather than a collective concept. Actuarially fair pricing is then determined by the expected value of the uncertain event, reflecting the risk transferred from the insured to the insurer. Fairness in this case is defined as each customer paying for their own risk only, supported by economic theory.

  • Government-sponsored social insurance is at the other end of the spectrum. Here contracts typically involve subsidies between different subgroups. Governments frequently employ such social policies to redistribute risk or income among individuals, though adherence to the principle of actuarial fairness can vary significantly depending on the target level of the redistribution.

  • Group insurance lies in the middle of the spectrum. For example, consider a disability income contract issued to the employees of an employer. In this case premium differentials by risk factors are not a major issue if the employer pays all or a major portion of the premiums.

Clearly the issue of discrimination and fairness in insurance is a multi-layered problem; it involves not only technical modeling but is also affected by the social consensus depending on the goal and characteristics of the insurance program.

11.4.2 Information Asymmetry

Discrimination in insurance may also arise from information asymmetry, which means that insurers and the current or potential customers have unequal knowledge of, or access to, relevant information on the underlying risk.

Adverse selection, described in the introduction of this chapter, is an example of information asymmetry. For an insurer, adverse selection can occur when customers know more about their own risks than the insurer does, or when competitors in the market have better knowledge about the risk of the customers, as illustrated at the beginning of this chapter. Generally speaking, adverse selection can be reduced as more information on the customers is made available.

Another type of information asymmetry is moral hazard. In a typical case of moral hazard, policyholders become more risk-seeking or less cautious about a risk once it has been insured. In other words, the insureds have the incentive to take on more risk because of the safety net provided by the insurance, leading to raised costs for the insurer. One remedy for moral hazard is to offer incentives to policyholders so that they act more responsibly or so that the exposure to the risk itself is reduced. To illustrate, consider a customer who bought a homeowner’s insurance contract and thus becomes less careful about fire and theft. To mitigate such moral hazard, the insurer may offer some premium discount on the condition that fire and security alarms be installed in the house. Moral hazard can also arise on the insurer’s side. For example, insurers may collect protected or sensitive variables and create pricing models that discriminate unfairly against customers. Moral hazard on the insurer’s side is generally managed and prevented through insurance regulations and laws.

Another recently emerging area of information asymmetry is the knowledge imbalance created by big data models used by insurers. Consumer advocates point out that the information gap between customers and insurers will widen as more big data become available only to insurers. As a result, insurers can cherry-pick potentially more profitable customers, while customers without big data tools will be unable to access such information and will thus be at a disadvantage. This implies that free market competition between insurers may be insufficient to protect policyholders if the insurers collectively monopolize the exclusive knowledge on the customers.

11.4.3 Sensitive Variables and Regulation

Some attributes or variables used in risk classification may be perceived to be unfair or sensitive. In the literature the following list of criteria is often considered in deciding whether an attribute is acceptable or fair as a rating variable.

  • Control: An attribute that can be controlled by an insured is generally considered to be an acceptable variable to be used for rating purposes. Smoking status is an example of such an attribute. In contrast, race and gender at birth cannot be controlled by insureds.
  • Mutability: Some attributes change over time, but they may be used as rating variables if they are deemed fair to everyone. For example, aging applies fairly to us all over the course of a lifetime.
  • Statistical Discrimination: If a variable does not have predictive value of an underlying risk, it is generally viewed as unacceptable.
  • Causality: A variable known to cause an insured event can be used for rating purposes, but establishing a causal relationship may not be always easy because it requires strong evidence beyond a simple association.
  • Limiting or Reversing the Effects of Past Prejudice: If an attribute is related to negative stereotypes or otherwise disadvantaged groups, it may not be used for rating purposes.
  • Inhibiting Socially Valuable Behavior: If an insurer’s use of an attribute discourages socially desirable behaviors, it may not be used for rating purposes. For example, U.S. laws prohibit insurers from discriminating on the basis of intimate partner violence because such discrimination could dissuade victims of violence from seeking needed medical care or police intervention.

In light of these complications, many jurisdictions have so-called rate regulations which prohibit insurers from engaging in problematic pricing practices. For example, in the US, the model rating law of the National Association of Insurance Commissioners (NAIC, 2010) says that “rates shall not be excessive, inadequate or unfairly discriminatory.” It further notes that “unfair discrimination exists if, after allowing for practical limitations, price differentials fail to reflect equitably the differences in expected losses and expenses.” Different jurisdictions maintain different standards on the strictness of rate regulations. In countries where rate regulations are heavily enforced the regulators prescribe the actual rates, whereas in other countries the regulators may only require approval of rates.

11.4.4 Big Data Models and Proxy Discrimination

Big data models such as deep learning, machine learning and AI algorithms are now ubiquitous in virtually every area of our society. These models are known to detect new patterns and connections that were previously unknown using advanced algorithms and various data sources.

From the viewpoint of risk classification or rating discrimination in insurance, it is argued that big data models would bring significant changes in privacy and proxy discrimination. The issue of privacy protection is well known. When data from various sources, e.g., location information from GPS, wearable devices, social networks and credit cards, are combined, big data models may reveal the identity of individuals or their sensitive information. Given that these collected data consist of seemingly innocuous or voluntarily provided variables, the potential risk of privacy breach and sensitive variable fabrication is becoming a reality.

Proxy discrimination arises when insurers discriminate based on a facially neutral attribute that is highly correlated with protected or sensitive information. By employing these proxy variables in pricing models, insurers could get the same quantitative results that would be obtained from using the protected variables directly. This is problematic because insurers are able to effectively use prohibited variables without actually violating the rate regulation. Discovery or synthesis of such proxies can be made through statistical and big data models; proxy discrimination is harder to detect in the latter models as their algorithms tend to be less transparent. Though it is impossible to eliminate proxy discrimination completely, several strategies have been suggested to mitigate it:

  1. Community Rating: If all policyholders pay the same price, proxy discrimination can be eliminated. This kind of rating can be found in social insurance programs, but is rare in general.

  2. Approved Variables: Regulators may specify a set of variables allowed in rating, prohibiting others. For example, in the US individual health insurance market under the Affordable Care Act, insurers are allowed to use only four rating factors: (1) whether a plan covers an individual or family, (2) geographic area, (3) age, and (4) smoking status.

  3. Actuarial Justification: In this strategy regulators specify a set of protected or sensitive variables that should not be used in rating. In addition, outside these variables, only variables that are actuarially justifiable or statistically significant are allowed.

  4. Limited Prohibitions: Alternatively, regulators specify a set of protected variables and no restriction is made for other variables outside this set.

  5. No Restrictions: In this extreme strategy regulators impose no prohibitions on rating variables, which actually is the case for most commercial insurance lines.

In practice the most viable option would be to adopt the third or fourth strategy with suitable modifications, with some disclosure requirement for the pricing model and data source used in the rating process.

11.5 Exercises

11.1. Regarding Table 11.1 answer the following.
(a) Verify the mean values in the table.
(b) Verify the number in equation (11.2).
(c) Produce the fitted Poisson counts for each smoking status in the table.

11.2. In a Poisson regression formulation (11.10), consider using \(\mu_i=\mathrm{E~}{(y_i|\mathbf{ x}_i)}=({\mathbf{ x}^{\prime}_i\beta})^2\), for \(i=1, \ldots, n\), instead of the exponential function. What potential issue would you have?

11.6 Further Resources and Contributors

Further Reading and References

Poisson regression is a special member of a more general regression model class known as the generalized linear model (GLM). The GLM develops a unified regression framework for datasets when the response variables are continuous, binary or discrete. The classical linear regression model with a normally distributed error is also a member of the GLM. There are many standard statistical texts dealing with the GLM, including McCullagh and Nelder (1989). More accessible texts are Dobson and Barnett (2008), Agresti (1996) and Faraway (2016). For actuarial and insurance GLM applications, see Frees (2009) and De Jong and Heller (2008). Also, Ohlsson and Johansson (2010) discuss the GLM in the non-life insurance pricing context with tariff analyses.

In fact there is a notable historical connection between the GLM and an influential actuarial model. In the 1960s the actuarial community developed an auto ratemaking model that produces coherent and consistent rates across subgroups in both additive and multiplicative form. This method, known as the Bailey minimum bias method (Bailey and Simon, 1960 and Bailey, 1963), turned out to be equivalent to the solution of a statistical model known as the GLM, which made its first appearance in the 1970s in Nelder and Wedderburn (1972). This strong connection has helped actuaries use and adopt GLMs in a wide range of actuarial problems; see Frees, Derrig and Meyers (2014) and references therein for a more detailed historical note on this.

Contributors

  • Joseph H. T. Kim, Yonsei University, is the principal author of the initial version of this chapter. Email: for chapter comments and suggested improvements.
  • Chapter reviewers include: Chun Yong Chew, Lina Xu, Jeffrey Zheng.

TS 11.A. Estimating Poisson Regression Models

The principles of maximum likelihood estimation (mle) are introduced in Sections 3.4.2 and 4.4.2, defined in Section 17.2.2, and theoretically developed in Chapter 19. Here we present the mle procedure of Poisson regression so that the reader can see how the explanatory variables are treated in maximizing the likelihood function in the regression setting.

Maximum Likelihood Estimation for Individual Data

In Poisson regression the varying Poisson mean is determined by parameters \(\beta_i\)’s, as shown in (11.15). In this subsection we use the maximum likelihood method to estimate these parameters. Again, we assume that there are \(n\) policyholders and the \(i\)th policyholder is characterized by \(\mathbf{ x}_i=(1, x_{i1}, \ldots, x_{ik})^{\prime}\) with the observed loss count \(y_i\). Then, from (11.14) and (11.15), the log-likelihood function of vector \(\beta=(\beta_0, \dots, \beta_k)\) is given by \[\begin{align} \nonumber \log L(\beta) &= l(\beta)=\sum^n_{i=1} \left( -\mu_i +y_i \, \log \mu_i -\log y_i! \right) \\ & = \sum^n_{i=1} \left( -m_i \exp(\mathbf{ x}^{\prime}_i\beta) +y_i \,(\log m_i+\mathbf{ x}^{\prime}_i\beta) -\log y_i! \right) \tag{11.21} \end{align}\] To obtain the mle of \(\beta=(\beta_0, \ldots, \beta_k)^{\prime}\), we differentiate \(l(\beta)\) with respect to vector \(\beta\) and set it to zero: \[\begin{equation} \frac{\partial}{\partial \beta}l(\boldsymbol \beta)\Bigg{|}_{\beta=\mathbf{b}}=\sum^n_{i=1} \left(y_i -m_i \exp(\mathbf{ x}^{\prime}_i \mathbf{ b}) \right)\mathbf{ x}_i=\mathbf{ 0}. \tag{11.22} \end{equation}\]

Numerically solving this equation system gives the mle of \(\beta\), denoted by \(\mathbf{ b}=(b_0, b_1, \ldots, b_k)^{\prime}\). Note that, as \(\mathbf{ x}_i=(1, x_{i1}, \ldots, x_{ik})^{\prime}\) is a column vector, equation (11.22) is a system of \(k+1\) equations with both sides written as column vectors of size \(k+1\). If we denote \(\hat{\mu}_i=m_i \exp(\mathbf{ x}^{\prime}_i \mathbf{ b})\), we can rewrite (11.22) as \[ \sum^n_{i=1} \left(y_i -\hat{\mu}_i \right)\mathbf{ x}_i=\mathbf{ 0}. \] Since the solution \(\mathbf{ b}\) satisfies this equation, it follows that the first among the array of \(k+1\) equations, corresponding to the first constant element of \(\mathbf{ x}_i\), yields \[ \sum^n_{i=1}\left( y_i -\hat{\mu}_i \right)\times 1={ 0}, \] which implies that we must have \[ n^{-1}\sum_{i=1}^n y_i =\bar{y}=n^{-1}\sum_{i=1}^n \hat{\mu}_i. \] This is an interesting property saying that the average of the individual loss counts, \(\bar{y}\), is the same as the average of the fitted values. That is, the sample mean is preserved under the fitted Poisson regression model.
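To make the numerical solution of (11.22) concrete, the following Python sketch applies Newton-Raphson iterations to a model with an intercept and a single indicator variable. Everything about the data is an assumption made for illustration: the six policies, their exposures and their counts are hypothetical and do not come from the chapter. Newton-Raphson is one common way to solve the score equations; the chapter does not state which algorithm its R code uses.

```python
import math

# Hypothetical individual-level data (not from the chapter):
# indicator x_i1, exposure m_i and claim count y_i for six policies.
x1 = [0, 0, 0, 1, 1, 1]
m = [1.0, 0.5, 1.0, 1.0, 0.8, 0.6]
y = [0, 1, 0, 1, 0, 2]

def fitted_means(b0, b1):
    """mu_i = m_i * exp(b0 + b1 * x_i1)."""
    return [mi * math.exp(b0 + b1 * xi) for mi, xi in zip(m, x1)]

def score(b0, b1):
    """Left-hand side of (11.22): sum over i of (y_i - mu_i) * x_i."""
    mu = fitted_means(b0, b1)
    u0 = sum(yi - mui for yi, mui in zip(y, mu))
    u1 = sum((yi - mui) * xi for yi, mui, xi in zip(y, mu, x1))
    return u0, u1

def fit(tol=1e-12, max_iter=50):
    """Newton-Raphson iterations for the two score equations."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        mu = fitted_means(b0, b1)
        u0, u1 = score(b0, b1)
        # Information matrix sum(mu_i * x_i x_i'); x_i1 is 0/1, so x_i1^2 = x_i1
        # and the matrix is [[s0, s1], [s1, s1]].
        s0 = sum(mu)
        s1 = sum(mui * xi for mui, xi in zip(mu, x1))
        det = s0 * s1 - s1 * s1
        d0 = (s1 * u0 - s1 * u1) / det   # first component of inverse(I) * score
        d1 = (s0 * u1 - s1 * u0) / det   # second component
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

b0, b1 = fit()
mu_hat = fitted_means(b0, b1)
print("b =", (round(b0, 4), round(b1, 4)))
print("mean of y:", sum(y) / len(y))
print("mean of fitted values:", sum(mu_hat) / len(mu_hat))
```

With a single indicator the model is saturated over the two groups, so the estimates can be checked against the closed form: \(e^{b_0}\) equals the total count divided by the total exposure of the group with \(x_{i1}=0\), and \(e^{b_0+b_1}\) equals the same ratio for the group with \(x_{i1}=1\). The two printed means agree, illustrating the mean-preservation property.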

Maximum Likelihood Estimation for Grouped Data

Sometimes the data are not available at the individual policy level. For example, Table 11.2 provides collective loss information for each risk class after grouping individual policies. When this is the case, \(y_i\) and \(m_i\), the quantities needed for the mle calculation in (11.22), are unavailable for each \(i\). However, this does not pose a problem as long as we have the total loss counts and total exposure for each risk class.

To elaborate, let us assume that there are \(K\) different risk classes, and further that, in the \(k\)th risk class, we have \(n_k\) policies with the total exposure \(m_{(k)}\) and the average loss count \(\bar{y}_{(k)}\), for \(k=1, \ldots, K\); the total loss count for the \(k\)th risk class is then \(n_k\, \bar{y}_{(k)}\). We denote the set of indices of the policies belonging to the \(k\)th class by \(C_k\). As all policies in a given risk class share the same risk characteristics, we may denote \(\mathbf{ x}_i=\mathbf{ x}_{(k)}\) for all \(i \in C_k\). With this notation, we can rewrite (11.22) as \[\begin{align} \nonumber \sum^n_{i=1} \left(y_i -m_i \exp(\mathbf{ x}^{\prime}_i \mathbf{ b}) \right)\mathbf{ x}_i &= \sum^K_{k=1}\Big{\{}\sum_{i \in C_k} \left(y_i -m_i \exp(\mathbf{ x}^{\prime}_i \mathbf{ b}) \right)\mathbf{ x}_i \Big{\}} \\ \nonumber & =\sum^K_{k=1}\Big{\{} \sum_{i \in C_k} \left(y_i -m_i \exp(\mathbf{ x}^{\prime}_{(k)} \mathbf{ b}) \right)\mathbf{ x}_{(k)} \Big{\}} \\ \nonumber & =\sum^K_{k=1}\Big{\{} \Big(\sum_{i \in C_k}y_i -\sum_{i \in C_k}m_i \exp(\mathbf{ x}^{\prime}_{(k)} \mathbf{ b}) \Big)\mathbf{ x}_{(k)} \Big{\}} \\ & =\sum^K_{k=1} \Big(n_k\, \bar{y}_{(k)}-m_{(k)} \exp(\mathbf{ x}^{\prime}_{(k)} \mathbf{ b}) \Big)\mathbf{ x}_{(k)} =\mathbf{ 0}. \tag{11.23} \end{align}\] Since \(n_k\, \bar{y}_{(k)}\) in (11.23) represents the total loss count for the \(k\)th risk class and \(m_{(k)}\) is its total exposure, we see that for Poisson regression the mle \(\mathbf{ b}\) is the same whether we use the individual data or the grouped data.
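The equivalence in (11.23) can be checked numerically. The sketch below, which uses hypothetical data and a hypothetical helper `fit_two_param` restricted to an intercept plus a single covariate, fits the model once on individual policies and once on the class totals. It is only an illustration under these assumptions, not a general-purpose fitting routine.

```python
import math

def fit_two_param(x, y, m, iters=30):
    """Newton-Raphson for log(mu) = log(m) + b0 + b1 * x."""
    b0, b1 = math.log(sum(y) / sum(m)), 0.0
    for _ in range(iters):
        mu = [mi * math.exp(b0 + b1 * xi) for xi, mi in zip(x, m)]
        # score vector, as in (11.22)
        s0 = sum(yi - mui for yi, mui in zip(y, mu))
        s1 = sum((yi - mui) * xi for xi, yi, mui in zip(x, y, mu))
        # entries of the 2x2 information matrix [[a, c], [c, d]]
        a = sum(mu)
        c = sum(mui * xi for xi, mui in zip(x, mu))
        d = sum(mui * xi * xi for xi, mui in zip(x, mu))
        det = a * d - c * c
        # Newton step: inverse information matrix times score
        b0 += (d * s0 - c * s1) / det
        b1 += (a * s1 - c * s0) / det
    return b0, b1

# Hypothetical individual data: covariate value, loss count, exposure.
x = [0, 0, 0, 1, 1, 2, 2, 2]
y = [0, 1, 0, 1, 2, 1, 3, 2]
m = [1.0, 0.5, 1.0, 1.0, 0.5, 0.5, 1.0, 1.0]

# Group by risk class: total loss count and total exposure per covariate value.
totals = {}
for xi, yi, mi in zip(x, y, m):
    cnt, expo = totals.get(xi, (0.0, 0.0))
    totals[xi] = (cnt + yi, expo + mi)
x_grp = sorted(totals)
y_grp = [totals[k][0] for k in x_grp]   # n_k * ybar_(k)
m_grp = [totals[k][1] for k in x_grp]   # m_(k)

b_ind = fit_two_param(x, y, m)
b_grp = fit_two_param(x_grp, y_grp, m_grp)
print("individual data:", b_ind)
print("grouped data:   ", b_grp)
```

The same function handles both cases because the score equations depend on the data only through the loss counts and exposures attached to each covariate vector.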

Information matrix. Section 19.1 defines information matrices. Taking second derivatives of (11.21) gives the information matrix for the mle, \[\begin{equation} \mathbf{ I}(\boldsymbol \beta)=-\mathrm{E~}{\left( \frac{\partial^2}{\partial \boldsymbol \beta \partial \boldsymbol \beta^{\prime}}l(\boldsymbol \beta) \right)}=\sum^n_{i=1}m_i \exp(\mathbf{ x}^{\prime}_i \boldsymbol \beta)\mathbf{ x}_i \mathbf{ x}_i^{\prime}=\sum^n_{i=1} {\mu}_i \mathbf{ x}_i \mathbf{ x}_i^{\prime}. \tag{11.24} \end{equation}\] For actual datasets, \({\mu}_i\) in (11.24) is replaced with \(\hat{\mu}_i=m_i \exp(\mathbf{ x}^{\prime}_i \mathbf{ b})\) to estimate the relevant variances and covariances of the mle \(\mathbf{ b}\) or its functions.

For grouped datasets, we have \[ \mathbf{ I}(\boldsymbol \beta)=\sum^K_{k=1} \Big{\{}\sum_{i \in C_k}m_i \exp(\mathbf{ x}^{\prime}_i \boldsymbol \beta)\mathbf{ x}_i \mathbf{ x}_i^{\prime} \Big{\}}=\sum^K_{k=1} m_{(k)} \exp(\mathbf{ x}^{\prime}_{(k)} \boldsymbol \beta)\mathbf{ x}_{(k)} \mathbf{ x}_{(k)}^{\prime}. \]
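To illustrate how the grouped-data information matrix is used, the sketch below evaluates it at the mle for two hypothetical risk classes and inverts it to approximate the covariance matrix of \(\mathbf{ b}\), using the standard large-sample result that this covariance is approximately the inverse of the information matrix. The class totals and variable names are assumptions made for this illustration.

```python
import math

# Hypothetical grouped data: two risk classes with x_(k) = (1, 0) and (1, 1).
x_grp = [(1.0, 0.0), (1.0, 1.0)]
claims = [20.0, 45.0]       # total loss counts n_k * ybar_(k)
exposure = [100.0, 150.0]   # total exposures m_(k)

# With a single binary factor the score equations are solved exactly when
# each class's fitted frequency equals its observed frequency.
b0 = math.log(claims[0] / exposure[0])
b1 = math.log(claims[1] / exposure[1]) - b0
b = (b0, b1)

# Estimated information matrix: sum_k mu_hat_(k) x_(k) x_(k)'
info = [[0.0, 0.0], [0.0, 0.0]]
for xk, mk in zip(x_grp, exposure):
    mu_hat = mk * math.exp(sum(xj * bj for xj, bj in zip(xk, b)))
    for r in range(2):
        for c in range(2):
            info[r][c] += mu_hat * xk[r] * xk[c]

# Invert the 2x2 matrix to approximate the covariance matrix of b.
det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
cov = [[info[1][1] / det, -info[0][1] / det],
       [-info[1][0] / det, info[0][0] / det]]
se = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]
print("b =", b)
print("standard errors =", se)
```

In this two-class case the inverse can also be worked out by hand: the approximate variance of \(b_0\) is the reciprocal of the first class's total loss count, and that of \(b_1\) is the sum of the reciprocals of the two class totals, so classes with few observed losses yield imprecise relativities.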

TS 11.B. Selecting Rating Factors

A complete discussion of rating factor selection is beyond the scope of this book. In addition to technical analyses, you have to think carefully about the type of business (personal, commercial) as well as the regulatory landscape. Nonetheless, a broad overview of some key concerns may serve to ground the reader as one thinks about the pricing of insurance contracts.

Statistical Criteria

From an analyst’s perspective, the discussion starts with the statistical significance of a rating factor. If the factor is not statistically significant, then the variable is not even worthy of consideration for inclusion in a rating plan. The statistical significance is judged not only on an in-sample basis but also on how well it fares on an out-of-sample basis, as per our discussion in Chapter 6.

It is common in insurance applications to have many rating factors. Handling multivariate aspects can be difficult with traditional univariate methods. Analysts employ techniques such as generalized linear models as described in Section 11.3.

Rating factors are introduced to create cells that contain similar risks. A rating group should be large enough to measure costs with sufficient accuracy. There is an inherent trade-off between theoretical accuracy and homogeneity.

As an example, most insurers charge the same automobile insurance premiums for drivers between the ages of 30 and 50, not varying the premium by age. Presumably costs do not vary much by age, or cost variances are due to other identifiable factors.

Operational Criteria

From a business perspective, statistical criteria only provide a starting point for discussions of potential inclusion of rating factors. Inclusion of a rating factor must also induce economically meaningful results. From an insured’s perspective, if differentiation by a factor produces little change in a rate then it is not worth including. From an insurer’s perspective, the inclusion of a factor should help segment the marketplace in a way that helps attract the business that they seek.

Rating factors should also be objective, inexpensive to administer, and verifiable. For example, automobile insurance underwriters often talk of “maturity” and “responsibility” as important criteria for youthful drivers. Yet, these are difficult to define objectively and to apply consistently. As another example, in automobile insurance it has long been known that the number of miles (or kilometers) driven is an excellent rating factor. Historically, however, insurers have been reluctant to adopt this factor because it is subject to abuse and difficult to verify (it is far too easy to alter the car’s odometer to change reported mileage). Going forward, modern day drivers and cars are equipped with global positioning devices and other equipment that allow insurers to use distance driven as a rating factor because it can be verified.

Rating Factors from the Perspective of a Consumer

Insurance companies sell insurance products to a variety of consumers; consequently, companies are affected by public perception. On the one hand, free market competition dictates the rating factors that insurers use, as is common in commercial insurance. On the other hand, insurance may be required by law. This is common in personal insurance such as third party automobile liability and homeowners insurance. In these instances, the mandatory or de facto mandatory purchase of insurance may mean that free market competition is insufficient to protect policyholders. Here, the following items affect the social acceptability of using a particular risk characteristic as a rating variable:

  • Affordability - the introduction of some variables may be limited when it results in unaffordably high costs of insurance.
  • Causality - other things being equal, a rating variable is easier to justify if there is a “causal” relationship with losses. A good example is the effects of smoking in life insurance. For many years, this factor was viewed with suspicion by the industry. However, over time, scientific research provided overwhelming evidence that it is an important predictor of mortality.
  • Controllability - A controllable variable is one that is under the control of the insured, e.g., installing burglar alarms. The use of controllable rating variables encourages accident prevention.
  • Privacy concerns - people are reluctant to disclose personal information. In today’s world with increasing emphasis on social media and the availability of personal information, consumer advocates are concerned that the benefits of big data skew heavily in insurers’ favor. They reason that insureds do not have equivalent new tools to compare quality of coverage/policies and performance of insurance companies.

Example: Youthful Drivers. In some cases, a particular risk characteristic may identify a small group of insureds whose risk level is extremely high, and if used as a rating variable, the resulting premium may be unaffordable for that high-risk class. To the extent that this occurs, companies may wish to or be required by regulators to combine classes and introduce subsidies. For example, 16-year-old drivers are generally higher risk than 17-year-old drivers. Some companies have chosen to use the same rates for 16- and 17-year-old drivers to minimize the affordability issues that arise when a family adds a 16-year-old to the auto policy.

Societal Effects of Rating Factors

With public discussions of rating factors, it is also important to think about the societal effects of classification.

For example, does a rating variable encourage “good” behavior? To illustrate, we return to the use of distance driven as a rating factor. Many people advocate for including this variable as a factor. The motivation is that if insurance, like fuel, is priced based on distance driven, this will induce consumers to reduce the amount driven, thereby benefiting society.

One can consider other aspects of the societal effects of classification; see, for example, Niehaus and Harrington (2003):

  • Re-distributive Effects - provide a cross-subsidy, e.g., from high risks to low risks.
  • Classification Costs - money spent by society, including insurers, to classify people appropriately.

  1. For example, if there are 3 risk factors with 2, 3 and 4 levels, respectively, we have \(k=(2-1)+(3-1)+(4-1)=6\).↩︎

  2. The preference for the multiplicative form over others (e.g., the additive one) was already hinted at in (11.4).↩︎

  3. corresponding to VAgecat1.↩︎

  4. We use the matrix derivative here.↩︎