Chapter 8 Appendix: Data Dictionary

This appendix describes the datasets used in this book.

For each dataset, we provide a download button so that you can easily access the data in standard .csv (comma-separated values) format. This allows you to replicate and experiment with the methods developed in the book, as well as sharpen your understanding through exercises.
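As a minimal sketch of working with one of these files, the Python snippet below parses comma-separated text into numeric records and runs a simple consistency check. It assumes the file has a header row with the column names shown in Table 8.1; the inline sample text stands in for a downloaded file, and the helper name `read_numeric_csv` is purely illustrative.

```python
import csv
import io

# Sample text in the layout of Table 8.1 (first rows of the US file).
# In practice this would be the contents of a downloaded .csv file.
SAMPLE_CSV = """Year_start,Year_end,Age,Female,Male,Total
1935,1939,0,4869267,5057569,9926836
1935,1939,1,4802597,4936238,9738835
1935,1939,2,5119574,5244634,10364208
"""


def read_numeric_csv(text):
    """Parse CSV text into a list of dicts with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]


rows = read_numeric_csv(SAMPLE_CSV)

# Sanity check: Total should equal Female + Male, up to rounding.
for row in rows:
    assert abs(row["Female"] + row["Male"] - row["Total"]) < 0.05

print(len(rows), "rows read")
```

To read a file from disk instead, the same `csv.DictReader` can be wrapped around `open(path, newline="")`.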

We provide the source of each dataset. We also recommend, for deeper understanding, that you occasionally refer to these original sources to further develop your appreciation of the data underpinning the analytics developed in this book.

8.1 United States Population Mortality Counts

Source: The Human Mortality Database (HMD).

Description: The Human Mortality Database (HMD) provides mortality experience for populations observed over time, for a wide range of countries. The available data include Exposures by age, sex, and calendar-year period, i.e., how many people of a given age and sex lived in the country’s population during a given period of time, and the corresponding Deaths, i.e., how many of these individuals died.
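Exposures and deaths combine into a crude central death rate. As a sketch, writing \(D_{x,t}\) for the number of deaths and \(E_{x,t}\) for the exposure at age \(x\) in period \(t\) (this notation is assumed here for illustration and is not taken from the HMD files):

\[
m_{x,t} = \frac{D_{x,t}}{E_{x,t}}
\]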

The data are available using this download button:

Table 8.1: US Population Mortality Counts First Five Rows
Year_start Year_end Age Female Male Total
1935 1939 0 4869267 5057569 9926836
1935 1939 1 4802597 4936238 9738835
1935 1939 2 5119574 5244634 10364208
1935 1939 3 5159494 5287402 10446896
1935 1939 4 5189350 5307754 10497104
Table 8.1: US Population Mortality Counts Last Five Rows
Year_start Year_end Age Female Male Total
2010 2014 106 3853.91 415.44 4269.35
2010 2014 107 2031.89 210.41 2242.30
2010 2014 108 1083.62 106.29 1189.91
2010 2014 109 535.32 58.42 593.73
2010 2014 110 576.06 80.03 656.09
Table 8.2: US Population Mortality Counts Summary Statistics
Year_start     Year_end       Age           Female             Male               Total
Min.   :1935   Min.   :1939   Min.   :  0   Min.   :      56   Min.   :      21   Min.   :      80
1st Qu.:1954   1st Qu.:1958   1st Qu.: 27   1st Qu.: 1293833   1st Qu.:  823082   1st Qu.: 2130434
Median :1972   Median :1976   Median : 55   Median : 5263300   Median : 4850955   Median :10124747
Mean   :1972   Mean   :1976   Mean   : 55   Mean   : 4946072   Mean   : 4758380   Mean   : 9704453
3rd Qu.:1991   3rd Qu.:1995   3rd Qu.: 83   3rd Qu.: 8238099   3rd Qu.: 8407376   3rd Qu.:16678945
Max.   :2010   Max.   :2014   Max.   :110   Max.   :11606030   Max.   :11674875   Max.   :23280905

8.2 Synthetic Insurer Data

Description: We provide survival information for a hypothetical insurance company in SyntheticInsurerData.csv. The company has sold whole life insurance policies, i.e., policies that pay a death benefit upon the policyholder’s death, since 1955. Policyholders have to go through an underwriting examination. In addition to each policyholder’s age, sex (0 for female, 1 for male), smoking status (0 for non-smoker, 1 for smoker), and month of sale, the company records the applicant’s body-mass index (BMI) and systolic blood pressure at the time of underwriting. Finally, for policyholders with a claim, i.e., those who have died, the company records the time of death (relative to the month of underwriting). The data are organized in order of sale, so that the oldest entries are at the top and the newest entries are at the bottom.

The data are available using this download button:

Table 8.3: Synthetic Insurer First Five Rows
Month_of_Sale Age Sex Smoking BMI BloodPressure Claim Time_of_death
1 27 0 0 25.8 117 YES 55.63
1 51 1 0 17.6 109 YES 18.53
1 59 1 0 22.5 132 YES 15.88
1 37 1 0 22.9 109 YES 57.40
1 62 0 0 30.9 147 YES 27.83
Table 8.3: Synthetic Insurer Last Five Rows
Month_of_Sale Age Sex Smoking BMI BloodPressure Claim Time_of_death
780 57 1 1 19.30 118 NO NA
780 40 0 0 20.10 117 NO NA
780 27 1 0 20.60 90 NO NA
780 55 1 1 20.10 118 NO NA
780 23 1 0 19.03 82 NO NA
Table 8.4: Synthetic Insurer Summary Statistics
Month_of_Sale   Age          Sex             Smoking
Min.   :  1     Min.   :19   Min.   :0.000   Min.   :0.000
1st Qu.:209     1st Qu.:31   1st Qu.:0.000   1st Qu.:0.000
Median :382     Median :39   Median :1.000   Median :0.000
Mean   :395     Mean   :40   Mean   :0.699   Mean   :0.298
3rd Qu.:591     3rd Qu.:48   3rd Qu.:1.000   3rd Qu.:1.000
Max.   :780     Max.   :65   Max.   :1.000   Max.   :1.000
BMI             BloodPressure   Claim        Time_of_death
Min.   :16.1    Min.   : 57     NO :101399   Min.   : 0
1st Qu.:19.5    1st Qu.:104     YES: 59382   1st Qu.:18
Median :21.7    Median :114                  Median :28
Mean   :22.8    Mean   :115                  Mean   :28
3rd Qu.:25.0    3rd Qu.:125                  3rd Qu.:39
Max.   :69.6    Max.   :208                  Max.   :64
                                             NA’s   :101399
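The coding of the Claim and Time_of_death columns can be illustrated with a short Python sketch. It treats "NA" as "no death recorded", which is consistent with the matching counts of NO claims and NA’s in the summary. The mapping from Month_of_Sale to a calendar year rests on an assumption not stated in the data: that month 1 corresponds to January 1955. The function names are illustrative only.

```python
from typing import Optional

BASE_YEAR = 1955  # Assumption: Month_of_Sale = 1 is January 1955.


def sale_year(month_of_sale: int) -> int:
    """Calendar year of sale under the BASE_YEAR assumption."""
    return BASE_YEAR + (month_of_sale - 1) // 12


def parse_time_of_death(claim: str, time_of_death: str) -> Optional[float]:
    """Return the recorded time of death, or None when there is no claim."""
    if claim == "YES":
        return float(time_of_death)
    return None  # "NA" in the file: no death recorded


# Two records in the layout of Table 8.3 (first and last rows shown there).
records = [
    {"Month_of_Sale": 1, "Claim": "YES", "Time_of_death": "55.63"},
    {"Month_of_Sale": 780, "Claim": "NO", "Time_of_death": "NA"},
]

for rec in records:
    rec["SaleYear"] = sale_year(rec["Month_of_Sale"])
    rec["Death"] = parse_time_of_death(rec["Claim"], rec["Time_of_death"])
    print(rec["Month_of_Sale"], rec["SaleYear"], rec["Death"])
```

Keeping the missing time of death as `None`, rather than a sentinel number, makes it harder to average censored records by accident.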

8.3 Canine Mortality

Source: Canine mortality data are publicly available at Pegram et al. (2021b). See Pegram et al. (2021a) for additional background.

Description: The original dataset contains 29865 observations used to study euthanasia. Our interest is in survival patterns by age, so we removed records where either the date of birth or the date of death is missing or incorrect, as well as 17 records with missing sex information; this leaves 5933 records for analysis.

The data are available using this download button:

Table 8.5: Canine Mortality First Five Rows
animalid Sex Neutered Breed CauseDeath Method.of.death dateBirth dateDeath
3534394 Female Entire Boston Terrier Disorder not diagnsosed Euthanasia 2016-02-02 2016-02-02
11836307 Female Entire Bulldog Behaviour disorder Euthanasia 2016-09-03 2016-09-03
12090746 Male Entire Chinese Shar-Pei Disorder not diagnsosed Euthanasia 2016-03-01 2016-03-01
12092145 Female Neutered Crossbreed Collapsed Euthanasia 2016-01-03 2016-01-03
11739930 Female Entire German Shepherd Dog Disorder not diagnsosed Euthanasia 2015-04-12 2015-04-12
Table 8.5: Canine Mortality Last Five Rows
animalid Sex Neutered Breed CauseDeath Method.of.death dateBirth dateDeath
11718193 Female Entire Border Terrier Heart disease Euthanasia 1998-12-03 2018-01-02
11514456 Male Entire Jack Russell Terrier Musculoskeletal disorder Euthanasia 1996-10-02 2016-10-02
6916114 Female Entire Jack Russell Terrier Disorder not diagnsosed Euthanasia 1996-06-09 2016-07-09
11542454 Male Neutered Crossbreed Disorder not diagnsosed Unassisted 1997-01-09 2017-02-12
3963609 Male Neutered Jack Russell Terrier Heart disease Euthanasia 1996-01-05 2017-01-09
Table 8.6: Canine Mortality Summary Statistics
animalid           Sex           Neutered        Breed
Min.   :  881900   Female:2841   Entire  :2529   Crossbreed                :1259
1st Qu.: 2401823   Male  :3092   Neutered:3404   Staffordshire Bull Terrier: 507
Median : 6846177                                 Labrador Retriever        : 481
Mean   : 6824982                                 Jack Russell Terrier      : 352
3rd Qu.:11559402                                 German Shepherd Dog       : 234
Max.   :12493098                                 Yorkshire Terrier         : 192
                                                 (Other)                   :2908
CauseDeath                     Method.of.death   dateBirth         dateDeath
Disorder not diagnsosed:1126   Euthanasia:5325   2005-01-03:  28   2016-12-12:  41
Collapsed              : 523   Unassisted: 462   2005-01-05:  26   2016-10-07:  38
Neoplasia              : 522   Unrecorded: 146   2004-01-01:  25   2016-08-09:  37
Mass                   : 375                     2005-01-06:  25   2016-09-12:  37
Brain disorder         : 350                     2003-01-05:  24   2016-11-09:  37
Behaviour disorder     : 341                     2003-01-10:  24   2016-04-04:  35
(Other)                :2696                     (Other)   :5781   (Other)   :5708
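The record filtering described for this dataset can be sketched in Python as follows. This is not the book's own pre-processing code. It assumes that an "incorrect" date means one that cannot be parsed or a death date earlier than the birth date; the original criteria may differ. The three records marked hypothetical are invented only to exercise the filters.

```python
from datetime import date


def parse_date(text):
    """Return a date for an ISO 'YYYY-MM-DD' string, else None."""
    try:
        return date.fromisoformat(text)
    except (TypeError, ValueError):
        return None


def clean_records(records):
    """Keep records with known sex and a usable birth/death date pair."""
    kept = []
    for rec in records:
        birth = parse_date(rec.get("dateBirth"))
        death = parse_date(rec.get("dateDeath"))
        if birth is None or death is None:
            continue  # missing or unparseable date
        if death < birth:
            continue  # assumed meaning of an "incorrect" date
        if rec.get("Sex") not in ("Female", "Male"):
            continue  # missing sex information
        age_years = (death - birth).days / 365.25
        kept.append({**rec, "ageYears": age_years})
    return kept


sample = [
    # Two rows taken from Table 8.5.
    {"animalid": 11718193, "Sex": "Female",
     "dateBirth": "1998-12-03", "dateDeath": "2018-01-02"},
    {"animalid": 11836307, "Sex": "Female",
     "dateBirth": "2016-09-03", "dateDeath": "2016-09-03"},
    # Hypothetical records, not from the dataset.
    {"animalid": -1, "Sex": "Male",
     "dateBirth": "2010-05-01", "dateDeath": None},
    {"animalid": -2, "Sex": "Male",
     "dateBirth": "2015-05-01", "dateDeath": "2014-05-01"},
    {"animalid": -3, "Sex": None,
     "dateBirth": "2010-05-01", "dateDeath": "2016-05-01"},
]

kept = clean_records(sample)
print([rec["animalid"] for rec in kept])
```

Records with equal birth and death dates are retained, in line with the first rows of Table 8.5.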

8.4 Canadian Female Select and Ultimate Mortality Table

Source

Select data are available using this download button:

Table 8.7: Canadian Female Select Mortality First Five Rows
Age Select0 Select1 Select2 Select3 Select4 Select5 Select6
0 0.00035 0.00011 0.00011 0.00010 0.00009 0.00009 0.00009
1 0.00023 0.00011 0.00010 0.00010 0.00009 0.00009 0.00009
2 0.00011 0.00010 0.00010 0.00009 0.00009 0.00009 0.00009
3 0.00010 0.00010 0.00009 0.00009 0.00009 0.00009 0.00009
4 0.00010 0.00009 0.00009 0.00009 0.00009 0.00009 0.00009
Select7 Select8 Select9 Select10 Select11 Select12 Select13 Select14
0.00009 0.00009 0.00009 0.00010 0.00010 0.00011 0.00012 0.00014
0.00009 0.00009 0.00009 0.00010 0.00011 0.00012 0.00013 0.00014
0.00009 0.00009 0.00010 0.00011 0.00012 0.00013 0.00014 0.00016
0.00009 0.00010 0.00011 0.00012 0.00013 0.00014 0.00016 0.00017
0.00010 0.00011 0.00012 0.00013 0.00014 0.00016 0.00017 0.00019
Table 8.7: Canadian Female Select Mortality Last Five Rows
Age Select0 Select1 Select2 Select3 Select4 Select5 Select6
76 0.00544 0.00819 0.01031 0.01236 0.01440 0.01799 0.02341
77 0.00566 0.00853 0.01073 0.01285 0.01633 0.02153 0.02751
78 0.00588 0.00887 0.01115 0.01457 0.01955 0.02530 0.03235
79 0.00609 0.00920 0.01264 0.01743 0.02298 0.02975 0.03791
80 0.00630 0.01043 0.01513 0.02049 0.02702 0.03486 0.04415
Select7 Select8 Select9 Select10 Select11 Select12 Select13 Select14
0.02963 0.03723 0.04634 0.05706 0.06945 0.08367 0.09994 0.11842
0.03484 0.04363 0.05398 0.06599 0.07978 0.09560 0.11358 0.13176
0.04082 0.05082 0.06243 0.07580 0.09115 0.10864 0.12638 0.14378
0.04755 0.05877 0.07172 0.08661 0.10359 0.12088 0.13790 0.15707
0.05499 0.06751 0.08194 0.09843 0.11526 0.13190 0.15065 0.17193

Ultimate data are available using this download button:

Table 8.8: Canadian Ultimate Mortality First Five Rows
Age qx
15 0.00015
16 0.00016
17 0.00018
18 0.00019
19 0.00021
Table 8.8: Canadian Ultimate Mortality Last Five Rows
Age qx
116 0.45
117 0.45
118 0.45
119 0.45
120 1.00

8.5 Korean Mortality by Insured Status

Sources:

  • Insured lives and annuitant rates were drawn from a database organized by the Society of Actuaries, https://mort.soa.org/. These represent (beginning of the year) mortality rates (\(q_x\)).
  • The general population data are drawn from the Human Mortality Database. General population data are central death rates (\(m_x\)).
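Because the two sources report different quantities, a conversion is needed before the rates can be compared. The Python sketch below shows two textbook approximations for turning a central death rate \(m_x\) into a mortality rate \(q_x\): one assumes deaths are spread uniformly over the year of age, the other assumes a constant force of mortality. Neither is claimed to be the conversion used elsewhere in this book, and the function names are illustrative.

```python
import math


def qx_from_mx_udd(mx: float) -> float:
    """q_x from m_x, assuming deaths are uniform over the year of age."""
    return mx / (1.0 + 0.5 * mx)


def qx_from_mx_constant_force(mx: float) -> float:
    """q_x from m_x, assuming a constant force of mortality over the year."""
    return 1.0 - math.exp(-mx)


# Central death rates taken from Table 8.11 (Korean females):
# age 0 in 2003 and age 110 in 2018.
for mx in (0.00485, 1.80482):
    print(mx, qx_from_mx_udd(mx), qx_from_mx_constant_force(mx))
```

At young ages the two measures are nearly identical. At the oldest ages a central death rate can exceed 1, as in the last rows of Table 8.11, whereas the converted \(q_x\) stays below 1 for the rates shown.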

Korean Mortality - Insured Lives

The data are available using this download button:

Table 8.9: Korean Female Insured Lives First Five Rows
Age qx
0 0.00505
1 0.00044
2 0.00034
3 0.00025
4 0.00019
Table 8.9: Korean Female Insured Lives Last Five Rows
Age qx
108 0.76028
109 0.80509
110 0.84620
111 0.88274
112 1.00000

Korean Mortality - Annuitants

The data are available using this download button:

Table 8.10: Korean Female Annuitant First Five Rows
Age qx
45 0.00037
46 0.00041
47 0.00045
48 0.00048
49 0.00051
Table 8.10: Korean Female Annuitant Last Five Rows
Age qx
108 0.66236
109 0.71596
110 0.76430
111 0.80563
112 1.00000

Korean Mortality - Population

The data are available using this download button:

Table 8.11: Korean Female Population First Five Rows
Year Age Female Male Total
2003 0 0.00485 0.005998 0.00545
2003 1 0.00048 0.000418 0.00045
2003 2 0.00029 0.000409 0.00035
2003 3 0.00029 0.00031 0.00030
2003 4 0.00026 0.000347 0.00031
Table 8.11: Korean Female Population Last Five Rows
Year Age Female Male Total
2018 106 0.55619 0.667144 0.57057
2018 107 0.61531 0.848961 0.64287
2018 108 0.69828 1.426573 0.76136
2018 109 0.87470 4.285714 1.00386
2018 110 1.80482 . 1.80482

8.6 Mortality by Country

Source: The Human Mortality Database. In addition to classification by country, we also look at experience by sex and age, since mortality is well known to vary along both dimensions. General population data are central death rates (\(m_x\)).

Population Mortality - Japan

The data are available using this download button:

Table 8.12: Japan Population First Five Rows
Year Age Female Male Total
1947 0 0.083595 0.095448 0.089645
1947 1 0.035643 0.037017 0.036341
1947 2 0.017271 0.017203 0.017237
1947 3 0.011227 0.01133 0.011279
1947 4 0.007023 0.007468 0.007248
Table 8.12: Japan Population Last Five Rows
Year Age Female Male Total
2019 106 0.522855 0.69263 0.537623
2019 107 0.534649 0.66628 0.544456
2019 108 0.569069 0.607969 0.571733
2019 109 0.608248 0.806337 0.619736
2019 110 0.630664 0.927907 0.643787

Population Mortality - Poland

The data are available using this download button:

Table 8.13: Poland Population First Five Rows
Year Age Female Male Total
1958 0 0.065617 0.08371 0.074886
1958 1 0.004736 0.005034 0.004888
1958 2 0.001734 0.002034 0.001888
1958 3 0.001191 0.001463 0.00133
1958 4 0.000853 0.001073 0.000965
Table 8.13: Poland Population Last Five Rows
Year Age Female Male Total
2019 106 0.423814 0.307903 0.404767
2019 107 0.394572 0.350672 0.387116
2019 108 0.657174 0.760938 0.673882
2019 109 0.529723 0 0.449214
2019 110 1.753104 0 1.517067

Population Mortality - United States

The data are available using this download button:

Table 8.14: USA Population First Five Rows
Year Age Female Male Total
1933 0 0.05418 0.06818 0.06129
1933 1 0.00887 0.01004 0.00946
1933 2 0.00402 0.00467 0.00435
1933 3 0.00287 0.00333 0.00310
1933 4 0.00223 0.00254 0.00239
Table 8.14: USA Population Last Five Rows
Year Age Female Male Total
2019 106 0.51709 0.52166 0.51764
2019 107 0.50964 0.58639 0.51840
2019 108 0.63771 0.72411 0.64703
2019 109 0.55038 0.37360 0.53175
2019 110 0.59847 0.50963 0.58832

8.7 Disability Income

Source: Zayatz (2015), Tables 14A, 14B, 21A, 21B

Description:

  • Entitlement age Entryx denotes age last birthday at entitlement to disability benefits.
  • The duration Select.y is measured in years since entitlement.
  • Attained age Attainedx is calculated as the sum of Entryx and the duration; in the tables it is shown for the ultimate column, i.e., Entryx + 10.
  • The select and ultimate table is read across the row for 0-10 years of entitlement, and down the last (ultimate) column for 10 or more years of entitlement.
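The reading rule in the last bullet can be sketched in Python as follows. The rates are copied from the first three rows of Table 8.15; the function name `lookup_rate` and the data layout are illustrative assumptions, and the code reflects one reading of the rule (for 10 or more years of entitlement, use the Ultimate value in the row whose Attainedx equals the current attained age).

```python
SELECT_PERIOD = 10  # durations 0-9 are select; 10 or more is ultimate

# Rows copied from Table 8.15 (male recovery rates), entry ages 16-18.
# Each value: (select rates for durations 0..9, ultimate rate, Attainedx).
TABLE = {
    16: ([0.00946, 0.02160, 0.02233, 0.01267, 0.07644, 0.10214,
          0.11206, 0.07516, 0.05687, 0.05694], 0.04050, 26),
    17: ([0.00826, 0.01658, 0.01450, 0.01257, 0.06944, 0.08764,
          0.08750, 0.06374, 0.05063, 0.04880], 0.03568, 27),
    18: ([0.00670, 0.01218, 0.00836, 0.01350, 0.05874, 0.06920,
          0.06238, 0.05117, 0.04317, 0.03816], 0.03049, 28),
}


def lookup_rate(entry_age: int, duration: int) -> float:
    """Rate for an entitlement age and whole years since entitlement."""
    if duration < SELECT_PERIOD:
        # Select period: read across the row for the entry age.
        return TABLE[entry_age][0][duration]
    # Ultimate period: read down the Ultimate column to the row whose
    # Attainedx equals the current attained age.
    attained = entry_age + duration
    for _select, ultimate, attained_x in TABLE.values():
        if attained_x == attained:
            return ultimate
    raise KeyError(f"no ultimate rate stored for attained age {attained}")


print(lookup_rate(16, 3))   # select rate, row 16, column Select.3
print(lookup_rate(16, 12))  # ultimate rate at attained age 28
```

With only three rows loaded, ultimate lookups beyond attained age 28 raise a `KeyError`; loading the full table removes that limit.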

Disability Recovery Rates - Male

The data are available using this download button:

Table 8.15: Male Disability Recovery Rates First Five Rows
Entryx Select.0 Select.1 Select.2 Select.3 Select.4 Select.5
16 0.00946 0.02160 0.02233 0.01267 0.07644 0.10214
17 0.00826 0.01658 0.01450 0.01257 0.06944 0.08764
18 0.00670 0.01218 0.00836 0.01350 0.05874 0.06920
19 0.00473 0.00892 0.00630 0.01320 0.04634 0.05240
20 0.00380 0.00731 0.00554 0.01249 0.03792 0.04318
Select.6 Select.7 Select.8 Select.9 Ultimate Attainedx
0.11206 0.07516 0.05687 0.05694 0.04050 26
0.08750 0.06374 0.05063 0.04880 0.03568 27
0.06238 0.05117 0.04317 0.03816 0.03049 28
0.04754 0.04087 0.03595 0.03003 0.02587 29
0.03896 0.03292 0.02919 0.02251 0.02053 30
Table 8.15: Male Disability Recovery Rates Last Five Rows
Entryx Select.0 Select.1 Select.2 Select.3 Select.4 Select.5
60 0.00062 0.00063 0.00011 0.00079 0.00162 0.00117
61 0.00048 0.00023 0.00009 0.00062 0.00139 NA
62 0.00026 0.00016 0.00007 0.00026 NA NA
63 0.00025 0.00024 0.00011 NA NA NA
64 0.00026 0.00030 NA NA NA NA
Select.6 Select.7 Select.8 Select.9 Ultimate Attainedx
NA NA NA NA NA 70
NA NA NA NA NA 71
NA NA NA NA NA 72
NA NA NA NA NA 73
NA NA NA NA NA NA

Disability Recovery Rates - Female

The data are available using this download button:

Table 8.16: Female Disability Recovery Rates First Five Rows
Entryx Select.0 Select.1 Select.2 Select.3 Select.4 Select.5
16 0.01215 0.01899 0.01408 0.01206 0.04706 0.07218
17 0.00934 0.01440 0.00959 0.01099 0.04697 0.06196
18 0.00609 0.01058 0.00715 0.01107 0.04464 0.04965
19 0.00362 0.00818 0.00568 0.01007 0.03644 0.03809
20 0.00329 0.00661 0.00566 0.00979 0.02933 0.03575
Select.6 Select.7 Select.8 Select.9 Ultimate Attainedx
0.07609 0.06826 0.07033 0.06010 0.04714 26
0.06484 0.05891 0.06032 0.04966 0.03823 27
0.05337 0.04903 0.04884 0.03708 0.03098 28
0.04048 0.03914 0.03924 0.02870 0.02575 29
0.03482 0.03085 0.03089 0.02314 0.02075 30
Table 8.16: Female Disability Recovery Rates Last Five Rows
Entryx Select.0 Select.1 Select.2 Select.3 Select.4 Select.5
61 0.00042 0.00041 0.00014 0.00072 0.00114 NA
62 0.00014 0.00028 0.00014 0.00027 NA NA
63 0.00020 0.00026 0.00008 NA NA NA
64 0.00022 0.00022 NA NA NA NA
65 0.00021 NA NA NA NA NA
Select.6 Select.7 Select.8 Select.9 Ultimate Attainedx
NA NA NA NA NA 71
NA NA NA NA NA 72
NA NA NA NA NA 73
NA NA NA NA NA 74
NA NA NA NA NA 75

Death and Disability Recovery Rates - Male

The data are available using this download button:

Table 8.17: Male Death and Disability Recovery Rates First Five Rows
Entryx Select.0 Select.1 Select.2 Select.3 Select.4 Select.5
16 0.01346 0.02872 0.02758 0.01848 0.08046 0.10458
17 0.01400 0.02485 0.02070 0.01930 0.07425 0.09238
18 0.01401 0.02107 0.01529 0.02065 0.06494 0.07627
19 0.01464 0.01730 0.01359 0.02088 0.05255 0.05890
20 0.01395 0.01596 0.01244 0.02044 0.04439 0.04934
Select.6 Select.7 Select.8 Select.9 Ultimate Attainedx
0.12126 0.08229 0.06258 0.06316 0.04864 26
0.09451 0.07086 0.05702 0.05431 0.04258 27
0.06708 0.05782 0.04991 0.04365 0.03684 28
0.05211 0.04633 0.04320 0.03540 0.03175 29
0.04340 0.03904 0.03687 0.02785 0.02664 30
Table 8.17: Male Death and Disability Recovery Rates Last Five Rows
Entryx Select.0 Select.1 Select.2 Select.3 Select.4 Select.5
61 0.07463 0.04978 0.04177 0.04009 0.04029 0.04387
62 0.08180 0.05426 0.04426 0.04303 0.04556 0.04598
63 0.09568 0.06421 0.05185 0.05028 0.05032 0.05444
64 0.11528 0.07067 0.05558 0.05582 0.05614 0.06220
65 0.12402 0.07477 0.05577 0.06081 0.06072 0.06201
Select.6 Select.7 Select.8 Select.9 Ultimate Attainedx
0.04736 0.04843 0.05258 0.05482 0.06168 71
0.04972 0.05281 0.05587 0.05792 0.06546 72
0.05605 0.06006 0.06522 0.06520 0.06913 73
0.06177 0.06450 0.07336 0.07356 0.07384 74
0.06473 0.06626 0.08112 0.08198 0.07921 75

Death and Disability Recovery Rates - Female

The data are available using this download button:

Table 8.18: Female Death and Disability Recovery Rates First Five Rows
Entryx Select.0 Select.1 Select.2 Select.3 Select.4 Select.5
16 0.01481 0.02314 0.01797 0.01737 0.04979 0.07516
17 0.01449 0.02075 0.01430 0.01620 0.05078 0.06603
18 0.01364 0.01887 0.01258 0.01679 0.04945 0.05459
19 0.01337 0.01690 0.01172 0.01595 0.04237 0.04229
20 0.01250 0.01551 0.01115 0.01620 0.03486 0.04026
Select.6 Select.7 Select.8 Select.9 Ultimate Attainedx
0.08078 0.07706 0.07673 0.06399 0.05316 26
0.07066 0.06682 0.06595 0.05464 0.04434 27
0.06011 0.05638 0.05342 0.04309 0.03694 28
0.04734 0.04622 0.04415 0.03550 0.03215 29
0.04152 0.03665 0.03668 0.03017 0.02811 30
Table 8.18: Female Death and Disability Recovery Rates Last Five Rows
Entryx Select.0 Select.1 Select.2 Select.3 Select.4 Select.5
61 0.06242 0.04250 0.03321 0.03149 0.02889 0.03294
62 0.06712 0.04808 0.03830 0.03315 0.03409 0.03497
63 0.07996 0.05552 0.04142 0.04077 0.03742 0.04052
64 0.10067 0.06061 0.04973 0.04358 0.04612 0.04430
65 0.10870 0.06833 0.05638 0.04641 0.05169 0.04504
Select.6 Select.7 Select.8 Select.9 Ultimate Attainedx
0.03299 0.03625 0.03931 0.04157 0.04377 71
0.03673 0.03888 0.04240 0.04548 0.04678 72
0.04316 0.04318 0.04863 0.04894 0.05029 73
0.04906 0.04904 0.05498 0.05536 0.05343 74
0.05310 0.05493 0.06121 0.06175 0.05798 75

References

Pegram, Camilla, Carol Gray, Rowena MA Packer, Ysabelle Richards, David B Church, Dave C Brodbelt, and Dan G O’Neill. 2021a. “Proportion and Risk Factors for Death by Euthanasia in Dogs in the UK.” Scientific Reports 11 (1): 1–12.

Pegram, Camilla, Carol Gray, Rowena MA Packer, Ysabelle Richards, David B Church, Dave C Brodbelt, and Dan G O’Neill. 2021b. “Proportion and Risk Factors for Death by Euthanasia in Dogs in the UK: Supporting Data.” https://researchonline.rvc.ac.uk/id/eprint/13486/.

Zayatz, Tim. 2015. “Social Security Disability Insurance Program Worker Experience.” Actuarial Study 123. https://ai.ssa.gov/OACT/NOTES/pdf_studies/study123.pdf.