Foundations of Statistical and Machine Learning for Actuaries

Authors

Edward (Jed) Frees, University of Wisconsin - Madison

Andrés Villegas Ramirez, University of New South Wales

Published

26 June 2025

Course Overview. This short course introduces statistical and machine learning with an emphasis on actuarial applications. Our approach stems from the fact that many modern machine learning tools can be interpreted through the lens of statistical principles, including classical regression techniques. Beginning with an overview of the statistical foundations, we show how to develop models based on “learning” from the data. We emphasize classical techniques widely used in actuarial applications, such as those based on generalized linear models. This foundation leads naturally to a modern approach to statistical analysis known as statistical learning. For this approach, we describe techniques such as boosting and tree-based methods, including random forests.

More on the Course Overview: Statistical and Machine Learning.


Course Format. The course alternates between presentations of the underlying principles and practical applications. In a typical block, the instructor will spend 45 minutes reviewing the insurance motivation and key mathematical underpinnings. This will be followed by a 45-minute block in which participants actively explore a selected case study. Participants are therefore expected to bring a laptop.

More on the Course Format: Coding in R and Python.


Target Audience: Practicing actuaries, students, and educators interested in exposure to the foundations of insurance analytics.

Short Course Brief Outline

  • Day 1 - Foundations and Statistical Learning
  • Day 2 - Statistical and Machine Learning
    • Lecture on Ethical Aspects of AI from Dr. Fei Huang
  • Day 3 - More on Machine Learning and How it Affects Actuarial Practice
    • Lecture on AI applications in insurance context from Prof. Dani Bauer

A More Detailed Plan

Google Colaboratory and Jupyter Notebooks

To deliver this course, we will use two resources that may be unfamiliar to some participants.

  • Google Colaboratory (Colab for short) is a cloud-based system of servers designed to process machine learning code.
    • We will use the free base system - you only need a (free) Google account.
    • Colab handles both R and Python code - perfect for our needs.
    • Machine learning applications often depend upon large datasets and use computationally intensive algorithms - Colab is designed to accommodate these demands.
  • Jupyter notebooks provide a handy way to combine executable code, code outputs, and text into one connected file.

Data

For this short course, links to the data are embedded in our notebooks, which retrieve the data on the fly. You therefore do not need to download any data in advance.
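As a rough illustration of what retrieving data "on the fly" can look like inside a notebook cell, here is a minimal Python sketch that uses only the standard library. The function name `read_csv_from_url` and the demonstration link `DEMO_URL` are hypothetical and do not come from the course notebooks, which embed their own links. The demonstration link is a small `data:` URL, so the cell runs without network access.

```python
import csv
import io
import urllib.request


def read_csv_from_url(url):
    """Fetch a CSV file from a URL and return its rows as dictionaries.

    Hypothetical helper for illustration; not part of the course materials.
    """
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    # Parse the downloaded text; the first line supplies the column names.
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical stand-in for a real data link: a two-row CSV encoded as a
# data: URL (%0A is a newline), so no internet connection is needed.
DEMO_URL = "data:text/csv,age,claims%0A30,1%0A45,0"

rows = read_csv_from_url(DEMO_URL)
print(rows[0])
```

In practice, a notebook would more likely pass the link straight to a data-frame reader. For example, `pandas.read_csv` accepts a URL directly.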

More Data Resources

Reading

For this course, we will not ask participants to read material in advance of the course.

Nonetheless, we have listed here many freely available resources for learning about statistical and machine learning. We hope the course will inspire some attendees to keep learning about these approaches afterwards, and these resources may be useful for that purpose.

More Reading Resources

Learn Python

In this course, we will not assume knowledge of Python, although we will assume that participants have some familiarity with R. Through exposure to the Python scripts, we simply want to demonstrate how machine learning tools can be used.

A Few Python Resources