Foundations of Statistical and Machine Learning for Actuaries

Authors

Edward (Jed) Frees, University of Wisconsin - Madison

Andrés Villegas Ramirez, University of New South Wales

Published

21 July 2025

I Curso Verano Asociación Colombiana de Actuarios 2025

Course Overview. This short course introduces statistical and machine learning with an emphasis on actuarial applications. Our approach stems from the fact that many modern machine learning tools can be interpreted through the lens of statistical principles, including classical regression techniques. Beginning with an overview of the statistical foundations, we show how to develop models based on “learning” from the data. We emphasize classical techniques widely used in actuarial applications, such as those based on generalized linear models. This foundation leads naturally to statistical learning, a modern approach to statistical analysis. For this approach, we describe techniques such as boosting and tree-based methods, including random forests.

More on the Course Overview: Statistical and Machine Learning.

Course Format. The course alternates between presentations of the underlying principles and practical applications. In a typical block, the instructor will spend 50 minutes reviewing the insurance motivation and key mathematical underpinnings. This will be followed by a 40-minute block in which participants actively explore a selected case study. Participants should therefore bring a laptop.

More on the Course Format: Coding in R and Python.

Target Audience: Practicing actuaries, students, and educators interested in exposure to the foundations of insurance analytics.

Short Course Brief Outline

Day 1 - Foundations and Statistical Learning
Day 2 - Statistical and Machine Learning
- Lecture on Ethical Aspects of AI from Dr. Fei Huang
Day 3 - More on Machine Learning and How it Affects Actuarial Practice
- Lecture on AI applications in an insurance context from Prof. Dani Bauer

A More Detailed Plan

**Detailed Schedule**
Day and Time	Presenter	Topics*	Notebooks* for Participant Activity
Monday Morning	Jed	Welcome and Foundations; Hello to Google Colab	Auto Liability Claims
	Jed	Classical Regression Modeling	Medical Expenditures (MEPS)
Monday Afternoon	Andrés	Resampling, Cross-validation and Regularisation	Seattle House Prices
	Andrés	Classification, Logistic Regression and Trees	Victoria road crash data
Tuesday Morning	Andrés	Tree-based Ensemble methods and Interpretability	Victoria road crash data
	Jed	Big Data, Dimension Reduction and Non-Supervised Learning	Big Data, Dimension Reduction, and Non-Supervised Learning
Tuesday Afternoon	Jed	Neural Networks	Seattle House Prices; Claim Counts
	Jed	Graphic Data Neural Networks	MNIST Digits Data
Tuesday 4 pm	Fei	Fei Huang Thoughts on Ethics
Wednesday Morning	Jed	Recurrent Neural Networks, Word Embedding	Insurer Stock Returns
	Jed	Artificial Intelligence, Natural Language Processing, and ChatGPT
Wednesday After Lunch	Dani	Dani Bauer Insights
Wednesday Afternoon	Andrés	Applications and Wrap-Up
		*Click on a Topic to open a PDF of the slides.	*Click on a Notebook to open an HTML version of the notebook (provided by nbviewer).

Presenters

Plus - Special Guest Lecturers

You can click on the links to learn more about our backgrounds.

Google Colaboratory and Jupyter Notebooks

To deliver this course, we will use two resources that may be unfamiliar to some participants.

Google Colaboratory (Colab for short) is a cloud-based system of servers designed to run machine learning code.
- We will use the free base system - you only need a (free) Google account.
- Colab handles both R and Python code - perfect for our needs.
- Machine learning applications often depend upon large datasets and use computationally intensive algorithms - Colab is designed to accommodate these demands.
Jupyter notebooks provide a handy way to combine executable code, code outputs, and text into one connected file.
- You can take a look at the course notebooks by going to the nbviewer site. Then, enter the course GitHub repo URL https://github.com/OpenActTextDev/ActuarialRegression, select the folder and then the notebook that you want to view.
- To interact with a notebook, open it in Colab!
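
The steps above can also be shortcut by building the nbviewer and Colab addresses directly. The following is a minimal sketch, not part of the course materials. It assumes the URL patterns that nbviewer and Colab commonly use for notebooks hosted on GitHub, and it assumes the repository's default branch is called "main". The helper name `notebook_urls` and the folder and file names are hypothetical; replace them with a real path from the course repository.

```python
from urllib.parse import quote

# Repository named in the course text.
REPO = "OpenActTextDev/ActuarialRegression"


def notebook_urls(path, repo=REPO, branch="main"):
    """Return nbviewer and Colab URLs for a notebook stored in a GitHub repo.

    `path` is the notebook's location inside the repository. The branch
    name "main" is an assumption; change it if the repository differs.
    """
    # quote() escapes spaces and similar characters but keeps "/" intact.
    tail = f"github/{repo}/blob/{branch}/{quote(path)}"
    return {
        "nbviewer": f"https://nbviewer.org/{tail}",               # read-only view
        "colab": f"https://colab.research.google.com/{tail}",     # interactive session
    }


# Hypothetical folder and notebook name, for illustration only.
urls = notebook_urls("SomeFolder/SomeNotebook.ipynb")
print(urls["nbviewer"])
print(urls["colab"])
```

Opening the nbviewer address shows a static rendering of the notebook, while the Colab address opens a copy that you can run and edit after signing in with a Google account.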

Data

For this short course, links to the data are embedded in our notebooks, which retrieve the data on the fly. You will therefore not need to download any data in advance.

More Data Resources

Reading

We will not ask participants to read material in advance of the course.

Nonetheless, we have listed here many freely available resources to learn about statistical and machine learning. Our hope is that the course will inspire some attendees to continue learning about these approaches after the course. If you wish to continue your learning journey, these resources may be useful to you.

Learn Python

In this course, we will not assume knowledge of Python, although we will assume that participants have some familiarity with R. Through exposure to the Python scripts, we simply want to demonstrate how machine learning tools can be used.

