Healthcare horizons: Embracing modern challenges
The increase in life expectancy, driven by advances in medical and pharmaceutical technologies, along with the promotion of prevention and screening campaigns for chronic and infectious diseases, represents a notable achievement. However, it poses considerable challenges to the Public National Health Service (NHS), the first pillar of the Italian public healthcare system. In fact, the financial pressure on the NHS is increasing. The yearly spending reviews have not adequately covered the ongoing increase in healthcare expenditure and have not improved the waiting times for medical services.1
The widening gap between healthcare demand and public funding has prompted citizens to pay out of pocket, resulting in a significant shift towards the second and third pillars of the National Health System, which are managed by supplementary healthcare institutions (e.g., health funds) and private insurance companies.
About €40 billion (23%) of current total health expenditures in Italy is out-of-pocket and funded directly by households.2
Given these new dynamics, it is crucial for insurance companies to leverage extensive datasets to analyse the evolution of healthcare costs, as well as utilisation rates. A data-driven approach can be very useful for optimising product pricing and enhancing risk monitoring activities to protect the health and financial well-being of the population.
Public health data: How to use it strategically
In a complex market such as healthcare, it is important not only to monitor the trends in expenditures but also to evaluate the health status and spread of diseases within the population. The collection of public health data is complex, but it represents a necessary process for obtaining detailed information regarding the population's health conditions.
Public health data is accessible from several sources, including national statistics organizations (e.g., Istituto nazionale di statistica, or Istat), public and private hospitals, and healthcare organizations. These sources provide health information based on different territorial levels, professions and ages.
Hospital discharge cards3 (SDOs) and diagnosis-related groups4 (DRGs) are the two main sources of public information used to enhance and update insurance health data. In this analysis, a crucial part of the process involved creating a comprehensive 'dictionary' to map the benefit descriptions in the private dataset to the corresponding public information, followed by a DRG code-matching process to validate these connections.
The findings showed that this flexible and scalable methodology effectively facilitated the association of key metrics derived from SDOs and DRGs to internal datasets. This integration enabled regular updates regarding financial costs and healthcare service utilization rates, while also establishing a robust framework for ongoing data analysis and policy formulation within the healthcare sector.
Applying and aggregating the underlying data presents considerable implementation challenges. Once aggregated, the data reflects the differences and variety of its sources. Consequently, there are inevitable discrepancies in the costs and information the data contains. These are just a few of the challenges that must be addressed when managing public data.
Private health data: How to develop an internal private dataset
Over the years, thanks to in-depth research and productive collaborations, Milliman has developed a deep and complete private, multivariate internal health dataset, containing primarily information on the costs and utilisation rates of individual healthcare services.
The data-gathering process involved some of the main private health portfolios, which today represent nearly 60% of the covered members of the Italian supplementary health market.
Our internal private dataset relates exclusively to group health insurance coverages and collects information that provides a flexible and consistent basis for determining claim costs, utilisation and premium rates for different health plan benefits. The private dataset encompasses health experience data (approximately 6 million covered lives over the latest 3 years) over a time horizon of more than 10 years.
Figure 1: Internal data’s consistency — exposure of the internal dataset for the last 3 years (in millions)
| YEAR | NUMBER OF POLICIES | EXPOSURE (COVERED LIVES) |
|---|---|---|
| N-2 | 1.27 | 1.95 |
| N-1 | 1.29 | 2.01 |
| N | 1.31 | 1.88 |
| TOTAL | 3.87 | 5.84 |
Source: Milliman’s analysis on internal data
The internal private dataset, developed through the analysis of multiple data sources, reveals significant variety, resulting in critical issues that needed to be corrected before implementing the data aggregation process. For example, differences were noted in benefit nomenclature, granularity levels (territory- or facility-specific), underlying limits, copayments and deductibles for each type of coverage.
To address these issues, Milliman has implemented a scalable strategy which involves the following steps:
- Cleaning and standardisation: Data formats are cleaned and standardised to ensure consistency and accuracy.
- Enrichment: Gaps and discrepancies in the dataset are filled and resolved.
- Mapping and decoding: A taxonomy is set for the features of different coverages (e.g., limits, copayments, deductibles).
- Portfolio calibration: Data is normalised after the mapping process, including the underlying limits.
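The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than Milliman's actual pipeline: the `Claim` record, the `BENEFIT_TAXONOMY` labels, the default deductible and the rule of adding the deductible back to the paid amount are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical taxonomy: cleaned benefit label -> standard benefit code.
BENEFIT_TAXONOMY = {
    "specialist visit": "SPECIALIST_EXAM",
    "specialist examination": "SPECIALIST_EXAM",
    "lab test": "LAB_ANALYSIS",
    "laboratory analysis": "LAB_ANALYSIS",
}


@dataclass
class Claim:
    benefit_label: str
    paid_amount: float           # amount reimbursed by the insurer
    deductible: Optional[float]  # None when the source did not report it


def clean_label(raw: str) -> str:
    """Cleaning and standardisation: trim, lower-case, collapse whitespace."""
    return re.sub(r"\s+", " ", raw.strip().lower())


def enrich(claim: Claim, default_deductible: float = 0.0) -> Claim:
    """Enrichment: fill a missing deductible with an assumed default."""
    deductible = default_deductible if claim.deductible is None else claim.deductible
    return Claim(claim.benefit_label, claim.paid_amount, deductible)


def map_benefit(claim: Claim) -> str:
    """Mapping and decoding: translate the label into the shared taxonomy."""
    return BENEFIT_TAXONOMY.get(clean_label(claim.benefit_label), "UNMAPPED")


def calibrate(claim: Claim) -> float:
    """Portfolio calibration (simplified assumption): add the deductible
    back to the paid amount to approximate the cost before cost sharing."""
    return claim.paid_amount + (claim.deductible or 0.0)


raw_claims = [
    Claim("  Specialist   Visit ", 80.0, 20.0),
    Claim("LAB TEST", 35.0, None),
    Claim("Physio", 50.0, 10.0),
]
processed = [(map_benefit(c), calibrate(enrich(c))) for c in raw_claims]
```

Labels that fall outside the taxonomy are kept and flagged as `UNMAPPED` rather than dropped, so that gaps in the dictionary remain visible for manual review.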
Coverages were mapped to DRGs and grouped into macro modules, each representing a specific treatment category. For example, all coverages related to hospitalization or hospital facility access were grouped into a macro module labelled ‘Hospitalization, surgery and day-hospital.’ Additional examples of macro modules include ‘Major interventions’, ‘Diagnostic findings, Laboratory analysis and Specialist examinations’, ‘High diagnostics and oncology care’ and ‘Maternity package’.
Matching approach: How to merge an internal private dataset with open public data
Public and private healthcare data can be seen as complementary. An integrated analysis of these sources is essential for gaining a comprehensive understanding of the population's health status, healthcare service utilization and associated costs.
The adopted integrated approach involves an in-depth analysis of both private internal and public data, as well as the identification of key elements necessary for accurate matching.
One of the most challenging implementation tasks was merging the health services of the insurance coverages with the DRG codes, which uniquely classify the health diagnoses.
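As a rough illustration of this matching step, the sketch below looks up each insurance benefit in a dictionary and accepts the match only when the resulting DRG code also exists in the public data. The benefit names, the placeholder codes (`DRG-A`, `DRG-B`, `DRG-C`) and the public costs are hypothetical and do not correspond to real DRG identifiers or tariffs.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary: insurance benefit description -> DRG code.
# The codes are placeholders, not real DRG identifiers.
BENEFIT_TO_DRG: Dict[str, str] = {
    "knee replacement": "DRG-A",
    "appendectomy": "DRG-B",
    "caesarean delivery": "DRG-C",
}

# Hypothetical public reference: DRG code -> average public cost in euros.
PUBLIC_DRG_COST: Dict[str, float] = {
    "DRG-A": 9000.0,
    "DRG-B": 3500.0,
}


def match_to_public(benefits: List[str]) -> Tuple[Dict[str, float], List[str]]:
    """Return (benefit -> public cost) for validated matches, plus the unmatched benefits."""
    matched: Dict[str, float] = {}
    unmatched: List[str] = []
    for benefit in benefits:
        drg = BENEFIT_TO_DRG.get(benefit.strip().lower())
        # A match is validated only if the DRG code exists in the public data.
        if drg is not None and drg in PUBLIC_DRG_COST:
            matched[benefit] = PUBLIC_DRG_COST[drg]
        else:
            unmatched.append(benefit)
    return matched, unmatched


matched, unmatched = match_to_public(
    ["Knee replacement", "Appendectomy", "Caesarean delivery", "Dental check-up"]
)
match_rate = len(matched) / (len(matched) + len(unmatched))
```

In this toy input two of the four benefits are validated, giving a match rate of 0.5. The unmatched list separates two different failure modes worth tracking: a benefit missing from the dictionary and a mapped code missing from the public source.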
Enhancing health insurance rates: A benchmarking approach using stochastic models
The construction of this internal private dataset, designed to ensure continuous updates of the underlying data, has enabled the development of a benchmark for health benefits. This benchmark allows Italian insurance companies to optimise their pricing strategies for health products.
Utilising careful preprocessing and clustering techniques on the source dataset, multivariate generalised linear models (GLMs) were implemented. These models have been instrumental in developing a pricing system that accurately assesses the risk levels of both individual and portfolio health benefits.
Implementing GLMs
The dataset's benefits have been grouped into modules, such as ‘Hospitalization, surgery and day-hospital’ (M1) and ‘Major interventions’ (M2), which are the primary focus of the analysis. An extensive exploratory data analysis was followed by appropriate data selection and a general neutralisation to reduce the impact of limits, deductibles and coverage richness. An initial cluster analysis was then conducted. The selected variables were grouped based on similar basic insurance ratios, which are used as predictors for the actuarial models.
Before employing a stochastic approach to estimate the utilisation rate and associated costs, which requires a solid probabilistic foundation, an empirical approach based purely on statistical observation was adopted. By studying each variable individually, initial univariate relativities were obtained and later compared with those produced by the predictive models, allowing for further validation and refinement of the estimates. Using Akur8, software that employs machine learning (ML) and predictive analysis to automate multivariate GLMs, stochastic models of the utilisation rate and average cost were obtained for modules M1 and M2. These models were used to generate an index that estimates the risk of the company's different portfolio compositions.
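The empirical, univariate step can be illustrated with a short sketch. It assumes that a relativity is defined as the observed claim frequency of a class divided by the overall portfolio frequency, which is a common convention but is not spelled out in the text. The age bands and figures are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record: (age_band, exposure in covered life-years, number of claims).
Record = Tuple[str, float, int]


def univariate_relativities(records: Iterable[Record]) -> Dict[str, float]:
    """Observed frequency of each age band relative to the overall frequency."""
    exposure: Dict[str, float] = defaultdict(float)
    claims: Dict[str, float] = defaultdict(float)
    for band, expo, n_claims in records:
        exposure[band] += expo
        claims[band] += n_claims
    overall_frequency = sum(claims.values()) / sum(exposure.values())
    return {
        band: (claims[band] / exposure[band]) / overall_frequency
        for band in exposure
    }


# Hypothetical experience data.
records = [
    ("0-30", 1000.0, 50),
    ("31-60", 2000.0, 200),
    ("61+", 1000.0, 150),
]
relativities = univariate_relativities(records)
```

With these invented figures the relativities come out at roughly 0.5, 1.0 and 1.5. Because each variable is studied in isolation, any correlation between age and other rating factors is absorbed into these numbers, which is why they serve as a first reference point rather than as final estimates.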
To enhance the robustness of the findings, the same models were implemented using R software. This dual implementation enabled the assessment of model stability and the validation of results, thereby ensuring increased reliability and rigor in the conclusions drawn from this analysis.
Comparative analysis: A results-based conclusion
The GLM implementations in the two software packages differ slightly in their estimation approach: the R GLM was inspired by the S function,5 whereas Akur8 uses a penalised GLM with ML and artificial intelligence (AI) enhancement. The relativities obtained using Akur8 and R exhibited similar trends, with minimal differences between them.6 However, the same cannot be said for the relativities obtained through the univariate approach. For example, as shown in Figure 2, these relativities are lower than those derived from the two GLM implementations, indicating that stochastic models are better able to capture the intrinsic risk associated with each class.
In fact, in contrast to the univariate approach, the output coefficients of these predictive models reflect the relationships between the target variable and key predictors (such as territory, gender, number of household members, insured age and age-gender interactions).
Optimise health insurance pricing with transparent AI to automate GLMs and generalized additive models (GAMs).7
This enables a more accurate understanding of the risk dynamics. Examples of the relativities obtained from the three models are illustrated in Figures 2 and 3.
Figure 2: Frequency model M1 — age group relativities
Source: Milliman’s analysis on internal data
Figure 3: Average cost model M1 — age group relativities
Source: Milliman’s analysis on internal data
Effects on risk management strategies
The implementation of these models has facilitated the establishment of a benchmark quotation system designed to evaluate portfolio mix and to understand the main health trends for specific risk profiles. The analysis of the available data not only enables the identification of high-risk profiles but also allows for an assessment of whether an insurer's portfolio is disproportionately concentrated in these high-risk segments. This insight supports the formulation of targeted pricing strategies aimed at achieving a more balanced portfolio composition.
For example, the adoption of pricing models that consider not only demographic factors such as age but also geographic variables enhances the accuracy of risk exposure assessments within the portfolio. The reliability and validity of this tool are bolstered by a multivariate analytical approach, which offers more robust insights and a deeper understanding, compared to traditional univariate methodologies.
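A small numerical sketch helps show why adding a geographic variable changes the age relativities. It fits a two-factor multiplicative frequency model by balancing marginal claim totals; for categorical factors, this fixed point coincides with the maximum-likelihood estimate of a Poisson GLM with log link. It is not the penalised GLM used by Akur8 nor the R implementation, and the cells below are invented so that the true age relativity is 2.0 and the true territory relativity is 1.5.

```python
from typing import Dict, List, Tuple

# Hypothetical cells: (age_band, territory, exposure, claims).
Cell = Tuple[str, str, float, float]

CELLS: List[Cell] = [
    ("young", "north", 800.0, 80.0),
    ("young", "south", 200.0, 30.0),
    ("old", "north", 200.0, 40.0),
    ("old", "south", 800.0, 240.0),
]


def fit_multiplicative(
    cells: List[Cell], n_iter: int = 200
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Fit frequency = a[age] * b[territory] by balancing marginal claim totals."""
    a = {age: 1.0 for age, _, _, _ in cells}
    b = {terr: 1.0 for _, terr, _, _ in cells}
    for _ in range(n_iter):
        for age in a:
            observed = sum(n for g, _, _, n in cells if g == age)
            fitted = sum(e * b[t] for g, t, e, _ in cells if g == age)
            a[age] = observed / fitted
        for terr in b:
            observed = sum(n for _, t, _, n in cells if t == terr)
            fitted = sum(e * a[g] for g, t, e, _ in cells if t == terr)
            b[terr] = observed / fitted
    return a, b


def univariate_frequency(cells: List[Cell], age: str) -> float:
    """Observed claim frequency of one age band, ignoring territory."""
    claims = sum(n for g, _, _, n in cells if g == age)
    exposure = sum(e for g, _, e, _ in cells if g == age)
    return claims / exposure


a, b = fit_multiplicative(CELLS)
multivariate_age_rel = a["old"] / a["young"]
univariate_age_rel = univariate_frequency(CELLS, "old") / univariate_frequency(CELLS, "young")
```

In this toy portfolio older members are concentrated in the higher-frequency territory, so the univariate age relativity (about 2.55) mixes the two effects, while the multivariate fit recovers approximately 2.0. The direction and size of the distortion depend entirely on the portfolio mix.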
Leveraging healthcare data: From health indices to quotation systems
The development of an integrated approach has enabled the creation of valuable health indices, which encompass key metrics such as annual admissions, length of stay and annual utilisation. These indices are designed to be adaptable to specific client needs and are based on appropriate data selection and a general neutralisation that reduces the impact of limits, deductibles and coverage richness. They could be used to estimate the cost and utilisation of health services.
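To make these metrics concrete, the sketch below computes three simple indices from a list of admissions. The exact definitions used in the analysis are not given, so the formulas here (admissions per 1,000 covered lives, average length of stay and bed days per 1,000 covered lives) are assumptions, as are the record layout and the sample figures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Admission:
    member_id: str
    length_of_stay_days: int


def health_indices(admissions: List[Admission], covered_lives: float) -> Dict[str, float]:
    """Annual indices for one year of admissions over a given exposure."""
    n_admissions = len(admissions)
    total_days = sum(a.length_of_stay_days for a in admissions)
    return {
        "admissions_per_1000": 1000.0 * n_admissions / covered_lives,
        "average_length_of_stay": total_days / n_admissions if n_admissions else 0.0,
        "bed_days_per_1000": 1000.0 * total_days / covered_lives,
    }


# Hypothetical year of experience for a group of 200 covered lives.
admissions = [Admission("m1", 3), Admission("m2", 5), Admission("m1", 4)]
indices = health_indices(admissions, covered_lives=200.0)
```

For this sample the indices are 15 admissions per 1,000 lives, an average stay of 4 days and 60 bed days per 1,000 lives. Expressing the indices per 1,000 covered lives keeps them comparable across portfolios of different sizes.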
By replicating this integrated and linked approach, it is possible to generate customised indicators tailored to the specific products of different companies, enabling further refinement and personalisation of the indices.
Moreover, by harnessing the extensive availability of healthcare data and employing multivariate GLMs with Akur8, an interactive ‘quotation system’ has been developed. This system is aimed at assessing both individual and portfolio risks, tailored to each selected risk profile and their associated clusters, which helps insurance companies optimise the products offered and premium rates.
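The logic of such a quotation system can be sketched as follows, assuming a multiplicative structure in which the expected cost of a profile is a base frequency times a base average cost, each adjusted by the relativities of the profile's rating factors. The base values, factor levels and relativities below are hypothetical and are not taken from the models described here.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, str]

# Hypothetical base values and relativities.
BASE_FREQUENCY = 0.10
BASE_AVERAGE_COST = 2000.0
FREQUENCY_REL: Dict[str, Dict[str, float]] = {
    "age": {"0-30": 0.5, "31-60": 1.0, "61+": 1.5},
    "territory": {"north": 1.0, "south": 1.2},
}
COST_REL: Dict[str, Dict[str, float]] = {
    "age": {"0-30": 0.8, "31-60": 1.0, "61+": 1.25},
    "territory": {"north": 1.0, "south": 0.9},
}


def expected_cost(profile: Profile) -> float:
    """Expected annual claim cost per covered life for one risk profile."""
    frequency = BASE_FREQUENCY
    average_cost = BASE_AVERAGE_COST
    for factor, level in profile.items():
        frequency *= FREQUENCY_REL[factor][level]
        average_cost *= COST_REL[factor][level]
    return frequency * average_cost


def portfolio_index(portfolio: List[Tuple[Profile, float]]) -> float:
    """Exposure-weighted expected cost relative to the base profile."""
    total_lives = sum(lives for _, lives in portfolio)
    weighted_cost = sum(expected_cost(p) * lives for p, lives in portfolio)
    return (weighted_cost / total_lives) / (BASE_FREQUENCY * BASE_AVERAGE_COST)


older_profile = {"age": "61+", "territory": "north"}
base_profile = {"age": "31-60", "territory": "north"}
quote = expected_cost(older_profile)
index = portfolio_index([(older_profile, 100.0), (base_profile, 300.0)])
```

Here the older profile is quoted at about 375 per life against 200 for the base profile, and a portfolio with a quarter of its exposure in the older profile has an index of about 1.22. An index above 1 signals a mix that is riskier than the benchmark profile.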
The combined use of the health indices and a quotation system not only establishes a comprehensive and up-to-date reference standard in the insurance and health context but also allows for more efficient portfolio management in terms of risk monitoring.
1 Ricciardi W, Tarricone R. The evolution of the Italian National Health Service. Lancet. 2021 Dec. Retrieved January 14, 2025, from https://pubmed.ncbi.nlm.nih.gov/34695372/.
2 Istituto nazionale di statistica (ISTAT). (2024). Healthcare accounting system: Healthcare expenditure by type of healthcare function and provider. Retrieved January 14, 2025, from http://dati.istat.it/Index.aspx?QueryId=29023.
3 Process for collecting information related to all hospitalisation episodes provided in public and private facilities, throughout the national territory.
4 Patient classification system that standardises prospective payment to hospitals and encourages cost-saving initiatives. DRGs are used to classify hospital cases into one of approximately 500 groups expected to use similar hospital resources.
5 Hastie, T.J. (Ed.). (1992). Statistical Models in S (1st ed.). Routledge. Retrieved January 7, 2025, from www.taylorfrancis.com/chapters/edit/10.1201/9780203738535-6/generalized-linear-models-trevor-hastie-daryl-pregibon.
6 Holmes, T., & Casotto, M. (2024). Penalized regression and lasso credibility (CAS Monograph Series No. 13). Casualty Actuarial Society. Retrieved January 7, 2025, from www.casact.org/sites/default/files/2024-10/CAS_Monograph_No_13.pdf.
7 Retrieved January 7, 2025, from www.akur8.com/pricing/risk.