---
title: "Introduction to MAIHDA"
author: "Hamid Bulut"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to MAIHDA}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5
)
```

## Introduction

The **MAIHDA** package provides specialized tools for conducting Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy. This modern epidemiological approach is highly effective for investigating intersectional health inequalities and understanding how joint social categories (e.g., Race x Gender x Education) influence individual outcomes.

By utilizing multilevel mixed-effects models (via `lme4` or `brms`), MAIHDA allows researchers to:
1. Automatically construct intersectional strata.
2. Estimate between-stratum variance and Variance Partition Coefficients (VPC).
3. Evaluate the Proportional Change in Variance (PCV): a descriptive, model-dependent measure of how much the between-stratum variance changes when additive main effects are added (it is not a causal decomposition, and a negative PCV does not by itself establish hidden structural inequality).
4. Launch an interactive Shiny Dashboard for code-free analysis.

## Installation

```{r eval=FALSE}
# Released version (CRAN):
install.packages("MAIHDA")

# Development version (GitHub):
# install.packages("remotes")
# remotes::install_github("hdbt/MAIHDA")
```

## Quick Start: the `maihda()` one-call workflow

If you just want the standard analysis, `maihda()` runs the whole pipeline in a
single call -- it fits the model, summarises the VPC/ICC and variance components,
and (optionally) compares intersectional inequality across a higher-level grouping
variable. It returns one object with `print()`, `summary()`, and `plot()` methods.

```{r quickstart}
library(MAIHDA)
data("maihda_health_data")

# Fit + variance summary in one call
analysis <- maihda(BMI ~ Age + (1 | Gender:Race:Education), data = maihda_health_data)
analysis                      # concise report (VPC/ICC, #strata)
summary(analysis)             # full variance components
plot(analysis, type = "vpc")  # any plot.maihda_model type works here
```

You can add a higher-level grouping variable to also compare across its levels.
`maihda_country_data` (PISA 2018) has a real country grouping; this workflow has
its own [group comparison vignette](group_comparison.html), so here we just point
to it:

```{r quickstart-group, eval=FALSE}
data("maihda_country_data")
by_country <- maihda(math ~ 1 + (1 | gender:ses), data = maihda_country_data,
                     group = "country")
by_country
plot(by_country, type = "group_vpc")
```

The sections below walk through the same steps individually with `fit_maihda()`
and friends, which you can use directly when you need finer control.

## Real-World Example Analysis

The package includes a pedagogical subset of the National Health and Nutrition Examination Survey (`maihda_health_data`). We will use this to examine how Body Mass Index (BMI) varies across intersectional demographic groups.

> **Note.** The bundled `maihda_health_data` and `maihda_country_data` are for teaching only. They are non-random subsamples, and the package ignores the surveys' weights and complex sampling design (and uses a single PISA plausible value), so the results below are **not** survey-representative and should not be read as substantive population findings.

### Step 1 & 2: Create Intersectional Strata and Fit a Null MAIHDA Model

Use `fit_maihda()` to combine multiple social categories directly in the random effect formula. Providing the variables in the random effect combined with `:` allows the function to automatically build the intersectional strata on the fly and fit the multilevel model.

```{r null-model}
# PVC compares variance across models, so both models must use the same
# analytic sample. Keep complete cases for all variables used below.
health_complete <- maihda_health_data[complete.cases(
  maihda_health_data[, c("BMI", "Age", "Gender", "Race", "Education", "Poverty")]
), ]

# Fit the initial Null model with auto-generated strata
model_null <- fit_maihda(
  BMI ~ 1 + (1 | Gender:Race:Education),
  data = health_complete,
  engine = "lme4"
)

# Summarize the variance components (VPC)
summary_null <- summary(model_null)
print(summary_null)
```

**Interpretation:** The resulting Variance Partition Coefficient (VPC or ICC) tells us what percentage of the total variance in BMI in the population lies *between* the intersectional social groups, rather than just *within* them.

### Step 3: Evaluate Proportional Change in Variance (PCV)

To understand whether these intersectional inequalities are largely the sum of their parts (additive), we evaluate how much the between-stratum variance changes when the additive main effects are added to the model.

If the variance drops substantially (high PCV), the between-stratum differences are much smaller once the additive main effects are included. If it remains or even *increases* (negative PCV), the additive main effects account for little of it. Interpret this cautiously: the PCV is a model-dependent variance change, not a causal decomposition, and a low or negative value does not by itself prove a "true" intersectional interaction -- it can also reflect suppression, rescaling (on the latent scale for non-Gaussian models), sample composition, or estimation uncertainty.

```{r adjusted-model}
# Fit an adjusted model on the SAME analytic sample
model_adj <- fit_maihda(
  BMI ~ Age + Gender + Race + Education + Poverty + (1 | Gender:Race:Education),
  data = health_complete
)

# Point estimate of the proportional change in between-stratum variance
pcv_result <- calculate_pvc(model_null, model_adj)
print(pcv_result)
```

For publication-ready uncertainty, add parametric bootstrap confidence intervals.
This refits the model many times, so it is not run when the vignette is built:

```{r pcv-bootstrap, eval=FALSE}
pcv_boot <- calculate_pvc(model_null, model_adj, bootstrap = TRUE, n_boot = 500)
print(pcv_boot)
```

### Step 4: Stepwise PCV Decomposition

Often, researchers want to know exactly *which* variable explained the variance. Use the `stepwise_pcv()` function to add covariates one-by-one and track the variance dynamically.

```{r stepwise}
# Run a stepwise variance decomposition using the prepared data with strata
stepwise_results <- stepwise_pcv(
  data = model_null$original_data,
  outcome = "BMI",
  vars = c("Age", "Gender", "Race", "Education", "Poverty")
)

print(stepwise_results)
```

Negative step PCVs in this table indicate an "unmasking"/suppression pattern: adding a variable increased the between-stratum variance. This is a model-dependent change in variance, not proof that a hidden structural inequality was causally revealed -- a negative PCV can also reflect suppression, rescaling, sample composition, or estimation uncertainty, so interpret it cautiously alongside the fitted effects.

### Step 5: Visualizations

The package provides multiple pre-configured, advanced visualization options for checking your model estimates natively mirroring the Shiny application logic. The dedicated [plot interpretation vignette](interpreting_plots.html) explains how to read each one.

```{r plot-predicted}
# Predicted stratum values with 95% CIs
plot(model_adj, type = "predicted")
```

```{r plot-vpc}
# Variance partition (VPC) visualization
plot(model_adj, type = "vpc")
```

```{r plot-risk}
# Mean predicted outcome against the stratum random effect
plot(model_adj, type = "risk_vs_effect")
```

```{r plot-decomp}
# Additive versus Intersectional Effect decomposition
plot(model_adj, type = "effect_decomp")
```

```{r plot-prediction-deviation}
# Individual Prediction Deviance Dashboard
plot(model_adj, type = "prediction_deviation")
```

The ternary diagnostic needs the optional `ggtern` package, so it is only drawn
when that package is installed:

```{r plot-ternary, eval = requireNamespace("ggtern", quietly = TRUE), warning = FALSE, message = FALSE}
# Ternary diagnostic: additive vs intersection-specific signal vs uncertainty
plot(model_adj, type = "ternary")
```

### Step 6: Compare Intersectional Inequality Across Groups

When the data spans several higher-level contexts -- countries, regions, survey
waves -- you often want to know whether the *degree* of intersectional
inequality differs across them. `compare_maihda_groups()` fits a separate
intercept-only MAIHDA model within each level of a grouping variable and reports
the VPC/ICC and variance components side by side.

By default the intersectional strata are defined once on the full data
(`shared_strata = TRUE`), so a stratum denotes the same combination in every
group. Shared definitions make the stratum *labels* comparable across groups, but
they are necessary rather than sufficient for the VPCs themselves to be directly
comparable: a group need not contain every stratum, so two groups' VPCs can be
estimated over different sets of *populated* strata, and differences in residual
variance and `var_between` still matter (`compare_maihda_groups()` warns when
groups end up with different populated strata; see the
[group comparison vignette](group_comparison.html)). This is a *stratified*
analysis (one model per group), not a cross-classified model.

The bundled `maihda_country_data` (OECD PISA 2018) is designed for exactly this:
it asks how much the joint **gender x socioeconomic-status** inequality in
mathematics achievement differs across six countries.

```{r group-compare}
data("maihda_country_data")

group_cmp <- compare_maihda_groups(
  math ~ 1 + (1 | gender:ses),
  data  = maihda_country_data,
  group = "country"
)
group_cmp

# VPC by country (add bootstrap = TRUE for per-group confidence intervals)
plot(group_cmp, type = "vpc")

# Between- vs within-stratum variance share by country
plot(group_cmp, type = "components")
```

Groups smaller than `min_group_n`, or with fewer than two populated strata, are
skipped with a warning rather than producing an unstable VPC; a group with no
between-stratum variation reports a VPC of 0 instead of erroring.

## Where to next

This vignette is the hub for the rest of the documentation:

- **[MAIHDA for binary outcomes](binary_outcomes.html)** -- logistic MAIHDA, the
  latent-scale VPC, and a discriminatory-accuracy (AUC) recipe.
- **[Interpreting MAIHDA plots and diagnostics](interpreting_plots.html)** -- how
  to read every plot type shown above, and what *not* to conclude from each.
- **[Comparing intersectional inequality across groups](group_comparison.html)** --
  the full `compare_maihda_groups()` / `maihda(group = ...)` workflow.
- **[Interactive data analysis](interactive_app.html)** -- the no-code Shiny
  dashboard.

## Interactive Shiny App

The MAIHDA package ships with a fully-featured, interactive Shiny Dashboard.

You can upload your own data (CSV, SPSS `.sav`, Stata `.dta`), dynamically select variables, and compute Stepwise PCV tables and prediction plots.

```{r eval=FALSE}
# Launch the interactive interface
run_maihda_app()
```

## References

- Evans, C. R., Williams, D. R., Onnela, J. P., & Subramanian, S. V. (2018). A multilevel approach to modeling health inequalities at the intersection of multiple social identities. *Social Science & Medicine*, 203, 64-73.

- Merlo, J. (2018). Multilevel analysis of individual heterogeneity and discriminatory accuracy (MAIHDA) within an intersectional framework. *Social Science & Medicine*, 203, 74-80.