--- title: "Introduction to MAIHDA" author: "Hamid Bulut" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to MAIHDA} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4.5 ) ``` ## Introduction The **MAIHDA** package provides specialized tools for conducting Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy. This modern epidemiological approach is highly effective for investigating intersectional health inequalities and understanding how joint social categories (e.g., Race x Gender x Education) influence individual outcomes. By utilizing multilevel mixed-effects models (via `lme4` or `brms`), MAIHDA allows researchers to: 1. Automatically construct intersectional strata. 2. Estimate between-stratum variance and Variance Partition Coefficients (VPC). 3. Evaluate the Proportional Change in Variance (PCV): a descriptive, model-dependent measure of how much the between-stratum variance changes when additive main effects are added (it is not a causal decomposition, and a negative PCV does not by itself establish hidden structural inequality). 4. Launch an interactive Shiny Dashboard for code-free analysis. ## Installation ```{r eval=FALSE} # Released version (CRAN): install.packages("MAIHDA") # Development version (GitHub): # install.packages("remotes") # remotes::install_github("hdbt/MAIHDA") ``` ## Quick Start: the `maihda()` one-call workflow If you just want the standard analysis, `maihda()` runs the whole pipeline in a single call -- it fits the model, summarises the VPC/ICC and variance components, and (optionally) compares intersectional inequality across a higher-level grouping variable. It returns one object with `print()`, `summary()`, and `plot()` methods. ```{r quickstart} library(MAIHDA) data("maihda_health_data") # Fit + variance summary in one call analysis <- maihda(BMI ~ Age + (1 | Gender:Race:Education), data = maihda_health_data) analysis # concise report (VPC/ICC, #strata) summary(analysis) # full variance components plot(analysis, type = "vpc") # any plot.maihda_model type works here ``` You can add a higher-level grouping variable to also compare across its levels. `maihda_country_data` (PISA 2018) has a real country grouping; this workflow has its own [group comparison vignette](group_comparison.html), so here we just point to it: ```{r quickstart-group, eval=FALSE} data("maihda_country_data") by_country <- maihda(math ~ 1 + (1 | gender:ses), data = maihda_country_data, group = "country") by_country plot(by_country, type = "group_vpc") ``` The sections below walk through the same steps individually with `fit_maihda()` and friends, which you can use directly when you need finer control. ## Real-World Example Analysis The package includes a pedagogical subset of the National Health and Nutrition Examination Survey (`maihda_health_data`). We will use this to examine how Body Mass Index (BMI) varies across intersectional demographic groups. > **Note.** The bundled `maihda_health_data` and `maihda_country_data` are for teaching only. They are non-random subsamples, and the package ignores the surveys' weights and complex sampling design (and uses a single PISA plausible value), so the results below are **not** survey-representative and should not be read as substantive population findings. ### Step 1 & 2: Create Intersectional Strata and Fit a Null MAIHDA Model Use `fit_maihda()` to combine multiple social categories directly in the random effect formula. Providing the variables in the random effect combined with `:` allows the function to automatically build the intersectional strata on the fly and fit the multilevel model. ```{r null-model} # PVC compares variance across models, so both models must use the same # analytic sample. Keep complete cases for all variables used below. health_complete <- maihda_health_data[complete.cases( maihda_health_data[, c("BMI", "Age", "Gender", "Race", "Education", "Poverty")] ), ] # Fit the initial Null model with auto-generated strata model_null <- fit_maihda( BMI ~ 1 + (1 | Gender:Race:Education), data = health_complete, engine = "lme4" ) # Summarize the variance components (VPC) summary_null <- summary(model_null) print(summary_null) ``` **Interpretation:** The resulting Variance Partition Coefficient (VPC or ICC) tells us what percentage of the total variance in BMI in the population lies *between* the intersectional social groups, rather than just *within* them. ### Step 3: Evaluate Proportional Change in Variance (PCV) To understand whether these intersectional inequalities are largely the sum of their parts (additive), we evaluate how much the between-stratum variance changes when the additive main effects are added to the model. If the variance drops substantially (high PCV), the between-stratum differences are much smaller once the additive main effects are included. If it remains or even *increases* (negative PCV), the additive main effects account for little of it. Interpret this cautiously: the PCV is a model-dependent variance change, not a causal decomposition, and a low or negative value does not by itself prove a "true" intersectional interaction -- it can also reflect suppression, rescaling (on the latent scale for non-Gaussian models), sample composition, or estimation uncertainty. ```{r adjusted-model} # Fit an adjusted model on the SAME analytic sample model_adj <- fit_maihda( BMI ~ Age + Gender + Race + Education + Poverty + (1 | Gender:Race:Education), data = health_complete ) # Point estimate of the proportional change in between-stratum variance pcv_result <- calculate_pvc(model_null, model_adj) print(pcv_result) ``` For publication-ready uncertainty, add parametric bootstrap confidence intervals. This refits the model many times, so it is not run when the vignette is built: ```{r pcv-bootstrap, eval=FALSE} pcv_boot <- calculate_pvc(model_null, model_adj, bootstrap = TRUE, n_boot = 500) print(pcv_boot) ``` ### Step 4: Stepwise PCV Decomposition Often, researchers want to know exactly *which* variable explained the variance. Use the `stepwise_pcv()` function to add covariates one-by-one and track the variance dynamically. ```{r stepwise} # Run a stepwise variance decomposition using the prepared data with strata stepwise_results <- stepwise_pcv( data = model_null$original_data, outcome = "BMI", vars = c("Age", "Gender", "Race", "Education", "Poverty") ) print(stepwise_results) ``` Negative step PCVs in this table indicate an "unmasking"/suppression pattern: adding a variable increased the between-stratum variance. This is a model-dependent change in variance, not proof that a hidden structural inequality was causally revealed -- a negative PCV can also reflect suppression, rescaling, sample composition, or estimation uncertainty, so interpret it cautiously alongside the fitted effects. ### Step 5: Visualizations The package provides multiple pre-configured, advanced visualization options for checking your model estimates natively mirroring the Shiny application logic. The dedicated [plot interpretation vignette](interpreting_plots.html) explains how to read each one. ```{r plot-predicted} # Predicted stratum values with 95% CIs plot(model_adj, type = "predicted") ``` ```{r plot-vpc} # Variance partition (VPC) visualization plot(model_adj, type = "vpc") ``` ```{r plot-risk} # Mean predicted outcome against the stratum random effect plot(model_adj, type = "risk_vs_effect") ``` ```{r plot-decomp} # Additive versus Intersectional Effect decomposition plot(model_adj, type = "effect_decomp") ``` ```{r plot-prediction-deviation} # Individual Prediction Deviance Dashboard plot(model_adj, type = "prediction_deviation") ``` The ternary diagnostic needs the optional `ggtern` package, so it is only drawn when that package is installed: ```{r plot-ternary, eval = requireNamespace("ggtern", quietly = TRUE), warning = FALSE, message = FALSE} # Ternary diagnostic: additive vs intersection-specific signal vs uncertainty plot(model_adj, type = "ternary") ``` ### Step 6: Compare Intersectional Inequality Across Groups When the data spans several higher-level contexts -- countries, regions, survey waves -- you often want to know whether the *degree* of intersectional inequality differs across them. `compare_maihda_groups()` fits a separate intercept-only MAIHDA model within each level of a grouping variable and reports the VPC/ICC and variance components side by side. By default the intersectional strata are defined once on the full data (`shared_strata = TRUE`), so a stratum denotes the same combination in every group. Shared definitions make the stratum *labels* comparable across groups, but they are necessary rather than sufficient for the VPCs themselves to be directly comparable: a group need not contain every stratum, so two groups' VPCs can be estimated over different sets of *populated* strata, and differences in residual variance and `var_between` still matter (`compare_maihda_groups()` warns when groups end up with different populated strata; see the [group comparison vignette](group_comparison.html)). This is a *stratified* analysis (one model per group), not a cross-classified model. The bundled `maihda_country_data` (OECD PISA 2018) is designed for exactly this: it asks how much the joint **gender x socioeconomic-status** inequality in mathematics achievement differs across six countries. ```{r group-compare} data("maihda_country_data") group_cmp <- compare_maihda_groups( math ~ 1 + (1 | gender:ses), data = maihda_country_data, group = "country" ) group_cmp # VPC by country (add bootstrap = TRUE for per-group confidence intervals) plot(group_cmp, type = "vpc") # Between- vs within-stratum variance share by country plot(group_cmp, type = "components") ``` Groups smaller than `min_group_n`, or with fewer than two populated strata, are skipped with a warning rather than producing an unstable VPC; a group with no between-stratum variation reports a VPC of 0 instead of erroring. ## Where to next This vignette is the hub for the rest of the documentation: - **[MAIHDA for binary outcomes](binary_outcomes.html)** -- logistic MAIHDA, the latent-scale VPC, and a discriminatory-accuracy (AUC) recipe. - **[Interpreting MAIHDA plots and diagnostics](interpreting_plots.html)** -- how to read every plot type shown above, and what *not* to conclude from each. - **[Comparing intersectional inequality across groups](group_comparison.html)** -- the full `compare_maihda_groups()` / `maihda(group = ...)` workflow. - **[Interactive data analysis](interactive_app.html)** -- the no-code Shiny dashboard. ## Interactive Shiny App The MAIHDA package ships with a fully-featured, interactive Shiny Dashboard. You can upload your own data (CSV, SPSS `.sav`, Stata `.dta`), dynamically select variables, and compute Stepwise PCV tables and prediction plots. ```{r eval=FALSE} # Launch the interactive interface run_maihda_app() ``` ## References - Evans, C. R., Williams, D. R., Onnela, J. P., & Subramanian, S. V. (2018). A multilevel approach to modeling health inequalities at the intersection of multiple social identities. *Social Science & Medicine*, 203, 64-73. - Merlo, J. (2018). Multilevel analysis of individual heterogeneity and discriminatory accuracy (MAIHDA) within an intersectional framework. *Social Science & Medicine*, 203, 74-80.