Breaking the Black Box: Inherently Interpretable Physics-Informed Machine Learning for Imbalanced Seismic Data

Published on 27 August 2025

Authors: Sreenath Vemula, Filippo Gatti, Pierre Jehel

Earthquakes have been the most damaging natural disasters of the past century, which makes ground motion models (GMMs) critical components of seismic risk mitigation and infrastructure design. As strong-motion databases expand rapidly, machine learning (ML) models are increasingly applied to GMM development. However, existing ML-based GMMs rely on post-hoc explanations (e.g., SHAP) to understand model behavior, and this opacity undermines confidence in high-stakes engineering decisions. Moreover, seismic datasets are severely imbalanced: large-magnitude near-field records are scarce compared with abundant small-magnitude far-field data, which causes systematic underprediction of the critical, damage-inducing scenarios. Despite these fundamental limitations, little research addresses interpretability and data imbalance together. This study develops an inherently interpretable neural network architecture that employs independent additive pathways, coupled with a novel loss function, HazBinLoss. HazBinLoss combines physics-informed weighting with inverse bin-count scaling to systematically address underfitting in sparse, high-hazard regions. The developed model achieves a mean squared error of 0.5402, a mean absolute error of 0.5795, and a coefficient of determination of 88.51%. Pathway scaling analysis confirms adherence to established seismological principles across all spectral periods. Comparative SHAP analysis reveals significant limitations, with systematic underestimation of the contributions of multiple pathways, validating the need for inherently interpretable architectures. Nonlinear mixed-effects analysis shows unbiased residuals, with sigma components ranging from 0.4033 to 0.5463 (inter-event), 0.1641 to 0.4467 (inter-region), 0.5542 to 0.6892 (intra-region-event), and 0.7449 to 0.9460 (total). Model predictions agree closely with NGA-West2 GMMs and observed data across diverse input conditions. This interpretable framework establishes a foundation for transparent, physics-consistent ML adoption in seismic hazard assessment and risk-informed decision-making.
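The abstract describes HazBinLoss only at a high level, as physics-informed weighting combined with inverse bin-count scaling. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. It assumes a squared-error base loss and two-dimensional magnitude-distance bins. The physics-motivated factor in `hazard_weights` (with its parameters `m_ref`, `r_ref`, and `alpha`) is a hypothetical stand-in, and all function names are illustrative. Records in sparsely populated bins receive proportionally larger weights, so rare large-magnitude, near-field records are not swamped by abundant small, distant ones.

```python
import numpy as np


def inverse_bin_count_weights(magnitude, distance, mag_edges, dist_edges):
    """Weight each record by 1 / (number of records in its magnitude-distance bin)."""
    magnitude = np.asarray(magnitude, dtype=float)
    distance = np.asarray(distance, dtype=float)
    n_m, n_r = len(mag_edges) - 1, len(dist_edges) - 1
    # np.digitize returns 1-based bin positions; shift and clip so out-of-range
    # values fall into the first or last bin instead of an invalid index.
    m_idx = np.clip(np.digitize(magnitude, mag_edges) - 1, 0, n_m - 1)
    r_idx = np.clip(np.digitize(distance, dist_edges) - 1, 0, n_r - 1)
    counts = np.zeros((n_m, n_r))
    np.add.at(counts, (m_idx, r_idx), 1.0)  # number of records per 2-D bin
    w = 1.0 / counts[m_idx, r_idx]  # each record's own bin has count >= 1
    return w / w.mean()  # rescale so the average weight is 1


def hazard_weights(magnitude, distance, m_ref=6.0, r_ref=20.0, alpha=1.0):
    """HYPOTHETICAL physics-motivated factor: boosts large, near-field records."""
    magnitude = np.asarray(magnitude, dtype=float)
    distance = np.asarray(distance, dtype=float)
    excess = np.clip(magnitude - m_ref, 0.0, None)  # only M > m_ref is boosted
    return 1.0 + alpha * excess / (1.0 + distance / r_ref)


def hazard_bin_loss(y_true, y_pred, magnitude, distance, mag_edges, dist_edges):
    """Weighted mean squared error using the product of both weight factors."""
    w = inverse_bin_count_weights(magnitude, distance, mag_edges, dist_edges)
    w = w * hazard_weights(magnitude, distance)
    resid = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sum(w * resid**2) / np.sum(w))


# Usage: four small far-field records share one bin; one large near-field
# record sits alone in its bin and receives four times the bin weight.
mag = np.array([4.0, 4.2, 4.5, 4.1, 7.2])
dist = np.array([80.0, 90.0, 100.0, 85.0, 5.0])
mag_edges = np.array([3.0, 5.0, 6.5, 8.0])
dist_edges = np.array([0.0, 30.0, 200.0])
print(inverse_bin_count_weights(mag, dist, mag_edges, dist_edges))
```

Multiplying the two factors keeps the imbalance correction (driven purely by data density) separate from the physics-based prior (driven by magnitude and distance), so each can be tuned or ablated independently. In a real training loop the same weighting would be written in the framework used to train the network.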