FibroX: a machine learning model for detecting and prognosticating advanced fibrosis in metabolic dysfunction-associated steatotic liver disease

Basile Njei; Yazan A. Al-Ajlouni; Ysabel Ilagan-Ying; Omar Al Ta’ani; Sarpong Boateng; Nelvis Njei; Ulrick Sidney Kanmounye; Joseph Lim; Bubu Banini

doi:10.21037/tgh-2026-0017

Original Article


Basile Njei^1,2,3,4,5,6, Yazan A. Al-Ajlouni^6,7, Ysabel Ilagan-Ying^2,3, Omar Al Ta’ani^8, Sarpong Boateng^9, Nelvis Njei^10, Ulrick Sidney Kanmounye^11, Joseph Lim^2, Bubu Banini^2

¹Engelhardt School of Global Health and Bioethics, Euclid University, Bangui, Central African Republic; ²Section of Digestive Diseases, Department of Medicine, Yale University, New Haven, CT, USA; ³VA Connecticut Healthcare, West Haven, CT, USA; ⁴Ohio University Heritage College of Osteopathic Medicine, Athens, OH, USA; ⁵Yale Liver Center, Yale New Haven Health, New Haven, CT, USA; ⁶Yale International Medicine Program, Yale University, New Haven, CT, USA; ⁷Department of Rehabilitation, Montefiore Medical Center, Bronx, NY, USA; ⁸Department of Medicine, Allegheny Health Network, Pittsburgh, PA, USA; ⁹Yale Affiliated Hospitals Program, Bridgeport, CT, USA; ¹⁰Centers for Machine Learning Intelligence (M-LINT), Ellicott City, MD, USA; ¹¹Research Department, Association of Future African Neurosurgeons, Yaounde, Cameroon

Contributions: (I) Conception and design: B Njei; (II) Administrative support: B Njei, YA Al-Ajlouni; (III) Provision of study materials or patients: B Njei, Y Ilagan-Ying, O Al Ta’ani, S Boateng; (IV) Collection and assembly of data: YA Al-Ajlouni, Y Ilagan-Ying, O Al Ta’ani, S Boateng, N Njei, US Kanmounye; (V) Data analysis and interpretation: All authors; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Basile Njei, MD, MPH, PhD. Engelhardt School of Global Health and Bioethics, Euclid University, Avenue de France, Campus ENAM, Bangui, Central African Republic; Section of Digestive Diseases, Department of Medicine, Yale University, New Haven, CT, USA; VA Connecticut Healthcare, West Haven, CT, USA; Ohio University Heritage College of Osteopathic Medicine, Athens, OH, USA; Yale Liver Center, Yale New Haven Health, New Haven, CT, USA; Yale International Medicine Program, Yale University, New Haven, CT, USA. Email: basilenjei@gmail.com.

Background: Noninvasive liver disease assessment (NILDA) tools, such as the fibrosis-4 (FIB-4) index, are widely used to identify patients with metabolic dysfunction-associated steatotic liver disease (MASLD) at risk of advanced fibrosis (≥ F3). However, their predictive accuracy for fibrosis staging and long-term outcomes remains limited. This study aimed to develop and validate FibroX, an explainable machine learning-based model for improving the detection of advanced fibrosis and the prediction of long-term clinical outcomes in adults with MASLD.

Methods: FibroX was developed using data from adults with MASLD (N=1,487) in the National Health and Nutrition Examination Survey (NHANES) 2017–2020. Fibrosis stage was determined using a two-step approach—initial NILDA screening followed by vibration-controlled transient elastography (TE)—in accordance with 2024 American Association for the Study of Liver Diseases guidance. The model was built using extreme gradient boosting (XGBoost), optimized at a 95% specificity threshold, and internally validated using 5-fold cross-validation. External validation was performed in two independent cohorts: 337 biopsy-confirmed MASLD patients and 4,276 participants from NHANES III with up to 30 years of mortality follow-up. Model interpretability was evaluated using SHapley Additive Explanations (SHAP).

Results: In NHANES 2017–2020, FibroX demonstrated superior discrimination for advanced fibrosis compared with FIB-4 [area under the receiver operating characteristic curve (AUROC) 0.97 vs. 0.62; P<0.001]. In biopsy-confirmed MASLD patients, FibroX also outperformed FIB-4 (AUROC 0.84 vs. 0.82; P<0.001). In NHANES III, FibroX predicted all-cause mortality (C-statistic 0.80) and remained independently associated with cardiovascular mortality after adjustment for established risk factors [adjusted hazard ratio (HR) 1.22; 95% confidence interval (CI): 1.01–1.47]. SHAP analysis identified platelet count, age, hemoglobin A1c, aspartate aminotransferase (AST), and estimated glomerular filtration rate (eGFR) as the most influential predictors.

Conclusions: FibroX is a transparent and accurate machine learning model that improves detection of advanced fibrosis and prediction of long-term cardiovascular mortality compared with FIB-4. These findings support the use of explainable machine learning-based noninvasive tools for risk stratification and outcome prediction in patients with MASLD.

Keywords: Metabolic dysfunction-associated steatotic liver disease; advanced fibrosis; explainable artificial intelligence; SHapley Additive Explanations (SHAP)

Received: 13 February 2026; Accepted: 23 April 2026; Published online: 28 May 2026.


Highlight box

Key findings

• FibroX, an explainable machine learning model using seven routine clinical variables, demonstrated high accuracy for detecting advanced fibrosis in metabolic dysfunction-associated steatotic liver disease (MASLD).

• FibroX outperformed fibrosis-4 (FIB-4) in the derivation cohort (area under the receiver operating characteristic curve 0.97 vs. 0.62) and showed modest but significant improvement in biopsy-based validation.

• FibroX predicted long-term outcomes, including 30-year all-cause and cardiovascular mortality, with independent prognostic value after multivariable adjustment.

• SHapley Additive Explanations (SHAP) analysis identified platelet count, age, hemoglobin A1c, aspartate aminotransferase, and estimated glomerular filtration rate as key contributors to model predictions.

What is known and what is new?

• Existing noninvasive tools (e.g., FIB-4) are widely used but have limitations in specificity, intermediate-risk classification, and prognostic performance.

• FibroX integrates explainable machine learning with routinely available clinical data to improve fibrosis detection and provides transparent, individualized risk interpretation while incorporating long-term mortality prediction.

What is the implication, and what should change now?

• FibroX offers a scalable, noninvasive tool that may enhance early identification of high-risk MASLD patients and improve risk stratification beyond current approaches.

• Its interpretability supports clinical trust and integration into real-world workflows, particularly in primary care settings.

• Future efforts should focus on prospective validation, integration into electronic health records, and evaluation alongside existing clinical pathways to guide implementation.

Introduction

Metabolic dysfunction-associated steatotic liver disease (MASLD) is a growing global health issue, driven by rising obesity, insulin resistance, and metabolic disorders. It affects 24% to 38% of the global population, with rates as high as 67% to 83% in individuals with type 2 diabetes (1-3). MASLD can progress to advanced fibrosis (≥F3), cirrhosis, and liver failure, and is associated with an increased risk of cardiovascular disease and malignancies (4). Early detection of advanced fibrosis is critical for timely intervention, which can significantly reduce liver-related morbidity and mortality (5), and early identification of fibrosis is associated with better long-term outcomes, even in asymptomatic individuals (6).

Current diagnostic tools, such as liver biopsy and transient elastography (TE), and non-invasive scores, including the fibrosis-4 (FIB-4) index and the nonalcoholic fatty liver disease (NAFLD) fibrosis score (NFS), have limitations. Liver biopsy, though the gold standard, is invasive and prone to sampling error, while TE can be affected by liver inflammation and requires specialized equipment (7-9). Current fibrosis risk stratification increasingly relies on stepwise noninvasive pathways that combine simple blood-based scores, such as FIB-4, with imaging-based assessment when indicated; however, these approaches still have important limitations in specificity, intermediate-risk classification, and performance across populations (10-12). Thus, there is a clear need for accurate, non-invasive tools that leverage readily available clinical data to detect advanced fibrosis early in the disease process.

Artificial intelligence (AI) is transforming medical diagnostics, particularly in imaging and non-invasive predictive modeling, by improving diagnostic accuracy and efficiency. For example, AI algorithms are now routinely used in medical imaging to detect liver conditions, such as fibrosis and tumors, with high precision by analyzing ultrasound, computed tomography (CT), and magnetic resonance imaging (MRI) scans (13,14). Beyond imaging, formula-based non-invasive scores such as the FIB-4 index and NFS have transformed liver disease assessment by estimating fibrosis stage without invasive biopsy, enabling timely intervention (15). While AI holds immense potential to improve clinical outcomes, challenges such as the need for large, diverse training datasets and algorithmic transparency persist. Clinical adoption of AI depends on the explainability of its predictions. Tree-based ensembles such as extreme gradient boosting (XGBoost), when paired with attribution methods that identify the contributors to each prediction, move beyond the classic “black box” pattern-recognition model and can foster trust among healthcare providers and patients, thereby supporting clinical integration (16).

Existing AI models for liver fibrosis diagnosis, though promising, still face significant limitations, such as difficulty distinguishing between intermediate fibrosis stages and variable diagnostic accuracy across diverse populations (17). Non-AI prediction models rely on fixed formulas, which limits their precision and specificity for individual patients and makes it harder to guide optimal management. There remains a need for more accurate, non-invasive tools that provide clinicians with actionable insights from widely available clinical data (18). In this study, we developed, validated, and assessed the long-term prognostic value of FibroX, an explainable XGBoost-based machine learning (ML) model for detecting advanced fibrosis in MASLD. Using three independent cohorts [a derivation cohort from the National Health and Nutrition Examination Survey (NHANES) 2017–2020, an external validation cohort of biopsy-proven fibrosis cases from Yale University, and a prognostication cohort from NHANES III (1988–1994) with up to 30 years of follow-up], we aimed to evaluate the model’s diagnostic accuracy and its ability to stratify mortality risk. Our findings provide insights into the role of ML-based risk stratification in MASLD and its potential implications for clinical practice. We present this article in accordance with the TRIPOD reporting checklist (available at https://tgh.amegroups.com/article/view/10.21037/tgh-2026-0017/rc).

Methods

Study design and populations

We used three independent cohorts to develop, validate, and evaluate FibroX for advanced fibrosis in MASLD. The derivation cohort included 1,487 adults from NHANES 2017–2020 with hepatic steatosis [transient elastography-controlled attenuation parameter (TE-CAP) ≥275 dB/m] and ≥1 cardiometabolic risk factor (19). The validation cohort consisted of patients with biopsy-proven MASLD from the Yale Fatty Liver Disease Program. Prognostic performance was assessed in NHANES III (1988–1994) participants (N=4,276) followed for up to 30 years. Fibrosis assessment differed across cohorts, with a two-step noninvasive liver disease assessment (NILDA) plus TE approach in NHANES 2017–2020, biopsy-confirmed staging in the external validation cohort, and ultrasound-defined steatosis with cardiometabolic criteria in NHANES III for prognostic analyses. Key differences in cohort characteristics and fibrosis assessment methods are summarized in Table S1. Detailed eligibility criteria are provided in Appendix 1.
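As an illustration of the derivation-cohort inclusion rule, the following Python sketch applies the two criteria stated above (TE-CAP ≥275 dB/m and at least one cardiometabolic risk factor) to a single participant record. The `Participant` record and its field names are hypothetical and are not NHANES variable codes; the full eligibility and exclusion criteria in Appendix 1 are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CAP_STEATOSIS_DB_M = 275.0  # steatosis threshold stated in the Methods


@dataclass
class Participant:
    """Hypothetical record layout, for illustration only."""
    cap_db_m: Optional[float]                # TE controlled attenuation parameter (dB/m)
    cardiometabolic_flags: Tuple[bool, ...]  # one flag per cardiometabolic risk factor


def meets_derivation_criteria(p: Participant) -> bool:
    """True if TE-CAP >= 275 dB/m and at least one cardiometabolic risk factor is present."""
    if p.cap_db_m is None:  # no valid CAP reading, so steatosis cannot be assessed
        return False
    return p.cap_db_m >= CAP_STEATOSIS_DB_M and any(p.cardiometabolic_flags)


# Usage: CAP of 301 dB/m with a single risk factor flagged meets both criteria.
print(meets_derivation_criteria(Participant(301.0, (False, True, False))))
```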

Outcomes and predictors

In the derivation cohort, the primary outcome was advanced fibrosis (≥F3) as estimated by TE-based liver stiffness measurement (TE-LSM). Fibrosis stage was estimated using a modified two-step approach aligned with the 2024 American Association for the Study of Liver Diseases (AASLD) guidance, combining blood-based NILDA tools with vibration-controlled TE (advanced fibrosis defined as ≥12 kPa). In the validation cohort, the outcome was biopsy-confirmed advanced liver fibrosis. In the prognostic cohort, outcomes included all-cause and cardiovascular mortality, with a 30-year follow-up. Candidate predictors included age, body mass index (BMI), alanine aminotransferase (ALT), aspartate aminotransferase (AST), platelet count, estimated glomerular filtration rate [eGFR, 2021 Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation], and HbA1c. These variables were selected based on prior evidence linking them to fibrosis and mortality in MASLD (20). Additionally, we assessed the performance of FibroX and FIB-4 for liver-related clinical outcomes [i.e., liver events, liver decompensation, hepatocellular carcinoma, and at-risk metabolic dysfunction-associated steatohepatitis (MASH)] and evaluated correlations between FibroX scores and histologic features in the biopsy cohort.
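The derived quantities referred to here can be sketched in Python. FIB-4 follows its standard formula, age × AST / (platelet count × √ALT), with platelets in 10⁹/L (equivalent to the 1,000 cells/µL units in Table 1), and eGFR follows the race-free 2021 CKD-EPI creatinine equation. The labeling function is a simplified sketch of a two-step pathway: the FIB-4 rule-out cutoff of 1.3 is an assumption taken from common practice rather than from this article, whereas the ≥12 kPa stiffness threshold is the one stated above. The study’s “modified” approach is detailed in Appendix 1 and may differ.

```python
import math
from typing import Optional

FIB4_RULE_OUT = 1.3      # ASSUMED step-1 cutoff (not stated in this article)
LSM_ADVANCED_KPA = 12.0  # advanced fibrosis threshold stated in the Methods


def fib4(age_years: float, ast_u_l: float, platelets_10e9_l: float, alt_u_l: float) -> float:
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def egfr_ckd_epi_2021(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """Race-free 2021 CKD-EPI creatinine equation, in mL/min/1.73 m^2."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = 142.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.200 * 0.9938 ** age_years
    return egfr * 1.012 if female else egfr


def two_step_label(fib4_value: float, lsm_kpa: Optional[float]) -> Optional[int]:
    """1 = estimated advanced fibrosis, 0 = not advanced, None = cannot be staged."""
    if fib4_value < FIB4_RULE_OUT:
        return 0  # step 1: low-risk blood-based score, TE not required
    if lsm_kpa is None:
        return None  # step 2 required but no valid TE measurement
    return int(lsm_kpa >= LSM_ADVANCED_KPA)  # step 2: vibration-controlled TE
```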

Model development and validation

An XGBoost classifier, designated FibroX, was developed to predict advanced fibrosis. XGBoost is a supervised machine learning method that builds decision trees sequentially, each fitted to the remaining errors of the current ensemble, to capture nonlinear relationships among clinical and laboratory variables. The model uses seven routinely available predictors (age, BMI, AST, ALT, platelet count, eGFR, and hemoglobin A1c) to estimate the probability of advanced fibrosis. Although several of these variables overlap with those used in FIB-4, FibroX differs by using gradient boosting to adaptively weight features and model complex interactions, with the aim of improving accuracy and generalizability. XGBoost was selected for its strong performance with structured clinical data, its ability to model complex nonlinear relationships, and its resilience to missing values, which make it well suited to population-based datasets such as NHANES. XGBoost’s native handling of missing data through learned split directions was used, and no additional imputation was performed. Continuous variables were used on their original scale, as tree-based models do not require feature normalization. To enhance interpretability, we applied SHapley Additive Explanations (SHAP), which quantify each variable’s contribution to the model output and are described in detail below. Model development and internal validation were performed using stratified 5-fold cross-validation, with performance metrics averaged across folds to estimate generalizability. Class imbalance was addressed by applying class weights during training, with higher weights assigned to the minority class (advanced fibrosis) to improve sensitivity to clinically significant cases. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Logistic regression served as a benchmark comparator.
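A lightweight sketch of this internal-validation loop follows. The stratified splitter and the class weight are written in plain Python so they can be checked directly; the training function assumes the `xgboost` and `scikit-learn` packages are installed and uses `scale_pos_weight` (negatives divided by positives) as one common way to up-weight the minority class. The article does not state which weighting scheme or values were used, so these choices and the function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def stratified_kfold(y: Sequence[int], k: int = 5,
                     seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits that each preserve the class ratio of y."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(y)) if i not in test_set]
        splits.append((train, test))
    return splits


def positive_class_weight(y: Sequence[int]) -> float:
    """Negatives / positives, a common value for XGBoost's scale_pos_weight."""
    n_pos = sum(1 for v in y if v == 1)
    return (len(y) - n_pos) / n_pos


def cross_validate_xgb(X, y, k: int = 5, seed: int = 0, **params) -> Tuple[float, float]:
    """Mean and SD of fold AUROCs. Sketch only: needs xgboost and scikit-learn."""
    import numpy as np
    from sklearn.metrics import roc_auc_score
    from xgboost import XGBClassifier

    X, y = np.asarray(X, dtype=float), np.asarray(y)
    aucs = []
    for train, test in stratified_kfold(y.tolist(), k, seed):
        model = XGBClassifier(objective="binary:logistic",
                              scale_pos_weight=positive_class_weight(y[train].tolist()),
                              **params)
        model.fit(X[train], y[train])  # NaNs follow XGBoost's learned default split directions
        aucs.append(roc_auc_score(y[test], model.predict_proba(X[test])[:, 1]))
    return float(np.mean(aucs)), float(np.std(aucs))
```

Averaging fold-level AUROCs, as the function returns, mirrors the mean ± SD summary reported in the Results.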

The XGBoost classifier was implemented using a binary logistic objective function to estimate the probability of advanced fibrosis. Model hyperparameters, including maximum tree depth, learning rate, number of estimators, subsampling ratio, and column sampling, were optimized using the tree-structured Parzen estimator (TPE) framework. Optimization was guided by maximizing the harmonic mean of specificity and PPV, reflecting the clinical priority of minimizing false positives while maintaining diagnostic precision. Final model parameters were selected based on performance in the validation folds.
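The tuning criterion can be expressed as a small scoring function plus a Hyperopt TPE search, sketched below. The search ranges are hypothetical placeholders (the article does not report them), and `evaluate` is an assumed callback that trains on the training folds with the given parameters and returns the harmonic mean of specificity and PPV on the validation folds. Because Hyperopt minimizes its objective, the score is negated.

```python
import math
from typing import Sequence


def spec_ppv_harmonic_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of specificity and PPV for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    if specificity + ppv == 0:
        return 0.0
    return 2 * specificity * ppv / (specificity + ppv)


def tune_with_tpe(evaluate, max_evals: int = 50):
    """TPE search over XGBoost hyperparameters. Sketch only: needs hyperopt."""
    from hyperopt import STATUS_OK, Trials, fmin, hp, tpe

    space = {  # hypothetical ranges, not taken from the article
        "max_depth": hp.quniform("max_depth", 2, 8, 1),
        "learning_rate": hp.loguniform("learning_rate", math.log(0.01), math.log(0.3)),
        "n_estimators": hp.quniform("n_estimators", 100, 1000, 50),
        "subsample": hp.uniform("subsample", 0.5, 1.0),
        "colsample_bytree": hp.uniform("colsample_bytree", 0.5, 1.0),
    }

    def objective(params):
        # quniform yields floats; XGBoost expects integers for these two
        params = dict(params, max_depth=int(params["max_depth"]),
                      n_estimators=int(params["n_estimators"]))
        return {"loss": -evaluate(params), "status": STATUS_OK}

    trials = Trials()
    best = fmin(fn=objective, space=space, algo=tpe.suggest,
                max_evals=max_evals, trials=trials)
    return best, trials
```

A harmonic mean rewards configurations only when both specificity and PPV are high, since either one near zero drags the score toward zero.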

External validation was performed in the Yale biopsy cohort of patients with biopsy-confirmed MASLD. Prognostic value was further tested in NHANES III using Cox proportional hazards models for 30-year all-cause and cardiovascular mortality.
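The study fitted its mortality models with Lifelines (for example, via `CoxPHFitter().fit(df, duration_col=..., event_col=...)`). To make the mechanics visible without that dependency, the sketch below fits a univariable Cox model for a single 0/1 covariate, such as FibroX ≥0.54, by Newton–Raphson on the partial likelihood with the Breslow treatment of ties. It is not the adjusted multivariable model reported in the Results, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def cox_binary_hr(time: Sequence[float], event: Sequence[int], x: Sequence[int],
                  max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Univariable Cox PH fit for a 0/1 covariate; returns (hazard ratio, SE of log HR)."""
    n = len(time)
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for i in range(n):
            if not event[i]:
                continue  # censored subjects contribute only through risk sets
            # Breslow risk set: everyone whose follow-up time is >= this event time
            risk = [j for j in range(n) if time[j] >= time[i]]
            s0 = sum(math.exp(beta * x[j]) for j in risk)
            s1 = sum(x[j] * math.exp(beta * x[j]) for j in risk)
            p = s1 / s0            # weighted share of exposed subjects still at risk
            score += x[i] - p      # gradient of the log partial likelihood
            info += p * (1.0 - p)  # observed information (x is binary)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return math.exp(beta), 1.0 / math.sqrt(info)
```

A 95% CI for the hazard ratio then follows as exp(log HR ± 1.96 × SE).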

Explainability and fairness

Model interpretability was evaluated using SHAP, which quantified the relative influence of each predictor on FibroX outputs at both the global and individual levels. This approach provided transparency into the model’s decision process, supporting clinical interpretability and trust. Fairness was evaluated by testing performance across age, sex, and racial subgroups.
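Subgroup fairness checks of this kind amount to recomputing discrimination within each stratum. The sketch below does this with a rank-based (Mann–Whitney) AUROC; the grouping variable and function names are illustrative, and the article does not specify which performance metrics were compared across age, sex, and racial subgroups.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def auroc(y_true: Sequence[int], score: Sequence[float]) -> float:
    """P(random positive outscores random negative); ties count as 0.5."""
    pos = [s for s, t in zip(score, y_true) if t == 1]
    neg = [s for s, t in zip(score, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def subgroup_auroc(y_true: Sequence[int], score: Sequence[float],
                   group: Sequence[Hashable]) -> Dict[Hashable, float]:
    """AUROC within each subgroup that contains both outcome classes."""
    buckets = defaultdict(lambda: ([], []))
    for t, s, g in zip(y_true, score, group):
        buckets[g][0].append(t)
        buckets[g][1].append(s)
    out = {}
    for g, (ys, ss) in buckets.items():
        if 0 < sum(ys) < len(ys):  # AUROC is undefined without both classes
            out[g] = auroc(ys, ss)
    return out
```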

Statistical analysis

Continuous variables were compared using t-tests and categorical variables using χ² tests. AUROCs were compared with DeLong’s test. Mortality models were adjusted for demographic and cardiometabolic risk factors, including age, sex, race, smoking, education, hypertension, and diabetes. Analyses were conducted in Python 3.8 using XGBoost, Hyperopt, and Lifelines.
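DeLong’s test compares two AUROCs measured on the same patients by accounting for their correlation. The following plain-Python sketch implements the standard structural-component form; it is quadratic in sample size, which is acceptable for illustration (faster sorting-based variants exist), and it is not the authors’ code.

```python
import math
from typing import Sequence, Tuple


def _psi(p: float, q: float) -> float:
    """Kernel: 1 if the positive outscores the negative, 0.5 for a tie, else 0."""
    return 1.0 if p > q else 0.5 if p == q else 0.0


def delong_paired(y_true: Sequence[int], score_a: Sequence[float],
                  score_b: Sequence[float]) -> Tuple[float, float, float]:
    """Paired DeLong test; returns (AUROC of a, AUROC of b, two-sided p-value)."""
    pos = [i for i, t in enumerate(y_true) if t == 1]
    neg = [i for i, t in enumerate(y_true) if t == 0]
    m, n = len(pos), len(neg)
    v10, v01 = [], []  # structural components, one list per model
    for s in (score_a, score_b):
        v10.append([sum(_psi(s[i], s[j]) for j in neg) / n for i in pos])
        v01.append([sum(_psi(s[i], s[j]) for i in pos) / m for j in neg])
    aucs = [sum(v) / m for v in v10]

    def cov(u: Sequence[float], w: Sequence[float]) -> float:
        mu, mw = sum(u) / len(u), sum(w) / len(w)
        return sum((a - mu) * (b - mw) for a, b in zip(u, w)) / (len(u) - 1)

    var = ((cov(v10[0], v10[0]) + cov(v10[1], v10[1]) - 2 * cov(v10[0], v10[1])) / m
           + (cov(v01[0], v01[0]) + cov(v01[1], v01[1]) - 2 * cov(v01[0], v01[1])) / n)
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return aucs[0], aucs[1], p
```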

A comprehensive description of cohort definitions, fibrosis staging procedures, exclusion criteria, and full analytic details is available in Appendix 1.

Ethics approval and consent to participate

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Yale University Institutional Review Board (No. HIC 2000027433). Informed consent for the Yale biopsy cohort was waived by the Yale University Institutional Review Board because the analysis used retrospective/de-identified patient-level data. Analyses of publicly available NHANES data did not require additional institutional review board approval.

Results

Study population and baseline characteristics

The analytic samples consisted of 1,487 participants in the training cohort, 337 in the validation biopsy cohort, and 4,276 in the prognostication cohort. In the training cohort, 19.8% had estimated advanced fibrosis. In the biopsy validation cohort, 48.1% had biopsy-confirmed advanced fibrosis. The prognostication cohort, followed for up to 30 years, included 4,276 individuals with MASLD, of whom 44.3% died from any cause, and 14.6% died from cardiovascular causes. Baseline demographic and clinical characteristics are presented in Table 1.

Table 1

Baseline demographic characteristics of the training, validation and prognostication cohorts of patients with MASLD

Variables	Training cohort (N=1,487)	Validation cohort (N=337)	Prognostication cohort (N=4,276)
Age (years)^†	49.5±16.3	51.5±14.4	47.6±15.3
Gender, %
Female	48.1	58.5	50.8
Male	51.9	41.5	49.2
Race-ethnicity, %
Hispanic	32.7	27.3	38.0
Non-Hispanic Asian	11.1	2.4	2.4
Non-Hispanic Black	15.4	4.5	23.6
Non-Hispanic White	35.8	62.3	34.5
Other	4.9	3.6	3.9
Body mass index (kg/m²)^†	32.5±5.8	34.6±8.3	30.1±6.5
Alanine aminotransferase (U/L)^†	24.4±17.2	68.9±75.4	23.3±21.0
Aspartate aminotransferase (U/L)^†	21.6±14.1	59.9±66.4	24.9±18.5
Gamma glutamyl transferase (IU/L)^†	36.5±48.6	126.3±130.2	42.9±62.4
Platelet count (1,000 cells/µL)^†	256.8±72.8	221.5±81.8	278.9±71.8
Hemoglobin A1c (%)^†	6.0±1.1	6.8±1.6	5.8±1.4
Direct HDL-cholesterol (mg/dL)^†	47.8±12.8	43.9±22.7	46.6±14.7
Triglyceride (mg/dL)^†	156.5±176.9	161.4±97.0	188.5±152.1
2021 CKD-EPI eGFR (mL/min/1.73 m²)^†	98.2±20.5	94.0±23.1	79.0±18.2
Creatinine (mg/dL)^†	0.8±0.3	0.9±0.6	1.1±0.3
Albumin (g/dL)^†	4.0±0.3	4.1±0.6	4.1±0.4
Cardiometabolic risk factors, n (%)
Obesity	157 (96.9)	96 (92.0)	3,774 (88.3)
Type 2 diabetes mellitus	124 (76.5)	53 (51.0)	2,232 (52.2)
Hypertension	105 (64.8)	64 (61.7)	2,207 (51.6)
Dyslipidemia	107 (66.0)	47 (45.1)	2,745 (64.2)
Blood-based NILDA—fibrosis-4^†	1.0±0.8	2.5±4.4	1.0±0.8
NAFLD fibrosis score^†	0.3±1.4	−0.5±2.3	−1.6±1.6

^†, presented as mean ± standard deviation. CKD-EPI, Chronic Kidney Disease Epidemiology Collaboration; eGFR, estimated glomerular filtration rate; HDL, high-density lipoprotein; MASLD, metabolic dysfunction-associated steatotic liver disease; NAFLD, nonalcoholic fatty liver disease; NILDA, noninvasive liver disease assessment.

Model development

We developed an XGBoost model, designated FibroX, to predict estimated advanced fibrosis in the training cohort. Across 5-fold cross-validation, FibroX achieved a mean ± standard deviation (SD) AUROC of 0.965±0.019, sensitivity of 98%±5%, specificity of 92%±4%, PPV of 77%±11%, and NPV of 99%±2% (Figure 1). The optimal FibroX prediction threshold (0.54 on a 0–1 probability scale) was determined by maximizing Youden’s index (Figure S1). The final model had an AUROC of 0.99 and, at this threshold, a sensitivity of 94%, specificity of 97%, PPV of 88%, and NPV of 98%. A logistic regression-based FibroX model developed for comparison had similar discrimination (AUROC 0.99) but a lower PPV (78%), favoring selection of the XGBoost-based FibroX for subsequent analyses. FibroX demonstrated superior discrimination compared with FIB-4 (AUROC 0.97 vs. 0.62; P<0.001) for detecting advanced fibrosis.
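Threshold selection by Youden’s index can be sketched as a scan over candidate cutoffs, as below. This toy function calls a case positive when its score is at or above the cutoff; the 0.54 threshold reported above comes from the authors’ analysis, not from this sketch.

```python
from typing import Sequence, Tuple


def youden_threshold(y_true: Sequence[int],
                     score: Sequence[float]) -> Tuple[float, float, float, float]:
    """Cutoff maximizing J = sensitivity + specificity - 1; returns (cutoff, J, sens, spec)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = (float("nan"), -1.0, 0.0, 0.0)
    for cut in sorted(set(score)):  # every observed score is a candidate cutoff
        tp = sum(1 for t, s in zip(y_true, score) if t == 1 and s >= cut)
        tn = sum(1 for t, s in zip(y_true, score) if t == 0 and s < cut)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if j > best[1]:  # strict inequality keeps the lowest cutoff among ties
            best = (cut, j, sens, spec)
    return best
```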

Figure 1 XGBoost-based FibroX performance during 5-fold cross-validation in the training cohort. The model achieved consistently high AUROC values across folds (0.94–1.00), demonstrating strong internal validity for detecting estimated advanced fibrosis (E[≥F3]). AUROC, area under the receiver operating characteristic curve; XGBoost, extreme gradient boosting.

External validation

In the external validation cohort (N=337) with biopsy-proven fibrosis staging, FibroX outperformed FIB-4 in overall accuracy (AUROC 0.84 vs. 0.82; P<0.001), classification (net reclassification index, +0.02), discrimination (+30.6%), and calibration (+40.8%; Figure S2). Although statistically significant, the magnitude of improvement over FIB-4 was modest.
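The article does not state whether a categorical or category-free net reclassification index was used. Assuming the two-category version, each patient is classed as high or low risk under the old rule (FIB-4, with an assumed cutoff) and the new rule (for example, FibroX ≥0.54), and the NRI rewards upward moves among patients with advanced fibrosis and downward moves among those without it. A minimal sketch:

```python
from typing import Sequence, Tuple


def categorical_nri(y_true: Sequence[int], old_high: Sequence[bool],
                    new_high: Sequence[bool]) -> Tuple[float, float, float]:
    """Two-category NRI; returns (total NRI, event NRI, non-event NRI)."""
    def moves(label: int) -> Tuple[float, float]:
        pairs = [(o, n) for t, o, n in zip(y_true, old_high, new_high) if t == label]
        up = sum(1 for o, n in pairs if n and not o) / len(pairs)    # low -> high risk
        down = sum(1 for o, n in pairs if o and not n) / len(pairs)  # high -> low risk
        return up, down

    up_e, down_e = moves(1)
    up_ne, down_ne = moves(0)
    nri_events = up_e - down_e       # patients with the outcome should move up
    nri_nonevents = down_ne - up_ne  # patients without it should move down
    return nri_events + nri_nonevents, nri_events, nri_nonevents
```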

FibroX showed moderate positive correlations with fibrosis stage (Spearman ρ=0.453, P<0.001), portal inflammation (ρ=0.265, P<0.001), and ballooning (ρ=0.236, P<0.001), a weak positive correlation with microvesicular fat (ρ=0.151, P<0.01), a weak inverse correlation with steatosis (ρ=−0.138, P=0.01), and no significant correlations with lobular inflammation (ρ=−0.138, P=0.16), NAFLD activity score (ρ=0.064, P=0.24), iron (ρ=0.084, P=0.12), or acidophils (ρ=−0.044, P=0.42) (Figure S3).
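Because fibrosis stage and the other histologic scores are ordinal with many ties, Spearman’s ρ is the Pearson correlation of average (tie-adjusted) ranks. The sketch below follows that definition directly; it mirrors the approach of `scipy.stats.spearmanr` but does not call it, and it omits the P-value calculation.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```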

Prognostication

Among 4,276 participants in the NHANES III prognostication cohort, those with a FibroX score ≥0.54 had significantly higher risks of all-cause and cardiovascular mortality over 30 years of follow-up (Figure 2). In univariable Cox analyses, statistically significant predictors of 30-year all-cause mortality were FibroX ≥0.54 [hazard ratio (HR), 3.88; 95% confidence interval (CI), 3.50–4.31; P<0.01], age (HR, 1.08; 95% CI, 1.08–1.09; P<0.01), female sex (HR, 0.80; 95% CI, 0.73–0.88; P<0.01), hypertension (HR, 3.07; 95% CI, 2.78–3.40; P<0.01), and diabetes (HR, 2.18; 95% CI, 1.98–2.40; P<0.01) (Table S2). In multivariable Cox proportional hazards models adjusted for age, sex, race and ethnicity, smoking, education, hypertension, and diabetes, FibroX ≥0.54 was associated with elevated 30-year all-cause mortality [adjusted HR (aHR), 1.19; 95% CI, 1.07–1.34; P=0.002] and cardiovascular mortality (aHR, 1.22; 95% CI, 1.01–1.47; P=0.04).

Figure 2 Adjusted Cox proportional hazards survival curves for all-cause and cardiovascular-related mortality by FibroX score in patients with MASLD. Adjusted survival curves derived from Cox proportional hazards models depicting survival probabilities over a 30-year follow-up period for patients with MASLD, stratified by FibroX score [<0.54 (blue line) vs. ≥0.54 (red line)]. The models are adjusted for age, sex, race and ethnicity, smoking, education, hypertension, and diabetes. The x-axis represents years of follow-up (0 to 30 years), and the y-axis represents survival probability (0 to 1.0). Two panels are shown: (left) adjusted all-cause mortality, with an aHR of 1.19 (95% CI, 1.07–1.34), indicating a higher mortality risk for FibroX score ≥0.54; and (right) adjusted cardiovascular-related mortality, with an aHR of 1.22 (95% CI, 1.01–1.47), suggesting an increased risk of cardiovascular mortality for the higher FibroX score group. aHR, adjusted hazard ratio; CI, confidence interval; MASLD, metabolic dysfunction-associated steatotic liver disease.

Liver-related clinical outcomes

FibroX demonstrated acceptable performance in predicting liver-related clinical outcomes in the biopsy cohort. For liver decompensation, FibroX had a lower AUROC than FIB-4 (0.89 vs. 0.93), but higher AUROCs for hepatocellular carcinoma (0.73 vs. 0.69) and all-cause mortality (0.75 vs. 0.70; Figure S4).

Model explanation

Platelet count and age demonstrated the widest variability in SHAP values, indicating their substantial influence on advanced fibrosis and mortality risk predictions. Hemoglobin A1c, AST, and eGFR showed intermediate ranges of contribution, while BMI and ALT exhibited relatively smaller overall effects. Mean (SD) absolute SHAP values were as follows: platelet count, 0.44 (0.12); age, 0.39 (0.14); hemoglobin A1c, 0.27 (0.04); AST, 0.20 (0.04); eGFR, 0.07 (0.07); BMI, 0.03 (0.02); and ALT, 0.02 (0.01).
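The mean (SD) absolute SHAP summary can be reproduced from a participant-by-feature matrix of SHAP values. Such a matrix could come, for example, from `shap.TreeExplainer(model).shap_values(X)` or from XGBoost’s `Booster.predict(..., pred_contribs=True)`, which appends a bias column that must be dropped first; which route the authors used is not stated. The sketch assumes the SD is taken over the absolute values.

```python
import math
from typing import List, Sequence, Tuple


def rank_by_mean_abs_shap(shap_matrix: Sequence[Sequence[float]],
                          feature_names: Sequence[str]) -> List[Tuple[str, float, float]]:
    """Rank features by mean |SHAP|; rows are participants, columns are features.

    Returns (feature, mean |SHAP|, SD of |SHAP|) tuples, most influential first.
    """
    n = len(shap_matrix)
    summary = []
    for j, name in enumerate(feature_names):
        col = [abs(row[j]) for row in shap_matrix]  # magnitude ignores direction
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        summary.append((name, mean, sd))
    return sorted(summary, key=lambda t: t[1], reverse=True)
```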

Feature-level SHAP distributions for advanced fibrosis predictions are shown in Figure 3, and global and local SHAP explanations for mortality predictions are presented in Figure 4. The heatmap in Figure 4 summarizes feature contributions across all participants, and two representative local explanations illustrate how individual predictors influenced the FibroX score in a true-positive and a true-negative case.

Figure 3 SHAP plot of FibroX for predicting advanced fibrosis. SHAP values ranked from highest to lowest impact on model prediction behavior for advanced fibrosis. eGFR, estimated glomerular filtration rate; SHAP, SHapley Additive Explanations.

Figure 4 Heatmap of SHAP values across seven predictors. (A) Global explanation heatmap displaying SHAP values across seven input features for all participants, arranged from left to right by increasing model-predicted mortality risk [top trace, f(x)]. Each vertical column represents one patient. Red shading indicates a positive contribution to mortality risk (higher predicted risk), whereas blue indicates a negative contribution (lower risk). The bottom horizontal bar marks observed mortality outcomes (black bars). (B) Local explanations for two representative individuals. Case #414 (true positive) had a FibroX score of 0.58 (above the ≥0.54 threshold) despite low-risk FIB-4 (1.16) and NFS (−1.46). SHAP values show a protective effect from high platelet count (−0.49), counterbalanced by elevated risk contributions from age (+0.22), hemoglobin A1c (+0.26), AST (+0.22), and eGFR (+0.15). Case #36 (true negative) had a FibroX score of 0.52 (below the threshold), with intermediate risk according to FIB-4 (1.33) and NFS (0.12). SHAP values indicate protective effects from younger age (−0.55) and lower hemoglobin A1c (−0.21), alongside positive contributions from platelet count (+0.57), AST (+0.23), ALT (+0.02), BMI (+0.02), and eGFR (+0.01). ALT, alanine aminotransferase; AST, aspartate aminotransferase; BMI, body mass index; eGFR, estimated glomerular filtration rate; FIB-4, fibrosis-4; NFS, nonalcoholic fatty liver disease fibrosis score; SHAP, SHapley Additive Explanations.

Discussion

In this study, we developed and validated FibroX, a machine learning-based tool for noninvasive detection of advanced fibrosis and long-term mortality risk stratification in individuals with MASLD. FibroX builds on our prior work on explainable AI models (21) and uses SHAP values to make its predictions transparent, a feature lacking in many current AI models. We selected XGBoost because it offers high accuracy and computational efficiency in structured datasets such as NHANES, while SHAP-based interpretation provides the transparency that is essential for clinical adoption (16). Overall, FibroX demonstrated high diagnostic performance in the training cohort, with strong sensitivity and specificity for detecting estimated advanced fibrosis. As expected, performance declined in the external validation cohort, where biopsy-confirmed fibrosis was the reference standard, reflecting the drop typically observed when models are applied to independent clinical populations. In the biopsy cohort, the incremental improvement over FIB-4 was numerically modest despite statistical significance and should therefore be interpreted as potential complementary clinical value rather than as grounds for replacing existing tools. To contextualize FibroX within the current noninvasive landscape, we summarized commonly used serum-based, elastography-based, imaging-based, and machine learning-based approaches for fibrosis risk stratification in MASLD, along with their key strengths and limitations (Table S3).

FibroX outperformed FIB-4 in the detection of advanced fibrosis and in long-term prognostic utility, particularly for cardiovascular-related mortality, for which it was the only model to show a statistically significant association after multivariable adjustment. This is consistent with emerging evidence that noninvasive tests in MASLD may provide prognostic value beyond fibrosis staging alone, including for clinically meaningful long-term outcomes (22). Taken together, these findings suggest that although FibroX’s advantage over existing blood-based NILDA tools for predicting advanced liver disease and all-cause mortality is modest, it provides additional value as a predictive marker of cardiovascular-related mortality. The magnitude of these associations was also modest, suggesting that FibroX provides incremental risk stratification rather than serving as a standalone predictor of long-term outcomes.

Another notable strength of FibroX is its interpretability: SHAP explanations reveal the drivers of predictions at both the global and the individual level. Unlike opaque “black box” outputs, SHAP assigns each variable a contribution value that reflects how much it shifts an individual prediction relative to the cohort baseline. While these values do not correspond to an absolute clinical scale (e.g., they are not capped at 0–1 or interpretable like laboratory thresholds), their magnitudes are directly comparable across variables within the model. Thus, larger absolute SHAP values indicate a stronger influence on the prediction, and the sign indicates whether a variable raises or lowers predicted risk. In our analyses, platelet count and age emerged as the most influential predictors of advanced fibrosis and mortality, consistent with prior findings linking these features, along with AST, ALT, type 2 diabetes, and eGFR, to major adverse liver outcomes, including cirrhosis and hepatocellular carcinoma (23). From a clinical perspective, lower platelet counts and older age typically shift FibroX predictions toward higher risk, consistent with their association with advanced fibrosis and portal hypertension. Conversely, higher platelet counts and younger age tend to contribute to lower predicted risk. These findings are biologically plausible, as thrombocytopenia reflects progressive portal hypertension and splenic sequestration, while age captures cumulative metabolic and inflammatory exposure associated with fibrosis progression. Prior machine learning-based tools, such as ALADDIN, have demonstrated strong diagnostic accuracy for MASLD-related fibrosis (24), but they did not incorporate long-term prognostic outcomes or ensure interpretability. FibroX directly addresses these gaps by combining transparency with the ability to predict both advanced fibrosis and 30-year mortality.
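The baseline-relative interpretation described above follows from the additivity (local accuracy) property of Shapley values. For one participant with feature vector x, F the set of seven FibroX predictors, and f_S the model’s expected output when only the features in S are known, the model output decomposes as below. Treating f as the log-odds margin of the XGBoost classifier is an assumption about how the explanations were computed, since the article does not state the output scale; under it, positive φ_j raise and negative φ_j lower the predicted probability.

```latex
f(\mathbf{x}) = \phi_0 + \sum_{j \in F} \phi_j(\mathbf{x}), \qquad
\phi_0 = \mathbb{E}\left[f(\mathbf{X})\right], \qquad
\hat{p}(\mathbf{x}) = \frac{1}{1 + e^{-f(\mathbf{x})}}

\phi_j(\mathbf{x}) = \sum_{S \subseteq F \setminus \{j\}}
\frac{|S|!\,\left(|F| - |S| - 1\right)!}{|F|!}
\left[ f_{S \cup \{j\}}(\mathbf{x}) - f_S(\mathbf{x}) \right]
```

Because each φ_j is measured in the same units as f, contributions of different predictors can be compared directly within the model, as described above.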

The role of BMI in MASLD prognosis is complex: lean patients with MASLD-related compensated cirrhosis paradoxically exhibit higher mortality yet lower rates of hepatic decompensation than non-lean counterparts, despite better baseline cardiometabolic profiles and comparable cardiovascular event rates (25). This suggests that adverse outcomes can arise at both extremes of BMI. Moreover, MASLD encompasses distinct patient subgroups, ranging from younger, metabolically healthy individuals to older, obese patients with varying degrees of insulin resistance, diabetes, hypertension, and dyslipidemia, each carrying distinct mortality risks (26,27). Because FibroX is not restricted to linear, additive effects, and SHAP makes the relative importance of each predictor explicit, it may capture such nuanced, non-monotonic relationships, positioning it as a valuable clinical tool for individualized risk stratification that addresses limitations inherent to traditional linear scoring systems.
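To make the contrast with linear scoring concrete, the sketch below compares a single linear BMI term, which can only move risk in one direction, with two threshold splits of the kind a tree-based model can learn. All function names, cutoffs, and effect sizes are hypothetical, chosen only to mirror the observation that adverse outcomes can arise at both extremes of BMI; they are not FibroX parameters, and this is not a claim about the shape FibroX actually learned.

```python
def linear_bmi_term(bmi, beta=0.05, center=28.0):
    """Linear score term (hypothetical): risk changes monotonically with BMI."""
    return beta * (bmi - center)


def split_bmi_term(bmi, low_cut=23.0, high_cut=35.0,
                   low_effect=0.4, high_effect=0.3):
    """Two tree-style splits (hypothetical): extra risk below low_cut and at or
    above high_cut, so both extremes can score higher than the middle range."""
    contribution = 0.0
    if bmi < low_cut:
        contribution += low_effect
    if bmi >= high_cut:
        contribution += high_effect
    return contribution


for bmi in (20, 28, 40):
    print(bmi, round(linear_bmi_term(bmi), 2), split_bmi_term(bmi))
```

Whatever the sign of its coefficient, the linear term must rank one BMI extreme below the middle of the range, whereas the split-based term can assign higher risk to both lean and obese values.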

Clinical and public health implications

The development of FibroX represents a significant advance in the noninvasive assessment of advanced fibrosis in individuals with MASLD. Unlike liver biopsy, which is invasive and prone to complications, sampling error, and interobserver variability (28,29), FibroX offers a noninvasive alternative that leverages commonly available clinical data. Liver biopsy is highly dependent on factors such as specimen length and is subject to substantial inter- and intra-observer variability in histopathologic interpretation (18), with studies showing that even minor variations in biopsy length can substantially affect fibrosis assessment (30,31). FibroX’s ability to detect advanced fibrosis with high diagnostic accuracy and interpretability therefore addresses key limitations of biopsy and improves upon existing noninvasive models such as FIB-4, offering a single tool for both fibrosis detection and long-term prognostic risk stratification. Moreover, integrating tools like FibroX into clinical practice, particularly in primary care settings, could reduce the need for more invasive and costly procedures while promoting early detection of MASLD, predicting fibrosis and cardiovascular risk, and facilitating timely intervention (32,33).

Beyond advanced fibrosis detection and mortality prediction, FibroX demonstrated meaningful discriminatory performance for several liver-related outcomes, including liver decompensation and hepatocellular carcinoma. Although it was not specifically trained on hepatic events, FibroX performed comparably to FIB-4 across these outcomes and correlated positively with the histologic hallmarks most closely linked to disease progression, particularly fibrosis, portal inflammation, and ballooning. These findings support the biological plausibility of FibroX and suggest that it may serve as an integrative risk marker capturing broader hepatic injury pathways, with potential value for stratifying both fibrosis severity and downstream liver-related risk.
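The paragraph above reports positive correlations between FibroX and histologic features but does not state the correlation method at this point. Assuming a rank-based (Spearman) correlation, which suits ordinal histologic grades such as fibrosis stage, a minimal standard-library sketch with average ranks for ties is shown below. The function names and the score and stage values are hypothetical illustrations, not study data.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical model scores and ordinal fibrosis stages (F0-F4)
scores = [0.12, 0.30, 0.25, 0.60, 0.85, 0.70]
fibrosis_stage = [0, 1, 1, 2, 4, 3]
print(round(spearman(scores, fibrosis_stage), 3))
```

Ranking before correlating makes the estimate insensitive to the arbitrary spacing between histologic grades; `scipy.stats.spearmanr` provides an equivalent, tested implementation if SciPy is available.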

In the context of public health, if validated in larger cohorts, FibroX may provide a cost-effective option for widespread screening, potentially reducing liver disease-related morbidity and mortality. By detecting advanced fibrosis early, FibroX could guide timely interventions that may prevent progression to cirrhosis or liver failure and thereby reduce healthcare costs. Furthermore, because it is noninvasive and relies on routinely available data, FibroX is well suited to large-scale screening, particularly in resource-limited settings where liver biopsy and advanced imaging may not be readily accessible. This aligns with the broader move towards precision medicine, in which AI models such as FibroX can help tailor interventions to individual patients on the basis of clinical, biochemical, and imaging data (32,33).

Limitations

Our study had several limitations that warrant consideration. First, although the derivation cohort was sizable, fibrosis staging in NHANES was based on a two-step noninvasive algorithm rather than biopsy confirmation, which may introduce misclassification bias, particularly in intermediate fibrosis stages (F2–3 overlap). This trade-off was necessary given the population-based nature of NHANES, but it aligns with current AASLD guidance for noninvasive staging and reflects real-world screening practice. Second, the external biopsy validation cohort was relatively small, which may limit power for subgroup analyses and reduce generalizability across demographic strata. Third, differences in fibrosis reference standards and steatosis definitions across cohorts (two-step noninvasive staging in NHANES 2017–2020, biopsy-confirmed fibrosis in the external validation cohort, and ultrasound-based steatosis assessment with cardiometabolic criteria in NHANES III) may introduce heterogeneity, misclassification, and spectrum bias that could influence model discrimination and calibration across datasets. In addition, the high internal discrimination may reflect some degree of optimism, because more robust resampling approaches such as nested cross-validation or bootstrap optimism correction were not performed. Fourth, although BMI was selected instead of waist circumference to maintain model practicality and clinical usability, this substitution could reduce sensitivity to certain metabolic risk profiles, particularly in populations with central obesity. Additionally, liver-related outcome analyses were exploratory and limited by relatively small numbers of hepatocellular carcinoma and hepatic decompensation events, which may reduce statistical power. We also did not formally evaluate incremental discrimination (e.g., change in C-statistic or net reclassification improvement) beyond established cardiovascular risk models or existing fibrosis scores, which warrants further study. Further, confidence intervals for discrimination and reclassification metrics (e.g., AUROC and net reclassification index) were not formally calculated, which may limit the precision of performance estimates. Finally, FibroX currently relies on seven routinely available features; while this enhances accessibility and integration into clinical workflows, future iterations could incorporate additional metabolic and imaging biomarkers to further improve predictive accuracy and subgroup calibration.
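The limitations above note that confidence intervals for AUROC and the net reclassification index were not calculated. As a minimal sketch of how such intervals could be obtained, the Python below computes AUROC via its Mann-Whitney formulation, a categorical NRI across hypothetical risk thresholds, and percentile bootstrap intervals for each. The function names, toy data, and cutoffs are hypothetical illustrations, not FibroX outputs or the authors' analysis code, and the percentile bootstrap is only one option (DeLong's method is a common analytic alternative for AUROC).

```python
import random


def auroc(scores, labels):
    """AUROC as the Mann-Whitney probability that a random case scores
    higher than a random non-case (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))  # ZeroDivisionError if a class is missing


def categorical_nri(old_risk, new_risk, labels, cutoffs):
    """Categorical NRI: net upward reclassification among events plus
    net downward reclassification among non-events."""
    def category(r):
        return sum(r >= c for c in cutoffs)

    up_ev = down_ev = up_ne = down_ne = n_ev = n_ne = 0
    for r_old, r_new, y in zip(old_risk, new_risk, labels):
        move = category(r_new) - category(r_old)
        if y == 1:
            n_ev += 1
            up_ev += move > 0
            down_ev += move < 0
        else:
            n_ne += 1
            up_ne += move > 0
            down_ne += move < 0
    return (up_ev - down_ev) / n_ev + (down_ne - up_ne) / n_ne


def percentile_bootstrap_ci(stat_fn, n, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: stat_fn is evaluated on n_boot resamples of
    row indices drawn with replacement."""
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(stat_fn(idx))
        except ZeroDivisionError:
            continue  # resample lacked events or non-events; draw again
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Hypothetical toy data (not study data): outcomes plus "old" and "new" risks
rng = random.Random(42)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(120)]
old = [min(1.0, max(0.0, 0.25 + 0.15 * y + rng.gauss(0, 0.15))) for y in labels]
new = [min(1.0, max(0.0, 0.25 + 0.25 * y + rng.gauss(0, 0.15))) for y in labels]
cutoffs = [0.2, 0.4]  # hypothetical risk-category thresholds

auc_new = auroc(new, labels)
auc_ci = percentile_bootstrap_ci(
    lambda idx: auroc([new[i] for i in idx], [labels[i] for i in idx]), len(labels))
nri = categorical_nri(old, new, labels, cutoffs)
nri_ci = percentile_bootstrap_ci(
    lambda idx: categorical_nri([old[i] for i in idx], [new[i] for i in idx],
                                [labels[i] for i in idx], cutoffs), len(labels))
print(f"AUROC {auc_new:.3f} (95% CI {auc_ci[0]:.3f} to {auc_ci[1]:.3f})")
print(f"NRI {nri:.3f} (95% CI {nri_ci[0]:.3f} to {nri_ci[1]:.3f})")
```

Resampling whole patients (row indices) keeps each patient's outcome and both risk estimates together, so the interval reflects the paired comparison between the two scores.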

Future research and directions

While FibroX demonstrates promising diagnostic and prognostic utility, several avenues remain for future research. Further validation in diverse populations is needed to confirm its robustness across ethnic groups and healthcare settings, including regions with varied healthcare infrastructures and populations with comorbidities such as obesity, diabetes, and cardiovascular disease. Moreover, prospective studies are needed to evaluate long-term clinical outcomes among patients classified as high risk by FibroX and to assess the model’s ability to monitor disease progression over time. Integrating FibroX with electronic health records could facilitate its adoption in routine clinical practice, enabling comprehensive risk assessments and personalized care strategies for liver disease (32,33).

Additionally, FibroX could be combined with advanced imaging techniques, such as TE and magnetic resonance elastography, to provide quantitative assessments of liver fibrosis (34). AI systems are already capable of handling the complex datasets generated by these imaging modalities and can provide more accurate and reproducible staging of liver fibrosis than traditional methods (34,35). Future versions of FibroX could leverage such integration to improve accuracy in detecting earlier stages of fibrosis or in distinguishing between disease stages.

Conclusions

In conclusion, FibroX provides a transparent, accessible, and clinically relevant approach for identifying individuals with advanced fibrosis and predicting long-term mortality risk in MASLD. FibroX improves the detection of advanced liver fibrosis in MASLD and enhances long-term cardiovascular mortality predictions compared to FIB-4. Its superior performance, interpretability, and potential cost savings underscore its value for integration into clinical workflows to improve risk stratification and management in MASLD patients. Although further prospective validation is needed, the model’s consistent performance across diverse cohorts highlights its potential to become a scalable solution for early detection and risk assessment, applicable in both primary care and specialty settings.

Acknowledgments

We would like to thank Dr. Eri Osta for his valuable contributions as a consultant on data analysis for this study. An OpenAI large language model was used only to assist with grammar and syntax editing of the manuscript. It was not used for data analysis, figure generation, study design, interpretation of results, or preparation of scientific content. All scientific content, analyses, and interpretations were developed by the authors, and all outputs were reviewed, verified, and approved by the co-authors.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tgh.amegroups.com/article/view/10.21037/tgh-2026-0017/rc

Data Sharing Statement: Available at https://tgh.amegroups.com/article/view/10.21037/tgh-2026-0017/dss

Peer Review File: Available at https://tgh.amegroups.com/article/view/10.21037/tgh-2026-0017/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tgh.amegroups.com/article/view/10.21037/tgh-2026-0017/coif). J.L. received research contracts (to Yale University) from Akero, Becton Dickinson, Gilead, Inventiva, Novo Nordisk, and Viking. The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Yale University Institutional Review Board (No. HIC 2000027433). Informed consent for the Yale biopsy cohort was waived by the Yale University Institutional Review Board because the analysis used retrospective/de-identified patient-level data. Analyses of publicly available NHANES data did not require additional institutional review board approval.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Younossi ZM, Mangla KK, Berentzen TL, et al. Liver histology is associated with long-term clinical outcomes in patients with metabolic dysfunction-associated steatohepatitis. Hepatol Commun 2024;8:e0423.
2. Maher S, et al. Role of the gut microbiome in metabolic dysfunction-associated steatotic liver disease. Semin Liver Dis 2024;44:457-73.
3. Younossi ZM, Kalligeros M, Henry L. Epidemiology of metabolic dysfunction-associated steatotic liver disease. Clin Mol Hepatol 2025;31:S32-50.
4. Targher G, Byrne CD, Tilg H. MASLD: a systemic metabolic disorder with cardiovascular and malignant complications. Gut 2024;73:691-702.
5. Estes C, Anstee QM, Arias-Loste MT, et al. Modeling NAFLD disease burden in China, France, Germany, Italy, Japan, Spain, United Kingdom, and United States for the period 2016-2030. J Hepatol 2018;69:896-904.
6. Hagström H, Talbäck M, Andreasson A, et al. Repeated FIB-4 measurements can help identify individuals at risk of severe liver disease. J Hepatol 2020;73:1023-9.
7. Yoon JH, Lee JM, Joo I, et al. Hepatic fibrosis: prospective comparison of MR elastography and US shear-wave elastography for evaluation. Radiology 2014;273:772-82.
8. Wong VW, Vergniol J, Wong GL, et al. Diagnosis of fibrosis and cirrhosis using liver stiffness measurement in nonalcoholic fatty liver disease. Hepatology 2010;51:454-62.
9. Chon YE, Jin YJ, An J, et al. Optimal cut-offs of vibration-controlled transient elastography and magnetic resonance elastography in diagnosing advanced liver fibrosis in patients with nonalcoholic fatty liver disease: A systematic review and meta-analysis. Clin Mol Hepatol 2024;30:S117-33.
10. Meng F, Zheng Y, Zhang Q, et al. Noninvasive evaluation of liver fibrosis using real-time tissue elastography and transient elastography (FibroScan). J Ultrasound Med 2015;34:403-10.
11. Han JW, Kim HY, Yu JH, et al. Diagnostic accuracy of the Fibrosis-4 index for advanced liver fibrosis in nonalcoholic fatty liver disease with type 2 diabetes: A systematic review and meta-analysis. Clin Mol Hepatol 2024;30:S147-58.
12. Kim MN, Han JW, An J, et al. KASL clinical practice guidelines for noninvasive tests to assess liver fibrosis in chronic liver disease. Clin Mol Hepatol 2024;30:S5-105.
13. Xiong M, Xu Y, Zhao Y, et al. Quantitative analysis of artificial intelligence on liver cancer: A bibliometric analysis. Front Oncol 2023;13:990306.
14. Wong GL, Yuen PC, Ma AJ, et al. Artificial intelligence in prediction of non-alcoholic fatty liver disease and fibrosis. J Gastroenterol Hepatol 2021;36:543-50.
15. Han S, Choi M, Lee B, et al. Accuracy of Noninvasive Scoring Systems in Assessing Liver Fibrosis in Patients with Nonalcoholic Fatty Liver Disease: A Systematic Review and Meta-Analysis. Gut Liver 2022;16:952-63.
16. Tarwidi D, Pudjaprasetya SR, Adytia D, et al. An optimized XGBoost-based machine learning method for predicting wave run-up on a sloping beach. MethodsX 2023;10:102119.
17. Kim RG, Deng J, Reaso JN, et al. Noninvasive Fibrosis Screening in Fatty Liver Disease Among Vulnerable Populations: Impact of Diabetes and Obesity on FIB-4 Score Accuracy. Diabetes Care 2022;45:2449-51.
18. Ratziu V, Charlotte F, Heurtier A, et al. Sampling variability of liver biopsy in nonalcoholic fatty liver disease. Gastroenterology 2005;128:1898-906.
19. Rinella ME, Lazarus JV, Ratziu V, et al. A multisociety Delphi consensus statement on new fatty liver disease nomenclature. Hepatology 2023;78:1966-86.
20. Abdelhameed F, Kite C, Lagojda L, et al. Non-invasive Scores and Serum Biomarkers for Fatty Liver in the Era of Metabolic Dysfunction-associated Steatotic Liver Disease (MASLD): A Comprehensive Review From NAFLD to MAFLD and MASLD. Curr Obes Rep 2024;13:510-31.
21. Njei B, Osta E, Njei N, et al. An explainable machine learning model for prediction of high-risk nonalcoholic steatohepatitis. Sci Rep 2024;14:8589.
22. Wang Y, Song SJ, Jiang Y, et al. Role of noninvasive tests in the prognostication of metabolic dysfunction-associated steatotic liver disease. Clin Mol Hepatol 2025;31:S51-75.
23. Shang Y, Akbari C, Dodd M, et al. Association between longitudinal biomarkers and major adverse liver outcomes in patients with non-cirrhotic metabolic dysfunction-associated steatotic liver disease. Hepatology 2025;81:1501-11.
24. Alkhouri N, Cheuk-Fung Yip T, Castera L, et al. ALADDIN: A Machine Learning Approach to Enhance the Prediction of Significant Fibrosis or Higher in Metabolic Dysfunction-Associated Steatotic Liver Disease. Am J Gastroenterol 2026;121:362-74.
25. Njei B, Mezzacappa C, John BV, et al. Mortality, Hepatic Decompensation, and Cardiovascular Outcomes in Lean vs. Non-lean MASLD Cirrhosis: A Veterans Affairs Cohort Study. Dig Dis Sci 2025;70:802-13.
26. Yi J, Wang L, Guo J, et al. Novel metabolic phenotypes for extrahepatic complication of nonalcoholic fatty liver disease. Hepatol Commun 2023;7:e0016.
27. Stefan N, Yki-Järvinen H, Neuschwander-Tetri BA. Metabolic dysfunction-associated steatotic liver disease: heterogeneous pathomechanisms and effectiveness of metabolism-based treatment. Lancet Diabetes Endocrinol 2025;13:134-48.
28. Buivydienė A, Basytė V, Valantinas J. Non-invasive serum markers and transient elastography in staging advanced chronic hepatitis C. Acta Med Lituan 2016;22:188-95.
29. Myers RP, Pomier-Layrargues G, Kirsch R, et al. Discordance in fibrosis staging between liver biopsy and transient elastography using the FibroScan XL probe. J Hepatol 2012;56:564-70.
30. Siddiqui MS, Vuppalanchi R, Van Natta ML, et al. Vibration-Controlled Transient Elastography to Assess Fibrosis and Steatosis in Patients With Nonalcoholic Fatty Liver Disease. Clin Gastroenterol Hepatol 2019;17:156-163.e2.
31. Fraquelli M, Baccarin A, Casazza G, et al. Liver stiffness measurement reliability and main determinants of point shear-wave elastography in patients with chronic liver disease. Aliment Pharmacol Ther 2016;44:356-65.
32. Chakraborty S, Chandran D, Chopra H, et al. Advances in artificial intelligence based diagnosis and treatment of liver diseases - Correspondence. Int J Surg 2023;109:3234-5.
33. Su TH, Wu CH, Kao JH. Artificial intelligence in precision medicine in hepatology. J Gastroenterol Hepatol 2021;36:569-80.
34. Yu Y, Wang J, Ng CW, et al. Deep learning enables automated scoring of liver fibrosis stages. Sci Rep 2018;8:16016.
35. Popa SL, Ismaiel A, Abenavoli L, et al. Diagnosis of Liver Fibrosis Using Artificial Intelligence: A Systematic Review. Medicina (Kaunas) 2023;59:992.

doi: 10.21037/tgh-2026-0017
Cite this article as: Njei B, Al-Ajlouni YA, Ilagan-Ying Y, Al Ta’ani O, Boateng S, Njei N, Kanmounye US, Lim J, Banini B. FibroX: a machine learning model for detecting and prognosticating advanced fibrosis in metabolic dysfunction-associated steatotic liver disease. Transl Gastroenterol Hepatol 2026;11:67.

