Development and validation of a machine learning-based nomogram for preoperative prediction of laparoscopic surgical difficulty in gallstone patients
Original Article

Kun Huang1#, Shunhu Jia1#, Xinzhu Yuan2, Pingwu Zhao1, Dou Bai3

1Department of General Surgery, Mianyang Hospital of Traditional Chinese Medicine, Mianyang, China; 2Department of Nephrology, The Second Clinical Medical Institution of North Sichuan Medical College (Nanchong Central Hospital) and Nanchong Key Laboratory of Basic Science & Clinical Research on Chronic Kidney Disease, Nanchong, China; 3Department of General Surgery, Mianyang Central Hospital, Mianyang, China

Contributions: (I) Conception and design: K Huang, X Yuan, D Bai; (II) Administrative support: P Zhao, D Bai; (III) Provision of study materials or patients: P Zhao; (IV) Collection and assembly of data: S Jia; (V) Data analysis and interpretation: K Huang, S Jia; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

Correspondence to: Dou Bai, MMed. Department of General Surgery, Mianyang Central Hospital, No. 12, Changjia Alley, Jingzhong Street, Fucheng District, Mianyang 621000, China. Email: bdbd2000@126.com; Pingwu Zhao, MB. Department of General Surgery, Mianyang Hospital of Traditional Chinese Medicine, No. 14, Fucheng Road, Fucheng District, Mianyang 621000, China. Email: zhaopingwu01@hotmail.com.

Background: Preoperative prediction of laparoscopic surgical difficulty in gallstone patients is crucial for improving surgical outcomes. This study aimed to develop and validate a nomogram based on advanced machine learning algorithms, incorporating key clinical and systemic inflammatory response indicators, such as the C-reactive protein to albumin ratio (CAR).

Methods: A retrospective analysis was conducted on 362 eligible patients who underwent laparoscopic cholecystectomy (LC) for gallstones between 2013 and 2019. A total of 420 patients were initially identified, with 58 excluded based on predefined criteria such as age and incomplete records. The remaining patients were divided into a training set (n=253) and a validation set (n=109). The development of the nomogram involved multiple analytical techniques, including machine learning methods such as least absolute shrinkage and selection operator (LASSO) regression, decision tree analysis, and support vector machine (SVM) models, along with traditional statistical methods like univariate and multivariate logistic regression. Significant predictors, including CAR, white blood cell count (WBC), and gallbladder wall thickness, were integrated into the final predictive model. Model performance was evaluated using receiver operating characteristic (ROC) curve analysis and calibration plots.

Results: The machine learning-based model demonstrated strong predictive capability, with an area under the curve (AUC) of 0.774 in the training set and 0.863 in the validation set. Calibration plots showed good agreement between predicted and actual outcomes, with mean absolute errors of 0.035 and 0.05 for the training and validation sets, respectively.

Conclusions: This study demonstrates the utility of applying machine learning algorithms to develop a robust nomogram for preoperative prediction of laparoscopic surgical difficulty. By integrating key clinical variables and systemic inflammatory markers, the model provides an effective tool for improving surgical planning and enhancing patient outcomes.

Keywords: Laparoscopic surgery; preoperative prediction; machine learning; systemic inflammatory response; nomogram


Received: 18 September 2024; Accepted: 04 March 2025; Published online: 09 June 2025.

doi: 10.21037/tgh-24-124


Highlight box

Key findings

• This study developed and validated a machine learning-based nomogram for predicting the difficulty of laparoscopic cholecystectomy (LC) in gallstone patients.

• Key predictors included systemic inflammatory markers [e.g., C-reactive protein to albumin ratio (CAR), white blood cell count (WBC)] and imaging findings (e.g., gallbladder wall thickness).

• The model demonstrated strong predictive performance with an area under the curve of 0.863 in the validation set.

What is known and what is new?

• Systemic inflammatory markers like WBC and CAR are associated with surgical complexity, and preoperative imaging findings can provide important risk insights.

• This manuscript integrates machine learning algorithms with clinical and imaging data to create a comprehensive and practical risk prediction tool for LC.

What is the implication, and what should change now?

• This model provides surgeons with a reliable tool for preoperative risk stratification, enabling better resource allocation and surgical planning.

• Further prospective multicenter studies are required to validate and refine the model in diverse clinical settings.


Introduction

The advent of laparoscopic cholecystectomy (LC) has revolutionized the surgical management of gallstone disease, offering reduced postoperative pain, shorter hospital stays, and quicker recovery compared to traditional open surgery (1-3). However, despite these advantages, the procedure is not without risks, notably bile duct injuries (BDIs), which remain a significant concern due to their severe implications (4-6). These injuries can lead to complex clinical scenarios requiring additional interventions and prolonged treatment. The critical view of safety (CVS) technique and intraoperative cholangiography (IOC) are recommended strategies to mitigate these risks, aiming to enhance the identification of cystic structures and prevent misidentification (7).

Recent advances in machine learning and artificial intelligence have begun to be applied to the field of predictive analytics in surgery, focusing on the prediction of surgical difficulties and outcomes (8,9). These technologies analyze preoperative data to predict the likelihood of surgical complications, such as BDIs, thereby potentially guiding the surgical approach and decision-making process. Machine learning models can integrate diverse data types, including demographic information, clinical parameters, and imaging data, to provide a comprehensive risk assessment (10,11).

Incorporating systemic inflammatory markers such as C-reactive protein (CRP) to albumin ratio (CAR) and white blood cell count (WBC) in predictive models has shown promise in enhancing the accuracy of predicting surgical difficulties in gallstone diseases (12). These biomarkers reflect the patient’s inflammatory state, which can be crucial for anticipating complications and surgical challenges. The integration of these markers into a machine learning-based nomogram offers a novel approach to preoperatively determine the potential for a challenging LC, thereby enabling better preparedness and tailored surgical planning. We present this article in accordance with the TRIPOD reporting checklist (available at https://tgh.amegroups.com/article/view/10.21037/tgh-24-124/rc).


Methods

Patients and outcome definition

This retrospective study included patients who underwent LC for gallstones at Mianyang Hospital of Traditional Chinese Medicine between 2013 and 2019. Surgical difficulty, the primary outcome predicted by the model, was classified as either “easy” or “difficult”, based on intraoperative findings and surgeon assessments. A surgery was considered “difficult” if any of the following criteria were met: (I) conversion to open surgery due to dense adhesions around the gallbladder neck, uncontrollable bleeding, or organ injury; (II) prolonged operative time (≥120 minutes); (III) complications such as gangrenous cholecystitis or peri-cholecystic abscess, confirmed by postoperative pathology; or (IV) severe adhesions complicating the dissection of Calot’s triangle. These outcomes were assessed intraoperatively by the surgeon and confirmed by postoperative pathology as needed. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Institutional Review Board of Mianyang Hospital of Traditional Chinese Medicine (No. 2025-001). All relevant data were recorded in the patients’ medical charts and collected retrospectively for analysis.

Eligibility criteria

Patients were included in the study if they met the following criteria: (I) age ≥18 years; (II) diagnosed with gallstones confirmed by ultrasound or computed tomography (CT) scan; and (III) underwent LC at Mianyang Hospital of Traditional Chinese Medicine between 2013 and 2019. Patients with acute cholecystitis were included in the study. To ensure consistency, blood samples were collected at the time of admission, capturing the patients’ inflammatory status at presentation. Patients were excluded if they had previously undergone abdominal surgery, had a history of malignancy, or presented with severe systemic diseases that could affect surgical outcomes (e.g., severe cardiovascular or respiratory conditions). Patients with incomplete medical records or those who did not provide informed consent were also excluded from the analysis.

Informed consent statement

Given the retrospective nature of this study, formal consent was not required. However, all patient data were anonymized to ensure confidentiality.

Surgical procedure

Standardized laparoscopic techniques were performed by experienced surgeons. To control for variability introduced by operator factors, the study included only cases performed by surgeons with more than 500 LC procedures. This ensured that operative time and blood loss reflected case-specific complexity rather than differences in surgical expertise. The decision to convert to open surgery was based on factors such as dense adhesions around the gallbladder neck, uncontrollable bleeding from major vessels, or damage to liver and gastrointestinal tissues. Gangrene or abscesses were identified based on intraoperative observations and confirmed by postoperative pathology.

Data collection

Clinical data, including demographic information, preoperative laboratory results (such as liver function tests and WBC), and intraoperative findings, were collected retrospectively. The clarity of Calot’s triangle and gallbladder wall thickness were assessed preoperatively using ultrasound imaging as part of routine diagnostic evaluation. The liver function score was calculated using a formula derived from least absolute shrinkage and selection operator (LASSO) regression analysis: Liver Function Score = −2.0443 + 0.00824 × TBIL + 0.00625 × ALP − 0.00296 × AST (where TBIL, ALP, and AST represent total bilirubin, alkaline phosphatase, and aspartate aminotransferase levels, respectively).

The cutoff values for liver function score, WBC, hemoglobin, neutrophil-to-lymphocyte ratio (NLR), CAR, and platelet-to-lymphocyte ratio (PLR) were determined using receiver operating characteristic (ROC) curve analysis, a widely accepted method for identifying thresholds that balance sensitivity and specificity. For each variable, the ROC curve was generated based on its ability to distinguish between easy and difficult LC cases, and the cutoff was selected as the value that maximized the Youden index (sensitivity + specificity − 1), providing a consistent basis for stratification in subsequent analyses. The following cutoff values were identified:

  • Liver Function Score: ≥0.586;
  • WBC: ≥8.055×10^9/L;
  • Hemoglobin: ≥129.5 g/L;
  • NLR: ≥3.479;
  • CAR: ≥0.108;
  • PLR: ≥155.990.
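The cutoff selection described above can be sketched in a few lines. This is a minimal illustration, assuming a marker for which higher values indicate a difficult LC; the function name `youden_cutoff` and the toy WBC values are hypothetical and are not study data.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    values: marker measurements; labels: 1 = difficult LC, 0 = easy LC.
    A case is called positive when its value is >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 1)
        true_neg = sum(1 for v, y in zip(values, labels) if v < cutoff and y == 0)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:  # keep the first cutoff that reaches the maximum
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy WBC values (x10^9/L), invented for illustration only.
toy_wbc = [5.1, 6.0, 6.8, 7.2, 7.9, 8.3, 9.0, 9.6, 10.4, 12.1]
toy_difficult = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
cutoff, j = youden_cutoff(toy_wbc, toy_difficult)
print(cutoff, round(j, 3))
```

Candidate cutoffs are restricted to observed values, which is the usual convention for empirical ROC curves; ROC software may instead report midpoints between adjacent observations.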

Study size determination

The study size was determined based on the availability of patient records within the specified timeframe (2013 to 2019) at Mianyang Hospital of Traditional Chinese Medicine. A total of 362 patients who met the eligibility criteria and had complete medical records were included in the analysis. The dataset was then divided into two sets: a training set (n=253, 70% of the total sample) used for model development, and a validation set (n=109, 30% of the total sample) used to assess the model’s performance. The sample size was deemed sufficient for the purposes of developing a reliable predictive model, as it allowed for adequate power to detect significant predictors of surgical difficulty using machine learning techniques, such as LASSO regression and support vector machine (SVM). Additionally, previous studies in similar contexts have used comparable sample sizes, making this dataset appropriate for modeling preoperative risk stratification (13-15). No formal sample size calculation was performed due to the retrospective nature of the study and the focus on leveraging all available data.

Statistical analysis

Data were analyzed using R software (version 4.3.3). Continuous variables were assessed for normality. Normally distributed variables were presented as mean ± standard deviation (x ± s), while non-normally distributed variables were described as median (interquartile range) [M (IQR)]. Categorical variables were expressed as counts and percentages [n (%)], and comparisons between groups were made using the Chi-squared test. The dataset was randomly divided into a training set (70%) and a validation set (30%). The training set was used to construct the nomogram model and conduct internal validation, while the validation set, a random hold-out sample from the same cohort, was used to validate the model. Analytical techniques included machine learning methods such as LASSO regression, decision tree analysis, and SVM models, alongside traditional statistical methods like univariate and multivariate logistic regression analyses, to identify significant predictors of surgical difficulty. The final predictive model, a nomogram, incorporated significant variables such as CAR, WBC, and gallbladder wall thickness. The model’s performance was evaluated using ROC curve analysis, calibration plots, and decision curve analysis (DCA) in both training and validation sets, with bootstrap resampling (1,000 samples) used for internal validation.
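The random 70/30 split can be illustrated as follows. This is a sketch only: the analysis itself was run in R, and the seed, the function name `split_cohort`, and the use of integer patient IDs are assumptions made for the example.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2024):
    """Randomly split patient IDs into training and validation lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# 362 eligible patients, represented here by hypothetical IDs 0..361.
train_ids, valid_ids = split_cohort(range(362))
print(len(train_ids), len(valid_ids))  # 253 109
```

With 362 patients, rounding 0.7 × 362 = 253.4 gives the 253/109 split reported in the study.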


Results

Participant flow and study cohorts

A total of 420 patients who underwent LC were initially identified between 2013 and 2019. After applying exclusion criteria, 58 patients were removed, with reasons including age <18 or >85 years (n=20), multiple primary conditions (n=18), incomplete medical records (n=15), and lack of consent (n=5). The remaining 362 patients were eligible for inclusion in the study. These patients were then divided into two cohorts: 253 patients (70%) formed the training set, while 109 patients (30%) comprised the validation set. In the training set, 81.4% of patients (n=206) underwent surgeries classified as “easy”, while 18.6% (n=47) were considered “difficult” surgeries. The validation set followed a similar distribution, with 79.8% (n=87) easy surgeries and 20.2% (n=22) difficult surgeries. This flow of participants through the study illustrates the balanced distribution across the training and validation groups, ensuring robust predictive modeling and validation (Figure 1).

Figure 1 Flow chart of participant selection and categorization. This flow chart summarizes the process of participant selection and exclusion, starting from the initial 420 patients identified, down to the final 362 eligible patients. Key exclusion criteria were age (<18 or >85 years), presence of multiple primary conditions, incomplete records, and lack of consent, resulting in the exclusion of 58 patients. The remaining participants were divided into a training set (n=253) and a validation set (n=109), with the further breakdown of easy and difficult surgeries in each group. The chart provides a clear visual representation of how participants flowed through the different stages of the study.

The baseline demographic and clinical characteristics of the patients

The baseline demographic and clinical characteristics of the patients, as summarized in Table 1, indicate a well-matched distribution between the training and validation sets, ensuring the robustness of the predictive model for surgical difficulty. Age distribution was slightly skewed towards younger patients (<53 years) in both sets, with no significant difference (P=0.55). Gender was similarly balanced, with a higher proportion of females (68.2% overall), and again, no significant differences were observed between the groups (P=0.65). Complications were present in over half of the patients, slightly more in the validation set (58.7%) than in the training set (52.6%), though this difference was not statistically significant (P=0.34). Key surgical factors, such as the clarity of Calot’s triangle and gallbladder wall thickness, were also consistent across both groups (P=0.90 and P=0.34, respectively), further supporting the comparability of the cohorts.

Table 1

Baseline characteristics of patients

| Characteristics | Total (n=362) | Training set (n=253) | Validation set (n=109) | P |
|---|---|---|---|---|
| Age (years) | | | | 0.55 |
| <53 | 216 (59.7) | 154 (60.9) | 62 (56.9) | |
| ≥53 | 146 (40.3) | 99 (39.1) | 47 (43.1) | |
| Gender | | | | 0.65 |
| Female | 247 (68.2) | 175 (69.2) | 72 (66.1) | |
| Male | 115 (31.8) | 78 (30.8) | 37 (33.9) | |
| Complication | | | | 0.34 |
| No | 165 (45.6) | 120 (47.4) | 45 (41.3) | |
| Yes | 197 (54.4) | 133 (52.6) | 64 (58.7) | |
| Calot’s triangle | | | | 0.90 |
| Unclear | 91 (25.1) | 63 (24.9) | 28 (25.7) | |
| Clear | 165 (45.6) | 114 (45.1) | 51 (46.8) | |
| Unknown | 106 (29.3) | 76 (30.0) | 30 (27.5) | |
| Gallbladder wall thickness | | | | 0.34 |
| Unknown | 133 (36.7) | 87 (34.4) | 46 (42.2) | |
| ≤2 mm | 48 (13.3) | 36 (14.2) | 12 (11.0) | |
| >2 mm | 181 (50.0) | 130 (51.4) | 51 (46.8) | |
| Liver function score | | | | 0.91 |
| <0.586 | 229 (63.3) | 161 (63.6) | 68 (62.4) | |
| ≥0.586 | 133 (36.7) | 92 (36.4) | 41 (37.6) | |
| WBC (×10^9/L) | | | | 0.26 |
| <8.055 | 284 (78.5) | 203 (80.2) | 81 (74.3) | |
| ≥8.055 | 78 (21.5) | 50 (19.8) | 28 (25.7) | |
| Hemoglobin (g/L) | | | | 0.56 |
| <129.5 | 156 (43.1) | 106 (41.9) | 50 (45.9) | |
| ≥129.5 | 206 (56.9) | 147 (58.1) | 59 (54.1) | |
| NLR† | | | | 0.62 |
| <3.479 | 244 (67.4) | 168 (66.4) | 76 (69.7) | |
| ≥3.479 | 118 (32.6) | 85 (33.6) | 33 (30.3) | |
| CAR‡ | | | | >0.99 |
| <0.108 | 266 (73.5) | 186 (73.5) | 80 (73.4) | |
| ≥0.108 | 96 (26.5) | 67 (26.5) | 29 (26.6) | |
| PLR§ | | | | 0.14 |
| <155.990 | 265 (73.2) | 179 (70.8) | 86 (78.9) | |
| ≥155.990 | 97 (26.8) | 74 (29.2) | 23 (21.1) | |
| Surgical difficulty | | | | 0.83 |
| Easy | 293 (80.9) | 206 (81.4) | 87 (79.8) | |
| Difficult | 69 (19.1) | 47 (18.6) | 22 (20.2) | |

The data are presented as n (%). †, NEUT/LYM; ‡, CRP/ALB; §, PLT/LYM. ALB, albumin; CAR, C-reactive protein to albumin ratio; CRP, C-reactive protein; LYM, lymphocyte count; NEUT, neutrophil count; NLR, neutrophil-to-lymphocyte ratio; PLR, platelet-to-lymphocyte ratio; PLT, platelet count; WBC, white blood cell count.

Regarding clinical and laboratory parameters, liver function scores and WBC, important indicators of patient health and surgical risk, showed no significant differences between the training and validation sets (P=0.91 and P=0.26, respectively). Hemoglobin levels and the NLR were similarly distributed (P=0.56 and P=0.62, respectively). The proportion of patients with an elevated PLR was somewhat lower in the validation set than in the training set (21.1% vs. 29.2%), but this difference was not statistically significant (P=0.14). Importantly, the primary outcome of surgical difficulty was nearly identical between the cohorts, with approximately 80% of surgeries classified as easy and no significant difference observed (P=0.83). This consistency across demographic, clinical, and outcome measures supports the comparability of the training and validation sets (Table 1).
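The between-set comparisons for two-category variables in Table 1 can be checked by hand with a Chi-squared test on the 2×2 counts. The sketch below assumes Yates' continuity correction (the default for 2×2 tables in R's `chisq.test`); the manuscript does not state whether the correction was applied, and the function name `chi2_yates_2x2` is hypothetical.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Yates-corrected chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) on 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            statistic += diff ** 2 / expected
    # With 1 df, the chi-squared survival function equals erfc(sqrt(x / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2))


# Counts from Table 1; rows are categories, columns are training/validation.
_, p_age = chi2_yates_2x2(154, 62, 99, 47)         # <53 vs >=53 years
_, p_difficulty = chi2_yates_2x2(206, 87, 47, 22)  # easy vs difficult
print(f"age: P={p_age:.2f}, surgical difficulty: P={p_difficulty:.2f}")
```

Under this assumption the two values come out close to the P values listed in Table 1 for age and surgical difficulty. Variables with an "Unknown" category would need the general r×c form of the test, which this sketch does not cover.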

The univariate and multivariate logistic regression analyses

The univariate and multivariate logistic regression analyses, detailed in Table 2, identified several factors that significantly influence the likelihood of surgical difficulty in patients with gallstones. Age was a prominent predictor, with patients aged 53 years and older showing a significantly higher risk: the univariate odds ratio (OR) was 2.80 [95% confidence interval (CI): 1.473–5.43; P=0.002]. This relationship remained robust in the multivariate analysis, with an adjusted OR of 2.54 (95% CI: 1.20–5.51; P=0.02), confirming age as an independent risk factor. Elevated WBC was also a critical predictor: patients with WBC ≥8.055×10^9/L had an OR of 4.93 (95% CI: 2.454–9.94; P<0.001) in univariate analysis, and this remained significant in the multivariate model, with an adjusted OR of 3.81 (95% CI: 1.53–9.80; P=0.005).

Table 2

Univariate and multivariate logistic analysis

| Variables | Univariate OR (95% CI) | Univariate P | Multivariate OR (95% CI) | Multivariate P |
|---|---|---|---|---|
| Age (years) | | | | |
| <53 | 1 | | 1 | |
| ≥53 | 2.80 (1.473–5.43) | 0.002 | 2.54 (1.20–5.51) | 0.02 |
| Gender | | | | |
| Female | 1 | | | |
| Male | 1.20 (0.60–2.32) | 0.60 | | |
| Complication | | | | |
| No | 1 | | | |
| Yes | 1.58 (0.83–3.07) | 0.17 | | |
| Calot’s triangle | | | | |
| Unclear | 1 | | 1 | |
| Clear | 0.88 (0.46–1.67) | 0.70 | 0.92 (0.43–1.91) | 0.82 |
| Gallbladder wall thickness | | | | |
| ≤2 mm | 1 | | 1 | |
| >2 mm | 2.09 (1.093–4.14) | 0.03 | 1.98 (0.94–4.28) | 0.08 |
| Liver function score | | | | |
| <0.586 | 1 | | 1 | |
| ≥0.586 | 2.12 (1.1–4.04) | 0.02 | 1.76 (0.84–3.66) | 0.13 |
| WBC (×10^9/L) | | | | |
| <8.055 | 1 | | 1 | |
| ≥8.055 | 4.93 (2.454–9.94) | <0.001 | 3.81 (1.53–9.80) | 0.005 |
| Hemoglobin (g/L) | | | | |
| <129.5 | 1 | | | |
| ≥129.5 | 0.63 (0.33–1.20) | 0.16 | | |
| NLR† | | | | |
| <3.479 | 1 | | 1 | |
| ≥3.479 | 3.85 (2.010–7.5) | <0.001 | 1.43 (0.56–3.542) | 0.45 |
| CAR‡ | | | | |
| <0.108 | 1 | | 1 | |
| ≥0.108 | 3.53 (1.818–6.87) | <0.001 | 2.47 (1.16–5.27) | 0.02 |
| PLR§ | | | | |
| <155.990 | 1 | | 1 | |
| ≥155.990 | 2.33 (1.20–4.49) | 0.01 | 1.82 (0.80–4.11) | 0.15 |

†, NEUT/LYM; ‡, CRP/ALB; §, PLT/LYM. ALB, albumin; CAR, C-reactive protein to albumin ratio; CI, confidence interval; CRP, C-reactive protein; LYM, lymphocyte count; NEUT, neutrophil count; NLR, neutrophil-to-lymphocyte ratio; OR, odds ratio; PLR, platelet-to-lymphocyte ratio; PLT, platelet count; WBC, white blood cell count.

The CAR was also a significant factor. Patients with a CAR of 0.108 or higher had an increased risk, with an OR of 3.53 (95% CI: 1.818–6.87; P<0.001) in the univariate analysis, which persisted in the multivariate analysis (adjusted OR: 2.47, 95% CI: 1.16–5.27; P=0.02). Gallbladder wall thickness greater than 2 mm, the liver function score, the NLR, and the PLR were associated with surgical difficulty in univariate analysis, but none remained significant in the multivariate analysis. Other variables, such as gender, the presence of complications, hemoglobin, and the clarity of Calot’s triangle, showed no significant association with surgical difficulty. These findings underscore the importance of specific demographic and inflammatory markers, particularly age, WBC, and CAR, in predicting surgical difficulty, thus providing essential insights for preoperative risk assessment (Table 2).
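As a consistency check on the reported estimates, a two-sided Wald P value can be approximated from an OR and its 95% CI. The sketch below assumes the intervals are symmetric on the log-odds scale (standard for logistic regression output); the function name `p_from_or_ci` is hypothetical, and the inputs are the multivariate estimates from Table 2.

```python
import math


def p_from_or_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Approximate the two-sided Wald P value from an OR and its 95% CI."""
    # The CI spans 2 * z_crit standard errors on the log-odds scale.
    standard_error = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / standard_error
    # Two-sided normal tail probability: P = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


multivariate = {
    "Age >=53 years": (2.54, 1.20, 5.51),
    "WBC >=8.055": (3.81, 1.53, 9.80),
    "CAR >=0.108": (2.47, 1.16, 5.27),
}
p_values = {name: p_from_or_ci(*est) for name, est in multivariate.items()}
for name, p in p_values.items():
    print(f"{name}: P ~ {p:.3f}")
```

Because the published ORs and interval limits are rounded, the recovered values are approximate, but they agree with the reported P values of 0.02, 0.005, and 0.02 after rounding.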

LASSO regression analysis for variable selection

The LASSO regression analysis, shown in Figure 2, was performed to select the liver function indices that best predict surgical difficulty; the retained indices and their coefficients are those of the liver function score defined in the Methods. In Figure 2A, the coefficient paths of the candidate predictors are plotted against the regularization parameter, log(λ). As the regularization strength increases, coefficients for less informative variables shrink to zero, which excludes them from the model, while variables that retain non-zero coefficients at larger log(λ) values are considered key predictors. Figure 2B illustrates the model’s performance, measured by the area under the curve (AUC), across different log(λ) values; the log(λ) at which the AUC peaks marks the optimal balance between model complexity and accuracy. At this value, TBIL, ALP, and AST retained non-zero coefficients: TBIL (β=0.0082, OR =1.0083), ALP (β=0.0062, OR =1.0063), and AST (β=−0.0030, OR =0.9970), where each OR equals exp(β) and applies per one-unit increase in the index. These results suggest that increases in TBIL and ALP are associated with a higher risk of surgical difficulty, while higher levels of AST are associated with a slightly reduced risk. The coefficients for the other variables, namely gamma-glutamyl transferase (GGT), alanine aminotransferase (ALT), indirect bilirubin (IB), and direct bilirubin (DB), were shrunk to zero, indicating that they add no predictive information once TBIL, ALP, and AST are included. The resulting model retains only the most relevant liver function indices, which favors both accuracy and interpretability.

Figure 2 LASSO regression analysis for variable selection. It shows the results of the LASSO regression analysis, used for selecting the most significant predictors in the model for surgical difficulty in patients with gallstones. (A) The coefficient profiles of the variables as a function of the regularization parameter, log(λ). Each curve represents a different variable, and as log(λ) increases (moving left to right), coefficients shrink towards zero, demonstrating the penalization effect. The variables with non-zero coefficients at higher log(λ) values are considered the most significant predictors. (B) The AUC performance as a function of log(λ). The red dots represent the AUC values, while the gray error bars indicate the standard errors. The vertical dashed line marks the log(λ) value where the AUC is maximized, reflecting the optimal balance between model complexity and predictive accuracy. AUC, area under the curve; LASSO, least absolute shrinkage and selection operator.
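The relationship between the LASSO coefficients, the reported ORs, and the liver function score can be reproduced directly. In this sketch the intercept, coefficients, and cutoff are those given in the Methods; the patient values passed to the function are hypothetical, and the measurement units are assumed to be whatever units the study used, which the manuscript does not state.

```python
import math

# Values reported in the Methods (Data collection).
INTERCEPT = -2.0443
COEFFICIENTS = {"TBIL": 0.00824, "ALP": 0.00625, "AST": -0.00296}
CUTOFF = 0.586


def liver_function_score(tbil, alp, ast):
    """Linear predictor built from the three indices retained by LASSO."""
    values = {"TBIL": tbil, "ALP": alp, "AST": ast}
    return INTERCEPT + sum(COEFFICIENTS[k] * values[k] for k in COEFFICIENTS)


# Each OR is exp(beta), i.e. the change in odds per one-unit increase.
odds_ratios = {name: round(math.exp(beta), 4) for name, beta in COEFFICIENTS.items()}
print(odds_ratios)

# Hypothetical patient values, for illustration only.
score = liver_function_score(tbil=20, alp=90, ast=30)
print(round(score, 4), score >= CUTOFF)
```

The exponentiated coefficients round to the ORs quoted in the text (1.0083, 1.0063, and 0.9970).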

Optimal variable selection in SVM model

The results from Figure 3 highlight the performance of the SVM model when varying numbers of variables are included in the prediction. The model achieved its highest accuracy (0.830) with four variables, emphasizing the importance of selecting an optimal number of predictors for the model to perform effectively. Including fewer or more than four variables resulted in a decrease in accuracy, with a significant drop observed when eight variables were included, further reinforcing the need for careful feature selection in machine learning models. These findings suggest that an overabundance of variables may introduce noise or overfitting, thus reducing model efficiency, while the optimal selection of variables can lead to better predictive performance. The results of the analysis identified the top five variables that most significantly contribute to the model’s accuracy: WBC, CAR, NLR, gallbladder wall thickness (thick), and PLR. These variables were consistently selected during the feature selection process, underscoring their importance in predicting surgical difficulty. The inclusion of these variables optimizes the model’s predictive capability while avoiding unnecessary complexity.

Figure 3 Optimal variable selection in SVM model. The figure presents the cross-validation accuracy of the SVM model across different numbers of variables used in the prediction. The X-axis denotes the number of variables included in the model, ranging from 1 to 10, and the Y-axis represents the model’s accuracy during cross-validation. The highest accuracy, 0.830, is observed when 4 variables are used. The accuracy shows fluctuations as more variables are incorporated, with a sharp decline at 8 variables. The accuracy slightly improves with additional variables but remains below the peak achieved with 4 variables. SVM, support vector machine.

Decision tree analysis for predicting surgical difficulty

Figure 4 provides a visual representation of a decision tree model used to classify surgical difficulty in patients based on key variables: WBC, CAR, and gallbladder wall thickness. The tree illustrates that WBC serves as the most critical initial predictor. Patients with WBC levels below 8.055 are primarily associated with easier surgeries (Node 2), whereas those with higher WBC levels require further stratification by CAR. For patients with a CAR below 0.108, the likelihood of surgical difficulty remains low (Node 4). However, those with higher CAR values are further divided based on gallbladder wall thickness, where thickness greater than 2 mm (Node 7) strongly correlates with increased surgical difficulty. This hierarchical structure effectively highlights how the combination of these clinical variables can be used to stratify patients and predict surgical outcomes, aiding in preoperative planning and risk assessment.

Figure 4 A decision tree model used to predict surgical difficulty in patients with gallstones, based on key clinical variables. The tree structure illustrates the sequential decision-making process, starting with the WBC as the primary node. Subsequent splits are based on CAR and gallbladder wall thickness, which further refine the prediction. The decision tree begins at the root node with WBC, where patients are divided into two groups based on whether their WBC is less than or greater than or equal to 8.055. Patients with a WBC of less than 8.055 generally fall into Node 2, which predominantly predicts an easy surgery (with a large proportion of outcomes labeled “0”). Those with higher WBC values are further split based on CAR levels, with those having a CAR of 0.108 or higher being further divided by gallbladder wall thickness. The terminal nodes, Nodes 6 and 7, represent the most refined predictions, showing that patients with higher CAR and gallbladder wall thickness greater than 2 mm are more likely to experience surgical difficulty (indicated by a larger proportion of outcomes labeled “1” in Node 7). CAR, C-reactive protein to albumin ratio; WBC, white blood cell count.
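The splits described for Figure 4 translate into a short rule set. The sketch below is a reading of the figure description, not the fitted tree object: it assumes that Node 6 collects patients with a wall thickness of 2 mm or less, and it does not handle patients whose wall thickness is recorded as unknown, since the text does not say how the tree treats them. The function name `tree_node` is hypothetical.

```python
def tree_node(wbc, car, wall_thickness_mm):
    """Return the terminal node (2, 4, 6 or 7) reached by one patient.

    wbc: white blood cell count (x10^9/L); car: CRP-to-albumin ratio;
    wall_thickness_mm: gallbladder wall thickness on ultrasound.
    """
    if wbc < 8.055:
        return 2  # predominantly easy surgeries
    if car < 0.108:
        return 4  # elevated WBC but low CAR: difficulty remains unlikely
    if wall_thickness_mm > 2:
        return 7  # elevated WBC, elevated CAR, thick wall: highest risk
    return 6      # ASSUMPTION: wall thickness <= 2 mm


print(tree_node(wbc=6.2, car=0.30, wall_thickness_mm=4))   # 2
print(tree_node(wbc=11.0, car=0.05, wall_thickness_mm=4))  # 4
print(tree_node(wbc=11.0, car=0.30, wall_thickness_mm=4))  # 7
```

Note that the order of the checks matters: CAR is only consulted once WBC is at or above 8.055, and wall thickness only once CAR is at or above 0.108.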

Nomogram analysis for predicting surgical difficulty

Figure 5 illustrates a nomogram that consolidates various clinical predictors to estimate the likelihood of surgical difficulty in patients undergoing cholecystectomy. The nomogram assigns points to each predictor, such as CAR, NLR, and WBC, based on their respective values or categories. For instance, higher values of CAR (≥0.108), NLR (≥3.479), and WBC (≥8.055) contribute more points, indicating a higher risk of surgical difficulty. The total points are calculated by summing the individual scores from each predictor. This total is then used to derive a predicted probability of surgical difficulty, shown at the bottom of the nomogram. The prediction rate provides a quantitative measure that clinicians can use to guide preoperative decision-making. The nomogram effectively synthesizes multiple risk factors into a single, interpretable tool, facilitating the identification of patients at higher risk for challenging surgeries.

Figure 5 A nomogram developed to predict the likelihood of surgical difficulty in patients with gallstones. The nomogram integrates several clinical variables to estimate the probability of a difficult surgery. The variables included are CAR, NLR, WBC, liver function score, gallbladder wall thickness, the clarity of Calot’s triangle, and age. Each variable is plotted along a line, where specific ranges or categories of the variable correspond to different point values. These points are then summed to give a total score, which is mapped to the prediction rate at the bottom of the nomogram. This prediction rate indicates the probability of surgical difficulty, allowing clinicians to assess preoperative risk more accurately. CAR, C-reactive protein to albumin ratio; NLR, neutrophil-to-lymphocyte ratio; WBC, white blood cell count.
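How a nomogram turns regression effects into points can be sketched as follows. Nomograms for logistic models typically scale each predictor's effect on the log-odds so that the strongest predictor is worth 100 points. The coefficients of the fitted nomogram are not reported, so this example substitutes the multivariate ORs from Table 2 as stand-ins; the resulting points are illustrative and should not be read as the values shown in Figure 5. The function name `nomogram_points` is hypothetical.

```python
import math

# ASSUMPTION: multivariate ORs from Table 2 stand in for the nomogram's own
# (unreported) coefficients, for the seven variables named in Figure 5.
MULTIVARIATE_OR = {
    "Age >=53 years": 2.54,
    "Calot's triangle clear": 0.92,
    "Gallbladder wall >2 mm": 1.98,
    "Liver function score >=0.586": 1.76,
    "WBC >=8.055": 3.81,
    "NLR >=3.479": 1.43,
    "CAR >=0.108": 2.47,
}


def nomogram_points(odds_ratios):
    """Scale |ln(OR)| so that the largest effect is worth 100 points.

    For an OR below 1 the points belong to the reference category, because
    that is the higher-risk level of the variable.
    """
    effects = {name: abs(math.log(value)) for name, value in odds_ratios.items()}
    largest = max(effects.values())
    return {name: 100 * effect / largest for name, effect in effects.items()}


points = nomogram_points(MULTIVARIATE_OR)
for name, value in sorted(points.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.0f} points")
```

Converting a total score into a predicted probability additionally requires the model intercept, which is not reported, so that step is left out.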

ROC curve analysis for training and validation sets

Figure 6 illustrates the ROC curves for the predictive model applied to the training and validation sets. Figure 6A shows that the model achieved an AUC of 0.774 in the training set, indicating acceptable discrimination between easy and difficult surgeries during the development phase. In the validation set (Figure 6B), the AUC was 0.863. The higher AUC in the validation set indicates that discrimination was maintained in patients who were not used for model development, which supports the model’s ability to predict surgical difficulty and its potential utility in clinical practice.

Figure 6 ROC curves for the predictive model assessing surgical difficulty in patients with gallstones. The ROC curve plots sensitivity (true positive rate) against 1-specificity (false positive rate) to evaluate the model’s discriminative ability. (A) The ROC curve for the training set, with an AUC of 0.774; (B) the ROC curve for the validation set, with a higher AUC of 0.863. AUC, area under the curve; ROC, receiver operating characteristic.
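For readers less familiar with the AUC, the quantity reported above can be read as the probability that a randomly chosen difficult case receives a higher predicted risk than a randomly chosen easy case. The following minimal sketch computes it that way by counting pairs; the labels and scores are toy values, not study data, and the function name `roc_auc` is an assumption of this example.

```python
def roc_auc(y_true, y_score):
    """AUC as the share of (difficult, easy) pairs ranked correctly.

    Ties in predicted score count as half a correct ranking.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example: 1 = difficult surgery, 0 = easy surgery.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(labels, scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

The pairwise form is quadratic in sample size, which is acceptable for cohorts of a few hundred patients such as the one analyzed here.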

Calibration curve analysis for training and validation sets

Figure 7 illustrates the calibration curves for the predictive model applied to both the training and validation sets. Figure 7A shows that the model’s predictions are well-calibrated for the training set, as evidenced by the solid red line closely following the ideal diagonal line. The mean absolute error of 0.035 suggests that the model’s predicted probabilities are highly consistent with the actual outcomes, confirming the model’s reliability during the development phase. Figure 7B shows the calibration curve for the validation set, where the model’s performance remains robust, though with a slightly higher mean absolute error of 0.05. This indicates that while the model’s predictions for new data are slightly less accurate than for the training data, the calibration is still within acceptable limits. The overall alignment of the calibration curves in both panels suggests that the predictive model is well-calibrated and reliable for estimating surgical difficulty across different datasets.

Figure 7 Calibration curves for the predictive model assessing surgical difficulty in patients with gallstones. Calibration curves evaluate the agreement between predicted probabilities and actual outcomes. The dashed diagonal line represents the ideal calibration, where predictions perfectly match the actual outcomes. The solid red line represents the model’s calibration after adjustment, and the dotted blue line shows the original calibration. (A) The calibration curve for the training set, with a mean absolute error of 0.035, indicating a good fit between the predicted and actual probabilities. (B) The calibration curve for the validation set, with a slightly higher mean absolute error of 0.05, but still demonstrating acceptable agreement between the predictions and actual outcomes.
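The idea behind a calibration error can be illustrated with a simplified, binned calculation. This is an assumption-laden sketch rather than the procedure used for Figure 7: the text does not state how the mean absolute error was computed, and smoothed, bootstrap-corrected curves are a common alternative. Here, patients are sorted by predicted probability, split into equal-sized groups, and the mean prediction in each group is compared with the observed event rate. All names and data are illustrative.

```python
def calibration_bins(y_true, y_prob, n_bins=5):
    """Return (mean predicted, observed rate) per equal-sized risk group."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    bins = []
    for i in range(n_bins):
        chunk = pairs[i * n // n_bins:(i + 1) * n // n_bins]
        if not chunk:
            continue  # skip empty groups when n < n_bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        bins.append((mean_pred, observed))
    return bins


def mean_absolute_calibration_error(bins):
    """Unweighted average gap between predicted and observed rates."""
    return sum(abs(pred - obs) for pred, obs in bins) / len(bins)


# Toy data: two risk groups of two patients each.
probs = [0.2, 0.2, 0.8, 0.8]
outcomes = [0, 1, 1, 1]
bins = calibration_bins(outcomes, probs, n_bins=2)
print(bins, mean_absolute_calibration_error(bins))
```

A value near zero means that predicted probabilities track observed frequencies closely, which is the property the diagonal reference line in Figure 7 represents.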

Results from DCA

Figure 8 shows the DCA for three models: Model A, based on the clarity of Calot’s triangle and gallbladder wall thickness; Model B, based on WBC alone; and Model C, the comprehensive nomogram. In the training set, Model C demonstrates the highest net benefit across most of the threshold range, suggesting superior performance in predicting surgical difficulty when multiple clinical parameters are integrated. Model A shows moderate performance, while Model B offers the least net benefit, particularly at lower threshold values. This indicates that while WBC is informative, its predictive power alone is limited compared to models that utilize a combination of clinical features. Similar trends are observed in the validation set, where Model C consistently outperforms the other models across nearly all thresholds. This supports the robustness of the nomogram in data not used for model development. Model A performs slightly better in the validation set than in the training set but still does not reach the performance level of Model C. Model B’s performance remains the lowest, highlighting the importance of a multifactorial approach in the predictive modeling of surgical outcomes. The DCA illustrates the practical utility of each model by quantifying the trade-off between the benefit of correctly identifying high-risk cases and the harm of unnecessary interventions. The superior performance of Model C in both the training and validation sets supports its use in clinical practice as a decision-making tool in the management of patients undergoing LC.

Figure 8 Decision curve analysis for three predictive models across the training set (A) and validation set (B). The models are assessed based on their net benefit across a range of high-risk thresholds. Model A, which incorporates features related to Calot’s triangle and gallbladder wall thickness, is shown in red. Model B, which uses WBC as a predictor, is depicted in blue. Model C, the comprehensive nomogram, is represented in orange. The lines labeled “All” and “None” serve as references, indicating the net benefit if all patients or no patients are treated as high risk, regardless of the predicted risk. WBC, white blood cell count.
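The net benefit plotted in a decision curve follows the definition of Vickers and Elkin (23): at a threshold probability p_t, net benefit equals the true-positive rate per patient minus the false-positive rate per patient weighted by p_t/(1 − p_t). A minimal sketch follows, using toy data and illustrative function names that are assumptions of this example rather than part of the study's analysis code.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients with predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of one false positive
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Reference line 'All': every patient is treated as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data: 1 = difficult surgery, 0 = easy surgery.
outcomes = [1, 1, 0, 0]
risks = [0.9, 0.6, 0.4, 0.1]
for pt in (0.2, 0.5):
    print(pt, net_benefit(outcomes, risks, pt),
          net_benefit_treat_all(outcomes, pt))
```

The "None" reference line is zero at every threshold, so a model is useful at a given threshold only if its net benefit exceeds both zero and the "All" line.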

Discussion

In this study, we developed and validated a predictive nomogram model to assess surgical difficulty in patients undergoing LC for gallstones. Utilizing a combination of clinical parameters, including CAR, WBC, and gallbladder wall thickness, along with a liver function score derived from LASSO regression, our model demonstrated strong predictive performance across both training and validation sets. The nomogram outperformed simpler models that relied on single predictors, as evidenced by superior net benefit in DCA and higher AUC values in ROC analysis. These findings suggest that our comprehensive model provides a reliable tool for preoperative risk stratification, potentially improving surgical outcomes by enabling more tailored clinical decision-making.

Our findings align with those of previous studies that have investigated predictors of surgical difficulty in LC. Similar to our work, previous research has emphasized the importance of preoperative inflammatory markers, such as WBC and CAR, in predicting surgical outcomes (16-18). For instance, Jessica Mok et al. (17) demonstrated that elevated CRP was significantly associated with increased surgical complexity, echoing the predictive value observed in our study. However, our study differs in its comprehensive approach, integrating these markers with other critical factors like gallbladder wall thickness and liver function scores, which were identified using advanced machine learning techniques. While some earlier studies, such as that by Kokoroskos et al. (19), have explored the role of imaging features like gallbladder wall thickness, they did not incorporate systemic inflammatory markers or utilize a multivariable predictive model like our nomogram. By combining multiple clinical and laboratory parameters, our model provides a more nuanced and accurate prediction of surgical difficulty, offering a significant improvement over single-factor models.

One of the distinctive aspects of our study is the application of advanced machine learning techniques, such as LASSO regression and decision tree analysis, to identify and integrate significant predictors into a comprehensive nomogram model. These methods allowed for the selection of the most relevant variables while minimizing overfitting, thereby enhancing the model’s generalizability and robustness (20-22). Unlike traditional statistical methods that may struggle with collinearity and overfitting, LASSO regression shrinks the coefficients of less important predictors to zero, leaving only the most impactful predictors in the model. This approach was critical in refining our predictive model to include key variables like CAR, WBC, and gallbladder wall thickness, along with the liver function score, resulting in a model that performed well in both the training and validation sets. Additionally, the use of DCA provided a nuanced understanding of the clinical utility of our model by evaluating the net benefit across a range of risk thresholds, a method that has been increasingly recognized for its ability to inform decision-making in clinical settings (23). The robust performance of our nomogram, as evidenced by high AUC values and favorable calibration plots, underscores the effectiveness of integrating machine learning techniques with traditional clinical parameters, offering a significant advancement in the preoperative assessment of surgical difficulty.
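The shrinkage behavior attributed to LASSO above comes from a soft-thresholding step: any coefficient whose association with the outcome falls below the penalty is set exactly to zero. The sketch below shows this mechanism with coordinate descent on a tiny synthetic data set. It is a simplification under stated assumptions: it fits a linear (not logistic) model without an intercept, the data and penalty value are invented, and the function names are illustrative, so it should not be read as the study's fitting procedure.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cycling over predictors.

    Assumes centered data and no all-zero columns (z would otherwise be 0).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for row, target in zip(X, y):
                # Fit from all predictors except j (the partial residual).
                others = sum(row[k] * beta[k] for k in range(p) if k != j)
                rho += row[j] * (target - others)
                z += row[j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


# Synthetic data: the outcome depends strongly on x1 and weakly on x2.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]  # y = 2.0*x1 + 0.1*x2
print(lasso_coordinate_descent(X, y, lam=0.5))  # weak predictor is dropped
print(lasso_coordinate_descent(X, y, lam=0.0))  # no penalty: ordinary fit
```

With the penalty applied, the weak predictor's coefficient is exactly zero while the strong predictor's coefficient is shrunk but retained, which is the variable-selection property the paragraph describes.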

In the context of the Diagnosis-Related Groups (DRG) system, this study focused on readily available clinical and imaging indicators, such as systemic inflammatory markers and ultrasound findings, to predict LC difficulty. While advanced imaging modalities, such as CT or magnetic resonance cholangiopancreatography (MRCP), can provide additional diagnostic details, these were not routinely performed in our cohort. Additionally, the study population included patients with diverse presentations, including acute and chronic cholecystitis, as well as asymptomatic gallbladder stones, ensuring broader applicability of the findings. Future studies could explore the integration of advanced imaging findings to further enhance predictive models in populations with more specific indications for these modalities.

Despite the promising results, our study has several limitations that should be acknowledged. Firstly, the retrospective nature of the study introduces potential biases in data collection and patient selection, which could impact the generalizability of our findings (24). Secondly, the study was conducted at a single center, which may limit the applicability of the results to broader populations with different demographic and clinical characteristics. Thirdly, while our model included several important clinical variables, there may be other relevant factors that were not captured in our dataset, such as patient comorbidities or specific surgical techniques, which could further enhance the predictive accuracy of the model. Lastly, the external validation was performed using a dataset from the same institution, and future studies should aim to validate the model in diverse, multicenter cohorts to ensure its robustness.


Conclusions

This study demonstrates the utility of integrating clinical parameters with advanced analytical methods to improve the preoperative assessment of surgical difficulty in LC. The nomogram developed, which includes key variables such as CAR, WBC, and gallbladder wall thickness, offers a reliable tool for risk stratification and surgical decision-making. The use of machine learning techniques has shown promise in enhancing predictive accuracy and providing more personalized care. Moving forward, prospective, multicenter studies should be conducted to validate and refine this model across diverse populations. Additionally, incorporating more advanced machine learning algorithms may further improve predictive capabilities and clinical utility, ultimately leading to more precise surgical planning and better patient outcomes.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tgh.amegroups.com/article/view/10.21037/tgh-24-124/rc

Data Sharing Statement: Available at https://tgh.amegroups.com/article/view/10.21037/tgh-24-124/dss

Peer Review File: Available at https://tgh.amegroups.com/article/view/10.21037/tgh-24-124/prf

Funding: This work was supported by the following funding sources: the Health Commission of Mianyang City Foundation Project (No. 202309), the Mianyang Traditional Chinese Medicine Hospital Foundation Project (No. MYSZYYKT2023117), and the Mianyang Chinese Medicine Society Inheritance and Innovation Project (No. MYSZYYXH-202426).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tgh.amegroups.com/article/view/10.21037/tgh-24-124/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Institutional Review Board of Mianyang Hospital of Traditional Chinese Medicine (No. 2025-001). Given the retrospective nature of this study, formal consent was not required.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Cuschieri A. Laparoscopic cholecystectomy. J R Coll Surg Edinb 1999;44:187-92.
  2. Keus F, de Jong JA, Gooszen HG, et al. Laparoscopic versus open cholecystectomy for patients with symptomatic cholecystolithiasis. Cochrane Database Syst Rev 2006;CD006231.
  3. Soper NJ, Stockmann PT, Dunnegan DL, et al. Laparoscopic cholecystectomy. The new 'gold standard'? Arch Surg 1992;127:917-21; discussion 921-3.
  4. Kalata S, Thumma JR, Norton EC, et al. Comparative Safety of Robotic-Assisted vs Laparoscopic Cholecystectomy. JAMA Surg 2023;158:1303-10.
  5. Humm GL, Peckham-Cooper A, Chang J, et al. Surgical experience and identification of errors in laparoscopic cholecystectomy. Br J Surg 2023;110:1535-42.
  6. Horvath KD. Strategies for the prevention of laparoscopic common bile duct injuries. Surg Endosc 1993;7:439-44.
  7. Seshadri A, Peitzman AB. The difficult cholecystectomy: What you need to know. J Trauma Acute Care Surg 2024;97:325-36.
  8. Lopez-Lopez V, Kuemmerli C, Maupoey J, et al. Textbook outcome in patients with biliary duct injury during cholecystectomy. J Gastrointest Surg 2024;28:725-30.
  9. Bektaş M, Tuynman JB, Costa Pereira J, et al. Machine Learning Algorithms for Predicting Surgical Outcomes after Colorectal Surgery: A Systematic Review. World J Surg 2022;46:3100-10.
  10. Gu J, Xie R, Zhao Y, et al. A machine learning-based approach to predicting the malignant and metastasis of thyroid cancer. Front Oncol 2022;12:938292.
  11. Petersen KK, Lipton RB, Grober E, et al. Predicting Amyloid Positivity in Cognitively Unimpaired Older Adults: A Machine Learning Approach Using A4 Data. Neurology 2022;98:e2425-35.
  12. Panni RZ, Strasberg SM. Preoperative predictors of conversion as indicators of local inflammation in acute cholecystitis: strategies for future studies to develop quantitative predictors. J Hepatobiliary Pancreat Sci 2018;25:101-8.
  13. Yang C, Liu Z, Fang Y, et al. Development and validation of a clinic machine-learning nomogram for the prediction of risk stratifications of prostate cancer based on functional subsets of peripheral lymphocyte. J Transl Med 2023;21:465.
  14. Fares MY, Liu HH, da Silva Etges APB, et al. Utility of Machine Learning, Natural Language Processing, and Artificial Intelligence in Predicting Hospital Readmissions After Orthopaedic Surgery: A Systematic Review and Meta-Analysis. JBJS Rev 2024;
  15. Kwiatkowska-Miernik A, Wasilewski PG, Mruk B, et al. Estimating Progression-Free Survival in Patients with Primary High-Grade Glioma Using Machine Learning. J Clin Med 2024;13:6172.
  16. Di Buono G, Romano G, Galia M, et al. Difficult laparoscopic cholecystectomy and preoperative predictive factors. Sci Rep 2021;11:2559.
  17. Jessica Mok KW, Goh YL, Howell LE, et al. Is C-reactive protein the single most useful predictor of difficult laparoscopic cholecystectomy or its conversion? A pilot study. J Minim Access Surg 2016;12:26-32.
  18. Díaz-Flores A, Cárdenas-Lailson E, Cuendis-Velázquez A, et al. C-Reactive Protein as a Predictor of Difficult Laparoscopic Cholecystectomy in Patients with Acute Calculous Cholecystitis: A Multivariate Analysis. J Laparoendosc Adv Surg Tech A 2017;27:1263-8.
  19. Kokoroskos N, Peponis T, Lee JM, et al. Gallbladder wall thickness as a predictor of intraoperative events during laparoscopic cholecystectomy: A prospective study of 1089 patients. Am J Surg 2020;220:1031-7.
  20. Kang J, Choi YJ, Kim IK, et al. LASSO-Based Machine Learning Algorithm for Prediction of Lymph Node Metastasis in T1 Colorectal Cancer. Cancer Res Treat 2021;53:773-83.
  21. Deo RC. Machine Learning in Medicine. Circulation 2015;132:1920-30.
  22. Jiang T, Gradus JL, Rosellini AJ. Supervised Machine Learning: A Brief Primer. Behav Ther 2020;51:675-87.
  23. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 2006;26:565-74.
  24. Talari K, Goyal M. Retrospective studies - utility and caveats. J R Coll Physicians Edinb 2020;50:398-402.
doi: 10.21037/tgh-24-124
Cite this article as: Huang K, Jia S, Yuan X, Zhao P, Bai D. Development and validation of a machine learning-based nomogram for preoperative prediction of laparoscopic surgical difficulty in gallstone patients. Transl Gastroenterol Hepatol 2025;10:49.
