Current status of artificial intelligence analysis for the diagnosis of gallbladder diseases using ultrasonography: a scoping review
Highlight box
Key findings
• The present study comprehensively summarizes and analyzes the current status, limitations, and future perspectives of artificial intelligence (AI)-assisted ultrasonography in gallbladder diseases (GBDs).
What is known and what is new?
• While ultrasound (US) is the first-line imaging method for GBDs, using only visual judgment from US images to stratify the risk of GBDs is challenging. In addition, the diagnostic ability of sonographers is highly correlated with knowledge reserves, clinical experience, and proficiency in operation.
• We provide a comprehensive summary and analysis of the applications of US-based AI technology in various GBDs and highlight the significance and pitfalls of AI-based technologies.
What is the implication, and what should change now?
• Before AI technology is translated into widespread clinical application, its advantages, limitations and public challenges should be clarified. In the near future, AI has the potential to become a breakthrough in the diagnosis of GBDs, supporting doctors in improving their ability to diagnose GBDs with ultrasonography.
Introduction
The gallbladder (GB) is a pear-shaped pouch located behind the liver under the right rib that concentrates and stores bile (1,2). Gallbladder diseases (GBDs) are common causes of upper abdominal pain, affecting a large portion of the population every year. The incidence of GBDs increases with age and is higher in females than in males (1). Acute cholecystitis is currently the most common manifestation of GBDs, but gallstones, GB polyps, chronic cholecystitis, and gallbladder cancer (GBC) are also manifestations of GBDs (2). Many imaging techniques, such as transabdominal ultrasound (US), endoscopic US, magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography-computed tomography (PET-CT), have been used for the diagnosis of GBDs. US examination is the preferred imaging method for diagnosing GBDs because it is easy to obtain and involves no radiation.
GBDs are complex and diverse, and it is sometimes difficult to accurately identify atypical gallstones, polypoid lesions, inflammation, and tumors. Previous studies have shown that some sonographic features are associated with GB neoplastic lesions; however, using only visual judgment of US images for risk stratification of GB lesions is challenging (3-6). Contrast-enhanced ultrasound (CEUS) is a novel imaging technique that distinguishes GBDs by providing information on tissue vascular distribution and blood flow perfusion (7-9). Some studies have reported that the perfusion characteristics of CEUS and the integrity of the GB wall can help distinguish between benign and malignant GBDs (7,10-12). However, the clinical application of CEUS is limited by its strong dependence on operator experience, increased cost, low availability, and the risk of anaphylactic reactions and potential contraindications associated with contrast agents. For this reason, many automated diagnostic systems based on artificial intelligence (AI) have been developed to provide accurate diagnosis while saving time and cost.
Recently, the application of AI in medical image recognition has attracted widespread attention (13). AI is a mathematical prediction technique which simulates human intelligence processes by computer systems, enabling machines to learn, reason, and solve problems. Deep learning (DL) is a branch of machine learning (ML) that focuses on more complex data processing and feature learning using multi-layer neural network structures. DL is often used in AI algorithms and has been applied in medical diagnosis.
This review aims to provide a comprehensive summary and analysis of the application of US-based (referring to transabdominal US) AI technology in various GBDs. In addition, it evaluates the diagnostic ability of US-based AI technology in GBDs based on the findings of published articles. We present this article in accordance with the PRISMA-ScR reporting checklist (available at https://tgh.amegroups.com/article/view/10.21037/tgh-24-61/rc).
Methods
Search strategy
The search strategies are summarized in Table 1. In this review, we searched the PubMed and WILEY databases using predefined keywords for articles published over the past two decades (from January 2003 to December 2023) to evaluate research progress in this field. Articles were screened for relevant publications about US-based AI applications in GBDs. Terms in the title, abstract, and keywords were searched as follows: (“artificial intelligence” OR “AI” OR “machine learning” OR “deep learning” OR “convolutional neural network” OR “computer-assisted” OR “computer-aided” OR “neural network” OR “digital image analysis” OR “digital image processing”) AND (“ultrasound” OR “ultrasonography” OR “US”) AND (“gall bladder” OR “gallbladder” OR “GB” OR “cholecyst”).
Table 1
Items | Specification |
---|---|
Date of search | December 1, 2023 |
Databases and other sources searched | PubMed and WILEY databases |
Search terms used | Terms in the title, abstract, and keywords were searched as follows: (“artificial intelligence” OR “AI” OR “machine learning” OR “deep learning” OR “convolutional neural network” OR “computer-assisted” OR “computer-aided” OR “neural network” OR “digital image analysis” OR “digital image processing”) AND (“ultrasound” OR “ultrasonography” OR “US”) AND (“gall bladder” OR “gallbladder” OR “GB” OR “cholecyst”) |
Timeframe | From January 2003 to December 2023 |
Inclusion criteria | (I) Evaluate US-based AI technology for the diagnosis of GBDs; (II) the final diagnosis was determined by the histopathological diagnosis obtained via surgery or long-term follow-up; (III) AI technology has been applied in US diagnosis of GBDs; (IV) the research findings detailed the diagnostic performance of AI models, including metrics such as area under the curve, sensitivity, specificity and accuracy |
Exclusion criteria | (I) AI technology was not used in the study; (II) research refers to meeting proceedings, abstracts, editorials, case reports, or letters; (III) articles with repetitive content |
AI, artificial intelligence; US, ultrasound; GB, gallbladder; GBD, gallbladder disease.
Four independent authors (X.W., H.Z., Z.B. and X.X.) reviewed the title, abstract, and full text of eligible articles. Data extraction was carried out independently by the four reviewers. Any differences in the extracted data were resolved by mutual agreement. Extracted data such as author name, year of publication, study population, sample size, and AI-based applications were combined into one table (Table S1). Figure 1 shows the search strategy as a flow diagram.
Inclusion and exclusion criteria
AI technology refers to a broad field of computer science that deals with the simulation, extension, and enhancement of human intelligence. It covers various areas such as ML, DL, natural language processing, computer vision, and robotics. The inclusion criteria were as follows: (I) the study evaluated US-based AI technology for the diagnosis of GBDs; (II) the final diagnosis was determined by the histopathological diagnosis obtained via surgery or long-term follow-up; (III) AI technology was applied in US diagnosis of GBDs; (IV) the research findings detailed the diagnostic performance of AI models, including metrics such as area under the curve (AUC), sensitivity, specificity and accuracy. The AUC is the area underneath the receiver operating characteristic (ROC) curve, quantifying the classifier’s ability to distinguish between positive and negative classes across different thresholds. Accuracy measures the proportion of correct predictions among all predictions. Sensitivity refers to the proportion of people who are correctly diagnosed as having the disease (true positives) among all people who actually have the disease. Specificity refers to the proportion of people who are correctly diagnosed as not having the disease (true negatives) among all people who actually do not have the disease.
The exclusion criteria were as follows: (I) AI technology was not used in the study; (II) the publications were meeting proceedings, abstracts, editorials, case reports, or letters; (III) the content of the articles was repetitive.
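The four performance metrics named in the inclusion criteria can be illustrated with a short Python sketch. It is only an illustration on invented toy labels and scores, not data from any included study; the function names are hypothetical. The AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), and Youden's index, which appears in Table 3, is included as sensitivity + specificity − 1.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = disease, 0 = no disease)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positives among all cases that actually have the disease."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negatives among all cases that actually do not have the disease."""
    return tn / (tn + fp)


def accuracy(tp, fp, tn, fn):
    """Correct predictions among all predictions."""
    return (tp + tn) / (tp + fp + tn + fn)


def youden_index(sens, spec):
    """Youden's index J = sensitivity + specificity - 1."""
    return sens + spec - 1


def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 4 diseased and 6 non-diseased cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold 0.5 is an assumption

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn))          # 0.75
print(specificity(tn, fp))          # 5/6
print(accuracy(tp, fp, tn, fn))     # 0.8
print(auc(y_true, scores))          # 0.875
```

Sensitivity, specificity and accuracy depend on the chosen decision threshold, whereas the AUC summarizes performance across all thresholds, which is one reason the included studies often report both.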
Results
AI in medical imaging
The term AI was first coined by McCarthy in the 1950s (14-17). It refers to computer programs developed by humans that can mimic the thinking and judgment of the human brain. In recent decades, due to the improvement in computing power and the availability of large datasets, the application of AI has experienced unprecedented growth (18). Among many mathematical techniques, ML and DL are the main approaches of AI and are widely used in medical image recognition.
ML
ML is a term first proposed by Arthur Samuel (17,19), and is a branch of AI used to “extract knowledge from large amounts of data” (18,20). That is to say, it is an algorithm that automatically analyzes data to obtain rules, and uses these rules to make predictions on unseen data. After training is completed, a mathematical model is generated, and its ability is evaluated by making predictions on a test set that is different from the training set. The six common ML techniques are decision trees, naive Bayesian predictors, random forests, support vector machines, artificial neural networks, and deep neural networks (DL) (14,21-24). In medical imaging analysis, ML algorithms are a key component of computer-aided diagnosis (CAD) systems and radiomics research (17).
A neural network is an ML technique based on multiple neurons. These neurons are interconnected in a hierarchical manner, forming a network structure similar to the nervous system of the human brain. In neural networks, input data is transmitted and processed through multiple layers of neurons, ultimately producing output results. Through training, neural networks can learn to recognize patterns and rules for classification, prediction, and other ML tasks. The building blocks of a neural network include input layers, hidden layers, output layers, weights, biases, activation functions, optimizers, and loss functions. Learning with neural networks that have multiple hidden layers is called DL.
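The separation of training and test data described above corresponds to the two internal validation schemes listed in Tables 3 and 4, hold-out and cross-validation. The following Python sketch shows how the two schemes partition sample indices; the function names, the 20% test fraction and the choice of five folds are assumptions made for illustration and are not taken from any included study.

```python
import random


def hold_out_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices once and reserve a fixed fraction as the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield k (train, test) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


train, test = hold_out_split(100)
print(len(train), len(test))  # 80 20

for fold_train, fold_test in k_fold_splits(10, k=5):
    print(len(fold_train), len(fold_test))  # 8 2 on each of the five folds
```

In imaging studies where one patient contributes several images, the split is usually made over patients rather than images so that images of the same patient do not appear in both sets; the indices here can be read as patient indices for that purpose.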
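To make the layer structure concrete, the sketch below passes one input vector through a tiny fully connected network in plain Python. The weights, biases and layer sizes are arbitrary values chosen for illustration; training (the optimizer and loss function) is deliberately left out.

```python
import math


def sigmoid(z):
    """Squash a real value into (0, 1); often used for a binary output."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified linear unit, a common hidden-layer activation."""
    return max(0.0, z)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Propagate the input through each layer in turn."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 output (sigmoid).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, -1.0]], [0.0], sigmoid),
]
output = forward([2.0, 1.0], layers)
print(output)  # one value between 0 and 1, here sigmoid(-0.5)
```

Adding further entries to `layers` gives a deeper network of the kind used in DL; in practice the weights are learned from data rather than set by hand.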
DL
DL is a term first proposed by Rina Dechter in 1986 (17,25). It is an ML method that grew out of artificial neural networks and is considered to mirror human neural structure (17,26). DL belongs to the field of ML and has broad application prospects. DL mainly involves learning the inherent patterns and levels of representation of sample data, enabling machines to have analytical and learning abilities similar to those of humans. The ultimate goal of DL is to enable machines to recognize and interpret various data, such as text, images, and sound, in order to achieve the goals of AI. The development of deep convolutional neural network (CNN) technology has led to significant breakthroughs of AI in the field of computer vision. Applying AI appropriately to US medical images can significantly improve the efficiency and accuracy of clinical diagnosis.
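The core operation of a CNN is a small filter slid across the image. The Python sketch below applies one hand-written filter to a toy image to show the mechanics; in a trained CNN the filter values are learned, and the image and kernel here are invented for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation,
    which is the operation CNN layers conventionally call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 3x4 image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

The response is largest where the brightness changes, which is how stacked convolutional layers come to detect edges, textures and eventually lesion-like structures.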
CAD
CAD is a method of using AI technology to assist doctors in diagnosis. The CAD system analyzes medical images and other possible physiological and biochemical indicators, combined with AI algorithms and models, to automatically identify and classify lesions and provide diagnostic recommendations. CAD systems are typically trained with ML techniques such as DL on large amounts of medical data to identify and classify lesions. The trained models can then automatically analyze new medical images, detect lesions, and provide corresponding diagnostic information. CAD can effectively improve image quality and reduce misdiagnosis and missed diagnosis rates (17,27,28). Promising results have been achieved in medical image reconstruction, medical image segmentation, and medical image fusion (17,29).
Radiomics
Radiomics is the quantitative study of medical images. Its approach is to extract the information contained in the image and analyze it comprehensively and systematically (17). Specifically, radiomics uses automated algorithms to extract a large number of features from the region of interest (ROI) of an image, and then applies statistical analysis and data mining methods to select the features that are truly informative, which are ultimately used for auxiliary diagnosis, classification, or grading of diseases. Radiomics overcomes the limitation of relying on physicians' subjective interpretation of images, which remains the common approach in clinical practice. It expands the guiding value of medical imaging in clinical practice and promotes the development of medical imaging informatics. This is of great significance for promoting the development of precision medicine.
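As an illustration of ROI-based feature extraction, the Python sketch below computes three simple first-order features (mean, variance and histogram entropy) from the pixels inside a mask. The toy image, the mask, the four-bin histogram and the function name are assumptions made for illustration; real radiomics pipelines extract many more features, including shape and texture descriptors.

```python
import math


def first_order_features(image, mask, n_bins=4, max_value=255):
    """Mean, variance and histogram entropy of the intensities inside the ROI."""
    values = [
        image[r][c]
        for r in range(len(image))
        for c in range(len(image[0]))
        if mask[r][c]
    ]
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n

    # Histogram over equally wide intensity bins covering 0..max_value.
    bin_width = (max_value + 1) / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v // bin_width), n_bins - 1)] += 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)

    return {"mean": mean, "variance": variance, "entropy": entropy}


# Toy 3x3 grayscale image and a binary ROI mask (both invented).
image = [
    [10, 10, 0],
    [200, 200, 0],
    [0, 0, 0],
]
mask = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]
features = first_order_features(image, mask)
print(features)  # {'mean': 105.0, 'variance': 9025.0, 'entropy': 1.0}
```

Feature vectors of this kind, computed for every lesion, are what the subsequent statistical selection and classification steps operate on.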
Application of AI in GBDs
The classification of GBDs and current distribution of US-based AI applications in the field of GB or GB related disorders are shown in Table 2.
Table 2
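Status of AI application | Gallbladder disease |
---|---|
AI has been applied | Gallbladder polyps |
AI has been applied | Gallbladder cancer |
AI has been applied | Asymptomatic cholelithiasis |
AI has been applied | Biliary atresia |
AI has not yet been applied | Acute calculous cholecystitis |
AI has not yet been applied | Acute acalculous cholecystitis |
AI has not yet been applied | Chronic cholecystitis |
AI has not yet been applied | Porcelain gallbladder |
AI has not yet been applied | Mirizzi syndrome |
AI has not yet been applied | Gallbladder hydrops |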
GBD, gallbladder disease; US, ultrasound; AI, artificial intelligence.
GB polyps
GB polyps are found in 3–7% of healthy individuals (1,30-32). They are classified as benign or malignant, and can be further subdivided into non-neoplastic or neoplastic polyps (1). Non-neoplastic polyps account for 95% of polyps and have a lower risk of malignancy (1,33,34). Neoplastic polyps can be benign or malignant. Adenoma is the most common neoplastic polyp and is often considered a lesion with malignant potential (1,35). Malignant neoplastic polyps include adenocarcinoma, squamous cell carcinoma, mucinous cystadenoma, and adenoacanthoma (36,37).
According to the latest European guidelines (38-40), a GB polyp larger than 10 mm is one of the indications for surgery (34,41,42). Cholecystectomy is an invasive procedure, and the final pathological result of these surgeries is mostly non-neoplastic polyps (38). For benign lesions, simple cholecystectomy can meet the treatment needs, but for malignant lesions, comprehensive evaluation is needed to determine the specific treatment strategy (7,43,44). Therefore, accurately distinguishing neoplastic from non-neoplastic GB lesions, or malignant from benign GB lesions, is crucial for preoperative evaluation.
Transabdominal US examination is the most common method for evaluating polyps. The commonly accepted risk factor associated with malignant lesions is the size of the lesion. The accuracy of abdominal US examination for polyps varies, with a false positive detection rate of up to 85% (1,45). Several studies have applied different AI algorithms to distinguish true from pseudo-GB polyps or to identify neoplastic GB polyps. These studies suggest that US-based AI may become a good choice for diagnosing GB polyps (Table 3).
Table 3
Study | AI model | Patient population | Diagnostic ability | Study design | Validation |
---|---|---|---|---|---|
Jeong et al. (46) | DL | 535 | AUC 0.92 | Retro | Internal (hold-out) |
Kim et al. (47) | DL | 501 | Accuracy 0.8761, specificity 0.8835, AUC 0.9082 | Retro + multi | Internal (cross) |
Choi et al. (48) | DL | 263 | Efficacy 0.944, accuracy 0.858, sensitivity 0.856, specificity 0.861 | Retro | Internal (cross) |
Li et al. (49) | Bayesian network | 1,296 | AUC 75.13%, accuracy 80.47% | Retro + multi | Internal (cross) |
Yuan et al. (50) | Radiomics of dual modal ultrasound | 100 | AUC 0.850±0.090, accuracy 0.828±0.097, sensitivity 0.892±0.144, specificity 0.803±0.149, Youden’s index 0.695±0.157 (mean ± SD) | Retro | Internal (cross) |
Chen et al. (51) | ML | 224 | Accuracy 87.54%, sensitivity 86.52%, specificity 89.40% | Retro | Internal (cross) |
Wang et al. (7) | ML-based ultrasound radiomics models | 640 | Discriminating neoplastic from non-neoplastic gallbladder lesions: AUC 0.822–0.853a; discriminating carcinomas from benign gallbladder lesions: AUC 0.904–0.979b | Pros + multi | Internal (cross) |
Li et al. (38) | Bayesian network | 759 | Accuracy 82.35% | Retro + multi | Internal (hold-out) |
Yuan et al. (41) | Ultrasound radiomic analysis | 96 | Accuracy 0.875, sensitivity 0.885, specificity 0.857, AUC 0.898 | Retro | Internal (cross) |
a, the researchers used Validation set, Test set A and Test set B to obtain the average AUC values of 0.837, 0.822 and 0.853, respectively, so the AUC was expressed as 0.822–0.853 in the original study. b, the researchers used Validation set, Test set A and Test set B to obtain the average AUC values of 0.904, 0.909 and 0.979 respectively, so the AUC was expressed as 0.904–0.979 in the original study. AI, artificial intelligence; US, ultrasound; AUC, area under the curve; Pros, prospective study; Retro, retrospective study; multi, multicenter; cross, cross-validation; DL, deep learning; ML, machine learning; SD, standard deviation.
Both DL and ML play an important role in differentiating benign and malignant polypoid lesions of the GB. Jeong et al. developed a DL-based decision support system (DL-DSS) and studied its value in the diagnosis of GB polypoid lesions by US (46). In their study, three radiologists with different levels of experience retrospectively graded the likelihood of neoplastic polyps and then re-evaluated their scores with the assistance of the DL-DSS. With the assistance of the DL-DSS, the AUC of the reviewers increased to 0.91–0.95, indicating that the DL-DSS can serve as an auxiliary tool to reduce the gap between reviewers. Kim et al. proposed a comprehensive model that combines three CNN models to classify GB polyps smaller than 20 mm in US images (47). True polyps were identified by an integrated model learned from US images, patient age, and polyp size, with an AUC of 0.896 and an accuracy of 83.63%. In 2023, Choi et al. conducted a multicenter retrospective study to evaluate the accuracy of DL-based image analysis in distinguishing between neoplastic and non-neoplastic GB polyps, and identified the potential application of CAD in clinical practice. They found that the effectiveness of using DL to distinguish between neoplastic and non-neoplastic GB polyps was 0.944 (accuracy, 0.858; sensitivity, 0.856; specificity, 0.861) (48). Therefore, they concluded that DL-based models can accurately distinguish neoplastic polyps from non-neoplastic polyps, which can help inexperienced doctors improve diagnostic accuracy.
Li’s team used Bayesian networks to predict the risk of neoplastic polyps among GB polyps of 10–20 mm and of 8–15 mm. The accuracy of the former model in the training and testing sets was 81.88% and 82.35%, respectively (38). The accuracy of the latter model in the training and testing sets was 75.58% and 80.47%, respectively (49). The results indicate that the Bayesian network prediction model can effectively identify neoplastic polyps.
In recent years, the field of radiomics technology based on AI has developed rapidly. Computers process massive datasets through hierarchical mathematical models, which can detect patterns that cannot be deciphered using biostatistics (41,52). Many researchers have made progress in the field of radiomics. Yuan et al. included 99 pathologically confirmed GB polyps from 96 patients with GB adenoma or GB cholesterol polyps, and obtained spatial and morphological features from preoperative US images (41). The classification accuracy, sensitivity, specificity, and AUC values were 0.875, 0.885, 0.857, and 0.898, respectively. Wang et al. prospectively examined 640 pathologically confirmed GB masses in 640 patients to investigate the diagnostic performance of an ML-based US radiomics model in stratifying the risk of GB masses (discrimination of neoplastic from non-neoplastic lesions, and discrimination of malignant from benign lesions), and whether the ML-based US radiomics model can outperform CEUS in diagnostic performance (7). The study showed that both the US radiomics model used to distinguish between neoplastic and non-neoplastic lesions and the US radiomics model used to distinguish between malignant and benign lesions exhibited higher diagnostic performance than traditional US models, and that the US radiomics model performed better than the CEUS model in distinguishing GBC. Yuan et al. conducted a retrospective study based on dual-modal US using B-mode and superb microvascular imaging (SMI), revealing that radiomics analysis can improve the classification accuracy of GB neoplastic polyps and cholesterol polyps (50). Therefore, compared with traditional imaging methods, radiomics can effectively improve the diagnostic accuracy of preoperative identification of GB neoplastic polyps, providing an important reference for the surgical diagnosis and treatment of GB polyps.
CAD systems have enormous potential in image-based diagnosis. Chen et al. constructed a CAD system for differential diagnosis of neoplastic and non-neoplastic GB polyps, achieving good diagnostic results with accuracy, sensitivity, and specificity of 87.54%, 86.52%, and 89.40%, respectively (51). In addition, the diagnostic performance was slightly higher than that of US experts and diagnosis was much faster, indicating that the diagnostic accuracy of CAD is competitive with that of professional US experts.
GBC
GBC is a highly malignant disease and is the most common malignant tumor in the biliary system (1,53). GBC shows significant geographic variation, with the highest incidence among indigenous populations in South America, northern India, and East Asia (54-56). GBC is often diagnosed at a late stage because the disease lacks specific symptoms in its early stage. The average overall survival of patients with advanced GBC is 6 months, and the 5-year survival rate is <5% (56,57). Early diagnosis is crucial for improving the prognosis of GBC patients and increasing their survival rate. US is a first-line imaging modality for evaluating patients with suspected GBDs. Although it is easy to identify gallstones and other abnormalities (such as GB wall thickening) in routine US examinations, accurately detecting early signs of GBC is challenging (56,58). These difficulties have prompted researchers to explore whether US-AI can help (Table 4).
Table 4
Study | AI model/architecture | Patient population | Diagnostic ability | Study design | Validation |
---|---|---|---|---|---|
Basu et al. (59) | DL | 218 | Accuracy 0.921±0.062, specificity 0.961±0.049, sensitivity 0.923±0.062, AUC 0.971±0.028 (mean ± SD) | Retro | Internal (cross) |
Basu et al. (60) | DL | 218 | Accuracy 91.0%, specificity 95.0%, sensitivity 97.6% | Retro | Internal (cross) |
Xue et al. (61) | DL | 300 | Increased the IoU by 7.3%, the precision by 8.2%, and the recall rate by 11.1% | Retro | Internal (hold-out) |
Gupta et al. (56) | DL | 565 | Sensitivity 92.3%, specificity 74.4%, AUC 0.887 | Pros | Internal (hold-out) |
AI, artificial intelligence; US, ultrasound; IoU, intersection over union; AUC, area under the curve; Pros, prospective study; Retro, retrospective study; cross, cross-validation; DL, deep learning; SD, standard deviation.
Basu et al. proposed a new deep neural network architecture, RadFormer (Radiology Transformer), for detecting GBC from US images and providing a semantic interpretation of the diagnosis as a radiologist would (59). The proposed model not only helps to understand the decision-making of neural network models, but also helps to discover new visual features related to GBC diagnosis, even surpassing human radiologists in detection accuracy. Basu’s team also proposed a new technique, GBCNet, which extracts ROIs by detecting the GB (rather than the cancer), and then uses a new multi-scale, second-order pooling structure to classify GBC. In addition, they proposed a training curriculum inspired by human vision that can reduce texture bias in GBCNet (60). Their experimental results indicate that GBCNet is significantly superior to previous state-of-the-art (SOTA) CNN models and professional radiologists. Xue et al. proposed a new Segnet algorithm to process conventional two-dimensional US images of gallstones and GBC, in order to investigate the accuracy, specificity, and sensitivity of CEUS in GBC diagnosis (61). The study found that the optimized Segnet network algorithm improved the intersection over union (IoU) by 7.3%, precision by 8.2%, and recall by 11.1%, and that the diagnostic coincidence rate of CEUS examinations for GBC was 87.5%. Gupta et al. conducted a prospective diagnostic study to train and evaluate a multiscale, second-order pooling-based DL classifier model using the US data of patients with GB lesions (56). In the test set, the sensitivity, specificity, and AUC of the DL model in detecting GBC were 92.3% (95% CI, 88.1–95.6), 74.4% (95% CI, 65.3–79.9), and 0.887 (95% CI, 0.844–0.930), respectively, which were comparable to those of experienced radiologists.
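The segmentation metrics reported for Xue et al. (IoU, precision and recall) compare a predicted mask with a reference mask pixel by pixel. The Python sketch below shows the standard definitions on two invented 3×3 binary masks; it is not the evaluation code of the cited study.

```python
def overlap_counts(pred, truth):
    """Pixel-wise TP, FP and FN between a predicted and a reference mask."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred, truth):
    """Intersection over union: shared pixels / pixels in either mask."""
    tp, fp, fn = overlap_counts(pred, truth)
    return tp / (tp + fp + fn)


def precision(pred, truth):
    """Fraction of predicted pixels that lie inside the reference mask."""
    tp, fp, _ = overlap_counts(pred, truth)
    return tp / (tp + fp)


def recall(pred, truth):
    """Fraction of reference pixels recovered by the prediction."""
    tp, _, fn = overlap_counts(pred, truth)
    return tp / (tp + fn)


# Invented masks: the prediction is shifted one column left of the reference.
pred = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]
truth = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
print(iou(pred, truth))        # 2 shared pixels / 6 pixels in the union
print(precision(pred, truth))  # 0.5
print(recall(pred, truth))     # 0.5
```

Because IoU penalizes both false positives and false negatives, it is lower than either precision or recall whenever the two masks disagree.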
GB stones
Gallstones are one of the most common diseases in the population, with an estimated prevalence of 10–20% (62). The incidence of gallstones increases with age, and is higher in women (1). The diagnosis of gallstones is usually relatively simple. However, atypical stones occasionally occur, such as stones with moderate echo, stones that do not move with posture, or stones accompanied by GB polyps or cholecystitis, which make the differential diagnosis difficult.
Obaid et al. successfully applied a deep neural network-based classification model to a rich database to simultaneously detect nine GBDs, including gallstones, and determine disease types using US imaging (2). They established a balanced database consisting of 10,692 GB US images from 1,782 patients, and preprocessed and enhanced the images in preparation for segmentation. Finally, they used four DNN models to classify these images and detect nine types of GBDs. All models achieved good results in the detection of GBDs; the MobileNet model performed best, with an accuracy of 98.35%.
When US is used to detect gallstones and the sizes of the GB and gallstones need to be quantified separately, time-consuming and labor-intensive manual delineation is currently necessary. Lian et al. proposed a new method for segmenting the GB and gallstones in US images and achieved encouraging detection results (63). For the GB and gallstones, respectively, the average contour similarity (summarizing Dice’s similarity, overlap fraction, and overlap value) was 86.01% and 79.81%, the position error was 1.7675 and 0.5414 mm, and the runtime was 4.2211 and 0.6603 s. The authors concluded that the proposed method has the potential to help doctors diagnose GBDs rapidly and effectively (63).
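Dice's similarity, one of the contour-agreement measures mentioned for the segmentation study, can be sketched in Python as follows. The masks are invented, and this shows only the standard definition of the Dice coefficient; how Lian et al. combined it with overlap fraction and overlap value into a single percentage is not specified here, so that step is not reproduced.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks."""
    intersection = sum(
        1 for pr, tr in zip(pred, truth) for p, t in zip(pr, tr) if p and t
    )
    size_pred = sum(1 for row in pred for p in row if p)
    size_truth = sum(1 for row in truth for t in row if t)
    if size_pred + size_truth == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * intersection / (size_pred + size_truth)


def jaccard_index(pred, truth):
    """Intersection over union, shown to illustrate its relation to Dice."""
    intersection = sum(
        1 for pr, tr in zip(pred, truth) for p, t in zip(pr, tr) if p and t
    )
    union = sum(
        1 for pr, tr in zip(pred, truth) for p, t in zip(pr, tr) if p or t
    )
    return intersection / union


# Invented 2x3 masks for an automatic and a manual contour.
pred = [
    [1, 1, 1],
    [0, 0, 0],
]
truth = [
    [0, 1, 1],
    [0, 1, 0],
]
d = dice_coefficient(pred, truth)
j = jaccard_index(pred, truth)
print(d)                # 2*2 / (3+3) = 0.666...
print(2 * j / (1 + j))  # same value: Dice = 2J / (1 + J)
```

The two measures always rank segmentations in the same order, so studies reporting either one can be compared qualitatively, although the numerical values differ.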
Biliary atresia (BA)
BA is a rare infant disease that affects intrahepatic and extrahepatic bile ducts, with a global prevalence of approximately 1/5,000–1/19,000 (64). It is the most common reason for liver transplantation in infants under 1 year old (64). Timely diagnosis and Kasai portoenterostomy (KPE) can achieve good clinical outcomes (64-67). However, early recognition of BA in infants with cholestasis remains challenging. Researchers have found that screening for direct bilirubin concentration (64,68,69) or fecal color (64,65,70) in newborns and infants can achieve high sensitivity (97.1–100%) for early identification of BA. Recently, serum matrix metalloproteinase-7 has been reported as a valid diagnostic biomarker for BA, with a sensitivity of 97.0–98.7% (64,71,72). However, these screening methods are difficult to popularize and promote in countries and regions with underdeveloped medical conditions.
US examination is the initial examination method for clinically suspected BA patients (64,73-76). GB abnormalities are specific US signs for BA diagnosis. The sensitivity and specificity of GB abnormalities in BA diagnosis are higher than 90% (12,77). However, because BA is a rare disease, accurate diagnosis is very challenging for many doctors, so many patients are misdiagnosed or missed and lose the best opportunity for treatment (64,78).
Zhou et al. were the first to propose an ensembled deep learning model (EDLM) for automatic and accurate recognition of infant BA based on GB US images (64). They conducted a multicenter study and retrospectively obtained 3,705 GB US images as the training cohort, and prospectively obtained 841 GB US images as the external validation cohort. The patient-level sensitivity and specificity generated by EDLM on the internal validation dataset were 93.3% and 85.2%, respectively. The sensitivity and specificity generated on the multi-center external validation dataset were 93.1% and 93.9%, respectively, with an AUC of 0.956, which was superior to that of human experts. With the help of this model, the performance of human experts at different levels improved. This is the first DL model for diagnosing BA based on GB US, which will benefit infants with suspected BA in many underdeveloped regions around the world.
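The ideas of an ensemble and of patient-level results can be illustrated with a generic Python sketch. It assumes, purely for illustration, that each model outputs a probability per image, that the ensemble averages the models, and that a patient-level score is the mean over that patient's images with a 0.5 threshold; none of these choices is taken from the EDLM of Zhou et al., and all names and numbers are hypothetical.

```python
from collections import defaultdict


def ensemble_image_probability(model_probs):
    """Average the probabilities that several models assign to one image."""
    return sum(model_probs) / len(model_probs)


def patient_level_predictions(records, threshold=0.5):
    """Aggregate image-level ensemble scores into one decision per patient.

    records: iterable of (patient_id, [probability from each model]) per image.
    Averaging over images and thresholding at 0.5 are assumptions.
    """
    per_patient = defaultdict(list)
    for patient_id, model_probs in records:
        per_patient[patient_id].append(ensemble_image_probability(model_probs))
    return {
        patient_id: (sum(scores) / len(scores)) >= threshold
        for patient_id, scores in per_patient.items()
    }


# Invented outputs of three models on two images each from two patients.
records = [
    ("patient_A", [0.9, 0.7, 0.8]),
    ("patient_A", [0.6, 0.4, 0.5]),
    ("patient_B", [0.2, 0.1, 0.3]),
    ("patient_B", [0.4, 0.6, 0.2]),
]
decisions = patient_level_predictions(records)
print(decisions)  # {'patient_A': True, 'patient_B': False}
```

Reporting sensitivity and specificity at the patient level, as in the cited study, avoids counting several images of the same infant as independent cases.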
Discussion
Medical imaging provides a comprehensive view of both tumor and non-neoplastic lesions, which can improve the detection rate and diagnostic accuracy of early lesions, and effectively distinguish benign and malignant lesions (79). In recent decades, images have been transformed into quantitative data in various ways and subsequently analyzed by AI.
US has the advantages of portability and no radiation, making it the preferred imaging examination method for GBDs. However, the disease diagnostic ability of US physicians is highly correlated with their theoretical knowledge accumulation, clinical skills, and operational proficiency. US-AI combines AI with US examination, which can diagnose diseases early, accelerate treatment progress, and improve patient prognosis.
There are various types of GBDs, and common diseases in clinical practice include gallstones, cholecystitis, GB polyps, and GBC. The differentiation between non-neoplastic and neoplastic polyps, as well as the differentiation between benign and malignant GB tumors, remains a key and difficult issue in clinical diagnosis and treatment (80). The diagnosis of some rare diseases, such as BA, also poses challenges to clinics. At present, the application of AI in the diagnosis of GBDs is still in the initial stage of development, and most AI technologies are applied in the differential diagnosis of GB polyps and GBC. Therefore, there is still broad development space for US-based AI technology.
It is important to note that AI systems learn on a case-by-case basis (81). Due to the high degree of heterogeneity and variability in human tissue between and within subjects, there is no limited training set that can represent the variety of cases that may occur in clinical practice absolutely. From this perspective, diagnosis using AI applications alone should still be avoided in clinical practice, and lesion assessment should be achieved through a combination of clinician assessment and ML or DL results. In addition, it is worth noting that most AI-based studies on GB lesions are conducted using data that are collected retrospectively, in which a cohort is selected from patients primarily diagnosed through histopathological examination. Therefore, more prospective studies on medical AI models should be conducted to reduce the risk of overfitting and improve the accuracy of clinical outcomes.
The application of US images-based AI in GB field has quite a number of limitations. Firstly, most current studies are single-center retrospective studies, which means that the data sources obtained may lack universal representation. Therefore, models built on the basis of these data may be prone to information bias. Secondly, the standardization level of data collection and analysis is inadequate. To ensure accurate diagnosis and wide applicability, standardized data collection and analysis processes need to be established. Thirdly, the appearance of machine-human collaborative inspection has brought significant changes to the traditional doctor-patient relationship. Any misdiagnosis caused by the use of AI needs to be jointly responsible for doctors, model developers, and software platform vendors (82). With the widespread application and development of AI assisted medical diagnosis, relevant laws and regulations also need to be improved (82).
AI is still at an early stage in the US diagnosis of GBDs. In the future, large-sample, multi-center prospective studies should be carried out to broaden the range of diseases covered and to improve the generalization ability of the models. High-quality images or videos should be collected wherever possible, a sound data supervision system should be established, and data should be shared globally while ensuring data confidentiality and protecting patient privacy. A clear accountability system is needed to regulate AI technology effectively, ensure its reasonable and legal application, and minimize the harm that AI misdiagnosis causes to patients. AI has shown great potential in the field of medical imaging, but this does not mean that radiologists will be replaced by AI. On the contrary, collaboration between the two can lead to more accurate decisions during diagnosis and treatment, thereby improving diagnostic performance and promoting the further development of personalized precision medicine (83).
Conclusions
This review summarized the current status, limitations, and future perspectives of AI-assisted ultrasonography in GBDs. In the near future, AI has the potential to become a breakthrough in the diagnosis of GBDs, supporting doctors in improving their diagnostic ability for GBDs with ultrasonography.
Acknowledgments
Funding: None.
Footnote
Reporting Checklist: The authors have completed the PRISMA-ScR reporting checklist. Available at https://tgh.amegroups.com/article/view/10.21037/tgh-24-61/rc
Peer Review File: Available at https://tgh.amegroups.com/article/view/10.21037/tgh-24-61/prf
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tgh.amegroups.com/article/view/10.21037/tgh-24-61/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Lam R, Zakko A, Petrov JC, et al. Gallbladder Disorders: A Comprehensive Review. Dis Mon 2021;67:101130. [Crossref] [PubMed]
- Obaid AM, Turki A, Bellaaj H, et al. Detection of Gallbladder Disease Types Using Deep Learning: An Informative Medical Method. Diagnostics (Basel) 2023;13:1744. [Crossref] [PubMed]
- Choi TW, Kim JH, Park SJ, et al. Risk stratification of gallbladder polyps larger than 10 mm using high-resolution ultrasonography and texture analysis. Eur Radiol 2018;28:196-205. [Crossref] [PubMed]
- Kim JH, Lee JY, Baek JH, et al. High-resolution sonography for distinguishing neoplastic gallbladder polyps and staging gallbladder cancer. AJR Am J Roentgenol 2015;204:W150-9. [Crossref] [PubMed]
- Kim JS, Lee JK, Kim Y, et al. US characteristics for the prediction of neoplasm in gallbladder polyps 10 mm or larger. Eur Radiol 2016;26:1134-40. [Crossref] [PubMed]
- Lee JS, Kim JH, Kim YJ, et al. Diagnostic accuracy of transabdominal high-resolution US for staging gallbladder cancer and differential diagnosis of neoplastic polyps compared with EUS. Eur Radiol 2017;27:3097-103. [Crossref] [PubMed]
- Wang LF, Wang Q, Mao F, et al. Risk stratification of gallbladder masses by machine learning-based ultrasound radiomics models: a prospective and multi-institutional study. Eur Radiol 2023;33:8899-911. [Crossref] [PubMed]
- Cheng Y, Wang M, Ma B, et al. Potential role of contrast-enhanced ultrasound for the differentiation of malignant and benign gallbladder lesions in East Asia: A meta-analysis and systematic review. Medicine (Baltimore) 2018;97:e11808. [Crossref] [PubMed]
- Negrão de Figueiredo G, Mueller-Peltzer K, Armbruster M, et al. Contrast-enhanced ultrasound (CEUS) for the evaluation of gallbladder diseases in comparison to cross-sectional imaging modalities and histopathological results. Clin Hemorheol Microcirc 2019;71:141-9. [Crossref] [PubMed]
- Xie XH, Xu HX, Xie XY, et al. Differential diagnosis between benign and malignant gallbladder diseases with real-time contrast-enhanced ultrasound. Eur Radiol 2010;20:239-48. [Crossref] [PubMed]
- Liu LN, Xu HX, Lu MD, et al. Contrast-enhanced ultrasound in the diagnosis of gallbladder diseases: a multi-center experience. PLoS One 2012;7:e48371. [Crossref] [PubMed]
- Yuan Z, Liu X, Li Q, et al. Is Contrast-Enhanced Ultrasound Superior to Computed Tomography for Differential Diagnosis of Gallbladder Polyps? A Cross-Sectional Study. Front Oncol 2021;11:657223. [Crossref] [PubMed]
- Huang YQ, Liang CH, He L, et al. Development and Validation of a Radiomics Nomogram for Preoperative Prediction of Lymph Node Metastasis in Colorectal Cancer. J Clin Oncol 2016;34:2157-64. [Crossref] [PubMed]
- Huang J, Fan X, Liu W. Applications and Prospects of Artificial Intelligence-Assisted Endoscopic Ultrasound in Digestive System Diseases. Diagnostics (Basel) 2023;13:2815. [Crossref] [PubMed]
- McCarthy JJ, Minsky ML, Rochester N. Artificial Intelligence. Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology. 1959. Available online: https://dspace.mit.edu/handle/1721.1/52263 (accessed on 3 March 2010).
- McCarthy J, Minsky ML, Rochester N, et al. A proposal for the Dartmouth summer research project on artificial intelligence. AI MAG 2006;27:12.
- Bini F, Pica A, Azzimonti L, et al. Artificial Intelligence in Thyroid Field-A Comprehensive Review. Cancers (Basel) 2021;13:4740. [Crossref] [PubMed]
- Lee KS, Park H. Machine learning on thyroid disease: a review. Front Biosci (Landmark Ed) 2022;27:101. [Crossref] [PubMed]
- Bera K, Schalper KA, Rimm DL, et al. Artificial intelligence in digital pathology - new tools for diagnosis and precision oncology. Nat Rev Clin Oncol 2019;16:703-15. [Crossref] [PubMed]
- Lee KS, Ahn KH. Application of Artificial Intelligence in Early Diagnosis of Spontaneous Preterm Labor and Birth. Diagnostics (Basel) 2020;10:733. [Crossref] [PubMed]
- Kuwahara T, Hara K, Mizuno N, et al. Current status of artificial intelligence analysis for endoscopic ultrasonography. Dig Endosc 2021;33:298-305. [Crossref] [PubMed]
- Kuwahara T, Hara K, Mizuno N, et al. Current status of artificial intelligence analysis for the treatment of pancreaticobiliary diseases using endoscopic ultrasonography and endoscopic retrograde cholangiopancreatography. DEN Open 2024;4:e267. [Crossref] [PubMed]
- Cortes C, Vapnik V. Support-Vector Networks. Mach Learn 1995;20:273-297. [Crossref]
- Breiman L. Random forests. Mach Learn 2001;45:5-32. [Crossref]
- Hosny A, Parmar C, Quackenbush J, et al. Artificial intelligence in radiology. Nat Rev Cancer 2018;18:500-10. [Crossref] [PubMed]
- Cui S, Tseng HH, Pakela J, et al. Introduction to machine and deep learning for medical physicists. Med Phys 2020;47:e127-47. [Crossref] [PubMed]
- Lambin P, Rios-Velazquez E, Leijenaar R, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer 2012;48:441-6. [Crossref] [PubMed]
- Aerts HJ, Velazquez ER, Leijenaar RT, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun 2014;5:4006. [Crossref] [PubMed]
- Lambin P, Leijenaar RTH, Deist TM, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol 2017;14:749-62. [Crossref] [PubMed]
- Dilek ON, Karasu S, Dilek FH. Diagnosis and Treatment of Gallbladder Polyps: Current Perspectives. Euroasian J Hepatogastroenterol 2019;9:40-8. [Crossref] [PubMed]
- Choi YS, Seo SW, Lee SE, Do JH. Prevalence and risk factors of gallbladder polypoid lesion in a healthy population. HPB 2016;18:e523. [Crossref]
- Xu A, Hu H. The gallbladder polypoid-lesions conundrum: moving forward with controversy by looking back. Expert Rev Gastroenterol Hepatol 2017;11:1071-80. [Crossref] [PubMed]
- Gallahan WC, Conway JD. Diagnosis and management of gallbladder polyps. Gastroenterol Clin North Am 2010;39:359-67. x. [Crossref] [PubMed]
- Inui K, Yoshino J, Miyoshi H. Diagnosis of gallbladder tumors. Intern Med 2011;50:1133-6. [Crossref] [PubMed]
- Wiles R, Varadpande M, Muly S, et al. Growth rate and malignant potential of small gallbladder polyps--systematic review of evidence. Surgeon 2014;12:221-6. [Crossref] [PubMed]
- Lin WR, Lin DY, Tai DI, et al. Prevalence of and risk factors for gallbladder polyps detected by ultrasonography among healthy Chinese: analysis of 34 669 cases. J Gastroenterol Hepatol 2008;23:965-9. [Crossref] [PubMed]
- Xu A, Zhang Y, Hu H, et al. Gallbladder Polypoid-Lesions: What Are They and How Should They be Treated? A Single-Center Experience Based on 1446 Cholecystectomy Patients. J Gastrointest Surg 2017;21:1804-12. [Crossref] [PubMed]
- Li Q, Dou M, Zhang J, et al. A Bayesian network model to predict neoplastic risk for patients with gallbladder polyps larger than 10 mm based on preoperative ultrasound features. Surg Endosc 2023;37:5453-63. [Crossref] [PubMed]
- Foley KG, Lahaye MJ, Thoeni RF, et al. Management and follow-up of gallbladder polyps: updated joint guidelines between the ESGAR, EAES, EFISDS and ESGE. Eur Radiol 2022;32:3358-68. [Crossref] [PubMed]
- Aziz H, Hewitt DB, Pawlik TM. Critical Analysis of the Updated Guidelines for Management of Gallbladder Polyps. Ann Surg Oncol 2022;29:3363-5. [Crossref] [PubMed]
- Yuan HX, Yu QH, Zhang YQ, et al. Ultrasound Radiomics Effective for Preoperative Identification of True and Pseudo Gallbladder Polyps Based on Spatial and Morphological Features. Front Oncol 2020;10:1719. [Crossref] [PubMed]
- Wennmacker SZ, van Dijk AH, Raessens JHJ, et al. Polyp size of 1 cm is insufficient to discriminate neoplastic and non-neoplastic gallbladder polyps. Surg Endosc 2019;33:1564-71. [Crossref] [PubMed]
- Ganeshan D, Kambadakone A, Nikolaidis P, et al. Current update on gallbladder carcinoma. Abdom Radiol (NY) 2021;46:2474-89. [Crossref] [PubMed]
- Coburn NG, Cleary SP, Tan JC, et al. Surgery for gallbladder cancer: a population-based analysis. J Am Coll Surg 2008;207:371-82. [Crossref] [PubMed]
- Martin E, Gill R, Debru E. Diagnostic accuracy of transabdominal ultrasonography for gallbladder polyps: systematic review. Can J Surg 2018;61:200-7. [Crossref] [PubMed]
- Jeong Y, Kim JH, Chae HD, et al. Deep learning-based decision support system for the diagnosis of neoplastic gallbladder polyps on ultrasonography: Preliminary results. Sci Rep 2020;10:7700. [Crossref] [PubMed]
- Kim T, Choi YH, Choi JH, et al. Gallbladder Polyp Classification in Ultrasound Images Using an Ensemble Convolutional Neural Network Model. J Clin Med 2021;10:3585. [Crossref] [PubMed]
- Choi JH, Lee J, Lee SH, et al. Analysis of ultrasonographic images using a deep learning-based model as ancillary diagnostic tool for diagnosing gallbladder polyps. Dig Liver Dis 2023;55:1705-11. [Crossref] [PubMed]
- Li Q, Zhang J, Cai Z, et al. A Bayesian network prediction model for gallbladder polyps with malignant potential based on preoperative ultrasound. Surg Endosc 2023;37:518-27. [Crossref] [PubMed]
- Yuan HX, Wang C, Tang CY, et al. Differential diagnosis of gallbladder neoplastic polyps and cholesterol polyps with radiomics of dual modal ultrasound: a pilot study. BMC Med Imaging 2023;23:26. [Crossref] [PubMed]
- Chen T, Tu S, Wang H, et al. Computer-aided diagnosis of gallbladder polyps based on high resolution ultrasonography. Comput Methods Programs Biomed 2020;185:105118. [Crossref] [PubMed]
- Miller DD, Brown EW. Artificial Intelligence in Medical Practice: The Question to the Answer? Am J Med 2018;131:129-33. [Crossref] [PubMed]
- Roa I, Ibacache G, Muñoz S, et al. Gallbladder cancer in Chile: Pathologic characteristics of survival and prognostic factors: analysis of 1,366 cases. Am J Clin Pathol 2014;141:675-82. [Crossref] [PubMed]
- Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49. [Crossref] [PubMed]
- Sharma A, Sharma KL, Gupta A, et al. Gallbladder cancer epidemiology, pathogenesis and molecular genetics: Recent update. World J Gastroenterol 2017;23:3978-98. [Crossref] [PubMed]
- Gupta P, Basu S, Rana P, et al. Deep-learning enabled ultrasound based detection of gallbladder cancer in northern India: a prospective diagnostic study. Lancet Reg Health Southeast Asia 2023;24:100279. [Crossref] [PubMed]
- Hundal R, Shaffer EA. Gallbladder cancer: epidemiology and outcome. Clin Epidemiol 2014;6:99-109. [PubMed]
- Gupta P, Dutta U, Rana P, et al. Gallbladder reporting and data system (GB-RADS) for risk stratification of gallbladder wall thickening on ultrasonography: an international expert consensus. Abdom Radiol (NY) 2022;47:554-65. [Crossref] [PubMed]
- Basu S, Gupta M, Rana P, et al. RadFormer: Transformers with global-local attention for interpretable and accurate Gallbladder Cancer detection. Med Image Anal 2023;83:102676. [Crossref] [PubMed]
- Basu S, Gupta M, Rana P, et al. Surpassing the Human Accuracy: Detecting Gallbladder Cancer from USG Images with Curriculum Learning. Ithaca: Cornell University Library, arXiv.org, 2022.
- Xue L, Wang X, Yang Y, et al. Segnet Network Algorithm-Based Ultrasound Images in the Diagnosis of Gallbladder Stones Complicated with Gallbladder Carcinoma and the Relationship between P16 Expression with Gallbladder Carcinoma. J Healthc Eng 2021;2021:2819986. [Crossref] [PubMed]
- Pang S, Ding T, Qiao S, et al. A novel YOLOv3-arch model for identifying cholelithiasis and classifying gallstones on CT images. PLoS One 2019;14:e0217647. [Crossref] [PubMed]
- Lian J, Ma Y, Ma Y, et al. Automatic gallbladder and gallstone regions segmentation in ultrasound image. Int J Comput Assist Radiol Surg 2017;12:553-68. [Crossref] [PubMed]
- Zhou W, Yang Y, Yu C, et al. Ensembled deep learning model outperforms human experts in diagnosing biliary atresia from sonographic gallbladder images. Nat Commun 2021;12:1259. [Crossref] [PubMed]
- Lien TH, Chang MH, Wu JF, et al. Effects of the infant stool color card screening program on 5-year outcome of biliary atresia in Taiwan. Hepatology 2011;53:202-8. [Crossref] [PubMed]
- Serinet MO, Wildhaber BE, Broué P, et al. Impact of age at Kasai operation on its results in late childhood and adolescence: a rational basis for biliary atresia screening. Pediatrics 2009;123:1280-6. [Crossref] [PubMed]
- Rhu J, Jung SM, Choe YH, et al. PELD score and age as a prognostic index of biliary atresia patients undergoing Kasai portoenterostomy. Pediatr Surg Int 2012;28:385-91. [Crossref] [PubMed]
- Harpavat S, Garcia-Prats JA, Shneider BL. Newborn Bilirubin Screening for Biliary Atresia. N Engl J Med 2016;375:605-6. [Crossref] [PubMed]
- Harpavat S, Garcia-Prats JA, Anaya C, et al. Diagnostic Yield of Newborn Screening for Biliary Atresia Using Direct or Conjugated Bilirubin Measurements. JAMA 2020;323:1141-50. [Crossref] [PubMed]
- Hsiao CH, Chang MH, Chen HL, et al. Universal screening for biliary atresia using an infant stool color card in Taiwan. Hepatology 2008;47:1233-40. [Crossref] [PubMed]
- Lertudomphonwanit C, Mourya R, Fei L, et al. Large-scale proteomics identifies MMP-7 as a sentinel of epithelial injury and of biliary atresia. Sci Transl Med 2017;9:eaan8462. [Crossref] [PubMed]
- Yang L, Zhou Y, Xu PP, et al. Diagnostic Accuracy of Serum Matrix Metalloproteinase-7 for Biliary Atresia. Hepatology 2018;68:2069-77. [Crossref] [PubMed]
- Mittal V, Saxena AK, Sodhi KS, et al. Role of abdominal sonography in the preoperative diagnosis of extrahepatic biliary atresia in infants younger than 90 days. AJR Am J Roentgenol 2011;196:W438-45. [Crossref] [PubMed]
- Kim WS, Cheon JE, Youn BJ, et al. Hepatic arterial diameter measured with US: adjunct for US diagnosis of biliary atresia. Radiology 2007;245:549-55. [Crossref] [PubMed]
- Humphrey TM, Stringer MD. Biliary atresia: US diagnosis. Radiology 2007;244:845-51. [Crossref] [PubMed]
- Giannattasio A, Cirillo F, Liccardo D, et al. Diagnostic role of US for biliary atresia. Radiology 2008;247:912; author reply 912-3. [Crossref] [PubMed]
- Zhou L, Shan Q, Tian W, et al. Ultrasound for the Diagnosis of Biliary Atresia: A Meta-Analysis. AJR Am J Roentgenol 2016;206:W73-82. [Crossref] [PubMed]
- Zhan J, Feng J, Chen Y, et al. Incidence of biliary atresia associated congenital malformations: A retrospective multicenter study in China. Asian J Surg 2017;40:429-33. [Crossref] [PubMed]
- Wang Y, Peng J, Liu K, et al. Preoperative prediction model for non-neoplastic and benign neoplastic polyps of the gallbladder. Eur J Surg Oncol 2024;50:107930. [Crossref] [PubMed]
- Kim SY, Cho JH, Kim EJ, et al. The efficacy of real-time colour Doppler flow imaging on endoscopic ultrasonography for differential diagnosis between neoplastic and non-neoplastic gallbladder polyps. Eur Radiol 2018;28:1994-2002. [Crossref] [PubMed]
- Al-Azzawi F, Mahmoud I, Haguinet F, et al. Developing an Artificial Intelligence-Guided Signal Detection in the Food and Drug Administration Adverse Event Reporting System (FAERS): A Proof-of-Concept Study Using Galcanezumab and Simulated Data. Drug Saf 2023;46:743-51. [Crossref] [PubMed]
- Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019;25:44-56. [Crossref] [PubMed]
- Aung YYM, Wong DCS, Ting DSW. The promise of artificial intelligence: a review of the opportunities and challenges of artificial intelligence in healthcare. Br Med Bull 2021;139:4-15. [Crossref] [PubMed]
Cite this article as: Wang X, Zhang H, Bai Z, Xie X, Feng Y. Current status of artificial intelligence analysis for the diagnosis of gallbladder diseases using ultrasonography: a scoping review. Transl Gastroenterol Hepatol 2025;10:12.