Development and External Validation of a Machine Learning Model for Prediction of Lymph Node Metastasis in Patients with Prostate Cancer - Beyond the Abstract
Furthermore, the therapeutic benefit of pelvic lymph node dissection (PLND) is unclear, and the procedure is associated with nontrivial complications such as lymphocele development, thromboembolic events, and, less commonly, obturator nerve, vascular, and ureteric injury. Consequently, most society guidelines recommend the procedure only for patients at high risk of lymph node invasion (LNI), and several tools have been developed to quantify individual patient risk, including the Roach formula, the Briganti nomogram, and the MSKCC nomogram.
Owing to its ability to process large volumes of clinical data and detect complex interactions between variables, machine learning (ML) has emerged as a promising tool in predictive modeling and personalized healthcare. We hypothesized that, given sufficient data, a machine learning model could outperform classic prediction tools in assessing patient risk of LNI, thereby sparing more low-risk patients from unnecessary interventions and associated complications.

Several ML models are potentially suitable for this task, including decision trees (DTs), random forests (RFs), and gradient boosting machines (GBMs). DTs are built using recursive partitioning analysis to find an optimal series of binary splits on the input variables, leading to subsets that are as homogeneous as possible with respect to the target variable. RFs improve on the accuracy of DTs by using an ensemble of trees and averaging their outputs. GBMs are a class of algorithms that build an ensemble of weak learners (typically DTs) sequentially, with each new model in the ensemble trained on the errors of the previous models. While there is no guarantee that other models will not perform better, RFs and GBMs are consistently among the most accurate algorithms for tabular data. XGBoost (Extreme Gradient Boosting) is an implementation of gradient boosting known for consistently winning data science competitions involving tabular data; it extends standard GBMs with several performance-enhancing features, including regularization and built-in cross-validation. We therefore decided to build our model using XGBoost.

Using Python libraries, we selected our model hyperparameters to optimize discrimination, measured as the area under the receiver operating characteristic (ROC) curve (AUC). We then improved the model's calibration using ensembling and non-parametric calibration, building a final ensemble model consisting of 10 XGBoost-calibrator pairs.
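To make the workflow concrete, the sketch below shows one way such a pipeline could be assembled in Python: hyperparameters tuned to maximize ROC AUC, then an ensemble of 10 XGBoost-calibrator pairs using non-parametric (isotonic) calibration. This is an illustration under assumptions, not the published code; the hyperparameter grid, search settings, and function names are ours.

```python
# Illustrative sketch only: an XGBoost classifier tuned for ROC AUC, then an
# ensemble of 10 model-calibrator pairs using isotonic (non-parametric) calibration.
# Hyperparameter grid and function names are assumptions, not the authors' code.
import numpy as np
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.calibration import CalibratedClassifierCV
from xgboost import XGBClassifier

def build_lni_ensemble(X, y, n_members=10, seed=0):
    # 1) Tune XGBoost hyperparameters to optimize discrimination (ROC AUC).
    search = RandomizedSearchCV(
        XGBClassifier(eval_metric="logloss"),
        param_distributions={
            "n_estimators": [200, 400, 800],
            "max_depth": [3, 4, 5, 6],
            "learning_rate": [0.01, 0.05, 0.1],
            "subsample": [0.7, 0.85, 1.0],
            "reg_lambda": [1.0, 5.0, 10.0],  # L2 regularization strength
        },
        n_iter=25,
        scoring="roc_auc",
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
        random_state=seed,
    )
    search.fit(X, y)
    best_params = search.best_params_

    # 2) Build an ensemble of XGBoost-calibrator pairs; each member refits the
    #    tuned model and learns an isotonic calibrator on held-out folds.
    members = []
    for i in range(n_members):
        model = XGBClassifier(random_state=i, eval_metric="logloss", **best_params)
        calibrated = CalibratedClassifierCV(model, method="isotonic", cv=5)
        calibrated.fit(X, y)
        members.append(calibrated)
    return members

def predict_lni_risk(members, X_new):
    # Final risk estimate = mean of the calibrated probabilities across members.
    probs = np.column_stack([m.predict_proba(X_new)[:, 1] for m in members])
    return probs.mean(axis=1)
```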
The model was developed using a multi-institutional dataset comprising a training cohort of 20,267 patients from one institution and an external validation cohort of 1,322 patients from another. We used clinico-pathologic variables similar to those used by the classic tools to ensure that the data needed to obtain predictions are readily available. On external validation, the model outperformed all the reference models, with an AUC of 0.82. It also had improved calibration and the highest net benefit on decision curve analysis. In practical terms, at the 2% risk threshold, for every 100 patients screened our model spares five additional patients from undergoing PLND compared with the MSKCC nomogram, without missing any patients with LNI. Implemented as a web-based calculator, our model offers clinicians and patients an easy-to-use tool that supports shared decision-making.
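For readers unfamiliar with decision curve analysis, the comparison above rests on the standard net benefit formula, net benefit = TP/n − (FP/n) × p_t/(1 − p_t), evaluated at the chosen threshold p_t (here 2%). The snippet below is a minimal sketch of that calculation; the variable names are illustrative and not taken from the published code.

```python
# Minimal sketch of net benefit at a risk threshold (standard decision curve
# analysis formula); y_true and risk are illustrative variable names.
import numpy as np

def net_benefit(y_true, risk, threshold=0.02):
    y_true = np.asarray(y_true)
    recommend = np.asarray(risk) >= threshold  # patients flagged for PLND
    n = len(y_true)
    tp = np.sum(recommend & (y_true == 1))     # LNI-positive patients correctly flagged
    fp = np.sum(recommend & (y_true == 0))     # LNI-negative patients flagged
    return tp / n - (fp / n) * (threshold / (1 - threshold))
```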
Written by: Osama Mohamad, MD, PhD & Ali Sabbagh, MD, Department of Radiation Oncology, University of California-San Francisco, San Francisco, CA