Machine Learning Models to Predict 24 Hour Urinary Abnormalities for Kidney Stone Disease.

To help guide empiric therapy for kidney stone disease, we sought to demonstrate the feasibility of predicting 24-hour urine abnormalities using machine learning methods.

We trained a machine learning model (XGBoost [XG]) to predict 24-hour urine abnormalities from electronic health record-derived data (n=1,314) and compared it to a logistic regression (LR) model. We also evaluated an ensemble (EN) model combining the XG and LR models. Models predicted binary 24-hour urine values for volume, sodium, oxalate, calcium, uric acid, and citrate, as well as a multiclass prediction of pH. We evaluated performance using area under the receiver operating characteristic curve (AUC-ROC) and identified predictors for each model.
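As a rough illustration of the evaluation described above, the following minimal sketch computes AUC-ROC for two sets of predicted probabilities and for their combination. It rests on several assumptions: the abstract does not state how the ensemble combines XG and LR, so an unweighted average of predicted probabilities is used here purely for illustration; the function names (`auc_roc`, `ensemble_average`) are hypothetical; and the labels and probabilities are invented toy values, not study data.

```python
from typing import List, Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ensemble_average(probs_a: Sequence[float], probs_b: Sequence[float]) -> List[float]:
    """Hypothetical ensemble rule (an assumption, not the study's stated method):
    the unweighted mean of two models' predicted probabilities."""
    if len(probs_a) != len(probs_b):
        raise ValueError("probability lists must have the same length")
    return [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]


# Toy example: 1 = abnormal 24-hour urine value, 0 = normal (invented data).
y_true = [1, 0, 1, 0, 1, 0]
xg_probs = [0.9, 0.4, 0.35, 0.2, 0.8, 0.6]  # stand-in for XG outputs
lr_probs = [0.7, 0.3, 0.6, 0.5, 0.4, 0.2]   # stand-in for LR outputs
en_probs = ensemble_average(xg_probs, lr_probs)

for name, probs in [("XG", xg_probs), ("LR", lr_probs), ("EN", en_probs)]:
    print(f"{name}: AUC-ROC = {auc_roc(y_true, probs):.2f}")
```

The pairwise definition is used because it needs no external library and makes the meaning of AUC-ROC explicit: 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of abnormal from normal cases.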

The XG model discriminated 24-hour urine abnormalities with fair performance, comparable to LR. The XG model most accurately predicted abnormalities of urine volume (accuracy=98%, AUC-ROC=0.59), uric acid (69%, 0.73) and elevated urine sodium (71%, 0.79). The LR model outperformed the XG model alone in prediction of abnormalities of urinary pH (AUC-ROC of 0.66 vs 0.57) and citrate (0.69 vs 0.64). The EN model most accurately predicted abnormalities of oxalate (accuracy=65%, AUC-ROC=0.70) and citrate (65%, 0.69), with overall predictive performance similar to that of either XG or LR alone. Body mass index, age, and gender were the three most important features for training the models for all outcomes.

Urine chemistry prediction for kidney stone disease appears to be feasible with machine learning methods. Further optimization of model performance could facilitate dietary or pharmacologic prevention.

Urology. 2022 Jul 16 [Epub ahead of print]

Nicholas L Kavoussi, Chase Floyd, Abin Abraham, Wilson Sui, Cosmin Bejan, John A Capra, Ryan Hsi

Department of Urology, Vanderbilt University Medical Center, Nashville, TN., University of South Carolina School of Medicine, Columbia, SC., Department of Biological Sciences, Vanderbilt Genetics Institute, and Center for Structural Biology, Vanderbilt University, Nashville, TN., Department of Urology, Vanderbilt University Medical Center, Nashville, TN., Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN; Bakar Computational Health Sciences Institute at the University of California, San Francisco, CA; Department of Epidemiology and Biostatistics at the University of California, San Francisco, CA.