The adoption of knowledge-based dose-volume histogram (DVH) prediction models for assessing organ-at-risk (OAR) sparing in radiotherapy necessitates quantification of prediction accuracy and uncertainty. Moreover, DVH prediction error bands should be readily interpretable as confidence intervals expected to contain a stated percentage of clinically acceptable DVHs. For cases in which such DVH error bands are not available, we present an independent error quantification methodology using a local reference cohort of high-quality treatment plans, and apply it to two DVH prediction models, ORBIT-RT and RapidPlan, trained on the same set of 90 volumetric modulated arc therapy (VMAT) plans. Organ-at-risk DVH predictions from each model were then generated for a separate set of 45 prostate VMAT plans. These DVH predictions were compared to their analogous clinical DVHs to define prediction errors, from which prediction bias, prediction error variation, and root-mean-square error (RMSEpred) were calculated for the cohort. The empirical RMSEpred was then contrasted with the model-provided DVH error estimates. For all prostate OARs, above 50% of the prescription (Rx) dose, ORBIT-RT prediction bias and prediction error were comparable to or less than those of RapidPlan. Above 80% Rx dose, prediction bias was less than 1% and prediction error was less than 3–4% for both models. As a result, above 50% Rx dose, ORBIT-RT RMSEpred was below that of RapidPlan, indicating slightly improved accuracy in this cohort. Because the bias is near zero, RMSEpred is readily interpretable as a canonical standard deviation, whose error band is expected to correctly predict 68% of normally distributed clinical DVHs. By contrast, RapidPlan’s provided error band, although described in the literature as a standard deviation range, was slightly less predictive than RMSEpred (55–70% success), while the provided ORBIT-RT error band was confirmed to resemble an interquartile range (40–65% success), as described.
Clinicians can apply this methodology using their own institutions’ reference cohorts to (a) independently assess a knowledge-based model’s predictive accuracy for local treatment plans, and (b) interpret from any error band whether further OAR dose sparing is likely attainable.
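As an illustration of the error quantification described above, the following is a minimal sketch of how prediction bias, prediction error variation, and RMSEpred could be computed at a single dose level for a reference cohort, along with the fraction of clinical DVH values falling inside a symmetric error band. The function names, the sign convention (clinical minus predicted volume), and the use of the population standard deviation are assumptions made here for illustration, not details taken from the study.

```python
import math


def error_stats(clinical, predicted):
    """Return (bias, error SD, RMSE) of prediction errors at one dose level.

    `clinical` and `predicted` hold one OAR volume value (% of structure)
    per plan in the cohort. Sign convention (assumed): clinical - predicted.
    """
    errors = [c - p for c, p in zip(clinical, predicted)]
    n = len(errors)
    bias = sum(errors) / n
    # Population variance (divide by n), so RMSE^2 = bias^2 + SD^2 exactly.
    variance = sum((e - bias) ** 2 for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return bias, math.sqrt(variance), rmse


def band_coverage(clinical, predicted, half_width):
    """Fraction of plans whose clinical value lies within
    predicted +/- half_width (a symmetric error band)."""
    hits = sum(
        1 for c, p in zip(clinical, predicted) if abs(c - p) <= half_width
    )
    return hits / len(clinical)
```

With this decomposition, RMSEpred reduces to the standard deviation of the errors when the bias is near zero, which is why a band of half-width RMSEpred would be expected to contain roughly 68% of normally distributed clinical DVH values. Comparing `band_coverage` for a model-provided band against that figure is one way to check how the band should be interpreted.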