Surgeon heterogeneity significantly affects functional and oncological outcomes after radical prostatectomy in the Swedish LAPPRO trial
Abstract
Objectives
To evaluate how surgeon heterogeneity – the variation in outcomes between individual surgeons – influences functional and oncological outcomes after robot-assisted laparoscopic prostatectomy (RALP) and retropubic radical prostatectomy (RRP), and to assess whether surgeon heterogeneity affects the comparison between RALP and RRP.
Patients and Methods
Laparoscopic Prostatectomy Robot Open (LAPPRO) is a prospective, controlled, non-randomized trial performed at 14 Swedish centres with 68 operating surgeons. A total of 4003 men with localized prostate cancer were enrolled between 2008 and 2011. The endpoints were urinary incontinence, erectile dysfunction (ED) and recurrence at 24 months after surgery. Logistic regression models were built to evaluate surgeon heterogeneity and, secondarily, surgeon-specific factors were added to the models to investigate their influence on heterogeneity and the comparison between RALP and RRP.
Results
Among surgeons who performed at least 20 surgeries during the study period (n=25), we observed statistically significant heterogeneity for incontinence (P = 0.001), ED (P < 0.001) and rate of recurrent disease (P < 0.001). The significant heterogeneity remained when analysing only experienced surgeons with a stated experience of at least 250 radical prostatectomies (n=12). Among all participating surgeons (n=68), differences in surgeon volume explained 42% of the observed heterogeneity for incontinence (P = 0.003), 11% for ED (P = 0.03) and 19% for recurrence (P = 0.01). Taking surgeon volume into account when comparing RALP and RRP had a significant impact on the results. The effect was greatest for functional outcomes, and the additional adjustments for the surgeons' previous experience changed whether the difference between techniques was statistically significant or not. The surgeons’ annual volume had the greatest effect on the recurrence rate.
Conclusions
There was a large degree of heterogeneity among surgeons regarding both functional and oncological outcomes and this had a significant impact on the results when comparing RALP and RRP. Some of the observed heterogeneity was explained by differences in surgeon volume. Efforts to decrease heterogeneity are warranted and variation among surgeons must be accounted for when conducting comparative analyses between surgical techniques.
Abbreviations
- ED, erectile dysfunction
- RALP, robot-assisted laparoscopic prostatectomy
- RP, radical prostatectomy
- RRP, retropubic radical prostatectomy
Introduction
Radical prostatectomy (RP) has become one of the most common urological operations [1]. The randomized Scandinavian Prostate Cancer Group trial (SPCG-4) showed a reduction in prostate cancer mortality after 29 years of follow-up, with a mean gain of 2.9 life-years with RP compared with watchful waiting [2]. However, RP is associated with long-term complications such as urinary incontinence and erectile dysfunction (ED) [3], which have a major impact on patients’ quality of life [2, 4, 5]. During recent decades, the traditional surgical approach, retropubic radical prostatectomy (RRP), has been challenged by robot-assisted laparoscopic prostatectomy (RALP), and today RALP is more common than RRP in many countries. The extensive use of RALP, however, is not supported by strong evidence of superior functional and oncological outcomes compared with RRP [6-14]. In addition, the role of the individual surgeon’s experience and skill in relation to incontinence, impotence and recurrence rate after surgery is poorly described. Previous studies have reported significant heterogeneity among surgeons [15-17], but few have reported on underlying factors and, to our knowledge, no study has investigated the effect of surgeon heterogeneity on the comparison between RALP and RRP.
The Laparoscopic Prostatectomy Robot Open (LAPPRO) trial is a prospective, controlled, non-randomized multicentre trial comparing outcomes after RALP and RRP. The primary endpoint of urinary incontinence rate was assessed at 12 and 24 months after surgery, showing no statistically significant differences between surgical methods. For ED a small and statistically significant difference in favour of RALP was observed, while no significant differences were observed for oncological outcomes [6, 18]. Analyses were restricted to surgeons with experience of at least 100 RPs, but differences among individual surgeons such as previous experience and annual caseload were not accounted for.
The aim of the present study was to describe heterogeneity among all participating surgeons in relation to functional and oncological outcomes in the LAPPRO trial, with 24 months' follow-up. We also sought to investigate which underlying surgeon-dependent factors were of importance, and how surgeon heterogeneity influenced the results when comparing functional and oncological outcomes between RRP and RALP.
Patients and Methods
Study Design and Participants
The design of the LAPPRO trial has been reported in detail [6, 19]. Fourteen Swedish centres, seven performing RRP and seven RALP, participated during September 2008 to November 2011. This analysis was restricted to patients with the following criteria: age <75 years; clinical tumour stage ≤T3; PSA concentration at baseline <20 ng/mL; and no signs of distant metastasis. The study was approved by the Regional Ethical Review Board in Göteborg (no 277-07). The trial is registered in the Current Controlled Trials database (ISRCTN 06393679).
Data were collected by patient questionnaires before surgery as well as 3, 12 and 24 months postoperatively. Clinical information from outpatient hospital visits was collected on a peri-operative case record form and at the same intervals as mentioned above. Each surgeon’s previous experience (number of either RRPs or RALPs performed) was obtained from the peri-operative case record form on which each surgeon stated their surgical experience in categories (0–49, 50–99, 100–150, or more than 150 procedures). Surgeons who had performed more than 150 procedures were contacted retrospectively and asked for the total number of RPs performed (RRPs or RALPs) before entering the LAPPRO trial.
Outcome Measurements
The endpoints were urinary incontinence, ED and recurrence at 24 months after surgery. We used the same definitions as previously published [18]. Incontinence was defined according to the number of pads used during a typical 24-h period and ED according to question 3 of the International Index of Erectile Function. Men who reported use of intracorporeal or intra-urethral injections of alprostadil were considered impotent. Recurrence (combination of residual disease and biochemical recurrence) was defined as a measurable PSA level >0.25 ng/mL at 3-, 12- or 24-month follow-up, and/or postoperative treatment with radiotherapy, androgen deprivation therapy or chemotherapy. Patients with missing data on functional outcomes at 24 months who reported being potent/continent at both 3 and 12 months after surgery were considered potent/continent at 24 months.
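The recurrence definition and the rule for missing 24-month functional data can be illustrated with a short sketch. Only the 0.25 ng/mL threshold, the follow-up time points and the listed treatments come from the text above; the record layout, the constant names and the function names are hypothetical and are not taken from the LAPPRO case record forms.

```python
from typing import Iterable, Mapping, Optional

PSA_THRESHOLD = 0.25  # ng/mL, from the recurrence definition in the text
FOLLOW_UP_MONTHS = (3, 12, 24)
POSTOP_TREATMENTS = {"radiotherapy", "androgen deprivation therapy", "chemotherapy"}


def has_recurrence(psa_by_month: Mapping[int, Optional[float]],
                   treatments: Iterable[str]) -> bool:
    """True if any follow-up PSA exceeds the threshold or a listed treatment was given."""
    psa_positive = any(
        psa_by_month.get(month) is not None and psa_by_month[month] > PSA_THRESHOLD
        for month in FOLLOW_UP_MONTHS
    )
    treated = any(t in POSTOP_TREATMENTS for t in treatments)
    return psa_positive or treated


def functional_status_24m(status_3m: Optional[bool],
                          status_12m: Optional[bool],
                          status_24m: Optional[bool]) -> Optional[bool]:
    """24-month status (True = potent/continent); None if it cannot be determined.

    A missing 24-month value is filled in only when the patient reported a
    favourable status at both 3 and 12 months.
    """
    if status_24m is not None:
        return status_24m
    if status_3m is True and status_12m is True:
        return True
    return None
```

For example, a hypothetical patient with PSA values of 0.1 and 0.3 ng/mL at 3 and 12 months would be classified as having recurrence, whereas a patient with a missing 24-month questionnaire who was continent at 3 months but not at 12 months would remain unclassified.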
Statistical Analysis
To evaluate the surgeon-specific heterogeneity for each outcome measure, we built logistic regression models that included surgeon as a fixed effect, restricted to surgeons with at least 20 surgeries during the study period. The model was used to create a forest plot and to test for surgeon heterogeneity (likelihood ratio test). For each outcome, we built a model adjusted for baseline patient and tumour characteristics to account for potential differences in the case mix of patients operated on by different surgeons. As a subgroup analysis, the analyses were repeated for surgeons with experience of at least 250 RPs. We also analysed whether a correlation between the three outcomes existed using Spearman’s correlation.
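The logic of the likelihood ratio test for surgeon heterogeneity can be sketched for the simplest, unadjusted case, in which a model with one event rate per surgeon is compared with a model with a single common rate. This is an illustration only: the models in the present study were additionally adjusted for patient and tumour characteristics and were fitted in R, and the function names and the event counts used below are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution.

    Uses the power series for the regularized lower incomplete gamma
    function; adequate for moderate x, not for extremely small P values.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    log_lower = a * math.log(z) - z - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_lower))


def _binom_loglik(events, n, p):
    """Binomial log-likelihood kernel, treating 0 * log(0) as 0."""
    ll = 0.0
    if events > 0:
        ll += events * math.log(p)
    if n - events > 0:
        ll += (n - events) * math.log(1.0 - p)
    return ll


def surgeon_heterogeneity_lrt(counts):
    """Likelihood ratio test of surgeon-specific rates against a common rate.

    counts: list of (events, patients) per surgeon, patients > 0.
    Returns (statistic, degrees of freedom, P value).
    """
    total_events = sum(e for e, _ in counts)
    total_n = sum(n for _, n in counts)
    p_common = total_events / total_n
    ll_null = sum(_binom_loglik(e, n, p_common) for e, n in counts)
    ll_full = sum(_binom_loglik(e, n, e / n) for e, n in counts)
    stat = 2.0 * (ll_full - ll_null)
    df = len(counts) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: (incontinent patients, operated patients) per surgeon
stat, df, p = surgeon_heterogeneity_lrt([(1, 20), (10, 20), (4, 25)])
print(f"LR statistic = {stat:.2f}, df = {df}, P = {p:.4f}")
```

In the unadjusted case the maximum likelihood estimates are simply the observed proportions, which is why no iterative fitting is needed here; with case-mix covariates, both models would have to be fitted numerically before their log-likelihoods are compared.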
To quantify heterogeneity among all participating surgeons we used mixed-effects logistic regression models including surgeon as a random intercept. The model estimated the standard deviation among the surgeons’ outcomes. A large standard deviation indicated dissimilar outcomes among surgeons. We investigated how three different surgeon-dependent factors modified the observed surgeon heterogeneity among the three outcome variables: the degree of nerve-sparing surgery (bilateral, unilateral or none); a surgeon’s experience according to number of RPs performed prior to the current procedure; and the annual caseload of procedures during the study. Each of these factors was added to the base models, and the percentage change in the standard deviation was recorded. A large change indicated that the factor was related to much of the observed heterogeneity.
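The percentage change in the between-surgeon standard deviation can be written as a one-line calculation. The sketch below assumes that the change is expressed as the relative reduction from the base model, so that a positive value means the added factor accounts for part of the heterogeneity; the function name is hypothetical and the standard deviations in the example are invented for illustration, not estimates from the LAPPRO models.

```python
def heterogeneity_explained(sd_base, sd_adjusted):
    """Percentage reduction in the random-intercept SD after adding a factor.

    sd_base:     between-surgeon SD from the base (case-mix-adjusted) model
    sd_adjusted: between-surgeon SD after adding the surgeon-dependent factor
    """
    if sd_base <= 0:
        raise ValueError("sd_base must be positive")
    return 100.0 * (sd_base - sd_adjusted) / sd_base


# Invented SDs on the log-odds scale: 0.40 in the base model, 0.30 after adjustment
print(round(heterogeneity_explained(0.40, 0.30), 1))
```

A negative result would indicate that the estimated between-surgeon standard deviation increased after the factor was added.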
For both functional and oncological outcomes, the base models were adjusted for the same potential confounders as in previous LAPPRO publications [6, 18]. For urinary incontinence, the base models were adjusted for age at surgery, incontinence at baseline, body mass index (≥30 vs <30 kg/m2), history of inguinal hernia, history of abdominal surgery, diabetes, history of lung disease, history of mental disease, pathology prostate weight (0–19, 20–39, 40–59, 60–79, ≥80 g), clinical T-stage (cT1, cT2, cT3), preoperative PSA (0–4.4, 4.5–6.1, 6.2–9.1, 9.2–20 ng/mL), biopsy Gleason score (4–7 vs 8–10), and length of cancer in biopsy core (0–3.7, 3.8–7.6, 7.7–15.9, ≥16 mm). ED models were adjusted for age at surgery, baseline potency, diabetes, history of inguinal hernia, smoking (never, former, current), history of cardiovascular disease, relationship status, clinical T-stage, preoperative PSA, biopsy Gleason score, and length of cancer in biopsy core. Models for recurrence included pathology prostate weight, pathology T-stage (pT1, pT2, pT3), preoperative PSA level, and prostatectomy Gleason score. For the subgroup analyses of experienced surgeons we created an additional model where, in addition to adjustments for baseline patient and tumour characteristics, we also adjusted for the surgeons' annual caseload.
We also undertook analyses to assess whether surgeon-dependent factors modified our assessment of RALP vs RRP. The mixed-effects models described above were repeated including a covariate for the type of surgery performed.
All analyses were repeated excluding surgeons who performed fewer than 20 surgeries during the LAPPRO trial to assess whether any reported results were sensitive to including the lowest-volume surgeons. All analyses were conducted using R 3.5 (R Foundation for Statistical Computing, Vienna, Austria) with the lme4 package [20].
Results
Of 4003 patients included in the LAPPRO trial, 3443 were evaluable for the present analyses, 2617 after RALP and 826 after RRP (Fig. 1). Patient and tumour characteristics are shown in Table 1. The RPs were performed by 68 surgeons. Those operating with a robot-assisted technique were less experienced (median [interquartile range] 62 [19–132] vs 148 [99–388] procedures), but had a higher annual caseload (median [interquartile range] 41 [27–61] vs 6 [3–12] cases) than surgeons operating with the open technique. RALP procedures were more often nerve-sparing compared with RRPs (Table 1).

Variable | RALP, N = 2617 | RRP, N = 826 | P |
---|---|---|---|
Age at surgery, years | 63 (58, 67) | 63 (59, 67) | 0.4 |
Preoperative PSA, ng/mL | 6.0 (4.5, 8.8) | 6.3 (4.5, 9.1) | 0.13 |
Clinical T-stage, n (%) | | | |
T1 | 1547 (59) | 547 (66) | <0.001 |
T2 | 994 (38) | 250 (30) | |
T3 | 76 (2.9) | 29 (3.5) | |
T4 | 0 (0) | 0 (0) | |
Biopsy Gleason score, n (%) | | | |
4–7 | 2455 (94) | 777 (94) | >0.9 |
8–10 | 150 (5.8) | 48 (5.8) | |
Unknown | 12 | 1 | |
Pathological T-stage, n (%) | | | |
T1 | 0 (0) | 0 (0) | 0.6 |
T2 | 1852 (72) | 603 (74) | |
T3 | 696 (27) | 210 (26) | |
T4 | 10 (0.4) | 4 (0.5) | |
Unknown | 59 | 9 | |
Pathological Gleason score, n (%) | | | |
4–7 | 2402 (93) | 772 (94) | 0.3 |
8–10 | 179 (6.9) | 47 (5.7) | |
Unknown | 36 | 7 | |
Pathology prostate weight, g | 42 (34, 53) | 44 (36, 54) | <0.001 |
Unknown | 32 | 12 | |
Nerve-sparing status, n (%) | | | |
None | 813 (31) | 336 (41) | <0.001 |
Unilateral | 1012 (39) | 194 (24) | |
Bilateral | 790 (30) | 294 (36) | |
Unknown | 2 | 2 | |
BMI, kg/m2 | 26 (24, 28) | 26 (24, 28) | 0.042 |
Unknown | 346 | 117 | |
Smoking status, n (%) | | | |
Never | 914 (40) | 302 (42) | 0.3 |
Former | 1165 (51) | 357 (50) | |
Current | 224 (9.7) | 59 (8.2) | |
Unknown | 314 | 108 | |
Cardiovascular disease, n (%) | 800 (35) | 247 (35) | >0.9 |
Unknown | 319 | 111 | |
- BMI, body mass index; RALP, robot-assisted laparoscopic prostatectomy; RRP, retropubic radical prostatectomy.
- Values are presented as median (interquartile range), unless otherwise indicated.
Surgeon Heterogeneity and Outcomes
Among participating surgeons with at least 20 surgeries during the study period (n=25) the incontinence rate varied from 5% to 30%, representing statistically significant heterogeneity (P = 0.001). The rate of ED varied from 61% to 93% (P < 0.001) and recurrent disease from 4% to 35% (P < 0.001; Fig. 2a–c).

For surgeons who had performed more than 250 RPs (n=12), statistically significant heterogeneity was found for rate of incontinence (P = 0.008), rate of ED (P < 0.001) and recurrence rate (P = 0.03). After additional adjustment for annual caseload, heterogeneity remained statistically significant for incontinence rate (P = 0.009) and rate of ED (P < 0.001), whereas heterogeneity for recurrence rate was no longer significant (P = 0.8).
There were no statistically significant correlations between the outcomes (incontinence and ED, rs = −0.001, P > 0.9; incontinence and recurrence rate, rs = −0.25, P = 0.2; ED and recurrence rate, rs = 0.23, P = 0.3).
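As an illustration of how such a rank correlation is obtained, the sketch below computes Spearman's coefficient as the Pearson correlation of rank-transformed values, with tied values sharing their mean rank. It assumes that the correlation is calculated across surgeon-level outcome rates, which is our reading of the analysis rather than a documented detail; the function names are hypothetical and no LAPPRO data are reproduced.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient close to zero, as observed for incontinence and ED, would mean that a surgeon's ranking on one outcome carries little information about the ranking on the other.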
Surgeon-Dependent Factors
The surgeons’ prior experience, annual caseload and degree of nerve-sparing were analysed separately and in combination, and the change was recorded as a percentage of the standard deviation.
Surgeons’ previous experience accounted for 42% of the observed heterogeneity in incontinence (P = 0.003) and 11% in ED (P = 0.03), but did not significantly influence heterogeneity in recurrence. The degree of nerve-sparing explained 5% of heterogeneity in both incontinence and ED (P = 0.002 and P ≤ 0.001, respectively), but did not significantly change recurrence heterogeneity. Annual caseload did not significantly influence heterogeneity in either incontinence or ED, but accounted for 19% of heterogeneity in recurrence (P = 0.01).
To assess whether the comparison of surgical technique was affected when the surgeons’ previous experience, annual caseload and degree of nerve-sparing surgery were taken into account, analyses were repeated including a covariate for the type of surgery performed (Fig. 3a–c).

In the base model, adjusted only for differences in patient and tumour characteristics, the incontinence rate was statistically significantly lower after RRP than after RALP (Fig. 3a). Adjusting for annual caseload increased the difference between techniques, whereas adjusting for previous experience reduced the difference in incontinence rate to a statistically nonsignificant level.
Regarding ED, there was no statistically significant difference between techniques in the base model (Fig. 3b), but adjusting for surgeons’ previous experience resulted in a statistically significant difference in favour of RALP. Adjustment for annual caseload or nerve-sparing made the techniques more similar.
Regarding recurrence rate, no statistically significant difference was seen between surgical techniques in the base model (Fig. 3c), and the additional adjustments did not change this.
The results remained largely unchanged when analyses were repeated and surgeons who performed fewer than 20 surgeries during the study period were excluded (nonsignificant).
Discussion
In the present study, we analysed surgeon heterogeneity from three different perspectives: (i) how it affected functional and oncological outcomes; (ii) which underlying factors connected with the surgeons were of importance; and (iii) how surgeon heterogeneity affected the comparison between RALP and RRP. We found large and statistically significant variation among individual surgeons regarding both functional and oncological outcomes. For functional outcomes the most important factor influencing heterogeneity was the surgeons' previous experience, while annual caseload had the greatest impact on the oncological outcome. Adjusting for these surgeon volume-related factors had a statistically significant effect on the results when comparing RALP and RRP.
LAPPRO is a prospective, multicentre trial comparing outcomes after RRP and RALP, including 68 operating surgeons with varying experience. To investigate surgeon heterogeneity, we analysed patients operated on by all surgeons in the LAPPRO cohort as well as a subgroup of patients operated on by experienced surgeons with a previous caseload of more than 250 RPs. The definition of experienced surgeons was based on a previous report indicating that the learning curve for open surgery plateaus at 250 RRPs [21]. We found a considerable and statistically significant variation among individual surgeons’ case mix-adjusted outcomes for all surgeons but also for the subgroup of experienced surgeons. For functional outcomes the most important factor influencing the observed heterogeneity was previous experience, while annual volume had the greatest impact on heterogeneity for recurrence rate.
While heterogeneity in functional and oncological outcomes has been described earlier [15-17, 22-24], underlying factors explaining such heterogeneity have not been reported in detail for RALP or RRP. In 2010, Bianco et al. [17] described variations among experienced surgeons in cancer control after RRP in a study from four high-volume centres. They found statistically significant heterogeneity in prostate cancer recurrence rate independent of surgeon experience. In a single-centre study, Vickers et al. [15] described a significant between-surgeon variation for potency and urinary continence 1 year after RRP in 1910 patients who were treated by 11 different surgeons. A Swedish population-based study of the effects of variability among surgeons (n = 9) on oncological and functional outcomes also reported large heterogeneity in continence rate after RRP, but not for ED or recurrence rate [16]. In 2018, Huynh et al. [24] showed a 10-fold variation in 3-month continence rate when comparing five surgeons. Surgeon volume as an important factor for outcomes after RP was first described by Begg et al. [23] and has been commented on in follow-up papers [25] and in subsequent studies on the learning curve [21, 26-29]. Taken together, our results support most previous findings that surgeon heterogeneity and surgeon volume significantly affect long-term outcomes after RP. However, even though volume factors are important, most of the observed heterogeneity remained after taking these factors into account. This means that experience and high volume are no guarantee of a favourable outcome. Although not fully explored, many other factors may contribute, including attention to every detailed step in the procedure and the set-up of the surgical training.
Since the primary aim of LAPPRO was to assess differences in outcomes by type of surgery, we also undertook analyses to evaluate how surgeon-dependent factors associated with surgeon heterogeneity affected the comparison between RALP and RRP. The additional adjustments for the surgeons' previous experience and annual caseload had significant effects on the comparison for all outcomes. The impact was greatest for functional outcomes and changed whether the difference between techniques was statistically significant or not.
Using national register data, Hu et al. [7] compared the effects of minimally invasive RP and RRP and reported that adding surgeon volume to the case mix-adjusted base model did not affect functional outcomes, in contrast to our findings. Other cohort studies comparing RRP and RALP did not adjust for differences between the surgeons’ individual results [6, 8, 10-13, 18]. In the only randomized trial comparing RRP and RALP published to date, similar functional outcomes were reported, whereas there was a difference regarding recurrence in favour of RALP (3% vs 9%) at 24-month follow-up [9]. However, the authors recommended caution in interpretation of the oncological outcomes because of the lack of standardization in postoperative management. Furthermore, with only one surgeon in each randomization arm, external validity was low and it was not possible to evaluate surgeon heterogeneity. Our results clearly show that surgeon volume significantly impacts long-term outcomes after RP and that it affects the comparison between techniques. Detailed knowledge is needed, not only of the cohort, but also of the surgeons, to minimize the risk of analysing differences between surgeons rather than true differences between the surgical techniques. There is a need for additional studies to better understand how the surgeon's individual experience and skills affect the results when comparing different surgical techniques such as RALP and RRP.
Efforts to decrease the wide heterogeneity in outcomes are warranted, irrespective of surgical approach, and can be facilitated by continuously reporting and monitoring surgical outcomes in quality registers. Organizing training of new surgeons, defining basic skills criteria as well as a minimum number of annual cases performed by an individual surgeon, and peer-to-peer observation in the operating room with feedback are critical.
Strengths of the present study include the large number of patients included (n = 3443), the high response rate to questionnaires [6, 18] as well as the multicentre design and the large number of surgeons (n = 68), which is a prerequisite for the validity of outcomes and for an investigation of surgeon heterogeneity. The prospective nature of the data collection, the detailed data on surgeons’ experience as well as on peri-operative details such as nerve-sparing are additional strengths. The present study is unique in that we have explored the impact of different underlying factors explaining surgeon heterogeneity which have not been reported earlier in detail in a large prospective study.
The study is limited by the non-randomized design of the trial. The results represent a nationwide cohort including hospitals of different size and surgeons with different training background, which can be a potential limitation in terms of generalizability to other surgeons and settings around the world. In previous reports, we have described differences between the RALP and RRP cohorts regarding the frequency and various degrees of nerve-sparing [6]. We have also reported that stratification of the cohorts based on D'Amico risk groups affected outcome analyses [30]. Such differences between the RALP and RRP groups also indicate that direct comparison between the surgical methods must be carried out with caution. Results from previous subgroup analyses are not quite comparable with the data presented here because surgeons with less experience are also included in the present analyses.
In conclusion, we have demonstrated a large and statistically significant variation between individual surgeons in functional and oncological outcomes after RP. Although surgeons’ previous experience and annual caseload influenced heterogeneity significantly, a large degree of heterogeneity remained after taking these volume-related factors into account. Importantly, adjusting for surgeon volume affected whether there was an advantage associated with RALP or RRP, which indicates that studies comparing different surgical procedures should be interpreted with caution. Strategies to decrease surgeon heterogeneity must be prioritized.
Acknowledgements
This study was supported by research grants from the Swedish Cancer Society (2008/922, 2010/593, 2013/497, 2016/362), the Swedish Research Council (2012-1770, 2015-02483), Region Västra Götaland, Sahlgrenska University Hospital (ALFBGB grants 13875, 146201, 4307771; HTA–VGR 6011; agreement concerning research and education of doctors), the Mrs Mary von Sydow Foundation, and the Anna and Edvin Berger Foundation. Drs Martin Nyberg and Anders Bjartell were supported by research grants from Region Skåne, Sweden and Lund University (ALF 42202/2018). The work of Drs Sigrid Carlsson and Dan Sjoberg on this paper was supported in part by a Cancer Center Support Grant from the National Cancer Institute made to Memorial Sloan Kettering Cancer Center (P30-CA008748). Dr Sigrid Carlsson’s work was also supported in part by funds from the Sidney Kimmel Center for Prostate and Urologic Cancers and a National Institutes of Health/National Cancer Institute Transition Career Development Award (K22-CA234400).
Conflicts of interest
None declared.