Smart Surgery

Subduing Surgical Variations: How Big Data Is Improving Patient Outcomes (1/2)

12 Sep 2018

Hospitals lose millions of dollars every year as a result of complications that arise in the OR. Re-admissions and longer hospital stays are a drain on hospital resources. They are also frequently avoidable and are the result of surgical variations. Until now, it has been difficult to provide surgeons with constructive feedback and opportunities for continuing education that can further hone their skill and refine their technique. HISTalk sat down with caresyntax® Senior Adviser and Sound Physicians Chief Clinical Officer Dr. John Birkmeyer to discuss the recent developments that are changing all this.

Tell me about yourself and the company.

I’m a general surgeon and a health services researcher by training. I spent most of my scholarly life focusing on the phenomenon of variation in surgical performance and outcomes.

I am chief clinical officer of Sound Physicians, which is a national physician practice focusing on hospital-based physician practices. I also serve on the advisory board for caresyntax, which is a technology company that specializes in big data integration and offers a variety of tools for helping improve the performance of operating surgeons.

What causes surgical variation and how much does it affect outcomes?

If you think about it, there’s no reason to be surprised that surgeons would vary in their performance, skill, and ultimately outcomes any more than tennis players, golfers, or musicians. It’s a pretty fine skill. Surgeons just vary in the degree to which they ultimately master it.

If you look at the scientific literature, depending on what procedure and what specialty you’re talking about, there is, give or take, a three- to five-fold spread in surgeon outcomes and costs. At the end of the day, that has enormous implications for both public health and healthcare costs, particularly as you consider that 40 or 50 million surgical procedures get done in the US alone every year. There’s a very deep and complex body of research that aims to understand what drives observed variation in surgeon outcomes.

Part of it, depending on the procedure, is driven by environmental factors and attributes of the hospital at which a surgeon is practicing. Certainly there’s aspects of the team — the skill and competence of anesthesia and critical care — that ultimately drive how well a surgeon’s patients do. However, my own work, as well as that of others, has shown that a lot of that variation is driven by the intrinsic ability of the operating surgeon. While technical skill and proficiency isn’t the only type of surgeon attribute that varies, it’s the most important and the most obvious.

My hospital experience is that surgeons are fiercely autonomous and aren’t all that interested in having others get involved in their work. How much of the issue of variation is based on surgeon psychology?

There’s no doubt that there’s a stereotype associated with surgeons, which is partly true and partly reinforced by how important surgeons are to the economics and to the smooth running of any hospital. I think part of what you’re describing about surgeons is something that is not specific to surgeons, but it’s a paradigm that applies to all physicians. There’s this general assumption that if you’re smart and if you do four, five, or up to seven years of post-medical school training, then you’re good to go. You’re at the flat part of the curve with regard to your abilities and your mastery of the craft.

Given how complex surgery is, and even given the scientific literature, it’s clear that surgeons continue on the learning curve for many, many years after they finish their training. My belief is that surgeons could be so much better than they are if they adopted a philosophy of deliberate practice and continuous learning and if they increasingly started to harness some of the empirical tools that are being brought to bear in many other disciplines.

Your video study of procedures found that some surgeons have easily observed poor technique, yet no surgeon thinks they are a less-than-average performer. How much of the surgical process is based on defensible, concrete standards?

Perhaps it’s not a surprise, given the stereotype associated with surgeons, that most surgeons think they’re above average. There’s no doubt that part of what made my own research feasible was the willingness of surgeons to supply videos of themselves operating, probably under the assumption that their peers could learn from watching them. We all know that it’s just a fact that in any sample, half of all the members will be average or below average.

The things that surprised me about that particular study in The New England Journal of Medicine were, number one, just how stark the differences were in both technique and skill. Number two, it was amazing to me just how immediately obvious those variations in skill were. Not just to professional observers — surgeons watching each other operate — but if you show those 20 videos to lay observers who don’t know anything about surgery, they can almost just as easily segregate the best from the worst. In fact, there’s great research that’s recently been published showing that crowdsourcing by lay observers gets you basically to the same ratings as professional ratings by surgeon peers. Finally, I was really shocked by just how powerfully related surgeon skill was to various outcomes that are relevant either to patient outcomes or to cost.

As I watched all of those videos, as somebody who’s himself a practicing bariatric surgeon, there was not a single surgeon whose technique was outside of the standard of care. Nobody was violating accepted professional standards for how to do that procedure. It just speaks to the fact that our standards are fairly loosey-goosey, to the extent that we have a very imprecise estimate of what’s optimal technique and what’s not. It also speaks to the fact that it’s not so much the technique that a surgeon deploys as it is the fidelity or the precision in the skill by which that technique is deployed.

The surgeons who contributed their videos were self-selected, which probably means that you were not seeing the worst surgeons in the US. Beyond observing voluntarily donated videos, what data elements or analysis would allow assessment of all surgeons?

You’re absolutely right that in my study, that was a self-selected group of surgeons. But it was also a group of surgeons that had the luxury of being able to choose their best case. Nobody sent me videotapes of cases gone sour. They basically sent me what they thought was typical and sometimes their best work. Imagine what it would look like if it was just a random sample of everybody in all cases.

I’m sure that, for many procedures, if you really did have the universe and the entire library of all of their cases, there’s a significant minority of surgeons about whom half of their peers would say, “This person should not be operating or should not be doing procedures as complex as this.”

The second part of your question was about what’s a scalable strategy for vetting and providing feedback to all surgeons, not just this highly selected group of volunteers. That’s what’s attractive to me about technology approaches. Such a high percentage of surgical procedures these days, particularly those that are most complex and are the highest stakes from the perspective of patients, are done videoscopically, which means that there’s a real-time video recording of what’s going on in the surgical field and at the tips of the surgeon’s instruments.

What’s really exciting to me is to leverage all of that rich data infrastructure and convert the real-time video information to digital, empirical information that gives surgeons real-time feedback about how they’re doing relative to techniques and maneuvers that ultimately lead to the best outcomes. Google and Uber may ultimately get us to a self-driving car, with all of the externalities and all of the craziness that has to be accounted for, and can help the car or the driver make better decisions.

I don’t think it’s a huge stretch, given how reproducible certain types of procedures are, that machine learning based on digital video-based information could do the same thing: not only providing digital analysis and giving a surgeon a report card about how well he or she did with the case that just ended, but also giving real-time information that could help those procedures be better in the first place. That information could include the angle of attack, how much random motion there is, and the amount of force that’s being applied either to the instrument or to the tissue. All of these things that we measured holistically and by human judgment in my study could, in my belief, very readily be replicated in a much more powerful way using data technology.

Every surgeon wants to do a good job, but nobody likes to judge or be judged by peers. Doctors are competitive enough to want their numbers to look good. Will the procedure data be acted on through self-policing or will hospitals need to get involved?

I think the answer is both. At the end of the day, there need to be more rigorous procedures for doing two things. Number one, identifying and policing that small subset of surgeons that really should not be operating, or at least should be operating with a less-complex scope of practice. Number two, finding ways to make all surgeons better. In other words, not just worrying about the bad apples on one tail of the distribution, but finding a way to shift that whole performance curve to the right and make everybody better via data-informed practice.

With regards to self-policing, there’s a whole bunch of discussion underway about the role of the American Board of Surgery and similar boards for using that as a part of the board certification. Hospitals are increasingly insisting that new surgeons submit videotapes of themselves operating as part of their hospital credentialing process. Those are all fairly important but low-tech approaches to identifying that small number of surgeons who just are not ready for prime time.

What’s most exciting to me is how you make everybody better. Certainly there are practical and sociological barriers to making everybody better purely via a paradigm of person-to-person coaching. Not just because that’s expensive, because surgeon time is expensive, but also because a lot of surgeons just are reluctant to be taught or coached by their peers. They think they’re done and it’s an admission of inferiority to accept that kind of coaching when you’re well-established in your practice.

That’s what’s so appealing to me about the more anonymous, confidential, data-driven performance feedback that I believe is eminently feasible now with both robotic surgery and other types of videoscopic surgery. There still is a lot of work to be done in terms of exactly what that feedback would look like and how to get that feedback in real time to surgeons as they’re operating in a way that does not distract them from what they’re doing, but improves what they’re doing. I think it’s really exciting. I don’t think that it’s 15 years from now. I think we’re getting very close.


This article was originally published on June 11, 2018, by HIStalk. The original article can be found here.

Written by John D. Birkmeyer, M.D.

Dr. Birkmeyer is a senior adviser on the caresyntax advisory board. As Chief Clinical Officer at Sound Physicians, he is focused on clinical quality, risk management, patient experience and cost-efficiency, and helps providers succeed in the value-based payment ecosystem. His research focuses on measuring and improving hospital outcomes, physician performance and cost-efficiency. Dr. Birkmeyer is a graduate of Harvard Medical School, and he completed his General Surgery residency and a fellowship in health services research at Dartmouth.
