Guidelines Development Workshop
Session 3: Review example studies and terminology and rate the quality of the evidence
Video Transcription
Hello. Welcome, everyone, to the second session of our workshop for the American Association of Clinical Endocrinology. My name is Jonathan Treadwell, and my colleague Stacey Yule and I will be going over two main topics today. The first will be a GRADE session where I walk through a complete example of applying GRADE to a specific clinical topic. And that will take about an hour. And then I'll hand it over to Stacey, who will talk about the other aspect of GRADE, which is crafting recommendations and rating the strength of recommendations. But before we get to that, I was told that I used a few terms yesterday that people are not widely familiar with. So I'm just going to spend a couple of slides here defining those. And these were the terms relative risk and odds ratio. You might be wondering, well, what are they? Are they the same thing? The answer is they are different. The relative risk is the easy one. That's just the ratio of two percentages. So if the rate of some event in one treatment group is 10% and the other group is 5%, the relative risk is 2. The odds ratio is slightly different. It's actually the ratio of the odds of the two events, which is mathematically a little different. If we had a 10% and a 5% rate, those odds are 1 to 9 and 1 to 19. And the ratio of those odds, 19 over 9, is 2.11, so slightly different. A few little clarifications. As event rates get really rare, near zero, these two quantities get closer together. The odds ratio is always further from one, that is, more extreme, than the relative risk. People often misinterpret the odds ratio as the relative risk. And when you see the phrasing X times more likely, that's a relative risk. So with that very fun mathematical digression, we can now move on to the rest of the session. I'll start out by walking through this clinical example. So let's suppose, as we did in the previous session, we have a key question.
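The arithmetic behind those two definitions can be sketched in a few lines. This is a minimal illustration using the 10% and 5% rates from the example:

```python
# Relative risk vs. odds ratio for the example rates in the talk.
p1, p2 = 0.10, 0.05  # event rates in the two treatment groups

relative_risk = p1 / p2            # ratio of the two percentages
odds1 = p1 / (1 - p1)              # 0.10 / 0.90 = 1/9
odds2 = p2 / (1 - p2)              # 0.05 / 0.95 = 1/19
odds_ratio = odds1 / odds2         # (1/9) / (1/19) = 19/9

print(f"RR = {relative_risk:.2f}")  # RR = 2.00
print(f"OR = {odds_ratio:.2f}")     # OR = 2.11
```

Note that the odds ratio (2.11) lands a bit further from 1 than the relative risk (2.00), exactly as described above.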
For obese patients with metabolic syndrome, what is the impact of dietary nutrient composition on weight loss and cardiovascular disease? Let's suppose we had our professional information specialist do a thorough literature search of Medline, Embase, etc. They gave us a bunch of studies, and we looked through them all and decided that only two of them addressed this key question. One by McLaughlin in 2006 and one by Muzio in 2007. The PDFs of these two should have been made available to you, and hopefully you had time to look through them before you viewed this talk. If not, I encourage you to go do that before moving onward. I don't want to spill the beans. But what we're going to do is use the evidence from these two studies to see if we can actually make a recommendation. And if so, how strong is that recommendation? I'll give you just a brief overview of these two studies. The first one, McLaughlin, was a randomized trial in the US at an academic medical center. They enrolled 57 patients with obesity who all also had insulin resistance. None of them had diabetes. They excluded people who actually were diagnosed with diabetes. The randomized trial compared two diets. Both were a deficit of 750 kilocalories. And they differed with respect to nutrient composition. So the first diet you could consider a higher carb diet, 60% carb and 25% fat. And the other one we'll call the low carb diet, only 40% carb, 45% fat. And they were the same on the protein. That's percentage of energy. And they reported data on four month weight loss, that's having been on the diet for four months. What do you weigh at the end of the four months? And they also reported numerous lab values that are correlated with the development of cardiovascular disease, such as blood pressure, cholesterol, etc. The other study, by Muzio, is quite similar. It was an RCT conducted in Italy at an academic medical center.
They enrolled 100 patients with obesity who had metabolic syndrome. And they also excluded people who had diabetes. They also compared two diets. This study's calorie deficit was a little less extreme. They likewise compared a high carb diet to a low carb diet, with slightly different percentages than the other study, but basically close enough. They reported data on five month weight loss, as well as numerous lab values, a lot of the same lab values that were reported in the other study. So pretty similar studies, both addressing our key question. So what we're going to do is go through all these various steps in GRADE, looking at these two studies with our key question in mind. So just to review from the last session, we're going to have a starting grade. RCTs are high, non-RCTs are low, opinion is very low. We are then going to see if we want to downgrade the evidence for any of the various downgrade domains. And then possibly, do we want to upgrade for any of the upgrade domains? We're going to do that for both of our outcomes, downgrades and upgrades. And then once we're done with both outcomes, we're going to look across outcomes and take the lowest grade among the critical ones. So here is a very nicely structured GRADE table that we're going to be filling in based on our assessment of these two studies. As you can see, we've got a row for body weight and a separate row for cardiovascular disease. If you recall, in our key question, those are the two outcomes we're focusing on. And then we have different columns for the different types of downgrades. We have the final grade, and then we have the data, what the actual findings were in the various studies, right alongside the grades. I'll get into why there are no upgrades listed in this table. So, starting grade: both of these were randomized trials, so that's quite easy. The starting grade is high. And then we go to the downgrades.
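The start-then-downgrade bookkeeping just described is simple enough to sketch in code. This is a hypothetical illustration (the function name and level list are my own, not part of GRADE itself):

```python
# Hypothetical sketch of GRADE bookkeeping: start at a level set by
# study design, then step down one level per serious downgrade,
# flooring at "very low".
LEVELS = ["very low", "low", "moderate", "high"]

def final_grade(starting_level: str, downgrades: int) -> str:
    """Step down `downgrades` levels from the starting level."""
    idx = LEVELS.index(starting_level)
    return LEVELS[max(0, idx - downgrades)]

# RCTs start at "high": two downgrades land at "low",
# three land at "very low".
print(final_grade("high", 2))  # low
print(final_grade("high", 3))  # very low
```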
So the first downgrade we could consider is study limitations. Here, if you remember, this is where we ask: do we have any concerns about the believability of the results? So we're comparing two diets. And the question is, if the outcomes were different at the end of the day, are we really sure that it was the diet difference that caused the outcome difference? And there were numerous concerns that I had about these studies. I would say mostly minor concerns, but definitely concerns. I was the only one to look through these two for the purpose of this session. Normally, at least two people would independently assess study limitations for each study. But just for this exercise, it was just me. I noticed neither study reported that they used computer randomization, and neither reported whether they used concealment of allocation. That's jargon you might not know. It essentially refers to the use of sealed envelopes so that the person assigning people to groups doesn't know what the next assignment is going to be. They won't be able to tell which group the person in front of them is going to be assigned to until the envelope is opened. Patient blinding. This is the idea that patients might have preconceived ideas about one diet versus another, and that might actually influence either their adherence to the diet or their eventual outcomes. This was not reported in one study and was not done in the other study. Measurement blinding. This is the idea that the person taking the measurements did not know which diet the person got. For example, a study staff person weighing people might be slightly biased in the measurement of weight. I mean, weight is an objective outcome. It would be kind of hard to bias that much, but nevertheless, we would like to see that. Both studies had pretty good balance at baseline in terms of the weights people had as they went into the study. Comorbidities.
There was one baseline variable that was a bit different in one of the studies: baseline insulin level. Treatment adherence. Now, this is a really important one when it comes to diets. The idea is you want to make sure they're actually following the diet that they've been assigned to. One of the studies used food diaries, had weekly check-ins with the dietician, and had people rate their adherence to the diet, which was pretty good, 7.6 out of 10. So adherence seemed pretty good in that study. In the other one, unfortunately, they only met once a month in a group, and just at the end of the study they asked people to rate their perception of their own adherence. So that seems to be a problem. All the outcomes were objectively measured, and attrition was pretty low in general. So, you know, taking a look at all these various issues, I've circled in red the ones that, for me, were more concerning. To me, it seemed like this would be a one level downgrade for study limitations. So we fill that in over here on the left side; I've filled in serious limitations for both outcomes. But I want to draw your attention to the red A right there, which is a footnote, where we say why we have downgraded for study limitations. Here's all the text I wrote, sort of justifying or explaining why there is that downgrade. So here we have an example of transparency: GRADE really encourages users to do things in a structured way and to be transparent about why certain things were done. So that's study limitations. Let's move on to the next one, indirectness. So if you remember, this is one of the harder ones that had lots of facets. So our key question, if you remember, was about patients with metabolic syndrome. And thankfully, one study had patients with insulin resistance and the other had metabolic syndrome, very similar concepts. Both studies excluded anyone who actually had diabetes.
So both studies seem quite direct when it comes to patients. How about the treatments and comparators? Well, they each had a weight loss diet program, obviously a deficit of calories, slightly different, 1500, 1750. The diets did seem to me to be representative of typical low carb options or high carb options. So I don't really see an issue here either with directness. They seem quite direct in addressing the question we're interested in. What about outcomes? So they both measured weight directly. That's pretty easy. They did, though, use lab values such as blood pressure, cholesterol, and triglycerides as surrogates for cardiovascular disease. So that latter part is a concern: we're not actually measuring cardiovascular disease, we're measuring surrogates. Thus, we are only indirectly addressing the outcome in the key question. Finally, time points. Follow-up was four months in one study and five months in the other. We were hoping for six months, but it's, you know, it's almost there. So what do we think about the downgrade for indirectness? In my judgment, I would not downgrade for indirectness for weight loss, but I would for cardiovascular disease. And again, we have a footnote, now it's a red B. And the reason for that downgrade is that they measured cardiovascular disease using surrogate outcomes. And also, the length of follow-up was a little bit low. For weight loss, the follow-up was the same, but to me that wasn't enough to call it a serious problem. All right, let's turn to inconsistency. Here, remember, it's about the inconsistency in results, not in the mix. For weight loss, the McLaughlin study found a between-group difference of only about one kilogram, slightly favoring the low-carb group. I think both groups lost six or seven kilograms, which I think is about 15 pounds.
And then in the Muzio study, they didn't report a between-group difference, but both groups lost about nine kilograms, about 20 pounds, and that's over four or five months. So pretty similar results in weight loss, maybe slightly favoring the low-carb group. And then for cardiovascular disease, if you look at these two studies, this is a rather daunting one to examine because both of them used 10 or 15 measures, all lab values, all these various surrogate outcomes for cardiovascular disease. But when I looked through them all, if there was a signal, it was that the data seemed to slightly favor the low-carb group. And it depended on the specific measure, blood pressure, triglycerides, et cetera. Some were statistically significant, some were not, but there did seem to be a little trend towards better lab values in those who had received the low-carb diet. So in terms of inconsistency, I didn't see any major concerns, so no downgrades for that. What about imprecision? So remember, this is more of the math and the statistics. Was the study large enough to pin down the diet difference precisely? Both of these studies were fairly small. We had 57 total in McLaughlin and 100 in Muzio. McLaughlin reported a confidence interval for weight loss that is quite wide, and if you remember, that study very slightly favored the low-carb group, but the confidence interval is so wide that really there's not much one could say about that actual difference. Cardiovascular disease, again, several outcomes were significant, several were not. In general, the Ns were quite small. So I decided to downgrade for imprecision for both outcomes. And as usual, there's a footnote for why, where I mentioned the study and its reported confidence interval. So where are we now? Remember, we started at high because these are RCTs. For body weight, we went down one for study limitations and down another for imprecision, so now we're at low for body weight.
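To make the imprecision point concrete, here is how a 95% confidence interval for a between-group difference in mean weight loss is typically computed. The numbers below are made up for illustration and are not taken from either study:

```python
import math

def ci_mean_difference(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """95% CI for the difference in means of two independent groups."""
    diff = m1 - m2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
    return diff - z * se, diff + z * se

# Made-up numbers: mean weight change (kg), SD, and group size.
lo, hi = ci_mean_difference(-7.0, 4.5, 28, -6.0, 4.5, 29)
print(f"95% CI for the difference: ({lo:.1f}, {hi:.1f}) kg")
# With groups this small, the interval is wide and spans zero,
# which is the kind of result that triggers an imprecision downgrade.
```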
Cardiovascular disease, we've had three downgrades, so we're all the way down to very low for that one. We have one more downgrade to consider, and that's publication bias. So there are various things to consider when you're thinking about whether to downgrade for publication bias. One is study funding. The notion is that some funders might be less interested in certain study results and less likely to publish results that weren't of interest to the funder. These were pretty benign sources of funding. One was funded by NIH. Both studies did say that the authors did not have conflicts of interest. Had I had more time, I might have looked at clinicaltrials.gov. That's a repository for clinical trials that have been planned. The notion there is that there do exist trials that have been started, but the results are never published. That might call into question whether there are lots of other studies out there on the same question that aren't published, and then you might want to downgrade, in that case, for publication bias. In this case, I didn't have any red flags for this issue, so no concerns there. All right, and then what about upgrades? Well, if you remember from yesterday, in the GRADE system upgrades are not applied to randomized trials, they're only applied to non-randomized studies, so nothing to do there. Now we have our final grades for these two outcomes. We ended up at a low certainty of evidence for body weight and a very low certainty of evidence for cardiovascular disease. Also, in this table, we can fill in the data. I mentioned the amount of weight loss was six or seven kilograms in one study and 10 kilograms in both groups in the other study. Here, I filled in an example lab value. This is systolic blood pressure, very slightly favoring one group in one study, and a more substantial difference of seven in the other study. As I said, there were many lab values, and I didn't want to fill them all in here.
This gives you kind of an idea of the small effect being observed. So now we're going to consider the outcomes together. Remember, the way this works is that we take all the critical outcomes and we look at the lowest rating among them. So our critical outcomes were weight loss, which we rated low, and cardiovascular disease, which we rated very low. So the overall quality of the evidence for this question is going to be very low. All right. So, wow, I've been much faster than I thought I would be for this section, but I did want you to notice all the subjectivity that was throughout that. There were judgments being made. This would normally be done by at least two evidence reviewers assessing each domain independently. And furthermore, notice the structure that the table provided. You're constantly being reminded of where you are in the process; without that structure, there is a risk that people might forget to assess some aspect of the evidence, such as directness. Because of the subjectivity, there is a high need for transparency, and you saw that in the tables, where I put footnotes giving the reasons for the various downgrades. So that's the end of my planned presentation. I invite questions. John, you talked about downgrading RCTs, but is there a situation where you would ever upgrade for RCTs? Yeah, good question. The GRADE group has said that they've never seen a case where it's compelling to upgrade evidence from randomized trials. And I think the logic there is that randomized trials are going to start at high, and if you've downgraded to moderate or low, you've noticed problems in the evidence. And those problems could actually be the reason for what would tempt you to upgrade; they might be the reason for a large magnitude of effect. So it's a little bit circular: if you've already downgraded, you've sort of lost your chance to believe the evidence a lot, and so no upgrades. Okay.
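The "lowest among the critical outcomes" rule from the walkthrough can be expressed directly. This is a sketch; the variable names are mine:

```python
# Sketch of the final step: the overall certainty is the lowest grade
# among the outcomes the panel designated as critical.
LEVELS = ["very low", "low", "moderate", "high"]  # ordered worst to best

critical_outcomes = {
    "weight loss": "low",
    "cardiovascular disease": "very low",
}
overall = min(critical_outcomes.values(), key=LEVELS.index)
print(overall)  # very low
```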
And in the case of systematic reviews, what do you do? Do you assess and grade each individual study that comprises that systematic review? Well, certainly for study limitations, we're going to look at all of the various studies. But when we're thinking about an outcome, we're going to take all the studies on that outcome that reported the comparison of interest, such as the two diets we're comparing here, and we're going to really look at all the studies together. So while we do look at them one at a time when examining study limitations and directness, when we're assigning that overall grade, we're going to see them together. That might take the form of a meta-analysis, which is essentially a way to quantitatively combine the data from different studies. It's a way to get a single number that kind of summarizes all of the data. I did not do that here. I could have, actually, but I did not, just for simplicity. Thank you. Any other questions from anyone? Yes. John, when you mentioned publication bias, you said if you had had more time, you would have looked at clinicaltrials.gov or some other repository. Could you expand on that a bit regarding publication bias and what you might have been looking for? Sure. So clinicaltrials.gov has been around for quite a while, and I believe that any federally funded randomized trial is required to register in that repository. And the idea, in assessing publication bias, is that we're concerned there are trials out there that are not in the published literature. It might be that they were selectively suppressed because the results weren't what the authors wanted, or they weren't what journal editors thought was interesting, etc. But from the perspective of guideline committees and systematic reviewers, it shouldn't be about what we find interesting, it should be about what the studies actually show.
So going to repositories such as clinicaltrials.gov will give us a broader picture of what actually might have been done. So if I were to go to clinicaltrials.gov and do a search for obesity, metabolic syndrome, and dietary composition, and these studies were published in '06 and '07, let's suppose that I saw there were actually six or seven other trials, and none of my searches ever found those trials; whatever happened to them? That might give me pause: maybe the two that were published were an unrepresentative subset; maybe they found effects, and so they were published. So it's really that selection mechanism, deciding to publish based on results, that would give us perhaps a reason to downgrade for publication bias. Okay, thank you. Also, one more question in terms of meta-analyses and systematic reviews. I know at one point a meta-analysis of RCTs was considered the highest level of evidence. Is that still true today? And does a meta-analysis edge out a systematic review, or do you think guideline panels should be looking more at individual studies across the board? Very good question. So back in the day, yes, you would see these hierarchies where the highest level was a meta-analysis of randomized trials. The way I think about meta-analysis, it is just a way to do the math to get all the data in one place. The studies that underlie it haven't changed. All the biases they have are still there. So I myself would still have the same concerns about study limitations whether or not they've done a meta-analysis. That's really just a way of doing the math so you can see one effect for all the studies together. So you asked, is that still believed generally? I think no.
I think the GRADE group very much encourages examining each study, because any given guideline panel is going to have some slightly different wrinkle in a key question, and, for example, the directness of a given study in addressing that question may vary. So no, it's not generally believed anymore. Great, thank you. That's helpful. Any other questions? Okay, well then I will turn it over to Stacey, and we can talk about the other type of GRADE rating, the strength of recommendation.
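As a footnote to the meta-analysis discussion in the Q&A: combining studies "to get all the data in one place" usually means an inverse-variance weighted average of the per-study effects. Here is a minimal fixed-effect sketch with made-up effect estimates, not values from the two studies:

```python
import math

def pooled_effect(effects, std_errors):
    """Fixed-effect, inverse-variance pooled estimate and its standard error."""
    weights = [1 / se**2 for se in std_errors]  # more precise studies weigh more
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical between-group weight differences (kg) and their standard errors.
effect, se = pooled_effect([-1.0, -0.5], [1.2, 0.9])
print(f"pooled difference: {effect:.2f} kg (SE {se:.2f})")
```

Note that pooling sharpens the estimate (the pooled standard error is smaller than either input) without removing any bias present in the underlying studies, which is the point made in the answer above.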
Video Summary
In this video, Jonathan Treadwell and Stacey Yule discuss the process of applying the GRADE framework to a specific clinical topic. The video begins with Jonathan giving a brief overview of the topics they will cover: applying GRADE to a clinical topic and crafting recommendations. Jonathan then spends some time defining the terms "relative risk" and "odds ratio" for the audience.

After the definitions, Jonathan presents a clinical example of applying GRADE. The example focuses on the impact of dietary nutrient composition on weight loss and cardiovascular disease in obese patients with metabolic syndrome. He draws on two studies, one by McLaughlin in 2006 and one by Muzio in 2007, to work toward a recommendation. Jonathan walks through the various steps of GRADE, such as assessing study limitations, indirectness, inconsistency, imprecision, and publication bias, for both outcomes.

Based on these assessments, the final grades for body weight and cardiovascular disease are determined to be low and very low, respectively. Jonathan highlights the importance of transparency in the GRADE process and the subjective judgments the assessments require. He also mentions that upgrades are not applied to randomized trials.

The video concludes with Jonathan answering questions from the audience, including the significance of meta-analyses in the GRADE framework and whether a meta-analysis outweighs individual studies in terms of evidence. Jonathan explains that meta-analyses are just a way to gather the data in one place, and the biases of the individual studies still exist. He emphasizes the importance of examining each study separately and tailoring the assessment to the specific key question. The video then transitions to Stacey, who discusses the rating of strength of recommendations in the GRADE framework.
Keywords
GRADE framework
clinical topic
applying GRADE
dietary nutrient composition
weight loss
cardiovascular disease