Go for the Silver? Using Simulated Randomized Trials

2010-08-03


Although randomized clinical trials remain the “gold” standard for medical knowledge, they share with the precious metal limitations of cost and rarity. With the dramatic push for outcomes-driven medicine and initiatives based on quality of care, there may not be enough “gold” to go around. Is it time for a “silver” standard to supplement the system?

Simulated randomized trials might be that silver standard, said Dr. Eugene H. Blackstone of the Cleveland Clinic Foundation in an interview. He and other researchers are determined to craft an alternative set of trials to supplement randomized clinical trials.

Some randomized efforts are too expensive, too lengthy, or too lacking in potential patient volunteers to ever be accomplished, and simulated trials could compensate for the gaps in medical knowledge. The first paper dealing with the subject, according to Dr. Blackstone, was a seminal article in the early 1980s by Dr. Paul Rosenbaum of the University of Wisconsin, Madison, and Dr. Donald Rubin at the University of Chicago (Biometrika 1983;70:41-55). However, it was not until the 1990s that interest in using an expanded set of statistical tools for comparisons between nonrandomized patients really took off.

The concept of simulated clinical trials is to utilize powerful statistical methods to analyze clinical databases and completed trials for unmined riches. A propensity score is used to sort and segregate patients in a clinical or trial database into analytical silos, where they can be functionally treated as a series of smaller, randomized trial populations or as matched cases.

How does propensity scoring work? Dr. Blackstone explained it this way: When sorting apples and oranges, the propensity score of each piece of fruit provides an estimate of its propensity toward (probability of) belonging to one group (apples) vs. another (oranges) (J. Thorac. Cardiovasc. Surg. 2002;123:8-15).
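Formally, in the notation usually traced to the Rosenbaum and Rubin paper cited earlier, the score and its key "balancing" property can be written as below. The symbols are labels chosen here, not taken from the article: Z = 1 for a patient who received aspirin and 0 otherwise, and X for the recorded covariates. The second statement says that among patients with the same propensity score, the recorded covariates carry no further information about which treatment was received. This is what lets score-matched groups be compared somewhat like randomized arms, assuming no important confounder went unrecorded.

```latex
e(\mathbf{x}) = \Pr(Z = 1 \mid \mathbf{X} = \mathbf{x}),
\qquad
\mathbf{X} \perp\!\!\!\perp Z \mid e(\mathbf{X})
```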

This is first done by propensity modeling: picking a set of variables that incorporates “everything recorded that may relate to either systematic bias or simply bad luck.” An example is a prospective, nonrandomized observational cohort study of the benefits of aspirin use in patients with coronary artery disease performed by Dr. Patricia A. Gum, Dr. Blackstone, and their colleagues at the Cleveland Clinic Foundation. In this study, the aspirin-treated patients were older, more likely to be men, and more likely to have a history of hypertension and other comorbidities than were the non–aspirin users. The aspirin users also took more ancillary drugs, such as beta-blockers and angiotensin-converting enzyme inhibitors. All these differences probably resulted from physician bias as to which patients should be treated with aspirin for particular conditions (JAMA 2001;286:1187-94).

One question for which a randomized clinical trial would be optimum, but not available, is "Which known or suspected coronary artery disease patients have lower all-cause mortality with aspirin treatment?"

Simple univariate analysis showed no association between aspirin use and mortality. However, after adjustment for a wide variety of demographic and clinical variables, aspirin use was significantly associated with reduced mortality (hazard ratio [HR] 0.67; P = .002). A further analysis of which patient characteristics predicted the greatest benefit from aspirin could not be carried out in a statistically meaningful way.

Because aspirin use was not randomly assigned in this patient population, the research team knew they had to account for potential confounding factors and selection bias that made the patients appear to be apples and oranges, not easily compared.

Their solution was to create a propensity score for the population, which modeled the likelihood of any particular patient being treated with aspirin. This was done by selecting variables that appeared to be associated with aspirin treatment: age, sex, clinical history, medication use, cardiovascular assessment, and exercise capacity. Once the final model was created, it contained 34 covariates.
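The sketch below is a minimal version of the propensity-model step. It assumes a logistic regression of treatment on the covariates, fitted by Newton-Raphson in plain Python. This is a common choice, but the article does not name the fitting method. The function names (`fit_propensity_model`, `propensity_scores`, `solve_linear`) and the two-covariate toy data are hypothetical. A real analysis would use a statistics package and the full set of 34 covariates.

```python
import math


def solve_linear(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_propensity_model(X, treated, iterations=25):
    """Fit logit P(treated = 1 | x) = b0 + b1*x1 + ... + bk*xk by Newton-Raphson.

    X: list of covariate rows; treated: list of 0/1 flags (1 = received aspirin).
    Returns [b0, b1, ..., bk] with the intercept first.
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k                      # gradient of the log-likelihood
        info = [[0.0] * k for _ in range(k)]  # information matrix X'WX
        for row, z in zip(rows, treated):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            for a in range(k):
                grad[a] += (z - p) * row[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * row[a] * row[c]
        step = solve_linear(info, grad)  # Newton step: (X'WX)^-1 X'(z - p)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def propensity_scores(X, beta):
    """Predicted probability of treatment for each covariate row."""
    return [
        1.0 / (1.0 + math.exp(-(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))))
        for row in X
    ]


# Hypothetical toy cohort: [age in decades above 50, current smoker (0/1)]
X_demo = [[0.2, 0], [0.5, 0], [1.0, 1], [1.2, 0], [0.1, 1], [0.8, 1], [0.3, 0], [1.5, 1]]
aspirin_demo = [0, 0, 1, 1, 0, 1, 1, 0]
scores_demo = propensity_scores(X_demo, fit_propensity_model(X_demo, aspirin_demo))
```

When the model includes an intercept, the fitted scores of a logistic model sum to the number of treated patients. This makes a convenient sanity check on the fit.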

Each patient was “run through” the model to determine his or her propensity score (i.e., the likelihood, based on demographic and clinical variables, that the person would have been treated with aspirin, independent of whether actual treatment had occurred). For example, Patient A, a nonsmoking woman in her early 50s with a fairly good exercise capacity, would have a lower likelihood of getting aspirin treatment. Patient O, a man over age 60 and a smoker with poor exercise capacity, would have a high likelihood of getting aspirin treatment. It doesn’t matter to the method that the cohort data show that both patients actually got aspirin treatment. The important thing for propensity scoring is their statistical likelihood of having received it.
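The scoring sketch below makes the Patient A versus Patient O contrast concrete. Its coefficients are invented for illustration and are not the published 34-covariate model. The covariate names, including exercise capacity measured in METs, are assumptions. Note that the function never looks at whether the patient actually received aspirin.

```python
import math

# Hypothetical, illustrative coefficients (NOT the study's fitted model).
COEFFS = {
    "intercept": -0.5,
    "age_decades_over_50": 0.6,  # older patients more likely to get aspirin
    "male": 0.7,
    "smoker": 0.5,
    "mets": -0.15,               # better exercise capacity, less likely
}


def propensity(patient, coeffs=COEFFS):
    """Probability of receiving aspirin, predicted from covariates only.

    The patient's actual treatment is deliberately not an input.
    """
    lin = coeffs["intercept"]
    for name, value in patient.items():
        lin += coeffs[name] * value
    return 1.0 / (1.0 + math.exp(-lin))


patient_a = {"age_decades_over_50": 0.2, "male": 0, "smoker": 0, "mets": 10.0}
patient_o = {"age_decades_over_50": 1.3, "male": 1, "smoker": 1, "mets": 4.0}
print(round(propensity(patient_a), 3), round(propensity(patient_o), 3))  # roughly 0.132 and 0.707
```

With these made-up numbers, Patient A scores low and Patient O scores high, even if both actually received aspirin.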

Once a propensity score is obtained, comparisons can be made in any of three ways, according to Dr. Blackstone. First is simple matching. Dr. Blackstone explained the approach in the Journal of Thoracic and Cardiovascular Surgery: “A patient is selected from the control group whose propensity score is nearest to that of a patient in the case group. If multiple patients are close in propensity scores, optimal selection among these candidates can be used. Remarkably, problems of matching on multiple variables disappear by compressing ‘everything known about the patient’ into a single score!” (2002;123:8-15).
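A sketch of the matching idea follows. It uses greedy 1:1 nearest-neighbor matching without replacement, with an optional caliper (a maximum allowed score difference). Greedy matching is a simplification. The "optimal selection" Dr. Blackstone mentions would instead minimize the total score distance over all pairs. The function name and the `caliper` parameter are hypothetical.

```python
def match_nearest(treated, controls, caliper=None):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    treated, controls: dicts mapping patient id -> propensity score.
    Each control is used at most once. Returns a list of (treated_id, control_id).
    """
    available = dict(controls)
    pairs = []
    # Match the highest-score treated patients first; controls are scarce there.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue  # no acceptable control: leave this treated patient unmatched
        pairs.append((t_id, c_id))
        del available[c_id]
    return pairs


print(match_nearest({"t1": 0.80, "t2": 0.30}, {"c1": 0.28, "c2": 0.75, "c3": 0.50}))
```

Treated patients with no control inside the caliper are dropped rather than forced into a poor match. This is usually preferred, at the cost of a smaller matched sample.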

The second way comparisons can be made is stratification, or subclassification into roughly equal-sized groups, according to Dr. Blackstone. In the aspirin example, patients could be stratified into five subgroups, or quintiles, based on their calculated propensity scores. In this scenario, Patient A would probably be assigned to the first quintile and Patient O to the last. Within each quintile there would be an important division: patients who actually received aspirin treatment vs. those who did not. But within each quintile, other patient characteristics (from sex to comorbidities to medications) would be closely balanced because the patients have similar propensity scores, in effect making each quintile a mock randomized trial.
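The quintile step can be sketched as follows. Patients are sorted by propensity score, and the ranked list is cut into five groups of nearly equal size. Tied scores fall in their original index order. The function name is hypothetical.

```python
def assign_quintiles(scores):
    """Return a stratum label 1..5 for each score, splitting the ranked
    scores into five groups of (nearly) equal size."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices from lowest to highest score
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels


print(assign_quintiles([0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.05]))
```

In practice, the analyst would then check that covariates are in fact balanced between treated and untreated patients within each quintile before treating it as a mock trial.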

This method of comparison has one obvious problem, said Dr. Blackstone. Since the propensity score was derived from the likelihood of receiving aspirin, there will obviously be more patients in the higher quintiles who received aspirin than in the lower quintiles. Balance in patient characteristics was obtained at the expense of balance in the number of patients receiving each treatment in each quintile. In the study by the Cleveland Clinic group, there were 113 patients who received aspirin in the first quintile vs. 1,092 who did not, compared with 1,045 patients in the fifth quintile who received aspirin vs. 261 who did not.
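The short calculation below shows how lopsided the treatment arms are, using the reported counts. Only the first and fifth quintiles are given in the text, so the sketch does not guess the others.

```python
def treated_share(aspirin, no_aspirin):
    """Fraction of patients in a stratum who received aspirin."""
    return aspirin / (aspirin + no_aspirin)


# Counts reported in the article for the lowest and highest propensity quintiles.
reported = {"first quintile": (113, 1092), "fifth quintile": (1045, 261)}

for name, (aspirin, no_aspirin) in reported.items():
    share = treated_share(aspirin, no_aspirin)
    ratio = aspirin / no_aspirin
    print(f"{name}: {aspirin + no_aspirin} patients, {share:.1%} on aspirin, "
          f"aspirin:no-aspirin = {ratio:.2f}:1")
```

About 9% of the first quintile received aspirin, compared with 80% of the fifth. The treated-to-untreated ratio is roughly 1:10 in one stratum and 4:1 in the other.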

According to Dr. Blackstone, this means that although an analysis can treat the patients in each quintile as if they were a randomized population according to their lack of statistically significant differences in pertinent clinical or demographic characteristics, the populations have to be considered unbalanced by size (as if a clinical trial were originally crafted for a randomization of 2:1 of treatment A to treatment B). It is within each of these quintiles that comparison of outcome (in this case, all-cause mortality) for the chosen variable (aspirin use) is made.
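The within-quintile comparison can be sketched as below. The study analyzed all-cause mortality with hazard (time-to-event) methods. For brevity, this sketch swaps in a simpler summary: the difference in death proportions (treated minus untreated) within each stratum. These differences are then pooled, weighting each stratum by its share of all patients, which is one common choice. The function name and toy data are hypothetical.

```python
from collections import defaultdict


def stratified_risk_difference(strata, treated, died):
    """Pool within-stratum mortality differences (treated minus untreated),
    weighting each stratum by its share of all patients."""
    # stratum -> arm (1 = aspirin, 0 = none) -> [deaths, patients]
    groups = defaultdict(lambda: {1: [0, 0], 0: [0, 0]})
    for s, t, d in zip(strata, treated, died):
        groups[s][t][0] += d
        groups[s][t][1] += 1
    n_total = len(strata)
    pooled = 0.0
    for arms in groups.values():
        (d1, n1), (d0, n0) = arms[1], arms[0]
        if n1 == 0 or n0 == 0:
            raise ValueError("each stratum needs both treated and untreated patients")
        pooled += (n1 + n0) / n_total * (d1 / n1 - d0 / n0)
    return pooled
```

Because each arm's mortality is computed as a proportion of that arm, unequal arm sizes within a quintile do not bias the comparison. They do, however, make the estimate less precise when one arm is small, as in the first quintile above.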

Using this stratified analysis in the original paper, Dr. Gum, Dr. Blackstone, and their colleagues determined that aspirin use was significantly associated with a lower risk of death (HR 0.56; P < .001).

And the third way propensity scores can be used is in a multivariable analysis of outcomes. "Such an analysis includes both the comparison variable of interest [here, aspirin use] and the propensity score," Dr. Blackstone explained. "The propensity score adjusts the apparent influence of the comparison variable of interest for patient selection differences not accounted for by other variables in the analysis."
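A minimal sketch of this third approach fits an outcome model containing both the treatment indicator and the propensity score as covariates. The study itself used hazard models. To stay short and dependency-free, this sketch swaps in a logistic model of a binary death indicator, so exp(b1) is an odds ratio rather than a hazard ratio. Function names and toy data are hypothetical.

```python
import math


def _solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_outcome_model(treated, pscore, died, iterations=25):
    """Logistic model: logit P(died) = b0 + b1*treated + b2*pscore, fitted by Newton-Raphson.

    b1 is the treatment effect adjusted for the propensity score.
    """
    rows = [[1.0, float(t), float(p)] for t, p in zip(treated, pscore)]
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for r, y in zip(rows, died):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            for a in range(3):
                grad[a] += (y - mu) * r[a]
                for c in range(3):
                    info[a][c] += mu * (1.0 - mu) * r[a] * r[c]
        beta = [b + s for b, s in zip(beta, _solve(info, grad))]
    return beta


# Hypothetical toy data: treatment flag, estimated propensity score, death (0/1).
t_demo = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_demo = [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
d_demo = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1]
b_demo = fit_outcome_model(t_demo, p_demo, d_demo)
print("propensity-adjusted odds ratio for treatment:", math.exp(b_demo[1]))
```

Additional clinical covariates, or interaction terms between treatment and patient characteristics, could be added as extra columns in `rows`.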

In this way, the Cleveland Clinic team was able to determine that the primary characteristics associated with the greatest aspirin-related reduction in mortality were older age, known coronary artery disease, and impaired exercise capacity. This is much more clinically relevant information than was obtained from the original multivariate analysis.

“How does this differ from what we have always done? Most of the time you find that those two types of analyses [multivariate risk analysis and propensity score analysis] give similar results,” said Dr. Blackstone in the interview. “But 15%-20% of the time they don’t give the same results. We’ve never in the past had a good way to figure out have we been fooled or not fooled [by the results], but now we can do both kinds of analyses just to see if there is consistency.”

Simulated clinical trials based on propensity analysis are not a replacement for randomized trials or traditional hazard analysis, but they are an important adjunct for mining registry and trial data to obtain clinically relevant information that might not otherwise be available, Dr. Blackstone concluded.

He said that he had no conflict of interest related to any of the studies or his comments in the interview. Dr. Gum, Dr. Blackstone, and their colleagues reported no financial conflict of interest in their study.

Copyright (c) 2010 Elsevier Global Medical News. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed.

Subjects:
general_primary, cardiology, endocrinology, diabetes, neurology, gastroenterology, pulmonology