Evaluation of ChatGPT as a Reliable Source of Medical Information on Prostate Cancer for Patients: Global Comparative Survey of Medical Oncologists and Urologists
Abstract
Introduction:
No consensus exists on performance standards for evaluating the ability of generative artificial intelligence (AI) to generate medical responses. The purpose of this study was to assess the ability of Chat Generative Pre-trained Transformer (ChatGPT) to address medical questions about prostate cancer.
Methods:
A global online survey was conducted from April to July 2023 among > 700 medical oncologists or urologists who treat patients with prostate cancer. Participants were unaware that the survey evaluated AI. In component 1, responses to 9 questions were written independently by medical writers (MWs, who curated content from medical websites) and by ChatGPT-4.0 (AI generated from publicly available information). Respondents, blinded to the source, were randomly exposed to both AI-generated and MW-curated responses; evaluation criteria ratings and overall preference were recorded. Exploratory component 2 evaluated AI-generated responses to 5 complex questions with nuanced answers in the medical literature. Responses were evaluated on a 5-point Likert scale. Statistical significance was denoted by P < .05.
Results:
In component 1, respondents (N = 602) preferred the clarity of AI-generated responses over MW-curated responses, a difference that reached statistical significance for 7 of 9 questions (P < .05). Despite favoring AI-generated responses when blinded to the response source, respondents considered medical websites a more credible source (52%-67%) than ChatGPT (14%). Respondents in component 2 (N = 98) also considered medical websites more credible than ChatGPT, but rated AI-generated responses highly for all evaluation criteria, despite nuanced answers in the medical literature.
Conclusions:
These findings provide insight into how clinicians rate AI-generated and MW-curated responses with evaluation criteria that can be used in future AI validation studies.
UPJ Insight
Study Need and Importance
The amount of published scientific data is an increasing burden for those seeking up-to-date information to guide clinical decision making. Generative artificial intelligence (GenAI), such as Chat Generative Pre-trained Transformer (ChatGPT), can summarize medical information, but performance standards reflecting domains relevant to subject matter experts are still evolving. This study aimed to assess ChatGPT-generated responses to questions about prostate cancer and establish best practices for response evaluation.
What We Found
Oncologists and urologists were invited to participate in a 2-part, blinded, online survey between April and July 2023. In component 1, participants (n = 602) compared ChatGPT-generated and medical writer (MW)-curated responses to novice-level, intermediate-level, and expert-level questions in a discrete-choice–style experiment. Participants rated clarity, accuracy, relevancy, completeness, and overall preference on a 5-point Likert scale. Participants had a similar preference or preferred ChatGPT-generated responses for 8 of 9 questions (Figure) and universally preferred the clarity of ChatGPT-generated responses. In component 2, ChatGPT-generated responses to complex questions without established answers were rated “very”/“extremely” across all domains by > 70% of participants (n = 98). In a separate survey section, < 20% of participants rated ChatGPT as “very”/“extremely” credible, compared with 52% to 70% for Cancer.net, Cancer.org, and Medscape, reflecting an overall lack of trust in GenAI.
Limitations
The survey was voluntary, and some participants dropped out or failed quality assurance. GenAI content may change with model updates; ChatGPT 4.0 was used for the analysis.
Interpretation for Patient Care
These results demonstrate that while GenAI can meet or exceed the quality of MW-curated content, GenAI is still not necessarily trusted by clinicians. To fully harness the potential of GenAI to inform patient care, bias against GenAI will need to be addressed and performance standards will need to be implemented.
In oncology, published scientific data have increased exponentially as more research is conducted and more therapeutics are developed. An unmet need for health care professionals (HCPs) and patients is a tool to rapidly distill important aspects of this burgeoning volume of contemporary medical literature to inform decision making and improve quality of treatment.
One mechanism to address information overload in oncology is generative artificial intelligence (GenAI). GenAI may potentially improve the efficiency and accuracy of information gathering and provide clear, reliable, and precisely summarized information.1,2 There are a variety of GenAI platforms, including Chat Generative Pre-trained Transformer (ChatGPT; https://openai.com/blog/chatgpt), Gemini (https://gemini.google.com/), and Med-PaLM 2 (https://sites.research.google/med-palm/), each with unique properties, which are trained to generate natural language conversations in various domains and scenarios.2,3 ChatGPT was built on transformer architecture using self-attention mechanisms to learn from a large corpus of text from various publicly available sources and fine-tuned using Reinforcement Learning from Human Feedback.3,4
Since its launch in November 2022, ChatGPT has been widely used5,6; however, there is limited evidence on its application to provide correct and relevant scientific information on prostate cancer (PCa). The objective of this study was to use a blinded global online survey of medical oncologists and urologists to evaluate the reliability of ChatGPT as a source of medical information on PCa for patients.
Methods
Study Participants and Setting
A global web-based survey was conducted through SERMO (Cambridge, Massachusetts; www.sermo.com), with physicians from the United States, Canada, and larger European countries (France, Germany, Italy, Spain, and the United Kingdom) recruited to participate in a 15-minute (component 1) or 10-minute (component 2) online survey. Screening questions identified physicians specializing in medical oncology or urology and gathered additional information about the participants’ clinical practice. Participants were required to be proficient in English and treat ≥ 10 patients (component 1) or ≥ 25 patients (component 2) with PCa per week. For both components, an overall target of ≥ 35% physicians who participated in clinical trial research during the last 3 years was applied.
Survey Methodology
Questions for both survey components were chosen by the authors (J.G., L.S.). The same queries were used for both AI responses and medical writer (MW) searches to provide a direct evaluation of how each source meets the needs of HCPs under similar conditions. Physician authors ensured that questions were coded into 1 of 3 appropriate categories: (1) novice—basic level of scientific and medical knowledge needed to answer; (2) intermediate—mid-level scientific and medical knowledge; and (3) expert—high-level scientific and medical knowledge. Component 1 explored PCa questions for which responses were composed from publicly available medical literature by both MWs and ChatGPT-4.0 (AI-generated) to present 2 independent responses for each question (Figure 1, A; Supplementary Material, https://www.urologypracticejournal.com). MW-curated responses were sourced with minimal content modifications from well-established medical websites, including Cancer.net (American Society of Clinical Oncology), Cancer.org (American Cancer Society), and Medscape (WebMD Health Professional Network), and were evaluated by physician authors for accuracy, whereas AI-generated text was presented to respondents in its raw form (see Supplementary Material, https://www.urologypracticejournal.com). The target audiences for Cancer.net and Cancer.org are patients and caregivers, whereas clinicians and medical scientists are the target audiences for Medscape. All responses were assessed between March 9 and 17, 2023. Readability for all responses was evaluated using Clear-AI (Pfizer Inc.), a health-literacy AI tool designed to assess plain language qualities of written text.
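Clear-AI is a proprietary tool, so, purely as an illustration of how plain-language readability of a response can be scored (not the method used in this study), a conventional Flesch readability calculation is available in the open-source textstat package; the sample text below is a placeholder, not a study response.

```python
# Generic readability scoring with the open-source textstat package
# (illustration only; this study used Pfizer's Clear-AI tool).
import textstat

sample_text = (
    "Prostate cancer screening usually starts with a prostate-specific "
    "antigen (PSA) blood test and may include a digital rectal examination."
)  # placeholder text, not a response evaluated in the study

print(textstat.flesch_reading_ease(sample_text))   # roughly 30-50 = college-level difficulty
print(textstat.flesch_kincaid_grade(sample_text))  # approximate US grade level
```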
Participants were randomly exposed to, and blinded to the source of, a response from 1 of 2 categories (AI-generated vs MW-curated) for each of the 9 questions (Figure 1, A; Supplementary Material, https://www.urologypracticejournal.com). Participants evaluated responses on a 5-point semantic differential scale (“not at all” to “extremely”) using the following criteria: clarity, accuracy, relevancy, and completeness. After each question-response stimulus, participants were exposed to the other response and asked to compare the initial response with the alternative in a discrete-choice–style experiment. The prompt for the discrete choice was “Which of these responses is of higher quality overall?” The last survey question evaluated source credibility using a 5-point semantic differential scale (“not at all” to “extremely”).7
In component 2, participants were exposed to only AI-generated responses for 5 questions (Figure 1, B; Supplementary Material, https://www.urologypracticejournal.com). Participants evaluated responses using the same criteria as component 1.
Analysis
For component 1, data analysis compared AI-generated vs MW-curated responses for the percentage of respondents who selected “very”/“extremely” on the semantic differential scale for each evaluation criterion and for overall preference. The statistical tests for the 4 semantic measures were 2-sample binomial tests of significance, whereas overall preference ratings were analyzed with a 1-sample binomial test of significance. Both tests have a null hypothesis that the 2 values are equal (eg, no difference between measures for AI-generated vs MW-curated responses). For component 2, evaluation criteria ratings were analyzed descriptively.
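As a minimal sketch of the analyses described above, using hypothetical counts rather than study data, the 2-sample comparison of “very”/“extremely” proportions and the 1-sample test of overall preference could be run as follows; the 2-sample comparison is shown as a two-proportion z-test, one common implementation of a 2-sample binomial comparison.

```python
# Minimal sketch of the binomial analyses described above; all counts are
# hypothetical placeholders, not data from this study.
from scipy.stats import binomtest
from statsmodels.stats.proportion import proportions_ztest

# 2-sample comparison: proportion of respondents rating a response
# "very"/"extremely" for the AI-generated vs the MW-curated version
# of the same question (two-proportion z-test).
ai_top2, ai_n = 480, 602   # hypothetical counts
mw_top2, mw_n = 410, 602   # hypothetical counts
_, p_two_sample = proportions_ztest([ai_top2, mw_top2], [ai_n, mw_n])
print(f"2-sample comparison of proportions: P = {p_two_sample:.3f}")

# 1-sample test: overall preference for the AI-generated response,
# tested against the null hypothesis of no preference (50%).
prefer_ai, n_total = 340, 602   # hypothetical counts
result = binomtest(prefer_ai, n_total, p=0.5)
print(f"1-sample binomial test: P = {result.pvalue:.3f}")
```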
For component 1, a power analysis was conducted to estimate the sample size needed to detect differences in outcome measures. In this analysis, the effect size represented the absolute percentage difference in respondent ratings on the semantic differential scale and absolute percentage difference in preference for AI-generated vs MW-curated responses. This analysis indicated that for an effect size of 10%, a total sample of 776 respondents was needed. However, a feasibility study by SERMO indicated that a panel size of 600 was attainable. Therefore, the survey was underpowered to detect all differences, but in many cases, the effect size was large enough that statistically significant differences were identified.
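For context, a sample-size estimate of this kind can be sketched with standard two-proportion power calculations. The baseline proportion and power below are illustrative assumptions, not values reported by the study, so the result will not necessarily reproduce the 776-respondent figure, which depended on the study's own assumptions.

```python
# Illustrative two-proportion power calculation: respondents per arm needed
# to detect a 10-percentage-point absolute difference. The baseline
# proportion (50%) and power (80%) are assumptions for this sketch.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect_size = proportion_effectsize(0.50, 0.60)  # Cohen's h for 50% vs 60%
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"Approximate respondents needed per arm: {n_per_arm:.0f}")
```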
Additional methods on data quality are included in the Supplementary Material (https://www.urologypracticejournal.com).
Results
Respondent Demographics
The survey for component 1 was conducted between April 28 and July 7, 2023. Overall, 1850 HCPs were invited to participate; 982 (53%) qualified through the screener, with 602 (33%) passing quality assurance (Supplemental Figure 1A, https://www.urologypracticejournal.com). Responses from a total of 602 medical oncologists (60%) or urologists (40%) are described. Respondents were equally split between North America and Europe (Table). Most of the respondents practiced in an academic/university setting; the mean years in practice and mean number of patients with PCa seen per week were 17 and 32, respectively. Over 50% of respondents had clinical trial experience in the past 3 years.
| Characteristic | Component 1 (n = 602) | Component 2 (n = 98) |
| Specialty, % | | |
| Medical oncologist | 60 | 52 |
| Urologist | 40 | 48 |
| Yr in practice, mean ± SD | 17 ± 8.0 | 17 ± 7.5 |
| Patients with prostate cancer seen per wk, No., mean ± SD | 32 ± 34 | 56 ± 47 |
| Clinical trial experience in the previous 3 y, % | 52 | 35 |
| Geographic location, %a | | |
| United States | 43 | 54 |
| Italy | 13 | 7 |
| United Kingdom | 13 | 8 |
| Spain | 10 | 7 |
| Germany | 9 | 7 |
| Canada | 8 | 11 |
| France | 4 | — |
| Clinical practice setting, % | | |
| Academic/university | 43 | 49 |
| Community hospital | 28 | 26 |
| Private practice | 20 | 19 |
| Outpatient cancer care center | 5 | 2 |
| Cancer clinic | 4 | 4 |
The survey for component 2 was conducted separately between May 22 and July 7, 2023. Of 713 HCPs who were invited to participate, 171 (24%) qualified through the screener and 98 (14%) passed data quality assurance (Supplemental Figure 1B, https://www.urologypracticejournal.com). Responses from 98 medical oncologists (52%) or urologists (48%) are described. Similar to component 1, most of the respondents for component 2 were from the United States and practiced in an academic/university setting (Table). Although the mean years in practice (17 years) was similar to component 1, as expected the mean number of patients seen per week was higher in component 2 (n = 56). Thirty-five percent of respondents had clinical trial experience in the past 3 years.
Component 1: Comparison of AI-Generated and MW-Curated Responses
Respondents identified Medscape (67%) as a “very”/“extremely” credible source of information, followed by Cancer.org (63%) and Cancer.net (52%; Figure 2, A). By contrast, only 14% considered ChatGPT or other large language models (LLMs) to be “very”/“extremely” credible sources (Figure 2, A).
Component 1 evaluated questions with varying levels of complexity. The MW-curated responses and AI-generated responses for component 1 scored similarly for readability (college level for difficulty), with Flesch-Kincaid scores of 49.5 and 33.1, respectively.8 For questions 2 and 3 on screening and diagnosis, clarity was significantly better (question 2: 87% vs 80%; P = .02; question 3: 81% vs 67%; P < .01) for AI-generated responses vs MW-curated responses. For question 2, more respondents had an overall preference for the AI-generated response compared with the MW-curated response (P = .01; Figure 3, A).
For intermediate question 4 on risk groups, all evaluation criteria favored the MW-curated response (completeness and accuracy, P < .01; relevance, P = .04) except for clarity, which favored the AI-generated response (83% vs 68%; P < .01; Figure 3, B). Significantly more respondents had an overall preference for the MW-curated response compared with the AI-generated response (60% vs 40%; P < .01; Figure 3, B). For question 5 on initial treatment regimens, completeness (85% vs 77%; P = .01) favored the MW-curated response, whereas clarity favored the AI-generated response (83% vs 67%; P < .01). Clarity also favored the AI-generated response compared with the MW-curated response (83% vs 72%; P < .01) for question 6 on observation/active surveillance.
For expert question 8 on the grade grouping system, compared with MW-curated responses, all criteria significantly favored AI-generated responses: completeness (83% vs 75%; P = .01), relevance (89% vs 82%; P = .01), accuracy (87% vs 78%; P < .01), and clarity (83% vs 69%; P < .01). Similarly, for question 9 on the role of genomic assays, completeness (81% vs 63%; P < .01), relevance (80% vs 64%; P < .01), and clarity (75% vs 48%; P < .01) significantly favored AI-generated responses. For all 3 expert questions, significantly more respondents had an overall preference for the AI-generated responses vs the MW-curated responses (questions 7 and 9, both P < .01; question 8, P = .04; Figure 3, C).
Exploratory Component 2: AI-Generated Responses
Similar to component 1, respondents from component 2 identified Medscape (70%) as a “very”/“extremely” credible source of information, followed by Cancer.org (69%) and Cancer.net (56%; Figure 2, B). Only 19% of respondents considered ChatGPT or other LLMs a “very”/“extremely” credible information source (Figure 2, B).
Component 2 was an exploratory part of the study where more experienced physicians (≥25 patients/week) evaluated AI-generated responses to 5 complex questions without established answers in the medical literature. Across all questions, the percentage of respondents rating the AI-generated responses as “very”/“extremely” ranged from 71% to 83% for completeness, 82% to 89% for relevance, 74% to 80% for accuracy, and 78% to 84% for clarity, with the exception of 70% for question 5 (Figure 4).
Figure 4. Component 2—AI-generated responses to complex questions with nuanced answers in the medical literature. Stacked bar graphs provide response evaluations for 5 question/response pairs, with 95% CIs for responses provided in the tables. Likert criteria were combined to facilitate visual interpretation. The GenAI platform used to provide AI-generated responses was ChatGPT 4.0. AI indicates artificial intelligence; ChatGPT, Chat Generative Pre-trained Transformer; GenAI, generative artificial intelligence.
Discussion
This study provides evidence that LLMs can meet or exceed the quality of MW-generated content in certain contexts. The ability of GenAI systems to generate comparable and acceptable content with minimal human intervention is a promising aspect for health care systems, particularly in terms of scalability and cost. This observation aligns with the growing interest in how AI can streamline information management processes in clinical settings and reduce the burden on HCPs.
In this study, respondents identified medical websites as being more credible sources of information compared with ChatGPT 4.0. Despite this, in component 1, AI-generated responses were preferred for specific questions over MW-curated responses regardless of question complexity. For overall preference, respondents ranked AI-generated responses similarly (4/9 questions) or significantly better (another 4/9 questions) compared with MW-curated responses. For the expert questions, there was a significant overall preference for AI-generated responses. AI-generated responses were also rated highly in component 2, where ratings of “very”/“extremely” were given by ≥ 70% of HCPs across all domains. If unblinded, we would anticipate HCPs would trust AI-generated responses less owing to bias, although this was not evaluated in our study. For GenAI solutions to be used appropriately and at scale, bias around GenAI will need to be addressed by the health care community.
In component 1, respondents universally preferred the clarity of AI-generated responses vs MW-curated responses, which reached statistical significance for 7/9 questions; other evaluation criteria were more variable. Overall, ChatGPT 4.0 did well in the clarity domain, likely because it is a language model and not a knowledge model.3 For expert responses, this is particularly interesting because as science becomes increasingly complex, the ability of GenAI to distill these complexities for patients will become increasingly important.
Although assessment of AI in the health care setting is still in its infancy, a few studies have evaluated AI-generated responses for accuracy, with mixed results on the performance of LLMs.9-12 Another study evaluated the ability of 5 LLMs, including ChatGPT, to answer 22 questions at varying knowledge levels on patient education guidelines for PCa.13 Using the criteria of accuracy, comprehensiveness, patient readability, humanistic care, and stability, 3 urologists determined that the accuracy of the AI-generated responses was > 90%. Of note, high accuracy was observed with more basic questions, but no comparisons with summaries from human subject matter experts were provided.
There is no consensus on performance standards for testing GenAI within the medical community. LLM performance is linked to the data it can ingest, and currently, many research articles are not accessible to LLMs. Despite this limitation, most of our AI-generated responses were rated highly for accuracy, but it is unclear whether 75% to 80% accuracy is sufficient for medical questions. An evaluation of multiple cancers demonstrated that one third of treatment recommendations by GenAI were at least partially discordant with treatment guidelines.14 Evaluating GenAI technologies in the context of real-world utilization and even randomized clinical trials may be needed but will be costly and time-consuming. Therefore, guidelines and standard domains of evaluation must be established before GenAI outputs can be interpreted with confidence. Currently, authorities in the United States and Europe are addressing regulation of GenAI.15 It is essential to verify AI-generated responses for accuracy, include the version number and update/training date alongside outputs, and quantitatively assess model performance over time. Notably, later versions of ChatGPT did not necessarily improve compared with earlier versions.16 Intermediate question 4 on risk groups for localized PCa provides an example from our study where the ChatGPT 4.0–generated response was factually incorrect and ChatGPT 3.5 generated a better result (Supplementary Material, https://www.urologypracticejournal.com).
To our knowledge, this is the largest study using multiple evaluation criteria to assess responses to PCa questions at varying knowledge levels from 1 LLM, that is, ChatGPT 4.0, which provides important information for future validation studies. One limitation is that a large percentage of surveys failed quality assurance. However, most disqualifications resulted from incomplete surveys or failure to meet predefined demographic quotas (see Supplementary Materials, Supplemental Figure 1, https://www.urologypracticejournal.com). The high dropout rate can largely be attributed to the voluntary nature of the survey, which may lead to incomplete participation if respondents are unable to complete the survey or realize that they do not meet the inclusion criteria. Another limitation is that AI-generated responses are continually being refined and updated as the models adapt; therefore, responses to the same question may change. In addition, MWs curated the responses from 3 trusted medical websites rather than PCa experts; however, MWs are often used to develop publications across the medical industry. The use of GenAI tools does not replace clinical visits or dialog between patients and physicians. Future studies are planned to examine patient-derived or patient-advocate–derived questions from the clinic and compare responses from medical experts to GenAI approaches.
Conclusions
This large, global, online survey of medical oncologists and urologists evaluated the performance of ChatGPT, a GenAI tool, vs MW-curated responses to answer questions about PCa. Despite AI responses being favored in the component 1 survey, the respondents considered website-based sources of information to be more credible than GenAI for both components. The clarity of AI-generated responses was consistently preferred over MW-curated responses. The findings of this survey provide direct insight into how clinicians view GenAI-generated responses, using evaluation criteria that can be included in future GenAI studies.
Acknowledgments
The authors would like to acknowledge Ella Kasanga, who curated the responses from medical websites to questions in component 1, and the physicians who participated in this study. Medical writing and editorial support were provided by Julie B. Stimmel, PhD, CMPP; Valerie Moss, PhD, CMPP; Allison Alwan TerBush, PhD; and Rosie Henderson, MSc, all of Onyx (a division of Prime, London, UK), funded by Pfizer Inc. The authors were involved in collection and interpretation of information provided in the manuscript, and ultimate responsibility for opinions and conclusions lies with the authors.
References
- 1. Artificial intelligence and machine learning in clinical medicine, 2023. N Engl J Med. 2023;388(13):1201-1208. doi: 10.1056/NEJMra2302038
- 2. Generative AI in medicine and healthcare: promises, opportunities and challenges. Future Internet. 2023;15(9):286. doi: 10.3390/fi15090286
- 3. Human-like problem-solving abilities in large language models using ChatGPT. Front Artif Intell. 2023;6:1199350. doi: 10.3389/frai.2023.1199350
- 4. Using artificial intelligence (Watson for Oncology) for treatment recommendations amongst Chinese patients with lung cancer: feasibility study. J Med Internet Res. 2018;20(9):e11087. doi: 10.2196/11087
- 5. Benefits, limits, and risks of GPT-4 as an AI Chatbot for medicine. Reply. N Engl J Med. 2023;388(25):2400. doi: 10.1056/NEJMc2305286
- 6. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell. 2023;6:1169595. doi: 10.3389/frai.2023.1169595
- 7. Resolving the 50-year debate around using and misusing Likert scales. Med Educ. 2008;42(12):1150-1152. doi: 10.1111/j.1365-2923.2008.03172.x
- 8. How to write plain English. Accessed November 30, 2023.
- 9. Evaluating artificial intelligence responses to public health questions. JAMA Netw Open. 2023;6(6):e2317517. doi: 10.1001/jamanetworkopen.2023.17517
- 10. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023;183(6):589-596. doi: 10.1001/jamainternmed.2023.1838
- 11. Can ChatGPT, an artificial intelligence language model, provide accurate and high-quality patient information on prostate cancer? Urology. 2023;180:35-58. doi: 10.1016/j.urology.2023.05.040
- 12. Evaluating the performance of ChatGPT in answering questions related to benign prostate hyperplasia and prostate cancer. Minerva Urol Nephrol. 2023;75(6):729-733. doi: 10.23736/S2724-6051.23.05450-2
- 13. Can the ChatGPT and other large language models with internet-connected database solve the questions and concerns of patient with prostate cancer and help democratize medical knowledge? J Transl Med. 2023;21(1):269. doi: 10.1186/s12967-023-04123-5
- 14. Use of artificial intelligence chatbots for cancer treatment information. JAMA Oncol. 2023;9(10):1459-1462. doi: 10.1001/jamaoncol.2023.2954
- 15. The challenges for regulating medical use of ChatGPT and other large language models. JAMA. 2023;330(4):315-316. doi: 10.1001/jama.2023.9651
- 16. How is ChatGPT's behavior changing over time? ArXiv. 2023. doi: 10.48550/arXiv.2307.09009
Funding/Support: The study was supported by Pfizer Inc.
Conflict of Interest Disclosures: Dr Sboner is a consultant to Astellas Pharma, AstraZeneca, Ipsen Pharma, Karl Storz AG, Johnson and Johnson Innovative Medicine (formerly Janssen), Pfizer, and Roche; receives royalties from patents: patent A290/99 (implantable incontinence device), patent AT00/00001 (C-Trap, implantable device to treat urinary incontinence), patent 2019/8223 (risk prediction of renal cell carcinoma using proportional subtype assignments), patent PCT/EP2020/056398 (method for determining renal cell carcinoma subtypes II); membership on Board of Directors or advisory committees for the European Association of Urology and European Cancer Organisation. Dr Armstrong has received fees from Astellas Pharma, Pfizer (formerly Medivation), Bayer, Forma, Novartis, Dendreon, Johnson and Johnson Innovative Medicine (formerly Janssen), Merck, AstraZeneca, Bristol Myers Squibb, Exelixis, Epic Sciences, and Sumitomo Pharma America, Inc (formerly Myovant Sciences Inc); institutional research funding from Astellas Pharma, Pfizer (formerly Medivation), Bayer, Forma, Novartis, Dendreon, Johnson and Johnson Innovative Medicine (formerly Janssen), Merck, AstraZeneca, Roche/Genentech, Bristol Myers Squibb, and Amgen; and travel expenses from Astellas Pharma. Drs Habr, Ghith, Schuler, Serfass, Garas, and Chari are employees and shareholders of Pfizer Inc. Dr Walz is a consultant to Astellas, Johnson and Johnson Innovative Medicine (formerly Janssen), Anna/C-TRUS, Blue Earth Diagnostics, 3D-Biopsy, Telix, Lightpoint Medical, and A3P. Dr Gleave reports stock or ownership interest in OncoGenex Technologies Inc, Sustained Therapeutics Inc, and Sikta Biopharma; is a consultant to Astellas Pharma Inc, AstraZeneca, Bayer, Genova Diagnostics (GDx), Johnson and Johnson Innovative Medicine (formerly Janssen), Pfizer Inc, Roche, Sanofi, and TerSera Therapeutics LLC; and holds patents for OGX-011, OGX-427, ST-CP, and ST-POP. Dr Truman is an employee and shareholder of MedThink, Inc, which was a paid consultant in connection with the development of this manuscript. Dr Sternberg has served as a consultant for Janssen-Cilag, Johnson and Johnson Innovative Medicine (formerly Janssen Biotech), Astellas Pharma, Sanofi-Genzyme, Novartis, Bayer, Pfizer, Merck, Merck Sharp and Dohme, AstraZeneca, Gilead, Bristol Myers Squibb, Johnson and Johnson Innovative Medicine (formerly Janssen), Foundation Medicine, UroToday, Medscape, and GLG; and has received prior institutional funding from Johnson and Johnson Innovative Medicine (formerly Janssen and Cougar Biotechnology), Pfizer (formerly Medivation), Clovis Oncology, and Roche-Genentech. All other authors have nothing to disclose.
Ethics Statement: This study was deemed exempt from Institutional Review Board review.
Author Contributions:
Conception and design: Armstrong, Stenzl, Sternberg, Habr, Chari, Ghith, Truman, Schuler, Gleave, Garas.
Data analysis and interpretation: Sboner, Armstrong, Habr, Chari, Rogers, Ghith, Walz, Truman, Schuler, Serfass, Garas.
Drafting the manuscript: Armstrong, Stenzl, Ghith, Walz, Schuler, Garas.
Critical revision of the manuscript for scientific and factual content: Sboner, Armstrong, Stenzl, Sternberg, Habr, Chari, Rogers, Ghith, Walz, Truman, Schuler, Serfass, Gleave, Garas.
Statistical analysis: Sternberg, Habr, Truman, Schuler, Garas.
Supervision: Sboner, Armstrong, Stenzl, Habr, Chari, Rogers, Ghith, Walz, Truman, Schuler, Serfass, Gleave.
Data Availability: Upon request and subject to review, MedThink info@medthink.