Journal of Urology, Review Articles, 1 Jul 2022

Diagnosing with a TWIST: Systematic Review and Meta-Analysis of a Testicular Torsion Risk Score




Purpose:

The Testicular Workup for Ischemia and Suspected Torsion (TWIST) score is a 7-point tool to evaluate acute scrotal pain. Parameters include testicular swelling (2 points), hard testis (2), high-riding testis (1), absent cremasteric reflex (1) and nausea/vomiting (1). This review aimed to determine the diagnostic utility of TWIST and its role in risk stratification.

Materials and Methods:

A systematic review and meta-analysis of diagnostic test accuracy was conducted. Five risk stratification systems were explored, including the Barbosa (0–2, 3–4, 5–7) and Sheth (0, 1–5, 6–7) scoring systems, to obtain sensitivity, specificity and area under the receiver operating curve.


Results:

Thirteen studies were identified; 9 prospective studies proceeded to meta-analysis of diagnostic test accuracy, and 5 pediatric studies (1,060 patients, 199 torsions) were included in the primary analysis. The most accurate risk stratification system was Barbosa (0–2, 3–4, 5–7), with an AUC of 0.924 (95% CI: 0.865, 0.956). Barbosa showed favorable sensitivity in low-risk patients (0.984), facilitating rule out of torsion, and favorable specificity (0.975) in high-risk patients, facilitating urgent surgical exploration. Sensitivity and specificity in intermediate-risk patients were 0.922 and 0.682, respectively, indicating a need for further workup with ultrasound. Using this stratification, 65.2% of patients were low-risk, 19.9% were intermediate-risk and 14.9% were high-risk. Per 100 presentations of acute scrotum, there was a missed torsion rate of 1.6, an ultrasound rate of 19.9 and a negative exploration rate of 2.5.


Conclusions:

TWIST is an effective tool for suspected testicular torsion and is appropriate for widespread adoption. The Barbosa scoring system is reliable and reduces reliance on scrotal ultrasound.

Abbreviations and Acronyms


AUC = area under the receiver operating curve

MADTA = meta-analysis of diagnostic test accuracy

TT = testicular torsion

TWIST = Testicular Workup for Ischemia and Suspected Torsion

Testicular torsion (TT) occurs when the testis twists along the spermatic cord, compromising blood supply.1 It is a urological emergency that requires prompt intervention. The duration of torsion is a key factor in testicular salvage rate as surgical detorsion is most successful within 6–8 hours of symptom onset.2 Time in the emergency department is especially important, as patients often present several hours after the event.3

Classically, TT presents with unilateral, sudden-onset, acute scrotal pain that can be associated with scrotal swelling, nausea and vomiting.4 Diagnosis is often challenging as the clinical features can overlap with other causes of acute scrotum, including scrotal trauma, epididymo-orchitis, torsion of the testicular appendage, strangulated hernia and torsion-detorsion syndrome.4,5

The indications for scrotal imaging vary between institutions. Scrotal Doppler ultrasound is relatively accurate for diagnosing TT. Previous meta-analyses have shown a sensitivity and specificity of 0.86 and 0.95, respectively (ref. 6), increasing to 0.92 and 0.99 (ref. 7) with a positive “whirlpool sign” (spiral-like appearance of the twisted spermatic cord). However, awaiting and obtaining imaging introduces a delay, which may prolong ischemic time and reduce testicular viability.2 Furthermore, the accuracy of ultrasound is operator dependent and results can vary depending on sonographer proficiency.6 As a result, clinical scoring tools may assist in decision making and promote judicious use of imaging, while also not delaying surgery in those with a high likelihood of torsion.

Barbosa et al introduced the “Testicular Workup for Ischemia and Suspected Torsion” (TWIST) score for acute scrotal presentations in 2013.8 This was a 7-point tool comprising testicular swelling (2 points), hard testis (2), high-riding testis (1), absent cremasteric reflex (1) and nausea/vomiting (1). Patients were stratified into 3 groups: low (0–2 points), intermediate (3–4) and high risk (5–7), with recommendations for rule out, ultrasound and surgical exploration, respectively. This was subsequently disputed by Sheth et al,9 who suggested that stratifying by 0, 1–5 and 6–7 points was more appropriate.
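The scoring and stratification rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not clinical software: the finding names, the `twist_score` and `risk_group` functions and the `STRATIFICATIONS` table are hypothetical names chosen here, while the point values and cut points are those given in the text.

```python
# Point values for each finding, as listed in the text (maximum total of 7).
TWIST_POINTS = {
    "testicular_swelling": 2,
    "hard_testis": 2,
    "high_riding_testis": 1,
    "absent_cremasteric_reflex": 1,
    "nausea_vomiting": 1,
}

# (highest low-risk score, highest intermediate-risk score) per system.
# Scores above the second value are treated as high risk.
STRATIFICATIONS = {
    "barbosa": (2, 4),  # 0-2, 3-4, 5-7
    "sheth": (0, 5),    # 0, 1-5, 6-7
}


def twist_score(findings):
    """Sum the points for every finding flagged True in `findings`."""
    unknown = set(findings) - set(TWIST_POINTS)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    return sum(pts for name, pts in TWIST_POINTS.items() if findings.get(name, False))


def risk_group(score, system="barbosa"):
    """Map a TWIST score (0-7) to 'low', 'intermediate' or 'high'."""
    if not 0 <= score <= 7:
        raise ValueError("TWIST score must be between 0 and 7")
    low_max, mid_max = STRATIFICATIONS[system]
    if score <= low_max:
        return "low"
    if score <= mid_max:
        return "intermediate"
    return "high"
```

For example, a hard testis with nausea/vomiting scores 3 points, which is intermediate risk under both systems, whereas a score of 1 is low risk under Barbosa but intermediate risk under Sheth.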

Thus far, TWIST has only been studied in single cohorts with heterogeneous risk stratification. Pooling raw scores from individual studies would allow us to test the diagnostic accuracy of each risk score. We conducted a systematic review and meta-analysis to test the diagnostic utility of TWIST and identify the ideal system for risk stratification.


Materials and Methods

This systematic review and meta-analysis followed PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses)-DTA (extension for Diagnostic Test Accuracy) guidelines and was prospectively registered with PROSPERO (CRD42020203134).10

Search Strategy and Eligibility Criteria

A literature search was performed in August 2020 and updated in August 2021. Databases including PubMed®, Embase®, Scopus®, the Cochrane Library and Web of Science™ were searched. Gray literature was explored using Google Scholar and ResearchGate. The following keywords were used: “torsion,” “testis,” “testicular torsion,” “acute scrotum,” “TWIST,” “score” and “testicular workup for ischemia and suspected torsion.” The search was limited to human studies with no limitations on publication language. Conference abstracts were included.

Two reviewers (KQ and LQ) performed independent literature searches and screening of studies. Comparative studies, prospective and retrospective cohort studies were included. Review articles and case reports were excluded. References within and citations of all included articles were screened for completeness. Full texts were evaluated against predetermined criteria: 1) original research of 2) adult and pediatric males presenting with acute scrotum who were 3) evaluated using TWIST and 4) received a diagnosis of positive or negative TT. Disagreements over study eligibility were resolved by discussion between the reviewers.

Data Extraction

Data including study design, population demographics, incidence of TT and distribution of TWIST scores were extracted. The proportion of positive and negative TT was then tabulated for each score level. Corresponding authors were contacted to retrieve elements missing from their manuscripts.

Quality Assessment

Two reviewers (KQ and LQ) independently evaluated study quality using the QUADAS-2 tool for diagnostic studies (University of Bristol, Bristol, UK).11 Disagreements between reviewers were resolved by discussion and consensus. The quality of studies was rated as good, fair or poor. Studies rated poor were excluded from subsequent analyses.

Outcome Measures

The primary outcomes of interest were 1) the pooled sensitivity and specificity of individual TWIST scores, and 2) the risk stratification system which achieved the highest overall accuracy. To reduce expected heterogeneity, the primary analysis was performed only for studies of prospective design with similar patient population and TWIST assessor roles.

Statistical Analysis

Meta-analysis of diagnostic test accuracy (MADTA) was performed for prospective studies based on a multilevel random effects model described by Steinhauser et al.12 This was conducted on raw scoring data for each discrete TWIST score and for each risk stratification system. Choice of linear mixed models for intercepts and slopes was based on joint criteria of identifying the smallest restricted maximum likelihoods and the simplest model available, minimizing total parameters for estimation. The model utilized inverse variance weighting and the logistic distribution assumption was also applied. Statistical analysis was performed using the “diagmeta” package13 in R v4.0.5 (R Foundation, Vienna, Austria) and GraphPad Prism v9.1 (GraphPad Software, San Diego, California).

After results from prospective studies were modeled, pooled sensitivity and specificity were estimated from model parameters for each TWIST score level. A similar process for meta-analysis was utilized for determining the optimal system for risk stratification for the included prospective studies. Comparative meta-analysis models were generated for each of the scoring systems proposed by Barbosa et al (0–2, 3–4, 5–7)8 and Sheth et al (0, 1–5, 6–7),9 as well as 3 alternatives (0–1, 2–5, 6–7; 0–1, 2–4, 5–7; 0, 1–4, 5–7). The pooled sensitivity and specificity for each individual risk group and area under the receiver operating curve (AUC) for each scoring system were reported.

Sensitivity Analyses

Sensitivity analyses were performed by including retrospective studies in the MADTA. Additional analyses examined the inclusion of studies that differed by TWIST examiner (physician or emergency technician) and patient population (boys or all males).


Results

In total, 160 abstracts were identified and 104 were screened (Fig. 1). Of these, 87 were excluded as they did not study TWIST. Full texts were reviewed for 17 articles. Four were removed as they did not report diagnostic outcomes,14 evaluated the same cohort as another included study,15 conducted a narrative review16 or used TWIST in an unsuitable cohort of middle-aged males.17 The remaining 13 articles met the inclusion criteria and were included in the final analysis.8,9,18–28


Figure 1. PRISMA flow diagram.

Characteristics of included studies are summarized in Table 1. There were 7 prospective studies,9,18,20,21,23,25,26 5 retrospective studies,19,22,24,27,28 and 1 study with prospective and retrospective phases (treated as separate studies in meta-analysis).8 Five corresponding authors were contacted for raw data;19,21,22,27,28 this was successful in 4 instances.19,21,27,28 Studies were published between 2013 and 2020, and documented 2,400 patients in 9 countries. There were 527 cases of TT with an unadjusted torsion rate of 22%. Ten studies were restricted to boys,8,9,19–23,26–28 1 study focused on males >16 years18 and 2 studies included males of any age group.24,25 TWIST was administered by physicians including urologists,8,28 general surgeons,18 pediatric surgeons,22,26 emergency physicians20,27 and surgical residents.23,25 One study measured its utility among emergency medical technicians.9

Table 1. Study and patient demographics of included studies

| Study | Ref. | Study Design | Country | Enrollment Time | Population | No. Pts | No. TT | TT Rate (%) | Age (yrs) | Examiner(s) | Risk Groups |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Barbosa et al, 2013 | 8 | Prospective phase | Boston, U.S. | January 2009–January 2012 | Males 3 mos–18 yrs | 338 | 51 | 15.1 | Mean 11.6, range 0.3–18 | Urologists | 0–2, 3–4, 5–7 |
| Barbosa et al, 2013 | 8 | Retrospective phase | Boston, U.S. | January 2007–January 2011 | Males 3 mos–18 yrs | 116 | 37 | 31.9 | Mean 11.6, range 0.3–18 | Urologists | 0–2, 3–4, 5–7 |
| Sheth et al, 2016 | 9 | Prospective | Dallas, U.S. | March 2013–March 2015 | Males 1 mo–21 yrs | 128 | 44 | 34.4 | Mean 11.3, range 0.1–21 | Emergency medical technicians | 0, 1–5, 6–7 |
| Frohlich et al, 2017 | 20 | Prospective | Boston, U.S. | January 2013–December 2015 | Males 3 mos–18 yrs | 258 | 19 | 7.4 | Mean 9.8, SD 0.3, range 0.3–18 | Emergency physicians | 0–2, 3–4, 5–7 |
| Manohar et al, 2018 | 24 | Retrospective | Bengaluru, India | October 2007–February 2016 | All males | 118 | 45 | 38.1 | Mean 16.6, range 8–28 | Not reported | 0–2, 3–4, 5–7 |
| Moosajee, 2017 | 25 | Prospective | Nairobi, Kenya | October 2016–June 2017 | All males | 60 | 44 | 73.3 | Mean 23, SD 9 | Surgical residents | 0–2, 3–4, 5–7 |
| Jabbar et al, 2018 | 21 | Prospective | Baghdad, Iraq | June 2015–October 2016 | Males 1 mo–13 yrs | 53 | 18 | 34.0 | Range 0.1–13 | Not reported | 0–2, 3–4, 5–7 |
| Bašković et al, 2019 | 19 | Retrospective | Zagreb, Croatia | October 2013–September 2018 | Males 0–18 yrs | 280 | 56 | 20.0 | Mean 14.7, range 2–18 | Not reported | 0–2, 3–4, 5–7 |
| Barco-Castillo et al, 2020 | 28 | Retrospective | Bogota, Colombia | January 2018–December 2018 | Males 0–18 yrs | 33 | 5 | 15.2 | Median 13, IQR 10–15 | Urologists | 0–2, 3–4, 5–7 |
| Barbosa et al, 2020 | 18 | Prospective | São Paulo, Brazil | June 2018–February 2020 | Males >16 yrs | 68 | 34 | 50.0 | Median 24.9, IQR 19–42.7 | General surgeons | 0–2, 3–4, 5–7 |
| Klinke et al, 2020 | 22 | Retrospective | Hamburg, Germany | January 2013–March 2019 | Males 1–17 yrs | 460 | 48 | 10.4 | Mean 9.4, SD 4.1 | Pediatric surgeons | 0–2, 3–4, 5–7 |
| Roberts et al, 2020 | 27 | Retrospective | Mobile, U.S. | December 2017–June 2019 | Males 3 mos–18 yrs | 77 | 15 | 19.5 | Mean 9.2, SD 5.2 | Emergency physicians | 0–2, 3–4, 5–7 |
| Pan, 2020 | 26 | Prospective | Jabalpur, India | May 2017–April 2019 | Males 0–18 yrs | 96 | 68 | 70.8 | Mean 10.1, SD 3.8 | Principal investigator (pediatric surgeon) | 0–2, 3–4, 5–7 |
| Lim et al, 2020 | 23 | Prospective | Singapore | January 2016–December 2018 | Males 0–16 yrs | 315 | 43 | 13.7 | Mean 10.1, SD 3.7 | Surgical residents | 0–2, 3–4, 5–7 |
| Totals | | | | | | 2,400 | 527 | 22.0 | | | |
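The torsion rates and totals in Table 1 follow directly from the patient and torsion counts. As a minimal sketch (the dictionary keys and the `torsion_rate` helper are hypothetical names; the counts are transcribed from Table 1), the arithmetic can be reproduced as follows:

```python
# (No. Pts, No. TT) per study row, transcribed from Table 1.
STUDY_COUNTS = {
    "Barbosa 2013 (prospective)": (338, 51),
    "Barbosa 2013 (retrospective)": (116, 37),
    "Sheth 2016": (128, 44),
    "Frohlich 2017": (258, 19),
    "Manohar 2018": (118, 45),
    "Moosajee 2017": (60, 44),
    "Jabbar 2018": (53, 18),
    "Baskovic 2019": (280, 56),
    "Barco-Castillo 2020": (33, 5),
    "Barbosa 2020": (68, 34),
    "Klinke 2020": (460, 48),
    "Roberts 2020": (77, 15),
    "Pan 2020": (96, 68),
    "Lim 2020": (315, 43),
}


def torsion_rate(n_pts, n_tt):
    """Unadjusted torsion rate as a percentage of presentations."""
    return 100 * n_tt / n_pts


total_pts = sum(pts for pts, _ in STUDY_COUNTS.values())
total_tt = sum(tt for _, tt in STUDY_COUNTS.values())
pooled_rate = torsion_rate(total_pts, total_tt)

for study, (pts, tt) in STUDY_COUNTS.items():
    print(f"{study}: {torsion_rate(pts, tt):.1f}%")
print(f"Overall: {total_tt}/{total_pts} = {pooled_rate:.1f}%")
```

The overall figure is a simple ratio of pooled counts, which is why the text describes it as unadjusted.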

A summary of TWIST score distribution among prospective pediatric studies is shown in Table 2. Supplementary Table 1 displays a summary of all studies.

Table 2. TWIST score distribution and incidence of TT

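Cells show No. with TT / No. without TT at each TWIST score.

| Study | Ref. | Score 0 | Score 1 | Score 2 | Score 3 | Score 4 | Score 5 | Score 6 | Score 7 |
|---|---|---|---|---|---|---|---|---|---|
| Barbosa et al, 2013 (prospective phase) | 8 | 0 / 95 | 0 / 50 | 0 / 89 | 3 / 40 | 9 / 13 | 17 / 0 | 9 / 0 | 13 / 0 |
| Frohlich et al, 2017 | 20 | 1 / 72 | 1 / 37 | 3 / 67 | 2 / 36 | 3 / 20 | 3 / 4 | 2 / 3 | 4 / 0 |
| Jabbar et al, 2018* | 21 | 0 / 0 | 0 / 0 | 0 / 8 | 0 / 5 | 0 / 16 | 2 / 6 | 6 / 0 | 10 / 0 |
| Pan, 2020 | 26 | 0 / 0 | 0 / 11 | 0 / 10 | 6 / 5 | 7 / 2 | 15 / 0 | 18 / 0 | 22 / 0 |
| Lim et al, 2020 | 23 | 1 / 115 | 1 / 12 | 8 / 110 | 6 / 12 | 11 / 15 | 7 / 7 | 8 / 1 | 1 / 0 |
| Totals | | 2 / 282 | 2 / 110 | 11 / 284 | 17 / 98 | 30 / 66 | 44 / 17 | 43 / 4 | 50 / 0 |
| % Pts with score | | 26.8 | 10.6 | 27.8 | 10.8 | 9.1 | 5.8 | 4.4 | 4.7 |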

Prospective, pediatric studies with physician examiners only (5 studies, 1,060 patients, 199 torsions).

* Authors were contacted for raw data.
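The share of patients falling into each risk group can be derived from the pooled totals in Table 2. The sketch below is a simple tally, assuming the totals row is read as torsion and non-torsion counts per score; `group_shares` and the list names are hypothetical. Values computed this way may differ from published figures in the last decimal place because of rounding.

```python
# Pooled counts per TWIST score 0..7, from the Totals row of Table 2.
TT_BY_SCORE = [2, 2, 11, 17, 30, 44, 43, 50]
NO_TT_BY_SCORE = [282, 110, 284, 98, 66, 17, 4, 0]


def group_shares(low_max, mid_max):
    """Percent of patients in the low, intermediate and high risk groups.

    low_max is the highest low-risk score, mid_max the highest
    intermediate-risk score; higher scores count as high risk.
    """
    per_score = [tt + no for tt, no in zip(TT_BY_SCORE, NO_TT_BY_SCORE)]
    n = sum(per_score)
    low = sum(per_score[: low_max + 1])
    mid = sum(per_score[low_max + 1 : mid_max + 1])
    high = sum(per_score[mid_max + 1 :])
    return tuple(round(100 * x / n, 1) for x in (low, mid, high))


# Barbosa stratification: 0-2, 3-4, 5-7
print(group_shares(2, 4))
```

Under the Barbosa cut points this yields 65.2% low risk, 19.9% intermediate risk and 14.9% high risk across the 1,060 patients.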

Quality Assessment

Quality assessment of prospective studies was conducted using QUADAS-2 (Table 3). All studies were deemed to have low risk of bias and were included in subsequent meta-analyses. Minor applicability concerns were identified in patient selection of 2 studies, both of which included adult patients. One study included males of all ages25 and another was restricted to adults.18

Table 3. Quality assessment of prospective studies using QUADAS-2 criteria

| Study | Ref. | Pt Group | Reference Standard | Risk of Bias: Pt Selection | Risk of Bias: Index Test | Risk of Bias: Reference Standard | Risk of Bias: Flow and Timing | Applicability Concerns: Pt Selection | Applicability Concerns: Index Test | Applicability Concerns: Reference Standard |
|---|---|---|---|---|---|---|---|---|---|---|
| Barbosa et al, 2013 (prospective phase) | 8 | Males 3 mos–18 yrs | Scrotal ultrasound | Low | Low | Low | Low | Low | Low | Low |
| Sheth et al, 2016 | 9 | Males 1 mo–21 yrs | Scrotal ultrasound | Low | Low | Low | Low | Low | Low | Low |
| Frohlich et al, 2017 | 20 | Males 3 mos–18 yrs | Scrotal ultrasound | Low | Low | Low | Low | Low | Low | Low |
| Moosajee, 2017 | 25 | All males | Scrotal ultrasound | Low | Low | Unclear | Unclear | High | Low | Low |
| Jabbar et al, 2018 | 21 | Males 1 mo–18 yrs | Scrotal ultrasound | Low | Low | High | Low | Low | Low | Low |
| Barbosa et al, 2020 | 18 | Males >16 yrs | Scrotal ultrasound | Low | Low | Low | Low | High | Low | Low |
| Pan, 2020 | 26 | Males 0–18 yrs | Scrotal ultrasound | Low | Low | Unclear | Unclear | Low | Low | Low |
| Lim et al, 2020 | 23 | Males 0–16 yrs | Scrotal ultrasound | Low | Low | Low | Low | Low | Low | Low |

MADTA of Individual TWIST Scores

Five studies with 1,060 patients and 199 cases of TT were included in the primary MADTA.8,20,21,23,26 These were all prospective in design, with pediatric patients and physician-reported TWIST scores (not emergency technicians).

Pooled sensitivity and specificity values were estimated for each TWIST cutoff score after a different-intercept common-slope model was fit to the observed data (Table 4). A TWIST score of 0 corresponded to a pooled sensitivity of 0.993 and specificity of 0.069 in the prediction of TT. A score of 7 corresponded to a pooled sensitivity of 0.276 and specificity of 0.994. The model AUC was 0.912 (95% CI: 0.847, 0.951).

Table 4. Pooled sensitivity and specificity for each individual TWIST score

| Score | Sensitivity (95% CI) | Specificity (95% CI) |
|---|---|---|
| 0 | 0.993 (0.973, 0.998) | 0.069 (0.029, 0.155) |
| 1 | 0.984 (0.946, 0.995) | 0.184 (0.083, 0.358) |
| 2 | 0.963 (0.892, 0.988) | 0.407 (0.214, 0.634) |
| 3 | 0.917 (0.795, 0.969) | 0.676 (0.445, 0.845) |
| 4 | 0.827 (0.640, 0.928) | 0.864 (0.699, 0.946) |
| 5 | 0.673 (0.444, 0.841) | 0.951 (0.869, 0.983) |
| 6 | 0.469 (0.259, 0.691) | 0.983 (0.950, 0.995) |
| 7 | 0.276 (0.129, 0.494) | 0.994 (0.981, 0.998) |

Analysis of prospective, pediatric studies with physician examiners only (5 studies, 1,060 patients, 199 torsions).8,20,21,23,26
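To build intuition for how sensitivity and specificity trade off as the cutoff rises, crude values can be computed from the pooled counts in Table 2. This sketch is explicitly not the multilevel random effects model used in the review: it simply adds counts across studies and, as an assumption made here, treats a score at or above the cutoff as test positive. Its results will therefore differ from the model-based estimates in Table 4. The function and list names are hypothetical.

```python
# Pooled counts per TWIST score 0..7, from the Totals row of Table 2.
TT_BY_SCORE = [2, 2, 11, 17, 30, 44, 43, 50]
NO_TT_BY_SCORE = [282, 110, 284, 98, 66, 17, 4, 0]


def crude_accuracy(cutoff):
    """Crude pooled (sensitivity, specificity), calling score >= cutoff positive."""
    true_pos = sum(TT_BY_SCORE[cutoff:])
    false_neg = sum(TT_BY_SCORE[:cutoff])
    true_neg = sum(NO_TT_BY_SCORE[:cutoff])
    false_pos = sum(NO_TT_BY_SCORE[cutoff:])
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


for cutoff in range(8):
    sens, spec = crude_accuracy(cutoff)
    print(f"score >= {cutoff}: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

Simple pooling of this kind ignores between-study variation, which is one reason a random effects model is preferred for the reported estimates.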

MADTA of Risk Stratification Scoring Systems

Using the 5 prospective studies from the primary MADTA, risk stratification systems were analyzed to assess for the system with greatest diagnostic test accuracy by AUC (Fig. 2 and Table 5).


Figure 2. Comparison of area under the curve for TWIST score risk stratification systems. Prospective studies only. AUC 95% CIs are shown above each plot. The CIs for all 5 risk stratification systems overlapped between 0.87 and 0.90.

Table 5. Comparison of TWIST score risk stratification systems

| Risk Stratification System | Low Risk: Sensitivity (95% CI) | Low Risk: Specificity (95% CI) | Pts at Low Risk (%) | Missed Torsion Rate (per 100 presentations) | Intermediate Risk: Sensitivity (95% CI) | Intermediate Risk: Specificity (95% CI) | Pts at Intermediate Risk (%) | Ultrasound Rate (per 100 presentations) | High Risk: Sensitivity (95% CI) | High Risk: Specificity (95% CI) | Pts at High Risk (%) | Neg Exploration Rate (per 100 presentations) | AUC (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0–2, 3–4, 5–7 (Barbosa et al, 2013) | 0.984 (0.881, 0.998) | 0.107 (0.033, 0.292) | 65.2 | 1.6 | 0.922 (0.737, 0.980) | 0.682 (0.411, 0.854) | 19.9 | 19.9 | 0.696 (0.401, 0.887) | 0.975 (0.904, 0.994) | 14.9 | 2.5 | 0.924 (0.865, 0.956) |
| 0, 1–5, 6–7 (Sheth et al, 2016) | 0.994 (0.977, 0.998) | 0.003 (0.001, 0.007) | 26.8 | 0.6 | 0.928 (0.843, 0.968) | 0.299 (0.203, 0.417) | 64.0 | 64.0 | 0.499 (0.252, 0.747) | 0.984 (0.947, 0.996) | 9.2 | 1.6 | 0.863 (0.809, 0.896) |
| 0–1, 2–5, 6–7 (Qin #1) | 0.993 (0.976, 0.998) | 0.007 (0.003, 0.016) | 37.4 | 0.7 | 0.924 (0.832, 0.968) | 0.450 (0.364, 0.539) | 53.4 | 53.4 | 0.496 (0.248, 0.747) | 0.990 (0.973, 0.997) | 9.2 | 1.0 | 0.889 (0.858, 0.910) |
| 0–1, 2–4, 5–7 (Qin #2) | 0.992 (0.966, 0.998) | 0.017 (0.009, 0.033) | 37.4 | 0.8 | 0.947 (0.852, 0.982) | 0.408 (0.294, 0.534) | 52.3 | 52.3 | 0.715 (0.409, 0.901) | 0.964 (0.900, 0.988) | 14.9 | 3.6 | 0.911 (0.870, 0.934) |
| 0, 1–4, 5–7 (Qin #3) | 0.993 (0.970, 0.998) | 0.009 (0.004, 0.017) | 26.8 | 0.7 | 0.949 (0.863, 0.982) | 0.289 (0.191, 0.411) | 58.3 | 58.3 | 0.717 (0.424, 0.897) | 0.950 (0.874, 0.981) | 14.9 | 5.0 | 0.895 (0.850, 0.922) |

Analysis of prospective, pediatric studies with physician examiners only (5 studies, 1,060 patients, 199 torsions).8,20,21,23,26

Low-risk patients are ruled out with no further intervention. High-risk patients are ruled in and proceed to surgical exploration. Intermediate-risk patients require further workup with ultrasound. As all intermediate-risk patients receive ultrasound, “Pts at Intermediate Risk (%)” and “Ultrasound Rate (per 100 presentations)” are equivalent.

Barbosa et al described a risk stratification system of 0–2, 3–4, 5–7.8 Meta-analysis of prospective studies only generated an AUC of 0.924 (0.865, 0.956). Sensitivity in the low-risk group was 0.984; per 100 presentations of acute scrotum, 1.6 cases of TT would be missed. Specificity in the high-risk group was 0.975; per 100 presentations of acute scrotum, there would be 2.5 instances of negative exploration.

Sheth et al proposed an alternative risk stratification scoring system of 0, 1–5, 6–7.9 Meta-analysis generated an ROC AUC of 0.863 (0.809, 0.896). Sensitivity in the low-risk group was 0.994; per 100 presentations of acute scrotum, 0.6 cases of TT would be missed. Specificity in the high-risk group was 0.984; per 100 presentations of acute scrotum, there would be 1.6 instances of negative exploration.
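The per-100 figures quoted for both systems are numerically consistent with taking the complement of the relevant sensitivity or specificity and scaling it to 100. That relationship is inferred here from the reported numbers rather than stated as the authors' method, and the function names below are hypothetical.

```python
def missed_torsion_per_100(low_risk_sensitivity):
    """Complement of low-risk sensitivity, expressed per 100."""
    return round((1 - low_risk_sensitivity) * 100, 1)


def negative_exploration_per_100(high_risk_specificity):
    """Complement of high-risk specificity, expressed per 100."""
    return round((1 - high_risk_specificity) * 100, 1)


# Barbosa (0-2, 3-4, 5-7)
print(missed_torsion_per_100(0.984), negative_exploration_per_100(0.975))
# Sheth (0, 1-5, 6-7)
print(missed_torsion_per_100(0.994), negative_exploration_per_100(0.984))
```

With the reported inputs this reproduces 1.6 and 2.5 for Barbosa, and 0.6 and 1.6 for Sheth.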

Additional risk stratification scoring systems were proposed for testing alongside Barbosa8 and Sheth9 et al. Risk stratification of 0–1, 2–5, 6–7 generated an AUC of 0.889 (0.858, 0.910), 0–1, 2–4, 5–7 generated an AUC of 0.911 (0.870, 0.934) and 0, 1–4, 5–7 generated an AUC of 0.895 (0.850, 0.922).

Sensitivity Analysis

Sensitivity analyses were performed to test the robustness of the results by varying study inclusion (supplementary Table 2).

MADTA of prospective and retrospective studies excluding nonphysician examiners and adult patients was performed, consisting of 9 studies.8,19–21,23,26–28 The summary ROC AUC was 0.912 (0.867, 0.942). A TWIST score of 1 corresponded to a sensitivity of 0.981 and a specificity of 0.241, while a score of 7 corresponded to a sensitivity of 0.204 and specificity of 0.996.

MADTA of prospective studies including nonphysician examiners was performed, consisting of 6 studies.8,9,20,21,23,26 The AUC was 0.921 (0.870, 0.954). A score of 1 corresponded to a sensitivity of 0.987 and specificity of 0.219, while a score of 7 corresponded to a sensitivity of 0.286 and specificity of 0.993.

MADTA of prospective studies including adult patients was performed, consisting of 7 studies.8,18,20,21,23,25,26 The AUC was 0.918 (0.869, 0.951). A score of 1 corresponded to a sensitivity of 0.992 and specificity of 0.150, while a score of 7 corresponded to a sensitivity of 0.275 and specificity of 0.992.

For all sensitivity analyses, comparison of risk stratification systems consistently demonstrated that the Barbosa stratification produced the highest AUC; however, CIs overlapped across systems.


Discussion

Acute scrotum is a common emergency presentation to surgical and emergency units. Diagnosis of TT is challenging and time critical. Many presentations are diagnostically equivocal and may result in unnecessary investigation, potentially delaying surgical management. Recent studies have indicated delays of 48–119 minutes when ultrasound was performed.29–31 Furthermore, overreliance on imaging stresses hospital radiology services where ultrasonography often remains a limited resource. TWIST aims to streamline this decision-making process and allow for rapid triaging based on physical findings. In this meta-analysis, we identified 13 studies using TWIST and pooled raw data to determine the ideal risk stratification system.

Testing of 5 risk stratification systems was performed, including those proposed by Barbosa et al (0–2, 3–4, 5–7),8 Sheth et al (0, 1–5, 6–7)9 and 3 alternatives (0–1, 2–5, 6–7; 0–1, 2–4, 5–7; 0, 1–4, 5–7). While all systems performed favorably with AUCs above 0.8, 2 systems achieved an AUC above 0.9: Barbosa’s original at 0.924 and 0–1, 2–4, 5–7 at 0.911. There was, however, notable overlap in CIs across systems (Fig. 2). Specifically, the CIs of all 5 systems shared the range 0.870 to 0.896, suggesting that no individual system was clearly superior, and different systems could be adopted according to institutional priorities. For example, Sheth’s proposal yielded an AUC of 0.863, the lowest of the 5 tested;9 however, this system achieved its desired effect of maximizing sensitivity in the low-risk group (0.994), so as not to miss a critical diagnosis.

This review suggests that the Barbosa stratification demonstrates the greatest overall reliability, as shown by the ROC AUC (Fig. 2). However, Sheth exhibits both a lower low-risk missed torsion rate (0.6 vs 1.6 per 100 presentations) and a lower high-risk negative surgical exploration rate (1.6 vs 2.5 per 100 presentations; Table 5). This difference is owing to the size of the intermediate-risk group (3–4 for Barbosa vs 1–5 for Sheth). Using Barbosa, in our prospective cohort of 1,060 patients, 65.2% of patients would be ruled out, 19.9% of patients would receive ultrasound and 14.9% would undergo emergent exploration. Using Sheth, 26.8% would be ruled out, 64.0% would receive ultrasound and 9.2% would undergo emergent exploration. Therefore, the Sheth stratification would necessitate roughly 3 times the number of ultrasounds as Barbosa, and is potentially contrary to the goal of TWIST as a tool to expedite decision making and reduce reliance on ultrasound. However, Sheth may better suit institutions/surgeons who have fast and convenient access to ultrasound or seek to minimize missed torsion rate.
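The workload comparison in this paragraph can be made concrete with a short sketch that converts the quoted group percentages into expected patient counts and an ultrasound ratio. The percentages are those given in the text; `GROUP_SHARES`, `expected_counts` and `ultrasound_ratio` are hypothetical names used only for illustration.

```python
# Percent of patients on each pathway, as quoted in the text.
GROUP_SHARES = {
    "barbosa": {"ruled_out": 65.2, "ultrasound": 19.9, "exploration": 14.9},
    "sheth": {"ruled_out": 26.8, "ultrasound": 64.0, "exploration": 9.2},
}


def expected_counts(system, n_presentations):
    """Expected number of patients on each pathway, rounded to whole patients."""
    return {
        pathway: round(share / 100 * n_presentations)
        for pathway, share in GROUP_SHARES[system].items()
    }


def ultrasound_ratio(system_a, system_b):
    """How many times more ultrasounds system_a requires than system_b."""
    return GROUP_SHARES[system_a]["ultrasound"] / GROUP_SHARES[system_b]["ultrasound"]


print(expected_counts("barbosa", 1060))
print(expected_counts("sheth", 1060))
print(round(ultrasound_ratio("sheth", "barbosa"), 1))
```

For the 1,060-patient cohort this corresponds to about 211 ultrasounds under Barbosa against about 678 under Sheth, a ratio of approximately 3.2.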

“Perfect” sensitivity could not be achieved in the low-risk group as there were 17 cases of TT among 785 presentations. This raises medicolegal concerns regarding the potential for missed torsion. Orchiectomy secondary to misdiagnosed torsion is an active area for litigation and one of the most common malpractice suits among young males.32 It is unlikely that any single test or clinician can achieve 100% sensitivity, as exceptional cases are unavoidable. There were 2 cases of TT with scores of 0,20,23 suggesting an absence of physical signs apart from scrotal pain. TWIST sensitivity (0.984), however, is still notably higher than that achieved by scrotal ultrasound (0.86).6 Furthermore, TWIST sensitivity may increase as clinicians become more experienced. For now, individual clinicians will need to decide whether examination using TWIST criteria alone is sufficient to exclude low-risk patients.

A major advantage of TWIST is its speed, simplicity and the relatively objective nature of its components. With only 5 variables to measure, this score represents a simple examination that junior clinicians may perform. Once well established, TWIST may allow for clear communication of risk between emergency clinicians and surgical units. Furthermore, implementation of TWIST into practice is uncomplicated. A recent Australian study showed that the simple placement of educational posters around the emergency department increased TWIST documentation from 49% to 63% over 3 months.14

Our review should be considered in the context of the following limitations. First, all studies were observational. This could be related to difficulty in obtaining ethics approval for experimental studies considering the unacceptable risk of missed torsion. Second, while there were multiple pediatric studies in our sample, there were only 2 prospective studies including adults.18,25 As a result, we are unable to confirm if TWIST was equally accurate in adults and children. Similarly, the existing literature has not validated TWIST in neonatal/infantile TT, where examination findings differ. Future reviews should consider separate analyses of neonatal/infant and adult males when studies become available. Third, there is a paucity of data concerning TWIST performance for alternative causes of acute scrotum. Torsion-detorsion syndrome is of particular interest as patients may present repeatedly with inconsistent TWIST scores, potentially putting them at risk for missed torsion. Finally, TWIST does not incorporate the quality and severity of pain or the duration of symptoms into its algorithm, a limitation also highlighted by Sheth et al.9 These factors might influence the accuracy of the score, and assessing their impact could be a direction for further research. Future studies may also examine alternative allocation of points to different components of the TWIST score.


Conclusions

This study suggests TWIST is a useful decision-making tool for acute scrotal presentations. The original Barbosa stratification demonstrates appropriate diagnostic accuracy while reducing reliance on ultrasound. Low-risk patients should have TT ruled out, while high-risk patients should proceed to surgery. Patients at intermediate risk will benefit from scrotal ultrasound.


References

  • 1. Acute scrotal pain. Aust Fam Physician 2013; 42: 790.
  • 2. A systematic review of testicle survival time after a torsion event. Pediatr Emerg Care 2019; 35: 821.
  • 3. Factors influencing rate of testicular salvage in acute testicular torsion at a tertiary pediatric center. West J Emerg Med 2015; 16: 190.
  • 4. Clinical and sonographic features predict testicular torsion in children: a prospective study. BJU Int 2013; 112: 1201.
  • 5. Testicular torsion–detorsion and potential therapeutic treatments: a possible role for ischemic postconditioning. Int J Urol 2016; 23: 454.
  • 6. The role of ultrasound imaging in adult patients with testicular torsion: a systematic review and meta-analysis. J Med Ultrason (2001) 2019; 46: 325.
  • 7. The ultrasonographic "whirlpool sign" in testicular torsion: valuable tool or waste of valuable time? A systematic review and meta-analysis. Emerg Radiol 2018; 25: 281.
  • 8. Development and initial validation of a scoring system to diagnose testicular torsion in children. J Urol 2013; 189: 1859.
  • 9. Diagnosing testicular torsion before urological consultation and imaging: validation of the TWIST score. J Urol 2016; 195: 1870.
  • 10. Preferred reporting items for systematic review and meta-analysis of diagnostic test accuracy studies (PRISMA-DTA): explanation, elaboration, and checklist. BMJ 2020; 370: m2632.
  • 11. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011; 155: 529.
  • 12. Modelling multiple thresholds in meta-analysis of diagnostic test accuracy studies. BMC Med Res Methodol 2016; 16: 97.
  • 13. diagmeta: Meta-analysis of diagnostic accuracy studies with several cutpoints. 2021. Accessed October 1, 2022.
  • 14. Improving testicular examinations on paediatric patients in the emergency department: a quality improvement study to improve early diagnosis of testicular torsion. Asian J Urol 2021.
  • 15. MP40-05 Does near infrared spectroscopy improve TWIST score in the diagnosis of testicular torsion? J Urol, suppl., 2015; 193: e464.
  • 16. BET 2: TWIST score in cases of suspected paediatric testicular torsion. Emerg Med J 2018; 35: 574.
  • 17. Efficacy of twist score system for different diagnosis in acute scrotum. Int J Urol 2018; 25: 374.
  • 18. Validation of the TWIST score for testicular torsion in adults. Int Urol Nephrol 2021; 53: 7.
  • 19. Validation of a TWIST score in diagnosis of testicular torsion: single-center experience. Klin Padiatr 2019; 231: 217.
  • 20. Prospective validation of clinical score for males presenting with an acute scrotum. Acad Emerg Med 2017; 24: 1474.
  • 21. Evaluation of TWIST score in predicting testicular torsion in children. Prensa Med Argent 2018; 104: 2.
  • 22. The BAL-score almost perfectly predicts testicular torsion in children: a two-center cohort study. Front Pediatr 2020; 8: 601892.
  • 23. Revisiting testicular torsion scores in an Asian healthcare system. J Pediatr Urol 2020; 16: 821.e821.
  • 24. Evaluation of testicular workup for ischemia and suspected torsion score in patients presenting with acute scrotum. Urol Ann 2018; 10: 20.
  • 25. A validation of the “twist” score in diagnosis of acute testicular torsion in the acute scrotum in Kenyatta National Hospital. Univ Nairobi Res Archive 2017.
  • 26. Validation of the testicular workup for ischemia and suspected torsion (TWIST) score in the diagnosis of testicular torsion in children with acute scrotum. Indian Pediatr 2020; 57: 926.
  • 27. Testicular workup for ischemia and suspected torsion in pediatric patients and resource utilization. J Surg Res 2020; 257: 406.
  • 28. Performance of the TWIST score in patients with testicular torsion that present to the emergency department. RUC 2020; 29: 225.
  • 29. Identifying systems delays in assessment, diagnosis, and operative management for testicular torsion in a single-payer health-care system. J Pediatr Urol 2019; 15: 251.e251.
  • 30. The use of Doppler ultrasound for suspected testicular torsion: lessons learned from a 15-year multicentre retrospective study of 2922 patients. Eur Urol Focus 2022; 8: 105.
  • 31. Ultrasound use in suspected testicular torsion: an association with delay to theatre and increased intraoperative finding of non-viable testicle. N Z Med J 2021; 134: 50.
  • 32. Malpractice litigation and testicular torsion: a legal database review. J Emerg Med 2015; 49: 849.

Study was prospectively registered with PROSPERO.

See Editorial on page 6.