Trial methodology guide

Primary vs Secondary vs Exploratory Endpoints

Why endpoint hierarchy exists, how multiple-comparison inflation works, and what the standard outcome measures in peptide metabolic trials actually measure.

Peptides Research Hub Editorial Team · Published May 22, 2026 · Last reviewed May 22, 2026 · 10 min read

The short version

Every confirmatory clinical trial prespecifies its primary endpoint (occasionally a small set of co-primary endpoints) before enrollment begins. That outcome determines whether the trial succeeds or fails for regulatory purposes. Secondary and exploratory endpoints are measured alongside it but carry less statistical weight. The hierarchy exists for a concrete mathematical reason: testing many outcomes on the same dataset inflates the probability of at least one false positive result. Knowing where an outcome sits in the hierarchy tells you how much confidence to place in it.

The endpoint hierarchy

Endpoints in a clinical trial are arranged in a fixed hierarchy, specified in the protocol before enrollment and finalized in the statistical analysis plan before the data are unblinded:

Primary endpoint. The prespecified outcome (occasionally two or more co-primary outcomes) on which the trial is powered and against which the primary hypothesis is tested. Statistical significance is evaluated at the full alpha level (conventionally two-sided p less than 0.05). A failed primary endpoint means the trial did not demonstrate efficacy for regulatory purposes, regardless of results on any other outcome.^[1]
Key secondary endpoints. A prespecified, ordered list of secondary outcomes tested in strict sequence (sometimes called a hierarchical testing procedure). Once the primary endpoint passes, the first key secondary is tested at the full alpha level. If it passes, the next is tested, and so on. Testing stops (for formal inferential purposes) at the first failure in the sequence; all outcomes below the failure point are reported as descriptive only.
Secondary endpoints. Measured outcomes not in the key secondary hierarchy. Results are reported but are explicitly descriptive; no formal statistical inference is drawn.
Exploratory endpoints. Hypothesis-generating measures that may include mechanistic biomarkers, subgroup analyses, or novel outcomes of interest. They are used to generate ideas for future trials, not to support label claims.
Post-hoc analyses. Analyses not specified in the original protocol, conducted after data unblinding. These are the lowest-confidence outputs of a trial; they are subject to data-dredging risk and should be treated as hypothesis-generating only.
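The gatekeeping logic of the key-secondary hierarchy can be sketched in a few lines of Python. This is a minimal illustration of a simple fixed-sequence procedure, assuming the p values are already computed and listed in the prespecified order; the function name, endpoint names and p values are hypothetical. Real statistical analysis plans may use more elaborate gatekeeping schemes.

```python
from typing import List, Tuple


def fixed_sequence_test(
    ordered_results: List[Tuple[str, float]], alpha: float = 0.05
) -> List[Tuple[str, str]]:
    """Apply a fixed-sequence (hierarchical) test to endpoints in prespecified order.

    ordered_results holds (endpoint_name, p_value) pairs, primary endpoint first.
    Each endpoint is tested at the full alpha only while every earlier endpoint
    has been significant; after the first failure, the rest are descriptive only.
    """
    decisions: List[Tuple[str, str]] = []
    gate_open = True  # becomes False at the first non-significant result
    for name, p_value in ordered_results:
        if not gate_open:
            decisions.append((name, "descriptive only"))
        elif p_value < alpha:
            decisions.append((name, "significant"))
        else:
            decisions.append((name, "not significant"))
            gate_open = False  # formal testing stops here
    return decisions


# Hypothetical p values for illustration only; not taken from any real trial.
results = [
    ("primary: HbA1c change", 0.001),
    ("key secondary 1: body weight change", 0.004),
    ("key secondary 2: fasting plasma glucose", 0.12),
    ("key secondary 3: systolic blood pressure", 0.01),
]

for name, decision in fixed_sequence_test(results):
    print(f"{name}: {decision}")
```

Key secondary 3 has p = 0.01, but because key secondary 2 failed earlier in the sequence, it is reported as descriptive only.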

Alpha inflation and multiple comparisons

When a statistical test is conducted at the alpha = 0.05 level, there is a 5% probability of a false positive (Type I error) when the null hypothesis is true. If you conduct 20 independent tests at this threshold and every null hypothesis is true, you expect one false positive by chance alone, and the probability of at least one is about 64% (1 − 0.95^20). Clinical trials measure dozens of outcomes, so without correction, the chance that some secondary outcome reaches p < 0.05 purely by chance is far higher than the nominal 5% suggests.
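The arithmetic behind alpha inflation can be tabulated directly. This sketch assumes independent tests with every null hypothesis true; real trial outcomes are usually correlated, so the exact figures differ, but the direction of the problem is the same. The function names are illustrative.

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across n independent tests, all nulls true."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected count of false positives when all nulls are true.

    By linearity of expectation this holds with or without independence.
    """
    return n_tests * alpha


for n in (1, 5, 10, 20, 50):
    print(
        f"{n:>2} tests: P(>=1 false positive) = "
        f"{prob_at_least_one_false_positive(n):.2f}, "
        f"expected false positives = {expected_false_positives(n):.2f}"
    )
```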

The hierarchical testing procedure described above is one approach to controlling the family-wise error rate (FWER): the probability of making at least one false positive decision across all the tests in the trial. Other approaches include the Bonferroni correction (divide alpha by the number of tests), the Holm step-down procedure, and the Hochberg step-up procedure. The choice of approach is prespecified in the statistical analysis plan.^[1]^[2]
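The three non-hierarchical corrections named above differ in how they spend alpha across a family of tests. A minimal sketch in plain Python, applied to hypothetical p values for five secondary endpoints; the function names are illustrative, not a specific library's API.

```python
from typing import List


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i when p_i <= alpha / m, where m is the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down: walk from the smallest p value upward, comparing the
    k-th smallest (k = 0, 1, ...) with alpha / (m - k); stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break  # this and all larger p values are retained
    return reject


def hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Hochberg step-up: walk from the largest p value downward; at the first
    sorted position k with p_(k) <= alpha / (m - k), reject it and every smaller p value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k in range(m - 1, -1, -1):
        if p_values[order[k]] <= alpha / (m - k):
            for i in order[: k + 1]:
                reject[i] = True
            break
    return reject


# Hypothetical p values for five secondary endpoints.
p_values = [0.001, 0.012, 0.03, 0.04, 0.045]
print("Bonferroni:", bonferroni(p_values))
print("Holm:      ", holm(p_values))
print("Hochberg:  ", hochberg(p_values))
```

With these numbers Bonferroni rejects one hypothesis, Holm two, and Hochberg all five, reflecting the increasing power of the stepwise methods. Holm controls the FWER under any dependence between tests; Hochberg's guarantee requires independence or certain forms of positive dependence.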

The practical consequence for readers: a secondary endpoint result with p = 0.03 in a trial where the primary endpoint also succeeded carries reasonable inferential weight if it was the first item in a hierarchical key-secondary list. The same p = 0.03 result on an outcome that appears 10th in a non-hierarchical list of secondary endpoints, or in a post-hoc subgroup, carries very little inferential weight.

Common endpoint types in peptide metabolic trials

HbA1c reduction

Glycated haemoglobin (HbA1c) reflects average plasma glucose over the preceding 8-12 weeks. In the landmark UKPDS observational analysis, each 1 percentage point reduction in HbA1c was associated with lower risk of both microvascular and macrovascular complications: approximately a 37% reduction in microvascular events, a 21% reduction in diabetes-related mortality, and a 14% reduction in myocardial infarction risk. HbA1c is the standard primary endpoint for type 2 diabetes drug approval because it is reliably measured, highly reproducible, and linked to patient-relevant outcomes (most firmly, microvascular complications) through decades of epidemiological and clinical-trial data.^[3]

Limitations: HbA1c can be unreliable in haemoglobinopathies, haemolytic anaemia, or after recent blood transfusion. It reflects mean glucose but not glycemic variability or hypoglycemia burden.

Percent body weight loss

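For obesity trials, the FDA's 2007 guidance recommends two co-primary efficacy endpoints: mean percent change in body weight and the proportion of patients who lose at least 5% of baseline body weight. A product is considered effective if, after one year of treatment, either the placebo-subtracted mean weight loss is at least 5% and statistically significant, or at least 35% of patients in the active arm lose at least 5% of body weight, approximately double the placebo proportion, with a statistically significant difference between groups.^[4] The 5% threshold was established as a clinically meaningful marker because weight losses of this magnitude reliably improve cardiometabolic risk factors (blood pressure, triglycerides, HDL cholesterol, insulin sensitivity) in obese populations.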

Higher thresholds (10%, 15%, 20%) are used as key secondary responder endpoints in obesity trials to characterize the distribution of response. In SURMOUNT-1, tirzepatide at 15 mg produced mean weight loss of 20.9% at 72 weeks, with 57% of participants achieving at least 20% weight loss.
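The mean and responder endpoints are both derived from each patient's percent change from baseline. A minimal sketch using hypothetical five-patient arms, assuming complete follow-up (real trials must handle missing data and treatment discontinuation through prespecified estimands and imputation) and omitting the statistical test of the between-arm difference; the function names are illustrative.

```python
from statistics import mean
from typing import Dict, List, Sequence


def percent_change(baseline_kg: float, final_kg: float) -> float:
    """Percent change in body weight from baseline (negative means weight loss)."""
    return 100.0 * (final_kg - baseline_kg) / baseline_kg


def summarize_arm(
    baseline_kg: List[float],
    final_kg: List[float],
    thresholds: Sequence[int] = (5, 10, 15, 20),
) -> Dict[str, float]:
    """Mean percent change plus the percentage of patients losing >= each threshold."""
    changes = [percent_change(b, f) for b, f in zip(baseline_kg, final_kg)]
    summary = {"mean_pct_change": mean(changes)}
    for t in thresholds:
        responders = sum(change <= -t for change in changes)
        summary[f"pct_losing_{t}"] = 100.0 * responders / len(changes)
    return summary


# Hypothetical five-patient arms with complete follow-up (illustration only).
active = summarize_arm([100, 110, 95, 120, 105], [82, 98, 86, 95, 101])
placebo = summarize_arm([100, 110, 95, 120, 105], [98, 104, 94, 117, 100])

placebo_subtracted = active["mean_pct_change"] - placebo["mean_pct_change"]
print(f"Placebo-subtracted mean change: {placebo_subtracted:.1f}%")
for t in (5, 10, 15, 20):
    print(f">= {t}% loss: active {active[f'pct_losing_{t}']:.0f}%, "
          f"placebo {placebo[f'pct_losing_{t}']:.0f}%")
```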

MACE: major adverse cardiovascular events

MACE is a composite endpoint used in cardiovascular outcomes trials. The standard three-component MACE definition includes: cardiovascular death, non-fatal myocardial infarction, and non-fatal stroke. Some trials use a four-component MACE that adds hospitalization for unstable angina. The specific components are always defined in the protocol; readers should check which version was used before comparing results across trials.

MACE in T2DM cardiovascular outcomes trials is analyzed as a time-to-first-event outcome using a Cox proportional hazards model, producing a hazard ratio (HR) with a 95% confidence interval. The FDA's 2008 CV safety guidance asks sponsors of T2DM drugs to show that the upper bound of the two-sided 95% CI for the HR excludes 1.8 before approval and is ultimately below 1.3 (the non-inferiority criterion), either before approval or in a postmarketing trial. A superiority claim requires the upper bound to fall below 1.0.^[3]^[5]

LEADER showed that liraglutide produced an HR of 0.87 (95% CI 0.78-0.97) for three-component MACE, demonstrating both non-inferiority and superiority to placebo.^[5] SUSTAIN-6 showed semaglutide HR 0.74 (95% CI 0.58-0.95) for three-component MACE; non-inferiority was the prespecified objective, so the CI excluding 1.0 did not come from a prespecified superiority test.^[6]
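Reading an HR and its CI against the thresholds above reduces to two comparisons on the upper bound. This sketch uses the LEADER and SUSTAIN-6 figures quoted in the text plus a hypothetical neutral trial; the class and function names are illustrative. Whether a CI that excludes 1.0 supports a formal superiority claim also depends on whether superiority was a prespecified step in the testing hierarchy, which the code cannot check.

```python
from dataclasses import dataclass


@dataclass
class HazardRatioResult:
    trial: str
    hr: float
    ci_low: float   # lower bound of the two-sided 95% CI (kept for completeness)
    ci_high: float  # upper bound of the two-sided 95% CI


def read_ci(result: HazardRatioResult, ni_margin: float = 1.3) -> dict:
    """Check the 95% CI upper bound against a non-inferiority margin and against 1.0."""
    return {
        "meets_ni_margin": result.ci_high < ni_margin,
        "ci_excludes_1": result.ci_high < 1.0,
        # Point-estimate reduction in hazard, e.g. HR 0.87 -> 13% lower hazard.
        "hazard_reduction_pct": round(100 * (1 - result.hr), 1),
    }


for r in (
    HazardRatioResult("LEADER (liraglutide)", 0.87, 0.78, 0.97),
    HazardRatioResult("SUSTAIN-6 (semaglutide)", 0.74, 0.58, 0.95),
    HazardRatioResult("hypothetical neutral trial", 1.02, 0.90, 1.16),
):
    print(r.trial, read_ci(r))
```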

All-cause mortality

All-cause mortality is the hardest possible endpoint in clinical medicine: it cannot be misclassified, is universally meaningful, and is not subject to adjudication disputes. Its limitation in drug trials is statistical: mortality events are relatively rare in the trial populations used for metabolic drug development (T2DM or obesity populations with 3-5 year follow-up), requiring extremely large trials or long follow-up to achieve adequate statistical power. MACE is used instead because, as a composite of hard clinical events, it accrues more events per unit time.
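Why event counts drive trial size can be made concrete with Schoenfeld's approximation for the number of events needed to detect a given hazard ratio with a two-sided log-rank or Cox test. The sketch below is an illustration, not a design tool: it assumes a target HR of 0.85, 90% power, two-sided alpha of 0.05 and 1:1 randomization, and the cumulative event risks and the assumption of the same HR for both outcomes are hypothetical.

```python
import math
from statistics import NormalDist


def required_events(hr: float, alpha: float = 0.05, power: float = 0.90,
                    allocation: float = 0.5) -> int:
    """Schoenfeld approximation:
    events = (z_{1-alpha/2} + z_{power})^2 / (p * (1 - p) * ln(hr)^2),
    where p is the fraction randomized to the active arm."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    denom = allocation * (1 - allocation) * math.log(hr) ** 2
    return math.ceil((z_alpha + z_power) ** 2 / denom)


def approx_patients(events: int, cumulative_event_risk: float) -> int:
    """Crude sample size: patients needed so that patients x (overall risk of an
    event during follow-up) reaches the target number of events."""
    return math.ceil(events / cumulative_event_risk)


d = required_events(hr=0.85)
print("Events needed:", d)
for outcome, risk in (("composite, hypothetical 12% risk", 0.12),
                      ("mortality, hypothetical 5% risk", 0.05)):
    print(f"{outcome}: about {approx_patients(d, risk)} patients")
```

Because power depends on the number of events rather than the number of patients, an outcome that accrues events more slowly needs proportionally more patients or longer follow-up.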

Composite endpoints: advantages and pitfalls

Composite endpoints combine multiple events into a single outcome, usually defined as time to first occurrence of any component. Advantages:

More events per patient-year of follow-up, reducing required sample size and trial duration.
Captures the overall disease burden rather than a single manifestation.
Reduces the competing-risk problem: when death is a component, a patient who dies before any non-fatal event is counted as having an event rather than simply dropping out of the analysis.

Pitfalls:

Heterogeneity of components. A composite that includes CV death and hospitalization for unstable angina combines outcomes of very different clinical severity. A treatment that mainly reduces hospitalizations but not CV death can still produce a positive composite result, which may overstate its clinical importance.
Component-level interpretation. A significant composite result should always be accompanied by component-level data. Readers should check whether all components trend in the same direction or whether one dominant component is driving the composite while others move in the opposite direction.
Adjudication. Composite endpoints require independent adjudication committees to classify events consistently. Differential adjudication across treatment arms is a potential source of bias in open-label trials.
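The time-to-first-event definition, and why component-level data matter, can be seen on a toy dataset. This sketch uses hypothetical event times for four patients in one arm and only derives the composite event for each patient; it does not fit a Cox model. The names and data are illustrative.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical component event times in days since randomization (None = no event).
patients: Dict[str, Dict[str, Optional[int]]] = {
    "P1": {"cv_death": None, "nonfatal_mi": 210, "nonfatal_stroke": None},
    "P2": {"cv_death": 400, "nonfatal_mi": 150, "nonfatal_stroke": None},
    "P3": {"cv_death": None, "nonfatal_mi": None, "nonfatal_stroke": None},
    "P4": {"cv_death": 95, "nonfatal_mi": None, "nonfatal_stroke": None},
}
FOLLOW_UP_DAYS = 1095  # hypothetical end of follow-up; event-free patients are censored here


def first_event(components: Dict[str, Optional[int]], follow_up: int) -> Tuple[int, str]:
    """Return (time, component) for the earliest component event, or (follow_up, 'censored').

    Ties are broken alphabetically by component name here; a real analysis
    would need a prespecified rule for same-day events.
    """
    observed = [(t, name) for name, t in components.items() if t is not None]
    return min(observed) if observed else (follow_up, "censored")


first = {pid: first_event(c, FOLLOW_UP_DAYS) for pid, c in patients.items()}
first_event_counts = Counter(kind for _, kind in first.values())
all_event_counts = Counter(
    name for c in patients.values() for name, t in c.items() if t is not None
)

print("Composite first events:", dict(first_event_counts))
print("All component events:  ", dict(all_event_counts))
```

P2's cardiovascular death does not count toward the composite because a non-fatal MI came first, so first-event counts show one CV death while two occurred.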

Hard endpoints versus surrogate endpoints

A hard endpoint is a directly patient-relevant outcome: death, myocardial infarction, stroke, renal failure requiring dialysis, blindness. A surrogate endpoint is a biomarker or physiological variable that is on the causal pathway to a hard endpoint and that changes predictably in response to effective treatment.

Surrogate endpoints accelerate drug approval because they can be measured over shorter trials in smaller populations than hard endpoints require. The risk is surrogate-endpoint failure: a drug may favourably change the surrogate without changing the hard outcome, or may change the surrogate while worsening a different hard outcome. The 2008 FDA guidance on cardiovascular risk in T2DM drugs was prompted by a 2007 meta-analysis suggesting that rosiglitazone, although it lowered HbA1c, increased the risk of myocardial infarction, illustrating that a validated surrogate for one outcome class does not necessarily predict safety across all outcome classes.^[3]

When reading a peptide trial, the most useful question to ask is: "Is this primary endpoint a hard outcome, a validated surrogate, or a less-established surrogate?" HbA1c in T2DM and percent weight loss in obesity sit in the validated-surrogate category. Novel biomarkers or imaging endpoints in early-phase trials may be promising but not yet validated as surrogates for patient-relevant outcomes.

Patient-reported outcomes and quality of life

Patient-reported outcomes (PROs) capture the patient's direct perception of their symptoms, function, and quality of life without interpretation by a clinician. In metabolic peptide trials, PROs typically appear as secondary or exploratory endpoints and include:^[7]

Gastrointestinal tolerability measures (e.g., GI symptom questionnaires or bespoke GI diaries) used to characterize the dose-dependent nausea and vomiting that are the principal tolerability concern for GLP-1 receptor agonists.
Diabetes distress and treatment satisfaction questionnaires (e.g., DTSQ, PAID) measuring the psychosocial burden of diabetes management.
SF-36 / SF-12, EQ-5D, and condition-specific health-related quality-of-life instruments.
Weight-related quality of life instruments (e.g., IWQOL-Lite) in obesity trials.

For a PRO to support a label claim in the US, it must be developed, validated, and administered according to FDA PRO guidance. PRO instruments must have demonstrated content validity (developed with patient input), psychometric reliability, and a defined minimal clinically important difference (MCID) to interpret what a numeric score change means to patients.
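One common use of an MCID is a responder analysis: the percentage of patients in each arm whose score improves by at least the MCID. A minimal sketch, assuming a hypothetical instrument on which higher scores are better and a hypothetical MCID of 5 points; real MCIDs are instrument-specific and come from validation studies, and all data here are invented for illustration.

```python
from statistics import mean
from typing import List


def responder_rate(score_changes: List[float], mcid: float,
                   higher_is_better: bool = True) -> float:
    """Percentage of patients whose change from baseline reaches the MCID
    in the beneficial direction."""
    if higher_is_better:
        responders = sum(change >= mcid for change in score_changes)
    else:
        responders = sum(change <= -mcid for change in score_changes)
    return 100.0 * responders / len(score_changes)


MCID = 5.0  # hypothetical threshold in score points
active_changes = [12, 3, 8, -1, 6]   # hypothetical change from baseline
placebo_changes = [2, -3, 6, 1, 0]

print(f"Mean change: active {mean(active_changes):.1f}, "
      f"placebo {mean(placebo_changes):.1f}")
print(f"Responders (>= MCID): active {responder_rate(active_changes, MCID):.0f}%, "
      f"placebo {responder_rate(placebo_changes, MCID):.0f}%")
```

The mean between-arm difference and the responder percentages answer different questions: the first describes the average shift, the second how many individual patients reached a change they would notice.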

Limitations of the evidence

Regulatory thresholds for specific endpoints (e.g., weight-loss responder criteria, CV non-inferiority margins) are set by agency guidance that is periodically revised; the values cited here reflect guidance current as of the last review date. Clinical meaningfulness thresholds are based on expert consensus and population-level studies; individual patient responses vary.

References

Citations are annotated with an evidence tier reflecting study design and replication. See Methodology for criteria.

1. International Council for Harmonisation (ICH) · ICH E9 Statistical Principles for Clinical Trials · 1998 · Validated
2. International Council for Harmonisation (ICH) · ICH E9(R1) Addendum on Estimands and Sensitivity Analysis in Clinical Trials · 2019 · Validated
3. U.S. Food and Drug Administration · Guidance for Industry: Diabetes Mellitus, Evaluating Cardiovascular Risk in New Antidiabetic Therapies to Treat Type 2 Diabetes · 2008 · Validated
4. U.S. Food and Drug Administration · Guidance for Industry: Developing Products for Weight Management · 2007 · Validated
5. Marso SP, Daniels GH, Brown-Frandsen K, et al. · Liraglutide and Cardiovascular Outcomes in Type 2 Diabetes (LEADER) · New England Journal of Medicine · 2016 · PMID 27295427 · DOI 10.1056/NEJMoa1603827 · NCT01179048 · Validated
6. Marso SP, Bain SC, Consoli A, et al. · Semaglutide and Cardiovascular Outcomes in Patients with Type 2 Diabetes (SUSTAIN-6) · New England Journal of Medicine · 2016 · PMID 27633186 · DOI 10.1056/NEJMoa1607141 · NCT01720446 · Validated
7. U.S. Food and Drug Administration · Guidance for Industry: Patient-Reported Outcome Measures, Use in Medical Product Development to Support Labeling Claims · 2009 · Validated
8. Friedman LM, Furberg CD, DeMets DL, Reboussin DM, Granger CB · Fundamentals of Clinical Trials, 5th ed. · Springer · 2015 · Validated