How to run your first experiment in Apex

From blank canvas to shipped A/B test in 15 minutes — the belief, the variants, the stats, the decision.

Apr 2, 2026 · 8 min read

Most teams start an A/B test by picking a change. Change the button color. Change the headline. Change the pricing grid. Then they design a variant, split traffic, wait two weeks, and stare at a graph hoping for a line to go up.

Apex flips that. An experiment starts with a belief — a specific, falsifiable claim about how your users behave. The variants are how you test the belief. The stats engine tells you when you can trust the answer. And the answer updates your belief graph so the next experiment compounds on top of it.

This guide walks you through running your first experiment end to end. By the end you'll have shipped an A/B test, read the results, and promoted a winner — and your belief graph will be one edge richer.

1. Start from a belief

At its core, a belief is a subject-claim-confidence triple, optionally backed by evidence, an audience, and an owner. It's what your team privately thinks is true about growth, written down so you can argue with it. The belief is the hypothesis; the experiment is the test.

Open the Intelligence tab, click New Belief, and write the sharpest version of the claim you can. Sharp means falsifiable — a specific surface, a specific audience, a specific expected effect.

{
  "subject": "Pricing page — Annual toggle default",
  "claim": "Defaulting the toggle to \"Annual\" increases trial-to-paid conversion by 10%+ for new SMB visitors",
  "confidence": 0.55,
  "evidence": [
    "Competitor teardown — 7 of 10 peers default to annual",
    "Finance hunch: users who pick monthly churn 2.3x faster"
  ],
  "audience": "first_visit AND company_size <= 200",
  "owner": "sarah@acme.co"
}
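To make the shape concrete, here is a minimal Python sketch of a belief as a typed record. It assumes only the field names in the JSON above; the `Belief` class and `parse_belief` helper are hypothetical illustrations, not part of an Apex SDK. The validation encodes the two parts of the "sharp" bar a machine can check: confidence must be a probability, and subject and claim must be non-empty.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Belief:
    """Hypothetical typed mirror of the belief JSON (not an Apex SDK type)."""
    subject: str
    claim: str
    confidence: float
    evidence: list[str] = field(default_factory=list)
    audience: str = ""
    owner: str = ""

    def __post_init__(self) -> None:
        # Confidence is a probability, so anything outside [0, 1] is an error.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")
        # A belief with no subject or claim can't be tied to a surface or falsified.
        if not self.subject.strip() or not self.claim.strip():
            raise ValueError("subject and claim must be non-empty")


def parse_belief(raw: str) -> Belief:
    """Parse belief JSON; unknown keys raise TypeError from the constructor."""
    return Belief(**json.loads(raw))
```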

2. Define the variants

Click Run Experiment on the belief. Apex proposes a variant shape based on the subject — for a pricing page toggle it suggests a web_ab_test with two variants. Adjust the variant definitions to match what you'd actually ship.

{
  "type": "web_ab_test",
  "surface": "/pricing",
  "variants": [
    {
      "id": "control",
      "name": "Monthly default",
      "weight": 0.5,
      "patch": { "pricing.defaultCadence": "monthly" }
    },
    {
      "id": "treatment",
      "name": "Annual default",
      "weight": 0.5,
      "patch": { "pricing.defaultCadence": "annual" }
    }
  ],
  "primary_metric": "trial_to_paid_conversion",
  "guardrail_metrics": ["refund_rate_30d", "support_ticket_rate"]
}

Notice the patch field. Apex experiments are code, not runtime DOM hacks — the variant is a structured diff your frontend reads from the SDK. This is the difference between an experiment that evaporates in a dashboard and one that becomes a pull request when you promote the winner.
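Apex assigns variants and your frontend reads the patch from its SDK, so this is for intuition only: a minimal Python sketch of what the `weight` and `patch` fields mean to a frontend. It uses the usual technique for weighted splits (hash a stable session or user ID into [0, 1) and walk the cumulative weights), then applies the chosen variant's dotted-key patch to a config object. `assign_variant`, `apply_patch`, and the `base_config` shape are hypothetical; this is not Apex's assignment algorithm or SDK API.

```python
import copy
import hashlib

# Variant definitions copied from the experiment JSON above.
VARIANTS = [
    {"id": "control", "weight": 0.5, "patch": {"pricing.defaultCadence": "monthly"}},
    {"id": "treatment", "weight": 0.5, "patch": {"pricing.defaultCadence": "annual"}},
]


def assign_variant(unit_id: str, experiment_id: str, variants: list[dict]) -> dict:
    """Deterministically map a unit (session or user) to a variant by weight."""
    # Hashing (experiment, unit) keeps assignment sticky for a unit while
    # staying independent across experiments.
    digest = hashlib.sha256(f"{experiment_id}:{unit_id}".encode()).digest()
    # Top 53 bits of the hash -> float uniformly spread over [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    cumulative = 0.0
    for variant in variants:
        cumulative += variant["weight"]
        if bucket < cumulative:
            return variant
    return variants[-1]  # absorbs float rounding when weights sum to ~1.0


def apply_patch(config: dict, patch: dict) -> dict:
    """Return a copy of `config` with each dotted-path key in `patch` set."""
    result = copy.deepcopy(config)
    for dotted_key, value in patch.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # create missing levels
        node[leaf] = value
    return result


# Hypothetical page config; only the patched key changes.
base_config = {"pricing": {"defaultCadence": "monthly", "currency": "USD"}}
variant = assign_variant("session-123", "exp_demo", VARIANTS)
page_config = apply_patch(base_config, variant["patch"])
```

Because the patch is data rather than a DOM edit, promoting a winner can mean writing that same key/value into source instead of removing runtime code.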

3. Pick the stats engine

Apex ships two decision models. Pick based on what kind of question you're asking, not based on what sounds smart in a meeting.

  • Frequentist (fixed-horizon): use when the cost of running the experiment is bounded and you want a clean p-value at a pre-specified sample size. Good for pricing changes, onboarding copy, anything where "wait two weeks, then decide" is acceptable.
  • Thompson sampling (multi-armed bandit): use when exposure is expensive and you want traffic to shift to the winner as evidence accumulates. Good for paid creative, email send-time optimization, push-notification content — anywhere each impression has a real cost.
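For intuition about how the bandit shifts traffic, here is a minimal Thompson-sampling sketch for a binary conversion metric, assuming Bernoulli outcomes and a uniform Beta(1, 1) prior per arm. The `thompson_pick` function and its input shape are hypothetical; this is the textbook technique, not Apex's engine.

```python
import random


def thompson_pick(stats: dict[str, tuple[int, int]], rng: random.Random) -> str:
    """Choose an arm by sampling each arm's Beta posterior.

    `stats` maps arm id -> (conversions, exposures). With a uniform
    Beta(1, 1) prior, the posterior for a conversion rate is
    Beta(1 + conversions, 1 + exposures - conversions).
    """
    draws = {
        arm: rng.betavariate(1 + conv, 1 + n - conv)
        for arm, (conv, n) in stats.items()
    }
    # Serve the arm with the highest sampled rate; uncertain arms still
    # win some draws, which is what keeps the bandit exploring.
    return max(draws, key=draws.get)
```

Each call serves one impression. Early on, wide posteriors mean both arms get picked; as exposures accumulate, the weaker arm's draws rarely come out on top, so its share of traffic shrinks without a fixed stopping point.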

4. Pick the surface

A belief can be tested on more than one surface. The pricing-toggle belief lives most naturally on the web — but "annual framing increases commitment" could also be tested in onboarding emails or in-app upgrade banners. Apex treats the surface as a first-class choice.

  • Web A/B — the apex.js snippet serves the variant and tracks exposure.
  • Communications — email or push variants with subject lines, send times, or body content.
  • Mobile — SDK-driven variants inside iOS and Android apps, including paywalls and feature flags.
  • Feature flag — server-rendered variant with full type safety from the SDK.

5. Launch

Hit Launch in the UI, or ship it programmatically from your IDE. Either way the experiment lands in the Lab with a running status and starts assigning variants to new sessions within the audience you specified.

curl -X POST https://api.apex.inc/api/experiments \
  -H "x-api-key: $APEX_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "name": "Pricing — Annual default for SMB",
    "belief_id": "bel_8a1f0c",
    "type": "web_ab_test",
    "surface": "/pricing",
    "engine": "frequentist",
    "target_power": 0.8,
    "mde": 0.1,
    "variants": [
      { "id": "control",   "name": "Monthly default", "weight": 0.5,
        "patch": { "pricing.defaultCadence": "monthly" } },
      { "id": "treatment", "name": "Annual default",  "weight": 0.5,
        "patch": { "pricing.defaultCadence": "annual" } }
    ],
    "primary_metric": "trial_to_paid_conversion",
    "guardrail_metrics": ["refund_rate_30d", "support_ticket_rate"]
  }'
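If you'd rather launch from a script than a shell, the same request can be built with Python's standard library. This sketch mirrors the curl command's endpoint, headers, and method; the helper names are hypothetical, and reading `exp_id` as a top-level key of the JSON response is an assumption.

```python
import json
import os
import urllib.request

API_URL = "https://api.apex.inc/api/experiments"


def build_launch_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the same POST the curl command sends (hypothetical helper)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "content-type": "application/json"},
        method="POST",
    )


def launch(payload: dict) -> str:
    """Send the request and return the experiment ID (assumed top-level `exp_id`)."""
    req = build_launch_request(payload, os.environ["APEX_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["exp_id"]
```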

The response includes an exp_id and a link to the Lab view. Apex computes the required sample size from your target power, MDE, and baseline conversion rate, and surfaces a countdown on the experiment card so you know when to stop looking.
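Apex does this calculation for you. To see how the inputs trade off, here is the standard textbook formula for a two-sided, two-proportion z-test, sketched in Python. It assumes α = 0.05 (the payload above doesn't set a significance level), treats `mde` as a relative lift, and uses a hypothetical 20% baseline; Apex's exact method may differ.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, relative_mde: float,
                        power: float = 0.8, alpha: float = 0.05) -> int:
    """Per-arm sample size for a two-sided, two-proportion z-test.

    Assumes `relative_mde` is a relative lift (0.1 = +10% over baseline).
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # treatment rate at exactly the MDE
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)
```

With a 20% baseline and a 10% relative MDE, `sample_size_per_arm(0.20, 0.10)` comes to roughly 6,500 visitors per arm. Because the required sample scales with the inverse square of the effect, halving the MDE roughly quadruples it.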

6. Read the results

In the Lab, open the experiment and click the Results tab. Three sections matter, in order:

  1. Exposure — does each variant have the traffic you expected? If treatment got 38% instead of 50%, you have a sample-ratio mismatch and something in the assignment layer is broken. Stop before reading further.
  2. Primary metric — the lift, the confidence interval, and the significance verdict. Apex shows both a verbal call ("Treatment wins with 94% probability") and the underlying math so a stats-literate reviewer can check the work.
  3. Guardrails — refunds, support volume, rage clicks. If any guardrail shifted the wrong way by more than the pre-committed threshold, Apex blocks the winner call even if the primary metric is a runaway.
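The three checks can be sketched in Python to show what each verdict rests on. This is a textbook illustration, not Apex's stats engine: it assumes two arms, a binary primary metric, a frequentist two-proportion z-test, and guardrail changes signed so that positive means worse. Function names and thresholds are hypothetical.

```python
import math
from statistics import NormalDist


def srm_p_value(n_control: int, n_treatment: int, treatment_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a two-arm sample-ratio check."""
    total = n_control + n_treatment
    expected_t = total * treatment_share
    expected_c = total - expected_t
    chi2 = ((n_treatment - expected_t) ** 2 / expected_t
            + (n_control - expected_c) ** 2 / expected_c)
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def two_proportion_test(conv_c: int, n_c: int, conv_t: int, n_t: int,
                        alpha: float = 0.05) -> dict:
    """Two-sided z-test on conversion rates, plus a CI on the absolute difference."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c
    # Pooled standard error under the null hypothesis of equal rates.
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval.
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "relative_lift": diff / p_c,
        "diff": diff,
        "ci": (diff - z_crit * se, diff + z_crit * se),
        "p_value": p_value,
    }


def blocking_guardrails(worsening: dict[str, float],
                        thresholds: dict[str, float]) -> list[str]:
    """Guardrails whose change in the bad direction exceeds the pre-committed limit.

    `worsening` holds relative changes signed so that positive means worse
    (e.g. refund rate up); any non-empty result blocks the winner call.
    """
    return [m for m, change in worsening.items() if change > thresholds[m]]
```

The order matters: if the SRM p-value is tiny (a common convention is below 0.001), the assignment layer is suspect, and any lift or interval computed from those counts can't be trusted.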

7. Promote the winner

When the experiment reaches its stopping condition, Apex shows a Promote Winner button. Promoting does three things at once:

  • Rolls the winning variant to 100% of matching traffic and closes the experiment.
  • Opens a pull request (if your repo is connected) that applies the variant's patch to the source code. The runtime override disappears; the change lives in your codebase.
  • Updates the source belief: confidence moves toward 1 for a won experiment or toward 0 for a lost one, the experiment is attached as evidence, and any dependent beliefs are flagged for review.

That last point is the one most teams miss. The belief graph is how one experiment compounds into the next. A won experiment raises the confidence of its parent belief and the beliefs that IF-chain from it. A lost experiment lowers confidence and automatically surfaces conflicting evidence in the Intelligence tab.
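How Apex computes the new confidence isn't specified here. One standard way to make "moves toward 0 or 1" precise is Bayes' rule in odds form, sketched below in Python. The likelihood ratio is a hypothetical input (how much more likely the observed result is if the belief is true than if it is false), not a field Apex exposes.

```python
def update_confidence(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    likelihood_ratio = P(result | belief true) / P(result | belief false);
    > 1 for a result that supports the belief, < 1 for one that contradicts it.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0:
        raise ValueError("likelihood_ratio must be positive")
    posterior_odds = prior / (1 - prior) * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)
```

Starting from the pricing belief's 0.55, a result four times likelier under the belief lifts confidence to about 0.83; one four times likelier under its negation (likelihood ratio 0.25) drops it to about 0.23.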

What you just did

A belief went in. A variant shipped. Stats decided. The winner became code. The belief graph updated. That's a closed loop — the first one your team has ever captured as a durable artifact instead of a Slack thread.

The point of the first experiment isn't the win. It's the loop. Once you've run it, you can run it a thousand times, and each run starts from what the previous ones taught you.

Now do it again. Pick the next belief from the graph — Apex ranks them by value-of-information — and run it. Within a quarter your backlog will be prioritized by the beliefs that would move the business most if they were resolved, not by whichever teammate spoke loudest in the planning meeting.
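Apex's ranking method isn't described here. As a toy illustration of what value-of-information means, the sketch below uses the textbook expected value of perfect information (EVPI) for a binary ship/don't-ship decision. The payoffs and belief names are hypothetical inputs, and Apex's actual ranking may weigh other factors.

```python
def evpi(p_true: float, gain_if_true: float, loss_if_false: float) -> float:
    """Expected value of perfect information for a ship/don't-ship decision.

    Shipping pays `gain_if_true` if the belief holds and costs `loss_if_false`
    if it doesn't; not shipping pays 0.
    """
    ev_ship = p_true * gain_if_true - (1 - p_true) * loss_if_false
    best_now = max(ev_ship, 0.0)  # best you can do acting on current belief
    # With perfect information you ship only when the belief is true.
    ev_with_info = p_true * gain_if_true
    return ev_with_info - best_now


def rank_beliefs(beliefs: dict[str, tuple[float, float, float]]) -> list[str]:
    """Order hypothetical beliefs (p_true, gain, loss) by EVPI, highest first."""
    return sorted(beliefs, key=lambda name: evpi(*beliefs[name]), reverse=True)
```

A belief at 0.99 has little left to learn, while one near a coin flip with real stakes on both sides is worth resolving first.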