Experimentation Is More Than A/B Testing

When people hear “experimentation” in software, they often think of A/B testing—comparing two versions of a webpage or application screen to see which one performs better. But that narrow framing misses a much bigger truth:

Every line of code is an experiment.

If you’re shipping software, you’re testing a hypothesis—whether you realize it or not. You believe that a new feature, a UI tweak, or a backend optimization will improve an outcome: faster load times, higher conversion, fewer support tickets. That’s what quality is—the degree to which a change delivers its intended impact.

But unless you define what success looks like—and instrument your system to detect departures from that goal—you’re just guessing.

The Scientific Method Applies to Software

Real experimentation requires more than multiple versions. It requires:

  • A clear hypothesis. Example: “This change will reduce checkout drop-off by 10%.”
  • Impact metrics to evaluate your hypothesis 

You need to observe whether your code meaningfully affects the behavior you care about. The measurements that capture that behavior are your impact metrics.

They might represent:

  • The Voice of the Business: conversions, revenue, retention
  • The Voice of the Customer: satisfaction scores, NPS, customer sentiment
  • The Voice of Engineering: latency, CPU usage, error rates

You should know where these metrics stand before the change, and where they land after.

These are the signals that tell you whether a change is truly valuable, not just whether it shipped. As engineers, we care about quality, and impact metrics across a range of variables are how we tell whether we have it.

But let’s be honest—not everything we do is easy to measure with a crisp metric. And that’s okay.

The scientific method still applies, even when our tools or context don’t allow for clean, statistically significant validation. Just like modern physics, where some theories (like string theory) await experimental frameworks that don’t exist yet, we sometimes operate at the edge of what we can observe.

Take something like cleaning up a product’s main menu. We may not expect it to shift NPS or drop support volume tomorrow. But we do it anyway—because it’s the right thing for the user experience, and because we believe small improvements compound.

We still wrap these changes in feature flags—not necessarily to prove something, but to protect against unintended consequences. And we monitor—not always to validate a big win, but to catch regressions, spot unexpected side effects, or just confirm that nothing broke.  Sometimes the feedback is loud—metrics dip, tickets spike. Sometimes it’s silent—and that silence is the data.

This doesn’t make the work unscientific. It makes it iterative. It’s still experimentation—it’s just happening with judgment, craft, and a long view.
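
As a concrete illustration of that protective wrapping, here is a minimal sketch assuming the Node.js unleash-client SDK; the flag name, URL, and token are placeholders, not a prescription:

```typescript
import { initialize } from 'unleash-client';

// Connect to your Unleash instance (URL and token are placeholders).
const unleash = initialize({
  url: 'https://unleash.example.com/api/',
  appName: 'storefront',
  customHeaders: { Authorization: '<client-api-token>' },
});

// The cleaned-up menu ships behind a flag. If anything looks off,
// the flag is turned off in Unleash: no redeploy, no rollback release.
function renderMainMenu(): string {
  if (unleash.isEnabled('new-main-menu')) {
    return renderSimplifiedMenu();
  }
  return renderLegacyMenu();
}

function renderSimplifiedMenu(): string {
  return '<nav><!-- fewer, clearer items --></nav>';
}

function renderLegacyMenu(): string {
  return '<nav><!-- the existing menu --></nav>';
}
```

The point is not to prove the new menu wins. It is to make the change reversible and observable while it settles in.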

From UI Tweaks to Full-Stack Impact

Just as “experimentation” gets narrowed to A/B testing, it also gets narrowed to frontend changes—buttons, colors, layouts. But meaningful software changes rarely stop at the UI.

A “Share” button might increase clicks. But what if it also adds page weight through additional JavaScript and external calls, increases backend load, or slows checkout down the line?

True experimentation means looking at the entire stack. You’re not just testing what users see—you’re measuring the system’s behavior end-to-end.
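
One way to make that end-to-end view concrete is to tag backend measurements with the flag state, so latency and error rates can be compared between the old and new code paths. The sketch below assumes the Node.js unleash-client SDK; recordLatency and the two checkout functions are hypothetical stand-ins for whatever handlers and metrics pipeline you already have:

```typescript
import { performance } from 'node:perf_hooks';
import { initialize } from 'unleash-client';

const unleash = initialize({
  url: 'https://unleash.example.com/api/',
  appName: 'checkout-service',
  customHeaders: { Authorization: '<client-api-token>' },
});

// Hypothetical helper: forward a timing to your existing metrics pipeline.
function recordLatency(metric: string, variant: string, ms: number): void {
  console.log(`${metric}{variant="${variant}"} ${ms.toFixed(1)}ms`);
}

// Hypothetical stand-ins for the two checkout code paths.
async function checkoutWithShareButton(cartId: string): Promise<void> { /* new path */ }
async function checkoutLegacy(cartId: string): Promise<void> { /* existing path */ }

async function handleCheckout(cartId: string): Promise<void> {
  const shareEnabled = unleash.isEnabled('share-button');
  const started = performance.now();

  if (shareEnabled) {
    await checkoutWithShareButton(cartId);
  } else {
    await checkoutLegacy(cartId);
  }

  // Tagging the timing with the flag state lets you compare the two
  // code paths end to end, not just at the UI layer.
  recordLatency('checkout_duration', shareEnabled ? 'share-button' : 'control', performance.now() - started);
}
```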

You Don’t Need Two Versions to Experiment

A common myth is that experimentation requires variants. But most software teams aren’t running hundreds of concurrent A/B tests. They’re shipping feature requests, addressing bugs, and responding to customer and internal feedback.

And each of those changes is an experiment, whether or not there’s a specific control group.

The real challenge isn’t the number of variants. It’s defining what “better” looks like.

  • Does engineering think it’s better? (Latency was reduced.)
  • Does marketing think it’s better? (Conversion improved.)
  • Does customer support think it’s better? (Tickets declined.)

If you have a hypothesis and a way to measure success across those domains, you’re experimenting. If not, you’re just hoping.

Define “Better” Before You Ship

Before enabling a feature, ask: What does “better” look like?

If nothing changes, is that a win? If conversion drops but performance improves, is the tradeoff worth it?

Being clear about what you’re trying to improve—and how you’ll know—transforms ordinary software development into a learning process.

Where Unleash Fits Into Experimentation Culture

If this is how you want to work—hypothesis-driven, data-informed, full-stack—you need tools that support that mindset. That’s where Unleash comes in.

Unleash enables:

  • Gradual, targeted rollouts to test changes safely
  • Real-time monitoring to catch regressions early
  • Rollbacks without redeployments
  • Impact metrics that span engineering, business, and customer outcomes, wherever those metrics live in your stack

Unleash is a platform built for teams that want to move fast—but also want to know what’s working.
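
As a minimal sketch of the first two bullets above (again assuming the Node.js unleash-client SDK and a hypothetical flag name), evaluating a flag with user context lets a gradual rollout strategy configured in Unleash bucket users consistently, and turning the flag off acts as a rollback without a redeploy:

```typescript
import { initialize } from 'unleash-client';

const unleash = initialize({
  url: 'https://unleash.example.com/api/',
  appName: 'web-frontend',
  customHeaders: { Authorization: '<client-api-token>' },
});

function shouldSeeNewCheckout(userId: string): boolean {
  // The userId in the context lets a gradual rollout strategy bucket users
  // consistently (10% today, 50% next week, then everyone). If impact
  // metrics dip, flipping the flag off rolls everyone back instantly.
  return unleash.isEnabled('new-checkout-flow', { userId });
}
```

The rollout percentage lives in Unleash rather than in the code, which is what makes ramping up and rolling back possible without touching a deployment.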

Experimentation isn’t a step in the process. It is the process.

And with the right mindset—and the right tools—every change becomes a chance to make the product better.  

Interested in implementing full-stack experimentation for yourself? Try Unleash for free.
