
Full-stack experimentation: How not to screw it up

Research shows clear relationships between performance measures like Core Web Vitals and business metrics. Amazon found that a 100ms latency increase results in a 1% sales decrease, while Walmart discovered that a 1-second load time improvement leads to a 2% conversion increase. Google’s research revealed that when page load time increases from 1 second to 3 seconds, the probability of a bounce increases by 32%.

These aren’t outliers. Multiple studies confirm that slower sites convert less. Yet standard experimentation programs track business metrics while ignoring critical performance indicators: Largest Contentful Paint (LCP) changes, Interaction to Next Paint (INP) impact, Cumulative Layout Shift (CLS) degradation, bundle size increases, server response time changes, and the performance impact of the testing tool itself.

And yet, as one of our engineers observed:

“The truth is, hardly anyone takes web performance budgeting seriously — it’s often treated as a second-class citizen.” – Mateusz Kwasniewski

Without this data, you can’t determine if conversion changes result from your feature, from performance side effects, or from the experimentation methodology itself.

The experimentation tax

Experimentation platforms often degrade Core Web Vitals through several mechanisms: DOM manipulation, when visual editors modify page content after load and cause layout shifts; JavaScript overhead, as testing libraries add to bundle size and execution time; extra network requests, from the additional API calls needed to determine test variants; and content flicker, the brief flash of original content before test variations are applied.

These performance impacts can be significant enough to affect conversion rates independently of the features being experimented on. Engineering teams see what marketing teams often miss: experimentation overhead accumulates. Each additional experiment adds more JavaScript to execute, additional network requests, more potential layout shifts, and other side effects.

A single experiment might add 50ms to page load time. Ten concurrent experiments might not add ten times that, but they will add measurably more. This cumulative degradation often goes unmeasured in individual experiment analyses.

So how do you keep track of your experimentation program’s impact on performance?

Establish experimentation program budgets

Set performance budgets for your experimentation program. Define page-level budgets, for example a maximum acceptable LCP increase of 200ms, a maximum CLS degradation of 0.05, and a maximum bundle size increase of 10%. Then establish program-level budgets covering total experimentation overhead across all concurrent experiments, the maximum number of simultaneous experiments per page, and performance impact limits for different page types.
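
One way to make these budgets enforceable is to encode them as data that a CI check or monitoring job can evaluate. Here is a minimal sketch; the EXPERIMENT_BUDGETS object, the checkPageBudget() helper, and the program-level numbers are illustrative, not a specific tool’s API:

// Illustrative budget definition; page-level thresholds mirror the numbers above,
// program-level values are placeholders to tune for your own site
const EXPERIMENT_BUDGETS = {
  page: {
    maxLcpIncreaseMs: 200,
    maxClsIncrease: 0.05,
    maxBundleSizeIncreasePct: 10
  },
  program: {
    maxSimultaneousExperimentsPerPage: 3,
    maxTotalOverheadMs: 500
  }
};

// Compare measured metrics against the pre-experiment baseline
function checkPageBudget(baseline, measured) {
  const violations = [];
  if (measured.lcp - baseline.lcp > EXPERIMENT_BUDGETS.page.maxLcpIncreaseMs) {
    violations.push('LCP increase over budget');
  }
  if (measured.cls - baseline.cls > EXPERIMENT_BUDGETS.page.maxClsIncrease) {
    violations.push('CLS increase over budget');
  }
  const bundleGrowthPct =
    ((measured.bundleBytes - baseline.bundleBytes) / baseline.bundleBytes) * 100;
  if (bundleGrowthPct > EXPERIMENT_BUDGETS.page.maxBundleSizeIncreasePct) {
    violations.push('Bundle size increase over budget');
  }
  return violations;
}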

Measure cross-team impact

Track metrics that matter to both teams: conversion rates, revenue impact, and experiment velocity for marketing; Core Web Vitals, error rates, and page load times for engineering. Both teams should watch user satisfaction scores, bounce rates, and task completion rates.

Set performance baselines

Before testing, measure LCP, INP, and CLS for all test pages, along with bundle sizes, server response times, and error rates. This baseline data provides the foundation for understanding performance impact during experiments.
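
Here is a minimal sketch of baseline collection, assuming Google’s open-source web-vitals library is available; the /metrics/baseline endpoint is a placeholder for wherever you store these numbers:

import { onCLS, onINP, onLCP } from 'web-vitals';

// Persist each baseline metric; swap sendBeacon for your own analytics call
function storeBaseline(metric) {
  navigator.sendBeacon('/metrics/baseline', JSON.stringify({
    name: metric.name,   // 'LCP', 'INP', or 'CLS'
    value: metric.value,
    page: location.pathname
  }));
}

onLCP(storeBaseline);
onINP(storeBaseline);
onCLS(storeBaseline);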

Measure experimentation tool overhead

Compare Core Web Vitals between pages with testing scripts loaded versus clean pages, users enrolled in tests versus users not in any tests, and different testing platform implementations. Document the performance baseline cost of your testing infrastructure to understand the inherent overhead.
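
One rough way to quantify the tool’s own network cost is the Resource Timing API. In this sketch, 'experiment-sdk' is a hypothetical substring of your testing platform’s script URL, and analytics.track() stands in for whatever reporting call you use:

// Find the experimentation script among the resource timing entries
const sdkEntry = performance
  .getEntriesByType('resource')
  .find((entry) => entry.name.includes('experiment-sdk'));

if (sdkEntry) {
  analytics.track('experimentation_tool_overhead', {
    transfer_size_bytes: sdkEntry.transferSize,
    duration_ms: sdkEntry.duration
  });
}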

Monitor during experiments

Track Core Web Vitals by test variant using code like this:

// Track LCP with test variant context; analytics.track(), getCurrentVariant(),
// and getUserId() are placeholders for your own reporting and assignment helpers
new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries()) {
    analytics.track('performance_metric', {
      metric: 'LCP',
      value: entry.startTime,
      test_variant: getCurrentVariant(),
      user_id: getUserId()
    });
  }
}).observe({ type: 'largest-contentful-paint', buffered: true });
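
The same tagging extends to INP and CLS. Rather than hand-rolling an observer for each metric, one option is to reuse the web-vitals helpers from the baseline step; getCurrentVariant() and getUserId() are the same placeholder helpers as above:

import { onCLS, onINP } from 'web-vitals';

// Report INP and CLS with the same variant context as the LCP tracking
function reportWithVariant(metric) {
  analytics.track('performance_metric', {
    metric: metric.name,
    value: metric.value,
    test_variant: getCurrentVariant(),
    user_id: getUserId()
  });
}

onINP(reportWithVariant);
onCLS(reportWithVariant);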

Establish performance thresholds

Set automatic flags – for example, LCP degradation greater than 2.5s, INP increase greater than 200ms, and CLS increase greater than 0.1. These thresholds help identify when experiments are causing significant performance issues.
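
Here is a minimal sketch of how those flags might be evaluated, assuming you aggregate per-variant metrics somewhere and have an alerting hook; THRESHOLDS, alertOnCall(), and the metric objects are all illustrative:

// Deltas are variant minus control, in the units used above (ms for LCP and INP)
const THRESHOLDS = { LCP: 2500, INP: 200, CLS: 0.1 };

function flagDegradedExperiment(experimentId, variantMetrics, controlMetrics) {
  for (const metric of Object.keys(THRESHOLDS)) {
    const delta = variantMetrics[metric] - controlMetrics[metric];
    if (delta > THRESHOLDS[metric]) {
      alertOnCall(`${experimentId}: ${metric} degraded by ${delta} vs. control`);
    }
  }
}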

Google Analytics setup

You can also create custom dimensions for test variants and performance metrics. For example:

// Map custom dimensions to the test and performance parameters sent below
gtag('config', 'GA_MEASUREMENT_ID', {
  custom_map: {
    'dimension1': 'test_variant',
    'dimension2': 'performance_cohort'
  }
});

// Track conversion with performance context
gtag('event', 'conversion', {
  'test_variant': getTestVariant(),
  'lcp_value': getCurrentLCP(),
  'performance_cohort': getPerformanceCohort()
});

Resolving the marketing-engineering tension

If you’re an engineer reading this, you know exactly what we’re talking about. If you’re a marketer, you’re probably concerned that issues like these will slow down the velocity of your testing program. Here are some ways around this.

Shared success metrics

Define success criteria that align both teams by focusing on revenue per visitor (which accounts for both conversion rates and traffic quality), customer lifetime value (which measures long-term impact of user experience), task completion rates (which balance optimization with usability), and Net Promoter Score (which captures user satisfaction with performance and features).

Experimentation governance framework

Establish processes that balance experimentation velocity with performance. An experiment approval process should require a performance impact assessment for every experiment, engineering review for experiments that affect Core Web Vitals, automatic performance monitoring for all running experiments, and rollback procedures for performance degradation.

Resource allocation should include engineering time for experimentation infrastructure optimization, a marketing budget that covers performance monitoring tools, and shared accountability for overall site performance metrics.

Performance-first experimentation strategy

Create a priority framework for experiment selection: high-priority experiments have minimal performance impact and high conversion potential; medium-priority experiments have moderate performance trade-offs but strong business cases; low-priority experiments require significant performance compromises.

This creates alignment by making performance impact visible in experimentation decisions.

Using feature flags to monitor variant performance

Feature flags let you control experiment rollouts while monitoring performance. Take this pseudo-code, for example:

const testVariant = unleash.getVariant('checkout-redesign', {
  userId: user.id
});

if (testVariant.name === 'new_design') {
  loadNewCheckoutFlow();
  trackPerformanceMetrics('checkout_new');
} else {
  loadCurrentCheckoutFlow();
  trackPerformanceMetrics('checkout_control');
}

This approach lets you roll back immediately if performance degrades without waiting for deployment cycles.
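
For completeness, here is one possible shape for the trackPerformanceMetrics() helper used above, again leaning on the web-vitals library. Treat it as a sketch rather than part of the Unleash SDK, which handles flag evaluation, not metrics collection:

import { onCLS, onINP, onLCP } from 'web-vitals';

// Report Core Web Vitals tagged with the cohort that produced them,
// e.g. 'checkout_new' or 'checkout_control'
function trackPerformanceMetrics(cohort) {
  const report = (metric) => {
    analytics.track('performance_metric', {
      metric: metric.name,
      value: metric.value,
      cohort: cohort
    });
  };
  onLCP(report);
  onINP(report);
  onCLS(report);
}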

Experiments that ignore performance data give an incomplete picture of user experience impact

By measuring Core Web Vitals alongside conversion metrics, you can avoid launching changes that appear successful but actually hurt business outcomes.

The marketing-engineering divide over experimentation programs stems from measuring different outcomes. Marketing teams see conversion improvements from individual experiments. Engineering teams see cumulative performance degradation from experimentation infrastructure. Both perspectives are valid and necessary.

Performance isn’t separate from conversion—it’s a direct driver of conversion. Teams that measure both make better decisions and build better user experiences. Organizations that align marketing and engineering around shared performance-conversion metrics build sustainable competitive advantages.
