Browser tests that fail only sometimes are annoying enough when the root cause is a race condition or a locator problem. They become much harder when the failure is tied to state the browser keeps between runs, especially service workers, Cache Storage, and offline-like behavior. A test may pass on a clean profile, then fail on the second or third run because the browser reuses an asset cached by a previous session, a service worker intercepts a request, or a page thinks it is offline even though the machine has network access.

These failures are common in teams that run test automation against modern web apps with progressive web app features, aggressive caching, or data-heavy frontends. They are also a frequent source of confusion in continuous integration systems where browsers are reused for speed, profiles persist across retries, or parallel jobs interfere with shared test fixtures. If you are investigating flaky browser tests caused by service workers, the goal is not just to make the current test pass. The goal is to understand which layer is hiding the real behavior of the app.

This guide focuses on practical debugging techniques for SDETs, QA engineers, frontend engineers, and test infrastructure owners. It explains how to identify service worker interference, how to inspect cached assets and offline state, and how to make browser automation more deterministic without disabling useful production behavior everywhere.

Why these failures feel different from ordinary flakiness

Most flaky UI tests are caused by timing. The element is not ready yet, the API response was slower than usual, or the test clicked before the page settled. Cache and service worker problems feel different because the test often fails in ways that look logically impossible:

  • The page loads, but shows stale data.
  • A request appears to succeed, but the UI never updates.
  • The app works in a fresh browser context, then breaks on rerun.
  • A test passes in headed mode, but fails headless in CI.
  • A page is reported offline even though the browser can reach the network.

If a browser test changes behavior when you reuse the same profile, suspect storage or service worker state before you suspect the app code.

The reason is that browser runtime state is layered. Your test interacts with page JavaScript, but the network request may be intercepted by a service worker, the response may come from a cache, and the page’s own offline indicator may derive from navigator.onLine, a failed fetch, or some custom app state. A single failed assertion can be the result of a chain of decisions made long before your test action.

The three state layers that usually cause trouble

1. Service workers

A service worker runs separately from the page and can intercept requests, cache responses, and serve assets offline. That makes it useful in production, but risky in tests if you do not control registration and updates.

Common failure patterns:

  • The first test run registers a worker and caches assets.
  • The second test run loads the page from the worker’s cached response, not the network.
  • A code change has been deployed, but the worker still serves an old shell or old API response.
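  • A worker update is available, but the test page keeps using the old active worker until every page controlled by the old worker is closed, unless the new worker calls skipWaiting(). A plain reload is not enough to switch.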

2. Cache Storage and HTTP cache

Cache Storage is a browser API often used by service workers. It is distinct from the normal HTTP cache, but both can affect what your test sees. An app might read assets or JSON from Cache Storage, while the browser itself may satisfy a fetch from its HTTP cache before the request ever reaches the network.

This matters when your assertions depend on:

  • request counts,
  • updated response bodies,
  • cache invalidation after login or logout,
  • request headers like authentication tokens,
  • per-test mock data.

3. Offline or offline-like state

Browsers support explicit offline simulation in automation, but apps also infer offline state from failed network calls, timeouts, or service worker behavior. You can end up with a test that is not truly offline, yet the page logic thinks it is.

This can happen when:

  • a service worker falls back to an offline shell,
  • a proxy or browser context blocks requests,
  • network stubbing is incomplete,
  • the app has retry logic that masks the original failure,
  • the app persists its own connectivity flag (for example, in localStorage) and reads a stale value after a reload, even though navigator.onLine itself is never stored.

Start by proving which layer is responsible

Before changing code, isolate the failure mode. Do not assume the service worker is the culprit just because the app is a PWA.

Check whether the failure depends on browser profile reuse

Run the same test in these modes:

  • fresh browser context each run,
  • same browser but a new context,
  • same user data directory or profile across runs.

If the failure only appears when the profile is reused, storage or service worker state is likely involved. If it appears only after the first navigation, the worker may be registering during the test and affecting later requests.
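The three runs above can be summarized in a small helper so the verdict is recorded next to the results. This is a hypothetical sketch: the mode names and verdict strings are illustrative, not output from any test runner, and a single run of a flaky test proves less than a pattern across several reruns.

```typescript
// Result of running the same test under each isolation mode.
type Outcome = "pass" | "fail";

interface IsolationResults {
  freshBrowser: Outcome;          // new browser process and new context per run
  newContextSameBrowser: Outcome; // shared browser process, isolated context
  reusedProfile: Outcome;         // same user data directory across runs
}

// Hypothetical triage helper: maps the pattern of results to the layer
// worth inspecting first. It does not prove a cause on its own.
function inferStateLayer(r: IsolationResults): string {
  if (r.freshBrowser === "fail") {
    return "fails even when clean: check timing, app code, or the backend first";
  }
  if (r.newContextSameBrowser === "fail") {
    return "fails with an isolated context: check shared fixtures and server-side state";
  }
  if (r.reusedProfile === "fail") {
    return "profile-scoped state: inspect service workers, Cache Storage, and HTTP cache";
  }
  return "no difference between modes: rerun more times or look for timing issues";
}
```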

Compare a clean profile with a dirty profile

A simple diagnostic is to run once on a clean profile, then intentionally rerun against the same profile without deleting browser storage. If the second run fails, you have a reproducibility clue.

Useful artifacts to compare:

  • network log,
  • console messages,
  • application storage snapshot,
  • response bodies for the same URL,
  • navigator.serviceWorker.controller state,
  • cache entries under Cache Storage.

Confirm whether the browser is offline or just behaving like it is

If the app shows offline UI, distinguish these cases:

  • the browser context is explicitly offline,
  • the fetch request is blocked or mocked,
  • the service worker intercepted the request and returned fallback content,
  • the app’s own connectivity probe failed.

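In Playwright, for example, you can force the context online with context.setOffline(false), read navigator.onLine from the page, and log requests, responses, and failures to see whether the page is really reaching the network.

typescript

await context.setOffline(false); // make sure the context is not simulating offline mode
// Attach listeners before page.goto() so requests made during page load are captured.
page.on('request', request => console.log('request', request.method(), request.url()));
page.on('response', response => console.log('response', response.status(), response.url()));
page.on('requestfailed', request => console.log('failed', request.url(), request.failure()?.errorText));
// After navigation: what the page itself believes about connectivity.
console.log('navigator.onLine', await page.evaluate(() => navigator.onLine));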
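Once you have those logs, the four offline cases listed earlier can be separated mechanically. The sketch below is a hypothetical helper, not a Playwright API: `OfflineObservations` and its fields are assumed names that you would fill from `context.setOffline()`, `navigator.onLine`, `request.failure()`, and `response.fromServiceWorker()`, and the order of the checks is one reasonable choice among several.

```typescript
// Facts a test can collect after the app shows its offline UI.
interface OfflineObservations {
  contextOffline: boolean;     // what the test passed to context.setOffline()
  navigatorOnLine: boolean;    // page.evaluate(() => navigator.onLine)
  failedRequests: string[];    // URLs where request.failure() was non-null
  swServedResponses: string[]; // URLs where response.fromServiceWorker() was true
  probeUrl?: string;           // hypothetical: the URL your app pings to decide it is online
}

type OfflineCause =
  | "browser-offline"
  | "connectivity-probe-failed"
  | "request-blocked-or-mocked"
  | "service-worker-fallback"
  | "unknown";

// Checks the cheapest explanations first; the order is a judgment call.
function classifyOfflineSymptom(o: OfflineObservations): OfflineCause {
  if (o.contextOffline || !o.navigatorOnLine) return "browser-offline";
  if (o.probeUrl !== undefined && o.failedRequests.includes(o.probeUrl)) {
    return "connectivity-probe-failed";
  }
  if (o.failedRequests.length > 0) return "request-blocked-or-mocked";
  // A worker-served response is only a hint that the worker returned fallback content.
  if (o.swServedResponses.length > 0) return "service-worker-fallback";
  return "unknown";
}
```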

How to inspect service worker behavior

Look for worker registration and activation

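If your app registers a service worker, find out when it happens and what scope it covers. A worker registered during a page load does not control that page by default, only later navigations within its scope, unless it calls clients.claim() when it activates. That is why many flaky tests pass on the first load and change behavior on the next navigation.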

For debugging, inspect:

  • registration scope,
  • worker script URL,
  • activation status,
  • whether the current page is controlled,
  • whether update checks are happening during test execution.

In browser devtools, the Application panel is helpful. In automation, you can often evaluate a few properties on the page.

typescript

const controlled = await page.evaluate(() => Boolean(navigator.serviceWorker?.controller));
console.log({ controlled });

If controlled is true, requests from that page can be intercepted by the service worker.
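A slightly fuller page-side probe can collect the registration details listed above in one call. This is a sketch, not a library API: `readServiceWorkerState` is a hypothetical helper meant to be passed directly to `page.evaluate(readServiceWorkerState)`, which serializes the function, so it cannot use anything defined outside its own body. The optional `container` parameter exists only so the logic can be exercised with a fake object outside a browser.

```typescript
interface ServiceWorkerState {
  supported: boolean;
  controlled: boolean;
  scope: string | null;
  activeScript: string | null;
  activeState: string | null;
  waiting: boolean; // a newer worker is installed but not yet in control
}

// Page-side helper: uses navigator.serviceWorker unless a container is passed in.
async function readServiceWorkerState(container?: any): Promise<ServiceWorkerState> {
  const sw = container ?? (globalThis as any).navigator?.serviceWorker;
  if (!sw) {
    return { supported: false, controlled: false, scope: null, activeScript: null, activeState: null, waiting: false };
  }
  // getRegistration() with no argument looks up the registration for the current page URL.
  const reg = await sw.getRegistration();
  return {
    supported: true,
    controlled: Boolean(sw.controller),
    scope: reg?.scope ?? null,
    activeScript: reg?.active?.scriptURL ?? null,
    activeState: reg?.active?.state ?? null,
    waiting: Boolean(reg?.waiting),
  };
}
```

A `waiting: true` result is worth noting on its own: it means a newer worker is installed but not yet in control, which is exactly the state in which reruns can behave differently.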

Temporarily disable service worker caching in test builds

The cleanest test strategy is often to disable registration in test builds or behind an environment flag. If the app does not need offline behavior under test, do not load worker code in the test environment.

Typical options:

  • gate registration on NODE_ENV !== 'test',
  • use a build-time flag for E2E environments,
  • register only in production domains,
  • unregister in a test-only setup step.
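The gating options can be combined into one predicate that the app checks before calling `navigator.serviceWorker.register()`. A minimal sketch, assuming your bundler inlines `process.env.NODE_ENV` and that you add an E2E build flag; `shouldRegisterServiceWorker`, `RegistrationEnv`, `e2eBuild`, and `PRODUCTION_HOSTS` are hypothetical names.

```typescript
// Hypothetical production host list; replace with your own domains.
const PRODUCTION_HOSTS = new Set(["app.example.com"]);

interface RegistrationEnv {
  nodeEnv: string | undefined; // e.g. process.env.NODE_ENV inlined by the bundler
  e2eBuild: boolean;           // hypothetical build-time flag set for E2E bundles
  hostname: string;            // location.hostname at runtime
}

// Combines the gating options; most apps need only one or two of them.
function shouldRegisterServiceWorker(env: RegistrationEnv): boolean {
  if (env.nodeEnv === "test") return false;
  if (env.e2eBuild) return false;
  return PRODUCTION_HOSTS.has(env.hostname);
}
```

In app code this would wrap the existing call, roughly `if ('serviceWorker' in navigator && shouldRegisterServiceWorker(env)) navigator.serviceWorker.register('/sw.js')`, where `/sw.js` stands in for your worker script.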

This is not always practical if you specifically need to test PWA behavior. In that case, isolate the PWA scenarios into a dedicated suite, and keep the rest of your tests on a fresh profile with worker registration blocked or removed.

Verify updates explicitly

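Service workers update asynchronously. A test may pass on the first load because the old worker still handles requests, then fail after the browser finds a newer worker and control switches mid-suite, either because the new worker calls skipWaiting() and clients.claim(), or because every page using the old worker was closed and a later navigation picks up the new one.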

To make update behavior visible during debugging, log worker lifecycle events in the app or inspect them in automation. If the app exposes a page-level event bus or telemetry hook, record worker transitions in test logs.
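If the app has no such hook, the test can inject one. A sketch, assuming Playwright: `logServiceWorkerLifecycle` is a hypothetical function meant for `page.addInitScript(logServiceWorkerLifecycle)`, so it runs before app code and must be self-contained. It listens to the standard `controllerchange`, `updatefound`, and `statechange` events; the `[sw]` prefix is an arbitrary choice, and the optional parameters exist only so the logic can be exercised with fakes.

```typescript
// Page-side script: logs service worker transitions to the console.
function logServiceWorkerLifecycle(
  container?: any,
  log: (msg: string) => void = (m) => console.log(m),
): void {
  const sw = container ?? (globalThis as any).navigator?.serviceWorker;
  if (!sw) return; // no service worker support in this context

  log(`[sw] initial controller: ${sw.controller?.scriptURL ?? "none"}`);

  // Fires when a different worker takes control of this page,
  // for example after skipWaiting() plus clients.claim() in the new worker.
  sw.addEventListener("controllerchange", () => {
    log(`[sw] controllerchange -> ${sw.controller?.scriptURL ?? "none"}`);
  });

  // ready resolves once the registration has an active worker; from then on,
  // updatefound reports a newer worker being installed.
  sw.ready.then((reg: any) => {
    reg.addEventListener("updatefound", () => {
      const worker = reg.installing;
      if (!worker) return;
      log(`[sw] updatefound ${worker.scriptURL}`);
      worker.addEventListener("statechange", () => {
        log(`[sw] ${worker.scriptURL} -> ${worker.state}`);
      });
    });
  });
}
```

Forward the lines into the test log with `page.on('console', msg => { if (msg.text().startsWith('[sw]')) console.log(msg.text()); })`. On a first install, `updatefound` fires before `ready` resolves, so that transition only shows up as a `controllerchange` if the worker claims the page; the hook is aimed at later updates, which is where mid-suite switches come from.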

How to inspect Cache Storage and HTTP cache

Dump Cache Storage entries during a failing run

Cache Storage is often the hidden source of stale responses. You can inspect it from the page context.

typescript

const cacheNames = await page.evaluate(async () => await caches.keys());
console.log(cacheNames);

const entries = await page.evaluate(async () => {
  const names = await caches.keys();
  const result: Record<string, string[]> = {};
  for (const name of names) {
    const cache = await caches.open(name);
    const requests = await cache.keys();
    result[name] = requests.map(r => r.url);
  }
  return result;
});
console.log(entries);

If a failed test is reading a stale HTML shell or JSON response from cache, the URLs in this dump usually make the cause obvious.
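To compare a clean run with a dirty run, the two dumps can be diffed instead of read by eye. A sketch: `diffCacheSnapshots` is a hypothetical helper that takes two objects shaped like the dump above (cache name mapped to request URLs). An entry that exists only in the dirty run, or a cache whose name carries an older version, is often the first thing to check, because it suggests the worker's cleanup of old caches did not run before the test read from them.

```typescript
// Cache name -> request URLs, as produced by a Cache Storage dump.
type CacheSnapshot = Record<string, string[]>;

interface CacheDiff {
  cache: string;
  url: string;
  onlyIn: "clean" | "dirty";
}

// Lists entries present in one run's snapshot but not the other's, sorted for stable output.
function diffCacheSnapshots(clean: CacheSnapshot, dirty: CacheSnapshot): CacheDiff[] {
  const diffs: CacheDiff[] = [];
  const names = Array.from(new Set([...Object.keys(clean), ...Object.keys(dirty)])).sort();
  for (const cache of names) {
    const a = new Set(clean[cache] ?? []);
    const b = new Set(dirty[cache] ?? []);
    for (const url of Array.from(b).sort()) {
      if (!a.has(url)) diffs.push({ cache, url, onlyIn: "dirty" });
    }
    for (const url of Array.from(a).sort()) {
      if (!b.has(url)) diffs.push({ cache, url, onlyIn: "clean" });
    }
  }
  return diffs;
}
```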

Remember that browser HTTP cache is separate

Disabling service workers does not necessarily disable browser HTTP caching. If your failure is due to 304 Not Modified behavior or asset reuse, you may need to:

  • create a new browser context,
  • set cache-related headers in the app or test environment,
  • append cache-busting query params for deterministic test fixtures,
  • clear browser data between runs.
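For the cache-busting option, a small helper can add a per-run query parameter so that an HTTP cache entry stored by an earlier run no longer matches the URL. A sketch, assuming absolute URLs and a server that ignores unknown query parameters; `withCacheBuster` and the `__test_run` parameter name are hypothetical.

```typescript
// Hypothetical query parameter name; the server must ignore it.
const CACHE_BUSTER_PARAM = "__test_run";

// Returns `url` with a per-run marker. Entries cached by earlier runs were stored
// under a different URL, so they cannot be reused. Throws on relative URLs.
function withCacheBuster(url: string, runId: string): string {
  const u = new URL(url);
  u.searchParams.set(CACHE_BUSTER_PARAM, runId); // replaces any earlier marker
  return u.toString();
}
```

For example, `page.goto(withCacheBuster('https://example.test/dashboard', runId))`, where `runId` is a hypothetical value from your CI. This only changes URLs the test builds itself; assets the page references internally keep their original URLs.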

For HTTP-level debugging, logs from a proxy, a browser network trace, or a test runner HAR can be more informative than a UI screenshot.

Avoid overusing global cache disabling

It is tempting to disable all caching for all tests. That can mask the real issue and slow the suite enough that timing shifts create new flakes.

A better approach is usually:

  • disable caching only in the suites that do not test cache behavior,
  • keep one explicit suite for cache and service worker behavior,
  • make test data versioned so stale responses are obvious.

When offline state is the real problem

Offline bugs often show up when automation is run on CI workers with strict network controls, local Dockerized browsers, or proxies that make some requests fail intermittently.

Distinguish app offline UI from browser offline mode

An app might show offline UI because one API call failed, even though the browser is online. That means the root cause could be:

  • backend availability,
  • DNS or proxy resolution,
  • certificate problems,
  • blocked third-party assets,
  • an auth redirect that did not complete,
  • a service worker fallback route.

If the test only checks the visible offline banner, it may hide the original network issue. Add request-level logging and assert the exact endpoint or asset that failed.
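Finding the failing endpoint is easy to automate once the test keeps a simple network log. A sketch: `NetworkEntry` and `describeFirstFailure` are hypothetical, and the entries would be built from Playwright's `response` events (`response.status()`) and `requestfailed` events (`request.failure()?.errorText`), or from a HAR file.

```typescript
// Minimal shape of a logged network event.
interface NetworkEntry {
  method: string;
  url: string;
  status?: number;      // present when a response arrived
  failureText?: string; // present when the request never completed
}

// Describes the first request that failed at the network level or returned
// an HTTP error status, or returns null if nothing failed.
function describeFirstFailure(log: NetworkEntry[]): string | null {
  const bad = log.find(
    (e) => e.failureText !== undefined || (e.status !== undefined && e.status >= 400),
  );
  if (!bad) return null;
  const reason = bad.failureText ?? `HTTP ${bad.status}`;
  return `${bad.method} ${bad.url} failed: ${reason}`;
}
```

An assertion such as `expect(describeFirstFailure(log)).toBeNull()` then fails with the endpoint and reason in its message, which says far more than a missing or present offline banner.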

Reproduce offline behavior intentionally

You should have at least one test that deliberately simulates offline mode. This gives you a reference for what genuine offline behavior looks like in your app.

Playwright example:

typescript

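await context.setOffline(true);
// Reloading offline only succeeds if the app serves an offline shell (for example,
// from its service worker); without one, page.reload() rejects with a network error.
await page.reload();
// 'You are offline' stands in for whatever offline copy your app renders.
await expect(page.getByText('You are offline')).toBeVisible();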

This is useful because it tells you whether the app handles true offline mode correctly, separate from accidental offline-like failures.

Be careful with retries

Retries can make offline issues harder to diagnose. A request that fails once and succeeds on retry may look like a transient backend issue, when the real problem is a short-lived race with worker activation or app boot code.

If a test is flaky around offline detection, capture the first failure before retry logic changes the state again.
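For retry loops inside a test (polling, re-clicking, re-fetching), one way to keep the first failure is to capture diagnostics before the second attempt starts. A hypothetical sketch: `retryKeepingFirstFailure` is not a test-runner feature, and `onFirstFailure` would typically save a screenshot, the request log, and a Cache Storage dump.

```typescript
// Runs `attempt` up to `maxAttempts` times, but hands the very first error to
// `onFirstFailure` before any retry can change browser or app state.
async function retryKeepingFirstFailure<T>(
  attempt: (n: number) => Promise<T>,
  maxAttempts: number,
  onFirstFailure: (err: unknown) => Promise<void> | void,
): Promise<T> {
  let lastError: unknown;
  for (let n = 1; n <= maxAttempts; n++) {
    try {
      return await attempt(n);
    } catch (err) {
      if (n === 1) await onFirstFailure(err); // capture state before retry #2 runs
      lastError = err;
    }
  }
  throw lastError;
}
```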

A debugging workflow that usually works

Step 1, minimize the test

Strip the test down to the smallest path that still fails. Keep the same browser, same profile behavior, and same app entry point.

Ask:

  • Does it fail on initial load or after navigation?
  • Does it fail only after login?
  • Does it fail after a page reload?
  • Does it fail only in one browser engine?

Step 2, capture network and storage state

Record:

  • all requests and responses,
  • console logs,
  • localStorage and sessionStorage contents,
  • Cache Storage contents,
  • service worker controller status.

In Selenium, you may need browser-specific devtools integration or app-side logging to get enough visibility. In Playwright, request and console listeners are usually enough for a first pass.
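For a first pass in Playwright, the listeners can feed a single timeline that is attached to the test report. The sketch uses small structural interfaces instead of importing Playwright types, so `Timeline` and its methods are hypothetical names; the `method()`, `url()`, `resourceType()`, `status()`, `fromServiceWorker()`, `type()`, and `text()` calls mirror methods on Playwright's Request, Response, and ConsoleMessage objects.

```typescript
// Structural stand-ins for Playwright's Request, Response, and ConsoleMessage.
interface RequestLike { method(): string; url(): string; resourceType(): string }
interface ResponseLike { status(): number; url(): string; fromServiceWorker(): boolean }
interface ConsoleLike { type(): string; text(): string }

type TimelineEntry =
  | { kind: "request"; at: number; method: string; url: string; resourceType: string }
  | { kind: "response"; at: number; status: number; url: string; fromServiceWorker: boolean }
  | { kind: "console"; at: number; level: string; text: string };

// Collects network and console events in arrival order for a single test.
class Timeline {
  readonly entries: TimelineEntry[] = [];
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  request(r: RequestLike): void {
    this.entries.push({ kind: "request", at: this.now(), method: r.method(), url: r.url(), resourceType: r.resourceType() });
  }

  response(r: ResponseLike): void {
    this.entries.push({ kind: "response", at: this.now(), status: r.status(), url: r.url(), fromServiceWorker: r.fromServiceWorker() });
  }

  console(m: ConsoleLike): void {
    this.entries.push({ kind: "console", at: this.now(), level: m.type(), text: m.text() });
  }

  // JSON suitable for a test report attachment.
  serialize(): string {
    return JSON.stringify(this.entries, null, 2);
  }
}
```

Wire it up with `page.on('request', r => timeline.request(r))`, and likewise for `'response'` and `'console'`, before the first `page.goto()`. The `fromServiceWorker` flag on each response is often the quickest hint that a worker, not the network, answered.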

Step 3, repeat with a pristine context

Create a new context with no shared storage. If the failure disappears, you are dealing with state leakage, not a pure timing problem.

Step 4, selectively remove sources of hidden state

Try these one at a time:

  • unregister service workers,
  • clear caches,
  • clear cookies and storage,
  • use a different browser profile,
  • disable offline simulation,
  • bypass service worker in the test environment.
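Removing one source at a time matters because removing several together tells you the failure is state-related but not which state. A hypothetical harness loop can enforce that: `resetToDirtyState` and `runTest` are placeholders for your own setup (for example, restoring a copied profile directory and running the minimized test), and every mitigation starts from the same dirty state.

```typescript
// One hidden-state source to remove, e.g. "unregister service workers".
interface Mitigation {
  name: string;
  apply: () => Promise<void>;
}

// Applies each mitigation on its own and reports which ones make the test pass.
async function isolateHiddenState(
  mitigations: Mitigation[],
  resetToDirtyState: () => Promise<void>,
  runTest: () => Promise<boolean>, // true when the test passes
): Promise<string[]> {
  const fixes: string[] = [];
  for (const m of mitigations) {
    await resetToDirtyState(); // never let one mitigation's effect leak into the next
    await m.apply();
    if (await runTest()) fixes.push(m.name);
  }
  return fixes;
}
```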

Step 5, confirm the same failure without the UI

If possible, hit the same backend endpoint directly with a test client or inspect the response from the page context. This helps answer whether the data is stale before it reaches the UI, or whether the page is rendering stale data after it receives the correct response.
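In practice this compares three readings of the same field, which makes the verdict easy to state. A sketch with hypothetical names: `direct` comes from a client outside the browser (for example, Playwright's `request` fixture, which sends requests from the test process rather than the page), `inPage` comes from a `fetch()` run through `page.evaluate()`, so the service worker or HTTP cache can answer it, and `rendered` is the text the UI shows.

```typescript
type StalenessVerdict = "fresh" | "stale-at-backend" | "stale-in-browser-cache" | "stale-in-rendering";

function locateStaleness(
  expected: string, // the value the test arranged
  direct: string,   // read outside the browser: no worker or page cache involved
  inPage: string,   // read via fetch() in the page: may hit the worker or HTTP cache
  rendered: string, // what the UI actually shows
): StalenessVerdict {
  if (direct !== expected) return "stale-at-backend";
  if (inPage !== expected) return "stale-in-browser-cache";
  if (rendered !== expected) return "stale-in-rendering";
  return "fresh";
}
```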

Practical Playwright patterns for these bugs

Playwright is often a good fit because it gives you fine-grained control over browser contexts and network inspection.

Create a fresh context per test

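typescript

import { test } from '@playwright/test';

test('uses a clean browser state', async ({ browser }) => {
  // Playwright Test's built-in `page` fixture already gets a fresh context per test;
  // an explicit context like this is useful when a test needs to control it directly.
  const context = await browser.newContext();
  try {
    const page = await context.newPage();
    await page.goto('https://example.test');
  } finally {
    await context.close(); // close even if navigation or assertions throw
  }
});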

Log requests that are likely to be intercepted

typescript

// Register before page.goto() so requests made during page load are logged too.
page.on('request', request => {
  if (request.resourceType() === 'fetch' || request.resourceType() === 'xhr') {
    console.log('request', request.method(), request.url());
  }
});

Clear storage when the app under test allows it

typescript

await page.evaluate(async () => {
  // Runs inside the page, so it only clears storage for the page's current origin.
  localStorage.clear();
  sessionStorage.clear();
  const keys = await caches.keys();
  await Promise.all(keys.map(key => caches.delete(key)));
});

This does not remove the service worker itself, but it helps reveal whether a stale cache entry is the problem.
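If the stale behavior survives clearing caches, the next step is to remove the registrations themselves. A sketch: `unregisterServiceWorkers` is a hypothetical page-side helper for `page.evaluate(unregisterServiceWorkers)`, built on the standard `navigator.serviceWorker.getRegistrations()` and `registration.unregister()` calls. It only sees registrations for the page's current origin, and a page that is already controlled may stay controlled until it navigates or closes, so reload or open a new page before asserting; a new context is still the more reliable reset.

```typescript
// Page-side helper: returns how many registrations were actually removed.
async function unregisterServiceWorkers(container?: any): Promise<number> {
  const sw = container ?? (globalThis as any).navigator?.serviceWorker;
  if (!sw) return 0; // no service worker support in this context
  const registrations: any[] = await sw.getRegistrations();
  // unregister() resolves to true when the registration was found and removed.
  const results: boolean[] = await Promise.all(registrations.map((r) => r.unregister()));
  return results.filter(Boolean).length;
}
```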

Practical Selenium patterns for these bugs

Selenium can still debug this class of failure well, especially in environments already built around WebDriver and continuous integration.

Use a clean profile whenever possible

If your grid allows it, avoid reusing user profiles across test jobs. A reused profile can preserve worker registration, cache data, and storage from previous runs.
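ChromeDriver normally starts each session with its own temporary profile, so profile reuse usually comes from a `--user-data-dir` argument that several jobs share. If a job does need an explicit directory, for example to inspect it after a failure, it can create a unique one per session. A sketch, assuming the test process and the browser run on the same machine (on a remote grid node, the path would have to exist on that node instead); `freshChromeProfile` is a hypothetical helper, while `--user-data-dir` is a real Chrome switch.

```typescript
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Creates a throwaway Chrome user data directory for one WebDriver session.
// Pass `args` to your Chrome options (for example, chrome.Options#addArguments
// in selenium-webdriver) and call `dispose` after driver.quit().
function freshChromeProfile(prefix = "e2e-profile-") {
  const dir = mkdtempSync(join(tmpdir(), prefix)); // unique suffix per call
  return {
    dir,
    args: [`--user-data-dir=${dir}`],
    dispose: () => rmSync(dir, { recursive: true, force: true }),
  };
}
```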

Attach browser logs or devtools trace data

Browser console logs are often the fastest path to a clue. If your app logs when a worker takes control or when offline mode is detected, those logs can explain the failure without a full trace.

Do not assume driver.refresh() clears the problem

A refresh may still use the same profile, the same service worker, and the same cached assets. If the issue survives a refresh, try a new browser session instead.

How to make these bugs less likely in the first place

Separate production behavior from test determinism

Not every test should exercise service worker caching. In fact, most functional tests should not. Keep production behavior enabled only in suites that explicitly need it.

A useful rule is:

  • feature tests for app logic, use fresh contexts and minimal storage,
  • PWA or offline tests, use a dedicated suite with controlled cache state,
  • cross-browser smoke tests, keep caching behavior predictable and easy to reset.
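The rule can be encoded once and shared by every suite. A minimal sketch, assuming Playwright: `serviceWorkers` (`'allow'` or `'block'`) and `offline` are real `browser.newContext()` options, while the suite names and the choice to block workers for smoke tests are assumptions to adapt.

```typescript
// Suite categories from the rule above; the names are illustrative.
type SuiteKind = "feature" | "pwa-offline" | "smoke";

// A subset of what Playwright's browser.newContext() accepts.
interface ContextOptionsSubset {
  serviceWorkers: "allow" | "block";
  offline: boolean;
}

const CONTEXT_OPTIONS: Record<SuiteKind, ContextOptionsSubset> = {
  // App logic only: keep workers out of the request path.
  feature: { serviceWorkers: "block", offline: false },
  // Worker and offline behavior is the subject; tests toggle offline explicitly.
  "pwa-offline": { serviceWorkers: "allow", offline: false },
  // Assumption: smoke checks cover rendering, not caching, so block workers too.
  smoke: { serviceWorkers: "block", offline: false },
};

function contextOptionsFor(kind: SuiteKind): ContextOptionsSubset {
  // Return a copy so callers cannot mutate the shared defaults.
  return { ...CONTEXT_OPTIONS[kind] };
}
```

Usage would be `browser.newContext(contextOptionsFor('feature'))`. Blocking also matters when a suite mocks the network: Playwright's documentation notes that `page.route()` does not intercept requests a service worker handles, and recommends `serviceWorkers: 'block'` in that case.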

Version your test data and assets

If the UI is sensitive to cached data, use versioned API fixtures or unique cache keys per test run. That makes stale cache entries easier to detect and invalidate.
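One way to version fixture data is to stamp every mocked payload with the current run's ID and check that stamp in the test. A sketch; `fixtureVersion`, `versionFixture`, and `isStaleFixture` are hypothetical names, and the run ID would come from your CI (for example, a build number).

```typescript
// Wraps fixture data with a per-run version so stale copies are detectable.
interface VersionedFixture<T> {
  fixtureVersion: string;
  data: T;
}

function versionFixture<T>(runId: string, data: T): VersionedFixture<T> {
  return { fixtureVersion: runId, data };
}

// True when a payload was produced by a different run (or carries no version),
// which usually means a cache or service worker served an older copy.
function isStaleFixture(received: { fixtureVersion?: string }, runId: string): boolean {
  return received.fixtureVersion !== runId;
}
```

With Playwright, a mocked endpoint could return `versionFixture(runId, items)` via `route.fulfill({ json: ... })`; if a response body observed in the page carries a different version, something between the mock and the UI served an older copy.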

Make service worker changes visible

When app code or a deploy changes the service worker, include clear logging in test environments. A few extra lines in telemetry or console output can save hours of guessing later.

Have a cleanup step that is actually effective

A teardown that only closes the tab is not enough. Depending on your setup, cleanup may need to clear storage, close the browser context, or delete the temporary profile directory.
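A teardown with several steps should not stop at the first one that throws, or the remaining state survives into the next test. A hypothetical sketch: `runCleanup` and `CleanupStep` are illustrative names, and the steps would be your own, for example clearing storage with `page.evaluate()` while the page is still open, then `context.close()`, then deleting a temporary profile directory.

```typescript
interface CleanupStep {
  name: string;
  run: () => Promise<void>;
}

// Runs every step in order even if an earlier one throws, then reports all
// failures together so a broken step cannot silently leave state behind.
async function runCleanup(steps: CleanupStep[]): Promise<void> {
  const failures: string[] = [];
  for (const step of steps) {
    try {
      await step.run();
    } catch (err) {
      failures.push(`${step.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  if (failures.length > 0) {
    throw new Error(`cleanup incomplete -> ${failures.join("; ")}`);
  }
}
```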

Decide whether to block or embrace caching in E2E

For each suite, answer one question clearly: is caching part of the thing under test? If not, disable it or isolate it. If yes, write explicit assertions about cached behavior rather than letting it affect unrelated tests.

A quick decision tree for debugging

If a test fails only on rerun

Suspect cached state or service worker registration.

If a test fails only in CI

Suspect profile reuse, network policy, proxy behavior, or slower worker activation timing.

If a test fails after a login/logout flow

Suspect storage leakage, stale cache entries, or cached auth responses.

If a test fails only in one browser engine

Check service worker support, cache eviction differences, and offline behavior across engines.

If a test fails after a page reload but not on initial load

Suspect worker control, cache revalidation, or app boot code that reads old data on startup.
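The decision tree can also live as data, so a triage script or a flaky-test report can print the first things to check for each observed symptom. A sketch with hypothetical symptom keys; the suspect strings are taken from the tree itself.

```typescript
type Symptom =
  | "fails-only-on-rerun"
  | "fails-only-in-ci"
  | "fails-after-login-logout"
  | "fails-in-one-engine"
  | "fails-after-reload";

const SUSPECTS: Record<Symptom, string[]> = {
  "fails-only-on-rerun": ["cached state", "service worker registration"],
  "fails-only-in-ci": ["profile reuse", "network policy", "proxy behavior", "slower worker activation"],
  "fails-after-login-logout": ["storage leakage", "stale cache entries", "cached auth responses"],
  "fails-in-one-engine": ["service worker support", "cache eviction differences", "offline behavior differences"],
  "fails-after-reload": ["worker control", "cache revalidation", "boot code reading old data"],
};

// Merges suspects for every observed symptom, keeping first-seen order and dropping duplicates.
function suspectsFor(symptoms: Symptom[]): string[] {
  return Array.from(new Set(symptoms.flatMap((s) => SUSPECTS[s])));
}
```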

What to capture in a bug report

When you hand this off to a teammate, include enough detail to separate network bugs from browser state bugs:

  • browser and version,
  • test runner and browser automation tool,
  • whether a clean profile fixes it,
  • whether service workers are enabled,
  • whether offline mode was set explicitly,
  • the exact URLs loaded from cache, if known,
  • request and response logs for the failing action,
  • whether the failure reproduces after clearing storage.

That information makes it much easier to decide whether the fix belongs in the app, the test harness, or the infrastructure layer.

Final takeaway

Flaky browser tests caused by service workers are rarely just about one bad assertion. They usually point to hidden browser state that outlives a single test step and changes what the page sees on the next run. The fastest path to a stable fix is to isolate the layer causing the behavior, then make your test setup either fully control that state or eliminate it altogether.

If you treat service workers, Cache Storage, and offline state as first-class variables in your debugging process, the failures become much easier to explain. More importantly, you can decide when caching is a real product behavior worth testing, and when it is just noise getting in the way of reliable automation.