How to Test Browser Extensions in Real Browsers Without Losing CI Reproducibility

Testing browser extensions sounds simple until you try to make it repeatable. The first run passes locally, the second run fails because a permission prompt appeared one click earlier, and CI breaks because your extension state lived in a profile that was never reset. If you are trying to test browser extensions in real browsers, the hard part is not launching Chrome or Firefox; it is keeping the test environment stable enough that failures mean something.

This matters whether you are shipping a classic extension, an internal productivity add-on, or a product feature that behaves like an extension, such as injected scripts, browser-side workflows, or authenticated overlays. Extension testing sits at the intersection of browser automation, permissions, profile state, and cross-browser compatibility. That combination is exactly where flaky tests tend to flourish.

This guide walks through a practical approach to validating extension install flows, permission prompts, content scripts, background behavior, and persistence in actual browsers while preserving CI reproducibility. It also explains where real-browser infrastructure helps, and where tools like Endtest’s cross-browser testing platform can fit as a repeatable execution layer when you need artifacts and multi-browser coverage without building your own grid from scratch.

Why extension testing is different from ordinary browser automation

Normal end-to-end tests usually assume the browser is a blank slate, the app is under your control, and the test framework can reach the DOM reliably. Extension testing breaks all three assumptions.

Extensions touch browser state outside the page

A browser extension can change:

the browser UI itself,
permissions and site access,
tabs and navigation,
local storage and extension storage,
network requests through background scripts,
page DOM through content scripts.

That means a failure can happen in the page, in the browser shell, or in a profile artifact you never inspected. If you do not control profile setup and teardown carefully, tests become order-dependent.

Browser behavior is not identical across engines

Even when the extension logic is the same, Chrome, Edge, and Firefox do not behave identically. Permission prompts appear differently, extension APIs have browser-specific quirks, and some capabilities are simply unsupported on one browser or another. Cross-browser validation is not a nice-to-have here, because the browser is part of the product surface.

For background on the discipline itself, the general concepts of software testing, test automation, and continuous integration are relevant, but extension testing adds an extra layer of browser-managed state that many generic testing strategies do not address well.

CI failures are often environment failures

A flaky extension test might be caused by:

the extension not loading before the page test starts,
a permission dialog appearing in a slightly different order,
profile corruption from a previous run,
browser auto-updates changing the behavior of a helper UI,
missing command-line flags in headless mode,
machine timing differences in CI runners.

The goal is not to eliminate every timing dependency. The goal is to make the environment deterministic enough that the remaining flakes point to real defects.

A good extension test suite treats browser startup, profile state, and permission handling as first-class test fixtures, not incidental setup.

Decide what you actually need to validate

Not every extension feature needs the same kind of coverage. Split the problem before you write tests.

1. Install and enablement flow

You need to verify:

the extension installs from a packaged artifact or enterprise policy,
the browser enables it successfully,
required permissions are requested or predeclared,
the extension appears in the expected browser UI.

This is often the most fragile part because installation flows differ across browsers and may require browser-specific automation hooks.
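Part of this check can run before any browser starts. The sketch below is a unit-level guard, assuming the manifest has already been parsed from JSON (for example with `JSON.parse(readFileSync('dist/extension/manifest.json', 'utf8'))`); the list of required permissions is a hypothetical input you would maintain next to your tests. It only confirms that every permission the feature needs is declared somewhere, either up front or as optional, so a missing declaration fails fast instead of surfacing later as a confusing prompt failure. Merging the three arrays also covers Manifest V2, where host patterns live in `permissions`.

```typescript
// Only the manifest fields this check reads; real manifests contain much more.
interface ExtensionManifest {
  manifest_version: number;
  name: string;
  version: string;
  permissions?: string[];
  optional_permissions?: string[];
  host_permissions?: string[]; // Manifest V3 keeps origin access here
}

// Returns the required permissions that the manifest does not declare anywhere.
function findUndeclaredPermissions(manifest: ExtensionManifest, requiredPermissions: string[]): string[] {
  const declared = new Set<string>([
    ...(manifest.permissions ?? []),
    ...(manifest.optional_permissions ?? []),
    ...(manifest.host_permissions ?? []),
  ]);
  return requiredPermissions.filter((permission) => !declared.has(permission));
}

// Example manifest for a hypothetical extension.
const manifest: ExtensionManifest = {
  manifest_version: 3,
  name: 'demo-extension',
  version: '1.2.0',
  permissions: ['storage', 'tabs'],
  optional_permissions: ['clipboardWrite'],
  host_permissions: ['https://*.example.com/*'],
};

console.log(findUndeclaredPermissions(manifest, ['storage', 'downloads'])); // ['downloads']
```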

2. Permission prompts and site access

Extensions often need access to tabs, storage, clipboard, downloads, or specific origins. You should test:

first-run permission prompts,
optional permission request flows,
access to allowed and disallowed origins,
denial handling, including graceful fallback states.

3. Content script injection

If your extension injects scripts into pages, validate:

the script loads on the intended match patterns,
it does not run on excluded pages,
it handles SPAs and dynamic DOM updates,
it does not break due to CSP, iframe boundaries, or race conditions.

4. Background and persistence behavior

Stateful behavior deserves separate coverage:

extension storage survives browser restarts,
settings sync correctly after changes,
background listeners reconnect after suspend or reload,
cached session state does not bleed between tests.

5. Cross-browser compatibility

A test that passes on Chromium does not tell you much if you ship on Firefox too. Validate browser-specific APIs, UI differences, and install methods in each target browser, especially where a browser has its own extension model quirks.

Use a real browser strategy, not a simulated one

There is a temptation to stub the browser extension layer or use a fake browser environment. That can help for unit tests, but it does not answer the question most teams actually care about: does this extension behave correctly in the browser users run?

Real browser testing is the right layer for:

install and enablement,
UI-level permission prompts,
page injection,
browser storage and profile persistence,
browser-specific behavior.

A practical test pyramid for extensions usually looks like this:

Unit tests, validate pure functions and manifest configuration.
Component or integration tests, validate script logic in a controlled environment.
Real browser E2E tests, validate browser-specific install, permissions, and state.

The key is not to move everything to the top of the pyramid. Use real browsers only where browser behavior is part of the requirement.

Build tests around browser profiles and artifacts

If you want reproducible CI runs, treat browser profiles like disposable infrastructure.

Start from a known profile every time

Never depend on your personal local browser profile. It contains:

logged-in sessions,
prior extension installs,
permission decisions,
cached storage,
browser experiments and flags.

Instead, create a dedicated profile directory per test run or per test case, then throw it away after the run. In CI, that means isolated containers or ephemeral workspaces.

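Here is a simple Playwright example that launches Chromium with a dedicated persistent context and loads an unpacked extension for local debugging. Playwright supports extensions only in Chromium launched with a persistent context, which is why it uses `launchPersistentContext`: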

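import * as path from 'node:path';
import { chromium } from 'playwright';

// Absolute paths avoid depending on the browser's working directory.
const extensionPath = path.resolve('dist/extension');

(async () => {
  // Extensions load only into a Chromium persistent context; tmp/profile is a throwaway profile.
  const context = await chromium.launchPersistentContext(path.resolve('tmp/profile'), {
    headless: false,
    args: [
      `--disable-extensions-except=${extensionPath}`,
      `--load-extension=${extensionPath}`,
    ],
  });

  const page = await context.newPage();
  await page.goto('https://example.com');
})();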

That pattern is useful for investigation, but in CI you should also clean the profile between jobs, or use a fresh container for each run. If a single profile persists across jobs, your tests will eventually become order-dependent.
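One way to make the fresh-profile rule hard to forget is to wrap it in a helper that creates a unique directory, hands it to the test, and deletes it even when the test throws. This is a minimal sketch using only Node's standard library; `withFreshProfile` is a hypothetical name, and the Playwright call appears only as a comment because the browser context must be closed before its directory is removed.

```typescript
import { existsSync, mkdtempSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Creates a unique profile directory, runs the callback with it, and always deletes it afterwards.
async function withFreshProfile<T>(run: (profileDir: string) => Promise<T>): Promise<T> {
  const profileDir = mkdtempSync(join(tmpdir(), 'ext-profile-'));
  try {
    return await run(profileDir);
  } finally {
    // force: true ignores a path that no longer exists.
    rmSync(profileDir, { recursive: true, force: true });
  }
}

// Usage with Playwright (not executed here); close the context before the directory is deleted:
//   await withFreshProfile(async (dir) => {
//     const context = await chromium.launchPersistentContext(dir, { headless: false, args: [...] });
//     try { /* run the test */ } finally { await context.close(); }
//   });

// Runnable self-check without a browser: the directory exists during the run and is gone after it.
let lastProfileDir = '';
withFreshProfile(async (dir) => {
  lastProfileDir = dir;
  return existsSync(dir);
}).then((existedDuringRun) => {
  console.log({ existedDuringRun, existsAfterRun: existsSync(lastProfileDir) }); // { existedDuringRun: true, existsAfterRun: false }
});
```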

Capture artifacts for every meaningful failure

A reproducible failure is much easier to debug if you capture:

browser console logs,
extension background page logs,
network traces,
screenshots,
video or session replay where available,
browser version and flags,
the installed extension version and manifest hash.

When a test fails in CI, the first question is often not “what assertion failed?” It is “what browser state existed at the time?”
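The last two items in that list are cheap to record and easy to forget. A small sketch, assuming you read `manifest.json` as a string and get the browser name, version, and launch flags from your runner (Playwright's `browser.version()` returns a version string, for example); `buildRunMetadata` and its field names are hypothetical. Saving this record next to the screenshots and traces answers "which build ran in which browser?" without re-running anything.

```typescript
import { createHash } from 'node:crypto';

// One record per run, stored next to screenshots and traces.
interface RunMetadata {
  browserName: string;
  browserVersion: string;
  launchArgs: string[];
  extensionVersion: string;
  manifestSha256: string;
}

// Hashes the manifest exactly as shipped, so any change to it produces a new hash.
function hashManifest(manifestJson: string): string {
  return createHash('sha256').update(manifestJson, 'utf8').digest('hex');
}

function buildRunMetadata(
  browserName: string,
  browserVersion: string,
  launchArgs: string[],
  manifestJson: string,
): RunMetadata {
  const manifest = JSON.parse(manifestJson) as { version?: string };
  return {
    browserName,
    browserVersion,
    launchArgs: [...launchArgs],
    extensionVersion: manifest.version ?? 'unknown',
    manifestSha256: hashManifest(manifestJson),
  };
}

// Example values only.
const metadata = buildRunMetadata('chromium', '124.0.6367.29', ['--load-extension=/ext'], '{"name":"demo","version":"1.2.0"}');
console.log(JSON.stringify(metadata, null, 2));
```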

Testing installation flows in Chrome, Firefox, and Edge

Extension installation is one of the biggest sources of browser-specific behavior.

Chromium-based browsers

Chrome and Edge both use Chromium under the hood, but do not assume behavior is identical. Installation in automation commonly involves:

unpacked extension loading for local verification,
packaged extensions for release validation,
policy-based installation for enterprise scenarios.

For local and CI smoke tests, unpacked extensions are convenient, but they can hide issues that appear only in packaged artifacts. You should include at least one packaged-extension path in your pipeline before release.

Firefox

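Firefox extension automation often means working with signed add-ons or loading a temporary add-on into a test profile. The install and permission model differs from Chromium, and Playwright documents extension loading for Chromium only, so you need Firefox-specific tests, and often a Firefox-specific harness such as Selenium with GeckoDriver or Mozilla's web-ext tool, not just a shared browser-agnostic suite.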

Edge

Edge is usually close to Chrome behavior, but treat it as a real target browser. Browser branding, policy defaults, and extension UI details can differ enough to break brittle selectors or assumption-heavy flows.

A useful rule is to test the installation path that your users will actually use. If the extension is distributed through a store, test the store-driven install flow at least once. If it is deployed by policy, validate the policy path in an environment that resembles production.

Handle permission prompts explicitly

Permission prompts are a classic source of flakes because they are modal, browser-managed, and timing-sensitive.

Do not assume the prompt is visible immediately

A click that triggers a permission request may not cause an immediate DOM event you can observe in the page context. Instead, wait for browser UI or extension state changes.

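In Playwright, one approach is to listen for the context's `page` event (for example, `context.waitForEvent('page')`) when your flow opens a permissions or onboarding window in a new tab or popup. Browser-native prompts are drawn by the browser rather than the page, so page locators cannot reach them; depending on the browser and the exact permission, you may need to inspect context pages or use browser-specific APIs.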

Test both accept and deny paths

Extensions frequently fail only when permissions are denied. If a site access request is refused, does the extension:

keep working in reduced mode,
explain the issue clearly,
retry correctly after later approval,
avoid corrupting state?

These branches matter because permission denial is a normal user action, not an edge case.
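A cheap way to keep both branches honest is to drive them from one table of cases. The sketch assumes a hypothetical `resolveMode` function inside the extension that picks full or reduced mode from the permissions currently granted (in a real extension those would come from a call such as `chrome.permissions.contains`). Because the mode is recomputed from current grants rather than cached from the first answer, the "denied, then approved" case works without special handling.

```typescript
type ExtensionMode = 'full' | 'reduced';

// Hypothetical permissions the full feature set needs.
const REQUIRED_FOR_FULL = ['tabs', 'https://app.example.com/*'];

// Hypothetical extension logic: full mode only when every required permission is granted.
function resolveMode(granted: ReadonlySet<string>): ExtensionMode {
  return REQUIRED_FOR_FULL.every((permission) => granted.has(permission)) ? 'full' : 'reduced';
}

// Accept, deny, and "denied first, approved later" cases live in the same table.
const permissionCases: Array<{ name: string; grantsOverTime: string[][]; expected: ExtensionMode[] }> = [
  { name: 'accepted', grantsOverTime: [['tabs', 'https://app.example.com/*']], expected: ['full'] },
  { name: 'site access denied', grantsOverTime: [['tabs']], expected: ['reduced'] },
  { name: 'denied, then approved', grantsOverTime: [[], ['tabs', 'https://app.example.com/*']], expected: ['reduced', 'full'] },
];

for (const c of permissionCases) {
  const actual = c.grantsOverTime.map((grants) => resolveMode(new Set(grants)));
  const pass = actual.join() === c.expected.join();
  console.log(`${pass ? 'PASS' : 'FAIL'} ${c.name}: ${actual.join(' -> ')}`);
}
```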

Keep permission state isolated

Never let one test grant a permission and another test rely on that grant. Permission state should be created and destroyed within the test boundary. If your CI setup reuses browser profiles, permission leakage will create false positives.

Validate content scripts with real pages and hostile pages

A content script that works on your app may fail on other sites because of timing, DOM mutations, cross-origin iframes, or content security policy differences.

Test on simple and dynamic pages

Use at least two kinds of pages:

a simple static page, to validate basic injection,
a dynamic SPA or page with late DOM mutation, to validate resilience.

For example, in Playwright you might check that an injected badge appears after the page loads and after the app re-renders:

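import { test, expect } from '@playwright/test';

// Assumes `page` comes from a custom fixture that launches a persistent Chromium
// context with the extension loaded; the built-in `page` fixture does not load extensions.
test('extension injects UI into the target page', async ({ page }) => {
  await page.goto('https://your-test-app.example');
  const badge = page.locator('[data-test=extension-badge]');
  await expect(badge).toBeVisible();

  // Hypothetical control in the test app that forces a re-render.
  await page.getByRole('button', { name: 'Re-render' }).click();
  await expect(badge).toBeVisible();
});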

That looks simple, but the real work is in making the page fixture deterministic. Prefer dedicated test pages you control, where you can simulate rerenders, route changes, and iframe nesting.
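A sketch of such a fixture, assuming you serve test pages from a tiny local server instead of a third-party site: one static page, one page that replaces its DOM twice after load to imitate SPA re-renders, and one page that nests an iframe. The routes and page contents are hypothetical, and because everything here is served from one origin, a cross-origin iframe case would need a second server on a different port or hostname.

```typescript
import { createServer } from 'node:http';
import type { Server } from 'node:http';

// Hypothetical fixture pages: static, late re-rendering, and iframe-hosting.
const FIXTURE_PAGES: Record<string, string> = {
  '/static': '<!doctype html><title>static</title><main id="root">Static content</main>',
  '/spa': `<!doctype html><title>spa</title><main id="root"></main>
<script>
  // Replace the subtree twice to imitate a client-side framework re-rendering.
  const root = document.getElementById('root');
  setTimeout(() => { root.innerHTML = '<section>first render</section>'; }, 200);
  setTimeout(() => { root.innerHTML = '<section>second render</section>'; }, 600);
</script>`,
  '/iframe': '<!doctype html><title>iframe</title><iframe src="/static"></iframe>',
};

// Pure routing step, easy to unit test without opening a socket.
function renderFixture(pathname: string): { status: number; body: string } {
  const body = FIXTURE_PAGES[pathname];
  return body === undefined ? { status: 404, body: 'not found' } : { status: 200, body };
}

// Starts the server; port 0 lets the OS pick a free port (read it back from server.address()).
function startFixtureServer(port = 0): Server {
  const server = createServer((req, res) => {
    const { pathname } = new URL(req.url ?? '/', 'http://localhost');
    const { status, body } = renderFixture(pathname);
    res.writeHead(status, { 'content-type': 'text/html; charset=utf-8' });
    res.end(body);
  });
  return server.listen(port);
}
```

Keeping the routing pure means the page fixtures themselves can be reviewed and unit tested, while the server is started once per test worker and closed in teardown.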

Test excluded pages too

A robust extension should not inject where it should not. Validate negative cases such as:

internal browser pages,
excluded hostnames,
login pages,
cross-origin iframes,
pages blocked by permissions.

Negative cases are often the clearest signal that a match pattern in your manifest is too broad or too brittle.
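Many negative cases can be reviewed before a browser is involved. The sketch below is a deliberately simplified matcher for Chrome-style match patterns (`<scheme>://<host><path>` with `*` wildcards, plus `<all_urls>`, limited here to http and https); it ignores ports, `file:` URLs, and glob include/exclude keys, and it matches the path plus query string. Treat it as a way to check intent, not as a replacement for the browser's own matching. The patterns and URLs are hypothetical.

```typescript
// Converts the path part of a pattern into an anchored regular expression; only '*' is special.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`);
}

// Simplified match-pattern check: scheme, host (optionally '*.'-prefixed), and path glob.
function matchesPattern(rawUrl: string, pattern: string): boolean {
  const url = new URL(rawUrl);
  const protocol = url.protocol.replace(/:$/, '');
  const isWeb = protocol === 'http' || protocol === 'https';
  if (pattern === '<all_urls>') return isWeb;

  const parts = /^(\*|https?):\/\/(\*|\*\.[^/*]+|[^/*]+)(\/.*)$/.exec(pattern);
  if (!parts) throw new Error(`Unsupported pattern: ${pattern}`);
  const [, scheme, host, path] = parts;

  const schemeOk = scheme === '*' ? isWeb : protocol === scheme;
  const hostOk =
    host === '*' ||
    (host.startsWith('*.')
      ? url.hostname === host.slice(2) || url.hostname.endsWith(host.slice(1))
      : url.hostname === host);
  return schemeOk && hostOk && globToRegExp(path).test(url.pathname + url.search);
}

function shouldInject(url: string, matches: string[], excludeMatches: string[] = []): boolean {
  return matches.some((p) => matchesPattern(url, p)) && !excludeMatches.some((p) => matchesPattern(url, p));
}

// Hypothetical manifest entries and the decision we expect for each URL.
const matches = ['https://*.example.com/*'];
const excludeMatches = ['https://*.example.com/login*'];
const expectations: Array<[string, boolean]> = [
  ['https://app.example.com/dashboard', true],
  ['https://example.com/', true],
  ['https://app.example.com/login?next=/', false], // excluded login page
  ['https://example.org/', false], // different host
  ['http://app.example.com/', false], // wrong scheme
  ['chrome://extensions/', false], // internal browser page
];

for (const [url, expected] of expectations) {
  const actual = shouldInject(url, matches, excludeMatches);
  console.log(`${actual === expected ? 'ok  ' : 'FAIL'} ${url} -> ${actual}`);
}
```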

Treat extension storage as test data, not hidden state

Extension storage is often the reason a suite passes once and fails later.

Reset storage before each run

If you can clear extension storage through the browser API or by starting from a new profile, do that. If the extension keeps user preferences, seed them explicitly in the test rather than inheriting them from a previous run.
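Seeding can be a single helper that clears the storage area and writes a complete, explicit state. The sketch types the storage as a small interface mirroring the promise-based `clear`, `set`, and `get` methods of `chrome.storage.local`; the default settings are hypothetical, and the in-memory class exists only so the example runs outside a browser. In a real run you would make the same calls inside the extension, for instance from its service worker through Playwright's `Worker.evaluate`.

```typescript
// The subset of chrome.storage.local (promise form) that seeding needs.
interface StorageArea {
  clear(): Promise<void>;
  set(items: Record<string, unknown>): Promise<void>;
  get(keys: string[] | null): Promise<Record<string, unknown>>;
}

// Hypothetical defaults: every test starts here, never from a previous run's leftovers.
const DEFAULT_SETTINGS = { theme: 'light', onboardingComplete: false, enabledSites: [] as string[] };
type Settings = typeof DEFAULT_SETTINGS;

// Clears everything, writes defaults plus per-test overrides, and returns what was stored.
async function seedStorage(area: StorageArea, overrides: Partial<Settings> = {}): Promise<Record<string, unknown>> {
  await area.clear();
  await area.set({ ...DEFAULT_SETTINGS, ...overrides });
  return area.get(null);
}

// In-memory stand-in so the sketch runs without a browser.
class MemoryStorageArea implements StorageArea {
  private readonly data = new Map<string, unknown>();

  async clear(): Promise<void> {
    this.data.clear();
  }

  async set(items: Record<string, unknown>): Promise<void> {
    for (const [key, value] of Object.entries(items)) this.data.set(key, value);
  }

  async get(keys: string[] | null): Promise<Record<string, unknown>> {
    const wanted = keys ?? Array.from(this.data.keys());
    const result: Record<string, unknown> = {};
    for (const key of wanted) if (this.data.has(key)) result[key] = this.data.get(key);
    return result;
  }
}

const storage = new MemoryStorageArea();
storage
  .set({ leftover: 'from an earlier run' })
  .then(() => seedStorage(storage, { theme: 'dark' }))
  .then((state) => console.log(state)); // { theme: 'dark', onboardingComplete: false, enabledSites: [] }
```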

Verify persistence boundaries

Test what should survive and what should not:

preferences should survive a restart,
ephemeral tokens should expire,
onboarding should not reappear after completion,
per-site settings should remain scoped correctly.
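One way to make these boundaries checkable is to write the intended lifetime of each key down as a small model, then compare what the extension actually holds after a real restart (closing and relaunching the persistent context on the same profile directory) with what the model predicts. The lifetimes, key names, and `expectedAfterRestart` helper are hypothetical.

```typescript
type Lifetime = 'persistent' | 'session' | 'expiring';

interface StoredEntry {
  value: unknown;
  lifetime: Lifetime;
  expiresAt?: number; // epoch milliseconds, only meaningful for 'expiring'
}

// Predicts which keys should still exist after a browser restart at time `now`, sorted for stable comparison.
function expectedAfterRestart(state: Record<string, StoredEntry>, now: number): string[] {
  return Object.entries(state)
    .filter(([, entry]) => {
      if (entry.lifetime === 'session') return false; // session data should not outlive the browser
      if (entry.lifetime === 'expiring') return (entry.expiresAt ?? 0) > now; // expired tokens must be gone
      return true; // preferences, onboarding flags, per-site settings
    })
    .map(([key]) => key)
    .sort();
}

const before: Record<string, StoredEntry> = {
  theme: { value: 'dark', lifetime: 'persistent' },
  onboardingComplete: { value: true, lifetime: 'persistent' },
  'site:app.example.com:enabled': { value: true, lifetime: 'persistent' },
  authToken: { value: 'token-123', lifetime: 'expiring', expiresAt: 1000 },
  draftSelection: { value: '#main', lifetime: 'session' },
};

console.log(expectedAfterRestart(before, 2000));
// ['onboardingComplete', 'site:app.example.com:enabled', 'theme']
```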

Make state visible in assertions

It is easier to debug stateful extension tests if you can assert against deterministic markers such as:

a settings page value,
a known badge text,
a storage-backed flag exposed through the UI,
a debug endpoint in test builds.

Avoid relying only on hidden background logic. If a test fails, you want a visible state change you can inspect.

Design CI for reproducibility before you scale coverage

CI reproducibility is less about speed and more about controlling variance.

Use pinned browser versions where possible

If your pipeline suddenly starts failing after a browser update, your test suite is doing its job, but your release process might not be ready for that variability. Pin versions in CI, then run a separate scheduled job against newer browser releases to detect regressions intentionally.
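A pin only helps if the job fails loudly when it drifts. A minimal sketch, assuming the runner can report the browser version as a dotted string (Playwright's `browser.version()` returns one) and that the pin lives in your own config; `checkPinnedVersion` is hypothetical. With Playwright specifically, pinning the Playwright package version in your lockfile also pins the browser builds it downloads, which covers most of this variance.

```typescript
// Compares the reported browser version with the pin: major only by default,
// or major.minor when matchMinor is set (the pin must then include a minor part).
function checkPinnedVersion(actual: string, pinned: string, matchMinor = false): { ok: boolean; message: string } {
  const parse = (version: string) => version.split('.').map((part) => Number.parseInt(part, 10));
  const [actualMajor, actualMinor] = parse(actual);
  const [pinnedMajor, pinnedMinor] = parse(pinned);
  const ok = actualMajor === pinnedMajor && (!matchMinor || actualMinor === pinnedMinor);
  return {
    ok,
    message: ok
      ? `browser ${actual} matches pin ${pinned}`
      : `browser ${actual} does not match pin ${pinned}; refusing to run tests against an unpinned browser`,
  };
}

// Example version strings only.
console.log(checkPinnedVersion('124.0.6367.29', '124').message); // browser 124.0.6367.29 matches pin 124
console.log(checkPinnedVersion('125.0.6422.60', '124').ok); // false
```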

Keep test execution environment stable

Use a fixed container image or machine image with:

known browser versions,
the extension build artifact already prepared,
deterministic locale and timezone settings,
stable font packages if your extension checks layout,
predictable network conditions for any external page dependencies.
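It helps to verify the parts of that image you can observe from inside the job before any test runs. The sketch reads the time zone and locale the JavaScript runtime resolved and compares them with expected values; the expected values are assumptions for illustration. Playwright can also force these per browser context through its `locale` and `timezoneId` options, which is worth doing in addition to fixing the host image.

```typescript
interface EnvironmentSnapshot {
  timeZone: string;
  locale: string;
  nodeVersion: string;
}

// Reads what the runtime actually resolved, which typically reflects TZ and locale settings on the CI machine.
function currentEnvironment(): EnvironmentSnapshot {
  const resolved = Intl.DateTimeFormat().resolvedOptions();
  return { timeZone: resolved.timeZone, locale: resolved.locale, nodeVersion: process.version };
}

// Lists every expected field whose actual value differs; an empty list means the environment matches.
function diffEnvironment(expected: Partial<EnvironmentSnapshot>, actual: EnvironmentSnapshot): string[] {
  return (Object.keys(expected) as Array<keyof EnvironmentSnapshot>)
    .filter((key) => expected[key] !== actual[key])
    .map((key) => `${key}: expected ${expected[key]}, got ${actual[key]}`);
}

// Example preflight (values are assumptions); in CI you would throw instead of logging.
const drift = diffEnvironment({ timeZone: 'UTC', locale: 'en-US' }, currentEnvironment());
console.log(drift.length === 0 ? 'environment ok' : `environment drift:\n${drift.join('\n')}`);
```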

Separate smoke, compatibility, and deep state tests

A practical split looks like this:

Smoke tests, install extension, open target page, validate one core injected behavior.
Compatibility tests, run the same flow across target browsers and major OS combinations.
Stateful tests, verify persistence, permission recovery, and teardown cleanliness.

This keeps your main CI pipeline useful even when deeper browser coverage takes longer.
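One lightweight way to wire this split into CI, assuming each test title carries a tag such as `@smoke`, `@compat`, or `@stateful`, is to map the CI trigger to a `--grep` expression, since Playwright's `--grep` option filters tests by a regular expression matched against their titles. The trigger names and the mapping are hypothetical; adjust them to your pipeline.

```typescript
type Trigger = 'pull_request' | 'push' | 'nightly' | 'release';
type Suite = 'smoke' | 'compat' | 'stateful';

// Hypothetical mapping: fast feedback on every change, deeper coverage on a schedule and before release.
const SUITES_BY_TRIGGER: Record<Trigger, Suite[]> = {
  pull_request: ['smoke'],
  push: ['smoke', 'compat'],
  nightly: ['smoke', 'compat', 'stateful'],
  release: ['smoke', 'compat', 'stateful'],
};

// Builds a regular expression source for `npx playwright test --grep "<result>"`.
function grepForTrigger(trigger: Trigger): string {
  return SUITES_BY_TRIGGER[trigger].map((suite) => `@${suite}`).join('|');
}

console.log(grepForTrigger('pull_request')); // @smoke
console.log(grepForTrigger('nightly')); // @smoke|@compat|@stateful
```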

Keep logs close to the failing step

If your CI system only gives you a final failed job log, you will waste time reconstructing the run. Emit structured events around:

browser launch,
extension load,
permission grant or denial,
page navigation,
content script detection,
storage verification.

That makes triage much faster.
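A sketch of such events, written as one JSON object per line so CI logs stay greppable and can be parsed later. The step names mirror the list above; `createEventLog` and its field names are hypothetical. The `write` parameter defaults to `console.log` but could point at a file or a test reporter instead.

```typescript
type Step = 'browser_launch' | 'extension_load' | 'permission' | 'navigation' | 'content_script' | 'storage';
type Status = 'start' | 'ok' | 'fail';

interface TestEvent {
  ts: string;
  testId: string;
  step: Step;
  status: Status;
  detail?: Record<string, unknown>;
}

// Returns a logger bound to one test that emits one JSON line per event through `write`.
function createEventLog(testId: string, write: (line: string) => void = (line) => console.log(line)) {
  return (step: Step, status: Status, detail?: Record<string, unknown>): TestEvent => {
    const event: TestEvent = { ts: new Date().toISOString(), testId, step, status, ...(detail ? { detail } : {}) };
    write(JSON.stringify(event));
    return event;
  };
}

const logEvent = createEventLog('injects-badge');
logEvent('browser_launch', 'start', { browser: 'chromium', headless: false });
logEvent('extension_load', 'ok');
logEvent('permission', 'ok', { decision: 'denied' });
logEvent('content_script', 'fail', { selector: '[data-test=extension-badge]' });
```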

Here is a GitHub Actions example that emphasizes clean browser setup and artifact retention:

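name: extension-tests

on: [push, pull_request]

jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # npm ci installs the exact Playwright version from the lockfile, which also fixes its browser builds.
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npm run test:extension
      # Upload Playwright's output folder (traces, screenshots) only when the job fails.
      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-traces
          path: test-results/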

When Selenium Grid still makes sense

Playwright is often the easiest way to manage modern browser automation, but some teams already have Selenium infrastructure, shared language bindings, or policy requirements that make Selenium Grid practical.

Selenium can still work well for extension testing if you are disciplined about environment control, but browser extension support and browser-specific setup are more manual. The real requirement is not the framework, it is the ability to launch a real browser with the right profile and inspect the resulting behavior.

If you already operate a grid, make sure it supports:

consistent browser versions,
stable node images,
extension loading mechanisms for your browsers,
artifact collection,
session isolation.

If you do not already have that infrastructure, managed real-browser execution can save time. Teams that want cross-browser coverage without assembling a local farm sometimes use platforms such as Endtest as the execution layer, especially when they want repeatable runs, real browsers on Windows and macOS, and artifact-friendly CI results.

Common flake sources and how to reduce them

Race conditions during extension startup

Many extensions initialize asynchronously. If the test opens a page before the background page (Manifest V2) or service worker (Manifest V3) is ready, the injected UI never appears.

Mitigation:

wait for a deterministic extension-ready marker,
expose a test-only readiness signal,
avoid asserting too early after browser launch.
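A generic poll-until-ready helper is one way to express "wait for a deterministic marker" without fixed sleeps. This sketch is framework-neutral and assumes a hypothetical readiness check, for example a `data-extension-ready` attribute that a test build of the content script sets on `<html>`. For Chromium Manifest V3 extensions, Playwright can also wait for the service worker itself with `context.waitForEvent('serviceworker')`, and `page.waitForFunction` covers in-page conditions, so a custom helper like this is mostly useful when the signal lives outside the page.

```typescript
interface WaitOptions {
  timeoutMs?: number;
  intervalMs?: number;
  description?: string;
}

// Polls `check` until it returns true; rejects with a descriptive error once `timeoutMs` has passed.
async function waitFor(check: () => boolean | Promise<boolean>, options: WaitOptions = {}): Promise<void> {
  const { timeoutMs = 10_000, intervalMs = 100, description = 'condition' } = options;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms waiting for ${description}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage sketch with a hypothetical marker attribute set by a test build of the extension:
//   await waitFor(
//     () => page.evaluate(() => document.documentElement.dataset.extensionReady === 'true'),
//     { description: 'extension ready marker', timeoutMs: 15_000 },
//   );
```

The descriptive timeout message matters as much as the polling: when it fires in CI, it names the marker that never appeared instead of failing on a later, unrelated assertion.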

Headless differences

Some extension behaviors differ in headless mode, especially UI prompts and browser-specific surfaces. If a flow depends on browser chrome or a permission dialog, test it in headed mode at least once in CI or in a nightly job.

Cross-origin iframe limitations

A content script that works on a first-party page may not reach cross-origin frames. Test that limitation intentionally so you know whether the behavior is by design or a regression.

Hidden profile contamination

If you reuse browser contexts, stale extension state can make tests pass for the wrong reason. The fix is usually to isolate profiles, not to add more waits.

Most extension flakes are state leaks or timing leaks, not flaky assertions.

A practical test matrix for extension teams

You do not need to test every combination on every commit, but you do need a matrix you can defend.

A sensible starting point is:

Browsers, Chrome, Firefox, Edge, plus Safari if your product requires it and the extension model supports your use case.
Modes, headed for install and prompt flows, headless for smoke where safe.
Profiles, fresh profile for every CI run, persisted profile only for specific state recovery tests.
Page types, static, dynamic SPA, iframe-heavy, denied permission, allowed permission.
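Writing the matrix down as data makes it easier to review and to feed into CI. The sketch expands the dimensions above and applies one rule from them, that prompt-dependent flows run headed only; the browser, mode, and page-type names are placeholders, and Safari is left out on the assumption that it is not a target here.

```typescript
type TargetBrowser = 'chrome' | 'edge' | 'firefox';
type RunMode = 'headed' | 'headless';
type PageType = 'static' | 'spa' | 'iframe' | 'permission-denied' | 'permission-allowed';

interface MatrixEntry {
  browser: TargetBrowser;
  mode: RunMode;
  pageType: PageType;
}

// Flows that depend on browser-managed prompts only run headed.
const NEEDS_HEADED = new Set<PageType>(['permission-denied', 'permission-allowed']);

function expandMatrix(browsers: TargetBrowser[], modes: RunMode[], pageTypes: PageType[]): MatrixEntry[] {
  const entries: MatrixEntry[] = [];
  for (const browser of browsers) {
    for (const mode of modes) {
      for (const pageType of pageTypes) {
        if (mode === 'headless' && NEEDS_HEADED.has(pageType)) continue;
        entries.push({ browser, mode, pageType });
      }
    }
  }
  return entries;
}

const matrix = expandMatrix(
  ['chrome', 'edge', 'firefox'],
  ['headed', 'headless'],
  ['static', 'spa', 'iframe', 'permission-denied', 'permission-allowed'],
);
console.log(matrix.length); // 24: 15 headed + 9 headless
```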

A release gate might look like this:

Smoke install and core behavior on the primary browser.
Cross-browser smoke on all supported browsers.
Permission and persistence suite on nightly.
Browser-version compatibility run before release cut.

This keeps the suite efficient without pretending all browser behavior is equal.

Where Endtest can help, without changing your testing strategy

If your team needs browser-side execution across real browsers with repeatable artifacts, a platform like Endtest can serve as the execution layer for the browser portion of these flows. Its agentic AI and low-code/no-code workflow model are most useful when you need to standardize repetitive cross-browser runs and capture the artifacts that make failures easier to reproduce.

That said, the core testing strategy stays the same. You still need to design for isolated profiles, explicit permission handling, and deterministic state. A platform can reduce infrastructure work, but it cannot replace good test design.

A checklist you can use before shipping

Before you call an extension test suite production-ready, verify that it covers:

installation from the artifact or distribution path users actually use,
first-run permission prompts and deny paths,
content script injection on both simple and dynamic pages,
no injection on excluded pages,
storage persistence and reset behavior,
browser-specific differences across supported engines,
clean profile setup and teardown,
artifact capture for failed CI runs,
pinned or intentionally managed browser versions.

If any of those are missing, your suite may still be useful, but it is not yet reliable enough to explain extension regressions with confidence.

Final thought

The best way to test browser extensions in real browsers is to treat the browser as part of the system under test, not just the test runner. That means installing the extension in a controlled profile, handling permissions explicitly, validating content scripts on real pages, and designing CI so that state cannot leak between runs.

Do that well, and your tests stop being a source of uncertainty. They become the place where extension regressions are found early, reproduced cleanly, and fixed with confidence.