How to Run Playwright Tests on Real Safari

If you’ve ever passed a Playwright suite in Chromium and Firefox, then opened the same flow in Safari and found layout bugs, click issues, or silent assertion failures, you already know the problem: Safari is often the browser that exposes assumptions your test suite has been making all along.

The tricky part is that Playwright does not run against real Safari. Its WebKit build is close enough for many compatibility checks, but it is still a standalone WebKit, not Safari itself. That distinction matters when you are dealing with font rendering, input behavior, CSS quirks, storage, media APIs, or differences introduced by Apple’s Safari-specific integration on macOS.

This tutorial explains how to run Playwright tests on Safari in the practical sense, what Playwright can and cannot do, where WebKit helps, where it falls short, and how to decide when you need real Safari testing on a real macOS machine.

The short version

If you need the fastest path:

Use Playwright’s WebKit project for early Safari-like coverage.
Run the same tests on a real macOS Safari browser for validation.
Treat WebKit as a proxy, not a guarantee.
If maintaining macOS browsers and Safari infrastructure is becoming a drag, consider a managed platform such as Endtest, an agentic AI [test automation](https://en.wikipedia.org/wiki/Test_automation) platform whose cross-browser testing runs tests on real browsers on real macOS machines.

WebKit coverage is useful, but it is not the same thing as proving your app behaves correctly in Safari on macOS.

Why Safari testing is different from other browsers

Safari is not just another browser in your matrix. The Safari engine, its system integration, and macOS-specific behavior affect how users experience your app.

Common areas where Safari testing pays off:

Sticky positioning and overflow interactions
Focus management and tab order
Date and time input behavior
Video autoplay policies
Cookie and storage edge cases
File upload and permission prompts
Font metrics and text wrapping
Mobile Safari differences, especially on iOS

Some of these show up in WebKit too, but not all. Even when the rendering engine is the same, the browser environment, OS integration, and system-level APIs can still differ.

Playwright WebKit versus real Safari

Playwright ships with three browser engines for automation:

Chromium
Firefox
WebKit

The official Playwright docs explain the supported browser families and installation model well, and they are worth reading before you build your matrix: Playwright introduction.

What WebKit gives you

WebKit is valuable because it lets you catch many Safari-adjacent issues without having to provision a Mac for every developer machine or CI worker. It is useful for:

Early visual and functional feedback
Basic compatibility checks
Reproducing some CSS and DOM behavior that diverges from Chromium
Running on Linux or Windows during normal development

What WebKit does not give you

WebKit is not a substitute for Safari in these cases:

Safari-only behavior in the browser UI
macOS-specific font rendering and text metrics
Apple platform permissions and integrations
Real-world behavior of browser features tied to Safari releases
Bugs that only happen in the Safari process, window manager, or OS shell

That gap is why teams sometimes say, “It passed in WebKit, but it still broke in Safari.”

When you should run tests on real Safari

Use real Safari when the bug is user-visible and Safari-specific, or when your product has a meaningful share of Safari traffic.

A few practical triggers:

Your app uses complex client-side interactions, and Safari has historically been flaky with them
You are debugging layout or typography regressions on macOS
You support customers in Safari and need confidence before release
You are validating a fix for a Safari-only bug report
Your application uses browser APIs that are known to vary by browser and OS

If you are only doing broad regression tests and you do not have any Safari-specific risks, WebKit may be enough for the first pass. But if you want to say your suite covers Safari, you need real Safari somewhere in the pipeline.

How to run Playwright tests on Safari-like WebKit

For many teams, the first step is to add a WebKit project to the Playwright config. This is not real Safari, but it is the easiest way to extend browser coverage.

import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
});

A few notes:

webkit is the browser engine that Playwright automates.
Desktop Safari is a device preset, not real Safari.
This is a useful baseline, but do not present it internally as full Safari verification.

To run the WebKit project:

npx playwright test --project=webkit

If this is all you need, great. Many teams, though, find that WebKit catches the broad class of issues while Safari itself still requires a separate validation path.

How to run Playwright tests on real Safari on macOS

Playwright cannot automate Safari directly; none of its bundled browsers or branded channels is Safari. Safari automation on macOS is usually done through WebDriver-compatible approaches, or through a testing platform that provisions real Safari browsers for you.

If you are driving Safari yourself, Apple’s documentation on testing with WebDriver in Safari is the right place to start.

What you need on macOS

At minimum, you need:

A Mac running a supported Safari version
Remote automation allowed in Safari’s developer settings
A stable test environment with repeatable browser setup
A plan for running tests locally, in CI, or on dedicated hardware

Typical setup steps include:

Install or update Safari on macOS.
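Enable the Develop menu in Safari’s settings (the Advanced tab).
Turn on Allow Remote Automation (in the Develop menu, or in the Developer settings in newer Safari versions), and run safaridriver --enable once to authorize the driver.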
Run your tests against the Safari WebDriver endpoint or a platform that exposes Safari sessions.
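
The last step can be sketched without any client library, because safaridriver speaks the W3C WebDriver HTTP protocol. This is a minimal sketch, not a full client. It assumes that `safaridriver -p 4444` is already running on the Mac and that Node 18 or later provides a global `fetch`. The port, the function names, and the URL in the usage note are illustrative choices.

```typescript
// Assumption: `safaridriver -p 4444` is running on this Mac (the port is arbitrary).
const DRIVER_URL = 'http://localhost:4444';

// Body for the W3C WebDriver "New Session" command, asking for Safari.
function newSessionPayload(): { capabilities: { alwaysMatch: { browserName: string } } } {
  return { capabilities: { alwaysMatch: { browserName: 'safari' } } };
}

// Send one WebDriver command and return the `value` field of the JSON response.
async function command(method: string, path: string, body?: unknown): Promise<any> {
  const res = await fetch(`${DRIVER_URL}${path}`, {
    method,
    headers: { 'Content-Type': 'application/json' },
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  const json = (await res.json()) as { value: any };
  if (!res.ok) {
    throw new Error(`WebDriver error: ${JSON.stringify(json.value)}`);
  }
  return json.value;
}

// Open a page in real Safari, read its title, and always close the session.
async function readTitleInSafari(url: string): Promise<string> {
  const session = await command('POST', '/session', newSessionPayload());
  try {
    await command('POST', `/session/${session.sessionId}/url`, { url });
    return await command('GET', `/session/${session.sessionId}/title`);
  } finally {
    await command('DELETE', `/session/${session.sessionId}`);
  }
}
```

Calling `readTitleInSafari('https://example.com')` from a script on the Mac is enough to confirm that the driver, the permissions, and the browser are wired up before you invest in a fuller WebDriver-based suite.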

On local Macs, you can also validate Safari behavior with simple smoke tests using browser automation tooling that talks to WebDriver. For many teams, though, local Safari setup is where inconsistency begins, because browser versions, OS patches, and developer machine drift all create noise.

Why not just use your laptop?

Because local execution is convenient but fragile:

The browser version may drift from CI
Developer laptops accumulate conflicting state
Test results become hard to reproduce
Parallel runs can be limited by machine resources
One person’s machine can become the de facto Safari lab

That is tolerable for debugging, but not ideal for a repeatable release process.

A practical testing strategy for Safari coverage

You usually do not need a single perfect Safari pipeline. You need a layered strategy.

Layer 1, fast feedback in WebKit

Use WebKit in local development and in PR checks for quick confidence. This catches a lot of regressions early.

Layer 2, smoke testing on real Safari

Run a small, high-value set of smoke tests on real Safari. Focus on:

Login
Navigation
Forms
File upload
Critical workflows
One or two visual assertions for the most fragile pages

Layer 3, targeted reproduction for bugs

When a Safari bug appears, reproduce it directly on a real Mac with the same browser and OS combination if possible. Reduce the test until you know whether the issue is in your app, your locators, your timing assumptions, or Safari itself.

The goal is not to put every test on Safari. The goal is to put the right tests on real Safari and avoid misleading green runs.

Common Safari-specific Playwright issues and what they usually mean

1. Clicks fail even though the element is visible

This can happen when Safari’s layout engine places overlays, pseudo-elements, or scroll offsets differently. If a click is flaky in Safari but not Chromium, inspect:

Z-index and overlapping elements
Sticky headers
Scrolling containers
Pointer-event styles
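
One way to check the items above is to ask the browser which element is actually on top at the point Playwright would click. The sketch below uses Playwright's `boundingBox()` and `page.evaluate()` together with the DOM's `document.elementFromPoint()`. The `/settings` path (which assumes a configured `baseURL`) and the Save button are hypothetical stand-ins for whichever flow is flaky in your suite.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical page and button name; swap in the flow that flakes for you.
test('report what covers the Save button', async ({ page }) => {
  await page.goto('/settings');
  const target = page.getByRole('button', { name: 'Save' });
  await target.scrollIntoViewIfNeeded();

  // Viewport-relative rectangle of the button, or null if it is not rendered.
  const box = await target.boundingBox();
  expect(box).not.toBeNull();

  // Ask the browser which element is topmost at the button's center point.
  const topmost = await page.evaluate(
    ([x, y]) => document.elementFromPoint(x, y)?.outerHTML.slice(0, 200) ?? null,
    [box!.x + box!.width / 2, box!.y + box!.height / 2],
  );
  console.log('Topmost element at click point:', topmost);
});
```

If the logged markup is a sticky header or an overlay rather than the button, the failure is a layout difference to fix in the app or the test setup, not a timing problem.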

2. Text wraps differently and breaks assertions

Safari and macOS font rendering can produce subtle differences in line breaks and element sizes. If you are asserting exact pixel values, make sure the assertion is necessary. Prefer robust checks for structure and content instead of fragile measurements when possible.
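
When a measurement really is needed, a tolerance band is usually less fragile than an exact pixel value. The helper below is a small sketch; its name and the numbers in the comment are illustrative, not values from any particular app.

```typescript
// True when a measured value is within `tolerance` of the expected value.
function withinTolerance(actual: number, expected: number, tolerance: number): boolean {
  return Math.abs(actual - expected) <= tolerance;
}

// Example use inside a Playwright test, with hypothetical numbers:
//   const box = await page.getByRole('banner').boundingBox();
//   expect(withinTolerance(box!.height, 48, 4)).toBe(true);
```

The tolerance should be wide enough to absorb font-metric differences between engines but narrow enough that a genuinely broken layout still fails.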

3. Waits pass in Chromium but time out in Safari

Safari can be slower to settle on some pages, especially those with heavy client-side rendering or delayed hydration. In those cases, use waits tied to app state, not arbitrary sleep values.

await page.getByRole('button', { name: 'Save' }).click();
await expect(page.getByText('Saved successfully')).toBeVisible();

The pattern above is better than sleeping and hoping the DOM is ready.

4. File uploads and permissions behave differently

Safari on macOS may involve permission dialogs, download behavior, or OS-level prompts that WebKit-based testing does not fully reproduce. This is one of the clearest examples of why “WebKit passed” is not the same as “Safari passed.”

5. Storage or authentication seems inconsistent

Authentication flows that rely on cookies, localStorage, SameSite settings, or cross-site redirects can behave differently across browsers. If Safari is failing, verify the auth sequence end-to-end on macOS, not just the individual API calls.
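
A quick way to narrow down cookie-related auth failures is to list the cookies the session ended up with and flag the ones that tend to behave differently across browsers. The sketch below is a heuristic, not a complete audit. The `CookieInfo` shape is a trimmed assumption modeled on the objects returned by Playwright's `context.cookies()`, and `findRiskyCookies` is a hypothetical helper name.

```typescript
// Trimmed to the fields this check reads; modeled on Playwright's cookie objects.
interface CookieInfo {
  name: string;
  domain: string;
  secure: boolean;
  sameSite: 'Strict' | 'Lax' | 'None';
}

// Return human-readable warnings for cookies worth inspecting first.
function findRiskyCookies(cookies: CookieInfo[], appHost: string): string[] {
  const warnings: string[] = [];
  for (const c of cookies) {
    // SameSite=None without Secure is handled inconsistently across browsers.
    if (c.sameSite === 'None' && !c.secure) {
      warnings.push(`${c.name}: SameSite=None without Secure`);
    }
    // A cookie scoped to another host may be read in a third-party context,
    // and Safari blocks third-party cookies by default.
    const cookieHost = c.domain.replace(/^\./, '');
    const belongsToApp = appHost === cookieHost || appHost.endsWith(`.${cookieHost}`);
    if (!belongsToApp) {
      warnings.push(`${c.name}: set for ${cookieHost}, not ${appHost}`);
    }
  }
  return warnings;
}
```

In a Playwright test this could be called as `findRiskyCookies(await context.cookies(), new URL(page.url()).hostname)` after the login flow completes. A warning is a prompt to check the flow on real Safari, not proof of a bug.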

Making Safari tests less flaky

Safari flakiness is often a test design problem before it is a browser problem.

A few tactics help immediately:

Prefer getByRole, getByLabel, and other user-facing locators
Avoid brittle CSS chains and positional selectors
Wait on visible app state, not on arbitrary timers
Keep test data isolated per run
Do not reuse state across unrelated tests unless you need to
Reduce visual assertions to the minimum needed for confidence

Here is a locator example that is usually more stable than a CSS selector:

await page.getByRole('button', { name: 'Continue' }).click();
await expect(page.getByRole('heading', { name: 'Billing' })).toBeVisible();

If this still flakes only in Safari, the issue may be timing, focus handling, or a real browser difference that needs product-side investigation.

CI considerations for macOS and Safari

Safari testing in CI is harder than Chromium testing because macOS runners are less flexible and more expensive to manage. You need to decide whether Safari coverage belongs in your own CI or in a managed browser environment.

If you run Safari in your own infrastructure

Plan for:

Dedicated macOS runners or machines
Browser version control
Cleanup between runs
Test isolation and artifact collection
Retry strategy for known transient issues

A simple GitHub Actions workflow might look like this for a Mac runner, although the exact setup will depend on how you launch Safari automation:

name: safari-smoke
on: [push]
jobs:
  test:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run test:safari

This shows the shape of the pipeline, but it hides the real work: managing browsers, sessions, and flake isolation.
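
If the `test:safari` script wraps a Playwright run of the WebKit project (an assumption, since the script itself is not shown), the retry and artifact items from the planning list can be expressed in the Playwright config. The sketch below uses standard Playwright options. The retry count and artifact modes are illustrative defaults, not recommendations from the Playwright team.

```typescript
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  // Retry only in CI so local runs surface flakiness immediately.
  retries: process.env.CI ? 2 : 0,
  // Keep an HTML report that the workflow can upload as an artifact.
  reporter: [['list'], ['html', { open: 'never' }]],
  use: {
    // Collect heavier artifacts only when a test misbehaves.
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  projects: [
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
});
```

Even on a macOS runner, this project still drives Playwright's WebKit build rather than Safari, so it covers the fast-feedback layer and not the real-Safari one.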

If you use a managed platform

A managed platform becomes attractive when your team wants real Safari coverage without becoming a browser infrastructure team. That is where Endtest’s comparison with Playwright is relevant, especially if your team does not want to own the browser matrix, CI glue, and machine maintenance.

Endtest’s model is worth evaluating if you want:

Real Safari on real macOS machines
A managed environment instead of local browser setup
A lower-friction workflow for non-developers who still need test coverage
A browser testing stack that is less tied to maintaining your own infrastructure

Unlike a local Playwright WebKit run, this is aimed at real-browser validation. That matters when Safari behavior is the issue, not general WebKit compatibility.

Choosing between Playwright WebKit and real Safari

Use this rule of thumb:

Choose WebKit when

You want quick compatibility feedback
The test is part of a broad cross-browser regression pass
You are still in development and need a fast signal
You care about browser engine behavior more than Safari-specific UI integration

Choose real Safari when

The customer issue is reproducible only in Safari
You need release confidence for Safari users
You are debugging browser or OS-specific behavior
You want a true Safari signal before shipping

Choose a managed real-browser platform when

You need real Safari regularly, not occasionally
Your team does not want to own macOS hardware or browser maintenance
You want browser coverage without tying it to one developer’s machine
You need a more scalable alternative to a locally maintained setup

For teams that want this path with less setup overhead, Endtest is a reasonable option to evaluate as a browser testing platform. It runs on real browsers on real machines, including real Safari on macOS, which is the piece WebKit cannot fully replace.

A realistic workflow for frontend teams

If you are trying to make this practical, here is a simple workflow that works well:

Add WebKit to Playwright for fast browser diversity.
Keep your tests locator-driven and state-driven.
Identify the small set of flows that truly need Safari validation.
Run those flows on a real macOS Safari environment.
Track Safari-specific failures separately from general test flakiness.
Reduce the amount of manual browser setup your team has to maintain.

That last point is important. A lot of browser testing pain comes from over-indexing on the idea that the team must own every browser, every runner, and every environment. In practice, the more your Safari strategy depends on hand-managed machines, the more fragile the process becomes.
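
Tracking Safari-specific failures separately, as the workflow above suggests, can start with a very small script over your test results. The sketch below assumes a simplified, hypothetical result record (one entry per test per browser project) rather than any real reporter format, and it uses the WebKit project as the Safari-like signal.

```typescript
// Hypothetical, simplified result record: one entry per test per browser project.
interface TestResult {
  title: string;
  project: 'chromium' | 'firefox' | 'webkit';
  passed: boolean;
}

// Split failing tests into those that fail only in WebKit and those that fail elsewhere too.
function splitFailures(results: TestResult[]): { webkitOnly: string[]; general: string[] } {
  // Map each failing test title to the set of projects it failed in.
  const failedIn = new Map<string, Set<string>>();
  for (const r of results) {
    if (r.passed) continue;
    if (!failedIn.has(r.title)) failedIn.set(r.title, new Set<string>());
    failedIn.get(r.title)!.add(r.project);
  }

  const webkitOnly: string[] = [];
  const general: string[] = [];
  failedIn.forEach((projects, title) => {
    const onlyWebkit = projects.size === 1 && projects.has('webkit');
    (onlyWebkit ? webkitOnly : general).push(title);
  });
  return { webkitOnly, general };
}
```

Tests in `webkitOnly` are candidates for reproduction on real Safari, while tests in `general` are more likely to point at the app or at the test design. Results from a real Safari run could be folded in by adding another project value.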

Final thoughts

If your goal is to run Playwright tests on Safari, the first question is not how to force Playwright into a Safari-shaped box. The real question is whether you need WebKit compatibility, real Safari behavior, or both.

Playwright’s WebKit project is a good early-warning system. It is fast, useful, and easy to add. But if the thing you need to verify is true Safari behavior on macOS, then WebKit is only a proxy. Real Safari testing still needs a real Safari browser on a real Mac, whether that is in your own lab or through a managed platform.

For teams that want to keep the testing workflow simple while still getting real Safari coverage, Endtest is worth a look because it removes a lot of the environment management that usually makes Safari testing harder than it should be.

The most reliable setup is rarely the one with the most browser engines. It is the one that gives you the right signal with the least infrastructure noise.