What to Log When Browser Tests Fail: Video, Console, Network, and DOM State

When a browser test fails in CI, the hardest part is usually not reproducing the failure but understanding what actually happened. A red build might come from a real product defect, a timing issue, a misconfigured test environment, a third-party dependency, or a flaky assertion that only fails under specific rendering or network conditions. If the only artifact you keep is a stack trace, you are forcing engineers to guess.

Good browser test failure logs are not about collecting everything. They are about keeping the minimum evidence that lets you answer a few practical questions quickly:

Did the browser render the expected UI state?
Did the page throw a JavaScript error?
Did a network request fail, time out, or return unexpected data?
Was the DOM different from what the test expected at the moment of failure?
Was this an infrastructure issue, an app issue, or a test issue?

This checklist focuses on the artifacts that matter most in CI, especially for teams running test automation at scale across Chrome, Firefox, WebKit, or remote browser grids. The goal is not to turn every test into a movie archive. It is to preserve enough context to debug failures without drowning your pipeline in noise.

The short version: what to keep on every meaningful failure

For most browser automation stacks, the most useful browser test failure logs are:

A short failure summary with test name, browser, version, and environment.
A screenshot at the failure point.
A video or screen recording for flows with multiple steps or asynchronous UI behavior.
Console logs, including warnings and errors.
Network traces or request logs for API-driven UI flows.
A DOM snapshot or HTML excerpt from the relevant page state.
Metadata such as URL, viewport, timestamps, and retry count.
Any browser, driver, or grid logs that explain execution problems.

If you only save one thing beyond the stack trace, make it the state of the page at the moment the assertion failed.

The rest of this article explains what each artifact tells you, what it does not tell you, and how to avoid generating so much output that nobody wants to inspect it.

1) Start with failure metadata, not just artifacts

Before you attach a video or DOM dump, capture the context that gives the evidence meaning. A screenshot from a checkout page is not very useful if you do not know which browser, locale, viewport, or test retry produced it.

Capture this metadata for every failure

Test name and suite name
Commit SHA or build number
Branch and pull request ID
Browser name and version
Operating system and runtime image
Viewport size and device emulation settings
URL at failure time
Retry number, if retries are enabled
Test duration and failure timestamp
Parallel worker ID, if applicable
Grid node ID or container ID, if using Selenium Grid or a remote runner

This metadata helps you correlate browser test failure logs with CI logs, infrastructure logs, and application deploys. It also helps distinguish deterministic regressions from test flakiness. A failure that happens only on the third retry, or only in one browser, points you toward a different root cause than a failure that happens consistently across all runs.

Why this matters in practice

A browser test that fails on WebKit but not Chromium could be a browser compatibility issue. A test that fails only on a 1366x768 viewport might be suffering from a responsive layout break. A test that fails only in a container image with a newer font stack could indicate a visual or text-measurement sensitivity. Without metadata, those signals get lost.
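
As a rough illustration of the metadata list above, the sketch below folds runner-provided context into a single record that can be stored next to the other artifacts. The `FailureContext` shape and the `buildFailureMetadata` helper are hypothetical names chosen for this example, not part of any test framework; adapt the fields to whatever your runner exposes.

```typescript
// Hypothetical context shape: these field names are illustrative assumptions,
// not an API of Playwright, Selenium, or any CI system.
interface FailureContext {
  testName: string;
  suiteName: string;
  commitSha: string;
  browser: { name: string; version: string };
  viewport: { width: number; height: number };
  url: string;
  retry: number;
  startedAt: Date;
  failedAt: Date;
  workerId?: string;
}

// Build a JSON-friendly record: the start time is folded into a duration,
// and the failure time is stored as an ISO 8601 string.
function buildFailureMetadata(ctx: FailureContext) {
  const { startedAt, failedAt, ...rest } = ctx;
  return {
    ...rest,
    durationMs: failedAt.getTime() - startedAt.getTime(),
    failedAt: failedAt.toISOString(),
  };
}
```

Writing the result with `JSON.stringify` into the artifact folder keeps the context searchable alongside screenshots and logs.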

2) Keep a screenshot, but do not stop there

Screenshots are still the fastest way to understand many UI failures. They answer the simplest question: what did the user see when the test stopped?

A screenshot is most useful when it includes

The full viewport or a full-page capture when the page is short enough
Browser chrome, if your setup allows it and it helps identify the environment
The exact state at the assertion point, not just the end of the test
The visible error message, toast, modal, or loading indicator

Screenshots are weak at explaining

Why the UI was in that state
Whether a request failed before the screenshot was captured
Whether the page was still animating, rerendering, or waiting on a background job
Whether the issue was only visible in the DOM and not the pixels

For that reason, treat screenshots as the first clue, not the final answer. In React, Angular, Vue, and similar single-page apps, the visual state can look stable while the DOM is still changing or a pending request is about to alter the page again.

Practical rule

Capture a screenshot automatically on failure and, for flake-prone flows, capture one after each critical step. This makes it much easier to identify the moment when the UI diverged from the expected path.

3) Add video logs for multi-step flows and timing-sensitive bugs

Video logs are especially valuable when a test failure is caused by transitions, overlays, slow rendering, or user interaction timing. A single screenshot can show the final state, but a video shows how the state evolved.

Video is worth keeping when the test involves

Login and MFA flows
Drag and drop interactions
Infinite scroll or lazy loading
Modals and popovers that depend on animations
File uploads and download flows
Navigation across multiple pages or tabs
Complex form validation with asynchronous save behavior

What video helps you see

The page taking too long to stabilize
A spinner that never disappears
A dialog opening and closing too quickly
An element shifting position during click targeting
The test clicking before the UI is ready
Unexpected browser prompts, permission dialogs, or popups

When video is less useful

Video can be expensive to store and can become hard to search. If every failed test uploads a long video, teams often stop looking at them. For stable, low-risk tests, a screenshot plus logs may be enough. For flows with known timing sensitivity, video is often the best artifact you have.

Keep it short and purposeful

If your runner supports it, record from test start and keep the recording only when the test fails, or limit recording to suites with known timing sensitivity. Long recordings of passing tests rarely help diagnose a specific issue.

4) Capture console logs, including warnings

Console output is a direct line to front-end runtime problems. Many browser failures are not purely test failures; they are app errors that the test exposed.

Save these console events

error messages
warning messages that indicate broken assumptions or deprecations
Unhandled promise rejections
Stack traces from client-side exceptions
CSP violations, if relevant
Failed resource loads that appear in the console

Console errors often explain why the UI never reached the expected state. A button might not render because a JavaScript exception interrupted the component tree. A validation message might never appear because an API response handler crashed. A flaky failure might be tied to a warning that becomes an error only under one browser engine.

Filter carefully

Not every console warning deserves a red build. Some apps emit noisy messages from analytics scripts, browser extensions, or third-party widgets. If you treat every warning as a failure, teams will quickly ignore the signal.

A better approach is:

Always store all console messages for failed tests
Optionally fail the test only on a curated list of severe messages
Suppress known benign noise by source, not by broad message patterns
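
The filtering approach above could be sketched roughly as follows. The `ConsoleEntry` and `ConsolePolicy` shapes and the `evaluateConsole` function are hypothetical names for this example; the sketch assumes your runner can give you each message's type, text, and source URL (in Playwright, `msg.location().url` is one possible source for the latter).

```typescript
// Hypothetical record for one captured console message.
interface ConsoleEntry {
  type: string;      // e.g. 'error', 'warning', 'log'
  text: string;
  sourceUrl: string; // script that emitted the message
}

// Hypothetical policy: which messages fail the test, which sources are noise.
interface ConsolePolicy {
  severePatterns: RegExp[];  // avoid the /g flag, it makes test() stateful
  ignoredSources: string[];  // URL prefixes of known-noisy scripts
}

function evaluateConsole(entries: ConsoleEntry[], policy: ConsolePolicy) {
  // Suppress noise by source, not by broad message patterns.
  const relevant = entries.filter(
    e => !policy.ignoredSources.some(prefix => e.sourceUrl.indexOf(prefix) === 0),
  );
  // Only errors matching the curated list are treated as build-breaking.
  const severe = relevant.filter(
    e => e.type === 'error' && policy.severePatterns.some(p => p.test(e.text)),
  );
  // Everything is still stored; only the severe subset decides pass or fail.
  return { stored: entries, severe, shouldFail: severe.length > 0 };
}
```

Keeping the full list in `stored` means a suppressed message is still available for inspection when a test fails for another reason.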

Playwright example: collecting console events

```typescript
import { test } from '@playwright/test';

test('records console output', async ({ page }) => {
  const messages: string[] = [];
  // Collect every console message together with its type (log, warning, error, ...).
  page.on('console', msg => messages.push(`${msg.type()}: ${msg.text()}`));

  await page.goto('https://example.com');
  // assertions here

  // Print the collected messages so they end up in the test output.
  console.log(messages.join('\n'));
});
```

If you are using Selenium, you will often need browser-specific log capabilities or driver APIs. The exact setup varies by browser and driver version, so the important part is the principle: keep the logs with the test result instead of hoping someone opens the browser console later.

5) Record network traces for data-dependent failures

A browser test often looks like a UI check, but many failures originate in the network layer. If a page renders incorrectly because a backend request was slow, returned a 500, or produced a shape change in the JSON response, the DOM alone may not reveal enough.

Keep network information such as

Request method, URL, and status code
Request timing and duration
Failed requests and retries
Redirect chains
Response payload summaries for critical API calls
Cache or service worker involvement, when relevant
Throttling or offline simulation settings

Network traces are especially useful when tests fail intermittently because the app has race conditions or depends on non-deterministic backend data. They are also useful when a UI failure is actually caused by authentication expiration, CORS misconfiguration, CDN issues, or a third-party API outage.

What to inspect first

If a UI assertion fails, check whether the app made all expected requests. A missing API call can be more important than a wrong DOM assertion. Similarly, a 200 response may still be a failure if the payload shape changed and the client code silently rejected it.

Example: logging failed requests in Playwright

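```typescript
// Requests that never completed (DNS failure, aborted connection, and so on).
page.on('requestfailed', request => {
  console.log('FAILED', request.method(), request.url(), request.failure()?.errorText);
});

// Requests that completed with an HTTP error status.
page.on('response', async response => {
  if (response.status() >= 400) {
    console.log('HTTP', response.status(), response.url());
  }
});
```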

Tradeoff

Full network HAR files can be helpful, but they can also be large and noisy. If your app makes dozens of static asset requests, you may only need to retain API calls, failures, and requests related to the current route.
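
One way to apply that trimming is to filter the recorded requests before upload. The sketch below assumes you already have a flat list of request records; the `RequestRecord` shape, the `selectNetworkEvidence` function, and the `apiPrefix` parameter are hypothetical names for this example rather than part of any HAR tooling.

```typescript
// Hypothetical flattened request record; status is null when the request
// never completed (for example, a connection failure).
interface RequestRecord {
  method: string;
  url: string;
  status: number | null;
  durationMs: number;
}

// Keep API calls and anything that failed; drop successful static assets.
function selectNetworkEvidence(records: RequestRecord[], apiPrefix: string): RequestRecord[] {
  return records.filter(r => {
    const failed = r.status === null || r.status >= 400;
    const isApi = r.url.indexOf(apiPrefix) === 0;
    return failed || isApi;
  });
}
```

A prefix match is a deliberately simple choice here. Apps that spread API calls across several hosts would need a list of prefixes or a route-aware filter instead.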

6) Save a DOM snapshot at the moment of failure

The DOM snapshot is often the most underused and most valuable artifact. A screenshot shows pixels, but a DOM snapshot shows structure, attributes, text content, hidden elements, and sometimes state that is not visually obvious.

Good DOM evidence includes

The container element around the failed assertion
The HTML of the component or region under test
Key attributes such as aria-*, data-*, disabled, checked, value, and class
Text content of the target element and nearby context
The computed presence or absence of expected nodes

Why DOM snapshots matter

Many flaky failures are caused by timing mismatches between the test and the UI. The element might exist but be hidden. The element might be visible but was detached and reinserted by a re-render, leaving the test with a stale reference. The text might be correct in the DOM but not yet reflected in the rendered screenshot. The snapshot helps explain which of those states existed at failure time.

Use the smallest useful fragment

Dumping the entire page HTML is often too much. Start with the subtree around the failing locator or the component root. That gives enough context without overwhelming the artifact store.

Example: capturing a targeted DOM snapshot in Playwright

```typescript
// Serialize only the subtree around the failing locator, not the whole page.
const el = page.locator('[data-testid="cart-summary"]');
console.log(await el.evaluate(node => node.outerHTML));
```

For Selenium, a similar approach is to retrieve element attributes or outerHTML via JavaScript execution. The key is to snapshot the relevant part of the page before the DOM changes again.

7) Include browser and driver logs for infrastructure issues

Sometimes the app is fine and the test runner is not. Browser test failure logs should include enough execution detail to diagnose infrastructure problems separately from product problems.

Retain these when available

Browser driver logs
Selenium Grid node logs
Container startup and teardown logs
Browser crash messages
WebSocket or remote debugging connection errors
Resource exhaustion signs, such as out-of-memory messages
Timeouts from the test framework itself

These logs help with issues like browser startup failures, session creation problems, version mismatches, and disconnected nodes. In distributed setups, they can explain failures that never reached the app under test.

If you use continuous integration with parallel workers, this separation becomes even more important. A red job might be due to one bad node, not a regression in the codebase.

8) Decide what to retain based on failure type

Not every failure needs the same evidence. A practical logging strategy uses different artifact sets depending on the failure category.

For assertion failures

Keep:

Screenshot
DOM snapshot
Console logs
Relevant network requests
Test metadata

This is the common case when the UI loaded but the expected state did not appear.

For timeouts

Keep:

Video
Screenshot
Network traces
Console logs
Step timings

Timeouts usually mean the app was late, stuck, or waiting on a dependency. Video often makes the root cause obvious.

For browser crashes or session failures

Keep:

Browser driver logs
Grid or container logs
Screenshot if one exists
Last known console and network events

These failures can happen before the test reaches the application. You need runner evidence more than UI evidence.

For flaky intermittent failures

Keep:

Full artifact bundle for every failure occurrence
Retry number
The exact step that failed
Any preceding warnings, slow requests, or layout changes

For flakes, the failure itself may be less important than the sequence that led to it.
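
The mapping in this section could be expressed as a small lookup table that a reporter consults after a failure. This is a minimal sketch: the `FailureKind` and `Artifact` names are hypothetical labels for the categories above, and it assumes the failure has already been classified elsewhere.

```typescript
// Hypothetical labels for the failure categories and artifact types above.
type FailureKind = 'assertion' | 'timeout' | 'crash' | 'flaky';
type Artifact =
  | 'screenshot' | 'video' | 'dom' | 'console' | 'network'
  | 'metadata' | 'stepTimings' | 'driverLogs' | 'gridLogs';

const ALL_ARTIFACTS: Artifact[] = [
  'screenshot', 'video', 'dom', 'console', 'network',
  'metadata', 'stepTimings', 'driverLogs', 'gridLogs',
];

const ARTIFACTS_BY_KIND: Record<FailureKind, Artifact[]> = {
  assertion: ['screenshot', 'dom', 'console', 'network', 'metadata'],
  timeout: ['video', 'screenshot', 'network', 'console', 'stepTimings'],
  crash: ['driverLogs', 'gridLogs', 'screenshot', 'console', 'network'],
  // Flaky failures keep the full bundle for every occurrence.
  flaky: ALL_ARTIFACTS,
};

// Return a copy so callers cannot mutate the shared table.
function artifactsFor(kind: FailureKind): Artifact[] {
  return ARTIFACTS_BY_KIND[kind].slice();
}
```

Keeping the table in one place makes it easy to review and adjust as the team learns which artifacts actually get opened.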

9) Use structured naming so artifacts can be searched later

Artifacts are only useful if engineers can find them. A folder full of screenshot.png and video.mp4 files is not enough.

Good naming conventions include

Test name
Browser
Build number
Retry count
Timestamp
Worker or node ID

Example:

```text
checkout-add-to-cart_chromium_build-1842_retry-1_node-3_dom.html
```

This makes it easier to trace failures across CI runs and compare artifacts across browsers.
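
A naming helper along these lines can keep file names consistent across reporters. The `ArtifactNameParts` shape and both functions are hypothetical; the sketch assumes the same field order as the example name above and does not include a timestamp.

```typescript
// Hypothetical inputs for one artifact file name.
interface ArtifactNameParts {
  test: string;
  browser: string;
  build: number;
  retry: number;
  node: number;
  kind: string;       // e.g. 'dom', 'screenshot', 'video'
  extension: string;  // e.g. 'html', 'png', 'webm'
}

// Lowercase the value and collapse anything non-alphanumeric into single hyphens.
function slugify(value: string): string {
  return value
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Join the parts with underscores so each field stays easy to split and search.
function artifactFileName(p: ArtifactNameParts): string {
  return [
    slugify(p.test),
    slugify(p.browser),
    `build-${p.build}`,
    `retry-${p.retry}`,
    `node-${p.node}`,
    `${slugify(p.kind)}.${p.extension}`,
  ].join('_');
}
```

Because hyphens only appear inside a field and underscores only between fields, a simple split on `_` recovers the parts later.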

Store metadata alongside the files

A small JSON or text manifest can make artifact inspection much easier.

```json
{
  "test": "checkout-add-to-cart",
  "browser": "chromium",
  "version": "125",
  "retry": 1,
  "url": "https://app.example.com/cart",
  "status": "failed"
}
```

10) Do not over-log by default

The most common mistake is to turn on every possible artifact for every test run. That creates expensive storage, slow CI, and low signal.

Problems caused by over-logging

Artifact storage grows quickly
Uploads slow down the pipeline
Debugging tools become harder to use
Engineers stop checking the logs because there is too much noise
Sensitive data may accidentally be retained longer than intended

A better default

Use a tiered approach:

Always keep metadata and minimal logs
Keep screenshots on failure
Keep video for selected suites or failed retries
Keep DOM and network evidence for high-value or flaky paths
Keep driver logs for environment or session failures

This gives you observability without overwhelming the CI system.

11) Redact secrets and user data before upload

Browser test failure logs can capture real user content, tokens, emails, API responses, and session identifiers. That is useful for debugging, but it is also a security risk.

Redact or avoid storing

Authorization headers
Session cookies
Password fields
One-time codes
Personal data from production-like fixtures
Full request bodies when they include secrets

Safer practices

Run end-to-end tests against synthetic data
Mask sensitive fields in logs before uploading
Avoid full-page screenshots of pages with PII unless access is tightly controlled
Limit artifact retention windows

If your CI environment tests against data that resembles real customer information, treat artifact retention as part of your security policy, not just a debugging preference.
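
Masking before upload could look roughly like the sketch below. The header and field lists are assumptions for illustration and are certainly incomplete for a real application; `redactHeaders` and `redactBody` are hypothetical helper names.

```typescript
// Assumed lists of sensitive names; extend these for your own app.
const SENSITIVE_HEADERS = ['authorization', 'cookie', 'set-cookie'];
const SENSITIVE_FIELDS = ['password', 'token', 'otp'];
const MASK = '[REDACTED]';

// Replace the values of sensitive headers, matching names case-insensitively.
function redactHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(headers)) {
    const sensitive = SENSITIVE_HEADERS.indexOf(key.toLowerCase()) !== -1;
    out[key] = sensitive ? MASK : headers[key];
  }
  return out;
}

// Walk a parsed JSON body and mask sensitive fields at any depth.
function redactBody(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(item => redactBody(item));
  }
  if (value !== null && typeof value === 'object') {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      const sensitive = SENSITIVE_FIELDS.indexOf(key.toLowerCase()) !== -1;
      out[key] = sensitive ? MASK : redactBody(source[key]);
    }
    return out;
  }
  return value;
}
```

Both functions return new objects instead of mutating their input, so the unredacted data never has to be written anywhere.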

12) A practical failure checklist for CI

Use this as a working standard for browser test failure logs.

Always capture

Test name
Browser and version
Environment and build metadata
Failure message and stack trace
Screenshot

Capture when the test is UI-heavy or flake-prone

Video
DOM snapshot of the failing region
Console logs
Key network requests and failures

Capture when the failure looks environmental

Browser driver logs
Grid node logs
Container logs
Session creation logs

Capture when the failure is data-driven

API response summaries
Request identifiers or correlation IDs
Timing of critical requests
Cache or auth state if relevant

13) How to wire this into a Playwright or Selenium workflow

The implementation details differ, but the strategy is the same: collect evidence at the point of failure and attach it to the CI artifact bundle.

Playwright pattern

Playwright makes it relatively straightforward to store traces, screenshots, and videos per test. A common pattern is to enable richer artifacts only on retry or failure.

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Keep each artifact only when the test fails.
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
    trace: 'retain-on-failure',
  },
});
```

The trace file can be especially valuable because it combines actions, snapshots, console events, and network activity in one place.

Selenium pattern

Selenium usually requires a bit more manual plumbing. A practical setup is to:

Capture screenshots in an after-each hook
Save browser console logs where supported
Export page HTML for the failing element or page
Pull driver or grid logs from the execution environment

The framework matters less than the discipline of preserving evidence close to the failure.

14) Interpreting the evidence: what each artifact tells you

A useful mental model is to map each artifact to the question it answers.

Screenshot: what was visible?
Video: how did the state evolve?
Console logs: did the page throw or warn?
Network traces: did the app receive the right data at the right time?
DOM snapshot: what was actually in the page structure?
Driver logs: did the browser infrastructure fail?

When these artifacts agree, debugging becomes much faster. When they disagree, that is often the most interesting clue. For example, a DOM snapshot may show the correct text while the screenshot still shows the loading state, which can point to rendering or timing behavior rather than a missing API response.

15) A simple retention policy that works for many teams

If you need a starting point for CI policy, use this:

Keep screenshots for every failure for 30 days
Keep DOM snapshots and console logs for every failure for 14 to 30 days
Keep videos only for failed tests, retries, and critical user journeys for 7 to 14 days
Keep network traces for high-value suites or failed API-dependent flows for 7 to 14 days
Keep driver and grid logs for environment failures for 7 days

Adjust retention based on compliance needs, artifact size, and how often the same failure recurs. The right answer is the one your team will actually inspect.
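
A cleanup job could encode the starting policy above as a table of retention windows. This is a minimal sketch that assumes the upper bound of each range given above; the `ArtifactKind` labels and the `isExpired` function are hypothetical names.

```typescript
// Hypothetical labels for the artifact types in the policy above.
type ArtifactKind = 'screenshot' | 'dom' | 'console' | 'video' | 'network' | 'driverLogs';

// Retention in days, using the upper bound of each suggested range.
const RETENTION_DAYS: Record<ArtifactKind, number> = {
  screenshot: 30,
  dom: 30,
  console: 30,
  video: 14,
  network: 14,
  driverLogs: 7,
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// An artifact expires once its age exceeds the window for its kind.
function isExpired(kind: ArtifactKind, createdAt: Date, now: Date): boolean {
  const ageDays = (now.getTime() - createdAt.getTime()) / MS_PER_DAY;
  return ageDays > RETENTION_DAYS[kind];
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the policy easy to test.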

Conclusion

The best browser test failure logs are not the largest ones; they are the ones that let a developer or QA engineer answer the next question immediately. Screenshots show the state. Video shows the sequence. Console logs reveal runtime problems. Network traces expose data and dependency issues. DOM snapshots show what the page actually contained. Infrastructure logs explain failures outside the app.

If you treat these artifacts as a minimal evidence set, not as an unlimited dump, you get faster triage, better flake detection, and cleaner ownership between test problems, application bugs, and environment failures. That is the difference between a red build that gets ignored and one that gets fixed.