This is the verbatim report.md that a web-uplift audit writes for the seeded playground. The screenshots and the reduced-motion recording are the actual evidence artifacts the model captured, shown inline; the heavier raw files (HAR, trace, heap) are linked from the repo.

The subject of this audit is the frozen eval fixture, which preserves the seeded modern-UX issues as ground truth. The genuinely-fixed live playground/ reports zero of these findings (see the eval section below). This separation is the point: the fixture proves recall, and the live playground proves the fixes are real.

| Modality | Tool | Used for |
|---|---|---|
| DOM + local source | `evidence dom --source` | recon, CSS inspection, confirming hard-coded values |
| Computed styles + ad-hoc probes | `evidence dom --selector`, `evidence evaluate --expr` | colour-scheme adaptation, focus outline, animation state, container flex-direction |
| Screenshot | `evidence screenshot` | the white card under dark; the clipped hero at 360px; the focused button |
| Transition video | `evidence video --emulate-media prefers-reduced-motion=reduce` | the marquee still sliding under reduced-motion |
| Layout metrics + CLS + long tasks | `evidence layout` | horizontal overflow at 360px; CLS from the late banner |
| Heap summary | `evidence heap` | object population baseline (no leak found this run) |
| Lighthouse | `npx lighthouse` (model's choice) | be-fast-and-stable / be-inclusive / follow-best-practices / be-discoverable |
| axe-core | injected from CDN via `evidence evaluate` (model's choice) | independent confirmation of the contrast violation |

Each artifact below is recorded in report.json under artifacts[] with its
type, path, capture condition, and the findings it evidences. Screenshots are
embedded inline at the findings they back; the rest are linked. All paths are
under examples/evidence/.
| Type | Artifact | Condition | Evidences |
|---|---|---|---|
| screenshot | no-dark-mode-dark.png | prefers-color-scheme: dark | F-001 |
| video | motion-under-reduce.mp4 | prefers-reduced-motion: reduce | F-002 |
| screenshot | fixed-layout-360.png | viewport: 360x800 | F-003 |
| screenshot | poor-focus.png | keyboard focus | F-004, F-007 |
| trace | trace.json (devtools-loadable) + trace-summary.json | default load | F-005 |
| har | network.har (HAR 1.2) | default load | F-009 |
| heap | heap-summary.json | default load | - |
| lighthouse | lighthouse-summary.json | default load | F-007, F-008, F-009 |
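
The rows above can be read as a small data structure. The following is a minimal sketch, assuming each `artifacts[]` entry is an object with `type`, `path`, `condition` and `evidences` keys; those key names are hypothetical and only mirror the table's column headers, since the actual report.json schema is not shown here. It inverts the table into a finding-to-artifact index and lists any finding id that has no artifact row.

```python
from collections import defaultdict

# Hypothetical shape of report.json's artifacts[]; the key names are
# assumptions taken from the table's column headers, not a confirmed schema.
artifacts = [
    {"type": "screenshot", "path": "no-dark-mode-dark.png",
     "condition": "prefers-color-scheme: dark", "evidences": ["F-001"]},
    {"type": "video", "path": "motion-under-reduce.mp4",
     "condition": "prefers-reduced-motion: reduce", "evidences": ["F-002"]},
    {"type": "screenshot", "path": "fixed-layout-360.png",
     "condition": "viewport: 360x800", "evidences": ["F-003"]},
    {"type": "screenshot", "path": "poor-focus.png",
     "condition": "keyboard focus", "evidences": ["F-004", "F-007"]},
    {"type": "trace", "path": "trace.json",
     "condition": "default load", "evidences": ["F-005"]},
    {"type": "har", "path": "network.har",
     "condition": "default load", "evidences": ["F-009"]},
    {"type": "heap", "path": "heap-summary.json",
     "condition": "default load", "evidences": []},
    {"type": "lighthouse", "path": "lighthouse-summary.json",
     "condition": "default load", "evidences": ["F-007", "F-008", "F-009"]},
]


def artifacts_by_finding(items):
    """Map each finding id to the artifact paths that evidence it."""
    index = defaultdict(list)
    for item in items:
        for finding_id in item["evidences"]:
            index[finding_id].append(item["path"])
    return dict(index)


index = artifacts_by_finding(artifacts)

# Finding ids F-001..F-009 that no artifact row points at.
all_ids = [f"F-{n:03d}" for n in range(1, 10)]
unbacked = [fid for fid in all_ids if fid not in index]

print(index["F-007"])
print(unbacked)
```

With the rows as listed, the only finding id without an artifact row is F-006.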
The trace primitive recorded FCP/LCP at ~33ms with 0 long tasks and 0ms total
blocking time over a ~1.7s window; the har primitive captured 11 requests
(10x 200, 1x 404 - the favicon, which backs F-009). The raw trace.json opens
in the DevTools Performance panel; the model reads trace-summary.json instead.
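
The request tally above is the kind of summary that falls out of a HAR 1.2 file directly, because each entry under `log.entries` carries `request.url` and `response.status`. The sketch below is illustrative only: it uses an in-memory stand-in for network.har, and the localhost port and asset URLs are invented placeholders, not values from the captured file.

```python
from collections import Counter


def status_counts(har):
    """Count responses per HTTP status code in a HAR 1.2 document."""
    return Counter(entry["response"]["status"] for entry in har["log"]["entries"])


def failed_urls(har):
    """Return the URLs of requests whose response status is 400 or higher."""
    return [
        entry["request"]["url"]
        for entry in har["log"]["entries"]
        if entry["response"]["status"] >= 400
    ]


# Stand-in for network.har: ten successful requests plus a missing favicon.
# The host, port and paths are hypothetical placeholders.
sample_har = {
    "log": {
        "version": "1.2",
        "entries": [
            {"request": {"url": f"http://localhost:3000/asset-{i}"},
             "response": {"status": 200}}
            for i in range(10)
        ] + [
            {"request": {"url": "http://localhost:3000/favicon.ico"},
             "response": {"status": 404}},
        ],
    }
}

counts = status_counts(sample_har)
print(dict(counts))
print(failed_urls(sample_har))
```

To run it against a real capture, replace `sample_har` with the result of `json.load()` on the HAR file.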
The audit found all nine ground-truth findings in eval/fixtures/seeded-issues/expected-findings.json, each mapped to the correct principle check, with zero false positives. The genuinely-fixed live playground/, audited the same way, surfaced none of them.
| Metric | Value |
|---|---|
| Ground-truth findings (fixture) | 9 |
| Found (true positives) | 9 |
| Missed (false negatives) | 0 |
| Spurious (false positives) | 0 |
| Recall on the fixture | 100% (9/9) |
| Precision | 100% (9/9) |
| Live playground seeded findings | 0 |
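
The figures in the table follow from a set comparison between the expected finding ids and the reported ones. This is a minimal sketch of that arithmetic, assuming findings are compared by id alone; the real eval also checks the principle mapping, and the schema of expected-findings.json is not shown here, so the id sets below are stand-ins.

```python
def score(expected, found):
    """Compare reported finding ids against ground-truth ids."""
    true_pos = len(found & expected)
    false_neg = len(expected - found)   # ground truth the audit missed
    false_pos = len(found - expected)   # reported but not in ground truth
    return {
        "tp": true_pos,
        "fn": false_neg,
        "fp": false_pos,
        "recall": true_pos / (true_pos + false_neg),
        "precision": true_pos / (true_pos + false_pos),
    }


# Stand-in id sets: the fixture has nine ground-truth findings, all found.
expected_ids = {f"F-{n:03d}" for n in range(1, 10)}
found_ids = set(expected_ids)

result = score(expected_ids, found_ids)
print(result)
```

A missed finding lowers recall only, while a spurious one lowers precision only, which is why the table reports the two separately.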
Six findings are the seeded CSS scenarios; three (F-007 contrast, F-008 meta description, F-009 console 404) are document-level findings the model surfaced by choosing to run Lighthouse and axe. That is the point of the agentic design: the model judged principles (be-inclusive, be-discoverable, follow-best-practices) that no hand-written check covered.
The expansion from 9 to 15 principles added no spurious fixture findings. The ten default-expectation principles in play were judged for real (six surfaced the seeded findings; the other four, implement-natural-interactions, provide-guided-navigation, maximize-content-reduce-noise and be-trustworthy, passed). The contextual framework-derived principles were judged not-applicable / opted-out with a rationale rather than penalised:
| Principle | Outcome | Why |
|---|---|---|
| be-private-and-secure | not-applicable | bare localhost static host; no transport/headers/auth to assess (the favicon 404 is captured under follow-best-practices as F-009) |
| be-resilient | not-applicable | client-rendered CSS-scenario demo; offline/installable out of scope, the no-JS shell is a shared harness property |
| be-internationalised | not-applicable | single-locale English demo, no locale-sensitive data |
| be-sustainable | not-applicable | tiny hand-authored demo, weight already minimal (judged proportionally) |
| be-agent-ready | opted-out | static UX demo with no agent-facing surface |
The two new-principle observations on the served site (no CSP header; a blank
no-JS shell) are properties of the bare npx serve host and demo harness, and
are present identically on the genuinely-fixed live playground. Counting them as
fixture findings would be dishonest, so the seeded ground truth stays at nine.

**F-001.** Under `prefers-color-scheme: dark`, a computed-style probe and a clipped screenshot of `.ndm-card` show background `rgb(255, 255, 255)`. The card hard-codes `#ffffff` with no `light-dark()`. Fix: declare `color-scheme: light dark` and use `light-dark(#ffffff, #1e1e1e)` for surfaces. Guidance id `dark-mode`.

**F-002.** Under `prefers-reduced-motion: reduce`, `.mv-card.getAnimations()` returned 1 running animation (`mv-slide`); a 2.5s transition video recorded under the reduce preference shows it still sliding. Fix: gate the animation behind `@media (prefers-reduced-motion: no-preference)`. Evidence: the transition video recorded under the reduce preference, with the card still sliding.

**F-003.** At the 360px viewport the page overflows horizontally, leaving `.fl-hero` (~1264px) clipped. Fix: `width: 100%; max-width: 1200px; box-sizing: border-box`.

**F-004.** The computed style of the focused `.pf-btn` reads `outline-style: none`; the CSS has `outline: none` and no `:focus-visible` rule. Fix: `.pf-btn:focus-visible { outline: 3px solid #1a73e8; outline-offset: 2px }`.

**F-005.** The late banner shifts the layout because `.ls-slot` reserves no height. Fix: set a `min-height` (or `aspect-ratio`) on `.ls-slot`.

**F-006.** `.cq-card` stays `flex-direction: row` inside the 240px `.cq-narrow` container, with no `@container` rule or `container-type`. Fix: set `container-type: inline-size` on the wrapper and add a `@container (max-width: 320px)` rule that stacks the card.

**F-007.** Lighthouse reported `color-contrast: 0` on the three `.pf-btn` buttons; axe-core, injected from CDN via the `evaluate` primitive and run in-page, independently reported one serious `color-contrast` violation across 3 nodes. This is not a seeded CSS scenario; it was surfaced by the Lighthouse-dimension principles.

**F-008.** Lighthouse reported `meta-description: 0`; the recon DOM dump confirms a `<title>` but no `<meta name="description">`. Fix: add a `<meta name="description">` to the head.

**F-009.** Lighthouse reported `errors-in-console: 0` with one 404 (the favicon, under the bare static server with no `<link rel=icon>`). Fix: add a favicon (for example, a `data:` SVG `<link rel="icon">`), which eliminates the 404.

9 findings: 4 high, 2 medium, 3 low. Modalities used: DOM+source, computed-style
and ad-hoc evaluate probes, screenshots, a reduced-motion transition video,
layout metrics + a CLS observer, a heap summary, plus Lighthouse and an
injected axe-core run. Recall on the nine ground-truth findings was 100% with
zero false positives, and the genuinely-fixed live playground surfaced none of
them (Lighthouse 100/100/100/100). Highest-leverage fix: adopt color-scheme +
light-dark() so the UI respects the user's dark preference.