Netlist NES / 6502 simulators: a performance comparison

A sourced survey of publicly-documented transistor-level / switch-level NES, 2A03, 2C02 and 6502 simulators, and where AprVisual sits among them. Only traceable sources are used; every derived number states its assumption.

A note on units

Different projects' "Hz" don't mean the same thing — some count 6502 clocks or half-cycles, others count chip-simulation steps, half-steps, trace lines, or internal master half-cycles. The safest comparison keeps each source's original unit and only derives a frame time when the assumption is explicit.
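One way to keep these units apart in tooling is to give each its own type, so that any conversion has to name its assumption. The sketch below is illustrative only: the type names and the helper are hypothetical, and the factor of 24 assumes the NTSC NES relationship of one CPU clock per 12 master cycles.

```rust
/// Master half-cycles per second, the unit AprVisual reports (hc/s).
#[derive(Debug, Clone, Copy, PartialEq)]
struct MasterHalfCyclesPerSec(f64);

/// 6502 CPU clocks per second, the unit most 6502-only projects report.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CpuClocksPerSec(f64);

/// Assumption: NTSC NES, CPU clock = master clock / 12,
/// so one CPU clock spans 24 master half-cycles.
const MASTER_HALF_CYCLES_PER_CPU_CLOCK: f64 = 24.0;

impl CpuClocksPerSec {
    /// Same-cycle-count conversion only; it says nothing about the
    /// PPU/APU work a CPU-only simulator never performs.
    fn to_master_half_cycles(self) -> MasterHalfCyclesPerSec {
        MasterHalfCyclesPerSec(self.0 * MASTER_HALF_CYCLES_PER_CPU_CLOCK)
    }
}
```

With this shape, a 1 kHz CPU-only figure can be written as `CpuClocksPerSec(1000.0).to_master_half_cycles()`, and the compiler rejects a direct comparison between the two units.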

For a frame-size baseline: an NTSC PPU frame is 262 scanlines × 341 PPU cycles, and Visual 2C02-style traces use 8 half-cycle lines per PPU tick, giving:

341 × 262 × 8 = 714,736 master half-cycles / frame

This is also the fallback frame size in AprVisual's RunFrame().
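The baseline can be written down as a small helper. This is a sketch of the arithmetic only, not AprVisual's RunFrame() code; the constant and function names are hypothetical.

```rust
/// NTSC PPU geometry as quoted in the text.
const PPU_CYCLES_PER_SCANLINE: u64 = 341;
const SCANLINES_PER_FRAME: u64 = 262;
/// Visual 2C02-style traces: 8 half-cycle lines per PPU tick.
const HALF_CYCLES_PER_PPU_TICK: u64 = 8;

/// Master half-cycles in one NTSC frame under the assumptions above.
const fn frame_half_cycles() -> u64 {
    PPU_CYCLES_PER_SCANLINE * SCANLINES_PER_FRAME * HALF_CYCLES_PER_PPU_TICK
}

/// Fraction of a frame covered by a benchmark of `half_cycles` steps.
fn frames_covered(half_cycles: u64) -> f64 {
    half_cycles as f64 / frame_half_cycles() as f64
}
```

For instance, `frames_covered(300_000)` is about 0.42, so a 300k half-cycle benchmark covers less than half a frame and any frame time quoted from it is an extrapolation.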

Summary

| Project | Scope | Public speed claim | ~ frame time |
|---|---|---|---|
| Visual6502 (JS) | 6502 transistor-level | ~1 clock/s animated; ~250Hz+ expert mode | n/a (CPU only) |
| Visual6502 Python / C port | 6502 transistor-level | Python ~55Hz; C port ~1kHz | n/a (CPU only) |
| FPGA-netlist-tools / Verilator | 6502 netlist-derived HDL | FPGA 1MHz+; Verilator ~4kHz | n/a (HW/RTL) |
| perfect6502 | 6502 NMOS netlist (C) | ~1/30 of 1MHz 6502 on a 2025 CPU | n/a (CPU only) |
| Visual NES | Visual 2A03 + 2C02 (C++/C#) | ~1/1000 real NES; dual-chip ~5000Hz | ~30–60 s |
| MetalNES | full NES-001 board, transistor-level | user/press reports, minutes/frame | ~1–2 min |
| AprVisual.S1 (C#) | NES switch-level, pure BFS | 67.3K hc/s (this machine, 300k hc) | 10.62 s |
| AprVisual rust-s1 | NES switch-level, pure BFS | 71.9K hc/s (this machine, 300k hc) | 9.94 s |

AprVisual figures come from an actual run on this machine (Ryzen 7 3700X, 300k half-cycles of full_palette). A stricter 200k interleaved-paired clean bench gives ~64K (C#) / ~69K (Rust) hc/s: the same order of magnitude, slightly more conservative.

Project by project

1. Visual6502 / JSSim — 6502 transistor-level

The original switch-level 6502 simulator. NESdev Wiki records the era's software speeds: a 2010 JavaScript sim ran ~1 clock/s with chip animation; an unreleased Python version ~55Hz; 2011 "expert mode" (no animation) ~250Hz+; Michael Steil et al.'s C port ~1kHz (about 10 s to a C64 BASIC banner, skipping the memory test). All are switch-level — pull-down/pass/pull-up transistors, re-evaluated to stability after every input change.
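To make "re-evaluated to stability" concrete, here is a deliberately tiny switch-level model. It is a simplified sketch in the spirit of these simulators, not code from Visual6502 or any of its ports: the node numbering, the `Netlist` type, the resolution rule, and the 64-sweep cap are all assumptions made for illustration, and it sweeps the whole netlist where a real simulator would track only affected nodes.

```rust
use std::collections::VecDeque;

// Hypothetical convention for this sketch: node 0 is ground, node 1 is power.
const GND: usize = 0;
const PWR: usize = 1;

/// A switch: when `gate` is high, `c1` and `c2` are connected.
struct Transistor {
    gate: usize,
    c1: usize,
    c2: usize,
}

struct Netlist {
    state: Vec<bool>,  // current logic level of each node
    pullup: Vec<bool>, // true if the node has a pull-up load
    transistors: Vec<Transistor>,
}

impl Netlist {
    /// Breadth-first search for every node reachable from `start`
    /// through transistors whose gate is currently high.
    fn group_of(&self, start: usize) -> Vec<usize> {
        let mut seen = vec![false; self.state.len()];
        let mut queue = VecDeque::from([start]);
        let mut group = Vec::new();
        seen[start] = true;
        while let Some(n) = queue.pop_front() {
            group.push(n);
            if n == GND || n == PWR {
                continue; // supply rails end the search
            }
            for t in self.transistors.iter().filter(|t| self.state[t.gate]) {
                let other = if t.c1 == n {
                    t.c2
                } else if t.c2 == n {
                    t.c1
                } else {
                    continue;
                };
                if !seen[other] {
                    seen[other] = true;
                    queue.push_back(other);
                }
            }
        }
        group
    }

    /// Ground wins, then power, then any pull-up; otherwise the group
    /// keeps its stored charge (high if any member was high).
    fn resolve(&self, group: &[usize]) -> bool {
        if group.contains(&GND) {
            false
        } else if group.contains(&PWR) {
            true
        } else {
            group.iter().any(|&n| self.pullup[n] || self.state[n])
        }
    }

    /// Drive `node` to `level`, then sweep the netlist until nothing changes.
    /// Returns the number of sweeps, capped so an oscillating circuit cannot hang.
    fn set_input(&mut self, node: usize, level: bool) -> usize {
        self.state[node] = level;
        for pass in 1..=64 {
            let mut changed = false;
            for start in 2..self.state.len() {
                let group = self.group_of(start);
                let value = self.resolve(&group);
                for &n in group.iter().filter(|&&n| n != GND && n != PWR) {
                    if self.state[n] != value {
                        self.state[n] = value;
                        changed = true;
                    }
                }
            }
            if !changed {
                return pass;
            }
        }
        64
    }
}

/// NMOS inverter: node 2 is the input, node 3 the pulled-up output.
fn inverter() -> Netlist {
    Netlist {
        state: vec![false, true, false, false],
        pullup: vec![false, false, false, true],
        transistors: vec![Transistor { gate: 2, c1: 3, c2: GND }],
    }
}
```

Driving the inverter's input low leaves node 3 pulled high; driving it high connects node 3 to ground through the transistor. Even in this toy, every input change triggers a group search per node, which is why the per-clock cost of the real netlists is so high.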

These are 2010–2011 6502-only numbers, not full NES frames — but they set the baseline: even one 6502, in software at switch level, needed heavy optimization to get from 1Hz to 1kHz.

2. perfect6502 — 6502 NMOS netlist, C

A C simulator of the raw NMOS 6502 netlist extracted by Visual6502 — half-cycle exact, not a rewrite. Its README states that even highly-optimized C on a 2025 high-end CPU reaches only ~1/30 of a 1 MHz 6502.

CPU-only equivalence (with caveats): treating 1/30 of 1MHz as ~33,333 6502 cycles/s (~66,667 half-cycles/s), and an NES NTSC frame as 341×262/3 ≈ 29,781 CPU cycles, gives ~0.89 s per "NES-CPU-equivalent" frame. But perfect6502 has no PPU/APU/mapper — it cannot produce an NES frame; this is only a same-cycle-count conversion.
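The conversion above, spelled out as code. The function names are hypothetical, and the result is only a same-cycle-count figure under the stated assumptions (1/30 of 1 MHz, 3 PPU cycles per CPU cycle).

```rust
/// 1/30 of a 1 MHz 6502, the README figure quoted above, in cycles/s.
fn perfect6502_cycles_per_sec() -> f64 {
    1_000_000.0 / 30.0
}

/// CPU cycles in one NTSC frame: 341 x 262 PPU cycles,
/// with 3 PPU cycles per CPU cycle.
fn cpu_cycles_per_frame() -> f64 {
    341.0 * 262.0 / 3.0
}

/// Seconds to simulate one frame's worth of CPU cycles, CPU only.
fn cpu_equivalent_frame_seconds() -> f64 {
    cpu_cycles_per_frame() / perfect6502_cycles_per_sec()
}
```

`cpu_equivalent_frame_seconds()` comes out at roughly 0.89 s, which is a cost for the CPU alone and not a rendered frame.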

Its value is as a half-cycle-exact golden CPU model. That a single 6502 netlist costs ~1/30 real time, versus AprVisual's full-NES netlist at tens of thousands of hc/s, shows a single-CPU netlist and a full-board netlist should never be conflated.

3. Visual 2C02 — NTSC PPU transistor-level

Quietust's transistor-level NTSC PPU simulator, on the same Visual6502 core. The wiki gives no fixed Hz, only practical advice (disabling tracing / animation / sprite-RAM display speeds it up a lot). It's more an interactive research tool; the quotable numbers come from the Visual NES port discussion below.

4. Visual NES — Visual 2A03 + 2C02, C++/C# (the closest comparison)

Sour's C++/C# port that combines Visual 2A03 and Visual 2C02 into a single simulation that runs NES ROMs, at roughly 1/1000 real-NES speed and 10–20× faster than the JavaScript versions. The repo was archived 2022-05-13. The author's 2017 nesdev posts are the most useful source of detail.

Why this matters for AprVisual: same base (Visual 2A03 + 2C02), same hot spot (recursive connected-component group search), and the author independently found that data shrink + cache + PGO beat fancier data structures and threads — exactly AprVisual's experience. Two independent projects converging on the same conclusion is a strong signal.

5. MetalNES — full NES-001 board, transistor-level (AprVisual's reference)

A transistor-level NES-001 simulation (macOS, no MMU support, board support chips + composite/audio ladders, "needs lots of optimization"). AprVisual's S1 engine is an independent reimplementation of its wire / group-resolution core. Performance is from press and user reports, not a README benchmark: PCGamesN and HotHardware (2022) describe "minutes per frame"; a Hacker News user reported ~2 frames in ~3 minutes on an M1 Max; one video appeared to show ~9000 cycles/s (likely lower on average).

Treat it as the ~1–2 min/frame tier — the same order as Visual NES's tens-of-seconds, both confirming that full NES transistor-level software simulation is far below real time in every public case.

6. AprVisual.S1 / rust-s1 — this project

A C# (AprVisual.S1) and Rust (rust-s1) switch-level NES core using pure node/transistor BFS — no IR, no codegen. On this machine (Ryzen 7 3700X), benchmarking 300,000 half-cycles of full_palette: Rust 71,877 hc/s → 9.94 s/frame; C# 67,284 hc/s → 10.62 s/frame, both producing the identical checksum 0x794A43A8DF169ADA. That's ~600× short of NES NTSC real time (42,954,552 hc/s).
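The derived figures follow directly from the measured throughput. The sketch below reproduces them; the names are hypothetical, the frame size is the 714,736 hc baseline, and the real-time rate is the 42,954,552 hc/s figure quoted above.

```rust
/// Master half-cycles per NTSC frame (341 x 262 x 8).
const FRAME_HALF_CYCLES: f64 = 714_736.0;
/// NES NTSC real-time rate as quoted in the text, in hc/s.
const NTSC_REAL_TIME_HC_PER_SEC: f64 = 42_954_552.0;

/// Seconds to simulate one frame at a measured throughput.
fn seconds_per_frame(hc_per_sec: f64) -> f64 {
    FRAME_HALF_CYCLES / hc_per_sec
}

/// How many times slower than real time a given throughput is.
fn slowdown_vs_real_time(hc_per_sec: f64) -> f64 {
    NTSC_REAL_TIME_HC_PER_SEC / hc_per_sec
}
```

Feeding in 71,877 and 67,284 hc/s gives about 9.94 s and 10.62 s per frame, and slowdowns of roughly 598× and 638×, which is where the "~600×" figure comes from.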

A throughput estimate (714,736 hc/frame) lines up with the measured frame-dump times. It does not mean every game already renders a correct, usable frame — correctness, timing and ROM coverage still need their own rigorous verification — but for the workloads tested, AprVisual already sits clearly above the public early figures for Visual NES and MetalNES (with the caveats that those are older hardware, often with tracing, and use different "frame" definitions).

Performance tiers

| Tier | Representative | Approx speed |
|---|---|---|
| early interactive JS | Visual6502 animation | ~1 clock/s |
| improved JS / Python / old C | Visual6502 expert / Python / C port | 55Hz – 1kHz (CPU-only) |
| early full-NES software transistor sim | Visual NES / MetalNES | ~30 s – 2 min/frame |
| optimized 6502-only netlist | perfect6502 | ~1/30 realtime 6502 |
| current AprVisual S1 | C# / Rust pure BFS | ~10 s/frame |
| FPGA netlist-derived | FPGA-netlist-tools | 1MHz realtime 6502 |

How it relates to AprVisual's optimization work

Conservative phrasing we recommend

Avoid: claiming "all netlist NES emulators can only do minutes/frame" (AprVisual and Visual NES beat 1 min); quoting "MetalNES official benchmark = 2 frames/3 min" (that's user/press, not a README benchmark); saying "perfect6502 can render an NES frame in 0.9 s" (it's CPU-only); or equating Visual NES's Hz with AprVisual's hc/s (different step definitions).

Sources