A note on units
Different projects' "Hz" figures do not measure the same thing: some count 6502 clocks or half-cycles, others count chip-simulation steps, half-steps, trace lines, or internal master half-cycles. The safest comparison keeps each source's original unit and derives a frame time only when the underlying assumption is stated explicitly.
For a frame-size baseline: an NTSC PPU frame is 262 scanlines × 341 PPU cycles, and Visual 2C02-style traces use 8 half-cycle lines per PPU tick, giving:
341 × 262 × 8 = 714,736 master half-cycles / frame
This is also the fallback frame size in AprVisual's RunFrame().
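The arithmetic above can be checked in a few lines of Python. This is a minimal sketch: the constant and function names are illustrative assumptions and are not taken from AprVisual's source.

```python
# Frame-size baseline from the text: scanlines x PPU cycles x trace lines per PPU tick.
# All names here are hypothetical, chosen only for this example.
SCANLINES_PER_FRAME = 262        # NTSC PPU scanlines per frame
PPU_CYCLES_PER_SCANLINE = 341    # PPU cycles per scanline
HALF_CYCLES_PER_PPU_CYCLE = 8    # Visual 2C02-style trace lines per PPU tick


def half_cycles_per_frame(
    scanlines: int = SCANLINES_PER_FRAME,
    ppu_cycles: int = PPU_CYCLES_PER_SCANLINE,
    hc_per_ppu_cycle: int = HALF_CYCLES_PER_PPU_CYCLE,
) -> int:
    """Return the number of master half-cycles in one frame."""
    return scanlines * ppu_cycles * hc_per_ppu_cycle


if __name__ == "__main__":
    print(f"{half_cycles_per_frame():,} master half-cycles / frame")
```

Keeping the three factors as separate parameters makes it easy to swap in a different frame definition when comparing against a source that counts something other than master half-cycles.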
Summary
| Project | Scope | Public speed claim | Approx. frame time |
|---|---|---|---|
| Visual6502 (JS) | 6502 transistor-level | ~1 clock/s animated; ~250Hz+ in expert mode | n/a (CPU only) |
| Visual6502 Python / C port | 6502 transistor-level | Python ~55Hz; C port ~1kHz | n/a (CPU only) |
| FPGA-netlist-tools / Verilator | 6502 netlist-derived HDL | FPGA 1MHz+; Verilator ~4kHz | n/a (HW/RTL) |
| perfect6502 | 6502 NMOS netlist (C) | ~1/30 of a 1MHz 6502 on a 2025 CPU | n/a (CPU only) |
| Visual NES | Visual 2A03 + 2C02 (C++/C#) | ~1/1000 of a real NES; dual-chip ~5000Hz | ~30–60 s |
| MetalNES | full NES-001 board, transistor-level | user/press reports of minutes/frame | ~1–2 min |
| AprVisual.S1 (C#) | NES switch-level, pure BFS | 67.3K hc/s (this machine, 300k hc) | 10.62 s |
| AprVisual rust-s1 | NES switch-level, pure BFS | 71.9K hc/s (this machine, 300k hc) | 9.94 s |
AprVisual figures were measured on this machine (Ryzen 7 3700X, 300k half-cycles of full_palette). A stricter 200k interleaved-paired clean bench gives ~64K (C#) / ~69K (Rust) hc/s: the same order of magnitude, slightly more conservative.
Project by project
1. Visual6502 / JSSim — 6502 transistor-level
The original switch-level 6502 simulator. The NESdev Wiki records the software speeds of that era: a 2010 JavaScript simulator ran at ~1 clock/s with chip animation; an unreleased Python version reached ~55Hz; the 2011 "expert mode" (no animation) reached ~250Hz+; and the C port by Michael Steil et al. reached ~1kHz (about 10 s to reach a C64 BASIC banner, skipping the memory test). All are switch-level: pull-down, pass, and pull-up transistors are re-evaluated to stability after every input change.
These are 2010–2011 6502-only numbers, not full NES frames, but they set the baseline: even a single 6502, simulated in software at switch level, needed heavy optimization to get from 1Hz to 1kHz.
2. perfect6502 — 6502 NMOS netlist, C
A C simulator of the raw NMOS 6502 netlist extracted by Visual6502. It is half-cycle exact and not a rewrite of the chip's logic. Its README states that even highly optimized C on a 2025 high-end CPU reaches only ~1/30 of a 1MHz 6502.
CPU-only equivalence (with caveats): treating 1/30 of 1MHz as ~33,333 6502 cycles/s (~66,667 half-cycles/s), and an NES NTSC frame as 341×262/3 ≈ 29,781 CPU cycles, gives ~0.89 s per "NES-CPU-equivalent" frame. But perfect6502 has no PPU, APU, or mapper, so it cannot produce an NES frame; this is only a same-cycle-count conversion.
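The conversion above is easy to reproduce. The sketch below only restates the numbers in the paragraph (1/30 of 1MHz, 341×262/3 CPU cycles per frame); the function names are hypothetical, and nothing here is part of perfect6502 itself.

```python
# Same-cycle-count conversion for a CPU-only simulator.
# Assumptions (from the text): simulator speed is 1/30 of a 1 MHz 6502,
# and one NTSC frame spans 341 * 262 PPU cycles at 3 PPU cycles per CPU cycle.
REAL_6502_HZ = 1_000_000
SLOWDOWN_FACTOR = 30


def simulated_cycles_per_second() -> float:
    """6502 cycles the simulator completes per wall-clock second."""
    return REAL_6502_HZ / SLOWDOWN_FACTOR


def cpu_cycles_per_ntsc_frame() -> float:
    """CPU cycles that elapse during one NTSC PPU frame."""
    return 341 * 262 / 3


def seconds_per_cpu_equivalent_frame() -> float:
    """Wall-clock seconds to simulate one frame's worth of CPU cycles."""
    return cpu_cycles_per_ntsc_frame() / simulated_cycles_per_second()


if __name__ == "__main__":
    print(f"{simulated_cycles_per_second():,.0f} cycles/s")
    print(f"{2 * simulated_cycles_per_second():,.0f} half-cycles/s")
    print(f"{cpu_cycles_per_ntsc_frame():,.0f} CPU cycles/frame")
    print(f"{seconds_per_cpu_equivalent_frame():.2f} s per CPU-equivalent frame")
```

The result is a cycle-count equivalence only: it says how long the CPU portion of a frame would take, not that a picture would come out.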
Its value is as a half-cycle-exact golden CPU model. A single 6502 netlist already costs ~1/30 of real time, while AprVisual's tens of thousands of hc/s are for a full-NES netlist counted in master half-cycles, so single-CPU and full-board figures should never be conflated.
3. Visual 2C02 — NTSC PPU transistor-level
Quietust's transistor-level NTSC PPU simulator, built on the same Visual6502 core. The wiki gives no fixed Hz figure, only practical advice (disabling tracing, animation, and the sprite-RAM display speeds it up considerably). It is mainly an interactive research tool; the quotable numbers come from the Visual NES port discussion below.
4. Visual NES — Visual 2A03 + 2C02, C++/C# (the closest comparison)
Sour's C++/C# port, which combines Visual 2A03 and Visual 2C02 into a single simulation that runs NES ROMs at roughly 1/1000 of real-NES speed, 10–20× faster than the JavaScript versions. The repo was archived on 2022-05-13. The author's 2017 nesdev posts contain the most useful detail:
- Dual-chip in one simulation: ~5000Hz; should manage ~1 frame/min (≈1 hour per 1 s of NES video).
- ~50–60% of the time is spent in the recursive group function.
- Switching from vector to hashset was slower (groups are usually tiny); a bool presence array did not help either (1–2% slower).
- Multi-threading was hard because of heavy lock contention.
- Changing int to short, removing struct fields, and enabling PGO took it from ~5000Hz to ~7500Hz (PGO alone gave about +15%). A newer CPU should exceed 10kHz.
Why this matters for AprVisual: it has the same base (Visual 2A03 + 2C02) and the same hot spot (a recursive connected-component group search), and its author independently found that shrinking data, improving cache behavior, and PGO beat fancier data structures and threads, which matches AprVisual's experience. Two independent projects converging on the same conclusion is a strong signal.
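To make the hot spot concrete, here is a minimal sketch of a connected-component group search in the Visual6502 style, written iteratively with a BFS queue. It is not the code of Visual NES or AprVisual: the node ids, the `(gate, a, b)` transistor tuples, and the simplified resolution rule (ground wins, then power, then pull-ups, then any node that was already high) are all assumptions made for illustration.

```python
from collections import deque

GND, PWR = 0, 1  # hypothetical ids for the two supply rails


def build_index(transistors):
    """Map each channel node to the transistors attached to it."""
    index = {}
    for transistor in transistors:
        _gate, a, b = transistor
        index.setdefault(a, []).append(transistor)
        index.setdefault(b, []).append(transistor)
    return index


def find_group(start, index, state):
    """Collect nodes reachable from `start` through conducting transistors."""
    group = [start]          # small list: groups are usually tiny
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in (GND, PWR):
            continue         # supply rails end the walk
        for gate, a, b in index.get(node, ()):
            if not state[gate]:
                continue     # transistor is off, no connection
            other = b if node == a else a
            if other not in seen:
                seen.add(other)
                group.append(other)
                queue.append(other)
    return group


def resolve_group(group, state, pullups):
    """Simplified value rule: GND beats PWR beats pull-up beats stored charge."""
    if GND in group:
        return False
    if PWR in group:
        return True
    if any(node in pullups for node in group):
        return True
    return any(state[node] for node in group)


if __name__ == "__main__":
    # Node 3 has a pull-up and is tied to GND through a transistor gated by node 2.
    transistors = [(2, 3, GND), (5, 3, 4)]
    index = build_index(transistors)
    state = {GND: False, PWR: True, 2: True, 3: True, 4: False, 5: False}
    group = find_group(3, index, state)
    print(group, resolve_group(group, state, pullups={3}))
```

Because the inner loop touches a handful of small records per step, the cost is dominated by memory access rather than by the choice of container, which is consistent with the observation that shrinking the data helped more than a hashset did.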
5. MetalNES — full NES-001 board, transistor-level (AprVisual's reference)
A transistor-level NES-001 simulation (macOS, no MMU support, board support chips plus composite/audio ladders, "needs lots of optimization"). AprVisual's S1 engine is an independent reimplementation of its wire / group-resolution core. The performance figures come from press and user reports, not from a README benchmark: PCGamesN and HotHardware (2022) describe "minutes per frame"; a Hacker News user reported ~2 frames in ~3 minutes on an M1 Max; and one video appeared to show ~9000 cycles/s (likely lower on average).
Treat it as the ~1–2 min/frame tier, roughly the same order as Visual NES's tens of seconds. Both confirm that full NES transistor-level software simulation is far below real time in every public case.
6. AprVisual.S1 / rust-s1 — this project
A C# (AprVisual.S1) and Rust (rust-s1) switch-level NES core using pure node/transistor BFS, with no IR and no codegen. On this machine (Ryzen 7 3700X), benchmarking 300,000 half-cycles of full_palette gives 71,877 hc/s (9.94 s/frame) for Rust and 67,284 hc/s (10.62 s/frame) for C#, both producing the identical checksum 0x794A43A8DF169ADA. That is ~600× short of NES NTSC real time (42,954,552 hc/s).
A throughput estimate (714,736 hc/frame) lines up with the measured frame-dump times. It does not mean every game already renders a correct, usable frame: correctness, timing, and ROM coverage still need their own rigorous verification. For the workloads tested, however, AprVisual already sits clearly above the early public figures for Visual NES and MetalNES (with the caveats that those were measured on older hardware, often with tracing enabled, and use different "frame" definitions).
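The throughput-to-frame-time estimate can be reproduced as follows. This is a sketch using only the figures quoted in this section (714,736 hc/frame, 42,954,552 hc/s for real time, and the two measured rates); the function names are hypothetical.

```python
# Convert a measured half-cycle throughput into seconds per frame
# and into a slowdown factor relative to NES NTSC real time.
HC_PER_FRAME = 341 * 262 * 8          # 714,736 master half-cycles per frame
NTSC_REALTIME_HC_PER_S = 42_954_552   # real-time figure quoted in the text


def seconds_per_frame(hc_per_second: float) -> float:
    """Wall-clock seconds needed to simulate one frame at this throughput."""
    return HC_PER_FRAME / hc_per_second


def slowdown_vs_realtime(hc_per_second: float) -> float:
    """How many times slower than a real NTSC NES this throughput is."""
    return NTSC_REALTIME_HC_PER_S / hc_per_second


if __name__ == "__main__":
    measured = {"rust-s1": 71_877, "AprVisual.S1 (C#)": 67_284}
    for name, rate in measured.items():
        print(
            f"{name}: {seconds_per_frame(rate):.2f} s/frame, "
            f"about {slowdown_vs_realtime(rate):.0f}x slower than real time"
        )
```

The same two functions should not be applied to another project's "Hz" figure unless that figure is also known to count master half-cycles.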
Performance tiers
| Tier | Representative | Approx. speed |
|---|---|---|
| early interactive JS | Visual6502 animation | ~1 clock/s |
| improved JS / Python / old C | Visual6502 expert / Python / C port | 55Hz – 1kHz (CPU-only) |
| early full-NES software transistor sim | Visual NES / MetalNES | ~30 s – 2 min/frame |
| optimized 6502-only netlist | perfect6502 | ~1/30 realtime 6502 |
| current AprVisual S1 | C# / Rust pure BFS | ~10 s/frame |
| FPGA netlist-derived | FPGA-netlist-tools | 1MHz realtime 6502 |
How it relates to AprVisual's optimization work
- Visual NES's author measured ~50–60% of the time in the recursive group function, which is exactly AprVisual's BFS group-walk hot spot.
- Hashsets and bool presence arrays were net-negative for Visual NES; AprVisual independently found the same. "Higher-level data structures" do not automatically win.
- Visual NES's real gains came from PGO and int→short; AprVisual's came from byte/ushort plus SoA hot/cold splitting. Both follow the same "shrink the hot data" direction.
- Both show that full NES transistor-level software simulation is a tens-of-seconds-to-minutes-per-frame business, not an FPS one; perfect6502 shows that even a single optimized 6502 netlist reaches only ~1/30 of real time. Transistor-level exactness is simply expensive.
Conservative phrasing we recommend
- "In public cases, full NES transistor-level software simulation mostly lands at tens of seconds to minutes per frame."
- "Visual NES's author reported ~5000Hz / ~1 frame/min for the combined 2A03+2C02 in 2017, ~7500Hz after data shrink + PGO."
- "AprVisual measures ~67K (C#) / ~72K (Rust) hc/s on a Ryzen 7 3700X ≈ ~10 s/frame, pending correctness/timing/frame-output verification."
Avoid:
- claiming "all netlist NES emulators can only do minutes/frame" (AprVisual and Visual NES beat 1 min);
- quoting "MetalNES official benchmark = 2 frames/3 min" (that figure comes from users and press, not a README benchmark);
- saying "perfect6502 can render an NES frame in 0.9 s" (it is CPU-only);
- equating Visual NES's Hz with AprVisual's hc/s (the step definitions differ).