
Why the Auto-Sync framework?

Capstone provides a simple API to leverage the LLVM disassemblers, without having the big footprint of LLVM itself.

It does this by using a stripped down copy of LLVM disassemblers (one for each architecture) and provides a uniform API to them.

The actual disassembly task (bytes to asm-text and decoded operands) is completely done by the LLVM code. Capstone takes the disassembled instructions, adds details to them (operand read/write info etc.) and organizes them to a uniform structure (cs_insn, cs_detail etc.). These objects are then accessible from the API.

Capstone is in C and LLVM is in C++. So to use the disassembler modules of LLVM, Capstone effectively translates LLVM source files from C++ to C, without changing the semantics. One could also call it a "disassembler port".

Capstone supports multiple architectures. So whenever LLVM has a new release and adds more instructions, Capstone needs to update its modules as well.

In the past, the update procedure was done by hand and with some Python scripts. But the task was tedious and error-prone.

Auto-Sync was introduced to ease this complicated update procedure.


How LLVM disassemblers work

To use the LLVM disassembler logic effectively, one must understand how the disassemblers operate.

Each architecture is defined in so-called .td files, that is, "Target Description" files. Those files are a declarative description of an architecture. They are written in a Domain-Specific Language called TableGen. They describe instructions, registers, processor features, which operands each instruction reads and writes, and more.

These files are consumed by "TableGen Backends". They parse and process them to generate C++ code. The generated code includes, for example, enums, decoding algorithms (for instructions and operands), and lookup tables for register names or aliases.
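The idea of "declarative description in, source code out" can be illustrated with a toy generator. This is only a sketch in Python, not TableGen: the register list, the `ARCH_REG_` prefix and the function names are all made up for illustration, and a real backend handles far more than registers.

```python
# Toy stand-in for a TableGen backend: declarative data in, C source out.
# The register list and every name below are hypothetical.
REGISTERS = [
    {"name": "R0", "alias": "zero"},
    {"name": "R1", "alias": "ra"},
    {"name": "R2", "alias": None},
]


def emit_register_enum(regs):
    """Emit a C enum with one entry per described register."""
    lines = ["enum arch_reg {", "\tARCH_REG_INVALID = 0,"]
    lines += [f"\tARCH_REG_{r['name']}," for r in regs]
    lines.append("};")
    return "\n".join(lines)


def emit_name_table(regs):
    """Emit a C lookup table, indexed by enum value, with printable names."""
    lines = ["static const char *const arch_reg_names[] = {", '\t"invalid",']
    for r in regs:
        # Prefer the alias if the description provides one.
        printable = r["alias"] or r["name"].lower()
        lines.append(f'\t"{printable}",')
    lines.append("};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_register_enum(REGISTERS))
    print(emit_name_table(REGISTERS))
```

Adding a register to the description changes both generated artifacts at once, which is why changes in generated files can be synchronized without manual work.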

Additionally, LLVM has handwritten files. They use the generated code to build the actual instruction classes and handle architecture specific edge cases.

Capstone uses both kinds of files: the generated ones as well as the handwritten ones.

Overview of updating steps

An Auto-Sync update has multiple steps:

(1) Changes in the auto-generated C++ files are handled completely automatically. We maintain an LLVM fork with patched TableGen backends, so they emit C code instead of C++.

(2) Changes in LLVM's handwritten sources are handled semi-automatically. For each source file, we search C++ syntax and replace it with the equivalent C syntax. For this task we have the CppTranslator.

The end result is of course not perfectly valid C code. It is merely an intermediate file, which still has some C++ syntax in it.
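A minimal sketch of the search-and-replace idea, and of why some C++ syntax is left over, might look like this. The real CppTranslator is considerably more involved. The three regex rules, the sample snippet and the `translate` function are assumptions made for illustration only, and the use of `MCInst_getOpcode` as the C replacement is likewise an assumption about the target naming.

```python
import re

# Hypothetical rewrite rules as (pattern, replacement) pairs.
RULES = [
    # static_cast<T>(x)  ->  (T)(x)
    (re.compile(r"static_cast<([^<>]+)>\("), r"(\1)("),
    # nullptr  ->  NULL
    (re.compile(r"\bnullptr\b"), "NULL"),
    # MI->getOpcode()  ->  MCInst_getOpcode(MI)
    (re.compile(r"\b(\w+)->getOpcode\(\)"), r"MCInst_getOpcode(\1)"),
]

# Made-up C++ input in the style of a handwritten disassembler source.
CPP_SNIPPET = """\
static DecodeStatus decodeImm(MCInst *MI, uint64_t Val, const void *Decoder) {
  if (Decoder == nullptr)
    return MCDisassembler::Fail;
  unsigned Opc = MI->getOpcode();
  return addImm(MI, static_cast<int64_t>(Val), Opc);
}
"""


def translate(source):
    """Apply every rewrite rule once, in order, to the whole source text."""
    for pattern, replacement in RULES:
        source = pattern.sub(replacement, source)
    return source


if __name__ == "__main__":
    print(translate(CPP_SNIPPET))
```

No rule covers `MCDisassembler::Fail`, so it survives in the output. This is the kind of leftover C++ syntax that makes the translated file an intermediate result rather than valid C.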

Because this leftover syntax was likely already fixed in the equivalent C file currently in Capstone, we have a last step. The translated file is diffed with the corresponding old file in Capstone.

The Differ tool parses both files into an abstract syntax tree. From this AST it picks nodes with the same name and diffs them. The diff is given to the user, and they can decide which one to accept.

All choices are also recorded and automatically applied next time.
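The node-wise diffing with remembered decisions can be sketched as follows. This is not the Differ's actual implementation. It assumes parsing has already happened, so each file is represented as a dict of node name to source text, and it assumes decisions are keyed by a hash of the node name and both versions. All function names are hypothetical.

```python
import difflib
import hashlib


def fingerprint(name, old, new):
    """Identify one (node name, old text, new text) combination."""
    data = "\x00".join([name, old, new]).encode()
    return hashlib.sha256(data).hexdigest()[:16]


def merge_nodes(old_nodes, new_nodes, decisions, ask):
    """Merge named nodes of the translated file with the old C file.

    old_nodes / new_nodes: dict of node name -> source text.
    decisions: dict of fingerprint -> "old" or "new", updated in place.
    ask: callback(name, diff_text) -> "old" or "new" for unseen diffs.
    Nodes that exist only in the old file are dropped in this sketch.
    """
    merged = {}
    for name, new in new_nodes.items():
        old = old_nodes.get(name)
        if old is None or old == new:
            merged[name] = new  # added or unchanged node: nothing to decide
            continue
        key = fingerprint(name, old, new)
        if key not in decisions:
            diff = "\n".join(difflib.unified_diff(
                old.splitlines(), new.splitlines(),
                "old/" + name, "new/" + name, lineterm=""))
            decisions[key] = ask(name, diff)  # record the user's choice
        merged[name] = old if decisions[key] == "old" else new
    return merged


if __name__ == "__main__":
    old = {"getReg": "return (unsigned)(Val);"}
    new = {"getReg": "return static_cast<unsigned>(Val);",
           "getImm": "return Val;"}
    saved = {}

    def keep_old(name, diff):
        print(diff)
        return "old"

    merge_nodes(old, new, saved, keep_old)  # asks once, for getReg
    merge_nodes(old, new, saved, keep_old)  # answered from saved decisions
```

Keying the decision on both versions of the node means a stored answer is reused only when exactly the same conflict shows up again. A node that changed in the meantime is presented to the user again.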

Example

Suppose there is a file ArchDisassembler.cpp in LLVM. Capstone has the C equivalent ArchDisassembler.c.

Now LLVM has a new release, and there were several additions in ArchDisassembler.cpp.

Auto-Sync will pass ArchDisassembler.cpp to the CppTranslator, which replaces most C++ syntax. The result is an intermediate file transl_ArchDisassembler.cpp.

The result is close to what we want (C code), but still contains invalid syntax. Most of these syntax errors were fixed before. They must have been, because the C file ArchDisassembler.c is working fine.

So the intermediate file transl_ArchDisassembler.cpp is compared to the old ArchDisassembler.c. The Differ parses both files into an AST and automatically patches all nodes it can.

This effectively automates most of the boring, mechanical work involved in fixing up transl_ArchDisassembler.cpp. If something new comes up, the Differ asks the user for a decision.

The result is saved to ArchDisassembler.c, which is now up-to-date with the newest LLVM release.

In practice this file will still contain syntax errors. But not many, so they can easily be resolved.

(3) After (1) and (2), some changes in Capstone-only files follow. This step is manual work.