# V6 Release

With the `v6` release we added a new update mechanism called `Auto-Sync`.
This is a huge step for Capstone, because it allows for easy module updates, easier addition of new architectures, easy feature additions, and it guarantees less faulty disassembly.

This release adds a huge number of new architectures, extensions, bug fixes and quality of life improvements.

## Contributors

Almost all the work was sponsored by [RizinOrg](https://rizin.re/). This release would simply not have happened without them.

The developers with the biggest contributions were (alphabetically):

- `TriCore`, `M68K` - @billow (Sponsored)
- `LoongArch` - @jiegec and @FurryAcetylCoA
- `RISC-V` - @moste00 (Sponsored)
- `Alpha`, `HPPA` - @R33v0LT (Sponsored)
- `AArch64`, `ARM`, `Auto-Sync`, `PPC`, `SystemZ`, modernized testing - @Rot127 (Sponsored)
- `Mips`, `NanoMips` - @wargio

There are also multiple smaller additions:

- Reviewing all PRs - @kabeor
- Architecture module registration - @oleavr
- Building of thin binaries for Apple - @rickmark
- Python packaging and testing - @twizmwazin, @peace-maker
- `RISC-V` operand access info - @wxrdnx

And of course there were many more improvements done by other contributors, which add to the release just as much as the ones above.
For a full list of all the developers, please see the release page.

With all that said, we hope you enjoy the new release!

## Overview

For `v6` we _updated_ the following architectures: `ARM`, `AArch64`, `Mips` (adding `NanoMips`!), `RISC-V`, `SystemZ`, `PPC`.
And added support for several more: `TriCore` (already in `v5`), `Alpha`, `HPPA`, `LoongArch`.

These updates are significant! While in `v5` the most up-to-date module was based on `LLVM 7`,
the refactored modules are based on `LLVM 16` (`ARM`, `PPC`) and `LLVM 18` (the others)!

As you can see, `Auto-Sync` solves the long-standing problem that Capstone is hard to update.
For [`Auto-Sync`-enabled modules](https://github.com/capstone-engine/capstone/issues/2015) this is no longer the case.
The update process is now pretty much standardized and, while not yet 100% reproducible, consistently produces more maintainable and more precise results.

To achieve it, we refactored some LLVM backends, so they directly emit the code we use in Capstone.
Additionally, we implemented many scripts, which automate a great number of manual steps during the update.

Because most of the update steps are automated now, the architecture modules must fit this update mechanism, which means they move closer to the original LLVM code.
On the flip side, it brings many breaking changes.

You can find a list below with descriptions and justification.
With all the trouble this might bring for you, please keep in mind that this will only occur once for each architecture (when it gets refactored for `Auto-Sync`).
In the long term this will guarantee more stability, more correctness, more features and, on top of this, makes Capstone directly comparable to `llvm-objdump`.

If you want to check the current state of this endeavor, read the [main Auto-Sync issue](https://github.com/capstone-engine/capstone/issues/2015).
Moreover, if you decide to update an existing architecture module (apart from already updated ones), it would be very welcome!
If you want to join the effort, please drop us a note in the issue comments, so we can assist.

## Why an Alpha?

Because the changes are so vast and we still need more feedback from the community.

We had many early adopters who helped enormously to find bugs and report issues up until now.
But there are still features missing, modules not refactored, test coverage below 100% in the relevant paths, and `Auto-Sync` is not completely done yet.
With all the new features, we want to have more feedback from users and more eyes on the code before calling it "complete".

Although it is an Alpha, that doesn't mean it is not well tested!
Compared to any other release, testing has increased a lot in quantity, coverage and code quality checks.

The Alpha release now allows projects to pin their build to a specific commit and use the new features,
while allowing us to add missing features still on the list for `v6` Gold.
Some of them are: update and add more architectures (including x86), rework the DIET build, improve Auto-Sync with reproducible file generation and quality of life features, and more.

So when the final `v6` release happens, the `Auto-Sync` transformation of Capstone will be completely done.
For `v7` we can then focus on other big features, like [SAIL](https://github.com/rems-project/sail) based disassembler modules or a new API to support VLIW architectures like Hexagon or E2K.

## New features

**LLVM disassembler based modules**

- The `cs_insn.illegal` flag was added. If it is set, the instruction is decoded correctly but is considered illegal.
  This happens for instructions which use invalid operands or are in an illegal context.

**Auto-Sync-enabled modules**

**More code quality checks**

- `clang-tidy` is now run on all files changed by a PR.
- ASAN: All tests are now run with the address sanitizer enabled. This includes checking for leaks.
- Many more asserts were added. They are only enabled for debug builds (with a few exceptions).
  Enabling the option `CAPSTONE_ASSERTION_WARNINGS` for a release build will print warnings but won't abort the program.

**Instruction formats for PPC, SystemZ, LoongArch**

The instruction encoding formats are added for PPC. They are accessible via `cs_ppc->format` (and equivalently for SystemZ and LoongArch).
They loosely follow the ISA instruction formats, but not exactly. Unfortunately, LLVM doesn't group the instruction formats in perfect alignment with the ISA.
Nonetheless, we hope this additional information is useful to you.

**LoongArch**

- Architecture support was added (based on LLVM-18).

**HPPA**

- Architecture support was added.

**Alpha**

- Architecture support was added (based on LLVM-3)

**AArch64**

- Updated to LLVM-18
- Added new instructions of the SME and SVE2 extensions. With them, the new `sme` and `pred` operands are added.
- System operands are provided with far more detail in a separate operand.
- The `EXACTFPIMM` operand also sets the `fp` field.

**PPC**

- Updated to LLVM-16
- The instruction encoding formats are added for PPC. They are accessible via `cs_ppc->format`.
  They loosely follow the ISA instruction formats, but not exactly. Unfortunately, LLVM doesn't group the instruction formats in perfect alignment with the ISA.
  Nonetheless, we hope this additional information is useful to you.
- Branching information in `cs_ppc->bc` is far more detailed now.
- The Paired Single extension was added.

**SystemZ**

- Updated to LLVM-18
- Operands now have read/write access information
- Memory operands now have the address mode specified
- Immediate operands have a new `imm_width` field, storing the bit width if known.
- CPU features can be enabled or disabled, grouped by architecture (arch8-arch14).

**Mips**

- Updated to LLVM-18
- Support added for: `NanoMips`, `microMips32r3`, `microMips32r6`, `Mips16`, `Mips I ISA`, `Mips II ISA`, `Mips32 r2 ISA`, `Mips32 r3 ISA`, `Mips32 r5 ISA`, `Mips32 r6 ISA`, `Mips III ISA`, `Mips IV ISA`, `Mips V ISA`, `Mips64 r2 ISA`, `Mips64 r3 ISA`, `Mips64 r5 ISA`, `Mips64 r6 ISA`, `Octeon (cnMIPS)`, `Octeon+ (cnMIPS+)`
- Support for different register naming styles (`CS_OPT_SYNTAX_NO_DOLLAR`, `CS_OPT_SYNTAX_NOREGNAME`)
- New MIPS ISA modes have been added to `capstone.h`; each can be used on its own:
```
	CS_MODE_MIPS16 = CS_MODE_16, ///< Generic mips16
	CS_MODE_MIPS32 = CS_MODE_32, ///< Generic mips32
	CS_MODE_MIPS64 = CS_MODE_64, ///< Generic mips64
	CS_MODE_MICRO = 1 << 4, ///< microMips
	CS_MODE_MIPS1 = 1 << 5, ///< Mips I ISA Support
	CS_MODE_MIPS2 = 1 << 6, ///< Mips II ISA Support
	CS_MODE_MIPS32R2 = 1 << 7, ///< Mips32r2 ISA Support
	CS_MODE_MIPS32R3 = 1 << 8, ///< Mips32r3 ISA Support
	CS_MODE_MIPS32R5 = 1 << 9, ///< Mips32r5 ISA Support
	CS_MODE_MIPS32R6 = 1 << 10, ///< Mips32r6 ISA Support
	CS_MODE_MIPS3 = 1 << 11, ///< MIPS III ISA Support
	CS_MODE_MIPS4 = 1 << 12, ///< MIPS IV ISA Support
	CS_MODE_MIPS5 = 1 << 13, ///< MIPS V ISA Support
	CS_MODE_MIPS64R2 = 1 << 14, ///< Mips64r2 ISA Support
	CS_MODE_MIPS64R3 = 1 << 15, ///< Mips64r3 ISA Support
	CS_MODE_MIPS64R5 = 1 << 16, ///< Mips64r5 ISA Support
	CS_MODE_MIPS64R6 = 1 << 17, ///< Mips64r6 ISA Support
	CS_MODE_OCTEON = 1 << 18, ///< Octeon cnMIPS Support
	CS_MODE_OCTEONP = 1 << 19, ///< Octeon+ cnMIPS Support
	CS_MODE_NANOMIPS = 1 << 20, ///< Generic nanomips
	CS_MODE_NMS1 = ((1 << 21) | CS_MODE_NANOMIPS), ///< nanoMips NMS1
	CS_MODE_I7200 = ((1 << 22) | CS_MODE_NANOMIPS), ///< nanoMips I7200
	CS_MODE_MICRO32R3 = (CS_MODE_MICRO | CS_MODE_MIPS32R3), ///< microMips32r3
	CS_MODE_MICRO32R6 = (CS_MODE_MICRO | CS_MODE_MIPS32R6), ///< microMips32r6
```

It is also possible to disable floating point support by adding `CS_MODE_MIPS_NOFLOAT`.

- **`CS_MODE_MIPS_PTR64` is now required to decode 64-bit pointers**, like jumps and calls (for example: `jal $t0`).

**Sparc**

- Updated to LLVM-18
- V9 must be enabled explicitly now.
- Added Little Endian support. Big endian mode must be enabled explicitly now.
- Alias support added. It is possible to choose between real and alias details.
- ASI operands are now distinct from immediates.
- Memory barriers are now distinct from immediates.
- Operands now have read/write access information.
- The instruction format was added as detail.
- Instruction groups and modes changed to LLVM defined ones. Most notably: `64bit -> HasV9`.
- The condition codes are now separated into normal, fp, cp and register conditional flags.
  The flags can be normalized by unsetting the `SPARC_CC_..._BEGIN` bits.
- The CC field an instruction uses is now encoded consistently in `cs_sparc::cc_field`.
  This might lead to confusion for instructions which list the cc field explicitly in their asm text.
  For example, the instruction `fcmpeq %fcc2, %f0, %f4` has 2, not 3, operands.
  The operands are the two registers `f0` and `f4`, and the `cc_field` is set to `SPARC_CC_FIELD_FCC0`.

**RISC-V**

- Updated to LLVM-18
- Operands now have read/write access information
- Previously only the basic extensions and the compressed ISA were supported; now every extension supported by LLVM-18 is also available (e.g. vector, crypto, ...)
- Changed register names
  * FP Regs: Instead of `RISCV_REG_F<n>_32` and `RISCV_REG_F<n>_64`, they're named `RISCV_REG_F<n>_F` and `RISCV_REG_F<n>_D` for n in `0..31`
- Added register names
  * Vector registers and combinations thereof `RISCV_REG_V<n>[_V<m>]*`, examples:
    * `RISCV_REG_V21`
    * `RISCV_REG_V9_V10`
    * `RISCV_REG_V3_V4_V5`
    * etc...
      up to 8-register combinations
  * Half-precision (16-bit) FP registers `RISCV_REG_F<n>_H` for n in `0..31`
- Changed instruction names
  * Instructions ending in `_AQ_RL` now end in `_AQRL`
- Added instruction names: a massive number, see `include/capstone/riscv.h`
- Added `dimm` and `csr` fields inside the union data of `cs_riscv_op`, with corresponding `riscv_op_type`
  * `dimm` is used for instructions with FP immediates
  * `csr` is used for instructions with CSR system registers
- Added ISA flags to turn ISA extensions on and off
  * `CS_MODE_RISCV_FD = 1 << 3`
  * `CS_MODE_RISCV_V = 1 << 4`
  * `CS_MODE_RISCV_ZFINX = 1 << 5`
  * `CS_MODE_RISCV_ZCMP_ZCMT_ZCE = 1 << 6`
  * `CS_MODE_RISCV_ZICFISS = 1 << 7`
  * `CS_MODE_RISCV_E = 1 << 8`
  * `CS_MODE_RISCV_A = 1 << 9`
  * `CS_MODE_RISCV_COREV = 1 << 10`
  * `CS_MODE_RISCV_THEAD = 1 << 11`
  * `CS_MODE_RISCV_SIFIVE = 1 << 12`
  * `CS_MODE_RISCV_BITMANIP = 1 << 13`
  * `CS_MODE_RISCV_ZBA = 1 << 14`
  * `CS_MODE_RISCV_ZBB = 1 << 15`
  * `CS_MODE_RISCV_ZBC = 1 << 16`
  * `CS_MODE_RISCV_ZBKB = 1 << 17`
  * `CS_MODE_RISCV_ZBKC = 1 << 18`
  * `CS_MODE_RISCV_ZBKX = 1 << 19`
  * `CS_MODE_RISCV_ZBS = 1 << 20`
- Added two syntax options for alias control:
  * `CS_OPT_SYNTAX_NO_ALIAS_TEXT`: RISC-V assigns readable aliases to special cases of more flexible instructions. For example, `ret` is a `jalr`, a more general instruction that takes an arbitrary register as jump destination and a link register. `ret` is the special case where those 2 arguments are restricted to `ra` and `x0` respectively. The default behaviour of Capstone is to print those aliases whenever applicable, but this default can be suppressed by opening Capstone with `CS_OPT_SYNTAX_NO_ALIAS_TEXT`. When using `cstool`, the corresponding cmdline option is `+noalias`.
  * `CS_OPT_SYNTAX_NO_ALIAS_TEXT_COMPRESSED`: some find it useful to only suppress aliases for compressed instructions, but leave other instructions printed as usual. This flag implements this restricted non-aliasing.
    For example, the special compressed addition will normally be printed as its equivalent normal addition, but with this flag enabled it will be printed as `c.addi`, and non-compressed aliases won't be suppressed. When using `cstool`, the corresponding cmdline option is `+noaliascompressed`.
  * Interaction:

    | Case | `+noalias` | `+noaliascompressed` | Options Set | Behavior |
    | ---- | ---------- | -------------------- | ----------- | -------- |
    | 1 | `false` | `false` | *(neither)* | All instruction aliases will be printed (default behavior) |
    | 2 | `true` | `false` | `CS_OPT_SYNTAX_NO_ALIAS_TEXT` only | All instruction aliases will NOT be printed; exact text only |
    | 3 | `false` | `true` | `CS_OPT_SYNTAX_NO_ALIAS_TEXT_COMPRESSED` only | Non-compressed instructions show aliases normally; compressed instructions are printed exactly with no aliases |
    | 4 | `true` | `true` | Both `CS_OPT_SYNTAX_NO_ALIAS_TEXT` & `CS_OPT_SYNTAX_NO_ALIAS_TEXT_COMPRESSED` | All instruction aliases will NOT be printed *(redundant/equivalent to case 2)* |

    Note that `+noalias` "overpowers" `+noaliascompressed` in the second case: despite `+noaliascompressed` being false, meaning aliases are wanted for compressed instructions, `+noalias` being true means ALL aliases are suppressed, and this takes precedence. Other than that, case 1 and case 3 work as intuitively expected, and case 4 is redundant. So a single-sentence description of this table is: if `+noalias` is given, then no aliases will be printed for any instruction; if it is not given, then aliases will be printed for non-compressed instructions, and alias printing for compressed instructions further checks `+noaliascompressed` before proceeding.
- Added `reg_access` Capstone callback to return all read and written registers for the instructions, including registers used as part of memory operands.
  * Note that `reg_access` does NOT treat CSRs as registers; the detailed reasons can be found in [the PR implementing the feature](https://github.com/capstone-engine/capstone/pull/2895)
  * Note that `reg_access` does NOT treat reading the PC's value as reading a register; the detailed reasons can be found in [the PR implementing the feature](https://github.com/capstone-engine/capstone/pull/2895)

> [!NOTE]
> All `CS_MODE_RISCV_*` extensions above are disabled by default unless enabled by their option name or the corresponding command line flag in cstool. Any other extension is always enabled and can't be disabled.

> [!NOTE]
> RISC-V has a massive, sprawling list of extensions, but Capstone's internal implementation choice of using a 32-bit mode field is not enough to cover all of them. For now, the extension flags above were added because their encoding space conflicts with either each other or other extensions. More flags can be added later if bug reports come in requesting finer-grained extension control. However, the current implementation using bitfields imposes a strict upper limit and will likely be refactored into a more expansive mechanism in the future. See [this issue](https://github.com/capstone-engine/capstone/issues/2848) for more details.

**Xtensa**

- Architecture support was added (based on LLVM-18).
- Support for `LITBASE`. Set the `LITBASE` with `cs_option(handle, CS_OPT_LITBASE, litbase_value)`.

**x86-64**

- Decoding of conflicting segment overrides was changed to match CPU behavior:
  For instructions with both an FS/GS and an ES/CS/SS/DS override, the FS/GS override now takes priority, regardless of prefix ordering.
- Decoding of instructions with multiple mandatory prefixes was fixed.
  (e.g., `shld` with a data size override and a redundant `F3` prefix, or `addss` with an additional `66` prefix)

**BPF**

- Added support for eBPF `ATOMIC` class instructions (using Linux mnemonics, not GNU ones. E.g. `acmpxchg64` instead of `axchg`)
- Added support for eBPF signed `ALU` class instructions (`sdiv`, `smod`, `movs` variants. E.g. `smod r9, 0xc9d1d20b`)
- Added support for eBPF `JMP32` class instructions (E.g. `jslt32 r7, -0xa46e0bd, -0x33f1`)
- Updated the syntax for eBPF legacy packet instructions (similar to LLVM mnemonics, not GNU ones. E.g. `ldabsw [skb-0x8]`). `skb` is the socket buffer.
- Corrected the signedness interpretation of `immediate` and `offset` operands

**M68K**

- Architecture support added for `cpu32`, `M68060`.
- Enhanced bitfield instructions, PC-relative addressing, and immediate value type handling.
- Expanded integration tests and refactored invalid assembly edge cases.

**UX**

- Instruction alias (see below).
- `cstool`: Architecture specific options can now be enabled with `cstool +