From ca43be4146ddad59c9a8823468da09d730d57bed Mon Sep 17 00:00:00 2001
From: MerryMage
Date: Mon, 5 Feb 2018 22:30:39 +0000
Subject: [PATCH] docs: Update documentation (2018-02-05)

---
 docs/Design.md            | 133 ++++++++++++++++++++------------------
 docs/RegisterAllocator.md |  75 ++++++++++-----------
 2 files changed, 102 insertions(+), 106 deletions(-)

diff --git a/docs/Design.md b/docs/Design.md
index 6e353391..aa81b83f 100644
--- a/docs/Design.md
+++ b/docs/Design.md
@@ -4,15 +4,18 @@ Dynarmic is a dynamic recompiler for the ARMv6K architecture. Future plans for d
 support for other versions of the ARM architecture, having a interpreter mode, and adding support for other
 architectures.
 
-Users of this library interact with it primarily through [`include/dynarmic/dynarmic.h`](../include/dynarmic/dynarmic.h).
-Users specify how dynarmic's CPU core interacts with the rest of their systems by setting members of the
-[`Dynarmic::UserCallbacks`](../include/dynarmic/callbacks.h) structure as appropriate. Users setup the CPU state using member functions of
-`Dynarmic::Jit`, then call `Dynarmic::Jit::Execute` to start CPU execution. The callbacks defined on `UserCallbacks`
-may be called from dynamically generated code, so users of the library should not depend on the stack being in a
-walkable state for unwinding.
+Users of this library interact with it primarily through the interface provided in
+[`include/dynarmic`](../include/dynarmic). Users specify how dynarmic's CPU core interacts with
+the rest of their system by providing an implementation of the relevant `UserCallbacks` interface.
+Users set up the CPU state using member functions of `Jit`, then call `Jit::Execute` to start CPU
+execution. The callbacks defined on `UserCallbacks` may be called from dynamically generated code,
+so users of the library should not depend on the stack being in a walkable state for unwinding.
 
-Dynarmic reads instructions from memory by calling `UserCallbacks::MemoryRead32`. These instructions then pass
-through several stages:
+* A32: [`Jit`](../include/dynarmic/A32/a32.h), [`UserCallbacks`](../include/dynarmic/A32/config.h)
+* A64: [`Jit`](../include/dynarmic/A64/a64.h), [`UserCallbacks`](../include/dynarmic/A64/config.h)
+
+Dynarmic reads instructions from memory by calling `UserCallbacks::MemoryReadCode`. These
+instructions then pass through several stages:
 
 1. Decoding (Identifying what type of instruction it is and breaking it up into fields)
 2. Translation (Generation of high-level IR from the instruction)
@@ -20,39 +23,39 @@ through several stages:
 4. Emission (Generation of host-executable code into memory)
 5. Execution (Host CPU jumps to the start of emitted code and runs it)
 
-Using the x64 backend as an example:
+Using the A32 frontend with the x64 backend as an example:
 
 * Decoding is done by [double dispatch](https://en.wikipedia.org/wiki/Visitor_pattern) in
-[`src/frontend/decoder/{arm.h,thumb16.h,thumb32.h}`](../src/frontend/decoder/).
-* Translation is done by the visitors in `src/frontend/translate/translate_{arm,thumb}.cpp`.
-The function [`IR::Block Translate(LocationDescriptor descriptor, MemoryRead32FuncType memory_read_32)`](../src/frontend/translate/translate.h) takes a
-memory location, some CPU state, and memory reader callback and returns a basic block of IR.
+  [`src/frontend/A32/decoder/{arm.h,thumb16.h,thumb32.h}`](../src/frontend/A32/decoder/).
+* Translation is done by the visitors in `src/frontend/A32/translate/translate_{arm,thumb}.cpp`.
+  The function [`Translate`](../src/frontend/A32/translate/translate.h) takes a starting memory location,
+  some CPU state, and a memory reader callback and returns a basic block of IR.
 * The IR can be found under [`src/frontend/ir/`](../src/frontend/ir/).
 * Optimizations can be found under [`src/ir_opt/`](../src/ir_opt/).
 * Emission is done by `EmitX64` which can be found in `src/backend_x64/emit_x64.{h,cpp}`.
 * Execution is performed by calling `BlockOfCode::RunCode` in `src/backend_x64/block_of_code.{h,cpp}`.
-    
+
 ## Decoder
 
-The decoder is a double dispatch decoder. Each instruction is represented by a line in the relevant instruction table.
-Here is an example line from `g_arm_instruction_table`:
+The decoder is a double dispatch decoder. Each instruction is represented by a line in the relevant
+instruction table. Here is an example line from [`arm.h`](../src/frontend/A32/decoder/arm.h):
 
     INST(&V::arm_ADC_imm, "ADC (imm)", "cccc0010101Snnnnddddrrrrvvvvvvvv")
-    
+
 (Details on this instruction can be found in section A8.8.1 of the ARMv7-A manual. This is encoding A1.)
-    
-The first argument to INST is the member function to call on the visitor. The second argument is a user-readable 
+
+The first argument to INST is the member function to call on the visitor. The second argument is a user-readable
 instruction name. The third argument is a bit-representation of the instruction.
 
 ### Instruction Bit-Representation
 
-Each character in the bitstring represents a bit. A `0` means that that bitposition **must** contain a zero. A `1` 
+Each character in the bitstring represents a bit. A `0` means that that bitposition **must** contain a zero. A `1`
 means that that bitposition **must** contain a one. A `-` means we don't care about the value at that bitposition.
-A string of the same character represents a field. In the above example, the first four bits `cccc` represent the 
+A string of the same character represents a field. In the above example, the first four bits `cccc` represent the
 four-bit-long cond field of the ARM Add with Carry (immediate) instruction.
 
-The visitor would have to have a function named `arm_ADC_imm` with 6 arguments, one for each field (`cccc`, `S`, 
-`nnnn`, `dddd`, `rrrr`, `vvvvvvvv`). If there is a mismatch of field number with argument number, a compile-time 
+The visitor would have to have a function named `arm_ADC_imm` with 6 arguments, one for each field (`cccc`, `S`,
+`nnnn`, `dddd`, `rrrr`, `vvvvvvvv`). If there is a mismatch of field number with argument number, a compile-time
 error results.
 
 ## Translator
@@ -62,9 +65,9 @@ help of the [`IREmitter` class](../src/frontend/ir/ir_emitter.h). An example of
 
     bool ArmTranslatorVisitor::arm_ADC_imm(Cond cond, bool S, Reg n, Reg d, int rotate, Imm8 imm8) {
         u32 imm32 = ArmExpandImm(rotate, imm8);
-        
+
         // ADC{S} , , #
-        
+
         if (ConditionPassed(cond)) {
             auto result = ir.AddWithCarry(ir.GetRegister(n), ir.Imm32(imm32), ir.GetCFlag());
 
@@ -83,22 +86,22 @@ help of the [`IREmitter` class](../src/frontend/ir/ir_emitter.h). An example of
                 ir.SetVFlag(result.overflow);
             }
         }
-        
+
         return true;
     }
 
 where `ir` is an instance of the `IRBuilder` class. Each member function of the `IRBuilder` class constructs
 an IR microinstruction.
 
-    
-## Intermediate Representation 
-    
-Dynarmic uses an ordered SSA intermediate representation. It is very vaguely similar to those found in other 
-similar projects like redream, nucleus, and xenia. Major differences are: (1) the abundance of context microinstructions 
-whereas those projects generally only have two (`load_context`/`store_context`), (2) the explicit handling of 
-flags as their own values, and (3) very different basic block edge handling.
-The intention of the context microinstructions and explicit flag handling is to allow for future optimizations. The 
-differences in the way edges are handled are a quirk of the current implementation and dynarmic will likely add a 
+## Intermediate Representation
+
+Dynarmic uses an ordered SSA intermediate representation. It is very vaguely similar to those found in other
+similar projects like redream, nucleus, and xenia. Major differences are: (1) the abundance of context
+microinstructions whereas those projects generally only have two (`load_context`/`store_context`), (2) the
+explicit handling of flags as their own values, and (3) very different basic block edge handling.
+
+The intention of the context microinstructions and explicit flag handling is to allow for future optimizations. The
+differences in the way edges are handled are a quirk of the current implementation and dynarmic will likely add a
 function analyser in the medium-term future.
 
 Dynarmic's intermediate representation is typed. Each microinstruction may take zero or more arguments and may
@@ -106,6 +109,8 @@ return zero or more arguments. A subset of the microinstructions available is do
 
 A complete list of microinstructions can be found in [src/frontend/ir/opcodes.inc](../src/frontend/ir/opcodes.inc).
 
+Some commonly used microinstructions are listed below.
+
 ### Immediate: Imm{U1,U8,U32,RegRef}
 
     ImmU1(u1 value)
@@ -120,13 +125,13 @@ by the IR.
 
     GetRegister( reg)
     SetRegister( reg, value)
-    
+
 Gets and sets `JitState::Reg[reg]`. Note that `SetRegister(Arm::Reg::R15, _)` is disallowed by IRBuilder.
 Use `{ALU,BX}WritePC` instead.
 
 Note that sequences like `SetRegister(R4, _)` followed by `GetRegister(R4)` are
 optimized away.
-    
+
 ### Context: {Get,Set}{N,Z,C,V}Flag
 
     GetNFlag()
@@ -143,7 +148,7 @@ Gets and sets bits in `JitState::Cpsr`. Similarly to registers redundant get/set
 ### Context: BXWritePC
 
     BXWritePC( value)
-    
+
 This should probably be the last instruction in a translation block unless you're doing something fancy.
 
 This microinstruction sets R15 and CPSR.T as appropriate.
@@ -165,73 +170,73 @@ Extract a u16 and u8 respectively from a u32.
 
     MostSignificantBit( value)
     IsZero( value)
-    
+
 These are used to implement ARM flags N and Z. These can often be optimized away by the backend into a host flag read.
 
 ### Calculation: LogicalShiftLeft
-    
+
     ( result, carry_out) LogicalShiftLeft( operand, shift_amount, carry_in)
-    
+
 Pseudocode:
 
     if shift_amount == 0:
         return (operand, carry_in)
-        
+
     x = operand * (2 ** shift_amount)
     result = Bits<31,0>(x)
     carry_out = Bit<32>(x)
-    
+
     return (result, carry_out)
-    
+
 This follows ARM semantics. Note `shift_amount` is not masked to 5 bits (like `SHL` does on x64).
 
 ### Calculation: LogicalShiftRight
 
     ( result, carry_out) LogicalShiftLeft( operand, shift_amount, carry_in)
-    
+
 Pseudocode:
 
     if shift_amount == 0:
         return (operand, carry_in)
-        
+
     x = ZeroExtend(operand, from_size: 32, to_size: shift_amount+32)
     result = Bits(x)
     carry_out = Bit(x)
-    
+
     return (result, carry_out)
-    
+
 This follows ARM semantics. Note `shift_amount` is not masked to 5 bits (like `SHR` does on x64).
 
 ### Calculation: ArithmeticShiftRight
 
     ( result, carry_out) ArithmeticShiftRight( operand, shift_amount, carry_in)
-    
+
 Pseudocode:
 
     if shift_amount == 0:
         return (operand, carry_in)
-        
+
     x = SignExtend(operand, from_size: 32, to_size: shift_amount+32)
     result = Bits(x)
     carry_out = Bit(x)
-    
+
     return (result, carry_out)
-    
+
 This follows ARM semantics. Note `shift_amount` is not masked to 5 bits (like `SAR` does on x64).
 
 ### Calcuation: RotateRight
 
     ( result, carry_out) RotateRight( operand, shift_amount, carry_in)
-    
+
 Pseudocode:
 
     if shift_amount == 0:
         return (operand, carry_in)
-        
+
     shift_amount %= 32
     result = (operand << shift_amount) | (operand >> (32 - shift_amount))
     carry_out = Bit<31>(result)
-    
+
     return (result, carry_out)
 
 ### Calculation: AddWithCarry
@@ -243,7 +248,7 @@ a + b + carry_in
 ### Calculation: SubWithCarry
 
     ( result, carry_out, overflow) SubWithCarry( a, b, carry_in)
-    
+
 This has equivalent semantics to `AddWithCarry(a, Not(b), carry_in)`.
 
 a - b - !carry_in
@@ -251,17 +256,17 @@ a - b - !carry_in
 ### Calculation: And
 
     And( a, b)
-    
+
 ### Calculation: Eor
 
     Eor( a, b)
-    
+
 Exclusive OR (i.e.: XOR)
 
 ### Calculation: Or
 
     Or( a, b)
-    
+
 ### Calculation: Not
 
     Not( value)
@@ -282,17 +287,17 @@ Memory access.
 ### Terminal: Interpret
 
     SetTerm(IR::Term::Interpret{next})
-    
+
 This terminal instruction calls the interpreter, starting at `next`.
 The interpreter must interpret exactly one instruction.
-    
+
 ### Terminal: ReturnToDispatch
 
-    SetTerm(IR::Term::ReturnToDispatch{}) 
-    
+    SetTerm(IR::Term::ReturnToDispatch{})
+
 This terminal instruction returns control to the dispatcher.
 The dispatcher will use the value in R15 to determine what comes next.
-    
+
 ### Terminal: LinkBlock
 
     SetTerm(IR::Term::LinkBlock{next})
diff --git a/docs/RegisterAllocator.md b/docs/RegisterAllocator.md
index b182d803..fea6f19e 100644
--- a/docs/RegisterAllocator.md
+++ b/docs/RegisterAllocator.md
@@ -2,12 +2,14 @@
 
 `HostLoc`s contain values. A `HostLoc` ("host value location") is either a host CPU register or a host spill location.
 
-Values once set cannot be changed. Values can however be moved by the register allocator between `HostLoc`s. This is handled by the register allocator itself and code that uses the register allocator need not and should not move values between registers.
+Values once set cannot be changed. Values can however be moved by the register allocator between `HostLoc`s. This is
+handled by the register allocator itself and code that uses the register allocator need not and should not move values
+between registers.
 
 The register allocator is based on three concepts: `Use`, `Def` and `Scratch`.
 
 * `Use`: The use of a value.
-* `Def`: The definition of a value, this is the only time when a value is set.
+* `Define`: The definition of a value, this is the only time when a value is set.
 * `Scratch`: Allocate a register that can be freely modified as one wishes.
 
 Note that `Use`ing a value decrements its `use_count` by one. When the `use_count` reaches zero the value is discarded and no longer exists.
@@ -23,63 +25,52 @@ At runtime, allocate one of the registers in `desired_locations`. You are free t
 
 ### Pure `Use`
 
-    Xbyak::Reg64 UseGpr(IR::Value use_value, HostLocList desired_locations = any_gpr);
-    Xbyak::Xmm UseXmm(IR::Value use_value, HostLocList desired_locations = any_xmm);
-    OpArg UseOpArg(IR::Value use_value, HostLocList desired_locations);
+    Xbyak::Reg64 UseGpr(Argument& arg);
+    Xbyak::Xmm UseXmm(Argument& arg);
+    OpArg UseOpArg(Argument& arg);
+    void Use(Argument& arg, HostLoc host_loc);
 
-At runtime, the value corresponding to `use_value` will be placed into one of the `HostLoc`s specified by `desired_locations`. The return value is the actual location.
+At runtime, the value corresponding to `arg` will be placed in a register. The actual register is determined by
+which one of the above functions is called. `UseGpr` places it in an unused GPR, `UseXmm` places it
+in an unused XMM register, `UseOpArg` might be in a register or might be a memory location, and `Use` allows
+you to specify a specific register (GPR or XMM) to use. This register **must not** have its value changed.
 
-* `UseGpr`: The location is a GPR.
-* `UseXmm`: The location is an XMM register.
-* `UseOpArg`: The location may be one of the locations specified by `desired_locations`, but may also be a host memory reference.
-
 ### `UseScratch`
 
-    Xbyak::Reg64 UseScratchGpr(IR::Value use_value, HostLocList desired_locations = any_gpr)
-    Xbyak::Xmm UseScratchXmm(IR::Value use_value, HostLocList desired_locations = any_xmm)
+    Xbyak::Reg64 UseScratchGpr(Argument& arg);
+    Xbyak::Xmm UseScratchXmm(Argument& arg);
+    void UseScratch(Argument& arg, HostLoc host_loc);
 
-At runtime, the value corresponding to `use_value` will be placed into one of the `HostLoc`s specified by `desired_locations`. The return value is the actual location.
+At runtime, the value corresponding to `arg` will be placed in a register. The actual register is determined by
+which one of the above functions is called. `UseScratchGpr` places it in an unused GPR, `UseScratchXmm` places it
+in an unused XMM register, and `UseScratch` allows you to specify a specific register (GPR or XMM) to use.
 
-You are free to modify the register. The register is discarded at the end of the allocation scope.
+The return value is the register allocated to you.
 
-### `Def`
+You are free to modify the value in the register. The register is discarded at the end of the allocation scope.
 
-A `Def` is the defintion of a value. This is the only time when a value may be set.
+### `Define` as register
 
-    Xbyak::Xmm DefXmm(IR::Inst* def_inst, HostLocList desired_locations = any_xmm)
-    Xbyak::Reg64 DefGpr(IR::Inst* def_inst, HostLocList desired_locations = any_gpr)
+A `Define` is the definition of a value. This is the only time when a value may be set.
 
-By calling `DefXmm` or `DefGpr`, you are stating that you wish to define the value for `def_inst`, and you wish to write the value to one of the `HostLoc`s specified by `desired_locations`. You must write the value to the register returned.
+    void DefineValue(IR::Inst* inst, const Xbyak::Reg& reg);
 
-### `AddDef`
+By calling `DefineValue`, you are stating that you wish to define the value for `inst`, and you have written the
+value to the specified register `reg`.
 
-Adding a `Def` to an existing value.
+### `Define`ing as an alias of a different value
 
-    void RegisterAddDef(IR::Inst* def_inst, const IR::Value& use_inst);
+Adding a `Define` to an existing value.
 
-You are declaring that the value for `def_inst` is the same as the value for `use_inst`. No host machine instructions are emitted.
+    void DefineValue(IR::Inst* inst, Argument& arg);
 
-### `UseDef`
-
-    Xbyak::Reg64 UseDefGpr(IR::Value use_value, IR::Inst* def_inst, HostLocList desired_locations = any_gpr)
-    Xbyak::Xmm UseDefXmm(IR::Value use_value, IR::Inst* def_inst, HostLocList desired_locations = any_xmm)
-
-At runtime, the value corresponding to `use_value` will be placed into one of the `HostLoc`s specified by `desired_locations`. The return value is the actual location. You must write the value correponding to `def_inst` by the end of the allocation scope.
-
-### `UseDef` (OpArg variant)
-
-    std::tuple UseDefOpArgGpr(IR::Value use_value, IR::Inst* def_inst, HostLocList desired_locations = any_gpr)
-    std::tuple UseDefOpArgXmm(IR::Value use_value, IR::Inst* def_inst, HostLocList desired_locations = any_xmm)
-
-These have the same semantics as `UseDefGpr` and `UseDefXmm` except `use_value` may not be present in the register, and may actually be in a host memory location.
+You are declaring that the value for `inst` is the same as the value for `arg`. No host machine instructions are
+emitted.
 
 ## When to use each?
 
-The variety of different ways to `Use` and `Def` values are for performance reasons.
-
-* `UseDef`: Instead of performing a `Use` and a `Def`, `UseDef` uses one less register in the case when this `Use` is the last `Use` of a value.
-* `UseScratch`: Instead of performing a `Use` and a `Scratch`, `UseScratch` uses one less register in the case when this `Use` is the last `Use` of a value.
-* `AddDef`: This drastically reduces the number of registers required when it can be used. It can be used when values are truncations of other values. For example, if `u8_value` contains the truncation of `u32_value`, `AddDef(u8_value, u32_value)` is a valid definition of `u8_value`.
-* OpArg variants: Save host code-cache by merging memory loads into other instructions instead of the register allocator having to emit a `mov`.
\ No newline at end of file
+* Prefer `Use` to `UseScratch` where possible.
+* Prefer the `OpArg` variants where possible.
+* Prefer to **not** use the specific `HostLoc` variants where possible.