This little guide is my "cheat sheet" to the Xtensa architecture. There is a 662 page PDF document "Xtensa Instruction Set Architecture reference manual" that this is derived from.
The Xtensa implements SPARC-like register windows on subroutine calls, but I have never seen this feature used in either the bootrom or code generated by gcc, so it can be ignored.
There are 16 registers named a0 through a15.
The processor can be either big or little endian. The bootrom and toolchain use the processor in little endian mode.
The PC is an independent register. There is also a dedicated 6 bit "SAR" (shift amount register).
Store instructions are just the opposite: the first register is the source, and it will be stored at an address generated from the other operands.
Instructions like "add", "and", "xor" flow like the load. The first register is the destination and the operands are on the right.
Since subroutine calls use the a0 register, it is necessary to save it to the stack explicitly before another subroutine is called. Notice also that once a0 has been saved to the stack, it is perfectly OK to use it as a temporary register or whatever.
There is no way to cram a 32 bit address or constant into a 24 bit instruction, so 32 bit constants get dumped into memory (often grouped by the compiler in a block prior to the routine that uses them). They get loaded into a register by the "l32r" instruction, typically with a negative PC relative address, like this:
l32r a2, 400011a0 ; ( 3fffa000 )
In this form (which I used to use in my disassembly) the value 400011a0 is the memory location referenced by the l32r instruction. The comment conveniently shows the value fetched from that address and placed into a2.
Note that there is no "s32r" instruction.
I later changed how I display things in my disassembly. In the following I show the value 0x3fffcba0 that is fetched from someplace and loaded into the a10 register. It is in fact stored at 0x40003364, but that location is really of no particular interest.
l32r a10, [0x3fffcba0] ; 0x40003364
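The address an l32r reads from can be worked out by hand. Here is a minimal Python sketch of the calculation as the ISA manual describes it (assuming the extended L32R option is not in use): the 16 bit field is extended with ones on the left, shifted left by 2, and added to the address of the instruction plus 3 with the low two bits cleared. The instruction address 0x400011b4 below is made up for illustration and is not taken from the bootrom.

```python
def l32r_target(pc, imm16):
    """Address read by an l32r located at pc with the given 16 bit field."""
    base = (pc + 3) & ~3 & 0xFFFFFFFF               # instruction address + 3, word aligned
    offset = 0xFFFC0000 | ((imm16 & 0xFFFF) << 2)   # one-extended, so always negative
    return (base + offset) & 0xFFFFFFFF


def l32r_imm16(pc, target):
    """Inverse: the field value needed to reach a word aligned target."""
    base = (pc + 3) & ~3 & 0xFFFFFFFF
    return ((target - base) >> 2) & 0xFFFF


# Hypothetical placement: an l32r at 0x400011b4 reading the literal at 0x400011a0.
imm = l32r_imm16(0x400011B4, 0x400011A0)
print(hex(imm), hex(l32r_target(0x400011B4, imm)))
```

Because the offset is always negative, the literal has to sit below the instruction, within 256K bytes of it, which fits the habit of grouping constants just ahead of the routine.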
Loads and stores are done using a base register and an 8 bit unsigned offset. The following code increments the value stored at 0x3fffa000.
l32r a2, 400011a0 ; ( 3fffa000 )
l32i.n a3, a2, 0
addi.n a3, a3, 1
s32i.n a3, a2, 0
There are also 8 and 16 bit loads and stores. The stores are simple enough. The 8 bit load is "l8ui" and loads an 8 bit unsigned value. The 16 bit load comes in two flavors, "l16ui" and "l16si", which load unsigned (zero fill) or signed (sign extended). Note that the offset is shifted by 1 or 2 for 16 and 32 bit values. In general this is transparent and handled by the compiler.
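To make the offset scaling and the two 16 bit load flavors concrete, here is a small Python sketch. It models only the 24 bit instruction forms with their 8 bit offset field (the narrow ".n" forms have a smaller field and are not covered). The helper names encode_offset and load16 are invented for this illustration.

```python
import struct

# Shift applied to the 8 bit offset field for each access size.
SHIFT = {"l8ui": 0, "s8i": 0, "l16ui": 1, "l16si": 1, "s16i": 1, "l32i": 2, "s32i": 2}


def encode_offset(op, byte_offset):
    """Return the 8 bit field for a byte offset, or raise if it cannot be encoded."""
    shift = SHIFT[op]
    if byte_offset % (1 << shift):
        raise ValueError("offset is not a multiple of the access size")
    field = byte_offset >> shift
    if not 0 <= field <= 255:
        raise ValueError("offset out of range")
    return field


def load16(mem, addr, signed):
    """Model l16si (signed=True) or l16ui (signed=False) on little endian memory."""
    fmt = "<h" if signed else "<H"
    value = struct.unpack_from(fmt, mem, addr)[0]
    return value & 0xFFFFFFFF    # register contents as an unsigned 32 bit number


mem = bytes([0xFE, 0xFF])                  # the 16 bit value 0xfffe, little endian
print(encode_offset("l32i", 1020))         # largest byte offset l32i can reach
print(hex(load16(mem, 0, signed=False)))   # zero filled
print(hex(load16(mem, 0, signed=True)))    # sign extended
```

So a 32 bit load can reach byte offsets 0 to 1020 in steps of 4, a 16 bit load 0 to 510 in steps of 2, and an 8 bit load 0 to 255.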
The "mov" instruction can work register to register or to move a small immediate value into a register:
mov.n a7, a2 ; a7 <-- a2
movi.n a3, 0
There are conditional flavors of the "mov" instruction:
moveqz a7, a8, a9 ; a7 = a8 if a9 == 0
movnez a4, a3, a11 ; a4 = a3 if a11 != 0
movgez a2, a3, a4 ; a2 = a3 if a4 >= 0
movltz a6, a10, a11 ; a6 = a10 if a11 < 0
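The conditional moves can be modeled in a few lines of Python. This is only a sketch of the semantics given in the comments above. The function name cond_move is invented here, and the third register is treated as a signed 32 bit value for the gez and ltz tests.

```python
def to_signed(value):
    """Interpret a 32 bit register value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


CONDITIONS = {
    "moveqz": lambda t: t == 0,
    "movnez": lambda t: t != 0,
    "movgez": lambda t: t >= 0,
    "movltz": lambda t: t < 0,
}


def cond_move(op, dest, src, test):
    """Return the new destination value: src if the test passes, else dest unchanged."""
    return src if CONDITIONS[op](to_signed(test)) else dest


print(cond_move("moveqz", 111, 222, 0))            # test register is zero
print(cond_move("movltz", 111, 222, 0xFFFFFFFF))   # test register is -1
print(cond_move("movgez", 111, 222, 0xFFFFFFFF))   # test fails, destination kept
```

The point to notice is that when the test fails the destination keeps its old value, which lets the compiler avoid a branch.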
The "call0" instruction does a PC relative subroutine call.
The "callx0" instruction does a call to an address held in a register.
The "ret" (usually "ret.n") does a return from subroutine, and is equivalent to "jx a0".
Gcc uses registers a2 to a7 to hold subroutine arguments. If there are more than 6 arguments, the extras go on the stack.
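As a quick illustration of that convention, here is a Python sketch that splits an argument list into the register part and the stack part. It assumes every argument fits in a single 32 bit register. Wider arguments and structures are not modeled, and the name place_arguments is invented for this example.

```python
ARG_REGISTERS = ["a2", "a3", "a4", "a5", "a6", "a7"]


def place_arguments(values):
    """Split a list of 32 bit arguments into register and stack portions."""
    in_registers = dict(zip(ARG_REGISTERS, values))   # first six go in a2..a7
    on_stack = list(values[len(ARG_REGISTERS):])      # anything beyond that
    return in_registers, on_stack


regs, stack = place_arguments([10, 20, 30, 40, 50, 60, 70, 80])
print(regs)
print(stack)
```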
beqz a3, 40003518
bnez a1, 400033da
bgez a4, 40005ced
bltz a4, 4000647d
As appropriate, the above can have an unsigned "u" flavor. Other branches compare against an immediate value.
beqi a2, 2, 40006a08
bnei a2, 6, 400073b9
bgei a4, 1, 400087dc
blti a0, 1, 400094aa
bgeui a4, 8, 40009944
bltui a13, 6, 4000a18d
Others test if a bit is set or clear via an immediate value. The immediate value is the bit number (0-31). 0 is the lsb.
bbci a3, 1, 4000bedc
bbsi a2, 1, 4000df2c
A bit can also be tested using a bit number held in a second register.
bbc a14, a12, 40004e0c
bbs a14, a12, 40005d0c
Two registers can be compared. Here (for example) "ge" is true if the first register is greater than or equal to the second.
beq a3, a6, 400000e0
bne a3, a4, 40000e74
bge a3, a6, 400000e0
blt a3, a4, 40000e74
bgeu a3, a6, 400000e0
bltu a3, a4, 40000e74
Tests can be performed on the first register using a bit mask held in the second, testing if bits are set.
bany a12, a14, 40005fe2
bnone a12, a14, 40005fe2
ball a12, a14, 40005fe2
bnall a12, a14, 40005fe2
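The two register branch conditions are easy to mix up, so here is a Python sketch of when each one is taken. It is a model for reading disassembly, not a description of the hardware. It assumes little endian bit numbering (bit 0 is the lsb, as in the bootrom), and the bit number for bbc and bbs comes from the low 5 bits of the second register. The immediate forms bbci and bbsi behave the same way with the immediate as the bit number. The name branch_taken is invented here.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32 bit register value as a signed integer."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value


BRANCHES = {
    "beq":   lambda s, t: s == t,
    "bne":   lambda s, t: s != t,
    "bge":   lambda s, t: to_signed(s) >= to_signed(t),   # signed compare
    "blt":   lambda s, t: to_signed(s) < to_signed(t),
    "bgeu":  lambda s, t: s >= t,                          # unsigned compare
    "bltu":  lambda s, t: s < t,
    "bbc":   lambda s, t: ((s >> (t & 31)) & 1) == 0,      # bit number t is clear
    "bbs":   lambda s, t: ((s >> (t & 31)) & 1) == 1,      # bit number t is set
    "bany":  lambda s, t: (s & t) != 0,                    # any masked bit set
    "bnone": lambda s, t: (s & t) == 0,                    # no masked bit set
    "ball":  lambda s, t: (~s & t) == 0,                   # all masked bits set
    "bnall": lambda s, t: (~s & t) != 0,                   # some masked bit clear
}


def branch_taken(op, s, t):
    """True if the branch would be taken for first register s and second register t."""
    return BRANCHES[op](s & MASK32, t & MASK32)


print(branch_taken("bge", 0xFFFFFFFF, 1))    # -1 >= 1 as signed values
print(branch_taken("bgeu", 0xFFFFFFFF, 1))   # same bits compared as unsigned
print(branch_taken("ball", 0b1110, 0b0110))  # both mask bits are set
```

The first two calls show why the "u" flavors matter: the same register contents give opposite answers depending on whether they are treated as signed or unsigned.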
extui a0, a1, 16, 3 ; a0 = (a1>>16) & mask
Here the mask has as many low bits set as the last operand specifies, i.e. in this case the mask is 0x7.
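The same extraction in Python, as a sketch. The operand limits used here (shift of 0 to 31, field width of 1 to 16 bits) are how I read the ISA manual, and the input value 0x12345678 is just an example.

```python
def extui(src, shift, bits):
    """Model extui: shift right, then keep the low 'bits' bits."""
    if not (0 <= shift <= 31 and 1 <= bits <= 16):
        raise ValueError("operand out of range for extui")
    mask = (1 << bits) - 1          # bits = 3 gives a mask of 0x7
    return ((src & 0xFFFFFFFF) >> shift) & mask


print(extui(0x12345678, 16, 3))     # low 3 bits of 0x1234
```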
There are also some special case adds:
12d1ff addmi a1, a1, 0xffffff00
3195f6 l32r a3, 3fffdaac ; ( 3fffc000 )
3032a0 addx4 a3, a2, a3 ; add to base
The addmi instruction takes an 8 bit immediate value, sign extends it, and shifts it left 8 bits. The disassembler displays the effective result. It yields a signed value in the range -32768 to 32512 in multiples of 256.
The addx4 instruction is perfect for handling lookup tables with 32 bit values. It multiplies the middle register by 4 (by shifting left 2) and adds this to the value in the last register. The result, as always, goes to the first register. There are also addx2 and addx8 variants of the same sort.
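Both special adds are simple to model. In the Python sketch below, the immediate 0xff is the last byte of the "12d1ff" encoding shown above, which is where the displayed 0xffffff00 comes from. The register values passed in are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def addmi(src, imm8):
    """Model addmi: sign extend the 8 bit immediate, shift left 8, add."""
    signed = imm8 - 256 if imm8 & 0x80 else imm8
    return (src + (signed << 8)) & MASK32


def addx(scale, index, base):
    """Model addx2 / addx4 / addx8: (index * scale) + base."""
    shift = {2: 1, 4: 2, 8: 3}[scale]
    return ((index << shift) + base) & MASK32


print(hex(addmi(0, 0xFF)))               # the effective immediate, -256
print(hex(addmi(0x3FFFC100, 0xFF)))      # a value lowered by 256
print(hex(addx(4, 3, 0x3FFFC000)))       # address of entry 3 in a table of words
```

With the table base in the last register and the index in the middle one, addx4 gives the address of a 32 bit table entry in a single instruction.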
add a1, a2, a3
addi a1, a2, N
The "add" adds a2 and a3, placing the result in a1. The "addi" adds the immediate value N to a2 instead.
Subtract works the same way. The following calculates a1 = a2 - a3:
sub a1, a2, a3
wsr.intenable a5
rsr.ccount a2
wsr.intclear a2
xsr.ps a2
rsync
The "rsync" instruction waits for all prior wsr instructions to finish. The "xsr" instruction swaps the contents of a general register and a special register.
Note that the ccount register increments on every processor cycle. There is a "ccompare0" register that can be used in conjunction with it to generate interrupts.
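Since ccount is a 32 bit register it wraps around, so any arithmetic on it has to be done modulo 2^32. A short Python sketch of the two calculations that come up most often: measuring elapsed cycles, and picking the next ccompare0 value for a periodic interrupt. The function names and the period value are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def elapsed_cycles(start, end):
    """Cycles between two ccount readings, correct across one wraparound."""
    return (end - start) & MASK32


def next_compare(current_compare, period):
    """Next ccompare0 value for an interrupt every 'period' cycles."""
    return (current_compare + period) & MASK32


print(hex(elapsed_cycles(0xFFFFFFF0, 0x00000010)))   # counter wrapped in between
print(hex(next_compare(0xFFFFFF00, 0x200)))          # compare value wraps too
```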
rsil a7, 2
This is often used in an idiom to block interrupts over a section of code as follows:
rsil a7, 2 ; save old PS in a7, raise interrupt level to 2
...
wsr.ps a7 ; restore the saved PS
rsync
memw
memw is "memory wait". It is basically a pipeline sync. It waits until all loads and stores finish.
excw
isync
dsync
esync
nop
The "excw" waits until all prior instructions are either exception free or any exceptions have been taken.