No RISC No Reward:
Return-Oriented Programming on RISC-V
July 29, 2020
RISC-V is an open-source hardware ISA based on the RISC design principles, and has been the subject of some novel ROP mitigation technique proposals due to its open-source nature. However, very little work has actually evaluated whether such an attack is feasible assuming a typical RISC-V implementation. We show that RISC-V ROP can be used to perform Turing complete calculation and arbitrary function calls by leveraging gadgets found in a version of the GNU libc library. Using techniques such as self-modifying ROP chains and algorithmic ROP chain generation, we demonstrate the power of RISC-V ROP by creating a compiler that converts code of arbitrary complexity written in a popular Turing-complete language into RISC-V ROP chains.
RISC-V is an open RISC architecture developed at Berkeley and backed by an international consortium. Like other modern architectures, RISC-V has hardware support for data execution prevention (DEP).
Return-oriented programming, or ROP, is a systematization and generalization of code-reuse attack techniques like return-into-libc [1] and the borrowed-chunks technique [2]. With ROP, short instruction sequences, each ending in a return instruction, are combined into building blocks called gadgets. An attacker who controls the stack can induce any desired functionality by sequencing gadgets, even in the presence of DEP. ROP was originally described for the x86 [3], and subsequently extended to other architectures such as SPARC [4], ARM [5], and PowerPC [6].
The RISC-V instruction set was designed for extensibility, with a substantial portion of the opcode space left unassigned [7]. Researchers have taken advantage of RISC-V’s extensibility to propose hardware extensions that make code-reuse attacks harder to mount (see, e.g., [8]–[11]).
There’s just one problem: No one seems to have shown that general, Turing-complete ROP is possible on RISC-V.1
In this short note, we close this gap by showing that ROP is, indeed, possible on RISC-V. We demonstrate how arithmetic, memory reads and writes, conditional branching, and function invocation can be performed solely through ROP gadgets found in the GNU libc library. To work around limitations in the available instruction sequences, we introduce techniques for self-modifying ROP chains that may be of independent interest. As is now traditional (see, e.g., [13]), we show that our gadget set is Turing complete by designing a compiler that accepts a Brainfuck program of arbitrary complexity and generates an equivalent RISC-V ROP chain.
We analyze glibc-2.30.9000-29.fc32.riscv64, runninng on Fedora-Developer-Rawhide-20200108.n.0 booting on the virt machine in QEMU. Since we do not leverage any features of emulated hardware outside of documented ISA, our ROP chains and gadgets should also work on equivalent hardware.
We target the RV64GC variant of the RISC-V ISA, which includes the 64-bit base integer instruction set as well as the MAFDC extensions: integer multiplication/division, atomic instructions, single- and double-precision floating point, and compressed instructions.
The C extension would be interesting for analyzing ROP potential because it allows for some degree of variable-length instructions. Rather than all instructions being 4 bytes long, the C extension allows certain commonly-used instructions to only take up 2 bytes in an effort to reduce code size.
Unlike ARM Thumb mode, the RISC-V C extension does not require a mode switch, and compressed instructions can coexist with uncompressed instructions in the same instruction stream. We will analyze the implications of this optional but very common extension on ROP gadget availability.
RISC-V makes use of a link register, like similar RISC architectures such as PowerPC, ARM, and SPARC. In RISC-V this is labeled the ra
register. The purpose of this register is to optimize calls to leaf subroutines since the return address
need not be pushed or popped on the stack. The implication of this is that in RISC ROP exploitation is mostly limited to non-leaf function epilogues, and RISC-V ROP is no exception.
For our purposes, we define a chainable ROP gadget as follows: an instruction sequence that:
Loads a value from a(sp)
into ra
where a
is some positive immediate value divisible by 8
Adds an immediate value b
to sp
where b > a
and b
is divisible by 16 (due to stack-alignment requirements)
Ends in a ret
(equivalent to jr ra
)
In addition to the above steps needed to maintain the ROP chain, a good ROP gadget will contain a few extra instructions that perform useful work. Note that these requirements are fairly restrictive, and exclude gadgets that, for example, pop and jump
to a register other than the ra
register. For our purposes, we will focus on gadgets that fulfill these requirements (and thus look like function epilogues) but examining other classes of gadgets is a promising future research direction.
As an example, below is a gadget that does no work except maintining the ROP chain. This is typically called a NOP gadget.
0x0000000000097a68 :
c.ldsp ra, 8(sp)
c.addi sp, 0x10
c.jr ra
Note that in x86, the same three actions are performed by a single 1-byte instruction (ret
). In SPARC, a similar gadget only requires a ret
and restore
to slide the register window. In ARM, a similar gadget looks
like a LDMFD
followed by a RET
.
The fact that the simplest ROP gadget in RISC-V requires more instructions than in other architectures has the following implications:
When a chainable ROP gadget is found, it is very likely that the gadget formed part of a legitimate function epilogue. ROP gadgets that consist entirely of unintended instructions would be exceedingly rare.
The three instructions may not always be contiguous; in other words, part of the work performed by the gadget may be located in between these three instructions. When this happens, the return-oriented programmer is unable to "trim" the work
by jumping into the middle of the work like on other architectures. Even in ARM, the LDMFD
and the RET
are typically contiguous. Similarly, in SPARC, ret
and restore
are almost always contiguous. The
implication of RISC-V’s departure from this norm is that RISC-V ROP chains sometimes must account for a larger number of undesirable gadget side effects.
Take for example the following POP gadget:
0x000000000006a5e8 :
c.ldsp ra, 0x28(sp)
c.ldsp s0, 0x20(sp)
c.ldsp a0, 0(sp)
c.ldsp a1, 8(sp)
c.ldsp s1, 0x18(sp)
c.ldsp s2, 0x10(sp)
c.addi16sp sp, 0x30
c.jr ra
If the Return-Oriented Programmer would like to use this gadget to pop only s2
(maybe a0 contains a runtime-calculated value she would not like to overwrite), she is unable to do so because jumping directly to the
c.ldsp s2, 0x10(sp)
instruction would skip the load into ra
, breaking the ROP chain and causing an infinite loop.
A similar issue presents itself with ARM through LDMFD
instructions that pop a large number of registers, and with SPARC through the restore
instruction. Note however, that in RISC-V, the instructions sandwiched between
c.ldsp ra, 0x28(sp)
and c.addi16sp, sp, 0x30
are often not only pop instructions and thus can cause traps, undefined behavior, and undesirable memory corruption.
The ideal way to avoid undesirable side effects is to entirely avoid using gadgets that cause them. However this is not always feasible, and sometimes more careful and deliberate treatment of side effects is needed.
Take for example the following readMEM gadget:
0x00000000000d3230 :
c.ld a0, 8(a0)
c.add a0, a5
c.ldsp a4, 0x28(sp)
c.ld a5, 0(s0)
bne a4, a5, 0x1e
c.ldsp ra, 0x38(sp)
c.ldsp s0, 0x30(sp)
c.addi16sp sp, 0x40
c.jr ra
This gadget is extremely valuable because it is a very rare readMEM gadget that reads memory pointed at by a register that is easily popped, incremented, and decremented through other gadgets, and does not perform a conditional branch depending on the read value. However, it is prone to the following side effects:
The initial value of a5
is added to the read value of a0
.
The value of a5
is read from (s0)
.
The value of a4
is popped and if the popped a4
is not equal to the read a5
then a branch occurs.
Ideally, we want the initial value of a5
to be 0 so that the first side effect is avoided entirely, and we want (s0)
to not trap and contain some constant known value so that we can make the popped a4
equal to the
value, avoiding the branch. If we maintain these preconditions prior to invoking this gadget (for example, if we found and used a pop gadget for s0
, a4
, and a5
), then we can prevent or account for these side effects.
It turns out that these specific preconditions can be achieved using two other gadgets.
In terms of crafting ROP chains, we can combine the gadgets used to fulfill preconditions and the gadget requiring the preconditions into a single logical unit to make programming large ROP chains easier.
In order to mitigate the relative sparsity of side-effect-free, clean gadgets in RISC-V ROP, we propose a trick that improves the capability of these gadgets.
For example, one issue coming from the previous readMEM gadget is the fact that the address in a0
was overwritten with the value from memory. If we would like to keep the original value for later use, we could either find a way to move the
value to some other register which is not cobbled by the readMEM gadget sequence, or we could write the value to some scratch space in memory so that a later readMEM can read it back.
We propose a technique called "self-modifying ROP" that simplifies the second approach. A similar approach was used in Sigreturn-Oriented Programming in x86, but our approach goes a bit deeper and applies to "vanilla" ROP chains instead. [14]
Rather than using a readMEM gadget to restore the value into the register, we use a POP gadget, and the POP gadget frame itself is the destination we write the original value to. The closest analogue to this technique comes from SPARC ROP, where a value
is written to a future gadget’s register window in order to set the value when the window is restore
d.
In this case, we have the option to use self-modifying ROP rather than writing to scratch space; we avoid having to allocate scratch space and make our gadget sequence more self-contained as a benefit. As we will show later, self-modifyng ROP is much more powerful and can also be used to circumvent several other limitations of gadgets.
One problem we ran into while evaluating RISC-V Turing completeness was needing to write to a runtime-calculated memory address. Since it was easiest to find arithmetic gadgets on a0
, we wanted to find a gadget that wrote to
(a0)
or a constant offset away from (a0)
. However, we were only able to find nice gadgets with minimal side effects that write to memory locations parameterized by s0
, s1
, a3
,
a4
, and a few other registers. However, we couldn’t find nice MOV gadgets from a0
to one of these gadgets with manageable side effects. In the case of s0
and s1
, this makes sense; these registers are
designated as callee-saved registers, so any function that modifies these registers must restore the original value by the time it returns.
Luckily, the fact that s0
and s1
are callee-saved registers means that POP gadgets are easily available for them. Thus, these writeMEM gadgets can be used to write to locations which are hardcoded into the ROP chain. Further,
we can use the same self-modifying ROP technique to write the value of a0
to a future s0
POP gadget, in effect "manufacturing" a MOV gadget from a0
to s0
. This allows our writeMEM gadget to write to an
arbitrary calculated address.
In order to call libc functions from the Return-Oriented Program, we found some POP gadgets and readMEM gadgets that can be used to place an arbitrary set of values into registers a0
-a5
. We also found a gadget that invokes
jalr a5
, which sets the return address and calls the function pointed to by a5
. Using these gadgets, we can call any libc function with up to four parameters, all of which can be calculated at runtime using self-modifying ROP
techniques. Since the found jalr a5
gadget does not cobble a0
, we can also retrieve the return value of a libc function.
A complication arises when the function would like to allocate stack space and write to it, corrupting the gadgets that come before the jalr a5
gadget. This is not a problem when the gadgets execute in a linear fashion, but when we have
loops and conditional branches (as we do), this corruption is an issue. We solve this issue in mostly the same way as SPARC ROP, by putting the jalr a5
gadget frame immediately after a "safe" buffer zone that the function call is free to
modify. We then invoke the jalr a5
gadget using an unconditional branch.
However, there is another complication. Our unconditional branch (part of glibc’s longjmp
function) reads the next gadget’s ra
and sp
from (a0)
and 0x68(a0)
, respectively. While this is
not usually a problem, a0
is also where the first parameter to a function is stored. Thus, if we were to directly branch to the jalr a5
gadget, we would not be able to pass in an arbitrary value as the first parameter. We can
solve this by dynamically creating a a0
POP gadget frame located before the jalr a5
gadget frame in memory, then branching to the a0
POP gadget instead. Note that we have to create the entire gadget frame
including the popped ra
, rather than simply write the popped value for a0
like in the previous instances of self-modifying ROP. This is because the a0
POP gadget frame itself is located within the "safe" zone of the
function call and thus may be overwritten.
As mentioned in 1.1, the C extension for RISC-V allows some level of variable-length instructions, meaning that unintended instructions are possible. For reasons explained in 2.1, it is very likely that the unintended instruction sequence will later resync with intended program control flow to lead to a valid function epilogue. Out of 2837 gadgets we found in the version of glibc analyzed, we found a total of 711 gadgets that start at an entry point not found in the libc disassembly.
Looking through the starting instructions of these gadgets, there are several that write to the zero
register, several that modify the ra
register before it is popped, and several that modify the temporary registers. We mostly
ignore the temporary registers in our ROP gadgets because there are very few POP gadgets for them since they are caller saved. We also found lots of branches, nops, and floating point instructions.
Some instructions that seemed useful were move, load and store instructions that used a wide variety of registers, as well as some instructions that moved the stack pointer (plus an offset) to several registers including a0
. The latter
could be used to perform self-modifying ROP in situations where the location of the stack pointer is not known, by performing arithmetic on the moved sp
value and writing to the resulting address.
Neither our example ROP gadgets nor the gadgets used in our ROP chain generator use unintended instructions, so our ROP chains and techniques should generalize to a RISC-V chip without the C extension.
We demonstrate the Turing completeness of our found RISC-V ROP gadgets by creating a tool that, given an arbitrarily complex Brainfuck program, generates an equivalent RISC-V ROP chain. With some simplification, the Brainfuck instruction set is implemented as follows. The actual implementation takes care of many more preconditions/side effects, and also performs extra arithmetic since our readMEM/writeMEM gadgets are usually offsetted.
Initialization: Set a0
to point the middle of a large buffer. This buffer will be the Brainfuck "tape". For simplicity, we will use a cell size of 64 bits rather than the standard 8 bits. We are free to make this
modification since the Turing completeness of Brainfuck does not depend on integer overflow. [15] a0
will be the only register which is
preserved across instructions other than < and >, and will represent the Brainfuck "pointer".
The > instruction: Increment a0
8 times.
The < instruction: Decrement a0
8 times.
The + instruction: Write a0
to future self-modified gadgets. Read the value of (a0)
into a0
. Increment a0
. Using self-modifying ROP, pop the original address stored in
a0
into s0
and write a0
to (s0)
. Finally, use self-modifying ROP again to restore the original value of a0
.
The - instruction: Identical to above.
The . instruction: Write a0
to a future self-modified gadget. Read the value of (a0)
into a0
. Call putchar
. Using self-modifying ROP, restore the original value of
a0
.
The , instruction: Write a0
to future self-modified gadgets. Call getchar
to replace a0
with user input. Using self-modifying ROP, pop the original address stored in a0
into
s0
and write a0
to (s0)
. Use self-modifying ROP again to restore the original value of a0
.
The [ instruction: Write a0
to a self-modified a0
pop gadget which immediately follows. (This is to restore the value after the unconditional branch in the ]
instruction.) Write a0
to future self-modified gadgets. Dependent on (a0) == 0
, perform a conditional branch either to the next instruction or to the instruction after the corresponding ]
instruction. In either case, the next gadget being executed
should restore the value of a0
through self-modifying ROP.
The ] instruction Write a0
to the self-modified gadget in the implementation of the corresponding [
instruction. Perform an unconditional branch to the corresponding [
instruction, so that it
can restore the original value of a0
.
Ending Set a0
to 0 and call exit
.
We introduce a compiler that compiles Brainfuck programs of arbitrary complexity into RISC-V ROP chains using the above mapping, written in pure JavaScript and designed to run in the browser. An accompanying C/assembly program reads the output of the
browser tool from a file and populates memory using the output, then begins ROP execution by moving sp
to a populated region and simulating a NOP
gadget.
After trying several Brainfuck programs, with the caveat that Brainfuck programs that assume an 8-bit cell size will not work, the vast majority of Brainfuck programs converted to RISC-V ROP and ran easily with no issue, including an implementation of bubble sort (link), a program that prints arbitrarily many square numbers (link), and a program that outputs an ASCII-art of the Sierpinski triangle (link).
We show that return-oriented programming on the RISCV64 architecture is fairly powerful, allowing for Turing-complete code execution on gadgets found in a version of GNU libc for Fedora Linux. We show that although difficult, it is possible to perform arbitrary calculation and to perform actions such as conditional branching, arithmetic, reading and writing memory, and calling libc functions with arguments. We introduce the technique of "self-modifying ROP chains" to solve several issues such as not having gadgets that move a value from one register to another. In order to demonstrate the power of these techniques, we created a compiler that converts Brainfuck code of arbitrary complexity into RISC-V ROP chains.
In the future, we hope that more libraries for RISC-V, including other versions of libc for other operating systems, will be analyzed for ROP potential, and perhaps a more general and automated approach to analyzing and chaining ROP gadgets will make ROP attacks easier to carry out.
We make the Brainfuck-to-RISC-V-ROP compiler available at https://garrettgu10.github.io/fuck-riscv-rop/ and we make its full source code available at https://github.com/garrettgu10/fuck-riscv-rop.