No RISC No Reward:
Return-Oriented Programming on RISC-V

Garrett Gu
UT Austin

,

Hovav Shacham
UT Austin


Abstract

RISC-V is an open-source hardware ISA based on the RISC design principles, and has been the subject of some novel ROP mitigation technique proposals due to its open-source nature. However, very little work has actually evaluated whether such an attack is feasible assuming a typical RISC-V implementation. We show that RISC-V ROP can be used to perform Turing complete calculation and arbitrary function calls by leveraging gadgets found in a version of the GNU libc library. Using techniques such as self-modifying ROP chains and algorithmic ROP chain generation, we demonstrate the power of RISC-V ROP by creating a compiler that converts code of arbitrary complexity written in a popular Turing-complete language into RISC-V ROP chains.

1 Introduction

RISC-V is an open RISC architecture developed at Berkeley and backed by an international consortium. Like other modern architectures, RISC-V has hardware support for data execution prevention (DEP).

Return-oriented programming, or ROP, is a systematization and generalization of code-reuse attack techniques like return-into-libc [1] and the borrowed-chunks technique [2]. With ROP, short instruction sequences, each ending in a return instruction, are combined into building blocks called gadgets. An attacker who controls the stack can induce any desired functionality by sequencing gadgets, even in the presence of DEP. ROP was originally described for the x86 [3], and subsequently extended to other architectures such as SPARC [4], ARM [5], and PowerPC [6].

The RISC-V instruction set was designed for extensibility, with a substantial portion of the opcode space left unassigned [7]. Researchers have taken advantage of RISC-V’s extensibility to propose hardware extensions that make code-reuse attacks harder to mount (see, e.g., [8]–[11]).

There’s just one problem: No one seems to have shown that general, Turing-complete ROP is possible on RISC-V.1

In this short note, we close this gap by showing that ROP is, indeed, possible on RISC-V. We demonstrate how arithmetic, memory reads and writes, conditional branching, and function invocation can be performed solely through ROP gadgets found in the GNU libc library. To work around limitations in the available instruction sequences, we introduce techniques for self-modifying ROP chains that may be of independent interest. As is now traditional (see, e.g., [13]), we show that our gadget set is Turing complete by designing a compiler that accepts a Brainfuck program of arbitrary complexity and generates an equivalent RISC-V ROP chain.

1.1 Experimental Platform

We analyze glibc-2.30.9000-29.fc32.riscv64, runninng on Fedora-Developer-Rawhide-20200108.n.0 booting on the virt machine in QEMU. Since we do not leverage any features of emulated hardware outside of documented ISA, our ROP chains and gadgets should also work on equivalent hardware.

We target the RV64GC variant of the RISC-V ISA, which includes the 64-bit base integer instruction set as well as the MAFDC extensions: integer multiplication/division, atomic instructions, single- and double-precision floating point, and compressed instructions.

The C extension would be interesting for analyzing ROP potential because it allows for some degree of variable-length instructions. Rather than all instructions being 4 bytes long, the C extension allows certain commonly-used instructions to only take up 2 bytes in an effort to reduce code size.

Unlike ARM Thumb mode, the RISC-V C extension does not require a mode switch, and compressed instructions can coexist with uncompressed instructions in the same instruction stream. We will analyze the implications of this optional but very common extension on ROP gadget availability.

2 Complexity of RISC-V ROP gadgets

RISC-V makes use of a link register, like similar RISC architectures such as PowerPC, ARM, and SPARC. In RISC-V this is labeled the ra register. The purpose of this register is to optimize calls to leaf subroutines since the return address need not be pushed or popped on the stack. The implication of this is that in RISC ROP exploitation is mostly limited to non-leaf function epilogues, and RISC-V ROP is no exception.

For our purposes, we define a chainable ROP gadget as follows: an instruction sequence that:

  1. Loads a value from a(sp) into ra where a is some positive immediate value divisible by 8

  2. Adds an immediate value b to sp where b > a and b is divisible by 16 (due to stack-alignment requirements)

  3. Ends in a ret (equivalent to jr ra)

In addition to the above steps needed to maintain the ROP chain, a good ROP gadget will contain a few extra instructions that perform useful work. Note that these requirements are fairly restrictive, and exclude gadgets that, for example, pop and jump to a register other than the ra register. For our purposes, we will focus on gadgets that fulfill these requirements (and thus look like function epilogues) but examining other classes of gadgets is a promising future research direction.

2.1 A minimal ROP gadget

As an example, below is a gadget that does no work except maintining the ROP chain. This is typically called a NOP gadget.

  0x0000000000097a68 :
      c.ldsp ra, 8(sp)
      c.addi sp, 0x10
      c.jr ra

Note that in x86, the same three actions are performed by a single 1-byte instruction (ret). In SPARC, a similar gadget only requires a ret and restore to slide the register window. In ARM, a similar gadget looks like a LDMFD followed by a RET.

The fact that the simplest ROP gadget in RISC-V requires more instructions than in other architectures has the following implications:

  1. When a chainable ROP gadget is found, it is very likely that the gadget formed part of a legitimate function epilogue. ROP gadgets that consist entirely of unintended instructions would be exceedingly rare.

  2. The three instructions may not always be contiguous; in other words, part of the work performed by the gadget may be located in between these three instructions. When this happens, the return-oriented programmer is unable to "trim" the work by jumping into the middle of the work like on other architectures. Even in ARM, the LDMFD and the RET are typically contiguous. Similarly, in SPARC, ret and restore are almost always contiguous. The implication of RISC-V’s departure from this norm is that RISC-V ROP chains sometimes must account for a larger number of undesirable gadget side effects.

Take for example the following POP gadget:

  0x000000000006a5e8 : 
      c.ldsp ra, 0x28(sp)
      c.ldsp s0, 0x20(sp)
      c.ldsp a0, 0(sp)
      c.ldsp a1, 8(sp)
      c.ldsp s1, 0x18(sp)
      c.ldsp s2, 0x10(sp)
      c.addi16sp sp, 0x30
      c.jr ra

If the Return-Oriented Programmer would like to use this gadget to pop only s2 (maybe a0 contains a runtime-calculated value she would not like to overwrite), she is unable to do so because jumping directly to the c.ldsp s2, 0x10(sp) instruction would skip the load into ra, breaking the ROP chain and causing an infinite loop.

A similar issue presents itself with ARM through LDMFD instructions that pop a large number of registers, and with SPARC through the restore instruction. Note however, that in RISC-V, the instructions sandwiched between c.ldsp ra, 0x28(sp) and c.addi16sp, sp, 0x30 are often not only pop instructions and thus can cause traps, undefined behavior, and undesirable memory corruption.

2.2 Preconditions

The ideal way to avoid undesirable side effects is to entirely avoid using gadgets that cause them. However this is not always feasible, and sometimes more careful and deliberate treatment of side effects is needed.

Take for example the following readMEM gadget:

  0x00000000000d3230 : 
      c.ld a0, 8(a0)
      c.add a0, a5
      c.ldsp a4, 0x28(sp)
      c.ld a5, 0(s0)
      bne a4, a5, 0x1e
      c.ldsp ra, 0x38(sp)
      c.ldsp s0, 0x30(sp)
      c.addi16sp sp, 0x40
      c.jr ra

This gadget is extremely valuable because it is a very rare readMEM gadget that reads memory pointed at by a register that is easily popped, incremented, and decremented through other gadgets, and does not perform a conditional branch depending on the read value. However, it is prone to the following side effects:

  1. The initial value of a5 is added to the read value of a0.

  2. The value of a5 is read from (s0).

  3. The value of a4 is popped and if the popped a4 is not equal to the read a5 then a branch occurs.

Ideally, we want the initial value of a5 to be 0 so that the first side effect is avoided entirely, and we want (s0) to not trap and contain some constant known value so that we can make the popped a4 equal to the value, avoiding the branch. If we maintain these preconditions prior to invoking this gadget (for example, if we found and used a pop gadget for s0, a4, and a5), then we can prevent or account for these side effects. It turns out that these specific preconditions can be achieved using two other gadgets.

In terms of crafting ROP chains, we can combine the gadgets used to fulfill preconditions and the gadget requiring the preconditions into a single logical unit to make programming large ROP chains easier.

3 Self-modifying ROP

In order to mitigate the relative sparsity of side-effect-free, clean gadgets in RISC-V ROP, we propose a trick that improves the capability of these gadgets.

3.1 Saving registers

For example, one issue coming from the previous readMEM gadget is the fact that the address in a0 was overwritten with the value from memory. If we would like to keep the original value for later use, we could either find a way to move the value to some other register which is not cobbled by the readMEM gadget sequence, or we could write the value to some scratch space in memory so that a later readMEM can read it back.

We propose a technique called "self-modifying ROP" that simplifies the second approach. A similar approach was used in Sigreturn-Oriented Programming in x86, but our approach goes a bit deeper and applies to "vanilla" ROP chains instead. [14]

Rather than using a readMEM gadget to restore the value into the register, we use a POP gadget, and the POP gadget frame itself is the destination we write the original value to. The closest analogue to this technique comes from SPARC ROP, where a value is written to a future gadget’s register window in order to set the value when the window is restored.

In this case, we have the option to use self-modifying ROP rather than writing to scratch space; we avoid having to allocate scratch space and make our gadget sequence more self-contained as a benefit. As we will show later, self-modifyng ROP is much more powerful and can also be used to circumvent several other limitations of gadgets.

3.2 Manufacturing a MOV gadget

One problem we ran into while evaluating RISC-V Turing completeness was needing to write to a runtime-calculated memory address. Since it was easiest to find arithmetic gadgets on a0, we wanted to find a gadget that wrote to (a0) or a constant offset away from (a0). However, we were only able to find nice gadgets with minimal side effects that write to memory locations parameterized by s0, s1, a3, a4, and a few other registers. However, we couldn’t find nice MOV gadgets from a0 to one of these gadgets with manageable side effects. In the case of s0 and s1, this makes sense; these registers are designated as callee-saved registers, so any function that modifies these registers must restore the original value by the time it returns.

Luckily, the fact that s0 and s1 are callee-saved registers means that POP gadgets are easily available for them. Thus, these writeMEM gadgets can be used to write to locations which are hardcoded into the ROP chain. Further, we can use the same self-modifying ROP technique to write the value of a0 to a future s0 POP gadget, in effect "manufacturing" a MOV gadget from a0 to s0. This allows our writeMEM gadget to write to an arbitrary calculated address.

3.3 Calling functions with arguments

In order to call libc functions from the Return-Oriented Program, we found some POP gadgets and readMEM gadgets that can be used to place an arbitrary set of values into registers a0-a5. We also found a gadget that invokes jalr a5, which sets the return address and calls the function pointed to by a5. Using these gadgets, we can call any libc function with up to four parameters, all of which can be calculated at runtime using self-modifying ROP techniques. Since the found jalr a5 gadget does not cobble a0, we can also retrieve the return value of a libc function.

A complication arises when the function would like to allocate stack space and write to it, corrupting the gadgets that come before the jalr a5 gadget. This is not a problem when the gadgets execute in a linear fashion, but when we have loops and conditional branches (as we do), this corruption is an issue. We solve this issue in mostly the same way as SPARC ROP, by putting the jalr a5 gadget frame immediately after a "safe" buffer zone that the function call is free to modify. We then invoke the jalr a5 gadget using an unconditional branch.

However, there is another complication. Our unconditional branch (part of glibc’s longjmp function) reads the next gadget’s ra and sp from (a0) and 0x68(a0), respectively. While this is not usually a problem, a0 is also where the first parameter to a function is stored. Thus, if we were to directly branch to the jalr a5 gadget, we would not be able to pass in an arbitrary value as the first parameter. We can solve this by dynamically creating a a0 POP gadget frame located before the jalr a5 gadget frame in memory, then branching to the a0 POP gadget instead. Note that we have to create the entire gadget frame including the popped ra, rather than simply write the popped value for a0 like in the previous instances of self-modifying ROP. This is because the a0 POP gadget frame itself is located within the "safe" zone of the function call and thus may be overwritten.

4 Unintended instructions as a result of the C extension

As mentioned in , the C extension for RISC-V allows some level of variable-length instructions, meaning that unintended instructions are possible. For reasons explained in , it is very likely that the unintended instruction sequence will later resync with intended program control flow to lead to a valid function epilogue. Out of 2837 gadgets we found in the version of glibc analyzed, we found a total of 711 gadgets that start at an entry point not found in the libc disassembly.

Looking through the starting instructions of these gadgets, there are several that write to the zero register, several that modify the ra register before it is popped, and several that modify the temporary registers. We mostly ignore the temporary registers in our ROP gadgets because there are very few POP gadgets for them since they are caller saved. We also found lots of branches, nops, and floating point instructions.

Some instructions that seemed useful were move, load and store instructions that used a wide variety of registers, as well as some instructions that moved the stack pointer (plus an offset) to several registers including a0. The latter could be used to perform self-modifying ROP in situations where the location of the stack pointer is not known, by performing arithmetic on the moved sp value and writing to the resulting address.

Neither our example ROP gadgets nor the gadgets used in our ROP chain generator use unintended instructions, so our ROP chains and techniques should generalize to a RISC-V chip without the C extension.

5 Implementing the Brainfuck instruction set in RISC-V ROP

We demonstrate the Turing completeness of our found RISC-V ROP gadgets by creating a tool that, given an arbitrarily complex Brainfuck program, generates an equivalent RISC-V ROP chain. With some simplification, the Brainfuck instruction set is implemented as follows. The actual implementation takes care of many more preconditions/side effects, and also performs extra arithmetic since our readMEM/writeMEM gadgets are usually offsetted.

  • Initialization: Set a0 to point the middle of a large buffer. This buffer will be the Brainfuck "tape". For simplicity, we will use a cell size of 64 bits rather than the standard 8 bits. We are free to make this modification since the Turing completeness of Brainfuck does not depend on integer overflow. [15] a0 will be the only register which is preserved across instructions other than < and >, and will represent the Brainfuck "pointer".

  • The > instruction: Increment a0 8 times.

  • The < instruction: Decrement a0 8 times.

  • The + instruction: Write a0 to future self-modified gadgets. Read the value of (a0) into a0. Increment a0. Using self-modifying ROP, pop the original address stored in a0 into s0 and write a0 to (s0). Finally, use self-modifying ROP again to restore the original value of a0.

  • The - instruction: Identical to above.

  • The . instruction: Write a0 to a future self-modified gadget. Read the value of (a0) into a0. Call putchar. Using self-modifying ROP, restore the original value of a0.

  • The , instruction: Write a0 to future self-modified gadgets. Call getchar to replace a0 with user input. Using self-modifying ROP, pop the original address stored in a0 into s0 and write a0 to (s0). Use self-modifying ROP again to restore the original value of a0.

  • The [ instruction: Write a0 to a self-modified a0 pop gadget which immediately follows. (This is to restore the value after the unconditional branch in the ] instruction.) Write a0 to future self-modified gadgets. Dependent on (a0) == 0, perform a conditional branch either to the next instruction or to the instruction after the corresponding ] instruction. In either case, the next gadget being executed should restore the value of a0 through self-modifying ROP.

  • The ] instruction Write a0 to the self-modified gadget in the implementation of the corresponding [ instruction. Perform an unconditional branch to the corresponding [ instruction, so that it can restore the original value of a0.

  • Ending Set a0 to 0 and call exit.

5.1 Compiling Brainfuck to RISC-V ROP

We introduce a compiler that compiles Brainfuck programs of arbitrary complexity into RISC-V ROP chains using the above mapping, written in pure JavaScript and designed to run in the browser. An accompanying C/assembly program reads the output of the browser tool from a file and populates memory using the output, then begins ROP execution by moving sp to a populated region and simulating a NOP gadget.

5.2 Results

After trying several Brainfuck programs, with the caveat that Brainfuck programs that assume an 8-bit cell size will not work, the vast majority of Brainfuck programs converted to RISC-V ROP and ran easily with no issue, including an implementation of bubble sort (link), a program that prints arbitrarily many square numbers (link), and a program that outputs an ASCII-art of the Sierpinski triangle (link).

6 Conclusion and future work

We show that return-oriented programming on the RISCV64 architecture is fairly powerful, allowing for Turing-complete code execution on gadgets found in a version of GNU libc for Fedora Linux. We show that although difficult, it is possible to perform arbitrary calculation and to perform actions such as conditional branching, arithmetic, reading and writing memory, and calling libc functions with arguments. We introduce the technique of "self-modifying ROP chains" to solve several issues such as not having gadgets that move a value from one register to another. In order to demonstrate the power of these techniques, we created a compiler that converts Brainfuck code of arbitrary complexity into RISC-V ROP chains.

In the future, we hope that more libraries for RISC-V, including other versions of libc for other operating systems, will be analyzed for ROP potential, and perhaps a more general and automated approach to analyzing and chaining ROP gadgets will make ROP attacks easier to carry out.

Availability

We make the Brainfuck-to-RISC-V-ROP compiler available at https://garrettgu10.github.io/fuck-riscv-rop/ and we make its full source code available at https://github.com/garrettgu10/fuck-riscv-rop.

References

[1] Solar Designer, “Getting around non-executable stack (and fix).” BugTraq mailing list post. Online: https://seclists.org/bugtraq/1997/Aug/63, Aug. 1997.

[2] S. Krahmer, “X86-64 buffer overflow exploits and the borrowed code chunks exploitation technique.” Online: https://users.suse.com/~krahmer/no-nx.pdf, Sep. 2015.

[3] H. Shacham, “The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86),” in Proceedings of ccs 2007, Oct. 2007.

[4] E. Buchanan, R. Roemer, H. Shacham, and S. Savage, “When good instructions go bad: Generalizing return-oriented programming to RISC,” in Proceedings of acm ccs 2008, Oct. 2008.

[5] T. Kornau, Accessed: 2020-02-16“Return oriented programming for the ARM architecture,” Master’s thesis, Ruhr-Universität Bochum; https://static.googleusercontent.com/media/www.zynamics.com/en//downloads/kornau-tim--diplomarbeit--rop.pdf, 2009.

[6] F. Lindner, “Cisco IOS: Attack and defense: The state of the art.” Presented at 25C3. Slides online: http://www.phenoelit.org/stuff/FX_Phenoelit_25c3_Cisco_IOS.pdf, Dec. 2008.

[7] A. Waterman, Online: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-1.pdf“Design of the RISC-V instruction set architecture,” Berkeley, UCB/EECS-2016-1, Jan. 2016.

[8] S. Fingeret, Online: https://people.csail.mit.edu/hes/ROP/Publications/sam-thesis.pdf“Defeating code reuse attacks with minimal tagged architecture,” Master’s thesis, MIT, 2015.

[9] W. Kamolpornwijit, Online: https://dspace.mit.edu/bitstream/handle/1721.1/105996/965798306-MIT.pdf“P-TAXI: Enforcing memory safety with programmable tagged architecture,” Master’s thesis, MIT, 2016.

[10] J. Li et al., “Zipper stack: Shadow stacks without shadow.” arXiv preprint arXiv:1902.00888, Feb. 2019.

[11] A. De, A. Basu, S. Ghosh, and T. Jaeger, “FIXER: Flow integrity extensions for embedded RISC-V,” in Proceedings of date 2019, Mar. 2019.

[12] G.-A. Jaloyan, K. Markantonakis, R. Akram, D. Robin, K. Mayes, and D. Naccache, To appear.“Return-oriented programming on risc-v,” in Proceedings of asiaccs 2020, Oct. 2020.

[13] M. Rushanan and S. Checkoway, “Run-DMA,” in Proceedings of woot 2015, Aug. 2015.

[14] E. Bosman and H. Bos, “Framing signals - a return to portable shellcode,” in 2014 ieee symposium on security and privacy, 2014, pp. 243–258.

[15] M. Davis, “Corrado Böhm. on a family of turing machines and the related programming language. ICC bulletin, vol. 3 (1964), pp. 185–194.” Journal of Symbolic Logic, vol. 31, no. 1, pp. 140–140, 1966, doi: 10.2307/2270680.


  1. In independent, simultaneous work to appear at AsiaCCS in October, Jaloyan et al. [12] describe more general ROP techniques for RISC-V. We were unable to find a preprint of the Jaloyan et al.paper as of July 1, 2020.↩︎