Compiling C to Safe Rust, Formalized

arXiv:2412.15042v1 [cs.PL] 19 Dec 2024

AYMERIC FROMHERZ, Inria, France
JONATHAN PROTZENKO, Microsoft Azure Research, USA
The popularity of the Rust language continues to explode; yet, many critical codebases remain authored in C,
and cannot be realistically rewritten by hand. Automatically translating C to Rust is therefore an appealing
course of action. Several works have gone down this path, handling an ever-increasing subset of C through a
variety of Rust features, such as unsafe. While the prospect of automation is appealing, producing code that
relies on unsafe negates the memory safety guarantees offered by Rust, and therefore the main advantages of
porting existing codebases to memory-safe languages.
We instead explore a different path, and explore what it would take to translate C to safe Rust; that is, to
produce code that is trivially memory safe, because it abides by Rust’s type system without caveats. Our work
sports several original contributions: a type-directed translation from (a subset of) C to safe Rust; a novel static
analysis based on “split trees” that allows expressing C’s pointer arithmetic using Rust’s slices and splitting
operations; an analysis that infers exactly which borrows need to be mutable; and a compilation strategy for
C’s struct types that is compatible with Rust’s distinction between non-owned and owned allocations.
We apply our methodology to existing formally verified C codebases, namely, the HACL★ cryptographic
library, and binary parsers and serializers from EverParse, and show that the subset of C we support is sufficient
to translate both applications to safe Rust via a newly implemented compiler. Our evaluation shows that for
the few places that do violate Rust’s aliasing discipline, automated, surgical rewrites suffice; and that the
few strategic copies we are forced to insert have a negligible performance impact. Of particular note, the
application of our approach to HACL★ results in a 80,000 line verified cryptographic library, written in pure
Rust, that implements all modern algorithms without a single use of unsafe – the first of its kind.
Additional Key Words and Phrases: Rust, Compilation, Semantics

1

Introduction

Despite decades of research, memory safety issues remain prevalent in industrial applications;
recent studies by Google [Chromium Project 2020] and Microsoft [MSRC Team 2019] estimate that
70% of security vulnerabilities are related to incorrect memory handling. To tackle this issue, both
companies and governments now advocate for the use of memory-safe languages for safety-critical
systems, notably, Rust [National Security Agency 2022; The Register 2023; The White House 2024].
Combining the high performance and low-level idioms commonly provided by languages like C
and C++ with memory safety by design, the Rust programming language continues to enjoy record
levels of popularity in industry, being ranked as the “most admired programming language” for the
eigth year in a row [StackOverflow 2023; Verdi 2023], leading several widely used projects to plan
a transition to Rust [Cook 2022; Mozilla Hacks 2024]. However, while Rust offers clear benefits for
new, clean-slate code, the value proposition is less clear for existing code. Indeed, it can be hard,
especially in an industrial setting, to justify rewriting code that has been battle-tested, thoroughly
debugged, and exhibits no glaring issues. In those situations, Rust is perceived as a “nice to have”
and particularly useful to develop new features, but certainly not a business priority when there
are so many other engineering battles to be fought [Vander Stoep and Rebert 2024].
This problem is exacerbated when it comes to industrial-grade code that also happens to be formally verified. Owing to its very nature, formal verifiation is an extremely labor-intensive activity,
and it is not uncommon for projects to accumulate dozens of person-years for the sole verification
effort [Kästner et al. 2018; Klein et al. 2009; Protzenko et al. 2020]. Whether with VST [Appel 2011],
Authors’ Contact Information: Aymeric Fromherz, aymeric.fromherz@inria.fr, Inria, France; Jonathan Protzenko, protz@
microsoft.com, Microsoft Azure Research, USA.

2

Aymeric Fromherz and Jonathan Protzenko

Autocorres [Greenaway et al. 2012] or Cogent [Amani et al. 2016]; and whether it’s parsers or serializers [Ramananandro et al. 2019; Reiher et al. 2019; Swamy et al. 2022], cryptography [Appel 2015;
Dockins et al. 2016; Erbsen et al. 2019; Liu et al. 2019; Polubelova et al. 2020] or protocols [Arquint
et al. 2023; Bhargavan et al. 2016; Delignat-Lavaud et al. 2021; Erbsen et al. 2024], verified code is
unlikely to be rewritten and reverified just to support a transition to Rust.
However, there is hope: to facilitate adoption in a variety of contexts and codebases, many
verification efforts, including all of the examples above, settled on C as their target language for
verified code. If one were to be able to translate C to Rust, while formally arguing for the correctness
of the translation process, then not only could the migration be automated, but the correctness
argument would carry forward, obviating the need for a costly rewrite and reverification effort.
Several existing works therefore attempt to automatically translate C to unsafe Rust [Galois and
Immunant 2019; Lesinski 2017; Ling et al. 2022; Sharp 2020]. Indeed, C allows programmers to be
creative with aliasing, low-level casts, memory management, and data representation. Expressing
those patterns in Rust requires opting out of many static guarantees, in order to allow unchecked
aliasing, casts between representations (a.k.a. “transmutation”, in Rust lingo), and so on, which
is done through the unsafe feature of Rust. But doing so obliterates the benefits of Rust! If the
code cannot be shown to be statically safe, because it relies on the unsafe keyword, then there is
in our view little benefit to a transition to Rust, and a lot to lose in terms of toolchain, build and
integration complexity. We emphasize that avoiding unsafe is seen as a badge of honor in the Rust
community: a library littered with unsafe is unlikely to ever be contemplated for serious adoption.
But targeting safe Rust raises several challenges. First, the ownership and aliasing discipline
of Rust is much more stringent than C’s. As a consequence, many common C patterns, such as
taking pointer arguments that are disjoint or equal, cannot be expressed in safe Rust, as they would
trip its borrow-checker. Second, mutability is explicit in Rust, whereas everything is implicitly
mutable in C. Translating every variable and borrow to mutable in Rust would not only make many
programs ill-typed (because mutable pieces of data must have a unique owner in Rust), but also
produce unusable APIs, since callers would be contaminated with excessive mutability in their
own source code. Third, pointer arithmetic is central in C, and does not exist in safe Rust, where
the only abstraction is slices (a.k.a. runtime-checked fat pointers), and the only arithmetic is done
through splitting a slice in two sub-parts to prevent unchecked aliasing. Fourth, whereas C uses t*
for any pointer type, Rust has several types, such as [T; N], Box<[T]>, &[T], and more. A translation
needs to infer, or devise heuristics, to sensibly map a (not informative) t* to those Rust types.
In this work, we set out to translate C to safe Rust. To address the challenges above, we adopt a
new point in the design space: instead of automatically translating C in its full generality to unsafe
Rust and attempting to make the resulting code safer, we rather target a data-oriented, applicative
subset of C. Our translation process is therefore semi-active: users may have to perform minimal
adjustments to the source C program to fit the supported language subset; once in this subset, our
approach then automatically produces valid, safe Rust code.
Our contributions are as follows. First, we present a formal translation of a subset of C, dubbed
Mini-C, to safe Rust. We express this process modularly, via a type-directed translation, parameterized over an oracle for compiling pointer arithmetic to slice splitting, followed by several
post-translation analyses to soundly infer mutability qualifiers, derive traits, and other Rust-specific
features. Second, we implement our translation in KaRaMeL [Protzenko et al. 2017], an existing
compiler framework that previously compiled Mini-C to actual C. Third, we evaluate our implementation over two large-scale projects: the HACL★ verified cryptographic library (80k of C
code) [Polubelova et al. 2020] and EverParse [Ramananandro et al. 2019], a verified formatters
and serializers library, and specifically its CBOR [Internet Engineering Task Force 2020] parser
(1.4k lines of C). We show that the modifications required of the programmer on the source are

Compiling C to Safe Rust, Formalized

3

either minimal (for HACL★) or non-existent (for EverParse). Fourth, we perform an experimental
evaluation, and distill two central insights: modifying the C source to abide by Rust’s ownership
discipline not only has no noticeable performance impact, but the resulting Rust code also exhibits
the same performance profile as the original C, in spite of fat pointers and runtime bounds checks.
Artifact Submission. All our developments are under open-source licenses, and will be made
public after the review process. To ensure reproducibility of our results, we intend to submit a
complete artifact containing both the implementation of our translation in KaRaMeL and the case
studies presented in the paper alongside scripts to easily reproduce experimental results.
2

Translating Mini-C to Safe Rust

𝑡

::=

uint8_t, bool, . . .
void
𝑡∗
𝑡 (𝑡®)
struct 𝑠

C99 value types: integers, booleans. . .
empty type
pointer type
function type
struct type

𝑠𝑡

::=

® else 𝑠𝑡,
® while (𝑒) 𝑠𝑡
®
return 𝑒, if (𝑒) then 𝑠𝑡
𝑥 =𝑒
𝑒
𝑡 𝑥 = 𝑒, 𝑡 𝑥
𝑡 𝑥 [𝑛] = {®
𝑒 }, 𝑡 𝑥 [𝑛]

control-flow
assignment
expression statement
variable declaration (with,without) initialization
array declaration (with, without) initialization

𝑒

::=

𝑥
𝑒 [𝑒]
𝑒.𝑓
𝑓 (®
𝑒)
∗𝑒
malloc(𝑒)
&𝑒

variable
array indexing
field selection
function call
dereference
heap allocation
address-taking, a.k.a. reference
Fig. 1. Grammar of mini-C types, statements and expressions

We present Mini-C (Figure 1), a data-oriented subset of C that is expressive enough to handle
a large class of applications. Types comprise value, pointer and struct types. Mini-C follows C
and distinguishes between expressions and statements, with support for array allocation and
manipulation, along with assignments, dereferences, field selection and common control-flow.
For simplicity of presentation, we omit syntactic sugar (such as increment operators, or the
arrow operator ->), C variable-length arrays (VLAs), the traditional syntactic separation between
lvalue and rvalue, and present a simplified view of C syntax that does not bring in notions of
specifiers, declarators, etc. We handle C array types, but do not list them in our syntax, to avoid
polluting our rules with concerns related to array decay (and how, e.g., void f(int x[4]) really
means void f(int *x)) – arrays get direct treatment in E-Stack. In short, our presentation focuses
on where the bulk of our work lies, with the understanding that all of the omitted features above
are exercised by our case studies (Section 4) and are thus supported by our implementation.
We present in Figure 2 a simplified Rust that focuses on borrows and boxes of slices, eliding a fullygeneral notion of slice type (that exists internally in Rust, and may appear in trait implementations,

4

Aymeric Fromherz and Jonathan Protzenko

𝑇

::=

ℓ

::=

𝑢8, bool, . . .
()
[𝑇 ; 𝑁 ]
&mut? ℓ ? [𝑇 ]
Box⟨[𝑇 ]⟩
(𝑇®) → 𝑇
𝑆

base value types
unit type
array value type
borrowed slice
owned boxed slice
function type
struct type

′ 𝑎,′ 𝑏...

lifetime

𝐸

::=

𝑥
let 𝑥 = 𝐸; 𝐸
()
[𝐸; 𝑁 ]
®
[𝐸]
vec![𝐸; 𝑁 ]
𝐸 [..]
&mut? 𝐸
∗𝐸
𝐸.𝑓
𝐸 [𝐸]
n =𝐸
o
𝑆 𝑓® : 𝐸®

variable
let-binding
unit value
array repeat
slice literal
vector repeat
array to slice
borrow
dereference
field access
assignment

®
𝑓 ( 𝐸)
®
𝐸.𝑚( 𝐸)

function call
method call

structure

Fig. 2. Grammar of Rust types and expressions. The ? denotes optional syntax.

T-Bool

T-Int

T-Void

T-Pointer
𝑡 ↩→ 𝑇

T-Fun
𝑡 ↩→ 𝑇

bool ↩→ bool

uint8_t ↩→ u8

void ↩→ ()

𝑡∗ ↩→ &[𝑇 ]

𝑡 (𝑡®) ↩→ (𝑇®) → 𝑇

𝑡® ↩→ 𝑇®

Fig. 3. Translation from Mini-C to Rust: types

but can’t be constructed by the user), and omitting details such as: borrows of arrays, base types
like Vec, and a semantics of moves vs copies. Our implementation supports these.
Central to our translation is the fact that while C provides a unique pointer abstraction (t*),
Rust distinguishes both where the allocation lives (stack or heap), and whether a pointer conveys
ownership. Array types [𝑇 ; 𝑁 ] denote arrays of statically-known length 𝑁 , and are value types.
Unlike C, these never decay automatically to pointer types. Boxed slices Box⟨[𝑇 ]⟩ denote heapallocated arrays, whose length can be queried at runtime, and are value types as well. As such,
arrays and boxes have no lifetime, and can be passed around as regular values, potentially incurring
a move to preserve Rust’s ownership semantics. Borrowing from (akin to “taking the address of”,
in C) an array or a box yields a slice borrow, which may be mutable or immutable (the latter
denoted by the absence of a mut qualifier). Borrows have a lifetime, which is used by the Rust
compiler to guarantee the absence of dangling pointers and use-after-frees (i.e., temporal safety).
Slice borrows package their lengths at runtime, which the Rust compiler uses to guarantee the
absence of out-of-bounds errors (i.e., spatial safety). Oftentimes, the lifetime can be inferred by the
Rust compiler, and we will omit it in our presentation whenever possible.
2.1

Type Translation

We present in Figure 3 our translation from Mini-C types to Rust types. Our judgments are of
the form 𝑡 ↩→ 𝑇 , meaning C type 𝑡 translates to Rust type 𝑇 . Judgments for base types are
straightforward: similar types exist in Mini-C and Rust. The key difficulty occurs when handling
pointer types. C pointers omit whether they contain a single-element or an array, nor do they
include information about the provenance of the underlying allocation (stack, heap, or static scope).

Compiling C to Safe Rust, Formalized

5

To provide a common abstraction, we therefore translate C pointers to &[T] (T-Pointer), which
designates slice borrows without mutable ownership. This means that every C pointer type become
a borrowed slice in Rust, regardless of whether it points to one or more elements. Concretely,
uint8_t x; uint8_t *y = &x; generates let y: &[u8] ... in Rust, and not &u8. Naturally, some of these
slice types may need to be mutable, i.e., &mut[u8]: we infer mutability qualifiers later on (Section 3.1),
meaning the translation does not need to concern itself with that for now.
We note that Rust does have a C-like pointer type mut T*, but it requires unsafe Rust code to use,
which is exactly what we set out not to do. We do not use those in our translation.
2.2

Type-Directed Translation of Expressions

Building on our type translation, we now present our translation from Mini-C programs to Rust.
Again, the chief difficulty lies in the multiple representations existing in Rust for pointers and
memory-related types. Because our type translation is modular, our translation of expressions
must reconcile various Rust types (namely, [𝑇 ; 𝑁 ], Box⟨[𝑇 ]⟩) produced at allocation-time with our
uniform pointer abstraction dictated by the type translation (namely, &[𝑇 ]).
Our translation is therefore type-directed, which allows inserting coercions on demand. Our
rules have the form Γ ⊢ 𝑒 ⊳ 𝑇 ⇝ 𝐸 ⊣ Γ ′ , representing that in Rust typing environment Γ, the
translation of Mini-C expression 𝑒 with expected Rust type 𝑇 yields the Rust expression 𝐸 and an
updated Rust typing environment Γ ′ . We extend ⇝ to statements and lists of statements.
We present our translation rules in Figure 4. To reconcile lists of statements (in C) and expressions
everywhere (in Rust), we match on the head of a list of statements in C. For instance, E-Stack
®
matches a stack allocation as the first statement, followed by subsequent statements 𝑠𝑡.
E-Index
Γ ⊢ 𝑒 1 ⊳ &[𝑇 ] ⇝ 𝑒 1′ ⊣ Γ ′

E-Deref
Γ ⊢ 𝑒 ⊳ &[𝑇 ] ⇝ 𝑒 ′ ⊣ Γ ′

Γ ′ ⊢ 𝑒 2 ⊳ usize ⇝ 𝑒 2′ ⊣ Γ ′′

Γ ⊢ 𝑒 1 [𝑒 2 ] ⊳ 𝑇 ⇝ 𝑒 1′ [𝑒 2′ ] ⊣ Γ ′′

Γ ⊢ ∗𝑒 ⊳ 𝑇 ⇝ 𝑒 ′ [0] ⊣ Γ ′

E-Stack
𝑡 ↩→ 𝑇

® ⊳ 𝑇 ′ ⇝ 𝐸′ ⊣ Γ
Γ𝑁 , 𝑥 : [𝑇 ; 𝑁 ] ⊢ 𝑠𝑡

E-Var
𝑥 :𝑇 ∈ Γ
Γ ⊢ 𝑥 ⊳𝑇 ⇝ 𝑥 ⊣ Γ

Γ𝑖 ⊢ 𝑒𝑖 ⊳ 𝑇 ⇝ 𝐸𝑖 ⊣ Γ𝑖+1

® 𝐸′ ⊣ Γ
® ⊳ 𝑇 ′ ⇝ let 𝑥 : [𝑇 ; 𝑁 ] = [𝐸];
Γ0 ⊢ 𝑡 𝑥 [𝑁 ] = {®
𝑒 }; 𝑠𝑡

E-Heap
𝑡 ↩→ 𝑇

® ⊳ 𝑇 ′ ⇝ 𝐸 ′ ⊣ Γ′
Γ, 𝑥 : Box⟨[𝑇 ]⟩ ⊢ 𝑠𝑡

® ⊳ 𝑇 ′ ⇝ let 𝑥 : Box⟨[𝑇 ]⟩ = vec![0; 𝑁 ].into_boxed_slice(); 𝐸 ′ ⊣ Γ ′
Γ ⊢ 𝑡 ∗𝑥 = malloc(𝑁 ∗ sizeof (𝑡)); 𝑠𝑡
E-Box-Slice
𝑥 : Box⟨[𝑇 ]⟩ ∈ Γ

E-Array-Slice
𝑥 : [𝑇 ; 𝑁 ] ∈ Γ

E-Slice-Box
𝑥 : &[𝑇 ] ∈ Γ

Γ ⊢ 𝑥 ⊳ &[𝑇 ] ⇝ &𝑥 ⊣ Γ

Γ ⊢ 𝑥 ⊳ &[𝑇 ] ⇝ &𝑥 [..] ⊣ Γ

Γ ⊢ 𝑥 ⊳ Box⟨[𝑇 ]⟩ ⇝ (∗𝑥).into() ⊣ Γ ′

E-Array-Box
𝑥 : [𝑇 ; 𝑁 ] ∈ Γ

Γ ′ = Γ \ {𝑥 }

Γ ⊢ 𝑥 ⊳ Box⟨[𝑇 ]⟩ ⇝ Box :: new(𝑥) ⊣ Γ ′
E-Call
𝑓 : (𝑇®) → 𝑇 ∈ Γ

Γ𝑖 ⊢ 𝑒 ⊳ 𝑇𝑖 ⇝ 𝐸𝑖 ⊣ Γ𝑖+1

Γ ′ = Γ \ {𝑥 }

E-AddrOf
Γ ⊢ 𝑒 ⊳ 𝑇 ⇝ 𝐸 ⊣ Γ′
Γ ⊢ &𝑒 ⊳ &[𝑇 ] ⇝ &[𝐸] ⊣ Γ ′
Γ𝑛 , 𝑟 : 𝑇 ⊢ 𝑟 ⊳ 𝑇 ′ ⇝ 𝑒 ′ ⊣ Γ ′

® 𝑒 ′ ⊣ Γ′
Γ0 ⊢ 𝑓 (®
𝑒 ) ⊳ 𝑇 ′ ⇝ let 𝑟 = 𝑓 ( 𝐸);
Fig. 4. Translation from Mini-C to Rust: selected statements and expressions

6

Aymeric Fromherz and Jonathan Protzenko

Translating Array Allocations. C allows allocating arrays either on the stack or on the heap.
For stack arrays, we assume the size is known at compile-time (we support, but do not present
here, VLAs). Stack-allocated arrays enjoy various bits of array initialization syntax in C, such as
t x[N] = {...}. Heap-allocated arrays are created with calls to malloc, which takes a size in bytes.
Rule E-Stack thus translates a stack allocation named 𝑥, of 𝑁 elements of type 𝑡, each initialized
® The whole list of statements is expected to have Rust type
with 𝑒𝑖 , followed by more statements 𝑠𝑡.
𝑇 ′ . We first translate the type of array elements t to its Rust counterpart 𝑇 . We then rely on the
newly translated type T to translate each of the initial array elements 𝑒𝑖 to its Rust equivalent 𝐸𝑖 ,
® is translated in
generating a series of environments Γ𝑖 , ending with Γ𝑁 . Finally, the continuation 𝑠𝑡
Γ𝑁 extended with a binding for 𝑥 at Rust array type [𝑇 ; 𝑁 ]. The final result is a Rust let-expression;
we reiterate that Rust is an expression language, meaning that C statements (e.g., 𝑡 𝑥 [𝑁 ] = {®
𝑒 }), C
® all translate to Rust expressions.
expressions (e.g., 𝑒𝑖 ) and C lists of statements (e.g., 𝑠𝑡)
Rule E-Heap operates in a similar manner, with one key difference. As the memory is heap
allocated, the variable is instead translated to a Boxed slice; in Rust, we do this by relying on the
vec! macro, which efficiently creates a new vector, before immediately turning it into a boxed slice,
and without copies when applied to a literal of the form [𝐸; 𝑁 ]. Since C heap-allocated arrays are
not guaranteed to be resizable in-place, we do not need Vec's dynamic resizing capabilities.
These rules present simplified versions of what we support. Our implementation features several
peephole optimizations, for instance, a stack allocation followed by a memset or followed by a forloop initialization translates directly to [E; N], Rust’s syntax for an array of 𝑁 elements filled with
𝐸. Similarly, a heap allocation performed via calloc, followed by memset, or a for-loop initializer
translates to vec![E; N] in one go. Should none of these peephole optimizations trigger, or should
the allocation have no initial value, we simply synthesize a default value at translation-time.
We remark, however, that we expect the size of heap allocations to be expressed as a number of
elements. This is one area where, as previously mentioned, we envision a semi-active approach
to converting C to Rust. Heap allocations that are not expresssed via a sizeof trigger a warning
in most modern compilers; they fail to express intent; and obfuscate the code in a way that flies
against best practices. We therefore consider it reasonable to expect the programmer to rewrite
those, and leverage existing tooling, testsuite and coverage to check that the change preserves the
behavior of the code, before successfully engaging in further translation.
Pointers and Array Accesses. E-Index compiles array accesses, reflecting that Rust expects a usize
integer – we have more ad-hoc conversion rules (omitted) that can insert conversions to usize on
the fly, should the Mini-C program use array indices at a narrower integer type. E-AddrOf preserves
our invariant that all pointer types, in C, become borrowed slices, in Rust: taking the address of 𝑒
automatically inserts a slice literal with a single element. Finally, dereference, consistently with our
translation scheme, uses the borrowed slice indexing operator for &[T]. The dereference operator in
Rust on a slice borrow does not return the first element of a slice, as in C.
Conversions Between Rust Abstractions. The translation presented so far therefore initializes C
pointers as either boxed slices or arrays. However (Section 2.1), our common abstraction to translate
C pointer types is &[T], i.e., a slice borrow. This leads to discrepancies during our translation: consider
a program which creates a local array 𝑥 and passes it to a function 𝑓 expecting a pointer. In C,
this program is valid thanks to array decay: the C compiler will automatically convert the array
to a pointer. In our translation to Rust, this however becomes problematic: after translation, the
function 𝑓 will expect to receive a slice borrow as argument, while the local variable 𝑥 will have
the array type [T; N]; the resulting program will therefore fail to typecheck.
This is where our type-directed translation comes in. When a mismatch occurs between the
current type of a variable in the environment Γ and its expected type 𝑇 , we insert coercions to turn

Compiling C to Safe Rust, Formalized

7

it into an expression of the appropriate type. Consider for instance our earlier example, where a
variable has an array type while a slice borrow was expected. This corresponds to E-Array-Slice,
which introduces a borrow, therefore turning the array into a slice borrow. Rule E-Box-Slice operates
identically, but for boxed slices. Rules E-Slice-Box and E-Array-Box are “reverse conversions”, which
convert stack and unknown allocations into heap ones – the need for those will become apparent
in Section 2.4. Finally, E-Var models the simple case where a variable does have the expected type.
In practice, some of the syntax introduced by the conversion rules is unnecessary, as the Rust
compiler is able to add auto-borrows and auto-derefs in many places. Our implementation is aware
of this, and features a nano-pass at the very end that removes superfluous &s and ∗s.
The coercions introduced by conversion rules can however lead to subtle semantic differences,
specifically, those introduced for the “reverse conversions” E-Slice-Box and E-Array-Box. Consider
for instance the following (simplified) Rust program, where for reasons that will become apparent
in Section 2.4, y must be translated as a Box<[T]>, meaning the array 𝑥 is coerced to a boxed slice. In
this program, the assertion does not hold, as the call to Box::new creates a copy of x. In a C program
however (shown on the left), both x and y would be the same pointer, therefore the change to y on
line 3 would also apply to x. (As a rule of thumb, constructors that take another data structure, or
explicit conversions performed via .into() in Rust generate a copy.)
1
2
3
4

uint8_t x[1] = { 0 };
uint8_t *y = x;
*y = 1;
assert(*x == 1); /* SUCCESS */

1
2
3
4

let x: [u8; 1] = [0; 1];
let mut y: Box<[u8]> = Box::new(x);
y[0] = 1;
assert!(x[0] == 1) /* failure */

These copies cannot be avoided, as in our translation scheme, it might truly be the case that what
initially started as a stack allocation needs to be promoted to a heap allocation. Furthermore, Rust
provides no way of “opting out” of the Copy trait for base types like arrays of integers, meaning that
Rust will silently perform a copy of 𝑥 into 𝑦, while allowing further modifications to 𝑥. We therefore
leverage our output environment Γ ′ and strip the original variables from Γ when performing the
conversion, thus forbidding any further usage of 𝑥.
If the original C program further relies on x, our translation will error out, and will ask the
programmer to fix their source code. This is another area where we adopt a “semi-active” approach
to verification, and declare that some patterns are poor enough, even for C, that they ought to be
touched up before the translation takes place. Empirically, as discussed in Section 4, this situation
only rarely occurs, limiting the modifications needed in the original code.
Function Calls. A subtlety in E-Call is that the result of calling a function may need to undergo a
conversion, e.g., via a further application of E-Box-Slice. However, our conversion rules (above)
operate on variables, so as to be able to easily look up their type in the environment. We simply
let-bind the result of calling a function, which poses no problem, since Rust is an expression
language, meaning we can insert let-bindings anywhere.
2.3

Handling Pointer Arithmetic

When operating on arrays, C programs rarely perform accesses and updates via a single base
pointer. A common programming pattern is to instead divide the array into chunks, or iterate over
the array elements by keeping a local pointer to the current head of the iterator. Such operations
are frequently performed using pointer arithmetic.
Consider the following C example, inspired by an implementation of elliptic-curve cryptography.
The array abcd contains a large number (“bignum”), spread across four 64-bit (8 byte) limbs stored
contiguously in memory. For field addition, a first order of business is to get pointer to individual
limbs, before performing pointwise addition. In other words, we want to perform pointer arithmetic

8

Aymeric Fromherz and Jonathan Protzenko

to access chunks a, b, c, d, spanning intervals [0; 8), [8; 16), [16; 24) and [24; 32) of the base pointer.
Because C does not carry length information for pointers, neither at run-time nor in the type
system, we do not know that each pointer intends to span 8 bytes. Furthermore, we cannot assume
either that pointer arithmetic occurs in left-to-right order; our example, below, illustrates this.
1

uint8_t abcd[32] = { 0 };

3
4
5
6
7

1
2

2

uint8_t *a = abcd + 0;
uint8_t *c = abcd + 16;
uint8_t *b = abcd + 8;
uint8_t *d = abcd + 24;

3
4
5
6
7

let mut abcd = [0u8; 32];
let abcd: &mut [u8] = &mut abcd[..];
let (a_l, a_r) = abcd.split_at_mut(0);
let (c_l, c_r) = a_r.split_at_mut(16);
let (b_l, b_r) = c_l.split_at_mut(8);
let (d_l, d_r) = c_r.split_at_mut(8);
let (a, b, c, d) = (b_l, b_r, d_l, d_r)

This bit of C code cannot be trivially translated to Rust, because in order to guarantee memory
safety, Rust does not allow arbitrary pointer arithmetic. What Rust provides instead is a primitive
named split_at_mut (or split_at for immutable slices), which allows the programmer to relinquish
ownership of a slice, and obtain in exchange two sub slices that split the original slice at the
given index. This permits some restricted notion of pointer arithmetic, while preserving Rust’s
invariant that mutable data should have a unique owner: to regain ownership of the original slice,
the programmer must give up the sub slices.
In the rest of this section, we explain how to map C’s pointer arithmetic onto Rust’s splitting
paradigm, using a novel notion of split trees.
Split Trees. We show on the right, above, the Rust code that our translation generates; we use it
to illustrate our translation by example; then explain the general principle. When translating C’s
pointer arithmetic to Rust, several difficulties arise. First, because we do not have length information
coming from the C side, we need to assume that the chunks are not intended to be overlapping – if
they were, this code would simply be impossible to type-check, and the programmer would have to
rewrite their C code to make the intent more apparent, keeping in line with our semi-active approach
to translating C to Rust. Second, the translation needs to be predictable and understandable by the
user, so that translation failures can easily be matched with the location in the original C code that
needs to be rewritten. For those reasons, we want to avoid backtracking in our translation, and
perform the translation in a forward fashion. This means that we need a data structure that keeps
the history of the calls to split_at, for instance to know at line 5 that C index 8 lives in c_l, and
at line 6 that C index 24 lives in c_r, at Rust index 24 − 16 = 8. This data structure also must be
attached to every program point, so as to translate an access through a C pointer into an access
(possibly with an offset computation) through a Rust slice. For instance, a[0] would translate into
a_r[0] after line 3, but into c_l[0] after line 4, and so on.
We solve these challenges with a data structure called a split tree, synthesized during our
translation: each C pointer maps to a (possibly singleton) split tree, which evolves in a flowdependent fashion. We present the split trees corresponding to abcd at different points of our
translation in Figure 5; the split tree initially only contains the root abcd.
The first pointer addition defines a as a sub-array of abcd starting at index 0, whose intended
length is unknown. We thus split abcd at index 0, and keep this information, meaning indices [0; 0)
of abcd are in the left slice, i.e., a_l, and indices [0; 32) are in a_r (Figure 5a). The second pointer
addition on the same base pointer abcd triggers another split. At this program point, abcd is no longer
available, because it has been borrowed to construct (a_l, a_r). Splitting abcd directly would be a
mistake, because it would terminate these borrows, and render (a_l, a_r) unusable, thus making it
impossible to translate any further usage of C pointer a. Instead, we leverage the fact that our split
tree is a binary search tree, and discover that key 16 needs to be inserted as the right child of node
a, i.e., we must split a_r at index 16. At this stage, a use of C pointer a would trigger a search in the

Compiling C to Safe Rust, Formalized

9

(abcd, 0)
(abcd, 0)
(abcd, 0)
a_l

a_l
a_r

(a) After translating a

(a_r, 16)
c_l

(a_r, 16)

a_l

c_r

(b) After translating c

(c_l, 8)

(c_r, 8)

b_l

d_l

b_r

d_r

(c) After translating b and d

Fig. 5. Successive split trees during C translation. Internal nodes of the form (x, i) have been subjected to a
split at Rust index i and are therefore borrowed at this program point. Leaf nodes are available.

binary tree, and would return c_l as the current slice through which a may be accessed (Figure 5b).
The translation of b is similar. Finally, for d, we find that index 24 is to be found in c_r; because the
indices of the right subslice restart from 0, we must perform a subtraction to know that c_r needs
to be split at index 8 (Figure 5c).
We generalize this mechanism to all variables in the environment, and equip it with a few more
bells and whistles. First, any usage of abcd that is not pointer arithmetic (e.g., f(abcd)) is taken to
indicate that C variables a, b, c, and d are no longer useful and that the user intends to perform
a fresh set of pointer arithmetic computations. (This happens frequently as a function may be
composed of several steps, some expressed through macros. For instance, in our elliptic curve case
study, it is common to have a function be a series of calls to ADD and MUL, each of which expands
to operations that need their own split tree.) This allows the programmer to insert, in the source
code, calls to, e.g., (void)abcd to provide a hint that the split tree of abcd needs to be reset, and that
the corresponding Rust variables can go out of scope (i.e., their lifetimes can end). Such calls are
optimized away by the compiler, and therefore do not have any impact at runtime. We leveraged
this mechanism in numerous places in our HACL★ case study (Section 4.1). Second, we generalize
the form of offsets to accept a more general language of expressions, which we now describe below.
Symbolic Solver. The split trees we presented above operate over constant offsets for pointer
arithmetic – this allows implementing a trivial order, which in turn allows us to maintain the
binary search tree structure. In our study of real-world examples, we commonly encountered more
complex offset expressions, for instance n and n + 8 where n might be a parameter of the current
function, whose value is therefore not statically known.
To address this issue, our implementation relies on a simple, deterministic symbolic solver, which
is able to compare different kinds of arithmetic expressions containing symbolic variables, e.g.,
to determine that n + 8 is greater than n. While this solver is not complete and does not rely on
contextual information to refine the analysis, it is nevertheless sufficient to translate large case
studies as described in Section 4, and could be easily extended for further use-cases.
Should the symbolic solver still fail to compare offsets, our current implementation emits a
warning, along with the location of the problematic source code. This allowed us to fix the source
code in our case studies, for instance, to replace 2*n with n+n. Should none of this work, our compiler
then adopts a final heuristic: that the offsets that occur along the control-flow are monotonically
increasing, i.e., they perform pointer arithmetic from left to right. This is somewhat of a last resort,
although in our experience this was enough to finish compiling all of HACL★ from C to Rust. We
might tweak this behavior to abort compilation instead, by default. Again, our semi-active approach
to translating C to Rust means that, in case the programmer’s intent is unclear, it is oftentimes
worthwhile to rewrite the source, rather than augment our solver with complex heuristics.

10

Aymeric Fromherz and Jonathan Protzenko

Limitations. We now comment on this approach. First, we note that if there is true aliasing, i.e.,
the C program contains both a[8] and b[0] in our earlier example, the translated Rust code will
perform an out-of-bounds array access. In other words, we must assume the array chunks, in C, to
be disjoint. For overlap cases that can be distinguished statically (as above), we emit a compile-time
error; otherwise, the Rust code will panic at runtime. We remark that we have not seen this pattern
in our case studies, so this has not been a problem in our experience. Should this turn out to be a
concern in the future, we envision either supplemental off-the-shelf static analyses; annotations
in the source code to express intent more clearly; or for our chief case studies, i.e., verified code,
changing the translation pipeline to retain array lengths (which do occur at the proof level for
verification purposes) all the way down to Rust, for a sound and complete slice compilation strategy.
2.4

Generalizing to Whole Programs: Struct Definitions and Function Declarations

Finally, we show how to translate top-level type and function definitions. Recall that, by default, C
pointer types translate to Rust borrowed slices. This default choice needs to be selectively overriden.
For instance, consider struct s *new_s(). Such a C function intends to create a new object and
transfer ownership of it to the caller; in Rust, this would be fn new_s() -> Box<S>.
We adopt a set of heuristics that determine whether the return type of a function should be
boxed or should be a borrow. For the former, we devise a new relation ↩→, which inherits all of
@
the rules from ↩→ save for T-Pointer which is replaced by T-Pointer-B; for the latter, we devise a
new relation ↩→
, which inherits all of the rules from ↩→ save for T-Pointer, which is replaced by
′
𝑎

T-Pointer-L. The former translates C pointer types to owned boxes; the latter translates C pointer

types to annotated slice borrows, because Rust makes lifetimes explicit in struct declarations.
T-Pointer-L

T-Pointer-B

𝑡 ↩→
𝑇
′

𝑡 ↩→ 𝑇

𝑎

@

′

& 𝑎[𝑇 ]
𝑡∗ ↩→
′

𝑡∗ ↩→ Box⟨[𝑇 ]⟩

𝑎

@

We also run this analysis on type declarations, notably structures, in order to determine whether
a structure should own its inner pointers or not. In our earlier example, if the C definition is
struct s { uint8_t *data; };, then the type of new_s indicates to us that s intends to own its data,
meaning it translates in Rust to struct S { data: Box<[u8]>; }, not struct S<'a> { data: &'a[u8] }.
This means that some patterns in C become problematic. For instance, a struct s in C might be
constructed either using heap allocations, or stack allocations, since both flavors of allocation yield
a uint8_t*. However, in Rust, these allocations have different types. For heap allocations, a C malloc
translates to a new Box, which simply gets moved in the Rust S structure. For stack allocations,
because our translation does not backtrack, an array is first allocated on the Rust stack, then gets
copied into a Box upon realizing that it needs to be assigned into the data field of S which has type
Box<[u8]>. This finally explains why we have rules E-Array-Box and E-Slice-Box.
1
2
3
4
5
6
7
8
9
10

// T-Var
uint8_t *x = malloc(16);
struct s state = { .data = x };
// T-Array-Box
uint8_t x[16] = { 0 };
struct s state = { .data = x };
// T-Slice-Box
uint8_t x[16] = { 0 };
uint8_t *y = x;
struct s state = { .data = x };

1
2
3

let x = vec![0u8; 16].into_boxed_slice();
let state = State { data: x };

4
5
6

let x = [0u8; 16];
let state = State { data: Box::new(x) }

7
8
9
10

let x = [0u8; 16];
let y: &[u8] = &x;
let state = State { data: (*y).into() }

Compiling C to Safe Rust, Formalized

11

Fundamentally, this discrepancy stems from the absence of ownership polymorphism for struct
types in Rust, as they are nominal, and must commit to either Box<[T]> or &'a[T] at definition-time.
Owing to mutual recursion between structures, we use a fixpoint computation to run this analysis
over all structures in a given program. Ultimately, we separate type declarations into two disjoint
sets: borrowed types, and owned types. This heuristic is best-effort, and can be overridden by the
user if the results are not satisfactory. Again, we may advocate for a language of annotations to
promote our semi-active approach to Rust translation.
Tuples as structural types. Anticipating slightly on §3.1, we comment on nominal typing of structs.
A struct declaration in Rust not only sets in stone how a t* gets represented (Box<[T]> vs &'a[T]),
but also sets in stone whether the latter is mutable or immutable (&'a[T] vs &'a mut[T]). This means
that a particular choice of representation may incur not only extra copies (as we just saw), but may
also force (otherwise immutable) pointers to become mutable just to match the expected field type.
We did run into this issue in one of our case studies, where the C code defined a type for pairs of
pointers. Even though this C type was only used locally with mutable pointers, it forced the global
struct definition to use mutable borrows, which in turn forced every usage of this struct to rely on
mutable borrows, rendering the Rust code ill-typed. To address this unintended “contamination”,
we allow for a struct (a nominal type) to be instead compiled as a tuple (a structural type), which
permits polymorphism, passing around (&[u8], &[u8]) whenever possible, or (&mut [u8], &mut [u8])
otherwise, without affecting the rest of the program through a global struct type.
3

Static Analyses on Rust Code

Building on our translation from Mini-C to Rust presented in the previous section, we now propose
several static analyses operating directly, and post-translation, on Rust code to improve code quality
and support a larger subset of C programs. These analyses are tailored to the borrow and move
semantics of the Rust language; they aim to automatically infer mutability annotations and copiable
types, therefore allowing more programs to typecheck once translated to Rust. Importantly, both
the analyses presented in this section are untrusted: they do not modify the programs and only
provide additional annotations, which are checked by the Rust typechecker at compile-time.
3.1

Mutability Analysis

To ensure memory safety, Rust relies on a principle called mutability XOR aliasing. Concretely, it
distinguishes between immutable borrows, of type &T, which can be freely shared and duplicated
but only allow to read the referenced memory, and mutable borrows, of type &mut T, which allow to
write memory, but require exclusive ownership, that is, no other reference to the same memory
region can be live at the same time. This is enforced by the Rust borrow-checker.
The distinction between mutable and immutable borrows raises two conflicting problems. To
pass Rust borrow-checking, one needs to correctly provide mutability annotations, to ensure that
memory being modified is indeed mutably borrowed. Unfortunately, marking all borrows as mutable
is also not a solution. Beyond being unidiomatic, a function requiring only mutable borrows is also
restrictive, as it prevents usages where aliasing would be safe, i.e., when arguments are only read.
To reach a middle ground, the translation presented in Section 2 instead generates immutable borrows by default, which we now refine as-needed into mutable borrows through a custom mutability
inference analysis. To do so, we perform a backward analysis on all translated functions, identifying
expressions that perform memory updates, and backpropagating which program variables need to
be mutable back to their definition site, inserting mutable borrows along the way.
Mutability rules are as follows in Rust. If x has array type [T; N] and x[i] = e occurs, then x must
be a mutable variable, i.e., declared as let mut x = .... If y has borrow type and y[i] = e occurs,

12

Aymeric Fromherz and Jonathan Protzenko

then y must be a mutable borrow; furthermore, one can only mutably borrow variables that are
themselves mutable, i.e., if let y: &mut [u8] = &mut z, then z itself must be declared as let mut z.
We inductively define an analysis on the Rust syntax that is aware of the semantics above and
synthesizes two variable sets 𝑉 and 𝑅, where 𝑉 contains variables that must be mutable (i.e., x and
z, above), and 𝑅 contains variables that must have a mutable borrow type (i.e., y above). Applying
this analysis to the output of our translation yields a Rust program that has been annotated with
the minimum amount of mut qualifiers in variables, types, and function parameters, in order to typecheck. In practice, our analysis is more general, and can handle many more forms of assignments,
with combinations of fields, borrows, array indexing, and so on. Our implementation also computes
a third set 𝐹 , for fields of structs that ought to be mutable, and a fourth set 𝑃, for pattern variables
in matches that ought to be ref mut – we omit these details here for the sake of simplicity.
We formally present our mutability inference analysis in Figure 6. Our rules are presented as the
𝑉 ,𝑅

combination of a judgment Δ ⊢ 𝑒 ⇒ 𝑒 ′, 𝑉 ′, 𝑅 ′ , and a system of mutually-recursive equations over
𝑀

Δ. The final output of our analysis is the least fixed point that satisfies the equations over Δ. The
𝑉 ,𝑅

⇒ judgment represents that Rust expression 𝑒 is transformed into expression 𝑒 ′ . This translation is
𝑀

performed with current sets 𝑉 , 𝑅, and returns new sets 𝑉 ′ and 𝑅 ′ , as the translation of 𝑒 might have
added variables to both sets. The mode 𝑀 ∈ {𝑚𝑢𝑡, 𝑖𝑚𝑚} indicates the expected borrow mutability
of the expression 𝑒, while Δ contains the function definitions, which are needed to retrieve the
expected mutability when performing function calls.
Rules I-ImmVar and I-MutVar are straightforward: if the translation expects variable 𝑥 to be a
mutable borrow, then we add it to the set 𝑅 to backpropagate the information, otherwise, we
leave both sets invariant. Rules I-ImmBorrow and I-MutBorrow are similar, but operate directly on
borrows: if variable 𝑥 is borrowed and the expected type is a mutable borrow, then we replace the
immutable borrow by a borrow, and indicate that variable 𝑥 must be marked as mutable.
Rule I-Let demonstrates the backward nature of the analysis. As the type of the expression
corresponds to the type of 𝑒 2 , we first translate 𝑒 2 with the same mode 𝑀, returning new sets 𝑉 ′
and 𝑅 ′ . We then rely on these sets to translate 𝑒 1 . If 𝑥 belongs to the set 𝑅 ′ , then it means that the
rest of the program expects it to have a borrow type, we thus translate its definition with mode
𝑚𝑢𝑡. Finally, if variable 𝑥 is used mutably, i.e., belongs to 𝑉 ′ , we make its let-binding mut.
Rule I-Update-Borrow presents the translation of a borrowed slice update, which introduces the
need for mutable borrows. Our translation from C to Rust guarantees that the left-hand side has a
slice borrow type (the translation rule for C updates is similar to E-Index, and ascribes a borrow
type to 𝑒 1 ). To satisfy the Rust borrow-checker, the expression 𝑒 1 being modified must thus become
a mutable slice borrow. We therefore translate with the 𝑚𝑢𝑡 mode. Conversely, both the index and
the value being stored are only read; they are therefore translated with mode 𝑖𝑚𝑚.
The translation of function calls is handled through rule I-Call. When calling a function 𝑓 , we
first retrieve the type signature of 𝑓 from the environment Δ. We then translate the expressions
corresponding to each function argument according to their expected mutability, computed through
the function 𝑖𝑠_𝑚𝑢𝑡𝑏𝑜𝑟𝑟𝑜𝑤 (𝑇𝑖 ). This function returns 𝑚𝑢𝑡 is 𝑇𝑖 is of the shape &mut T, and 𝑖𝑚𝑚 in
all other cases. The translation of all arguments can be done in parallel with the initial sets 𝑉 , 𝑅.
To propagate the information acquired, we finally return the union of these sets; this faithfully
models that if an expression 𝑒𝑖 requires a variable 𝑥 to be mutable while 𝑒 𝑗 does not have this
requirement, then 𝑥 must indeed be mutable. A key feature of this rule is that it may modify the
definition environment Δ, and thus trigger further recomputations to reach the fixed-point of our
system of equations. Indeed, the context may force us to produce a mutable borrow, which can
only be achieved by forcing the return type of f itself to be a mutable borrow.

Compiling C to Safe Rust, Formalized

13

I-MutVar

I-ImmVar
𝑉 ,𝑅

Δ ⊢ 𝑥 ⇒ 𝑥, 𝑉 , 𝑅
𝑖𝑚𝑚

I-ImmBorrow

𝑉 ,𝑅

I-MutBorrow

𝑉 ,𝑅

Δ ⊢ 𝑥 ⇒ 𝑥, 𝑉 , 𝑅 ∪ {𝑥 }

𝑉 ,𝑅

Δ ⊢ &𝑥 ⇒ &𝑥, 𝑉 , 𝑅

𝑚𝑢𝑡

Δ ⊢ &𝑥 ⇒ &𝑚𝑢𝑡 𝑥, 𝑉 ∪ {𝑥 }, 𝑅
𝑚𝑢𝑡

𝑖𝑚𝑚

I-Let
𝑉 ,𝑅

Δ ⊢ 𝑒 2 ⇒ 𝑒 2′ , 𝑉 ′, 𝑅 ′
𝑀

𝑀 ′ = if 𝑥 ∈ 𝑅 ′ then 𝑚𝑢𝑡 else 𝑖𝑚𝑚

𝑉 ′ ,𝑅 ′

𝐴 = if 𝑥 ∈ 𝑉 ′ then 𝑚𝑢𝑡 else ∅

Δ ⊢ 𝑒 1 ⇒′ 𝑒 1′ , 𝑉 ′′, 𝑅 ′′
𝑀

𝑉 ,𝑅

Δ ⊢ let 𝑥 = 𝑒 1 ; 𝑒 2 ⇒ let 𝐴 𝑥 = 𝑒 1′ ; 𝑒 2′ , 𝑉 ′′, 𝑅 ′′
𝑀
I-Update-Borrow
𝑉 ,𝑅

𝑉 ,𝑅

Δ ⊢ 𝑒 1 ⇒ 𝑒 1′ , 𝑉 ′, 𝑅 ′

𝑉 ,𝑅

Δ ⊢ 𝑒 2 ⇒ 𝑒 2′ , 𝑉 ′′, 𝑅 ′′

𝑚𝑢𝑡

Δ ⊢ 𝑒 3 ⇒ 𝑒 3′ , 𝑉 ′′′, 𝑅 ′′′

𝑖𝑚𝑚

𝑖𝑚𝑚

𝑉 ,𝑅

Δ ⊢ 𝑒 1 [𝑒 2 ] = 𝑒 3 ⇒ 𝑒 1′ [𝑒 2′ ] = 𝑒 3′ , 𝑉 ′ ∪ 𝑉 ′′ ∪ 𝑉 ′′′, 𝑅 ′ ∪ 𝑅 ′′ ∪ 𝑅 ′′′
𝑀

I-Call
Δ(𝑓 ) = (𝑥 1 : 𝑇1, . . . 𝑥𝑛 : 𝑇𝑛 ) → 𝑇
∀𝑖, Δ ⊢ 𝑒𝑖

𝑉 ,𝑅

⇒
𝑖𝑠_𝑚𝑢𝑡𝑏𝑜𝑟𝑟𝑜𝑤 (𝑇𝑖 )

𝑒𝑖′, 𝑉𝑖 , 𝑅𝑖

𝑇 ′ = if 𝑀 = 𝑚𝑢𝑡 then 𝑚𝑎𝑘𝑒_𝑚𝑢𝑡 (𝑇 ) else 𝑇
𝑉 ,𝑅

Δ ⊢ 𝑓 (𝑒 1, . . . , 𝑒𝑛 ) ⇒ 𝑓 (𝑒 1′ , . . . , 𝑒𝑛′ ),

Ø

𝑉𝑖 ,

Ø

𝑅𝑖

𝑀
𝑖

Δ(𝑓 ) = (𝑥 1 : 𝑇1, . . . 𝑥𝑛 : 𝑇𝑛 ) → 𝑇 ′

𝑖

I-Sig
𝑅 = {𝑥𝑖 | 𝑖𝑠_𝑚𝑢𝑡𝑏𝑜𝑟𝑟𝑜𝑤 (𝑇𝑖 )}

Δ⊢𝑒

∀𝑖,𝑇𝑖′ = if 𝑥𝑖 ∈ 𝑅 ′ then 𝑚𝑎𝑘𝑒_𝑚𝑢𝑡 (𝑇𝑖 ) else 𝑇𝑖

∅,𝑅

⇒

𝑒 ′, 𝑉 ′, 𝑅 ′

𝑖𝑠_𝑚𝑢𝑡𝑏𝑜𝑟𝑟𝑜𝑤 (𝑇 )

Δ(𝑓 ) = (𝑥 1 : 𝑇1, . . . , 𝑥𝑛 : 𝑇𝑛 ) → 𝑇 {𝑒}

Δ(𝑓 ) = (𝑥 1 : 𝑇1′, . . . , 𝑥𝑛 : 𝑇𝑛′ ) → 𝑇 {𝑒 ′ }
Fig. 6. Mutability Inference Analysis, Selected Rules

To infer mutability information for an entire Rust program, our analysis operates over top-level
function definitions in I-Sig. This rule can be seen as the entrypoint of our analysis. Given a previous
iteration of the analysis for 𝑓 , we compute the initial set 𝑅 of variables whose type is a mutable
borrow, based on the signature of 𝑓 . The translation of function body 𝑒 returns a new function
body 𝑒 ′ , as well as sets 𝑉 ′ and 𝑅 ′ . We update the definitions Δ with a new entry for 𝑓 , based on
information inferred during the translation of 𝑒: if a function parameter 𝑥𝑖 belongs to the set 𝑅 ′ ,
meaning it is expected to be a mutable borrow, then we modify its type 𝑇𝑖 accordingly.
The presentation as a system of iterated equations alleviates the need for a topological sort of the
function definitions – there may simply be no such order, given mutually-recursive definitions in C
and Rust. The environment is initially populated with the output of our translation, meaning there
are no mut qualifiers anywhere (Section 2). We then iterate the equations over Δ until a fixed point
is reached. There trivially exists such a fixed point: we only ever add mut qualifiers, meaning the
number of iterations is bounded by the numbers of function parameters across the whole program.
In practice, we of course do not do repeated iterations, and instead use an optimized fixed point
library to eliminate un-necessary recomputations [Pottier 2009].

14

Aymeric Fromherz and Jonathan Protzenko

While Figure 6 presents the essence of our mutability inference algorithm, as mentioned earlier,
our implementation supports a much larger set of features, needed to translate the real-world case
studies presented in Section 4. We reiterate that this phase does not change the Rust program; it
only augments it with additional mut qualifiers, meaning it can be validated a posteriori by the Rust
compiler and we do not need to argue for its correctness.
3.2

Automatically Deriving Trait Instances

A key component of the Rust language is its pervasive use of traits, which can be broadly seen as a
Rust-specific version of typeclasses. Traits allow programmers to specify that a given type must
implement certain features, for instance that it can be pretty-printed (Display trait), converted from
and to another type (From and Into traits), or that it allows deep copies (Clone trait).
When defining a new type T, i.e., a new structure or enumeration, one can provide an implementation of a given trait 𝔗 by manually defining the methods corresponding to 𝔗 for type T,
e.g., implementing a fmt function that specifies how to pretty-print an element of type T allows
to implement trait Display for T. For some traits however, implementations can be automatically
derived, using the Rust #[derive(Trait)] attribute on the type definition, assuming that all fields of
the structure or enumeration do implement the trait.
One trait of particular interest for our translation is the Copy trait. A common pattern in C is to
define a structure containing a pointer (e.g., a pointer to a string and the corresponding length), and
to pass the structure by value when calling functions. This leads to a mismatch with the Rust move
semantics. Rust implements an affine type system, meaning that, by default, values can be used at
most once: passing a value to a function, e.g., f(s), invalidates the value 𝑠, leading to compile-time
errors if s is further used in the program. To avoid this behavior, types can implement the Copy trait,
which instead performs a shallow copy of the value when passing it to a function.
To avoid Rust compilation errors due to C value-passing semantics when translating programs,
we wish to automatically derive Copy when possible. All basic Rust types (e.g., integers or booleans)
implement Copy. Additionally, if a type T satisfies Copy, so does an immutable borrow type &T.
However, copying a mutable pointer (i.e., either &mut T or Box<T>) is not allowed; this would contradict
Rust’s guarantee that every piece of mutable data has a unique owner. To automatically add a
#[derive(Copy)] annotation on appropriate structure and enumeration type definitions, we traverse
all type definitions, and only derive the Copy trait if all fields are themselves Copy, that is, they are
neither a mutable borrow nor a box, and if they are a custom type, this type implements Copy itself.
The analysis is also performed through a fixpoint computation to handle (mutually) recursive
structs and enums.
In addition to Copy, our analysis also automatically derives the PartialEq, and Clone traits when
possible. PartialEq allows the use of the == operator, and Copy allows let-binding a value without
invalidating (moving out) its old name: our translation uses both of those. Clone allows performing
deep copies on a given value, via an explicit call to the clone method. While our translation does not
directly use this feature, implementing Clone is a prerequisite to implement Copy (in Rust’s parlance,
Copy has Clone as a parent trait). Additionally, Clone is commonly used by library clients; we therefore
strive to provide it as much as possible to facilitate the adoption of the translated Rust library.
4

Case Studies

We implemented our approach in the KaRaMeL compiler [Protzenko et al. 2017], which consumes
Low★, an embedded C-like DSL of the F★ programming language, and turns it into actual C. After
70+ nano-passes, the original Low★ has become Mini-C and only has a few constructs left. While
KaRaMeL then pretty-prints Mini-C as actual C syntax, we instead build on Mini-C and implement
the compilation scheme and analyses presented in the previous sections as an extension to the

Compiling C to Safe Rust, Formalized

15

compilation pipeline, before finally emitting Rust code. This allows us to not only iterate on Mini-C
and our translation quickly, but also to reuse the KaRaMeL compiler infrastructure and directly
repurpose verified Low★ projects from C to Rust, one of the original motivations for our work, and
one of the most high-value targets to translate. We have plans for a libclang-based frontend that
would consume actual C syntax; given that KaRaMeL is written in OCaml, and that well-maintained
OCaml libclang bindings exist, we estimate that this would require only minimal effort.
Our current implementation totals 4,000 lines of OCaml code, including comments, and took
one person-year to implement. We benefited from the existing libraries, helpers and engineering
systems already developed for KaRaMeL; anything to do with Rust was added by us. In particular,
to facilitate the adoption of generated C code into existing codebases, KaRaMeL implements many
nano-passes to make code more idiomatic and human-looking, therefore simplifying its audit as
part of integration processes. We extend these compilation passes with 7 Rust-specific nano-passes
that significantly decrease warnings raised by Clippy, the main linter in the Rust ecosystem. These
passes include, e.g., simplifying immediate dereferences of borrowed expressions (i.e., turning *(&e)
into e), or removing dereferences that can be automatically inferred by the Rust compiler.
4.1

HACL★

Our first chief case study is the HACL★ verified cryptographic library [Polubelova et al. 2020].
HACL★ is made up of 170,000 lines of F★ code, which includes the Low★ implementations, pure
specifications, and proofs of functional correctness, memory safety, and side-channel resistance.
The Low★ code compiles to 95,000 lines of C (including headers) via KaRaMeL. More than 15
person-years went into the HACL★ project, and parts of HACL★ have been integrated into Python,
Firefox, the Linux kernel, and a myriad of other well-known software projects.
To avoid high-impact memory vulnerabilities in security-critical applications [Blessing et al.
2024; Durumeric et al. 2014; National Vulnerability Database 2014], many recent cryptographic
applications have turned to Rust, either to provide safe implementations of widely used protocols
like TLS [Rustls Contributors 2016], Wireguard [Cloudflare 2019] or Signal [Signal Messenger
2020], or to develop novel cryptographic constructions such as zero-knowledge proofs [arkworks
contributors 2022; ZeroEx 2019]. Importantly, these applications all build on top of implementations of cryptographic primitives, raising the need for high-performance, high-assurance safe
cryptographic libraries such as HACL★ in the Rust ecosystem.
Naturally, given the engineering effort needed, it would inconceivable to rewrite all of HACL★
in Rust, and, say, use a Rust verification tool like Aeneas [Ho and Protzenko 2022], Verus [Lattuada
et al. 2023] or Creusot [Denis et al. 2022] to redo all of the proofs. A more realistic solution is
therefore to migrate the code to Rust; this is where our translation comes in.
We converted all of the HACL★ codebase to translate, type-check and execute successfully in
Rust via our translation scheme. We estimate that this effort took 3 person-months. Doing so, we
gained insights into patterns in C that cause problems for translating to Rust. We now describe
these patterns, and comment on the effort and difficulty of rewriting the source.
The first issue was that HACL★ was making aggressive use of “in-place or disjoint” ownership patterns. Concretely, a field operation, e.g., modular multiplication, was expressed as
void fmul(uint8_t *dst, uint8_t *x, uint8_t *y), with the precondition that the destination may
alias either x or y, or both. While extremely common in C, this pattern simply cannot be expressed
in Rust, as it would require aliasing a mutable piece of data. A systematic fix for this was to insert
macro-based wrappers of the following form when calling fmul with possibly aliased values:
1
2
3

#define FMUL_COPYX(dst, x, y)

uint8_t x_copy[ELEM_SIZE];
\
memcpy(x_copy, x, ELEM_SIZE); \
fmul(dst, x_copy, y);

16

Aymeric Fromherz and Jonathan Protzenko

This wrapper has the exact same shape (and specification) as the original fmul wrapper, meaning
that the (potentially-fragile) proof at call-site remains identical. Proving its correctness is also
automatic, owing to its small size and innocuous definition. But once expanded, then translated to
Rust, the original problematic, alias-inducing call-site disappears, and all the Rust compiler sees is
a judiciously-inserted copy that the borrow-checker is happy with.
The second issue was that HACL★ contained pointer arithmetic that fell outside of what our
symbolic checker supported (Section 2.3). The reasons for this were varied: some pointer offsets
were using inconsistent notation (e.g., n+n and 2*n within the same scope); some split trees were
not well-scoped, meaning the code juggled between c, d (see Section 2.3) and a combined chunk cd
alternatively within the same scope; aggressive inlining meant that it was not apparent when a
new split tree should be allocated for a given variable. We fixed those with surgical rewrites of the
source code, along with annotations of the form (void)abcd which indicate to our implementation
that the split tree should be discarded since the user means to use the base pointer again.
The third issue was that some patterns were generating moves in Rust, owing to excessive
let-binding in the source. Consider, for instance, our running example of struct s which contains a
data pointer to mutable bytes.
1
2

struct s state = { data: ... };
struct s state2 = state;

5

2
3

3
4

1

f(state2.data);
g(state.data);

4
5
6

6

let state = S { data: ... };
let state2 = state; /* MOVE */
/* RE-BORROW inserted by Rust automatically (elided) */
f(state2.data);
/* ERROR HERE: state has been moved out */
/* NOTE: state2 is still accessible */

As S contains a mutable pointer, it does not implement the Copy trait in Rust. Therefore, when
assigning state to state2 leads to moving-out the value state, meaning that it cannot be further
used in the program. The C program above therefore does not compile once translated to Rust.
Importantly, despite state2.data not implementing Copy either, passing it to the function f does
not lead to a move, which would forbid further uses of state2: the Rust compiler automatically
inserts a reborrow, meaning it translates the call f(state2.data) to f(&(*(state2.data))). We were
therefore able to fix this issue in a systematic manner, by inlining state2 in the source.
The remaining issues do not fall into a generic pattern, and as such could not be fixed with
a systematic rewriting. Those include specific algorithms like HKDF, which relied on in-place
patterns and required inserting copies, along with actual proof work to reverify the rewritten
algorithm. Another problematic algorithm was the streaming API for block algorithms, which deals
with complex long-lived state, and required ad-hoc rewrites, mostly to avoid accessing the mutable
state through aliases. Fortunately, this streaming algorithm was meta-programmed [Ho et al. 2023],
meaning we only had to fix a single implementation for all 12 block algorithms present in HACL★.
The total diff between our fork of HACL★, which extracts to Rust, and the original main branch
is 2,000 lines added and 500 lines deleted. This means that less than 2% of the HACL★ code was
affected. Most of these changes were systematic rewritings; the last category of issues that truly
required deep insights about a specific algorithm (HKDF, streaming) represent a very small fraction
of the total diff. We assess the performance impact of these many changes in Section 5.
We report that our Rust-compiled HACL* has now been packaged within the libcrux cryptographic library, parts of which have been integrated into Mozilla’s NSS and OpenSSH. We thus
expect that our Rust-compiled HACL* will slowly trickle down to many parts of the Rust ecosystem.
4.2

EverParse

Our second case study relates to the EverParse framework [Ramananandro et al. 2019]. EverParse
takes a data description language, and produces Pulse code (another DSL of F★) that just like

Compiling C to Safe Rust, Formalized

17

Low★ feeds into KaRaMeL and eventually becomes Mini-C before being printed out as concrete
C syntax. We set out to translate to Rust a specific parser generated by EverParse, namely, the
CBOR-DET parser [Internet Engineering Task Force 2020]. CBOR-DET is an ongoing IETF draft for
a binary format akin to JSON, and specifically, a deterministic variant of it. Having a verified Rust
implementation is therefore high value and has the potential to become a reference implementation
for future clients.
We applied our translation to an existing CBOR-DET instance of EverParse, developed independently of this paper. CBOR-DET is made up of 9,000 lines of F★, including implementation,
specification, and proofs. CBOR-DET compiles, via the original KaRaMeL toolchain, to 4,000 lines
of C code. As part of the process, we enacted numerous improvements to our translation. Notably,
the introduction of fixed points stems from the very recursive nature of those parser combinators.
Ultimately, we were able to compile CBOR-DET with no modifications to the source code into Rust
code that borrow-checks, compiles and passes the CBOR-DET test suite. The effort took less than
two person-weeks, all spent on improvements to our implementation.
5

Experimental Evaluation

We now set out to evaluate the performance of our translated code. There are numerous factors
that we set out to investigate. First, we aim to understand whether the extra copies, reorderings,
tweaks and macros we had to add to HACL★ affect the performance: we thus extract our modified
version of HACL★ to C (HACL-rs), and compare it with baseline HACL★ (Vanilla), to see the extent
to which our tweaks for Rust compatibility degrade the performance of the C code.
Next, we wish to measure the overhead of Rust. For this, we compare execution speed of HACL★
and CBOR-DET between C and Rust. This tests whether Rust’s slices (with runtime lengths) and
checked array accesses severely affect performance-critical code.
Third, we consider a variant of HACL★ where our mutability analysis is disabled, and every
piece of data is marked as mutable (HACL-rs-mut). This variant contains more copies than necessary in order to type-check in Rust, and fewer optimization opportunities for the compiler, since
immutability is lost. This tells us whether mutability information has a performance impact.
Last, we compare the impact of different optimization levels on the Rust and C code. To minimize
differences between the toolchain, we rely on CBOR for this comparison, as it consists of a unique,
monolithic file for both the C and Rust versions.
Benchmarks. Our HACL★ benchmarks include several representative cryptographic algorithms,
namely, authenticated encryption with additional data (Chacha20-Poly1305), Diffie-Hellman key
exchange constructions based on elliptic curves (P-256 and X25519), and the widely used SHA-2
hashing function. Following existing performance benchmarks for HACL★, our benchmarks consist
of 100,000 encryptions of 16,384 bytes of data for Chacha20-Poly1305, 4,096 Diffie-Hellman key
exchanges for P-256, 100,000 Diffie-Hellman key exchanges using the elliptic curve Curve25519
(X25519), and 16,384 hashes of 16,384 bytes of data using SHA-256. For our baseline, we use the C
code in the HACL★ GitHub repository, commit f218923. Results presented are the average of 5 runs,
normalized with respect to our baseline, the existing C implementation of HACL★ (Vanilla).
Our CBOR-DET benchmarks call the parser validator on various amounts of data; test validate-N
corresponds to validating N bytes of data, repeated 100 times. Results presented are the average of
5 runs, and normalized w.r.t our baseline, the C implementation of CBOR parsers.
Experimental Setup. All experiments are run on a MacOS Ventura 13.6 with an ARM M1 processor
and 16GB of RAM. Aiming for an apples to apples comparison, both the C and Rust benchmarks
are compiled with LLVM; we compile the C code with the -O2 optimization level, while the Rust
code is compiled in release mode, with opt-level set to 2. To minimize the impact of differences in

18

Aymeric Fromherz and Jonathan Protzenko

compilation toolchains, e.g., Rust treating an entire crate as a single translation unit, which opens
up tremendous optimization opportunities, we enable link-time optimization (LTO) in both C and
Rust. In C, we do this by passing the -flto option to clang, while we set lto = "fat" in Rust.
To compare the impact of compiler optimizations, we evaluate optimization levels 0 to 3 in both
C and Rust, via the -O0, -O1, -O2 and -O3 options for clang, and opt-level for rustc.
Benchmark
Chacha20-Poly1305
P-256
SHA-2
X25519
Average

C
HACL-rs-mut
0.9994
1.0310
0.9964
1.0196
1.0116

Vanilla
1.0
1.0
1.0
1.0
1.0

HACL-rs
1.0006
1.0286
0.9970
1.0124
1.00965

Rust
HACL-rs-mut HACL-rs
1.0361
1.0424
1.0429
1.0469
1.0218
1.0181
1.0081
1.0106
1.0272
1.0295

Fig. 7. Experimental Evaluation of C and Rust versions of HACL★. Results are the average of 5 runs, and
are normalized with respect to our baseline, the C implementation of HACL★, commit f218923. HACL-rs is
the result of the translation presented in this paper, including small tweaks to satisfy the Rust type-checker,
while HACL-rs-mut introduces further copies and marks all variables and borrows as mutable.

HACL★. We present our experimental results for HACL★ in Figure 7. We observe few differences
between the three different C versions; both our modified version of HACL★ containing tweaks to
satisfy the Rust type-checker (HACL-rs) and the version with additional copies to satisfy the Rust
borrow-checker while marking all data as mutable (HACL-rs-mut) are between 0.99x and 1.02x of
vanilla HACL★, with an average overhead of 1.01x in both cases. This suggests that LLVM is mostly
able to detect that these copies are superfluous, and to optimize them away at compile-time.
Results for the Rust versions introduce a slightly larger, but still small overhead, with a performance between 1.01x-1.05x of the vanilla C implementation on the different benchmarks. This
suggests that LLVM is also able to optimize the extra copies away when they appear in Rust code.
The average overhead is in both cases around 1.02-1.03x, indicating that our translation only has a
negligible overhead on high-performance, high-assurance cryptographic code.
C
Benchmark
validate-495
validate-995
validate-1495
validate-1995
validate-2245
validate-2495
Average

Vanilla
1.0
1.0
1.0
1.0
1.0
1.0
1.0

Inline
1.0264
1.0033
0.9977
0.9964
0.9970
0.9759
0.9996

Rust
Vanilla Inline
1.3611 0.8463
1.5188 0.9330
1.4977 0.9322
1.4990 0.8880
1.4944 0.8969
1.4899 0.9046
1.4768 0.9002

Rust-unchecked
Vanilla
Inline
1.3497
0.8053
1.4987
0.8956
1.5273
0.8873
1.5018
0.8872
1.5374
0.8909
1.4782
0.8679
1.4821
0.8724

Fig. 8. Experimental evaluation of C and Rust versions of CBOR-DET. All results are normalized w.r.t. C.

CBOR. We present in Figure 8 an experimental comparison of the C and Rust versions of CBORDET. As binary parser implementations heavily rely on array operations, we also compare against
a variant of the Rust implementation where memory accesses are replaced with unsafe variants
omitting runtime bounds checks (get_unchecked). We only do so for performance evaluation, and
because the original C code is verified, meaning that there is a meta-argument that array bounds
check can be safely skipped. Finally, we also include variants of the C and Rust implementations

Compiling C to Safe Rust, Formalized

19

where core functions are decorated with inline annotations. Our goal here is to assess whether
performance differences are driven by hardcoded Rust choices (fat pointers), or are simply subject
to optimization opportunities and different levels of compiler aggressivity.
While the output of our translation (Vanilla Rust) is slower than the original C code (Vanilla
C), we observe that the introduction of inlining brings significant improvements to the Rust version,
but leaves the performance of the C unchanged. Inline Rust outperforming Inline C is to take
with a grain of salt: when reproducing experiments on other machines (namely, a Mac Pro M3 and
an Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz running Linux), trends were similar, but the inlined
Rust code was sometimes slightly slower than the corresponding C code. This rather suggests
that rustc is missing some optimization opportunities due to inlining that clang correctly identifies.
Interestingly, the removal of runtime bounds checks does not provide a significant impact; this is
in line with previous empirical studies on Rust bounds checking [Marzoev 2022]. Our conclusion
is thus that the translation to Rust poses no significant performance issues, and only exposes the
brittleness of performance optimizations in compilers, knowing that those can be remedied with
sufficient inlining annotations, or perhaps upstream Rust compiler tweaks.

Benchmark
validate-495
validate-995
validate-1495
validate-1995
validate-2245
validate-2495
Average

Opt level 0
C
Rust
1.0 0.6253
1.0 0.6328
1.0 0.6395
1.0 0.6303
1.0 0.6369
1.0 0.6280
1.0 0.6321

Opt level 1
C
Rust
0.0142 0.0168
0.0141 0.0168
0.0143 0.0170
0.0143 0.0170
0.0143 0.0170
0.0142 0.0169
0.0142 0.0169

Opt level 2
C
Rust
0.0127 0.0167
0.0127 0.0168
0.0129 0.0169
0.0129 0.0169
0.0128 0.0170
0.0128 0.0169
0.0128 0.0169

Opt level 3
C
Rust
0.0100 0.0167
0.0100 0.0167
0.0101 0.0169
0.0100 0.0169
0.0101 0.0169
0.0101 0.0168
0.0101 0.0168

Fig. 9. Comparison of optimization levels in C and Rust on CBOR validation. All results are normalized w.r.t.
the C version compiled without optimizations.

Compiler Optimizations. Our comparison of optimization levels is available in Figure 9. We
observe several trends: first, while our translated Rust code outperforms the original C code when
compiled without optimizations, C code compiled with an optimized clang outperforms Rust code
compiled with rustc at a similar optimization level. Interestingly, performance gains also stop on
the Rust side as soon as optimizations are enabled: Rust code compiled with optimization levels 1, 2,
3 has an identical performance for our translation of CBOR binary parsers, suggesting that possible
optimizations might be missed by rustc, and that our earlier forced inlining might trigger those.
6

Related Work

Since this is such a high-value target, many attempts have been made to automatically translate C
to Rust. We now review the landscape of tools and techniques, keeping in mind that to the best of
our knowledge, we are the only tool that sets as a goal to generate safe Rust code, while helping
the programmer rewrite their C source to successfully translate.
Historically, the first widespread tool for C to Rust translation is the aptly-named C2Rust [Galois
and Immunant 2019], which combines earlier attempts Citrus [Lesinski 2017] and Corrode [Sharp
2020]. The tool translates C99 code to unsafe Rust, leveraging Rust’s ability to manipulate raw,
C-like pointers via the use of unsafe. The tool accepts a large subset of C, and envisions a manual
rewriting of the unsafe Rust into safe Rust. Several tools were since then built atop C2Rust, trying
to automate the rewriting process that gradually removes unsafe in the translated Rust code.

20

Aymeric Fromherz and Jonathan Protzenko

Emre et al. [2021] simply borrow-check the output of C2Rust, hijacking the Rust compiler in
order to see if any unsafe raw pointers actually do borrow-check. In the event that they do, the
usage of unsafe is removed, and the raw C pointers get promoted to regular, safe Rust borrows.
The authors tackle programs of the same order of magnitude as us (e.g., libxml2, 200,000 lines of
C), which C2Rust natively supports. Their analysis, however, focuses on functions whose unsafety
comes from usage of raw pointers only, meaning that in libxml2, they rewrite 210 functions out of
3,029, but with a 97% success rate. The rest of the functions remain unsafe.
Zhang et al. [2023] build a custom set of ownership- and mutability-based analyses that operate
post-C2Rust, again trying to limit the amount of unsafe pointers and unsafe code. The tool, dubbed
Crown, achieves greater unsafe reduction rates than Emre et al. [2021].
Emre et al. [2023] later opt for an in-house analysis (rather than reusing the Rust compiler),
which allows for more functions to be successfully translated to safe Rust. This newer analysis may
rewrite programs (rather than simply turn raw pointers into borrows). Still, only a fraction of the
functions eventually make it to safe Rust. An interesting note is that the authors share our issues
with nominal struct types (§7.2), but do not seem to use the “tuple trick” we described earlier.
Ling et al. [2022] pushes the idea further, and uses TXL, a programming language dedicated to
source-to-source transformations, to automate the rewriting of unsafe Rust into safe Rust even
more. The authors introduce “semantics-approximating” rules, which ultimately allow them to
convert a large fraction of unsafe functions into safe ones, although the preservation of semantics
does not seem to be guaranteed. Once again, this is seen as a stepping stone for engineers to tackle
the migration of a codebase, rather than a fully automated “fire and forget” translation approach.
All of the works above aim to translate first into almost fully-unsafe Rust, then either manually
or automatically rewrite the code to pull it out of the unsafe subset, reaching for partial safety. We
advocate for a different approach: iterate on the C code until it successfully translates, keeping
existing integration, unit testing, or proofs to assess the validity and/or correctness of the rewrites.
Then, once the rewrite fits within the subset accepted by our translation, proceed, and obtain fully
safe code. In addition, we make a formal claim as to the correctness of our approach.
A separate line of work takes a more focused approach and studies specific patterns that occur
in C and tries to translate them to idiomatic Rust – this is a concern that goes beyond mere safety
and correctness. This line of work still builds atop C2Rust. Problems include: the Posix Lock API of
C, and how to translate it to Rust’s native lock and ownership semantics [Hong and Ryu 2023];
how to replace output parameters with more Rust-native return values and algebraic data types,
using an abstract interpretation analysis to infer what constitutes an output parameter [Hong 2023;
Hong and Ryu 2024a]; or reconstructing Rust enumerations from C tagged unions [Hong and Ryu
2024b]. Again, these all partially remove unsafe; furthermore, these target a more systems-oriented
subset of C, rather than the data-oriented subset we tackle.
With the explosion in popularity of LLMs, translating C to Rust with an LLM was inevitable.
[Pan et al. 2024] find that while GPT-4 generates code that is more idiomatic than C2Rust, only 61%
of it is correct (i.e., compiles and produces the expected result), compared to 95% for C2Rust. Hong
and Ryu [2025] investigate how to mitigate errors introduced by LLMs in the translation process.
These works tackle another problem area, where possibly-faulty translations might be acceptable.
We focus instead on critical applications, notably verified ones, and where having confidence in
the correctness of the translation, and in the safety of the produced Rust code, is paramount.
Focusing on C++ instead, CRAM [GrammaTech 2024] advocates for an approach similar in
spirit to ours: rewrite the source C++ code to use abstractions that encode Rust’s borrow-checking
discipline using advanced C++ templates and type-system features; then, once the code has been
sufficiently refactored, translate it to Rust. Details on CRAM are scarce, as the tool appears to be

Compiling C to Safe Rust, Formalized

21

closed-source, and no publications are available. The translation is described as “provably safe”,
which does not conclusively indicate whether the produced code uses unsafe or not.
Acknowledgments
We thank Guillaume Boisseau and Son Ho for several discussions about the Rust semantics and
borrow-checker, Tahina Ramananandro for invaluable help with understanding EverParse and the
CBOR-DET implementation, and Franziskus Kiefer for integrating our work into libcrux. This work
benefited from funding managed by the French National Research Agency under the France 2030
programme with the reference ANR-22-PECY-0006.
References
Sidney Amani, Alex Hixon, Zilin Chen, Christine Rizkallah, Peter Chubb, Liam O’Connor, Joel Beeren, Yutaka Nagashima,
Japheth Lim, Thomas Sewell, Joseph Tuong, Gabriele Keller, Toby Murray, Gerwin Klein, and Gernot Heiser. 2016. Cogent:
Verifying High-Assurance File System Implementations. Association for Computing Machinery, New York, NY, USA,
175–188. https://doi.org/10.1145/2872362.2872404
Andrew W. Appel. 2011. Verified Software Toolchain. In Proceedings of the European Conference on Programming Languages
and Systems (ESOP). https://doi.org/10.1007/978-3-642-19718-5_1
Andrew W. Appel. 2015. Verification of a Cryptographic Primitive: SHA-256. In ACM Transactions on Programming Languages
and Systems (TOPLAS).
arkworks contributors. 2022. arkworks zkSNARK ecosystem. https://arkworks.rs
Linard Arquint, Malte Schwerhoff, Vaibhav Mehta, and Peter Müller. 2023. A generic methodology for the modular
verification of security protocol implementations. In Proceedings of the ACM Conference on Computer and Communications
Security (CCS). 1377–1391.
Karthikeyan Bhargavan, Cédric Fournet, and Markulf Kohlweiss. 2016. miTLS: Verifying protocol implementations against
real-world attacks. In Proceedings of the IEEE Symposium on Security and Privacy (S&P).
Jenny Blessing, Michael A Specter, and Daniel J Weitzner. 2024. Cryptography in the Wild: An Empirical Analysis of
Vulnerabilities in Cryptographic Libraries. In ACM Asia Conference on Computer and Communications Security (ASIA
CCS). 605–620.
The Chromium Project. 2020. Memory Safety. https://www.chromium.org/Home/chromium-security/memory-safety/.
Cloudflare. 2019. BoringTun: Userspace WireGuard implementation in Rust. https://github.com/cloudflare/boringtun.
Kees Cook. 2022. [GIT PULL] Rust introduction for v6.1-rc1. https://lore.kernel.org/lkml/202210010816.1317F2C@keescook/.
Antoine Delignat-Lavaud, Cédric Fournet, Bryan Parno, Jonathan Protzenko, Tahina Ramananandro, Jay Bosamiya, Joseph
Lallemand, Itsaka Rakotonirina, and Yi Zhou. 2021. A Security Model and Fully Verified Implementation for the IETF
QUIC Record Layer. In Proceedings of the IEEE Symposium on Security and Privacy (S&P).
Xavier Denis, Jacques-Henri Jourdan, and Claude Marché. 2022. Creusot: a foundry for the deductive verification of rust
programs. In International Conference on Formal Engineering Methods. Springer, 90–105.
Robert Dockins, Adam Foltzer, Joe Hendrix, Brian Huffman, Dylan McNamee, and Aaron Tomb. 2016. Constructing Semantic
Models of Programs with the Software Analysis Workbench. In Proceedings of the International Conference on Verified
Software: Theories, Tools, Experiments (VSTTE).
Zakir Durumeric, Frank Li, James Kasten, Johanna Amann, Jethro Beekman, Mathias Payer, Nicolas Weaver, David Adrian,
Vern Paxson, Michael Bailey, and J. Alex Halderman. 2014. The Matter of Heartbleed. In Proceedings of the ACM SIGCOMM
Conference on Internet Measurement (IMC).
Mehmet Emre, Peter Boyland, Aesha Parekh, Ryan Schroeder, Kyle Dewey, and Ben Hardekopf. 2023. Aliasing limits on
translating C to Safe Rust. Proceedings of the ACM on Programming Languages 7, OOPSLA1 (2023), 551–579.
Mehmet Emre, Ryan Schroeder, Kyle Dewey, and Ben Hardekopf. 2021. Translating C to safer Rust. Proceedings of the ACM
on Programming Languages 5, OOPSLA (2021), 1–29.
Andres Erbsen, Jade Philipoom, Jason Gross, Robert Sloan, and Adam Chlipala. 2019. Simple High-Level Code for Cryptographic Arithmetic - With Proofs, Without Compromises. In Proceedings of the IEEE Symposium on Security and Privacy
(S&P).
Andres Erbsen, Jade Philipoom, Dustin Jamner, Ashley Lin, Samuel Gruetter, Clément Pit-Claudel, and Adam Chlipala.
2024. Foundational Integration Verification of a Cryptographic Server. In Proceedings of the Conference on Programming
Language Design and Implementation (PLDI).
Galois and Immunant. 2019. C2Rust: Migrating Legacy Code to Rust. https://github.com/immunant/c2rust.
GrammaTech. 2024. CRAM: C++ to Rust Assisted Migration. https://www.grammatech.com/cyber-security-solutions/
migration-to-memory-safe-code/

22

Aymeric Fromherz and Jonathan Protzenko

David Greenaway, June Andronick, and Gerwin Klein. 2012. Bridging the gap: Automatic verified abstraction of C. In
Proceedings of the International Conference on Interactive Theorem Proving (ITP). Springer, 99–115.
Son Ho, Aymeric Fromherz, and Jonathan Protzenko. 2023. Modularity, Code Specialization, and Zero-Cost Abstractions
for Program Verification. In Proceedings of the International Conference on Functional Programming (ICFP). https:
//doi.org/10.1145/3607844
Son Ho and Jonathan Protzenko. 2022. Aeneas: Rust verification by functional translation. Proceedings of the ACM on
Programming Languages 6, ICFP (2022), 711–741. https://doi.org/10.1145/3547647
Jaemin Hong. 2023. Improving Automatic C-to-Rust Translation with Static Analysis. In 2023 IEEE/ACM 45th International
Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE, 273–277.
Jaemin Hong and Sukyoung Ryu. 2023. Concrat: An automatic C-to-Rust lock API translator for concurrent programs. In
2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 716–728.
Jaemin Hong and Sukyoung Ryu. 2024a. Don’t Write, but Return: Replacing Output Parameters with Algebraic Data Types
in C-to-Rust Translation. Proceedings of the ACM on Programming Languages 8, PLDI (2024), 716–740.
Jaemin Hong and Sukyoung Ryu. 2024b. To Tag, or Not to Tag: Translating C’s Unions to Rust’s Tagged Unions. arXiv
preprint arXiv:2408.11418 (2024).
Jaemin Hong and Sukyoung Ryu. 2025. Type-migrating C-to-Rust translation using a large language model. Empirical
Software Engineering 30, 1 (2025), 3.
Internet Engineering Task Force. 2020. Concise Binary Object Representation (CBOR) - RFC 8949. https://datatracker.ietf.
org/doc/rfc8949/.
Daniel Kästner, Jörg Barrho, Ulrich Wünsche, Marc Schlickling, Bernhard Schommer, Michael Schmidt, Christian Ferdinand,
Xavier Leroy, and Sandrine Blazy. 2018. CompCert: Practical experience on integrating and qualifying a formally verified
optimizing compiler. In Proceedings of the European Congress on Embedded Real Time Software and Systems (ERTS).
Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai
Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. 2009. seL4: Formal
Verification of an OS Kernel. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP).
Andrea Lattuada, Travis Hance, Chanhee Cho, Matthias Brun, Isitha Subasinghe, Yi Zhou, Jon Howell, Bryan Parno, and Chris
Hawblitzel. 2023. Verus: Verifying Rust Programs using Linear Ghost Types. In Proceedings of the ACM SIGPLAN Conference
on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA). https://doi.org/10.1145/3586037
Kornel Lesinski. 2017. Citrus: Convert C to Rust. https://gitlab.com/citrus-rs/citrus.
Michael Ling, Yijun Yu, Haitao Wu, Yuan Wang, James R. Cordy, and Ahmed E. Hassan. 2022. In rust we trust: a transpiler
from unsafe C to safer rust (ICSE ’22). Association for Computing Machinery, New York, NY, USA, 354–355. https:
//doi.org/10.1145/3510454.3528640
Jiaxiang Liu, Xiaomu Shi, Ming-Hsien Tsai, Bow-Yaw Wang, and Bo-Yin Yang. 2019. Verifying arithmetic in cryptographic
C programs. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE,
552–564.
Alana Marzoev. 2022. How much does Rust’s bounds checking actually cost? https://readyset.io/blog/bounds-checks.
Mozilla Hacks. 2024. Porting a cross-platform GUI application to Rust. https://hacks.mozilla.org/2024/04/porting-a-crossplatform-gui-application-to-rust/.
The MSRC Team. 2019. A Proactive Approach to More Secure Code. https://msrc.microsoft.com/blog/2019/07/a-proactiveapproach-to-more-secure-code/.
National Security Agency. 2022. Software Memory Safety. https://media.defense.gov/2022/Nov/10/2003112742/-1/-1/0/CSI_
SOFTWARE_MEMORY_SAFETY.PDF.
National Vulnerability Database. 2014. Heartbleed bug. CVE-2014-0160 http://web.nvd.nist.gov/view/vuln/detail?vulnId=
CVE-2014-0160.
Rangeet Pan, Ali Reza Ibrahimzada, Rahul Krishna, Divya Sankar, Lambert Pouguem Wassi, Michele Merler, Boris Sobolev,
Raju Pavuluri, Saurabh Sinha, and Reyhaneh Jabbarvand. 2024. Lost in translation: A study of bugs introduced by
large language models while translating code. In Proceedings of the IEEE/ACM 46th International Conference on Software
Engineering. 1–13.
Marina Polubelova, Karthikeyan Bhargavan, Jonathan Protzenko, Benjamin Beurdouche, Aymeric Fromherz, Natalia
Kulatova, and Santiago Zanella-Béguelin. 2020. HACLxN: Verified Generic SIMD Crypto (for All Your Favourite
Platforms). In Proceedings of the ACM Conference on Computer and Communications Security (CCS).
François Pottier. 2009. Lazy least fixed points in ML. Unpublished. Manuscript available at http://gallium. inria. fr/˜
fpottier/publis/fpottier-fix. pdf (2009).
Jonathan Protzenko, Bryan Parno, Aymeric Fromherz, Chris Hawblitzel, Marina Polubelova, Karthikeyan Bhargavan,
Benjamin Beurdouche, Joonwon Choi, Antoine Delignat-Lavaud, Cédric Fournet, Natalia Kulatova, Tahina Ramananandro,
Aseem Rastogi, Nikhil Swamy, Christoph M. Wintersteiger, and Santiago Zanella-Béguelin. 2020. EverCrypt: A Fast,
Verified, Cross-Platform Cryptographic Provider. In Proceedings of the IEEE Symposium on Security and Privacy (S&P).

Compiling C to Safe Rust, Formalized

23

https://doi.org/10.1109/SP40000.2020.00114
Jonathan Protzenko, Jean-Karim Zinzindohoué, Aseem Rastogi, Tahina Ramananandro, Peng Wang, Santiago ZanellaBéguelin, Antoine Delignat-Lavaud, Catalin Hritcu, Karthikeyan Bhargavan, Cédric Fournet, and Nikhil Swamy. 2017.
Verified Low-Level Programming Embedded in F*. In Proceedings of the International Conference on Functional Programming
(ICFP).
Tahina Ramananandro, Antoine Delignat-Lavaud, Cédric Fournet, Nikhil Swamy, Tej Chajed, Nadim Kobeissi, and Jonathan
Protzenko. 2019. EverParse: Verified Secure Zero-Copy Parsers for Authenticated Message Formats.. In Proceedings of the
USENIX Security Symposium.
Tobias Reiher, Alexander Senier, Jeronimo Castrillon, and Thorsten Strufe. 2019. RecordFlux: Formal Message Specification
and Generation of Verifiable Binary Parsers. In Formal Aspects of Component Software: 16th International Conference,
FACS 2019, Amsterdam, The Netherlands, October 23–25, 2019, Proceedings (Amsterdam, The Netherlands). Springer-Verlag,
Berlin, Heidelberg, 170–190. https://doi.org/10.1007/978-3-030-40914-2_9
Rustls Contributors. 2016. rustls: A Modern TLS library in Rust. https://github.com/rustls/rustls.
Jamey Sharp. 2020. Corrode: Automatic semantics-preserving translation from C to Rust. https://github.com/jameysharp/
corrode.
Signal Messenger. 2020. libsignal-protocol-rust. https://github.com/signalapp/libsignal-protocol-rust.
StackOverflow. 2023. 2023 Developer Survey. https://survey.stackoverflow.co/2023/#section-admired-and-desiredprogramming-scripting-and-markup-languages.
Nikhil Swamy, Tahina Ramananandro, Aseem Rastogi, Irina Spiridonova, Haobin Ni, Dmitry Malloy, Juan Vazquez, Michael
Tang, Omar Cardona, and Arti Gupta. 2022. Hardening attack surfaces with formally proven binary format parsers. In
Proceedings of the Conference on Programming Language Design and Implementation (PLDI).
The Register. 2023. Microsoft is busy rewriting core Windows code in memory-safe Rust. https://www.theregister.com/
2023/04/27/microsoft_windows_rust/.
The White House. 2024. Back to the Building Blocks: a Path Toward Secure and Measurable Software. https://www.
whitehouse.gov/wp-content/uploads/2024/02/Final-ONCD-Technical-Report.pdf.
Jeff Vander Stoep and Alex Rebert. 2024. Eliminating Memory Safety Vulnerabilities at the Source. https://security.googleblog.
com/2024/09/eliminating-memory-safety-vulnerabilities-Android.html.
Sara Verdi. 2023. Why Rust is the most admired language among developers. https://github.blog/2023-08-30-why-rust-isthe-most-admired-language-among-developers/.
ZeroEx. 2019. OpenZKP. https://github.com/0xProject/OpenZKP.
Hanliang Zhang, Cristina David, Yijun Yu, and Meng Wang. 2023. Ownership guided C to Rust translation. In International
Conference on Computer Aided Verification. Springer, 459–482.