Memory Management & Safety
One of the most important aspects of writing Ruby extensions is proper memory management. This chapter covers how Ruby's garbage collector interacts with Rust objects and how to ensure your extensions don't leak memory or cause segmentation faults.
Improper memory management is the leading cause of crashes and security vulnerabilities in native extensions. Rust's safety guarantees help prevent many common issues, but you still need to carefully manage the boundary between Ruby and Rust memory.
Ruby's Garbage Collection System
Ruby uses a mark-and-sweep garbage collector to manage memory. Understanding how it works is essential for writing safe extensions:
- Marking Phase: Ruby traverses all visible objects, marking them as "in use"
- Sweeping Phase: Objects that weren't marked are considered garbage and freed
When you create Rust objects that reference Ruby objects, you need to tell Ruby's GC about these references to prevent
premature garbage collection. The TypedData
trait and mark
method provide the mechanism to do this.
TypedData and DataTypeFunctions
Magnus provides a TypedData
trait and DataTypeFunctions
trait for managing Ruby objects that wrap Rust structs. This
is the recommended way to handle complex objects in Rust.
Basic TypedData Example
Here's how to define a simple Ruby object that wraps a Rust struct:
#![allow(unused)] fn main() { use magnus::{prelude::*, Error, Ruby, TypedData, DataTypeFunctions}; // Define your Rust struct #[derive(TypedData)] #[magnus(class = "MyExtension::Counter", free_immediately)] struct Counter { count: i64, } // Implement required functions impl DataTypeFunctions for Counter {} // Implement methods for your struct impl Counter { fn new(initial_value: i64) -> Self { Counter { count: initial_value } } fn increment(&mut self, amount: i64) -> i64 { self.count += amount; self.count } fn value(&self) -> i64 { self.count } } // Register with Ruby fn init(ruby: &Ruby) -> Result<(), Error> { let class = ruby.define_class("Counter", ruby.class_object())?; class.define_singleton_method("new", function!(|initial: i64| { Ok(class.wrap(Counter::new(initial))) }, 1))?; class.define_method("increment", method!(Counter::increment, 1))?; class.define_method("value", method!(Counter::value, 0))?; Ok(()) } }
Implementing GC Marking
When your Rust struct holds references to Ruby objects, you need to implement the mark
method to tell Ruby's GC about
those references. Here's a simple example:
#![allow(unused)] fn main() { use magnus::{ prelude::*, Error, Ruby, Value, TypedData, DataTypeFunctions, gc::Marker, typed_data::Obj }; // A struct that holds references to Ruby objects #[derive(TypedData)] #[magnus(class = "MyExtension::Person", free_immediately, mark)] struct Person { // Reference to a Ruby string (their name) name: Value, // Reference to Ruby array (their hobbies) hobbies: Value, // Optional reference to another Person (their friend) friend: Option<Obj<Person>>, } // Implement DataTypeFunctions with mark method impl DataTypeFunctions for Person { // This is called during GC mark phase fn mark(&self, marker: &Marker) { // Mark the Ruby objects we reference marker.mark(self.name); marker.mark(self.hobbies); // If we have a friend, mark them too if let Some(ref friend) = self.friend { marker.mark(*friend); } } } impl Person { fn new(name: Value, hobbies: Value) -> Self { Self { name, hobbies, friend: None, } } fn add_friend(&mut self, friend: Obj<Person>) { self.friend = Some(friend); } fn name(&self) -> Value { self.name } fn hobbies(&self) -> Value { self.hobbies } fn friend(&self) -> Option<Obj<Person>> { self.friend.clone() } } // Register with Ruby fn init(ruby: &Ruby) -> Result<(), Error> { let class = ruby.define_class("Person", ruby.class_object())?; class.define_singleton_method("new", function!(|name: Value, hobbies: Value| { Ok(class.wrap(Person::new(name, hobbies))) }, 2))?; class.define_method("name", method!(Person::name, 0))?; class.define_method("hobbies", method!(Person::hobbies, 0))?; class.define_method("friend", method!(Person::friend, 0))?; class.define_method("add_friend", method!(Person::add_friend, 1))?; Ok(()) } }
In this example:
- The
Person
struct holds references to Ruby objects (name
andhobbies
) and another wrapped Rust object (friend
) - We implement the
mark
method to tell Ruby's GC about all these references - During garbage collection, Ruby will know not to collect these objects as long as the
Person
is alive
A Real-World Example: Trap from wasmtime-rb
Here's a slightly simplified version of a real-world example from the wasmtime-rb project:
#![allow(unused)] fn main() { use magnus::{ prelude::*, method, Error, Ruby, TypedData, DataTypeFunctions, typed_data::Obj, Symbol }; // A struct representing a WebAssembly trap (error) #[derive(TypedData)] #[magnus(class = "Wasmtime::Trap", size, free_immediately)] pub struct Trap { trap: wasmtime::Trap, wasm_backtrace: Option<wasmtime::WasmBacktrace>, } // No references to Ruby objects, so mark is empty impl DataTypeFunctions for Trap {} impl Trap { pub fn new(trap: wasmtime::Trap, wasm_backtrace: Option<wasmtime::WasmBacktrace>) -> Self { Self { trap, wasm_backtrace, } } // Return a text description of the trap error pub fn message(&self) -> String { self.trap.to_string() } // Return the wasm backtrace if available pub fn wasm_backtrace_message(&self) -> Option<String> { self.wasm_backtrace.as_ref().map(|bt| format!("{bt}")) } // Return the trap code as a Ruby symbol pub fn code(&self) -> Result<Option<Symbol>, Error> { match self.trap { wasmtime::Trap::StackOverflow => Ok(Some(Symbol::new("STACK_OVERFLOW"))), wasmtime::Trap::MemoryOutOfBounds => Ok(Some(Symbol::new("MEMORY_OUT_OF_BOUNDS"))), // More cases... _ => Ok(Some(Symbol::new("UNKNOWN"))), } } // Custom inspect method pub fn inspect(rb_self: Obj<Self>) -> Result<String, Error> { Ok(format!( "#<Wasmtime::Trap:0x{:016x} @trap_code={}>", rb_self.as_raw(), rb_self.code()?.map_or("nil".to_string(), |s| s.to_string()) )) } } // Register with Ruby pub fn init(ruby: &Ruby) -> Result<(), Error> { let class = ruby.define_class("Trap", ruby.class_object())?; class.define_method("message", method!(Trap::message, 0))?; class.define_method("wasm_backtrace_message", method!(Trap::wasm_backtrace_message, 0))?; class.define_method("code", method!(Trap::code, 0))?; class.define_method("inspect", method!(Trap::inspect, 0))?; Ok(()) } }
This example shows:
- A Rust struct that wraps WebAssembly-specific types
- Methods that convert Rust values to Ruby-friendly types
- A simple implementation of
DataTypeFunctions
(since there are no Ruby object references to mark)
More Complex Example: Memory References
Let's look at a more complex scenario involving memory management:
#![allow(unused)] fn main() { use magnus::{ prelude::*, gc::Marker, Error, Ruby, TypedData, DataTypeFunctions, typed_data::Obj, value::Opaque, RString }; // Represents WebAssembly memory #[derive(TypedData)] #[magnus(class = "Wasmtime::Memory", free_immediately, mark)] struct Memory { // Reference to a store (context) object store: StoreContext, // The actual WebAssembly memory memory: WasmMemory, } impl DataTypeFunctions for Memory { fn mark(&self, marker: &Marker) { // Mark the store so it stays alive self.store.mark(marker); } } // A guard that ensures memory access is safe struct MemoryGuard { // Reference to Memory object memory: Opaque<Obj<Memory>>, // Size when created, to detect resizing original_size: u64, } impl MemoryGuard { fn new(memory: Obj<Memory>) -> Result<Self, Error> { let original_size = memory.size()?; Ok(Self { memory: memory.into(), original_size, }) } fn get(&self) -> Result<&Memory, Error> { let ruby = Ruby::get().unwrap(); let mem = ruby.get_inner_ref(&self.memory); // Check that memory size hasn't changed if mem.size()? != self.original_size { return Err(Error::new( magnus::exception::runtime_error(), "memory was resized, reference is no longer valid" )); } Ok(mem) } fn mark(&self, marker: &Marker) { marker.mark(self.memory); } } // A slice of WebAssembly memory #[derive(TypedData)] #[magnus(class = "Wasmtime::MemorySlice", free_immediately, mark)] struct MemorySlice { guard: MemoryGuard, offset: usize, size: usize, } impl DataTypeFunctions for MemorySlice { fn mark(&self, marker: &Marker) { // Mark the memory guard, which marks the memory object self.guard.mark(marker); } } impl MemorySlice { fn new(memory: Obj<Memory>, offset: usize, size: usize) -> Result<Self, Error> { let guard = MemoryGuard::new(memory)?; // Validate the slice is in bounds let mem = guard.get()?; if offset + size > mem.data_size()? { return Err(Error::new( magnus::exception::range_error(), "memory slice out of bounds" )); } Ok(Self { guard, offset, size, }) } // Read the slice as a Ruby string (efficiently, without copying) fn to_str(&self) -> Result<RString, Error> { let ruby = unsafe { Ruby::get_unchecked() }; let mem = self.guard.get()?; let data = mem.data()?; // Extract the relevant slice let slice = &data[self.offset..self.offset + self.size]; // Create a Ruby string directly from the slice (zero-copy) Ok(ruby.str_from_slice(slice)) } // Read the slice as a UTF-8 string (with validation) fn to_utf8_str(&self) -> Result<RString, Error> { let ruby = unsafe { Ruby::get_unchecked() }; let mem = self.guard.get()?; let data = mem.data()?; // Extract the relevant slice let slice = &data[self.offset..self.offset + self.size]; // Validate UTF-8 and create a Ruby string match std::str::from_utf8(slice) { Ok(s) => Ok(RString::new(s)), Err(e) => Err(Error::new( magnus::exception::encoding_error(), format!("invalid UTF-8: {}", e) )) } } } }
This more advanced example demonstrates:
- Guarded Resource Access: The
MemoryGuard
ensures memory operations are safe by checking for resizing - Proper GC Integration: Both structs implement marking to ensure referenced objects aren't collected
- Efficient String Creation: Using
str_from_slice
to create strings directly from memory without extra copying - Error Handling: All operations that might fail return meaningful errors
- Resource Validation: The code validates bounds before accessing memory
Common Memory Management Pitfalls
These pitfalls can lead to crashes, memory leaks, or undefined behavior in your Ruby extensions. Understanding and avoiding them is crucial for writing reliable code.
1. Forgetting to Mark References
If your Rust struct holds Ruby objects but doesn't implement marking, those objects might be collected while still in use:
Click the eye icon () to see an additional example of marking multiple references in a more complex struct.
2. Creating Cyclic References
Cyclic references (A references B, which references A) can lead to memory leaks. Consider using weak references or redesigning your object graph.
3. Inefficient String Creation
String handling is often a performance bottleneck in Ruby extensions. Using the right APIs can significantly improve performance.
Creating strings inefficiently can significantly impact performance:
Both memory usage and performance are significantly improved by avoiding unnecessary allocations and copies. The eye icon () reveals additional efficient string handling examples.
4. Not Handling Exceptions Properly
Ruby exceptions can disrupt the normal flow of your code. Ensure resources are cleaned up even when exceptions occur.
RefCell and Interior Mutability
When creating Ruby objects with Rust, you'll often need to use interior mutability patterns. The most common approach is
using RefCell
to allow your Ruby objects to be mutated even when users hold immutable references to them.
Understanding RefCell and Borrowing
Rust's RefCell
allows mutable access to data through shared references, but enforces Rust's borrowing rules at
runtime. This is perfect for Ruby extension objects, where Ruby owns the object and we interact with it via method
calls.
A common pattern is to wrap your Rust struct in a RefCell
:
#![allow(unused)] fn main() { use std::cell::RefCell; use magnus::{prelude::*, Error, Ruby}; struct Counter { count: i64, } #[magnus::wrap(class = "MyExtension::Counter")] struct MutCounter(RefCell<Counter>); impl MutCounter { fn new(initial: i64) -> Self { Self(RefCell::new(Counter { count: initial })) } fn count(&self) -> i64 { self.0.borrow().count } fn increment(&self) -> i64 { let mut counter = self.0.borrow_mut(); counter.count += 1; counter.count } } }
The BorrowMutError Problem
A common mistake when using RefCell
is trying to borrow mutably when you already have an active immutable borrow. This
leads to a BorrowMutError
panic:
#![allow(unused)] fn main() { // BAD - will panic with "already borrowed: BorrowMutError" fn buggy_add(&self, val: i64) -> Result<i64, Error> { // First borrow is still active when we try to borrow_mut below if let Some(sum) = self.0.borrow().count.checked_add(val) { self.0.borrow_mut().count = sum; // ERROR - already borrowed above Ok(sum) } else { Err(Error::new( ruby.exception_range_error(), "result too large" )) } } }
The problem is that the borrow()
in the if
condition is still active when we try to use borrow_mut()
in the body.
Rust's borrow checker would catch this at compile time for normal references, but RefCell
defers this check to
runtime, resulting in a panic.
The Solution: Complete Borrows Before Mutating
The solution is to complete all immutable borrows before starting mutable ones:
#![allow(unused)] fn main() { // GOOD - copy the value first to complete the borrow fn safe_add(&self, val: i64) -> Result<i64, Error> { // Get the current count, completing this borrow let current_count = self.0.borrow().count; // Now we can safely borrow mutably if let Some(sum) = current_count.checked_add(val) { self.0.borrow_mut().count = sum; // Safe now Ok(sum) } else { Err(Error::new( ruby.exception_range_error(), "result too large" )) } } }
By copying count
to a local variable, we complete the immutable borrow before starting the mutable one, avoiding the
runtime panic.
Complex Example with Multiple Operations
When working with more complex data structures:
#![allow(unused)] fn main() { struct Game { players: Vec<String>, current_player: usize, score: i64, } #[magnus::wrap(class = "MyGame")] struct MutGame(RefCell<Game>); impl MutGame { fn new() -> Self { Self(RefCell::new(Game { players: Vec::new(), current_player: 0, score: 0, })) } // INCORRECT: Multiple borrows that will cause issues fn buggy_next_player_scores(&self, points: i64) -> Result<String, Error> { let game = self.0.borrow(); if game.players.is_empty() { return Err(Error::new( magnus::exception::runtime_error(), "No players in game" )); } // This would panic - we're still borrowing game let mut game_mut = self.0.borrow_mut(); game_mut.score += points; let player = game_mut.current_player; game_mut.current_player = (player + 1) % game_mut.players.len(); Ok(format!("{} scored {} points! New total: {}", game_mut.players[player], points, game_mut.score)) } // CORRECT: Copy all needed data before releasing the borrow fn safe_next_player_scores(&self, points: i64) -> Result<String, Error> { // Read all the data we need first let player_name: String; let new_player_index: usize; let new_score: i64; { // Create a block scope to ensure the borrow is dropped let game = self.0.borrow(); if game.players.is_empty() { return Err(Error::new( magnus::exception::runtime_error(), "No players in game" )); } player_name = game.players[game.current_player].clone(); new_player_index = (game.current_player + 1) % game.players.len(); new_score = game.score + points; } // borrow is dropped here // Now we can borrow mutably let mut game = self.0.borrow_mut(); game.score = new_score; game.current_player = new_player_index; Ok(format!("{} scored {} points! New total: {}", player_name, points, new_score)) } } }
Using Temporary Variables Instead of Block Scopes
If you prefer, you can use temporary variables instead of block scopes:
#![allow(unused)] fn main() { fn add_player(&self, player: String) -> Result<usize, Error> { // Get the current number of players first let player_count = self.0.borrow().players.len(); // Now we can mutate let mut game = self.0.borrow_mut(); game.players.push(player); Ok(player_count + 1) // Return new count } }
RefCell Best Practices
-
Complete All Borrows: Always complete immutable borrows before starting mutable borrows.
-
Use Block Scopes or Variables: Either use block scopes to limit borrow lifetimes or copy needed values to local variables.
-
Minimize Borrow Scope: Keep the scope of borrows as small as possible.
-
Clone When Necessary: If you need to keep references to data while mutating other parts, clone the data you need to keep.
-
Consider Data Design: Structure your data to minimize the need for complex borrowing patterns.
-
Error When Conflicting: If you can't resolve a borrowing conflict cleanly, make the operation an error rather than trying to force it.
Best Practices
- Use TypedData and DataTypeFunctions: They provide a safe framework for memory management
- Always Implement Mark Methods: Mark all Ruby objects your struct references
- Validate Assumptions: Check that resources are valid before using them
- Use Zero-Copy APIs: Leverage APIs like
str_from_slice
to avoid unnecessary copying - Use Guards for Changing Data: Validate assumptions before accessing data that might change
- Test Thoroughly with GC Stress: Run tests with
GC.stress = true
to expose memory issues - Handle RefCell Borrowing Carefully: Complete all immutable borrows before starting mutable ones to avoid runtime panics
By following these practices, you can write Ruby extensions in Rust that are both memory-safe and efficient.
Next Steps
In the next chapter, we'll explore performance optimization techniques that leverage Rust's strengths while maintaining memory safety.