User Space Primitive Documentation

Introduction

This documentation is largely still a collection of notes based off the previous C-based Twizzler system, and is being updated to reflect new APIs.

Where to begin

Twizzler introduces objects to organize persistent data, rather than files in traditional systems. This provides the benefit of not having to serialize and deserialize data to make it persistent.

Pages explaining the main abstractions of the OS are available at the following links: Objects (for the main data abstraction), Views (for thread environments), and Kernel State Objects (the security model). From these basics, there are a number of features provided by the Twizzler userspace that can be used to enhance programs, but are not necessary for understanding the fundamentals of the OS.

Building Twizzler

A bit of a time consuming process the first time, so make sure you have some nice tea or something before you start :)

Requirements

This build process has been tested on an Ubuntu 20.04 system with standard development tools installed, in addition to rustup (which is required). We require rustup because we will build our own toolchain during the build, and link the result through rustup for easier invocation of the Rust compiler.

To build a boot image, you'll need the limine bootloader installed. In particular, we need the EFI code to help boot Twizzler through their boot protocol.

To run qemu through the build system, you'll need qemu installed.

Overview

Installing the tools:

  1. sudo apt install build-essential
  2. sudo apt install python
  3. sudo apt install cmake
  4. sudo apt install ninja-build
  5. Install Rust https://www.rust-lang.org/tools/install

Building Twizzler is done in several steps:

  0. Building xtask.
  1. Building the toolchain.
  2. Building Twizzler itself.

Fortunately, step 0 is handled automatically whenever we try to do anything. That's because xtask is the "build system orchestrator". Essentially, building Twizzler requires using the right toolchain, target specification, and compile flags at the right times, so we've placed that complexity in an automation tool to make builds easier. To get an idea of what xtask is doing, you can run cargo xtask --help. Note that this repo's cargo config provides aliases for the common commands, as we will see below. In fact, it's advisable to NOT use the default cargo commands, and instead run everything through xtask.

Step 1: Building the Toolchain

This step takes the longest, but only has to happen once. Run

cd where/you/cloned/twizzler
cargo bootstrap

and then wait, while you sip your tea. This will compile LLVM and bootstrap the Rust compiler, both of which take a long time. At the end, you should see a "build completed successfully" message, followed by a few lines about building crti and friends. This process ends by linking the toolchain to rustup, which you can verify by running rustup show; it should list twizzler under the installed toolchains.

Note that this use of rustup means that there can only be one active twizzler toolchain at a time.

Step 2: Building Twizzler

Now that we've got the toolchain built and linked, we can compile the rest of Twizzler. Run

cargo build-all

which will compile several "collections" of packages:

  1. The build tools, for things like making the initrd.
  2. The kernel.
  3. The userspace applications.

By default, everything is built in debug mode, which runs very slowly. You can build in release mode with:

cargo build-all --profile release

Step 3: Running Twizzler

You can start Twizzler in Qemu by running

cargo start-qemu

which will boot up a QEMU instance. If you want to run the release mode version, you can run

cargo start-qemu --profile release

Step 4: Exiting Twizzler

At the moment, Twizzler does not have a shutdown command. To exit the QEMU-based simulation, use the Ctrl-a X command, which is part of the simulator.

Objects

Definition

Objects are an abstraction of a set of related data with the same lifetime and permissions. This vague definition allows for applications to define what data is contained in a single object in a way that is most reasonable for the particular use case. For example, a B-Tree could contain all nodes in the same object given that the nodes likely have the same permissions and lifetime. However, another tree with different permissions for children could separate these nodes into different objects. For this second example, managing their lifetime can be done with ties.

Kernel interposition is only done when creating and deleting objects, leaving access and modification to userspace facilities and hardware. Access control is limited to specifying policies and letting hardware enforce those policies. This allows the kernel to avoid involvement in access, improving performance without sacrificing security. Objects maintain a reference count to prevent deletion of object data when multiple pointers reference it.

Object Creation

When creating objects, the medium storing the data can be chosen, such as volatile DRAM or non-volatile memory. While these options are supported by default, other types can be configured based on the hardware support of the particular machine. Different storage media provide different benefits and costs; a more in-depth discussion can be found in the Object Lifetime section.

When creating objects, a source object can be denoted, where the new object will be a copy of the original. This allows for easy versioning, as objects can be copied and kept as different versions. Copying an object uses copy-on-write, meaning another copy of the data is only created when a change is made, rather than immediately on creation.
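The copy-on-write behavior described above can be illustrated with a minimal sketch that uses Rust's standard reference-counted pointer. This is only an analogy: ObjectData, new_from_source, and write_byte are hypothetical names, not Twizzler APIs, and real objects are copied by the kernel rather than by Rc.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for an object's contents; not a Twizzler type.
#[derive(Clone, Debug, PartialEq)]
struct ObjectData {
    bytes: Vec<u8>,
}

/// "Create" a new object from a source: only the reference is duplicated,
/// so both versions share the same underlying data.
fn new_from_source(src: &Rc<ObjectData>) -> Rc<ObjectData> {
    Rc::clone(src)
}

/// Write one byte. `Rc::make_mut` clones the data first if it is still
/// shared, so the source object is never modified through the copy.
fn write_byte(obj: &mut Rc<ObjectData>, index: usize, value: u8) {
    Rc::make_mut(obj).bytes[index] = value;
}
```

Until the first write, the copy and its source point at the same data; the write triggers the actual copy, which is what makes keeping many versions cheap.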

IDs

While 2^64 object IDs would be enough for a single computer's address space without any worry of running out of object IDs, two further goals create the possibility of collisions: generating IDs without interacting with a central authority, and Twizzler's planned transparent single ID space across a distributed set of computers. Thus 128 bits are used to reduce both the probability of collisions and the ability to guess an object ID, while also creating an ID space large enough to allow for distribution.

ID derivation

IDs are derived by feeding a nonce (128 bits of random data) to a hash function. The nonce is provided for objects created using copy-on-write (that is, objects for which src points to a valid object when calling twz_obj_new()), so that each copy gets a unique ID despite having the same object content.

There is also the ability to create object IDs by hashing the contents of the object. This is most useful for conflict-free replicated data types (CRDTs), where multiple computers are running distributed Twizzler and can aggressively replicate objects without worrying about consistency issues. Hashing to obtain an object ID is designed for immutable objects.
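The two derivation modes can be sketched as follows. This is a rough illustration, not Twizzler's actual scheme: the standard library's DefaultHasher is not cryptographic and stands in for whatever hash function the system uses, and all function names here are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash `data` together with a domain tag and return 64 bits.
/// DefaultHasher is NOT cryptographic; it only stands in for a real hash.
fn hash64(tag: u8, data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    tag.hash(&mut h);
    data.hash(&mut h);
    h.finish()
}

/// Combine two differently tagged 64-bit hashes into a 128-bit ID.
fn id_from_bytes(data: &[u8]) -> u128 {
    ((hash64(0, data) as u128) << 64) | (hash64(1, data) as u128)
}

/// Nonce-derived ID: the caller supplies 128 bits of random data, so two
/// objects with identical content still receive different IDs.
fn id_from_nonce(nonce: u128) -> u128 {
    id_from_bytes(&nonce.to_le_bytes())
}

/// Content-derived ID for immutable objects: equal content, equal ID.
fn id_from_content(content: &[u8]) -> u128 {
    id_from_bytes(content)
}
```

Content-derived IDs let two machines agree on an ID without communicating, which is why they only make sense for objects that never change.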

Object Lifetime

Volatile Memory

Placing objects in volatile memory (such as DRAM) limits the object lifetime to at most the time of the next power cycle. This can provide easy cleanup for temporary data, such as the result of computation or cached data kept in memory for locality (faster access).

Objects in volatile memory can be accessed and used in the same ways they are in non-volatile memory.

Ties

Ties handle object lifetime by allowing for automatically deleting objects once other constructs are deleted. For example, if an application crashes, the temporary computation might be useless, yet keeping the temporary computation in volatile memory until a power cycle occurs is a waste of that space. Instead, we can tie the lifetime of the object to other objects, such that the object is automatically deleted with the other, freeing the memory before a power cycle. This mechanism is convenient because the kernel does not have to maintain an understanding of the implied lifetime of an object, rather it can be specified relative to other objects.

Ties also provide the benefit of allowing temporary context (such as stacks and heaps) to be stored in persistent memory, allowing for recovery after a power cycle.

For example if we have two objects: koala and coldbrew, tying koala to coldbrew means that koala will not be deleted until after coldbrew is. While koala can be deleted immediately after coldbrew is, if koala is tied to multiple objects, it will only be fully deleted when the final object it is tied to is deleted. Additionally, if koala is tied to a handful of objects, once all of those objects are deleted, koala will be automatically deleted too. This is similar to the practice of creating a file and immediately unlinking it within Unix, so the file is automatically deleted once the file descriptor is closed.
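The koala and coldbrew example can be modeled with a small bookkeeping table. This is a toy sketch under the assumption that an object is deleted as soon as the last object it is tied to goes away; TieTable and its methods are hypothetical and do not reflect the kernel's implementation.

```rust
use std::collections::{HashMap, HashSet};

/// Toy model of ties; names and behavior are illustrative assumptions.
#[derive(Default)]
struct TieTable {
    live: HashSet<String>,
    /// Maps an object to the set of objects it is tied to.
    ties: HashMap<String, HashSet<String>>,
}

impl TieTable {
    fn create(&mut self, name: &str) {
        self.live.insert(name.to_string());
    }

    /// Tie `obj` to `target`: `obj` survives at least as long as `target`.
    fn tie(&mut self, obj: &str, target: &str) {
        self.ties
            .entry(obj.to_string())
            .or_default()
            .insert(target.to_string());
    }

    /// Delete `name`, then delete any object whose last tie just went away.
    fn delete(&mut self, name: &str) {
        if !self.live.remove(name) {
            return; // already deleted or never created
        }
        self.ties.remove(name);
        let mut orphaned = Vec::new();
        for (obj, targets) in self.ties.iter_mut() {
            if targets.remove(name) && targets.is_empty() {
                orphaned.push(obj.clone());
            }
        }
        for obj in orphaned {
            self.delete(&obj);
        }
    }

    fn is_live(&self, name: &str) -> bool {
        self.live.contains(name)
    }
}
```

If koala is tied to both coldbrew and a third object, deleting coldbrew alone leaves koala alive; deleting the third object afterwards removes koala as well.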

Since most constructs in Twizzler are just various types of objects, we can use ties to establish a lifetime of objects based on the existence of other objects, such as threads or views. For a more detailed explanation of views, see the page on views. For temporary computation, objects can be tied to the thread (an instance of computation) that performs it. This provides similar semantics to creating and immediately unlinking files on Unix: in both cases, once the application exits, the data is deleted. A view provides an address space for an application to run in, possibly over multiple time periods. Tying an object to a view allows the object to exist for as long as the execution state of the application, which could be as long as the application is installed, or until all application data is deleted.

Ties to Volatile Memory

Tying volatile objects to each other implies that both objects will be deleted at a power cycle, which is to be expected. However, things get a little more complicated with ties between volatile and persistent objects. Tying a volatile object to a persistent one breaks the semantics of ties in the event of a power cycle, as the volatile object is deleted and the persistent one is not. However, this outcome is to be expected, and we assume programmers will use this when doing temporary computation. Tying a persistent object to a volatile object is dangerous as both objects will be deleted in the event of a power cycle, including an unexpected one.

Pointers

Definition

There are two types of pointers in Twizzler: persistent and dereferenceable. Persistent pointers are used to refer to external data with no extra context needed. They can be thought of as file names on a traditional operating system, where the data exists longer than any process or power cycle. Dereferenceable pointers are references to data that act like traditional memory accesses when programming on other operating systems (such as stack or heap data). Unlike with persistent pointers, the data can be acted upon, such as by reading or writing, but additional context is necessary.

Rationale

Persistent pointers are much more efficient than file I/O, as the pointer links directly to the data without the need to deserialize a file. This simply allows data structures to contain links to other data, which is what objects are.

Foreign Object Table

Persistent pointers work by indexing into a Foreign Object Table (FOT), which holds a longer reference to the data, allowing for late binding of names. Late binding in FOTs is explained in more detail in a later section. A persistent pointer is thus just an index identifying the external object (16 bits, allowing for 65,536 object references) and an offset within that foreign object (40 bits, a maximum offset of 1 terabyte). Because access control is at an object granularity, multiple pointers to the same object can use the same FOT entry.
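The index-plus-offset layout can be sketched as simple bit packing. The field widths come from the text above, but the bit positions (index above the offset) and the names pack and unpack are assumptions for illustration.

```rust
const OFFSET_BITS: u32 = 40;
const OFFSET_MASK: u64 = (1 << OFFSET_BITS) - 1;

/// Pack a 16-bit FOT index and a 40-bit offset into one u64.
/// The bit positions are an assumption made for this sketch.
fn pack(fot_index: u16, offset: u64) -> Option<u64> {
    if offset > OFFSET_MASK {
        return None; // offset does not fit in 40 bits
    }
    Some(((fot_index as u64) << OFFSET_BITS) | offset)
}

/// Split a packed pointer back into (FOT index, offset).
fn unpack(ptr: u64) -> (u16, u64) {
    ((ptr >> OFFSET_BITS) as u16, ptr & OFFSET_MASK)
}
```

Because the pointer names a table slot rather than an object ID, many pointers into the same foreign object can share one FOT entry.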

Late-Binding

Late binding of names is used often with libraries to allow for updates of the library without requiring every program using the library to be recompiled. In Twizzler, late-binding is done by putting a name as the entry in the FOT. When creating the FOT entry, a name resolver can also be specified to allow for different objects to have different name resolvers. The actual name resolution happens when converting the persistent pointer to a dereferenceable one, allowing for different objects to be resolved at different times, just based on the name at dereference time.
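A minimal sketch of late binding, assuming a FOT entry holds either a fixed ID or a name, and that a resolver is a simple name-to-ID map. FotEntry, Resolver, and resolve are hypothetical names; the real FOT entry format is not specified here.

```rust
use std::collections::HashMap;

/// A FOT entry holds either a fixed object ID or a name to resolve later.
enum FotEntry {
    Id(u128),
    Name(String),
}

/// Hypothetical resolver: maps a name to an object ID at dereference time.
type Resolver = HashMap<String, u128>;

/// Turn a FOT entry into an object ID. Names are looked up on every call,
/// so the answer reflects the resolver's state at dereference time.
fn resolve(entry: &FotEntry, resolver: &Resolver) -> Option<u128> {
    match entry {
        FotEntry::Id(id) => Some(*id),
        FotEntry::Name(name) => resolver.get(name).copied(),
    }
}
```

Updating the resolver (for example, after installing a new library version) changes what the same persistent pointer resolves to, without touching the object that holds the pointer.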

Permissions

A thread has permission to access an object if:

  • They have not been restricted by a mask (including global mask)
  • The thread has the capability, or a delegated capability (by attaching to a security context).
  • The thread knows the object's name. (security by obscurity)

Permission values for objects

There are 5 permissions an object can have: read, write, execute, use, and delete. Except for use (and to an extent delete), these permissions exist in Unix systems, and are used in the same way.

  • Read: This allows a thread the ability to look at the contents of an object.
  • Write: This allows a thread the ability to modify an object.
  • Execute: The object can be run as a program.
  • Delete: The object can be deleted. Usually Unix systems include this as part of write permissions, and Windows systems allow this to be a separate permission.
  • Use: This marks the object as available for the kernel to operate on, such as a kernel state object, further explained on kernel state objects. Often this is used for attaching a thread to a security context.

Masks

Masks further restrict permissions to objects. This is similar to umask in Unix systems. For example, while by default any object may have access to an object called bloom, we may want a specific security context called Fall to not have access to the object.

We do not need signatures on masks because they are part of the security context, meaning threads can only modify the mask if they can modify the security context object.
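The interaction between granted permissions and a mask can be sketched with bit flags. This assumes, by analogy with umask, that a mask lists the permissions to remove; the bit assignments and function names are illustrative, not Twizzler's encoding.

```rust
// Bit assignments are assumptions for illustration, not Twizzler's encoding.
const READ: u8 = 1 << 0;
const WRITE: u8 = 1 << 1;
const EXEC: u8 = 1 << 2;
const USE: u8 = 1 << 3;
const DELETE: u8 = 1 << 4;

/// Remove every permission named in `mask` from `granted`.
fn apply_mask(granted: u8, mask: u8) -> u8 {
    granted & !mask
}

/// True if `perms` contains every bit in `needed`.
fn allows(perms: u8, needed: u8) -> bool {
    (perms & needed) == needed
}
```

In the bloom example, a security context such as Fall would carry a mask covering every permission bit for that object, leaving nothing after apply_mask.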

Capabilities

With capabilities, permissions to objects are provided as tokens, where a program can access the data if it holds a valid token. Unlike previous implementations of capability systems, Twizzler includes an object ID as part of the capability signature to prevent a capability from being stolen by leaking the signature to malicious parties. While this does require identity to be checked in addition to the validity of the signature, it prevents simple leaks of secrets from breaking the security of an object.

Delegation

Delegation allows capabilities to be shared with other views and further restricted. In order to delegate a capability, the delegator must have high permissions within the object it wishes to delegate (enough to access the private key of the object).

Late Binding Access Control

Rather than checking an object when it is initially accessed, such as in Unix with a call to open(), Twizzler checks access at the time when the operation is done, such as a read or write. This means that a thread can open an object with more permissions than allowed and not cause a fault, and only once that illegal operation is attempted will the fault occur.

This method for enforcing access control is different from Unix systems because the kernel is not involved for memory access, which is how Twizzler formats all data. However protection still exists because when loading a security context, the MMU is programmed to limit access.

Views

Views are an address space abstraction that sets an environment for threads to run in. Persistent objects are mapped into the view and given dereferenceable virtual pointers. These virtual pointers allow access to the object inside the view.

Because views are normal objects, they can be written to persistent memory and allow recovery of application state in the event of a crash or power cycle. The abstraction of views also allows easy sharing of thread state, such as references to data. This is convenient as it allows for sharing of data without requiring serialization through a construct like a pipe or file, and without the need for a call to mmap. Sharing views with other threads does pose a security threat, as one thread could corrupt the view for both. To deal with this security vulnerability, there is the abstraction of secure API calls/gates, which allows communication between threads without allowing one to corrupt another's data.

When a thread wants to map a new object into the view, they can call _____ and when they attempt to access the object, the kernel will automatically map it in. To change or remove an entry, the kernel must be involved with the function invalidate_view(), to update references to the underlying memory.

To switch between views, the system call become() is needed.

Kernel State Objects

These are normal objects used by both userspace programs and the kernel. For them to be used by the kernel, the use permission must be set. To learn more about the use permission, see Permissions.

Security Contexts

A security context is an object that contains information about which objects can be accessed and how (such as managing capabilities). A thread attaches to the security context to gain access to the objects. This can be useful for operations similar to the sudo command on UNIX, where privileges are temporarily increased in order to perform certain privileged operations without fully changing user ID.

Additionally, security contexts can be used to limit permissions. To prevent a limited thread from shedding their limited permission state, attached contexts can be set as undetachable.

Extensions

Extensions correspond to the interface abstraction in many programming languages, or to the functions that must be implemented for drivers, such as read() and write(). In practice, Twizzler's implementation of this applies to objects, where a set of methods is defined and noted such that external threads accessing the object can call the interface methods without needing to understand the specifics of the object's under-the-hood operations.

Examples

Two examples of extensions are IO and Event. IO is useful for reading and writing to an object, and for an object to support the extension, the object must implement the functions read(), write(), ioctl(), and poll(). When registering the extension, the object will provide pointers to all of the functions, so calls to read() for example on the object will know how to implement the function in an object specific way.
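In Rust terms, an extension resembles a trait that an object type implements. The sketch below is a hypothetical rendering of the IO extension: only read and write are shown (ioctl and poll are omitted), and the Io trait and ByteQueue type are invented for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical Rust rendering of the IO extension; only read and write
/// are shown, and the signatures are assumptions.
trait Io {
    fn read(&mut self, buf: &mut [u8]) -> usize;
    fn write(&mut self, data: &[u8]) -> usize;
}

/// A toy object that implements the extension as a FIFO byte queue.
#[derive(Default)]
struct ByteQueue {
    data: VecDeque<u8>,
}

impl Io for ByteQueue {
    /// Copy up to `buf.len()` queued bytes into `buf`; return the count.
    fn read(&mut self, buf: &mut [u8]) -> usize {
        let n = buf.len().min(self.data.len());
        for slot in buf.iter_mut().take(n) {
            *slot = self.data.pop_front().unwrap();
        }
        n
    }

    /// Append all of `data` to the queue; return the number of bytes taken.
    fn write(&mut self, data: &[u8]) -> usize {
        self.data.extend(data.iter().copied());
        data.len()
    }
}

/// Callers depend only on the interface, not on the object's internals.
fn send(obj: &mut dyn Io, msg: &[u8]) -> usize {
    obj.write(msg)
}
```

Any other object type could implement the same trait in its own way, and send would work unchanged.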

Event is a way of waiting for something to happen to an object, similar to poll() on a file descriptor in Unix. Specific events can be waited for by using event_wait() with the object and event passed in as arguments. Because this is just an interface, an object can implement it in a way that makes sense to it, such as waiting for data from a network or a write to an object to complete.

Tags

Tags are a way of uniquely identifying an extension, such as IO, and checking if the object supports the extension. These are stored in the metadata for the object, and when the tag is added to the metadata, a pointer to the functions that implement the interface are also added.

Gates

Gates, also known as secure API calls, are a means of exposing a limited set of functions available to an external user. This is similar to system calls, where the user can call into the kernel to do specific actions. Gates are used for interprocess communication.

Gates are a way of an object exposing a system call like interface. This allows an object to define arbitrary behavior other threads can call. Because external threads can only access the object through the gate, they are restricted from detrimental actions, provided the gate is correctly written. While this does place the responsibility for secure code in the hands of any programmer rather than the typical relegation of secure code to security experts, gates are optional and can be avoided if there is worry about security flaws.
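As a loose analogy only, Rust's module privacy can illustrate the shape of a gate: state that callers cannot touch directly, plus a small set of validated entry points. Real gates are enforced by the operating system rather than by the compiler, and the vault module below is entirely hypothetical.

```rust
mod vault {
    /// State that callers outside this module cannot touch directly.
    pub struct Vault {
        balance: u64,
    }

    #[derive(Debug, PartialEq)]
    pub enum GateError {
        InsufficientFunds,
    }

    impl Vault {
        pub fn new(balance: u64) -> Self {
            Vault { balance }
        }

        /// Gate: the only way to reduce the balance. Arguments are
        /// validated before any state changes.
        pub fn withdraw(&mut self, amount: u64) -> Result<u64, GateError> {
            if amount > self.balance {
                return Err(GateError::InsufficientFunds);
            }
            self.balance -= amount;
            Ok(self.balance)
        }
    }
}
```

The safety of the object rests on withdraw validating its input, which mirrors the point above: a gate is only as secure as the code behind it.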

When writing gates, best security practices are required to avoid vulnerabilities in the gates. As such, beware of timing attacks and other side channels that can be used to subtly exploit the object.

Direct Memory Access (DMA)

A key aspect of a device driver involves programming the device to access host memory. When a device accesses host memory, we usually call it Direct Memory Access (DMA). DMA is used, for example, by NICs to access transmit rings, or to copy packet data into main memory (memory that the CPU, and thus the OS and user programs can access). However, devices access main memory differently to how threads running on a CPU access memory. Before we discuss the API that Twizzler provides for DMA, we should discuss how devices access memory, and the implications this has for memory safety, translation, and coherence.

Considerations for DMA

When programs access memory in Twizzler they do so via accessing object memory, which involves an MMU translating some kind of object address to a physical address. On x86, for example, this involves a software translation to a virtual address followed by a translation via the Memory Management Unit (MMU) to a physical address. Similarly, when a device accesses memory, it emits a memory address (likely programmed by the driver) that may undergo no translation or some other translation on the bus before attempting to access host memory. There are two important considerations that are the result of this alternate (or no) translation:

  • Contiguous addresses. While object memory is contiguous (within an object), the physical memory that backs that object memory may not be. Devices and drivers need to be capable of handling access to memory in a scatter-gather manner.
  • Access Control. Access control can be applied differently between host-side driver software and devices. Thus driver software must be aware that it may have access to memory via the device that it should not directly. We can use devices like the IOMMU to limit this effect.

In addition to the above, we need to consider the issue of coherence. While CPU caches are coherent across cores, devices accessing host memory do not necessarily invalidate caches. Thus we have to handle both flushing data to main-memory after writing before the device reads it and invalidating caches if a device writes to memory. Some systems automatically invalidate caches, but not all do.

Memory Safety

Finally, we must consider memory safety. While we can control writes from host software to DMA buffers, we cannot necessarily control how the device will access that memory. To ensure memory safety of shared regions, we need to ensure:

  1. The device and host software cannot both mutate shared state at the same time (thread safety), or if this can happen, then the shared memory region that can be updated by both entities is comprised of atomic variables.
  2. The device mutates data such that each mutation is valid for the ABI of the type of the memory region.

Enforcing these at all times would add significant overhead. We take some inspiration from Rust's stance on external influences to memory, tempering this somewhat with the addition of a DeviceSync marker trait.

Overview of DMA System

The Twizzler DMA system is contained within the twizzler-driver crate in the dma module. The module exposes several types for using Twizzler objects in DMA operations along with an abstraction that enables easier allocation of DMA-able memory. The key idea behind Twizzler's DMA operation is that one can create a DmaObject, from which one can create a DmaRegion or a DmaSliceRegion. These regions can then be "pinned", which ensures that all memory that backs them is locked in place (the physical addresses do not change), and the list of physical addresses that back the region is made available to the driver so that it may program the device.

Coherence and Accessing Memory

The primary way that the driver is expected to access DMA memory is through the DmaRegion's with or with_mut method. These functions take a closure that expects a reference to the memory as argument. When called, the with function ensures coherence between the device and the CPU, and then calls the closure. The with_mut function is similar, except it passes a mutable reference to the closure and ensures coherence after the closure runs as well.
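The closure-based access pattern can be sketched with a mock type. MockRegion is not the twizzler-driver API: its method signatures are guesses at the general shape, and the counters merely stand in for whatever cache maintenance the real library performs.

```rust
/// Mock region; NOT the twizzler-driver API. The counters stand in for
/// cache maintenance operations so the ordering can be observed.
struct MockRegion<T> {
    value: T,
    syncs_before: usize,
    syncs_after: usize,
}

impl<T> MockRegion<T> {
    fn new(value: T) -> Self {
        MockRegion { value, syncs_before: 0, syncs_after: 0 }
    }

    /// Ensure coherence, then give the closure shared access.
    fn with<R>(&mut self, f: impl FnOnce(&T) -> R) -> R {
        self.syncs_before += 1; // make device writes visible to the CPU
        f(&self.value)
    }

    /// Ensure coherence, run the closure with mutable access, then
    /// ensure coherence again so the device sees the CPU's writes.
    fn with_mut<R>(&mut self, f: impl FnOnce(&mut T) -> R) -> R {
        self.syncs_before += 1;
        let result = f(&mut self.value);
        self.syncs_after += 1; // make CPU writes visible to the device
        result
    }
}
```

Routing every access through a closure gives the library a well-defined point before and after the access at which to do coherence work, which a plain reference would not.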

The DmaSliceRegion type provides similar with functions, except they take an additional Range as argument that can be used to select only a subslice of the region that the closure gets access to. Allowing for subslicing here is useful because it allows the driver to communicate to the library which parts of the region need coherence before running the closure.

Access Directions and Other Options

Regions can be configured when they are created for various different use cases.

The Access Direction refers to which entities (the device and the CPU) may read and write the memory. Driver writers should pick the most restricted (but correct) mode they can, as it can have implications for maintaining coherence. It can have one of three values:

  • HostToDevice: The memory is used for the host to communicate to the device. Only the host may write to the memory.
  • DeviceToHost: The memory is used for the device to communicate to the host. The host may not write to the memory.
  • BiDirectional: Either entity may write to the memory.

In addition to access direction, regions can be configured with additional options, a bitwise-or of the following flags:

  • UNSAFE_MANUAL_COHERENCE: The with functions will not perform any coherence operations. The driver must manually ensure that memory is coherent.
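To see why the most restricted direction helps, here is a sketch that maps each direction to the coherence work it might require. The enum is redefined locally for illustration (it mirrors the names above but is not the crate's type), and the mapping to flush and invalidate operations is an inference from the coherence discussion earlier, not documented library behavior.

```rust
/// Local stand-in mirroring the three directions named above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AccessDirection {
    HostToDevice,
    DeviceToHost,
    BiDirectional,
}

impl AccessDirection {
    fn host_may_write(self) -> bool {
        self != AccessDirection::DeviceToHost
    }

    fn device_may_write(self) -> bool {
        self != AccessDirection::HostToDevice
    }

    /// Assumption: host writes must be flushed before the device reads them.
    fn needs_flush(self) -> bool {
        self.host_may_write()
    }

    /// Assumption: caches must be invalidated before the host reads
    /// anything the device may have written.
    fn needs_invalidate(self) -> bool {
        self.device_may_write()
    }
}
```

Under these assumptions, only BiDirectional pays for both operations, which is the practical reason to avoid it when a one-way mode is correct.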

Pinning Memory

Before a device can be programmed with a memory address for DMA, the driver must learn the physical address that backs the DMA region while ensuring that that address is stable for the lifetime of whatever operation it needs the device to perform. Both of these are taken care of with the pin function on a DmaRegion or DmaSliceRegion. The pin function returns a DmaPin object that provides an iterator over a list of PhysInfo types, which can provide the physical address of a page of memory.

A region of DMA memory that comprises some number of pages (contiguous in virtual memory) can list the (likely non-contiguous) physical pages that it maps to. The order that the pages are returned in is the order that they appear for backing the virtual region. In other words, the 4th PhysInfo entry in the iterator of a DmaPin for a region contains the physical address of the 4th virtual page in the DMA region.
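A driver typically turns the per-page physical addresses into a scatter-gather list. The sketch below assumes a 4 KiB page size and takes plain u64 addresses rather than PhysInfo values; Segment and coalesce are hypothetical helpers, not part of twizzler-driver.

```rust
const PAGE_SIZE: u64 = 4096; // assumed page size

/// One physically contiguous run of memory for the device to access.
#[derive(Debug, PartialEq)]
struct Segment {
    addr: u64,
    len: u64,
}

/// Merge physically adjacent pages into (address, length) segments,
/// preserving the virtual order in which the pages were listed.
fn coalesce(pages: &[u64]) -> Vec<Segment> {
    let mut out: Vec<Segment> = Vec::new();
    for &addr in pages {
        if let Some(seg) = out.last_mut() {
            if seg.addr + seg.len == addr {
                seg.len += PAGE_SIZE; // extends the previous run
                continue;
            }
        }
        out.push(Segment { addr, len: PAGE_SIZE });
    }
    out
}
```

Because the pin reports pages in virtual order, walking the list front to back yields segments in the order the device should consume them.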

Any future calls to pin return another DmaPin object, but the underlying pin information (that is, the physical addresses) may be the same, even if the DmaRegion is dropped and recreated. However, if the DmaObject is dropped and recreated, the driver cannot rely on the pin to be consistent. More specifically, the pin's lifetime is tied to the DmaObject, not the DmaRegion. The reason for this somewhat conservative approach to releasing pins is to reduce the likelihood of memory corruption from accidental mis-programming. Another consideration for pinned memory lifetime is that it can leak if the driver crashes. Allowing for leaks in this case is intentional, as it makes it less likely that the device will stomp over memory in the case of a driver crash.

Pools

While we can use a DmaObject to perform DMA on an existing Twizzler object, it is common for a device driver to need a simple pool of DMA-able memory that it can allocate from so that it may communicate with the device (e.g. DMA memory for a ring buffer). For this, twizzler-driver provides a DmaPool type that can be used to allocate DMA regions that share an access type and a set of DmaOptions. The pool will internally create new Twizzler objects that it uses to allocate DMA memory from, which it then uses to create DMA regions on-demand.
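The pool's grow-on-demand behavior can be sketched as a bump allocator over a list of backing objects. MockPool is a hypothetical model, not the DmaPool implementation; the tiny object capacity is chosen only to make the behavior easy to see, and freeing is not modeled.

```rust
const OBJECT_CAPACITY: usize = 4096; // assumed; real objects are far larger

/// Toy pool: each entry records how many bytes of one backing object are
/// in use. NOT the DmaPool implementation.
#[derive(Default)]
struct MockPool {
    used: Vec<usize>,
}

impl MockPool {
    /// Allocate `len` bytes; returns (object index, offset in object).
    /// A new backing object is created when the current one is full.
    fn allocate(&mut self, len: usize) -> Option<(usize, usize)> {
        if len > OBJECT_CAPACITY {
            return None; // cannot fit in any single object
        }
        let count = self.used.len();
        if let Some(used) = self.used.last_mut() {
            if *used + len <= OBJECT_CAPACITY {
                let offset = *used;
                *used += len;
                return Some((count - 1, offset));
            }
        }
        self.used.push(len);
        Some((count, 0))
    }
}
```

The driver never creates objects itself in this model; it only asks for regions, and the pool decides when a fresh backing object is needed.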