//! Welcome to the dynamic linker.
//!
//! The job of the dynamic linker (dynlink) is to:
//! 1. Load dynamic shared objects (DSOs) and their dependencies.
//! 2. Fix up all the relocations inside those DSOs.
//! 3. Manage thread-local storage (TLS) regions.
//!
//! On the surface, this isn't too bad. But it's mired in a long history, compatibility constraints,
//! deep magic for performance, and a lack of good, easy-to-understand "official" documentation.
//! In this crate, however, we will try to be as clear and forthcoming as we can about what we are
//! doing and why.
//!
//! # Basic Dynamic Linking Concepts
//!
//! *What is a dynamic shared object (DSO)?*
//! Practically speaking (and for our purposes), it's an ELF file that has been prepared in such a
//! way that we can load it into memory, fix it up a bit based on where we loaded it (the file is
//! relocatable), and then call code within it. The overall process looks like this:
//!
//! Loading:
//! 1. Map the library into memory.
//! 2. Register the TLS template, if the library has one.
//! 3. Register constructors, if any.
//! 4. Insert the library into the global dependency graph (a directed graph, possibly with
//!    cycles).
//! 5. For each dependency, recurse.
//! 6. Add edges from the library to each dependency.
//!
//! Relocating (from a starting point DSO):
//! 1. If marked done, return.
//! 2. Recurse on all dependencies.
//! 3. For each relocation entry, fix up the entry according to its contents, looking up a symbol
//!    if necessary.
//! 4. Mark as done.
//!
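//! The relocation walk above can be sketched roughly as follows. This is a minimal illustration
//! and not this crate's API: `State`, `Node`, `node`, and `relocate` are hypothetical names, and
//! "fixing up" a library is modeled by recording its name. The sketch also assumes an extra
//! in-progress mark so that the recursion terminates when the graph has a cycle; the list above
//! only mentions the done mark.
//!
/*!
```rust
/// Visit state of a library during the walk (the in-progress mark is an assumption of this sketch).
#[derive(Clone, Copy, PartialEq)]
enum State {
    Unvisited,
    InProgress,
    Done,
}

/// Hypothetical stand-in for a loaded library.
struct Node {
    name: &'static str,
    deps: Vec<usize>, // indices of this library's dependencies
    state: State,
}

/// Convenience constructor for the sketch.
fn node(name: &'static str, deps: &[usize]) -> Node {
    Node { name, deps: deps.to_vec(), state: State::Unvisited }
}

/// Post-order depth-first walk: dependencies are fixed up before the library itself.
fn relocate(nodes: &mut [Node], id: usize, order: &mut Vec<&'static str>) {
    // Step 1: already handled (or currently being handled further up the stack).
    if nodes[id].state != State::Unvisited {
        return;
    }
    nodes[id].state = State::InProgress;

    // Step 2: recurse on all dependencies.
    let deps = nodes[id].deps.clone();
    for dep in deps {
        relocate(nodes, dep, order);
    }

    // Step 3: a real implementation would process the relocation entries here.
    order.push(nodes[id].name);

    // Step 4: mark as done.
    nodes[id].state = State::Done;
}
```
*/
//! With foo depending on bar and baz, bar on baz, and baz back on foo, starting at foo fixes up
//! baz first, then bar, then foo.
//!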
//! Let's talk about loading first. In step 1, for example, we need to iterate the program headers
//! of the ELF file, looking for PT_LOAD entries. These entries tell us how to set up the virtual
//! memory for this program. Since these DSOs are relocatable, we can load them _at a base address
//! of our choosing_. Each DSO gets loaded to its own base address and is mapped into memory
//! according to that base address and its PT_LOAD entries. In Twizzler, we can leverage the
//! powerful copy-from primitive to make this easier.
//!
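//! To make the role of the base address concrete, here is a small sketch that computes the
//! page-aligned address range each PT_LOAD entry would occupy. It is an illustration under stated
//! assumptions, not this crate's loader: `ProgramHeader` is a hypothetical struct that mirrors a
//! few ELF64 program header fields, `Mapping` and `plan_mappings` are invented names, and the
//! page size is assumed to be 4 KiB.
//!
/*!
```rust
const PT_LOAD: u32 = 1; // ELF program header type for loadable segments
const PAGE_SIZE: u64 = 0x1000; // assumed page size for this sketch

/// Hypothetical subset of an ELF64 program header.
struct ProgramHeader {
    p_type: u32,
    p_vaddr: u64,  // address of the segment relative to the base
    p_filesz: u64, // bytes backed by the file
    p_memsz: u64,  // bytes occupied in memory (the tail beyond p_filesz is zero-filled)
}

/// Where one segment lands once a base address has been chosen.
struct Mapping {
    start: u64, // page-aligned, inclusive
    end: u64,   // page-aligned, exclusive
    file_bytes: u64,
}

/// Compute the page-aligned range of every PT_LOAD entry for the given base.
fn plan_mappings(phdrs: &[ProgramHeader], base: u64) -> Vec<Mapping> {
    phdrs
        .iter()
        .filter(|ph| ph.p_type == PT_LOAD)
        .map(|ph| {
            let lo = base + ph.p_vaddr;
            let hi = lo + ph.p_memsz;
            Mapping {
                start: lo & !(PAGE_SIZE - 1),
                end: (hi + PAGE_SIZE - 1) & !(PAGE_SIZE - 1),
                file_bytes: ph.p_filesz,
            }
        })
        .collect()
}
```
*/
//! Loading the same DSO at a different base only shifts every range by the difference between the
//! two bases, which is what makes per-DSO base addresses workable.
//!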
//! In steps 2 and 3 we are noting down information ahead of time. We want to record the loaded
//! libraries for TLS purposes in this order, since we must reserve one exalted DSO to live right
//! next to the thread pointer. On most systems, this is reserved for the executable. For us, it's
//! just the first DSO to be loaded. We also note down whether this library has any constructors,
//! that is, code that needs to be run before we can call any other code in the DSO.
//!
//! In step 4, we just add the library into the global context. At this point, we have recorded
//! enough info that we can make this library nameable and searchable for symbols. Finally, in the
//! last two steps, we recurse on each dependency and add edges to the graph to note dependencies.
//! I should note that a dependency may already have been loaded (e.g. if a library foo depends on
//! bar and baz, and bar also depends on baz, only one copy of baz will be loaded). If we try to
//! load a library that has already been loaded according to some namespace, we just point the
//! graph to the existing node instead of loading a fresh copy of the library. This is also why the
//! graph may have cycles, by the way: a dependency edge can point back to a node that is already
//! in the graph.
//!
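//! The reuse of already-loaded libraries can be sketched as follows. This is a simplified model
//! and not this crate's implementation: `Graph`, its fields, and the `deps_of` callback (a
//! stand-in for reading a library's dependency list) are hypothetical names, and a single
//! name-to-node map plays the role of the namespace.
//!
/*!
```rust
use std::collections::HashMap;

/// Hypothetical dependency graph: nodes are library names, edges are (library, dependency).
#[derive(Default)]
struct Graph {
    names: Vec<String>,
    edges: Vec<(usize, usize)>,
    by_name: HashMap<String, usize>,
}

impl Graph {
    /// Load `name` and, recursively, its dependencies. Returns the node index.
    fn load(&mut self, name: &str, deps_of: &dyn Fn(&str) -> Vec<&'static str>) -> usize {
        // Already loaded in this namespace: reuse the existing node.
        if let Some(&id) = self.by_name.get(name) {
            return id;
        }

        // Insert the library into the graph before recursing (step 4), so that
        // a cyclic dependency finds this node instead of loading a second copy.
        let id = self.names.len();
        self.names.push(name.to_string());
        self.by_name.insert(name.to_string(), id);

        // Steps 5 and 6: recurse on each dependency, then add the edge.
        for dep in deps_of(name) {
            let dep_id = self.load(dep, deps_of);
            self.edges.push((id, dep_id));
        }
        id
    }
}
```
*/
//! If foo depends on bar and baz, and bar depends on baz, the graph ends up with three nodes and
//! two edges pointing at the single baz node.
//!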
//! When relocating a DSO, we need to ensure that it is fixed up to run at the base address we
//! loaded it to. As a simple mental model, imagine some static variable, foo, that lives in a
//! DSO. At link time, the linker has no idea where the dynamic linker will end up putting the DSO
//! in memory. So when accessing foo, the compiler emits some _relative_ address for reaching foo,
//! say "0x300 + BASE", where BASE is a 64-bit value in the code. But again, the base address isn't
//! known yet, so the linker also emits an entry in a relocation table that tells the dynamic
//! linker, "hey, when you load this DSO, go to _this spot_ (where BASE is) and change it to the
//! actual base address of the DSO".
//!
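//! The mental model above can be sketched in a few lines. This is only an illustration:
//! `RelativeReloc` and `apply_relative` are hypothetical names, the loaded image is modeled as a
//! byte slice, and values are assumed to be stored little-endian. It resembles what a
//! base-relative relocation such as `R_X86_64_RELATIVE` (base plus addend) asks for.
//!
/*!
```rust
/// Hypothetical relocation entry: "store `base + addend` at `offset` in the image".
struct RelativeReloc {
    offset: usize, // where in the loaded image the 64-bit value lives
    addend: u64,   // the link-time relative address, e.g. 0x300 for foo
}

/// Patch every entry now that the base address is known.
fn apply_relative(image: &mut [u8], base: u64, relocs: &[RelativeReloc]) {
    for r in relocs {
        let value = base.wrapping_add(r.addend);
        // Assumption of this sketch: the target is a little-endian 64-bit slot.
        image[r.offset..r.offset + 8].copy_from_slice(&value.to_le_bytes());
    }
}
```
*/
//! For a DSO loaded at base 0x4000_0000, an entry with addend 0x300 patches its slot to
//! 0x4000_0300.
//!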
//! In practice, of course, it's more complex: there are optimizations, there are indirections,
//! etc., but this is basically the idea. In the steps listed above, we perform a post-order
//! depth-first walk over the graph, performing all relocations that each DSO specifies.
//!
//! One key idea in relocation is _symbol lookup_. A relocation can say, "hey, write into me the
//! address of the symbol foo", and the dynamic linker will go look for that symbol's address by
//! name. This is possible because each DSO has a symbol table for the symbols that it advertises
//! as usable for dynamic linking. When looking up a symbol, the dynamic linker transitively looks
//! through a DSO's dependencies until it finds the symbol. If it doesn't find it there, it falls
//! back to a global lookup, where it traverses the entire graph looking for the symbol.
//!
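//! A sketch of this two-stage lookup is shown below. It is not this crate's lookup code: `Lib`,
//! `lib`, `search_deps`, and `lookup` are hypothetical names, symbol tables are plain hash maps,
//! and the search order (depth-first, starting with the requesting library itself) is an
//! assumption made for the example.
//!
/*!
```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical library: its dependencies and the symbols it advertises.
struct Lib {
    deps: Vec<usize>,
    symbols: HashMap<&'static str, u64>,
}

/// Convenience constructor for the sketch.
fn lib(deps: &[usize], symbols: &[(&'static str, u64)]) -> Lib {
    Lib { deps: deps.to_vec(), symbols: symbols.iter().copied().collect() }
}

/// Depth-first search through `id` and its transitive dependencies.
fn search_deps(libs: &[Lib], id: usize, name: &str, seen: &mut HashSet<usize>) -> Option<u64> {
    // Skip libraries we have already visited, so cycles terminate.
    if !seen.insert(id) {
        return None;
    }
    if let Some(&addr) = libs[id].symbols.get(name) {
        return Some(addr);
    }
    libs[id].deps.iter().find_map(|&dep| search_deps(libs, dep, name, seen))
}

/// Look in the dependency tree first, then fall back to every library in the graph.
fn lookup(libs: &[Lib], start: usize, name: &str) -> Option<u64> {
    search_deps(libs, start, name, &mut HashSet::new())
        .or_else(|| libs.iter().find_map(|l| l.symbols.get(name).copied()))
}
```
*/
//! A symbol exported by an unrelated library is still found, but only through the global
//! fallback.
//!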
//! # Basic Concepts for this crate
//!
//! ## Context
//! All of the work of dynlink happens inside a Context, which contains, essentially, a single
//! "invocation" of the dynamic linker. It defines the symbol namespace, the compartments that
//! exist, and manages the library dependency graph.
//!
//! ## Library
//! This crate calls DSOs Libraries, because in Twizzler, there is usually little difference.
//!
//! ## Error Handling
//! This crate reports errors with the [error::DynlinkError] type, which implements
//! std::error::Error and miette's Diagnostic.
//!
//! ## Compartments
//! We add one major concept to the dynamic linking scene: compartments. A compartment is a
//! collection of DSOs that operate within a single, shared isolation group. Calls inside a
//! compartment operate like normal calls, but cross-compartment calls or accesses may be subject to
//! additional processing and checks. Compartments modify the dependency algorithm a bit:
//!
//! When loading a DSO and enumerating dependencies, we check if a dependency can be satisfied
//! within the same compartment. If so, the dependency behaves as usual. If not, we do a _global
//! compartment search_ for that same dependency (subject to restrictions, e.g., permissions). If
//! we don't find it there, we try to load it in either the same compartment as its parent (if
//! allowed) or in a new compartment (only if we must). Thus the dependency graph is still correct,
//! and still allows symbol lookup to work, even if libraries' dependency relationships cross
//! compartment boundaries.
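//!
//! The resolution order described above can be sketched as a small decision function. This is a
//! simplified model, not this crate's compartment code: `Resolution` and `resolve` are
//! hypothetical names, a compartment is modeled as the list of library names it already holds,
//! and the permission checks are reduced to a `may_use` callback and a `parent_allows_load` flag.
//!
/*!
```rust
/// Hypothetical outcome of resolving one dependency.
#[derive(Debug, PartialEq)]
enum Resolution {
    SameCompartment,
    OtherCompartment(usize),
    LoadInParent,
    LoadInNewCompartment,
}

/// Decide where the dependency `dep` of a library in compartment `parent` comes from.
fn resolve(
    compartments: &[Vec<&str>],
    parent: usize,
    dep: &str,
    may_use: impl Fn(usize) -> bool,
    parent_allows_load: bool,
) -> Resolution {
    // 1. Can the dependency be satisfied within the same compartment?
    if compartments[parent].contains(&dep) {
        return Resolution::SameCompartment;
    }
    // 2. Global compartment search, subject to restrictions.
    for (id, libs) in compartments.iter().enumerate() {
        if id != parent && may_use(id) && libs.contains(&dep) {
            return Resolution::OtherCompartment(id);
        }
    }
    // 3. Not found anywhere: load it next to the parent if allowed,
    //    otherwise in a fresh compartment.
    if parent_allows_load {
        Resolution::LoadInParent
    } else {
        Resolution::LoadInNewCompartment
    }
}
```
*/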

#![feature(never_type)]
#![feature(iterator_try_collect)]
#![feature(allocator_api)]
#![feature(result_flattening)]
#![feature(alloc_layout_extra)]
#![feature(pointer_is_aligned_to)]

// Nothing arch-specific should export directly.
pub(crate) mod arch;

mod error;
pub use error::*;

pub mod compartment;
pub mod context;
pub mod engines;
pub mod library;
pub mod symbol;
pub mod tls;