Making OCaml Safe for Performance Engineering
Chris Casinghino
Jane Street
Jane Street is a trading firm that uses a variety of high-performance systems built in OCaml to provide liquidity to financial markets worldwide. Over the last couple of years, we have started developing major extensions to OCaml’s type system, with the primary goal of making OCaml a better language for writing high-performance systems. In this talk, we will attempt to provide a developer’s-eye view of these changes. We’ll cover two major directions of innovation: first, the addition of modal types to OCaml, which opens up a variety of ambitious features, like memory-safe stack allocation, type-level tracking of effects, and data-race freedom guarantees for multicore code. The second is the addition of a kind system to OCaml, which provides more control over the representation of memory, in particular allowing for structured data to be represented in a cache-and-prefetch-friendly tabular form. Together, these features pull together some of the most important features for writing high-performance code in Rust, while maintaining the relative simplicity of programming in OCaml. In all of this, we will focus less on the type theory, and more on how these features are surfaced to users, the practical problems that they help us solve, and the place in the design space of programming languages that this leaves us in.
Transcript
I work on the tools and compilers team at Jane Street, and the particular sub-team that I work on, it’s called the OCaml Language Team. And we work on a bunch of different things. We build the core libraries that Jane Street uses and open sources. We work on tools for things like refactoring code and pre-processing code. But the thing that I’m gonna talk about today is our work on the OCaml programming language itself.
I think OCaml is a great language for many purposes, including writing high-performance software. And if you go on YouTube you can find lots of talks of people from Jane Street explaining why we think it’s a great programming language. But today I’m gonna talk about the flip side of that a little bit. Some of the things we think could be better about OCaml and the ways that we’re trying to make it better.
Okay, so before I dive into, you know, the details of the extensions we’re building, I want to sort of put us on a level playing field. I assume not everybody here is an OCaml expert, so I’m gonna talk a little bit for the first third of the talk or so about how OCaml works today and why some of those things are maybe suboptimal for some of Jane Street’s use cases. And then after we get through that, I’ll cover the ways that we’re trying to make it better.
Okay, let’s dive in.
So I’m gonna start with a pretty low-level detail about OCaml, which is the way it represents values in memory. And this is a fact about OCaml that, you know, most OCaml programmers don’t really have to think about in their day-to-day programming, but if you’re writing really high-performance applications, you do have to think about this fact.
And so OCaml has what’s called a uniform representation of values. So every value in the OCaml program is represented exactly the same way. They’re all one word wide and that word is either an immediate with the data packed right into that word or it’s a pointer to a block on the heap containing the data that your value represents.
So immediates are anything that fits in a machine word. Actually, anything that fits in 63 bits, we represent as an immediate. If your type is 63 bits or less, we pack it into the first 63 bits of a word and set the last bit to one to mark it as an immediate. So things like integers, characters, and Booleans are represented this way in OCaml.
On the other hand, if it doesn’t fit into 63 bits, then we represent it as a pointer to a block on the heap. And this block has a uniform layout. It starts with a header word that explains how big the block is, for example, and then it has a word for every value contained in that block. So for example, a record, we represent it this way and it would have one sub-word for each field of the record.
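You can actually observe this split from inside the language using the stdlib's unsafe Obj module. This is purely an illustrative probe, not something to use in real code:

```ocaml
(* A peek at the uniform representation, using the unsafe Obj module
   purely for illustration. *)
let is_immediate x = Obj.is_int (Obj.repr x)

let () =
  assert (is_immediate 42);            (* int: packed into the word *)
  assert (is_immediate 'c');           (* char: also an immediate *)
  assert (is_immediate true);          (* bool: also an immediate *)
  assert (not (is_immediate 3.14));    (* float: pointer to a heap block *)
  assert (not (is_immediate (1, 2)))   (* tuple: pointer to a heap block *)
```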
So, this is really cool. Every value in OCaml is represented as just one word, either an immediate or a pointer. And this is really useful for the garbage collector, among other things.
Why is it useful for the garbage collector? Well, OCaml is a garbage collected language and so the garbage collector has to like scan all the data associated with your program. And this uniform representation of values makes that really easy. The garbage collector can just look at any OCaml value and it knows it’s gonna be one word wide and that if it ends in a zero, it is a pointer to a block on the heap. All words that end in zero in the OCaml program are always valid pointers to blocks on the heap. And anything that ends in a one, it knows is an immediate and it doesn’t need to track for the purposes of garbage collection. So this uniform representation makes it easier to implement the runtime, the garbage collector.
It’s also really useful for the implementation of polymorphism, because it allows fast, separate compilation of polymorphic functions. So probably everybody knows what polymorphism is. This is also what’s called generics in languages like Java and C++.
But let’s talk about another nice feature of OCaml, which is polymorphism. So OCaml has a nice, expressive static type system with type inference. Here’s an OCaml function, the map function, one of the most common functions. It takes a list l and a function f, and it applies the function f to every element of the list, producing a new list.
So this is a very simple function and the details of its implementation don’t matter that much for this talk, but one thing that’s cool about it is that it has this type, and OCaml can automatically figure out this type for you. The type says that map takes an 'a list (pronounced "tick A list") and a function from 'a to 'b, and returns a 'b list. And this 'a and 'b, these are type variables. They stand in for any type, kind of like a generic type parameter in Java or C++. And OCaml doesn’t need any help: you didn’t have to write any annotations on your function map in order for it to figure out this very generic type, in some sense the most general type possible for this map function.
So because we have nice type inference, you can figure out these very generic types, polymorphism is simple and pervasive in OCaml.
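For reference, here is the map function roughly as described, with the list argument first; OCaml infers the fully general type with no annotations:

```ocaml
(* The map function: takes the list first, then the function to apply.
   OCaml infers the most general type without any annotations. *)
let rec map l f =
  match l with
  | [] -> []
  | x :: xs -> f x :: map xs f
(* Inferred: val map : 'a list -> ('a -> 'b) -> 'b list *)

let () = assert (map [1; 2; 3] string_of_int = ["1"; "2"; "3"])
```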
But coming back to our uniform representation of values, how do you implement polymorphism? It’s not actually obvious how to do that in most programming languages. If the data in OCaml had different shapes, if sometimes it was multiple words wide and sometimes it had to be passed in different registers, it would be hard to take a function like map and compile it to one assembly code implementation, because how does that assembly code call the function f? It doesn’t know whether the data should be passed in a general-purpose register or a floating-point register, or how many words wide it is.
So because we have this nice uniform representation of values, we can compile polymorphic functions really easily, because it doesn’t matter what type they operate on: that type is always one word. So we can compile every polymorphic function just once and rely on this uniform representation of values, which makes compilation fast and easy. Of course, other languages have other approaches to features like polymorphism or generics. In C++, to a first approximation, the compiler makes one copy of your function per type. So the compiler does a little static analysis to see all the different types that you use a generic function at, and it’ll make a copy for each of those different types.
And this has some advantages and disadvantages relative to the OCaml approach. For example, you know, C++ might be able to optimize these functions a little bit more because it’s building a specialized copy for each individual type. But on the downside, there are many copies of the functions now, and this means your binaries are gonna be bigger and also there’s more pressure on the instruction cache. So you might call map several different times in your program, and even if you call it one right after the other, it might not be in the cache, because it’s actually a different copy of map that we have to go fetch.
And in C# and other languages, they approach polymorphism with compile time or runtime code generation. So this is legit, right? Basically in C#, when you call a function on a type, the actual language runtime goes and checks to see if you have an implementation of the function for that type yet. And if not, it’ll compile it right then for you. It’ll sort of pause the execution of your program, compile the function, and then you can use that function. This is called just-in-time compilation, also a valid approach.
Okay, so changing gears a little bit, I wanna talk about one other feature of OCaml before we talk about ways it could be better, and that is that OCaml has parallelism, which sounds like a boring thing because most programming languages support parallelism. But in OCaml, this is actually kind of a new innovation. Before OCaml 5, which only came out a few years ago, OCaml had no support for shared memory parallel programming, which made it really very far behind the times in terms of modern programming language implementations.
But OCaml 5 did come out and it does have support for multi-core parallel programming with shared memory, which is tricky in a language, in a memory-managed language like OCaml, ‘cause you need to build a garbage collector and a runtime that can deal with the fact that you have multiple threads executing concurrently and touching the same shared heap. And although OCaml was late to this party, I think being late meant that we could, or the OCaml community could do a really good job at implementing parallelism and building a parallel runtime.
So in particular, OCaml has a sane memory model. What do I mean by that? Languages like C++, for example, in the presence of data races, they have what you might call a halt and catch fire semantics. In the presence of a data race, all bets are off. Anything could happen. You know, your computer can explode, and okay, this sounds ridiculous, probably your computer is not gonna actually explode, but you can see really, really surprising behaviors in your program.
You might think that if you have a data race, okay, well, once the data race happens, values might be corrupted, but before that, you know, everything is fine. That’s not the case in C++. You can actually see corrupted values before you reach the point of the program where you seem to have a data race. And also data races are not bounded in space, by which I mean you might think you have a data race on some variables and therefore their values can be corrupted and anything they flow to can be corrupted, but other values are gonna be fine. And that’s also not the case in C++. If you have a data race, values that appear to be totally unrelated from the data race in your program can have arbitrarily corrupted values. Sorry, and I should have said a data race, of course, is when you have multiple threads that access the same memory and at least one of them is modifying it.
In OCaml we have this really cool property that data races are bounded in space and time. So when you do have a data race in your program, it only affects the variables that are involved in the data race, and it only happens at times that seem like they’re logically consistent with the program that you’ve written. And this means that debugging data races is much easier. I’m sure most of you have written parallel programs and you know that it is basically impossible for humans to write data-race-free programs without substantial machine support. It’s just really hard to do. And that’s compounded by the fact that they have these weird behaviors, which makes them really hard to debug. It’s a lot like debugging a memory corruption bug: the consequences of the data race are just so far removed from the logical place where it appears in your program that it can be really hard to track down.
So this, the fact that data races are bounded in space and time is a real advantage to the OCaml parallelism implementation. But it’s not perfect. You may know that in Rust, for example, there’s a race-free programming model. So if you write your program in the safe fragment of Rust, then the language makes a guarantee that it will have no data races. The type system of Rust guards against that. And that’s not the case in OCaml. We do not have, at least in OCaml 5, out of the box, a race-free programming model.
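To make OCaml 5's out-of-the-box situation concrete, here is a small sketch (requires OCaml 5 for the Domain module; the function names are my own): two domains bump a shared counter. With a plain ref the updates race and can be lost, though per the memory model the damage stays bounded; using the stdlib's Atomic module avoids the race.

```ocaml
(* OCaml 5: two domains increment a shared counter 100_000 times each.
   The ref version races (increments can be lost); Atomic is race-free. *)
let count_with ~incr_fn ~read_fn =
  let work () = for _ = 1 to 100_000 do incr_fn () done in
  let d = Domain.spawn work in   (* second domain runs in parallel *)
  work ();                       (* this domain does the same work *)
  Domain.join d;
  read_fn ()

let racy () =
  let r = ref 0 in
  count_with ~incr_fn:(fun () -> incr r) ~read_fn:(fun () -> !r)

let safe () =
  let r = Atomic.make 0 in
  count_with ~incr_fn:(fun () -> Atomic.incr r)
             ~read_fn:(fun () -> Atomic.get r)

let () =
  Printf.printf "racy: %d (may be less than 200000)\n" (racy ());
  assert (safe () = 200_000)
```

Nothing in the type system stops you from writing the racy version; that is exactly the gap the modes work aims to close.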
Okay, so we’ve talked about some facts about how OCaml works and I wanna call your attention to like two aspects of that that are less than ideal, which are probably obvious. So I talked about how great it is to have a uniform representation of values and how this makes it easy to implement polymorphism and to implement the garbage collector. But it does mean you have very limited control over like the layout of your memory and of your program’s data. So if you are writing a really high-performance application where that can be important, this can be a problem.
So just to give you some examples of how bad it can be, in OCaml it’s very hard to keep data compact. Remember I said everything is one word wide. So that means if I have 8 bits of data, well, we just pack it into those 63 bits at the beginning of an immediate and stick a 1 at the end. So your 8 bits, your byte, takes a whole word. And it gets worse.
How about an int64? So I said that if your type fits in 63 bits, we can represent it as an immediate, and otherwise we have to represent it as a pointer to a block that’s allocated on the heap, right? And an int64 doesn’t fit in 63 bits, nor does an IEEE float. So every int64 we actually have to represent as a block allocated on the heap. And this means you have to have the header word, and then it turns out in the particular case of int64, there’s an additional word called the custom word, which doesn’t matter here, and then the actual data. Which means int64s in OCaml are a three-word allocated block on the heap. Okay? This is not great for writing high-performance software.
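You can see this boxing directly, again with the unsafe Obj module as an illustrative probe:

```ocaml
(* An int fits in a word; an int64 is a pointer to a heap block. *)
let () =
  assert (Obj.is_int (Obj.repr 42));        (* immediate *)
  assert (not (Obj.is_int (Obj.repr 42L))); (* pointer to a heap block *)
  (* On 64-bit systems the int64's block holds two words after the
     header: the custom-operations pointer, then the payload itself. *)
  Printf.printf "int64 block: %d words plus the header\n"
    (Obj.size (Obj.repr 42L))
```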
So in addition to the fact that this makes data big, all these pointers mean that prefetching is defeated. If you have an array of a type that’s not an immediate, you have an array of pointers, and that means the data is sort of all spread out over the heap, and you’re not gonna get good cache locality, which is really important in high-performance programming. And it also means that the garbage collector has to manage everything. It has to manage every int64, and that increases the amount of work the garbage collector has to do.
And OCaml has a very performant garbage collector. We write lots of extremely performant software at Jane Street that allocates lots, but it would be better if we could let it do less work by not, like, making it manage every int64. And as we talked about, unsafe parallelism is no fun. Even with a good memory model, you are going to write data races and those data races are almost always bugs. And so it would be better if we had a way to statically rule them out.
Also, the good memory model does cost something. So the way that we achieve data races being bounded in space and time is by adding a little bit of extra synchronization in the runtime, the part of OCaml that manages parallel processes. And these synchronization steps are pretty cheap. The OCaml developers did a lot of measurement and arrived at a really performant parallel runtime, but the steps do cost a little bit. And if we could statically rule out data races, then we wouldn’t need those additional checks.
Okay, so we’d like to fix those things about OCaml, and we are working on that at Jane Street. It’s loads of fun. So I want to tell you about sort of our design goals as we add features to OCaml to try to improve the situation.
We are trying to give our programmers safe, convenient, and predictable control over the performance-critical aspects of their program. And I want to talk about each of those a little bit. So by safe I mean the kind of safety that OCaml has always had. OCaml is type safe and memory safe. If your program type checks, it is guaranteed not to segfault at runtime and it’s guaranteed that it won’t have a type error at runtime. And this is a really nice property that makes it easier to write OCaml programs and easier to review them and make sure they’re correct, which is really important, you know, in a financial setting as you all know.
And so we want to give people more control over their program’s behavior performance-wise, but we don’t want to give up the safety.
But also these features should be convenient. OCaml is a high-level declarative programming language that makes it easy to write software quickly and we’d like these features to have that character too.
And finally predictable. So I’m not going to talk today about compiler optimizations. This is something we do work on. We have a team working on compiler optimizations and they do great, great stuff. But a problem with relying on the compiler to make your program faster for you is that it’s very hard to predict whether that’s going to happen or not. Compiler optimizations are often based on heuristics or they sort of pattern match on the exact structure of your program. And this means that if you introduce small changes to your program, all of a sudden, the compiler optimization might start happening or stop happening. And it’s just very hard if you really care about the exact performance you’re getting to guarantee that these optimizations are giving you what you need.
So rather than relying on the compiler to make your program faster, what we want to do is give you the power to express the performance behavior that you desire, and guarantee statically that that’s what you’re going to get. And also only where you need it. So there are some people at Jane Street that write software that needs to be as performant as you could possibly imagine, but actually the vast majority of Jane Street developers are just writing normal software whose performance we want to be good but doesn’t, you know, need to be measured in nanoseconds. And we want to make sure that as we’re adding these features to the language, we’re not making those users pay for it. We shouldn’t be adding complexity that everybody has to see. These features should only cost you something, in terms of complexity, if you use them.
And we’re doing this in OCaml. We think we benefit, as Mark said, a lot from the open-source community. We’re not like forking the language. We’re just building new OCaml features that we hope to put into main upstream OCaml later. And we want to make sure that we can continue to use public open-source OCaml libraries and that we can continue to release our libraries for other people to use. So our extensions need to be backwards compatible with mainline upstream OCaml.
So those are our design goals.
Okay, so now finally a solid, what, probably 20, 22 minutes in, I will start talking about some of the new things that we’re adding to OCaml.
So there’s three major features that I’m going to talk about today. One of them is narrow and flat data layouts to solve that problem of giving users control over the memory of their program. Another is support for stack allocation so that we can rely on the garbage collector a little bit less for data that does need to be allocated. And finally we’ll talk about race-free parallel programming.
So we’re doing this, you know, in, as I said, in OCaml and preserving type safety. And to do that we’re using two sort of fancy-ish features from the type system literature, fancy features, I should say. So these features are called kinds and modes, and if you have never encountered those words in a programming language context, don’t worry, I will explain them as we go.
Okay, so let’s start with narrow and flat data layouts, or unboxed types. This part’s pretty simple. I mean, the goal here, hopefully, is pretty clear. We’d just like to add new types to OCaml that have different representations than the classic uniform representation that we talked about. So we’d like to add types like int32#, int8#, and float#, and I’ll pronounce these unboxed int32, unboxed int8, unboxed float. These should just be the bits that you asked for, not packed into 63 bits with a one added at the end: an unboxed int32 should just be 32 bits.
And we’d like to somehow add these types to OCaml so that you can do all the normal things you do with OCaml types with them. So you should be able to put them in records. This record has an unboxed int8 and an unboxed int32. So maybe it even fits in one word because it only needs 48 bits, 40 bits, I can count. And in addition to that, maybe we can have even fancier kinds of unboxed types.
So how about an unboxed record? So this thing with a hash brace at the front is an unboxed record. What do I mean by that? So a normal record like the one before, as we talked about, is going to be represented as a pointer to the heap containing these values, right? An unboxed record is different. An unboxed record is kind of a fiction for the programmer. The programmer gets to manipulate it as if it’s one thing and they can project from it and update it, but under the hood it’s represented by just passing around the contents of the record individually. So when you pass one of these to a function, it probably gets passed in two registers. One holding the int8 and one holding the int list, rather than a pointer to a block on the heap.
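As a rough sketch, and with the caveat that this is the extended compiler's still-evolving surface syntax rather than upstream OCaml (so the exact spelling here may not match what ships), the two kinds of record might be declared like this:

```ocaml
(* Not upstream OCaml: illustrative sketch of the extension syntax. *)
type boxed   =  { a : int8#; b : int32# }  (* heap block with unboxed fields *)
type unboxed = #{ a : int8#; b : int32# }  (* no heap block: passed around as
                                              its components, e.g. in registers *)
```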
And, you know, even more complicated layouts, like we’d like to have an array of unboxed records. So as we talked about in OCaml, if you have an array of data that doesn’t fit in 63 bits, that’s always an array of pointers. But maybe we can have an array of unboxed records like this. So we just have like the int8 and the int32 laid out next to each other in memory repeatedly like you would see in any normal sane programming language.
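Upstream OCaml actually already has one special case of this idea: the stdlib's floatarray type stores unboxed 64-bit floats contiguously, with no pointers. The planned arrays of unboxed records generalize this flat layout to other element types:

```ocaml
(* floatarray in the stdlib is already a flat, pointer-free array of
   unboxed 64-bit floats: one existing special case of flat layout. *)
let a = Float.Array.init 5 (fun i -> float_of_int i *. 1.5)

(* The elements are contiguous in memory, so traversal is cache-friendly. *)
let sum = Float.Array.fold_left ( +. ) 0.0 a

let () = assert (sum = 15.0)
```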
So the goal is to add these types to the language, but there’s an obvious problem which is that I explained at great length how we depend on OCaml’s uniform representation of values in order to implement things like the garbage collector and polymorphism. So what are we going to do about that? And the answer is we’re going to add a way for the type system of OCaml to track the shape of your type and use that information to help the garbage collector and to compile polymorphism. And we call those shapes layouts and layouts are a particular form of kind from the type systems literature.
So kinds you can think of as being like types of types. So in a normal programming language, types classify values, right? Your different values, each different value has a type. And so kinds are like that but for types, each type has a kind and that kind in our case is a layout, it classifies the shape of the type. So we’re going to use these layouts so the type system can track them and then the compiler can use them to know how to manipulate data appropriately.
So what kind of layouts are there? Well we have a lattice of layouts. Technically it’s a join semi-lattice but hopefully nobody will call me on that. So if we start at the bottom left there, value is the layout of classic OCaml values. So every type that existed in OCaml before we added unboxed types, that’s going to have the kind value. Bits8 is going to be the kind of types that fit in 8 bits. And bits64 is going to be the kind of types that fit in 64 bits.
And we also have float64 because bits64s are passed in general-purpose registers, but float64s are passed in floating point or SIMD registers. And we can even have a kind like void. Void is the kind of types that don’t have any representation at runtime. OCaml has a fancy type system, sometimes you pass around values to functions just for type information and we can now give those layout void and just erase them at runtime.
So there’s also this layout at the top called any, which is a weird one. So any is the layout of a type where the type system hasn’t figured out what its layout is. And this is hard to use, because if you have a type whose layout is any, the compiler doesn’t know how to compile a program using that type, so it just rejects it. But it can still be really useful in signatures. You might imagine that we have a signature for, like, types that we know how to print. So it’s a signature that has a function, like t -> string or something like that. And the point is that signature makes sense for any t, regardless of its layout. And so we might want to write down that signature once using the layout any. Now, any individual implementation of that signature for a given type will specialize the signature to a given layout. But the signature itself can be shared across all layouts by using the layout any.
And we’re also going to add aggregate layouts. So here’s the layout of an unboxed record containing four things, one of which doesn’t actually need a representation at runtime, so maybe it’s three things. And yeah, product layouts like that.
Okay, so how does this work in practice in the type system? Well, the type system just keeps track of the layout of the types and we have to talk about how to implement the GC and how to implement polymorphism briefly.
How do we implement polymorphism? Well we do what other programming languages do. We just have to make a copy of your function for each different layout that you use it at. And this is still, I would say better than what C++ does. C++ is going to make a copy per type approximately. We only have to make one copy per layout, so per like class of types, which is going to be a lot better.
And as for the GC, we just have to teach it some new tricks. So, for example, if you make a record containing some fields that are classical OCaml values and some fields that are unboxed, we’re now going to modify that header word that I talked about, that records have, to record which fields the GC is supposed to look at and which fields it’s not supposed to look at. That way the GC knows which words to scan, and it will never look at a word that ends in a zero but isn’t actually a pointer.
That’s unboxed types. So that’s the first of the three features that I’m going to talk about today. The second is stack allocation. I have more to say about stack allocation, although we’ll see how we do on time.
Heap allocation versus stack allocation. So OCaml is a garbage-collected language and like most garbage-collected languages, all the data that it allocates, it allocates on the heap and heap allocation is expensive relatively speaking. I mean as I said earlier, OCaml is very good at this and we have a very performant garbage collector but it does cost something.
OCaml has what’s called a generational garbage collector. So there’s a quote-unquote minor heap where small and short-lived values live and a major heap where big longer-lived values live. And managing memory on the major heap in particular does really have an expense. The minor heap is better. The minor heap is basically just a bump allocator. So there’s just a big slab of memory and every time you allocate something on the minor heap, we increment the pointer to make some space for it and that’s very cheap but it’s still cache inefficient.
So there’s a real trade-off when you think about how big your minor heap should be. Every time the minor heap fills up we have to do garbage collection and like check which values in the heap are live and move them to the major heap and free up the memory and that costs something. So we want the minor heap to be big so that we don’t have to do that very often. But also if the minor heap is big, it doesn’t fit in the cache and so we have bad cache locality. So there’s just a trade-off here and there’s no good answer.
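This trade-off is something you can experiment with today: the minor heap size is tunable at runtime through the stdlib's Gc module (or via the OCAMLRUNPARAM environment variable). The 8M-word figure below is an arbitrary example value:

```ocaml
(* The minor-heap trade-off is tunable: a bigger minor heap means fewer
   minor collections but worse cache locality. *)
let () =
  let c = Gc.get () in
  Printf.printf "current minor heap: %d words\n" c.Gc.minor_heap_size;
  (* Ask for an 8M-word minor heap (an arbitrary example value). *)
  Gc.set { c with Gc.minor_heap_size = 8 * 1024 * 1024 }
```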
So stack allocation can be better, stack allocation in the sense you’re all familiar with from C or whatever your favorite programming language is. It’s kind of similar to minor heap allocation in that there’s just a pointer that gets bumped, right? But collection happens much more cheaply because you know in C when your function returns, we just decrement the stack pointer, we don’t have to scan the values. You’re supposed to make sure they don’t live past the lifetime of your function. And so we can just decrement the stack pointer, we don’t have to do any scanning.
Also, in general, this will touch fewer cache lines than the minor heap because we’re reusing the same memory every time you call and return from a function. So we’d like to add stack allocation to OCaml, but there’s a problem which is that we said we’re trying to keep OCaml safe, right, and stack allocation is not obviously safe. In order to make it safe, we need to follow a stack discipline.
So like turning back to our example of C, there’s a classic kind of bug in C where you allocate something on the stack and then you store a pointer to it somewhere and you kind of forget that it was allocated on the stack and now your function returns but that pointer is still stored somewhere. And if you look at it later, well, that data is now some other memory entirely, it’s representing some other value. This is a classic kind of use-after-free bug. We want to make sure you, you can’t have that kind of bug in OCaml somehow.
So what do you need to know about a value to know that it respects the stack discipline? Well mainly you have to not do that thing I just said. We should only stack-allocate values if we never create pointers to them from the heap because the heap lives for an indefinite amount of time and stack-allocated values are only going to live for the lifetime of one function. So if you create a pointer to a value on the heap, we don’t stack allocate that value.
Also, if you return a value from your function, we shouldn’t stack allocate it because when you return from your function, we’re going to pop the stack frame, right, so then it immediately goes out of scope, its lifetime ends, and we shouldn’t let it live past the end of this function.
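In the extension as it exists in Jane Street's compiler, this surfaces as local allocations. The sketch below is illustrative only: it uses the extension's syntax rather than upstream OCaml, and the details may differ.

```ocaml
(* Not upstream OCaml: sketch of stack allocation in the extension. *)
let sum_pair () =
  let local_ p = (1, 2) in   (* stack-allocated: p never escapes *)
  fst p + snd p

(* Something like this would be rejected by the type checker, because
   returning p would let it outlive its stack frame:

   let escape () =
     let local_ p = (1, 2) in
     p
*)
```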
Okay? So we need to add some features to the type system to track these properties of values so that we can figure out what can be stack allocated and what can’t be stack allocated. Before I talk about what those features are, I should explain why we don’t just do what Rust does.
So probably some of you know Rust and Rust is a great language, no complaints about Rust. So why don’t we use, well sorry, I guess I’m about to get to some complaints about Rust, so I just lied. Why don’t we use Rust-style lifetimes?
Well, sorry, I should say what lifetimes are. Rust is a manually memory-managed language, and the way it works in Rust is that every value gets associated with a lifetime, which is how long that data lives. And these lifetimes can be arbitrarily complex, tracking the lifetime of something across many function calls and across boundaries that don’t correspond to the lexical scope of the program at all. And this lets Rust, you know, free memory exactly when it’s dead, rather than doing a collection step to figure out what’s dead later.
So, but lifetimes are complicated. So in Rust, the way stack allocation works is that functions take a parameter, which is the lifetime of that function. And if your value that you allocate during the function can have that same lifetime that the function does itself, if it’s polymorphic in the lifetime, then that value can be allocated on the stack. And this sounds pretty simple and straightforward but there’s a bunch of problems with it.
The first thing is you often trip into the need for what's called higher-rank polymorphism. Rust has higher-order functions, that is, functions that take functions as arguments. And so now there are multiple lifetime parameters, right? This function has a lifetime parameter, and the functions it takes as arguments also have lifetime parameters. And it turns out that in the presence of this kind of polymorphism, type inference is undecidable. So there's no way to write a type inference engine that could automatically figure out all the lifetimes in your Rust program. It's inevitable that sometimes the user will have to add lifetime annotations.
And as we talked about at the beginning, we're trying to preserve this character of OCaml having nice type inference and our fancy features being easy to use. So that doesn't fit in well. Also, these lifetimes get very complicated, and they need to get complicated in Rust, because for everything you allocate, Rust needs to understand its lifetime exactly. But in OCaml we have a garbage collector. So if something has a complicated lifetime, we can just decide to allocate it on the heap and let the garbage collector clean it up when it notices it's dead. We don't have to figure out how to give it a complicated lifetime that tracks that exact period of time.
So with those things in mind, what we're using is modes. Modes are another fancy thing from the type systems literature. And the way we're using modes in our version of OCaml is that they are properties that can be applied to any type. So earlier I talked about kinds. Kinds divide the universe of types into different buckets, in our case by their shape, right? Modes are different. Modes are orthogonal to types, so it makes sense to talk about any value of any type at any mode. And we're going to use these modes to track properties like whether something lives past the lifetime of the function it's allocated in.
And in our case, these properties are all going to be deep properties. So if you have a deeply nested record with a bunch of pointers in it and we say that it has a certain mode, we’re talking about the whole value all the way down.
So to support stack allocation, we add two modes, global and local. Global is the default mode; this is the sort of legacy mode for classic OCaml values. And it means that the lifetime of something that has mode global is unconstrained; we make no guarantees about how long it lives. On the other hand, if something has the local mode, then we guarantee that it follows the stack discipline. So something can only be assigned the local mode if it's not stored on the heap, we don't create any pointers to it from the heap, and we don't return it from a function. And the idea is that we're going to have the type system automatically figure out for you whether something has the local mode by tracking how you use the values that you create.
Also, there’s a sub-moding relationship. So you can think about it this way. If something has the local mode, you’re constrained in how you can use it. You’re not allowed to store it on the heap or return it from a function, right? So if you have something at the global mode, it’s perfectly safe to treat it as if it were at the local mode because all you’re doing is taking away your ability to like store it somewhere. So we can set up the language so that anywhere something of the local mode is expected, it’s safe to use something of the global mode which makes programming with this feature a little bit easier.
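A sketch of that sub-moding direction, in the approximate annotation syntax of the extension (this is not valid code for a stock compiler, and the surface syntax has evolved over time):

```ocaml
(* Sketch of sub-moding: a global value can always be used where a
   local one is expected, because that only takes capabilities away.
   Approximate syntax; [takes_local] is a made-up name.

   val takes_local : (int -> int) @ local -> int

   let square x = x * x          (* square is global *)
   let _ = takes_local square    (* OK: global is usable as local *)

   The reverse direction is rejected: a local value passed where a
   global one is expected might get stored or returned, and escape. *)
```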
Okay, so let's have an example. Here's our trusty map function. I have not changed it, but we can give it a fancier type now. This type has two lines. On the first line I've written the classic type it has in standard OCaml, like we talked about earlier. And I've written a second line with @-signs, which records the modes of each of the arguments to this function. Here, dot means the legacy mode: I'm not making any promises about the lifetime of that argument with respect to this function. But the local mode means that map respects the stack discipline with respect to the function argument; in particular, it doesn't let the function argument escape from your call to map.
So what that means, and by the way, that's true: you can look at map, right? Map calls f, but it doesn't, like, squirrel f away on the heap somewhere or return it, it just calls it. So then we can write a function like multiply_by. multiply_by takes a list l and a number mult, and it multiplies every element of the list by mult.
Now it does that by creating a little anonymous function that multiplies its argument by mult. And a thing about OCaml is that we have to create a closure for this function; we have to allocate the data for the function object itself somewhere. And where are we going to do that? Well, in classic OCaml you would allocate it on the heap, but because we can see that map uses this function in a way that is consistent with the stack discipline, we can allocate it on the stack instead. And this means the garbage collector doesn't have to get involved here at all, and it's automatically freed when multiply_by returns.
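A plain-OCaml version of this example, with the moded signature hedged in a comment (the annotation syntax is approximate and has changed across versions of the extension):

```ocaml
(* The map example from the talk. Under the extension, the moded
   type would be written roughly:
     val map : 'a list -> (('a -> 'b) @ local) -> 'b list
   i.e. map promises not to let its function argument escape. *)
let rec map l f =
  match l with
  | [] -> []
  | x :: xs -> f x :: map xs f

let multiply_by l mult =
  (* This closure never escapes the call to [map], so the extended
     compiler can allocate it on the stack instead of the heap. *)
  map l (fun x -> x * mult)

(* multiply_by [1; 2; 3] 10 evaluates to [10; 20; 30] *)
```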
Okay, I’m going to skip some parts of this section because I want to make sure I have time to talk about data race freedom.
So I'm going to skip that slide, but I do want to talk about this one, because I think it's really cool. A thing that we learned about local is that it's not just good for stack allocation. This property of whether something escapes its scope is a useful property you might want to track just for safety or correctness reasons.
So this function, with_file, is a function from the OCaml standard library, and the idea is you pass it a string, which is a file name, and then some function that you want to run on the contents of that file. That function receives an In_channel.t, and then it computes something of type 'a, and your whole call returns that 'a. And the reason you would write a file-manipulation function like this is that it can automatically manage the opening and closing of that channel for you. So with_file will open a file handle, run your function using it, and then close the file for you automatically when your function returns.
And this is nice because it means you won't forget to close the file handle if you use with_file. But that would actually be incorrect if the function we passed to with_file stored the file handle somewhere. If it, like, put it on the heap or something, then with_file is going to close the file, and now later somebody may look at this file handle on the heap and think it refers to a valid file that's still open. And that would be a bug in your program.
And we can use local in a way that has nothing to do with stack allocation: to require that you pass with_file a function that respects the stack discipline with respect to this channel, this file handle. So by giving with_file this type, we guarantee that the function you pass it won't, like, take that channel and store it away somewhere, because that function has to not let it escape; it has to respect the stack discipline.
So this is really cool. We didn't see this coming originally, but this feature of being able to track whether something respects the stack discipline, whether something escapes its scope, allows us to provide a little extra safety in classic OCaml functions like with_file.
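A minimal sketch of the with_file pattern in plain OCaml, using Fun.protect for the cleanup; in the moded version, the channel argument of the callback would additionally be marked local, so the callback provably cannot stash the handle:

```ocaml
(* Sketch of the with_file pattern: open, run, always close.
   Under the extension, the signature would be roughly:
     val with_file : string -> ((in_channel @ local) -> 'a) -> 'a
   so the type system rejects any [f] that lets the channel escape. *)
let with_file (name : string) (f : in_channel -> 'a) : 'a =
  let ic = open_in name in
  Fun.protect ~finally:(fun () -> close_in_noerr ic) (fun () -> f ic)
```

Fun.protect guarantees the ~finally cleanup runs whether f returns normally or raises, which is why this shape is the idiomatic resource wrapper in stock OCaml.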
We also have mode polymorphism. I'm going to skip some things here because we have finite time, and I want to talk about data race freedom.
So I want to argue that modes, which are the technology we just finished talking about, are a great fit for data race freedom.
So here are some things you might do to a value, and you might want to track whether your program does them or not: make an alias to it, return it from a function, create a pointer to it, pass it to another thread. These are all things you can do in your program. And if we can track whether you do things like this, then maybe we can provide a static guarantee of data race freedom.
So in our previous example, we talked about local, and local is a mode that tracks one particular thing you might do to a value: whether you let it escape its scope. And we can add new modes that track other things like this. And then, as it turns out, we can put them together to guarantee data race freedom.
And these operations are all deep. So they fit into this category of modes that we’re talking about.
So what modes are we going to have? I’m going to introduce two new dimensions of modes.
The first one is contention. Contention tracks whether a value has been shared across threads. A value can be uncontended, which means it's local to one thread, or it can be contended, which means it has been shared with another thread. And we can use the type system to enforce that for values that have been shared with other threads, values that are contended, you're not allowed to read or write their mutable parts. And this is going to give us a guarantee of data race freedom: it enforces that if you share a value with another thread, you're not allowed to mutate it, and so you can't possibly have a data race. Okay? Of course, we do want to provide some way to mutate shared memory, and I'll get to that in a second.
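A sketch of what the contention axis rules out, in approximate annotation syntax (these signatures are hypothetical illustrations, not a stock-compiler API):

```ocaml
(* Sketch of the contention axis.

   type counter = { mutable n : int }

   val bump : counter -> unit
     (* fine: the argument is uncontended by default *)

   val peek : (counter @ contended) -> int
     (* rejected if [peek] reads the mutable field [n]: contended
        values forbid access to their mutable parts *)

   Reading immutable fields of a contended value remains allowed. *)
```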
First, I need to introduce a second mode dimension: Portability.
Contention was about whether a value has been shared. Portability is about whether a value may be shared; it's a slightly different angle on the same kind of question. So values can be portable or non-portable. Portable values are safe to share, and non-portable values are not safe to share.
So what values are portable? You’re probably thinking, okay, so mutable data is not going to be portable. That’s not quite right. Actually, mutable data can be portable. Almost everything can be portable.
The trick here is that we already have contention. So we're already preventing you from doing anything unsafe by sharing mutable data across threads. But there's a loophole in that plan, and that loophole is functions. Imagine that I create a function in one thread that mutates some data in that thread, and that I pass that function itself to another thread. That function is not itself mutable, but I'm sort of passing along the capability to mutate data in the original thread by passing this function. So I need to ban that. I need to ban passing functions that do mutation to other threads. And the way I do that is by saying that functions that capture uncontended mutable state, that is, mutable state that they have the power to mutate, cannot be sent to other threads. Those functions are not portable. And we can build a type system that tracks this property of functions.
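The loophole can be seen in a few lines of plain OCaml (the names here are made up for illustration):

```ocaml
(* Sketch of why functions are the loophole. *)
let tally = ref 0

(* [bump] captures [tally], which it has the power to mutate.
   Handing [bump] to another thread would smuggle across the
   ability to mutate this thread's state, so under the extension
   [bump] is not portable. *)
let bump () = incr tally

(* A closure over only immutable data carries no such capability,
   so it can be portable. *)
let double x = x * 2
```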
Oh, I meant to reorganize my slides because the next slide is a doozy, but I guess I forgot, so I'll go into it. Actually, we have a whole bunch of modes. Here's the bestiary of modes. We have five modal axes, with more coming. And these modal axes, it turns out, can be nicely categorized by the distinction I talked about between portability and contention: whether they're about what has happened to data in the past, or about what you're allowed to do with data in the future.
So locality, which we already talked about, is a future mode. It's like portability: locality is about whether you are allowed to let a value escape from its scope, right? And we've already talked about portability and contention. It turns out we're going to have two other modes too, uniqueness and linearity, or two other modal axes, I should say.
So uniqueness is about whether you have created an alias to a value, and linearity is about whether you are allowed to create an alias to a value. And here on this chart I've indicated a bunch of things. The mode that's in bold is the legacy one, the one that classic OCaml data gets. And there's also a minimum and a maximum, because, as we talked about, there's sub-moding here: it's always safe to promote values in one direction along these modal axes.
I won't bore you with the details of going through all these different things; I'm happy to talk about them more later. But the point is we can track lots of properties using modes, and we can put them together to enforce data race freedom.
So let's come back to data race freedom, and let's do that by looking at some examples of the APIs we might give to functions using these modes.
So how can you spawn a thread? Well, we can give spawn this type. The main point is that the function you're going to send to another thread needs to be portable. Portable functions are the ones that are safe to send to other threads, the ones that don't capture data they can mutate. So the spawn function simply requires that its argument is portable; problem solved.
And we can also have join. So join waits for a thread to finish; that thread returns some value, and join brings that value back to the original thread. You might expect that the result of join needs to be contended because it's moving across threads, but actually it doesn't, because when join returns, the thread that you're joining has finished running. So there's no longer another thread that has access to this data, and we don't need to track it as contended.
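For comparison, OCaml 5's stock Domain API already has this spawn/join shape, just without the mode guarantees; the moded signature in the comment is an approximate sketch of what the extension adds (this snippet requires OCaml 5):

```ocaml
(* Stock OCaml 5 API:
     val Domain.spawn : (unit -> 'a) -> 'a Domain.t
     val Domain.join  : 'a Domain.t -> 'a
   Under the extension, spawn would additionally demand a portable
   argument, roughly:
     val spawn : ((unit -> 'a) @ portable) -> 'a t
   and join's result can stay uncontended: by the time join returns,
   the spawned thread is done, so no other thread can reach the data. *)
let result =
  let t = Domain.spawn (fun () -> List.fold_left ( + ) 0 [ 1; 2; 3 ]) in
  Domain.join t
(* result = 6 *)
```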
So okay, here's the most complicated slide in the talk. Well, I guess the slide with the table is the most complicated, but this is the most complicated one I'm actually going to try to explain. This is, like, an API for safe parallelism using actual shared memory.
And I'm going to go through it in a little bit of detail, but the idea here is: everything I've just talked about, portability and contention, what they do is set things up so that there's no way to share memory between threads and have that memory be mutable, right? But actually that's something you want to do.
We want to provide some APIs that allow you to do that in a safe way. So this is an example of one such API, one that uses locks. They don't all have to use locks, but this one does use locks to share memory safely between threads.
So I'm going to have to stay at a high level here because I don't have enough room on the slide for all the details. But the idea is that this type t here, that's the type that represents a pointer to shared mutable data, and the 'a argument to t is the type of the data that's stored at this pointer. And the 'k argument is a trick that we sometimes use in the OCaml type system, what you might call a phantom type. Basically, that 'k identifies the lock that is associated with this pointer.
So then if we look at create: create is how we make a new shared memory cell that is protected by a lock, and we know which lock it's protected by because it returns a t that has that 'k in it. So you're going to need to have the key to this lock, and that shows up in the next function, which is map. Map allows you to take a function and apply it to the data that's in one of these shared memory cells.
So in order to do that, you need to have the key for the lock associated with this 'k. And we're going to set the system up so that keys are not portable; you're not allowed to send them across threads. So if you succeed in locking the lock and getting the key, then you know you're the only one who can access the shared memory right now, and you're allowed to do things like map.
So map requires you to have the key, and requires you to give it a portable function that's going to run, conceptually, on this piece of shared memory that might belong to another thread. And then you get back a piece of shared memory that has been updated: it takes a function from type 'a to 'b and updates the value that's pointed to, to be a 'b. Okay. This API, as you can see, goes on further, and I'm happy to talk about it more, but I'm not going to go through the whole thing in this talk.
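The shape of the API just described might be sketched as follows. The module name, the function names beyond create and map, and the mode syntax are all hypothetical reconstructions from the talk, not the real library:

```ocaml
(* Hypothetical sketch of a lock-protected shared-cell API
   (approximate mode syntax; the real API differs in detail):

   module Guarded : sig
     type ('a, 'k) t   (* pointer to shared data guarded by lock 'k;
                          'k is a phantom type naming the lock *)
     type 'k lock
     type 'k key       (* proof that you currently hold lock 'k;
                          keys are not portable, so a key cannot be
                          sent to another thread *)

     val create : ((unit -> 'a) @ portable) -> ('a, 'k) t * 'k lock
     val with_lock : 'k lock -> (('k key @ local) -> 'b) -> 'b
     val map :
       'k key -> (('a -> 'b) @ portable) -> ('a, 'k) t -> ('b, 'k) t
   end *)
```

Because a key is both non-portable (it cannot leave the thread that locked the lock) and local to with_lock (it cannot outlive the critical section), holding one really does witness exclusive access to the cell.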
Okay. So I think that’s all the type theory I have to show you today and I’ve just about finished on time. So just quickly to summarize some of this, how is it going?
I should say some of this is more conceptual than other bits. There are some Jane Streeters in the audience, and I'm sure they're like, oh, I don't remember seeing that before.
So stack allocation and unboxed types are being used in production systems. In fact, I'm here in Hong Kong this week (I don't live here) to help one of the teams here use unboxed types in their system. And it's been really fun. It's going okay. There's lots more to do, though. One of the nice things about working on a compiler in a setting like Jane Street is that we can add features that aren't quite fully fleshed out yet.
So there's lots more work to do to make these features feel totally complete with respect to what you would expect from OCaml. But they are now usable and being used by teams. This slide is a little bit out of date: I would say the data race freedom implementation is finished, but we're still in the initial stages of rolling it out. So we're working with some individual teams to have them, sort of, beta test this API for data race freedom for us.
At the moment, nobody inside Jane Street is allowed to write parallel applications in OCaml, because we don't have a way to give them a guarantee of data race freedom. But very soon we're going to flip that switch, and then people can write parallel applications, and every parallel application will get this guarantee statically. We're very excited about that.
How's it going? I think it's going well. This stuff fits into OCaml pretty well. I can talk about some of this in more detail later if you're interested, but, you know, there are people who have been writing very high-performance software in OCaml at Jane Street for a long time, and they report that these features make their lives much easier. They don't have to do as many crazy hacks or rely on, like, external function calls, things like that. They can write things that feel much more idiomatic by using the features I've talked about.
The pay-as-you-go bit, where we don't want other people to have to pay for the complexity here, seems to be working. We're pretty nervous about this. As I mentioned, we're in the process of rolling out data race freedom, which means we're in the process of annotating all of our standard libraries with those portability and contention modes, and, you know, I think we're a little bit worried about the complexity, but so far so good.
And if you want to know more, we have a YouTube series, we have blogs, we have academic papers. I'm very proud of the "Data Race Freedom à la Mode" paper, which won an award this year at POPL, one of the top programming-languages conferences. And I think that's all I have, so I'm happy to take questions.