A Language-Oriented System Design
Nathan Linger
Jane Street
This talk explores the design of Ontology, a permissions management service developed at Jane Street. The design of the system is language-oriented in two different ways:
Internally, the state of the system is represented as a sequence of declarations in a dependently-typed language lifted straight out of type theory. This language is the unifying principle behind a deductive database, its query language, and the representation of requested state changes that together form the core of the system.
Externally, in the user interface for the system, the internal language is hidden everywhere, having been rendered in natural language (English). Because of this rendering strategy, key elements of the user interface are at once structure-editors for the internal language as well as natural language forms that allow even technically unsophisticated users to construct permissions requests and queries.
Transcript
So I’m going to talk about the system that we designed here called Ontology. So we’ll just get right into it. This is the picture, and when you see this picture, you’ll know that I’m talking about Ontology. And it is a system that we use for our permissions management. It has a slightly more general idea behind it than that, though, so we’re going to talk about it in a slightly more general way.
But with respect to the title about it being a language-oriented design, there are two languages involved in the design. One of them is something from type theory, so it’s a dependently typed language of data types. And we’ll talk about that and spend most of our time talking about that. And then also the other language is English, and because people don’t want to read this, they want to read things in languages that they know. So we spend efforts to hide away all of the internal language syntax in the user interface.
So the system, as mentioned, is concerned with permissions, but also other kinds of configuration at a high level that are meant to be shared between systems, as well. And so the idea is we’re concerned about, for some given domain, what things exist in that domain, what relationships exist between those things, and also what are the changes that people would like to make to both of those things, and whose approval is required before those changes can go through?
So the overall workflow here is… so there’s Ontology, and then there’s the rest of Jane Street that talks to it. And the direction we’ve been talking about here is that some configuration lives in Ontology and is somehow gotten out into the world, either through some subscription or some query that gets called every once in a while. And also, Ontology itself needs to know about other data that it doesn’t have control over, so that it can say things about them. So the stuff that we want to apply permissions to is sometimes under the control of some other system.
And so then there are users, and users are looking at Ontology, and they’re learning what are the permissions, what are the connections between the different things, and they are thinking well, maybe something needs to change about that, and I wish I had this other permission that I don’t have right now, or I wish this other user group existed that doesn’t exist right now. So they make a request, and Ontology takes that request and runs a little query that tells it who is responsible for approving this thing, before that request can go through. And it sends email to those people and says, “Please consider approving this request.” And then the user who is the approver, whoever are the approvers, they can come back and make the approval and it’ll go through, or they can reject it.
So that’s the background for what we’re talking about here. So the data that lives inside the system is the interesting thing that is represented in this language-oriented way. So let’s talk about that data. So there are a couple of different axes you can look at in seeing what sorts of data we’re talking about. One useful distinction that we wanted to make, because we’re eventually going to render this thing in English, is just at a very basic level, in terms of what part of speech are we talking about, it’s useful to distinguish between data that plays the part of a noun, so the things that we’re talking about, versus the data that has to do with the things that we’re going to say about those nouns, so the propositions that we want to be able to express about all the different entities.
So that’s the first dimension, is whether we’re talking about nouns or propositions. The other dimension is the way in which we are going to go about defining that data. So how is it represented? And we have three choices here. Either we put the data in a table, that’s one way. The other thing we can do is we can define data by some rules, some logical rules that we define. And then lastly, we can have some data in the system that is there, not because it was expressed in the language itself, but it was hard coded at the time that we built the system. So I’ll talk a little bit about that later.
I think this is a little bit hard to talk about at this high level, so I’ll just dive into some examples. So first, we’ll talk about nouns that are defined by tables. So these are the simplest data in the system, and these are just lists of things. So we have a list of users, and we have a list of HG repos, where we keep certain things like code and what we want for lunch. That’s not true. That’s just a made up example. The poetry is real. And user groups. And there’s lots more things besides this. So this is like I’ve cut the universe down to a very small size, to make it fit into a presentation.
And oh, right, so these are pretty simple. And a lot of these get imported from the outside world. The next thing we might want to do is make a more interesting table that has more than one column. So here is a table that stands for a proposition that some user is a member of some group. So each row in this table is an instance of that relationship and stands for some proposition. And we define it by just enumerating all of the things. So that’s what it means to define something by a table, it’s just to exhaustively list all the instances of it.
Okay, here’s another one where we have another table, but the first column has something slightly more interesting in it. So this is the data we haven’t seen before, this is a set of users. So I will talk about that next. That was defined by a rule. So we can also define nouns with some rule or set of rules. So here we have, the data we want to represent is a set of users. And we have two rules in the system for saying ways to name a set of users. So the first rule is well, if there’s some user group A, then we can talk about all the users in that user group, and that is going to be considered a set of users. So we’re just defining what it means to be a user set. So that’s one way you can be a user set, is we can import groups and consider those as user sets.
And then the other way to be a user set is to be a singleton of a single user. So sometimes, you want to give permissions to whole groups of people, sometimes you want to give permissions to individuals. So these are two useful things to package up together into this one type of data. So that’s what a user set is, and that’s a noun that’s defined by rules.
A proposition that you can define by rules is being a member of a user set. So user sets were defined by some rules, so now we need to define what it means to be in one, and we’re also going to use some rules for that. So not surprisingly, there were two different ways to form user sets, and there’s two different ways to be members of a user set. That follows the pattern of the underlying definition. So the two rules we have, if U is a user, well, then U is a member of the set of users containing only U. Not controversial. And if we have a user and a user group that that user belongs to, then when considered as a user set, the set of all users in G also contains U. So those are our two rules for saying what does it mean for a user to be a member of a user set.
Let’s see. Another one here is that a user may access an HG repo. Oh, right. So if you recall, back here, we had a way to say a set of users may access an HG repo. But now we’re interested in saying this individual user has access to an HG repo. So we’re going to define this proposition in terms of the other one. So it’s basically just unfolding the definition of the other thing. So a user can access… and individual user U can access a repo if there is some user set that it belongs to, that has already been granted access to the repo in question. So hopefully, none of these things are very exciting to read off. If they were exciting, then probably they’re wrong. We would like to have some comfort that this all sounds like it makes sense.
So the last example, before getting into how this stuff is represented, there’s this pattern that we have, where there’s some subset, of let’s just say users. So we have some type, some noun data, and we want to talk about some subset of those. So in this case, we have the noun user, and we’re interested in a subset of those users that have the ability to trade, that we’re trading from. So we do this in two steps. We first define a predicate, we define a proposition that says that a user can trade. And then we use that predicate to single out a subset of the users that have that property.
So the first definition here is the predicate user A can trade. And this is defined using rule, and the rule says if you’re a user and you’re a member of this special user group called traders, then okay, you may trade. And so that’s the definition of who may trade. And then definition of a trader is, again, it’s defined by rules, and we’ve appealed to this predicate. So if you U is a user and U may trade, then “U” is a trader. So this is a slightly different object here that we’re building up. So now we have the ability to talk about traders.
So in some other table, we might want to have some permissions that only apply to traders, so we would have the ability to say, well, we’re not talking about users in general there, we’re only talking about traders. And it would be data that we build up in this way.
So that was the map that we had set out for ourselves. Oh, I said I was going to talk more about primitives. So I’ll talk about them on the next slide. So here are all the examples that we just went through, and here’s where they fit in this map of… in this taxonomy of all the places where… all the ways in which we can define things. And it is usually the case that we define relationships, these propositions, in terms of nouns. And it is also usually the case that we define things by rules, in terms of things that we’ve already defined in terms of tables.
But I wanted to point out that that’s not always true here. So one example is we have this definition seven here was the table containing all of the user set and HG repo pairs, where all the users in that user set have access to the HG repo. So that was a thing that was defined in that table, but the data that lived inside that table was defined in the first column here, the user set, that was defined by some rules.
And the other example where it goes against the grain here is the subset type of traders. So it’s normally the case that propositions are defined in terms of nouns. But here, this trader noun here, number five, is defined in terms of the predicate that says who may trade. So the point here is that these things can all depend on each other in interesting ways. It’s not like there’s this certain hierarchy of these things. They can fold back on each other.
So now that I’ve said a bunch of examples, I want to relate this back to databases. So the primitives that we have in the system… I should have given some examples of that, too. The primitives are things like port numbers or IP address ranges or things that are of interest, when we want to configure something to have, that mentions some of these data. So maybe we want to set up some firewall rules. Or I think another one that we recently added had something to do with for giving a location, what’s the maximum bandwidth that we want the network to be able to support, coming into and out of that location. So those are types that are primitive nouns, and we can’t really define them in the system. They don’t follow this pattern. They don’t follow any of the patterns that we’ve shown so far.
So those come with the system, in the same way that when you write a program in a language, you don’t have to define all the types that you use. You get strings or integers, or some types are basic enough that they just come with the language. So that’s what we mean by primitives. And there are primitive nouns and primitive propositions. So both those things can exist.
So when you’re thinking about how does this system compare to a regular relational database, I think basically, if we just had tables, and we had some primitive types, then that would be… we would be approximating a relational database. If we threw away the tables, and we tried to define everything in terms of rules, then I think basically what we would have is some variant of prologue. So everything is defined by some inference rule, and that’s what that would look like.
But we have all of these things together, and so the mixture of all of these things together is… I think a great way to think of it is a deductive database. This is some term that used to be popular decades ago. But you don’t see very many of these.
[inaudible 00:18:23]
Yes. And so that’s basically what we have. And when you think about comparing this to a relational database, there are things like these subset types that I think are pretty straightforward to do here. And I’m not going to claim that you couldn’t do this in a relational database, but mostly that I don’t think I would know how to do it straightforwardly.
And the other thing that you can do is, it’s pretty easy to define transitive closures of things. So one example that we have internally are mailing lists that are themselves subscribed to other mailing lists. So if you want to know what are all the mailing lists that you belong to, you need to follow this chain of mailing lists until they stop being deeper and deeper mailing lists that you’re subscribed to. And this is a… just writing that proposition with a recursive rule is not a hard thing to do.
Okay. So now we’re going to talk about how are all these things represented. So now we’re going to get into the type theory stuff. But before we do that, I first want to talk about a simpler notion of types that exist in functional programming, so languages like OCaml or Haskell. They have a notion of a variant type. And I feel like I should say something about this, since this is very close to the flavor of types that exist in Ontology, but these are not very well represented in the world of programming languages.
So a variant type is basically like a union type in C, except with safety guarantees, which don’t exist with union types in C. You can also think of it as a generalization of an enumeration type.
So here are three examples. We have a type of booleans, a type of exit codes, and a type of trees. So booleans are defined to be either true or false. That’s all you get with booleans. Those are the two ways you can make a boolean. And so we’re going to call these two names true and false. We’re going to call these the data constructors, and bool we’re going to call the type constructor.
Exit code is only slightly more interesting, in that one of the constructors named the error has some extra data that comes along with it. But again, these are the two ways that you can form an exit code. You can be the success exit code, or you can make one that’s an error, if you happen to have an it that you want to consider as the code for the error.
And then the most complicated one here is a binary tree. So this is how we can, if we allow ourselves to make recursive reference to the type we’re defining, then we can define a tree structure like this. So a tree is either a leaf, containing some character, or a node with a left sub tree and a right sub tree and some integer data there at the internal node.
And so we build up trees by applying these constructors. So here’s a pair of trees that I’ve defined. So the first one is that tree. And then the second one is slightly larger, and it’s that tree. So those are variant types, and the other interesting thing that you can do with variant types, besides building them up, is tearing them down. And the way that you do that in languages like OCaml and Haskell is with pattern matching. But because that’s not what happens in Ontology, I’m not going to talk about that. So we’ll just talk about forming the data.
So now let’s take these three types and try to translate them into Ontology’s internal language, which we’re now starting to talk about. So like I said, when you’re defining booleans, the boolean type, you’re really defining three constructors. The first constructor… or three constants, if you will.
The first constant is a constant of type type. A bool.t is a type. And so that’s what the colon means. You can read the colon as an is/a relationship. And then bool.true, we’re going to be precise about the scopes of things here with these. The bool dot is the path that gets you to this constant true. So bool.true is a bool, so it has type bool.t. and then false is the other constant that we’ve defined. So that is the same content of the definition from the previous page. We’ve introduced these three constants, and that’s that.
And then with exit codes, it’s the same thing except in the case of this error exit code, in order to get an exit code, you have to first apply the error constructor to an it a value of type it.
And then the trees are the same way. It’s just that now we can build up larger and larger trees by nesting these applications more and more deeply.
So that’s the syntax that we have. It’s a very S expression inspired syntax, since we’re very fond of those around here. But other than a few extra parens, it’s something that you might see in some type theory literature, where you have types, and you have bindings that are assigning those types to certain names.
So let’s convert some of our data that we had before into this new form. We have this user data, so we have this table of users. So this is pretty straightforward. We can just think of this as some sort of enumeration type. What is a user? Well, a user is either the user admin or it’s the user B. Smith or C. Jones, et cetera. And we have this finite list of all the different users that came from our table, and that’s how we’re going to define it in this notation.
One thing that changed from the previous slide to this one is here, I had been saying type in a few different spots. That first distinction in our taxonomy of data, of nouns versus prepositions means that we have two ways of saying the word type in Ontology. We can say noun, and we can say proposition. And both of those are ways of saying that we’re declaring a type. And then additionally, we have this extra annotation here that says essentially whether we’re defining a table or we’re defining something by rules. And so the way that you say this in the language, there’s a key word extensionally, which means you’re defining a thing that’s supposed to act like a table. And then there’s another key word, intentionally, which means you’re defining a type that’s supposed to act like it’s defined by rules.
So HG repo, same story. User group. This is the exact same pattern. It’s here just for completeness. So now, let’s talk about taking a proposition table and representing that. So this one is going to look a little bit different. So first, we have… this is the relation that we’re trying to capture, that user A is a member of some user group B. So that proposition isn’t really complete until we supply an A and a B. So the constructure that we’re going to define to represent this table is one that takes two arguments before it returns the proposition. So user.group.memeber.t has to be supplied to a user group and then a user before what results is a proposition or a type. And again, we had the annotation here extensionally that says we are going to treat this as a table.
So the way that we convert our table here into these declarations is each row stands for some proposition. The first row stands for the proposition that the user B. Smith is a member of the user group human resources. And so that proposition we represent by applying this type constructor to first a user group and then a user. So this is the same data from that row, but this thing is a proposition. It’s a type. So we need, in order to have a declaration here of a constant, we need some name for the constant that has that type. So we just make one up. It’s not really important. All the information here is really in the type. The name of the constant that we pick here is something that is picked by the system. So someone at some point came and said, “I want this to be true,” and they told the system, “Please, make it true.” And the system received this proposition that it could parse, and it said, “Okay, I’ll add it to my list of declarations, and I generate for you this identifier.
An identifier is basically just some name that stands for this fact. It can be referenced later on, if we need some evidence that this fact is true. We can have this name as a shorthand, and then just look it up and make sure that the proposition that we expect is the type that that name has.
Now this no longer looks that much like variant type, like I was talking about. So in a sense, it is still like an enumeration type. None of these constants take any arguments. But what’s weird about it is not all these constants have the same type. So this is reminiscent of another feature that comes from functional languages called generalized algebraic data types, which are like variant types, but you relax the restriction that all of the constructors have to return the exact same type. And instead, you merely insist that they all return types that start out the same. So all of these constants here have types of the form user group, member t, something something, for different values of something and something. So we’re still in variant type world. It’s just that we’ve generalized it now. So we’re in G80Ts is what’s going on here.
Okay. All right. So that’s tables. Now, let’s talk about how we can define things by rules. So the first example we had of a noun defined by rules was a user set. And we said there were two ways to form a user set. So what we’re going to do here is we’re going to turn that into a type definition again. So here, we’re saying it’s a noun, and it’s defined intentionally, which again just means by rules. And each one of our rules on the left hand side is going to become a constructor on the right hand side.
So this first rule that said, “If you give me a user group, then I can relabel it and call it a user set. So that’s what this constructor says. It says, well, one representation is a user set is just a user group, wrapped in this particular constructor that we’ve chosen to name in a way that’s suggestive of the way we’re thinking of that type of set.
And then the other rule says, “Well, if you give me a user, then I can rebrand it and tell you that it’s a user set.” So that’s exactly what we’re doing here. We’re saying singleton set for a particular user can be represented using this singleton constructor. And now we’re back to normal variant type territory here. All of these constructors are returning the same type, user.set.t, so this is a thing which we could have defined back on that previous page with the variant type examples. We could have represented this in a usual language with variant types. So that’s that. That’s how we represent user sets, so that’s an intentional noun.
Oh, and now that we have… there was one extensional proposition that we had left behind. We couldn’t really talk about it yet because we didn’t have user sets, but now that we do, we can talk about it. So HG repo membership, specified at the level of user sets, can be represented this way. So again, we are dealing with a table. And again, we have the pattern that all of the constructors that we use have these obviously system-generated names, and the only difference here is that this second argument to the type constructor is a user.set.t value.
So here, the first row, we had the user set in question was one of these all in group user sets. And here, we can see that clearly on the right hand side. And then all the other ones were singleton sets that just granted permission for an individual user. And so that’s what we see here with using these singleton constructors.
There’s some inference going on here of the path. So when the type checker comes along and says, “Let me make sure this thing is well-formed.” I know that hg.repo.member.t is a thing that returns a proposition, so that’s good. That’s something that’s allowed to appear on the right hand side of a colon, since prepositions are types. But let me make sure that this thing is applied correctly. So let me make sure that this first argument is an HG repo and its second argument is a user set. Well, ocaml-code, when we say that, we’re really talking about this thing. So the type checker at that point knows it’s looking for a thing of this type, and it’s only told this part of the name. But there’s this invariant that all of the data constructors for a type and the type constructor that they all inhabit all have to live at the same scope. So this bit of the name is being inferred here, which is why we can just get away with writing OCaml code here, rather than hgrepo.ocamlcode. And a similar thing is happening with the singleton constructor and with this one, as well. So really, the only place where we have a bunch of path information is at the beginning.
Okay, so that’s an extensional proposition. We already saw one of those, but that’s another one. Okay, now a new thing. And intentionally defined proposition. So here we’re defining a proposition, but we’re doing it with some rules. So again, these propositions are parameterized over a user and a user set, so we have those two arguments to our type constructor, which returns a proposition, then, and it’s defined intentionally. So we’ve got that notation here as well. And it’s the same thing. We have each of these two rules becomes a declaration of a new constant. And so let’s look at the definition of the first one. So this one says if U is a user, then U is a member of the user set, which is just the singleton containing U.
And here is that same thing, said as a type. So the type of this singleton constructor says for all users U, it is the case that this proposition is true of U. And this proposition says that U belongs to or is a member of this set, which is the singleton set containing U. So this is how we represent all these things.
Something new is happening here, where the type, all the types that had arrows in them before looked like this. There was some type and then an arrow and then another type and then an arrow, and then eventually, we run out of types. But now, we have an interesting thing where instead of just a type on the left hand side of this arrow, we have a name, as well. So what we’re doing here is we’re saying the type returned by the singleton constructor here is going to depend on the argument that we pass in to that constructor. So that is something new and interesting that goes by the name of dependent types. So we have a… now, it’s not even the case that if we define, apply a singleton to… if we apply it to different arguments, we’re going to get different types out. And that’s really what we want. We want this to be some sort of a logical rule. This is an inference rule, where we just happen to instantiate it to a particular user. But it really says this is our way of proving a bunch of facts of this shape, by supplying different values for U.
And the second rule is similar. So again, we’re saying for all user groups G, for all users U, such that U belongs to G, then if all that is true, then we have a situation where U is a member of the user set G. All right.
Oh, and this is again just us unpacking the definition. So we had a way to say that whole sets of users had access to a repo, and we want a way to say that individuals have access to a repo. So I hope by now, these things feel pretty straightforward. We’re just converting all of this English into the language that we’ve built up for ourself. And again, this constructor has several of these arguments, where the value that we pass in is something that we can mention later on, in the types of the other arguments that come after it.
So all right, let’s see. Here’s our subset type. So our subset type, we had this may trade relation. It’s another intentionally defined proposition. And then we have an intentionally defined noun that just packs this up. So if you think of this as a data constructor, we have our type, it has a single constructor, that constructor takes two arguments. The two arguments are a user and something of type user may trade to U. So we should think of this as some evidence for the fact that this user may trade. And that is going to be built up, using all the constructors we gave for that type. But basically, when you’re building up a term whose type is a proposition, you’re essentially building up a proof. And so what this constructor says, is what is a trader? A trader is a pair of a user and a proof that that user may trade. So that’s the way that we build these subsets, is we’re embedding these proof objects inside the data.
All right. Well, that was a lot to say. So [crosstalk 00:42:07] yes?
[inaudible 00:42:09]?
This one?
Yeah. Thank you.
Okay. So we have this way of representing all these different things, and what we’re doing is we’re representing it as a sequence of declarations in this type language, and that means that we have implied in what I’m saying is there’s some type checker that we can run over all this data. And it can say no, your data’s bad, or it can say okay, this all checks out. So that’s useful, and obviously, we’d like to maintain some invariant that at all times, all of our data are well-formed and the type checker would say, this looks great.
And an interesting consequence of that, I think, is that… there’s one very simple way that the type checker might complain, is if we try to reference a thing that is no longer defined. So if I mention some constant, and that constant is no longer in the set of all the declarations of constants, then the type checker would say no, this is no good.
So what that means, as a consequence of this invariant that we want to maintain, is that if I ever delete a user group, say, I need to go and delete all the mentions of that user group. Meaning all the statements of fact that this user belonged to that particular user group, or all of the user sets that mention that user group and all of the HG permissions assigned via user sets, defined in terms of that user group.
So there’s a bunch of downstream cleanup that has to happen if we’re going to maintain this invariant. And this turns out to be a useful thing because it’s not good to have all this meaningless data lying around, and so having this constraint at the heart of our system means that it forces us to clean up stuff when we get rid of things. It also means that it can be particularly expensive to delete things, which is why we don’t really, really delete things until two weeks have passed. We remember all the things that we deleted, so there’s a little trash can, I guess.
Another thing that happens because of this cascading delete idea is when we assigned read permissions to these different constants. So different users in the firm have the ability to see the data that lives in the system. And you can say, “Oh, this particular constant can’t be viewed by such and such.” And what that means is whenever we go to look something up, we have to make sure that whoever is doing the looking up is allowed to see the thing that they’re looking at. And this has this transitive behavior where we go back in the other direction. Well, am I allowed to see this thing? Well, what is it defined in terms of? It mentions that and that. Let me check, am I allowed to read those things? So we have to go back in the other direction, to make sure that we’re allowed to read the things that we’re reading. So the system takes care of that.
And what this means is if you have a bunch of interesting read permissions in the system, it says though different users have their own personalized view of the data, in which everything that they’re not allowed to read has been cascadingly deleted before they got there. Or at least, that’s the promise that we make.
Okay. So now we have all this data. We’d like to query it. So here’s an example of a query. And again, we’re going to talk about the queries in English first, and then we’ll talk about how they’re represented in the internal language. So one query here is I want you to find me a B and a D and an A of these types that have the following relationships. So these three things up here are all nouns, and then the things down below here are propositions that are supposed to hold, of those nouns. So what are we looking for? We’re looking for a user, A, and some user set that they belong to, B, and some HG repo, D, that they have access to, via that user set.
All right. So we run that query, we get these results. How will we represent that in our internal language? So this is a new thing we haven’t seen before, this exists keyword. And so the way that we represent a query is just a logical formula. There exists and A and a B and a C and a D of these types, such that this thing is true of A and C. So this is a thing that could be either true or false. It’s false if those things don’t exist that live in that relationship to each other. It’s true if they do. So evidence for the truth of this thing is, find me an A and a B and a C and a D, such that this last thing is true, and also, prove it.
So there are our expanded query results. So this is the same data as what I showed you here, except with two additional columns on the right that correspond to these last two proposition typed things. So now those are where the proof objects are coming into it. So let’s just look at this second row here. So this is a user set, this is an HG repo, and B. Smith is a user. And C201 is the name for the fact of this shape for the B and the A that came from columns two and one, respectively. And then lastly, singleton B. Smith is also a term that has this type. So singleton B. Smith has a type that says that C is a member of A. Which one’s C again? C is B. Smith, and A is the singleton B. Smith user set.
So this whole thing, this is all well typed data that comes back from the query. But the thing that we see is just the noun data. So okay, all right. And now that we have all this data and we have the ability to query it, we also want to be able to make changes. So I’m not going to talk that much about that, but there’s basically different ways to change data, based on what type of data it is.
So extensional data, some of it is data that is under the jurisdiction of Ontology. And for those, you can submits a request, and that’s the story that we told earlier, where someone has to approve it, perhaps, and once that’s all fully approved, then the change goes through. But some data is data that we don’t have jurisdiction over, so we just import that on some regular schedule. That’s extensional data. Did I say all these things? Oh, right. So what that means is for the extensional data, you can change the world by changing the data that’s there and then waiting for something outside to sync up with Ontology. So depending on how many different things are listening to what Ontology says, this means you can create a new user group or destroy a HG repo or add new relationships or remove them or whatever. And then you find that okay, now I can run HG clone and it works because the permissions got synced up to the world.
So for non extensional data, those are not changeable by users. Those are changeable only by basically bumping the version of that type into some other type and doing some upgrade step, where we migrate the old data from the old representation to the new representation. And at the same time, we flip which one of these, which version of the data is visible. And that’s a thing we have to be careful about because we need to know which clients are paying attention to the old data. So there’s a whole story about that.
All right. Let’s see here. So now we want to render all this stuff in English. So this part of the story could be more interesting, but it isn’t. This part of the story is basically we have a bunch of strings that we punch holes in, that we just slap together in some sort of template instantiation way, and we end up with something that looks like English. So we know the bare minimum possible about English grammar, in order to get this job done. And that means that sometimes, there are constructions that probably aren’t the best. This part could be done better, and there are people who have whole careers devoted to type systems for natural language grammar generation stuff.
So I bought one of their books. And then I thought, no, I’m just going to do a very dumb thing here. So basically, every one of these constructors has one of these strings. And if you have strings for the arguments that you’ve determined recursively, you just splice it into the string template for the constructor that you’re trying to render in English.
So okay, let’s see. I wanted to show you a few things. So I made a prompt… so I’ve made a copy of the system, which is teeny and only has the data that you’ve all listened to me tell you about, so you know it by now. So it’s just a small amount of data. And I also have a particular setup here where I would normally have to type Ontology and then the query sub command to do a query. But I’m in a particular environment where I’ve set everything up so that all of those sub commands are just top level things I can call, and it’s obvious to the context that I’m only talking to my tiny little copy of the system. So [crosstalk 00:54:30] yes?
Is this a bash prompt or an ontology prompt?
This is a bash prompt, but it’s masquerading as an Ontology prompt. Okay. So this is a private instance. Okay, so I typed query. I’m going to make a query, and so I’m being asked, well, my current query is empty. Please pick some template for a query that you want to talk about. So I’m interested in working out what are all the HG repos that anybody in the developer’s user group has access to. Okay. So first thing, I know I’m interested in HG repos, so I’m going to… this is all based on some nice SZF tool that we’ve wrapped up, so some fuzzy finding thing. So I say HG repo access. I’m interested in that.
So I selected that, and now my query got bigger. And this is the query that we were looking at a couple slides back. So I won’t say too much about it except that this tells me about HG repo access, but I also need to get user group membership into the mix somehow. So I’m going to introduce another clause, and this clause is going to have something to do with user group member. Okay, there we go. There’s only one of those. Okay.
And I haven’t said anything yet about which user group I’m interested in. But maybe I get too excited and I jump the gun. I’m like, oh, my query’s done. So I say I’m happy with this query, and then it says wait a minute. That’s not really one query. That’s two queries. You haven’t connected these two queries in any particular way yet. So maybe you want to join some clauses.
So joining, here I want to join… I have this D user and this B user. Well, I want those to be the same, so let me join those two together. That’ll unify them. So join two clauses. B, a user… and after I say B, the only thing of the same type as B was D, so that’s my only option now. Okay. So now, this time if I hit enter, it would run because this is a well-formed query. All these letters above the such that are somehow connected underneath the such that. But I haven’t yet told which user group I’m interested in, so I’m going to refine one of these clauses about the user group. And here are all the user groups. Developers is the one I want. Okay.
So now what does this say? I find a user that is a member of developers. It’s also a member of a user set C that has access to E. That looks right. Okay. So here are my developers, and there’s a problem that none of them have access to any code repos. They’re only reading poems and ordering lunch. So now I’m going to solve that problem. So I’m going to go to the Ontology website, and I’m going to say, I’m going to grant some HG user, HG repo permissions. Access. There you go. Grant users and user set access to HG repo. That sounds right. So I’m going to do this.
So I want to grantt access to the OCaml code repo. And now here, there’s two ways that I can form a user set, remember. I can be talking about some individual user or some group of users, so I’m interested in giving all the developers access. So okay, so that was successfully submitted. And now you’ll see here, approval will be requested from the following users, this very conspicuously named user, owner. Owner owns the data. So I’m going to switch over to owner’s login here. This is an incognito window where I’ve logged in as owner. Now owner has one request awaiting his approval, and I’m going to approve it. Okay.
Now let’s go back and let’s run our query again. Let’s see. And the other thing that happened here is I went through that whole interactive construction of a query. I don’t want to do that again. Good. Here’s the query that I can just paste in now, in the language that I don’t have to know because I just went through the interactive thing. Okay, good. Developers can access the OCaml code repo.
The last thing I want to show is what is it like to browse around on the website here. So this is where I think this English rendering UI is the only thing you see, and you feel good because you’re not seeing the other thing. So I’m interested in this traders user group. I’ve heard about some traders. I want to find out what’s going on. Okay. So traders is a user group. Let’s see. Here’s some other user groups. All right. I’m going to go back. So here’s some things that the system knows about this particular user group. So it’s mentioned in this particular rule, this rule that defines who is allowed to trade. So here’s the rule that we had seen earlier. I can go see everything of this type. This says this is one way to conclude this one may trade. Oh, well, it’s the only way.
And let’s see. There’s only one member of traders, this S. Welch. So I can go and look at S. Welch and see what’s going on there. Looks like she has access to the poetry group. So you can just poke around. Here, we can see all the user sets that have access to all the HG repos, just to look at that. We can… that’s the website. So basically, it’s just a bunch of English rendering of all of this data that is inside in this internal representation. So that is that.
So to wrap this up, we designed this whole system on this little DSL that we plucked out of type theory literature. How did that go? Well, I’ve talked a little bit about what I think are some of the upsides to this, so we get this deductive database power, and I talked a little bit about what things we get for that. We were able to come up with this UI that’s data-driven in a way that I think turned out to be pretty nice. Again, there’s probably still some rough edges. Maybe we should have done something smarter than just slapping strings together, but it seems to work okay. And one thing that I didn’t emphasize here, but hopefully it was obvious from seeing the demo, is that whenever we were constructing a request or constructing a query, we were always presented with choices that were the full range of all the available valid things that we could do. And in particular, we were never allowed to go wrong, in a sense. So when I’m asked for please give a user group, I’m given the list of all the user groups. When I’m asked for a user set, I’m given a list of all the ways to form a user set. It’s hard to… Another way of saying this is that the UI, because it’s driven by the types, it’s guaranteed to respect the type system. So the things it produces, in the end, are well-formed.
So adding new types of permission and config, because of this data drivenness, means that the heart of the system doesn’t need to be upgraded. We can just upload these new declarations into it, and that’s enough to get new types of permissions in.
And so one example of this very general approach that was a nice thing that came up the last year or so was we had a variety of places where you could ask for permissions or you could also ask for permissions to be taken away. And you can ask for permissions to be taken away from yourself. And you should be able to just do that. If you were like, oh, look, I don’t need this anymore, we shouldn’t stand in your way. You shouldn’t need someone’s permission to give up your right to do a something that… if you’re an adult and you think you don’t need to do that anymore.
So we spot-checked a bunch of these things, but we were able to basically use this very general structure of the data to write a test that knew whether or not… for all the different request types for getting rid of permission, it could just say well, let me try to instantiate this to the person who’s doing the requesting, and then are they then allowed to push the thing through without any additional approval? And that was just a test that we could just write once, and that’s been a useful thing to catch bugs.
So what are the downsides of this? I think the biggest downside that we don’t yet have a great solution to, although we have some ideas, are so this prologue thing, in the heart of this language, is [inaudible 01:06:00] complete. And that’s not really what you want to live in the heart of your system. Any time a request comes in, we run one of these little queries to decide who has the ability to approve the thing, and one thing to be concerned about is that those queries might just run forever. And so we have some abstract notion of the execution budget of one of these queries, so that we can limit it and say no, you can’t run more than X units, whatever the units are. And this is fine, and it lets us sleep at night, but it doesn’t… sometimes the error mode is bad when you violate the thing because oh, that query was just longer than the ones we’ve seen so far.
So we have some ideas about this that zero in on the actually recursive types. And the budget ought to be more custom toward preventing [inaudible 01:07:04]. I think it’s a little bit too fine grained a notion of a budget right now.
Another downside of the DSL is just have to implement DSL, and it’s meant that in our case, it was this hodge-podge of different things. It was kind of a database, kind of a programming language, and so we had to re implement various small things from each of those domains. However, one saving grace, I think, the only thing that made this reasonable at all was that this DSL is not at all an original thing. Basically, this notion of having a dependently typed language of constants and running a prologue, backtracking search over it is something that’s a few decades old and is pretty well understood. And there are systems out there in academia that were even more ambitious, and we didn’t even go that far. All of the data that we represent is first order. And all of the interesting academic work is about how do you deal with higher order data. So we didn’t even touch that. We wanted that to be a simple aspect of it. So that’s that.