Monday, September 20, 2010

Namespaces (part 1) - the OB namespace

I've been getting a lot of hits to my page about name usage in file system filters so I've decided to expand on the subject of names a bit further. This blog post is more about software design (and especially about OS design) and less about file system filters.

The role of language in shaping the way we think is a very interesting subject and one I've been interested in for a while. The book "Language in Thought and Action" is a very good introduction to the subject. One of the ideas in the book is that the mapping of names to objects changes the way we think about the object. While this is true to a certain extent in programming (think about how often you heard the phrase "well, this API would have been better named BlahBlah …"), computer science as a discipline has a completely new class of problems that I'd like to focus on in this post. The problems associated with actually designing namespaces. I'm not sure why designing and identifying namespaces isn't as popular in computer science circles as other concepts like indirection and variable scope because it's at least as important.

I don't think writing a formal definition of a namespace would actually be very interesting so I'll go straight to some examples of namespaces.

Probably the best known one is the file system namespace. The main elements of this namespace are file and directory names and the namespace serves to map file paths to streams of bytes. Also quite well known is the registry and it serves a very similar purpose. For people writing kernel mode drivers in windows also a pretty familiar one is the object manager namespace (or the OB namespace), where object names are used to identify kernel objects.

In some operating systems users are used to see and work with some other namespaces grafted into the main OS namespace (in windows users don’t usually see the OB namespace, but it can be explored using tools like WinObj ). For example, the storage devices namespace, the COM ports namespace or the running processes namespace.

For developers some familiar namespaces are the types namespace and the variables namespace (in the compiler).

But there are others even more interesting. For example, a namespace doesn't have to use ASCII or UNICODE strings to identify objects. If one were to use numbers, like 1,2,3 and so on the namespace would be an array. Similar, the process handles form a namespace, where the handle is used as the name. By now it's probably pretty clear that any key-value type of structure is a namespace. Even memory is a namespace as well, where the name is the address.

Now that we have some examples of namespaces we can look at some choices the designers of these namespaces made and what is the impact of those choices on the way they are used.

First, let's look at the object manager namespace in windows (which, as I said before, I'll refer to as the OB namespace).

I'll start by listing some of the properties of this namespace. The names in the OB namespace are UNICODE strings. As is usually the case with namespaces where the names are strings, the namespace implements a hierarchy of names and it is public. Some interesting features are that it supports links from one point in the namespace to another part and that it supports objects that don't have a name (we could treat anonymous OB objects as a different namespace but that's not particularly interesting).

Support for anonymous objects is by far the choice with the biggest impact because it means that whoever implements the namespace can't use the fact that the object is removed from the namespace as an indication that the object needs to be deleted. So they must use some different technique to track object usage and in the case of OB that technique is reference counting. From a user's perspective this means they have to do the little dance that involves increasing the reference count before sharing the object with anyone and decreasing the reference count when they're done using it. It also means that removing an object from the namespace (a delete) can happen immediately on an object (as opposed to it happening when the object is closed, like in file systems). Another implication of this architecture is that it's hard keeping logs of things because an object might not always have a name, so how does one log it ? The memory address doesn't usually convey any information about the object.

The fact that a namespace supports links is also quite interesting. The designer needs to decide whether they support links to directories in the namespace or just links to "leaves" (like files). For example NTFS supports hardlinks only between files, not directories. The OB namespace however supports links to directories, which means the OB namespace can contain loops. So the designer must come up with a way to deal with potential loops in the namespace. Another interesting implication is the fact that the caller might need to remember which way they arrived at an object in the namespace (the path to that object) in a way that takes links into account. The OB namespace doesn't do that but it is required for some features (like file system symlinks) so the users of the namespace must implement that themselves.

One final characteristic is that the namespace is hierarchical. Hierarchical namespaces have some advantages from the perspective of the implementer since they allow grouping objects that belong together. The main advantages are security and support for isolation. A flat namespace on the other hand is easy to implement, but it's very limited as it is basically just a hash.

To get a better picture of the implications of implementing a hierarchical namespace versus a flat one, let's consider some namespaces that don't support hierarchies, like the named synchronization primitives namespace in windows (events, mutexes and so on). It's easy to get name collisions so each Windows application must make sure it's using a name that no one else is using. And then from a security perspective there is no way to limit listing them. Basically, you can either prevent someone from seeing any of the names or allow them to see all the names. Access control is possible, by only on a case by case basis, and there usually isn't a way to inherit security permissions from another object.

The isolation part is also pretty important. For example, consider the fact that Windows supports sessions. If helps to keep those resources that are semantically linked into a directory, so they can be easily enumerated and operated on (even if they are just links to the actual object). Isolation is really useful in virtualization because the user of that part of the namespace doesn't necessarily see all the available objects, just the ones they're supposed to see.

This is getting pretty long so I'll stop here and talk about the file system namespace in a different post. If there is enough interest I might talk about other namespaces like the processes namespace (please leave some comments if this sounds interesting to you).