I've been getting a lot of hits to my page about name usage in file system filters, so I've decided to expand on the subject of names a bit further. This blog post is more about software design (and especially OS design) and less about file system filters.
The role of language in shaping the way we think is a very interesting subject and one I've been interested in for a while. The book "Language in Thought and Action" is a very good introduction to it. One of the ideas in the book is that the mapping of names to objects changes the way we think about the objects. While this is true to a certain extent in programming (think about how often you've heard the phrase "well, this API would have been better named BlahBlah …"), computer science as a discipline has a completely new class of problems that I'd like to focus on in this post: the problems associated with actually designing namespaces. I'm not sure why designing and identifying namespaces isn't as popular in computer science circles as other concepts like indirection and variable scope, because it's at least as important.
I don't think writing a formal definition of a namespace would be very interesting, so I'll go straight to some examples of namespaces.
Probably the best known one is the file system namespace. The main elements of this namespace are file and directory names, and the namespace serves to map file paths to streams of bytes. Also quite well known is the registry, which serves a very similar purpose. For people writing kernel mode drivers in Windows, another familiar one is the object manager namespace (or the OB namespace), where object names are used to identify kernel objects.
In some operating systems users are used to seeing and working with other namespaces grafted into the main OS namespace, for example the storage devices namespace, the COM ports namespace or the running processes namespace. (In Windows users don't usually see the OB namespace, but it can be explored using tools like WinObj.)
For developers some familiar namespaces are the types namespace and the variables namespace (in the compiler).
But there are even more interesting ones. For example, a namespace doesn't have to use ASCII or UNICODE strings to identify objects. If one were to use numbers, like 1, 2, 3 and so on, the namespace would be an array. Similarly, process handles form a namespace, where the handle is used as the name. By now it's probably pretty clear that any key-value type of structure is a namespace. Even memory is a namespace, where the name is the address.
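To make the "any key-value structure is a namespace" point concrete, here is a small Python sketch in which an array, a string-keyed map, a byte buffer standing in for memory and a toy handle table are all resolved through the same lookup operation. All names here (lookup, HandleTable and so on) are hypothetical and purely illustrative; only the type of the name changes between the four cases.

```python
from typing import Any, Dict, List


def lookup(namespace: Any, name: Any) -> Any:
    """Resolve a name in any key-value structure (raises if the name is unbound)."""
    return namespace[name]


# An array: the names are the integers 0, 1, 2, ...
array_ns: List[str] = ["first", "second", "third"]

# A string-keyed map: the names are strings, as in the registry or a file system.
string_ns: Dict[str, int] = {"Timeout": 30, "Retries": 5}

# Memory: the names are addresses (here, offsets into a 16-byte buffer).
memory = bytearray(16)
memory[0x8] = 0xFF


class HandleTable:
    """A toy handle table (hypothetical): the table itself hands out the names."""

    def __init__(self) -> None:
        self._objects: Dict[int, Any] = {}
        self._next = 4  # arbitrary first handle value, stepping by 4

    def insert(self, obj: Any) -> int:
        handle = self._next
        self._next += 4
        self._objects[handle] = obj
        return handle

    def __getitem__(self, handle: int) -> Any:
        return self._objects[handle]


table = HandleTable()
h = table.insert("some kernel object")

print(lookup(array_ns, 1))           # second
print(lookup(string_ns, "Retries"))  # 5
print(lookup(memory, 0x8))           # 255
print(lookup(table, h))              # some kernel object
```

The lookup is identical in every case; what differs is who chooses the names (the caller for the map, the table itself for handles) and what structure the names have.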
Now that we have some examples of namespaces, we can look at some choices the designers of these namespaces made and at the impact those choices have on the way the namespaces are used.
First, let's look at the object manager namespace in Windows (which, as I said before, I'll refer to as the OB namespace).
I'll start by listing some of the properties of this namespace. The names in the OB namespace are UNICODE strings. As is usually the case with namespaces where the names are strings, the namespace implements a hierarchy of names, and it is public. Two interesting features are that it supports links from one point in the namespace to another and that it supports objects that don't have a name (we could treat anonymous OB objects as a different namespace, but that's not particularly interesting).
Support for anonymous objects is by far the choice with the biggest impact, because it means that whoever implements the namespace can't use the removal of an object from the namespace as an indication that the object needs to be deleted. They must use some other technique to track object usage, and in the case of OB that technique is reference counting. From a user's perspective this means doing the little dance of increasing the reference count before sharing the object with anyone and decreasing it when done using it. It also means that removing an object from the namespace (a delete) can take effect immediately (as opposed to taking effect when the object is closed, as in file systems). Another implication of this architecture is that it's hard to keep logs, because an object might not have a name, so how does one log it? The memory address doesn't usually convey any information about the object.
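The consequences of allowing anonymous objects can be sketched in a few lines of Python. This is a hypothetical model, not the OB implementation (in the kernel the corresponding calls are ObReferenceObject and ObDereferenceObject): in this sketch the namespace holds its own reference to each named object, removing the name takes effect immediately, and the object is only destroyed when the last reference is dropped. The describe helper shows the logging problem: an anonymous object has nothing better to log than an address.

```python
from typing import Dict, Optional


class KObject:
    """Hypothetical reference-counted object whose name is optional."""

    def __init__(self, name: Optional[str] = None) -> None:
        self.name = name
        self.refcount = 1  # the creator holds the initial reference
        self.destroyed = False

    def reference(self) -> None:
        self.refcount += 1

    def dereference(self) -> None:
        self.refcount -= 1
        if self.refcount == 0:
            self.destroyed = True  # only the last dereference frees the object


class ObjectNamespace:
    def __init__(self) -> None:
        self._names: Dict[str, KObject] = {}

    def create(self, name: Optional[str] = None) -> KObject:
        obj = KObject(name)
        if name is not None:
            self._names[name] = obj
            obj.reference()  # the namespace keeps its own reference
        return obj

    def lookup(self, name: str) -> KObject:
        obj = self._names[name]
        obj.reference()  # every successful lookup hands out a new reference
        return obj

    def delete_name(self, name: str) -> None:
        # The name disappears immediately, even though others may still hold
        # references; the object lives on until they dereference it.
        obj = self._names.pop(name)
        obj.name = None
        obj.dereference()


def describe(obj: KObject) -> str:
    """What a log line can say about an object."""
    return obj.name if obj.name is not None else f"<anonymous object at {id(obj):#x}>"


ns = ObjectNamespace()
event = ns.create("MyEvent")   # refcount 2: creator + namespace
other = ns.lookup("MyEvent")   # refcount 3
ns.delete_name("MyEvent")      # name gone at once, refcount 2
event.dereference()            # refcount 1
other.dereference()            # refcount 0, object destroyed
print(event.destroyed)         # True
```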
The fact that a namespace supports links is also quite interesting. The designer needs to decide whether to support links to directories in the namespace or just links to "leaves" (like files). For example, NTFS supports hardlinks only between files, not directories. The OB namespace, however, supports links to directories, which means the OB namespace can contain loops, so the designer must come up with a way to deal with them. Another interesting implication is that the caller might need to remember which way they arrived at an object in the namespace (the path to that object) in a way that takes links into account. The OB namespace doesn't do that, but it is required for some features (like file system symlinks), so the users of the namespace must implement it themselves.
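Here is a sketch of why links to directories complicate name resolution, again in Python and under assumptions of my own: links are stored as absolute target paths, resolution restarts at the root whenever a link is crossed, and a hypothetical MAX_LINK_TRAVERSALS bound (the value 32 is arbitrary) is what stops loops. The resolver also returns the list of links it crossed, which is the "which way did I arrive" information the OB namespace leaves to its users. The directory layout in the usage part is illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

MAX_LINK_TRAVERSALS = 32  # hypothetical limit; any finite bound breaks loops


@dataclass
class Link:
    target: str  # absolute path the link points to


@dataclass
class Directory:
    entries: Dict[str, "Node"] = field(default_factory=dict)


Node = Union[Directory, Link, str]  # a str stands in for a leaf object


class LinkLoopError(Exception):
    pass


def split(path: str) -> List[str]:
    return [c for c in path.split("\\") if c]


def resolve(root: Directory, path: str) -> Tuple[Node, List[str]]:
    """Resolve an absolute path, following links (including links to directories).

    Returns the object plus the links crossed, so the caller knows how it arrived.
    """
    remaining = split(path)
    node: Node = root
    walked: List[str] = []  # components consumed since the last restart
    route: List[str] = []   # the links followed, in order
    traversals = 0
    while remaining:
        if not isinstance(node, Directory):
            raise NotADirectoryError("\\" + "\\".join(walked))
        component = remaining.pop(0)
        walked.append(component)
        node = node.entries[component]  # KeyError if the name is unbound
        if isinstance(node, Link):
            traversals += 1
            if traversals > MAX_LINK_TRAVERSALS:
                raise LinkLoopError(path)
            route.append("\\" + "\\".join(walked) + " -> " + node.target)
            # Restart from the root with the link target plus whatever is left.
            remaining = split(node.target) + remaining
            node = root
            walked = []
    return node, route


# Illustrative layout only, loosely shaped like \GLOBAL??\C: -> \Device\...
root = Directory({
    "Device": Directory({"HarddiskVolume1": "volume object"}),
    "GLOBAL??": Directory({"C:": Link("\\Device\\HarddiskVolume1")}),
    "Dir": Directory(),
})
root.entries["Dir"].entries["Up"] = Link("\\Dir")  # a link to a directory: a loop

obj, route = resolve(root, "\\GLOBAL??\\C:")
print(obj)    # volume object
print(route)  # ['\\GLOBAL??\\C: -> \\Device\\HarddiskVolume1']
```

Because \Dir\Up points back at \Dir, names like \Dir\Up\Up\Up are all valid and resolve to the same directory; only the traversal bound keeps a genuinely circular pair of links (A pointing to B, B pointing to A) from spinning forever.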
One final characteristic is that the namespace is hierarchical. Hierarchical namespaces have some advantages from the implementer's perspective, since they allow grouping objects that belong together; the main advantages are security and support for isolation. A flat namespace, on the other hand, is easy to implement, but it's very limited, as it is basically just a hash table.
To get a better picture of the implications of implementing a hierarchical namespace versus a flat one, let's consider some namespaces that don't support hierarchies, like the namespace of named synchronization primitives in Windows (events, mutexes and so on). It's easy to get name collisions, so each Windows application must make sure it's using a name that no one else is using. From a security perspective, there is no way to selectively limit listing them: you can either prevent someone from seeing any of the names or allow them to see all of them. Access control is possible, but only on a case-by-case basis, and there usually isn't a way to inherit security permissions from another object.
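The contrast can be sketched in Python with hypothetical classes (this is not a Windows API). In the flat namespace, create uses open-or-create semantics, similar in spirit to how CreateEvent returns the existing event and reports ERROR_ALREADY_EXISTS when the name is taken, so two applications that pick the same name silently share one object, and the listing right is a single switch. In the hierarchical one, each directory carries its own access list, listing is checked per directory, and a new subdirectory inherits its parent's list unless it is given one explicitly.

```python
from typing import Any, Dict, Iterable, List, Optional, Set, Tuple


class FlatNamespace:
    """One global table; the right to list it is all-or-nothing."""

    def __init__(self, can_list: Iterable[str]) -> None:
        self.objects: Dict[str, Any] = {}
        self.can_list: Set[str] = set(can_list)

    def create(self, name: str, obj: Any) -> Tuple[Any, bool]:
        # Open-or-create: returns the object bound to the name and whether it
        # already existed. A colliding creator silently gets someone else's object.
        if name in self.objects:
            return self.objects[name], True
        self.objects[name] = obj
        return obj, False

    def list_names(self, principal: str) -> List[str]:
        if principal not in self.can_list:
            raise PermissionError(principal)
        return sorted(self.objects)  # everything or nothing


class SecureDirectory:
    """A directory whose access list governs listing and is inherited by children."""

    def __init__(self, acl: Iterable[str]) -> None:
        self.acl: Set[str] = set(acl)
        self.entries: Dict[str, Any] = {}

    def create_subdir(self, name: str, acl: Optional[Iterable[str]] = None) -> "SecureDirectory":
        child = SecureDirectory(self.acl if acl is None else acl)
        self.entries[name] = child
        return child

    def list_names(self, principal: str) -> List[str]:
        if principal not in self.acl:
            raise PermissionError(principal)
        return sorted(self.entries)


flat = FlatNamespace(can_list={"alice", "bob"})
a_obj, _ = flat.create("UpdateDone", "app A's event")
b_obj, existed = flat.create("UpdateDone", "app B's event")
print(b_obj, existed)  # app A's event True

root = SecureDirectory({"alice", "bob"})
app_a_dir = root.create_subdir("AppA", {"alice"})
app_b_dir = root.create_subdir("AppB", {"bob"})
app_a_dir.entries["UpdateDone"] = "app A's event"  # same leaf name, no collision
app_b_dir.entries["UpdateDone"] = "app B's event"
print(root.list_names("bob"))  # ['AppA', 'AppB']
```

Bob can see that AppA exists, but listing inside it is refused, a middle ground the flat namespace cannot express.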
The isolation part is also pretty important. For example, consider the fact that Windows supports sessions. It helps to keep the resources that are semantically linked (such as those belonging to one session) in a directory, so they can be easily enumerated and operated on (even if they are just links to the actual objects). Isolation is really useful in virtualization, because the user of that part of the namespace doesn't necessarily see all the available objects, just the ones they're supposed to see.
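A minimal Python sketch of per-session isolation, assuming a hypothetical layout with one private directory per session plus a shared global one. The Global\ and Local\ prefixes are borrowed from the way Windows lets applications choose between the global and the per-session namespace for named kernel objects; everything else here (class, method and directory names) is illustrative. Unprefixed names resolve inside the caller's session, so two sessions can use the same name without colliding, and enumerating a session's directory shows only that session's objects.

```python
from typing import Any, Dict, List, Optional, Tuple

GLOBAL_PREFIX = "Global\\"
LOCAL_PREFIX = "Local\\"


class SessionNamespace:
    """Hypothetical namespace with one private directory per session."""

    def __init__(self) -> None:
        self.global_dir: Dict[str, Any] = {}
        self.session_dirs: Dict[int, Dict[str, Any]] = {}

    def _resolve_dir(self, session_id: int, name: str) -> Tuple[Dict[str, Any], str]:
        if name.startswith(GLOBAL_PREFIX):
            return self.global_dir, name[len(GLOBAL_PREFIX):]
        if name.startswith(LOCAL_PREFIX):
            name = name[len(LOCAL_PREFIX):]
        # Unprefixed and Local\ names land in the caller's own session directory.
        return self.session_dirs.setdefault(session_id, {}), name

    def create(self, session_id: int, name: str, obj: Any) -> Any:
        directory, leaf = self._resolve_dir(session_id, name)
        return directory.setdefault(leaf, obj)  # open-or-create

    def lookup(self, session_id: int, name: str) -> Optional[Any]:
        directory, leaf = self._resolve_dir(session_id, name)
        return directory.get(leaf)

    def enumerate_session(self, session_id: int) -> List[str]:
        # A session only ever sees the contents of its own directory.
        return sorted(self.session_dirs.get(session_id, {}))


ns = SessionNamespace()
ns.create(1, "Ready", "session 1 event")
ns.create(2, "Local\\Ready", "session 2 event")
ns.create(1, "Global\\Shared", "shared event")

print(ns.lookup(1, "Ready"))           # session 1 event
print(ns.lookup(2, "Ready"))           # session 2 event
print(ns.lookup(2, "Global\\Shared"))  # shared event
print(ns.enumerate_session(2))         # ['Ready']
```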
This is getting pretty long so I'll stop here and talk about the file system namespace in a different post. If there is enough interest I might talk about other namespaces like the processes namespace (please leave some comments if this sounds interesting to you).
Monday, September 20, 2010
Alex,
Interesting article. I need to raise a couple of questions, though.
In my view filesystems are not a namespace; they are just namespace _providers_ (in the Ob namespace). So
/device/hardisk1/dir/file.extension:stream$Attribute
Is an address in the Ob namespace. This is why they are mapped to file *objects* and we use object attributes to look them up. Pretending that they are different is confusing (to me). Of course different providers have different restrictions on what you can and cannot do.
Registry (Cm) of course is a different namespace. Why that was the case has always been a mystery to me, but I suspect it's because the registry came from Windows and the Ob stuff came from ideas in Mica...
I'm also intrigued by your observation that the top-level name provider in the Ob namespace (the ObProvider itself) supports hard links. Obviously we are all used to softlinks like /??/C:; can you give details and even APIs?
We then move on to hard links in NTFS. Quite rightly, they do not point to directories: I once had to work under a Unix filesystem with hard links to directories, and the pain from the wounds has outlived the memory of the reason for the pain.
NTFS has also (since day 1) supported soft links (or actually something almost, but not quite, like softlinks in the Unix sense). As an aside, DFS also supports some sort of links but, due no doubt to insufficient integration, does not rely on the object manager to deal with them. This still causes huge pain for people trying to deploy against DFS in XP and earlier.
One of the more regrettable implementation decisions in Longhorn (to my mind) was the hack that was introduced into IOCFSDH to sometimes allow invisible traversal of symbolic links outwith the object manager. This has the side effect of making it possible to have something like hard links to directories, with all the consequent issues for people who try to do things in the name provider space.
To summarise - NT 3.5x had (at least) two namespaces Ob and Cm. The Ob Namespace had softlinks (with remote resolution) and hardlinks. These had reasonably easy to understand semantics.
DFS came along and added a different type of softlinks (also with remote resolution) but with no documentation.
In Longhorn we saw the distinction between symbolic and hard links become blurred resulting in significant work for people who affect the namespace.
Rod
=============
Rod Widdowson
Consulting Partner
Steading System Software LLP
+44 1368 850217 +1 508 915 4790
https://steadingsoftware.com
Hello Rod,
I realize I wasn't clear when talking about links. I didn't mean to imply that OB supports hardlinks. What I was trying to say was that NTFS supports links in its namespace, and those links are hardlinks. I was trying to differentiate this case from the case of symlinks in NTFS, where NTFS simply makes use of OB symlinks and the OB reparse mechanism.
I see your point about filesystems being just name providers for the OB namespace, but from the perspective I'm taking in this article, they are namespaces in that they implement a name-to-object mapping, even though the object is not a FILE_OBJECT but rather an SCB. So I'm not talking about the path /device/hardisk1/dir/file.extension:stream$Attribute as a name for a FILE_OBJECT, but rather about the /dir/file.extension:stream$Attribute path as a name for an SCB.