October 10, 2012

Genesis: On History and Computer Science

This will hopefully be a blog describing my experience in writing Cute: a 64-bit multi-threaded hobby Operating System kernel. The question that typically comes to people's minds is: why yet another operating system? That's a fair question to ask, and one with a long story behind its answer.

Delving into the core

[Figure: Cute's context switching and per-CPU memory areas]

I got my first computer exactly 10 years ago: a Pentium 4 machine with a nice installation of Windows XP. From that day on, I've looked at the computer as a magical device: the way that beautiful navy login screen appears, that green Start button, the C: and D: partitions, and how files get organized inside them.

I badly wanted to understand how such stuff works, and thus entered a computer science college here in Cairo. In the first two years there, I learned the bare basics to a not-so-bad level: I could write in C & C++, had a nice idea about data & file structures, pointers & their arithmetic, and so on. I could in fact write moderate-sized programs, and it was all not so bad.

With that knowledge, I still could not understand the internal workings of my system, and that kept me frustrated. One day a friend of mine offered me a 4-CD Linux distribution. I did not know what Linux was at the time, and so I asked. I remember being told that it was a "nice OS, much more secure than Windows XP; all the source code is there, in the open!". So I wondered, "all of it!?", "Yes, and you can even modify it if you want!".

That was a kind of magical opportunity for me, so I rushed back home and installed that software instantly. It was a Mandrake Linux distribution with a 2.6.9 kernel. Studying some nice books from the Linux Documentation Project made the magic of the system apparent: it was scriptable! And automation is the essence of computing. I learned Bash/sed/awk scripting and could thus do nice stuff with the local environment. Accordingly I began to understand the OS environment more, aligning very nicely with my original quest of finding out "what's exactly happening under the hood?".

Days passed, I compiled and built my own small Linux distribution, and as a result I finally understood how each and every system binary fits in! By then, though, I wanted to delve deeper and understand how the operating system kernel manages the computer hardware so well. I studied the first 12 chapters of that well-known dinosaurs book cover to cover, and then grabbed the Linux kernel source tree. With the help of a great book from O'Reilly, I was able to follow the Linux kernel code line by line and submit some code fixes in the process. My first patch fixed a minor error-handling issue in the ‘cpufreq’ power governor; the second removed redundant extra checks while adding hardware interrupt sources. I was very proud!

After working on kernel code clean-ups for a while, I was lucky enough to join the Smack project in its early days: basically providing a little help in getting the code upstream while sending bug fixes in the process. It was all very nice, especially because of the first-hand experience in upstreaming kernel code, along with all of the interesting arguments!

Hitting the Wall

In spite of the nice and dandy story above, I was not able to go further in my journey of understanding system internals. A big and coherent view of the operating system kernel was still missing. For instance, I could open the file system code (or read about it in a book) and certainly get a good idea about its current design and algorithms, but bigger questions always remained unanswered.

Namely, why was the file system designed that way? I mean, yes, I knew inodes, files, pathname translation, and such, but why? What was the origin of these terminologies anyway? What were the problems initially faced, such that the Unix authors arrived at that particular solution? Why did they settle on hierarchical pathname translation in the first place? Were there alternative models for file access? What was the historical context of their design, that is, their previous experiences in that domain and the technical (and sometimes political) limitations of their environment back then?

By getting those questions answered for each major OS component, hard and dry topics turn into a cinematic kind of presentation. This is in fact the main thesis of my entire essay: good ideas and designs rarely come out of the blue; they reflect a historical context and a non-linear progression.[1] Without understanding that historical context, and without getting into the mindset of our field's original pioneers, understanding our current state and improving further upon it becomes much harder. This is especially true in complex domains like math and engineering.

What's Been Done & the Road Ahead

Eventually, all of this brings us to the original question: why yet another kernel? For me, this hobby project is a vehicle for examining historical and new operating-systems research: coding a certain subsystem while carefully studying its original literature provided all the necessary context I badly needed for a big, coherent view of Operating System kernels. At last!

So before tackling a major component in Cute, I first had to research that component's original literature: the papers which set the field's main themes and ideas. Afterwards come the "milestone" papers, which refine these ideas further and further until they reach the ways they're currently implemented in our systems. Study notes on most of these papers, along with the PDFs themselves, are collected in a nice git repository. Such an effort (plus the code) has now been done for memory management, scheduling, concurrency, and file systems; it was definitely a very fun ride.

Finally, someone may argue that we're in a fast-paced field and there's no time for such a deep presentation. The whole purpose of this series of articles will be to counter that deep-rooted fallacy. In fact, math and computer science should've always been taught in that manner, and you'll definitely be interested in Thacker's Turing Award lecture and the "Cauchy's Famous Wrong Proof" paper for further rationale.

Thanks to A.B. for carefully reviewing drafts of this article.

Footnotes:

1. ^ You can argue that human scientific progression is a partially ordered set of events. What history does is try to make a story out of it: basically a topological sort which provides a totally ordered set of events (a story) that hopefully respects the original partial order. (A small code sketch of this analogy follows these footnotes.)

2. For comments, please check the reddit discussion.
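
As an aside on footnote 1, here is a minimal C sketch of that topological-sort analogy, using the standard Kahn's algorithm over a toy set of events. The event names and precedence edges are purely illustrative and have nothing to do with Cute's code; they just stand in for "whatever really happened".

    /*
     * Footnote 1 as code: a toy partial order of "events" linearized
     * into one possible story via Kahn's topological-sort algorithm.
     * Event names and edges are illustrative only.
     */
    #include <stdio.h>

    #define NEVENTS 5

    static const char *events[NEVENTS] = {
            "CTSS", "Multics", "Unix", "BSD FFS", "Linux",
    };

    /* adj[i][j] = 1 means event i precedes event j in the partial order */
    static int adj[NEVENTS][NEVENTS] = {
            [0][1] = 1,     /* CTSS    -> Multics */
            [1][2] = 1,     /* Multics -> Unix    */
            [2][3] = 1,     /* Unix    -> BSD FFS */
            [2][4] = 1,     /* Unix    -> Linux   */
    };

    int main(void)
    {
            int indegree[NEVENTS] = { 0 };
            int queue[NEVENTS], head = 0, tail = 0;
            int emitted = 0;

            for (int i = 0; i < NEVENTS; i++)
                    for (int j = 0; j < NEVENTS; j++)
                            indegree[j] += adj[i][j];

            /* Events with no predecessors can open the story */
            for (int i = 0; i < NEVENTS; i++)
                    if (indegree[i] == 0)
                            queue[tail++] = i;

            while (head < tail) {
                    int i = queue[head++];

                    printf("%d. %s\n", ++emitted, events[i]);

                    /* Emitting event i may unblock its successors */
                    for (int j = 0; j < NEVENTS; j++)
                            if (adj[i][j] && --indegree[j] == 0)
                                    queue[tail++] = j;
            }

            if (emitted != NEVENTS)
                    printf("cycle detected: no consistent story exists\n");

            return 0;
    }

The point, of course, is only that many different total orders can respect the same partial order; picking one of them and telling it well is exactly what writing history does.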