?

Log in

No account? Create an account

Ptah - Journal of Omnifarious

Aug. 1st, 2005

12:28 pm - Ptah

Previous Entry Share Next Entry

I'm working a bit again on Ptah, a project I started a few years ago, then abandon when my scattered design notes and half implementation failed to garner sufficient interest from the people running the build tool contest I'd entered it in. The codebase I'm working with at TeraCloud is in need of a good build tool, and so I'm resurrecting this project and doing some work on it in my spare time.

Anyway, here's a random diagram:


One goal of this tool is to be able to produce a shell script that will build whatever it is from scratch with the specified set of build tools. The build won't be able to do any dependency tracking, but if you're trying to build from source to create a package, this will set it up so the package build doesn't require Ptah to be installed first.

Another goal is to parallelize everything as much as possible. This even includes checking files to see if they've been changed since the last build. I suspect a lot of time is wasted waiting for drive heads to seek around in order to satisfy calls to stat(2). If this were parallelized, I think it might improve things.

Current Mood: [mood icon] accomplished

Comments:

[User Picture]
From:elizabetta1
Date:August 1st, 2005 07:48 pm (UTC)
(Link)
HAPPY BIRTHDAY OMNI!!!
(Reply) (Thread)
[User Picture]
From:omnifarious
Date:August 1st, 2005 08:01 pm (UTC)
(Link)

Thank you! :-)

(Reply) (Parent) (Thread)
From:rosencrantz319
Date:August 1st, 2005 08:06 pm (UTC)
(Link)
(Reply) (Thread)
[User Picture]
From:omnifarious
Date:August 1st, 2005 10:00 pm (UTC)

Re: Ooh pretty lines.

(Link)

Well... Programs are generally built out of pieces. And one of the annoying problems in software is putting all the pieces together to make the program. This is generally solved with a tool called 'make'.

It's sort of analogous to constructing, say, a house.

You have to make the foundation before you can make the rooms. But, once you have the foundation and frame up, you can often work on the rooms in any order you like, or bring in a whole bunch of people and have them each work on one of the rooms.

Unlike a house, a computer program is often built many times while it's being worked on. Often, only some small thing will be changed, and many steps in the building process can be skipped. The dependencies help with this. If you change the foundation, you may have to rebuild everything, but if you just want to change the plumbing for the bathroom, there's not much outside the bathroom that needs to be re-done.

So, that diagram sort of represents a fictitious set of dependencies for building a program. An example. For example, a.o depends on a.c and a.h, so if either of those change, a.o, and anything that depends on it needs to be rebuilt.

The diagram on the right represents an order of building things such that all the fundamental things will be built before the things that depend on them. If I were going to print out the instructions for building the program from scratch, that's what the instructions would look like.

(Reply) (Parent) (Thread)
From:hattifattener
Date:August 2nd, 2005 12:37 am (UTC)
(Link)
I've often thought it would be nice to be able to queue up a bunch of I/O operations (or, more generally, system calls) and let the OS schedule them in whatever way it likes. Except for networking, the only way to do that right now is with a zillion threads or forked tasks, and that's a PITA.

Going down this road very far makes me think that the nature of the task/kernel interface needs to be changed. One way of thinking about operating systems is that they present a virtual machine to the running task: a virtual machine with a handful of very complicated instructions like "open a file" in addition to the native hardware instructions. Or the operating system can be thought of as a sort of peripheral, with signals taking the place of hardware interrupts. But this kind of peripheral control is very old-fashioned — modern high performance interfaces can do command queueing and split transactions and DMA. So why not update the UNIX "virtual machine" to include a few of the last forty years of hardware developments?

Where this would really shine would be in tightly-coupled multiprocessors, like multi-core or hyperthreaded chips. Leave the cpu-bound task running on one CPU, and the io-bound kernel running on the other, with nonblocking communication through some in-memory queues. We don' need no steeking context switches!
(Reply) (Thread)
[User Picture]
From:omnifarious
Date:August 2nd, 2005 03:25 am (UTC)
(Link)

I too have thought this. The other thing that needs to be changed is transactions. There needs to be some sort of transaction support built into the kernel. IMHO, this is a major reason why we don't see any good distributed filesystems for Unix.

(Reply) (Parent) (Thread)
[User Picture]
From:ohpun
Date:August 2nd, 2005 09:02 pm (UTC)
(Link)
I assume you've looked at Ant http://ant.apache.org/ and decided not to modify it because ... ?

If all the stat calls were done at the beginning and stored in a cache, then I think some speed improvement could be achieved. But stat is fast since (on unix) it jumps to info in the i-node. Parallelizing won't help because the disk heads can only work serially (or pseudo-serially in the case of RAID).

I once wrote a C program that, given an input and output spec, wrote a C program which translated from the input format to the output format. I like programs that write programs (or scripts in your case).
(Reply) (Thread)
From:hattifattener
Date:August 2nd, 2005 11:24 pm (UTC)
(Link)
The reason that parallel invocation of the stat(2) calls speeds things up is that you don't know what order is most efficient to issue them in. Seek time dominates disk access; if the OS has many disk requests, there are lots of well-understood algorithms for ordering them for fastest throughput. (Some of them aren't possible on the broken PC-clone architecture, IIRC, but that's another rant.) Normally you sort them so that you start at one edge of the disk and seek across it, reading whatever needs to be read as you get there, doing everything in one pass of the disk arm.

But if the seeks are serialized, the OS has to read each inode in exacly the order specified, since it won't receive the next request until it's actually finished this one. This results in lots of random seeks all over the disk, which is slow. (And noisy.)

Of course, as you point out, if all the inodes are in cache it's not too important. But getting them into the cache efficiently is difficult with the POSIX APIs as they stand.
(Reply) (Parent) (Thread)
[User Picture]
From:omnifarious
Date:August 2nd, 2005 11:31 pm (UTC)
(Link)

Yes, exactly. :-) Once they're in cache, it doesn't matter a whole lot anymore. And with modern memory sizes, it's quite concievable that you'll end up with all the inodes of even a large build in cache. But that doesn't help you the first time you build.

(Reply) (Parent) (Thread)
[User Picture]
From:omnifarious
Date:August 2nd, 2005 11:29 pm (UTC)
(Link)

I don't like because the idea of an XML based programming language deeply offends me. I've looked at ant files, and they seem much more suited to creation by a tool than reading by a human.

(Reply) (Parent) (Thread)
[User Picture]
From:nikith
Date:August 3rd, 2005 05:24 am (UTC)
(Link)
Ok. Realize that I speak both medicine-ese AND geek 101. Maybe even geek 102, since I'm "just a girl" and know how to flash my bios and used to build games on my atari-like system (I was about 8, I have no clue what I was on, only that the games I made were all variations on space invaders...)

With that being said, holy crap. It's a whole other world out there.
(Reply) (Thread)
[User Picture]
From:omnifarious
Date:August 3rd, 2005 06:08 am (UTC)

That's pretty fluent geek :-)

(Link)

You might like the posts I have that use this icon: :-). They're more about nuts & bolts getting computer stuff to work than about programming.

(Reply) (Parent) (Thread)