Weird CVS rules - Journal of Omnifarious

Aug. 11th, 2007

09:28 pm - Weird CVS rules

I've been looking at exactly what CVS does to handle various interesting situations, and it's truly bizarre in some cases. It just illustrates to me what a huge hack CVS is.

Anyway, I need somewhere to record these rules because I'll be needing them later, and other people might care, so I'm going to write them down here...

Revisions

CVS names revisions in a particular branch sequentially. This means they will be named prefix.1, prefix.2, ... prefix.n. Prefixes are always an odd number of numbers long (for example, 1 on the mainline or 1.1.2 on a branch).
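As a rough illustration of that naming rule, here is a small Python sketch. The helper names split_revision and next_revision are made up for this example and are not part of CVS.

```python
def split_revision(rev):
    """Split a revision such as '1.1.2.3' into (prefix, sequence number)."""
    parts = [int(p) for p in rev.split(".")]
    # An odd-length prefix plus one sequence number gives an even part count.
    if len(parts) < 2 or len(parts) % 2 != 0:
        raise ValueError(f"not a revision number: {rev!r}")
    prefix = ".".join(str(p) for p in parts[:-1])
    return prefix, parts[-1]


def next_revision(rev):
    """Name of the revision that follows rev on the same branch."""
    prefix, n = split_revision(rev)
    return f"{prefix}.{n + 1}"
```

With this, next_revision("1.4") gives "1.5" and next_revision("1.1.2.1") gives "1.1.2.2". The sketch only checks the length of the number, so it does not tell a real revision apart from a branch tag.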

Branches

CVS names branches rather oddly. First there is a 'branch tag' that doesn't name any actual revision. It is always of the form prefix.0.<branch number (always even)>. Prefixes in this case are always an even number of numbers long.

For example, 1.1.0.2 can be the name of a branch, but 1.1.1.0.2 can't be. Then all revisions in that branch are named prefix.<branch number>.x, where x starts at 1 and goes up in the manner described previously. So the branch tagged 1.1.0.2 holds revisions 1.1.2.1, 1.1.2.2, and so on.
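A minimal Python sketch of the branch tag rule, following the description above. The function names are hypothetical, and the check that the branch number is greater than zero is my assumption.

```python
def is_branch_tag(number):
    """True if number looks like prefix.0.<even branch number>."""
    parts = [int(p) for p in number.split(".")]
    return (
        len(parts) >= 4            # even-length prefix, then 0, then branch number
        and len(parts) % 2 == 0
        and parts[-2] == 0
        and parts[-1] > 0          # assumption: branch numbers start at 2
        and parts[-1] % 2 == 0
    )


def branch_revision(tag, x):
    """Name of the x-th revision on the branch whose tag is given."""
    if not is_branch_tag(tag):
        raise ValueError(f"not a branch tag: {tag!r}")
    parts = tag.split(".")
    # Drop the 0 and append the sequence number: 1.1.0.2 -> 1.1.2.x
    return ".".join(parts[:-2] + [parts[-1], str(x)])
```

Here is_branch_tag("1.1.0.2") is true, is_branch_tag("1.1.1.0.2") is false, and branch_revision("1.1.0.2", 1) gives "1.1.2.1".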

Deleting a file

A file that is deleted is recorded as a new revision with a state of 'dead'. If its state at the tip of mainline is 'dead', that means the file is in the Attic.
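To make the Attic rule concrete, here is a hedged Python sketch. It assumes the revision metadata has already been read into a plain dict mapping revision number to state (parsing the actual repository file is not shown), and the function names are invented for this example. 'Exp' is the usual state of a live revision.

```python
def trunk_tip(states):
    """Return the highest mainline (two-number) revision, or None."""
    trunk = [rev for rev in states if rev.count(".") == 1]
    if not trunk:
        return None
    # Compare numerically so that 1.10 sorts after 1.9.
    return max(trunk, key=lambda rev: tuple(int(p) for p in rev.split(".")))


def in_attic(states):
    """True if the tip of mainline is in the 'dead' state."""
    tip = trunk_tip(states)
    return tip is not None and states[tip] == "dead"
```

For example, in_attic({"1.1": "Exp", "1.2": "dead"}) is true, while adding a later live revision 1.3 makes it false.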

The prefix of a branch tag may or may not have some relation to the version of the file that was originally in the branch. More precisely, if the file was originally in the branch, the prefix is the version that was branched from. If it wasn't originally in the branch, the prefix is some arbitrary version picked from the HEAD (main) branch. If no versions are in HEAD (i.e. the file was added in a branch, and no file with the same name and path had ever been seen before), then a fake deleted revision is created in HEAD so the branch has a prefix to use.

Interesting cases

This leaves some interesting questions.

What happens when you add a file on a branch?
The file never existed before in the repository
If the file never existed before in the repository, a fake revision 1.1 is added in the 'dead' state with a stock log message stating which branch it was created in. Then a branch tag is added for 1.1.0.2 for the branch it was added on. Finally, the new file is given revision 1.1.2.1.
The file is new in the branch, but was already in the repository
If the file existed at all, but not in the branch, a few things happen:
  1. Call the very last revision in mainline (always a 1.x revision) latest
  2. The branch tag for the branch this file is being added on is created with the value latest.0.<new even branch number>. Let's call the new even branch number branch_number.
  3. A new revision is added at latest.branch_number.1 in the 'dead' state with a stock log message about which branch it was added on and when.
  4. Finally, the new revision is added at latest.branch_number.2 with the log message given.
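The two cases above can be sketched in Python roughly as follows. This only models the bookkeeping described in this post, not CVS itself. The function name, the branch_number parameter and the use of 'Exp' as the state of a live revision are choices made for the example.

```python
def add_file_on_branch(trunk_revisions, branch_number=2):
    """Return (branch_tag, new_revisions) for a file added on a branch.

    trunk_revisions: existing mainline revisions, e.g. ["1.1", "1.2"].
    new_revisions:   list of (revision, state) pairs that get created.
    """
    if branch_number <= 0 or branch_number % 2 != 0:
        raise ValueError("branch numbers are always even")

    if not trunk_revisions:
        # Case 1: brand new file. A fake dead 1.1 gives the branch a prefix,
        # and the branch is always 1.1.0.2 (branch_number is ignored here).
        return "1.1.0.2", [("1.1", "dead"), ("1.1.2.1", "Exp")]

    # Case 2: the file exists on mainline but not on this branch.
    latest = max(trunk_revisions,
                 key=lambda rev: tuple(int(p) for p in rev.split(".")))
    tag = f"{latest}.0.{branch_number}"
    return tag, [
        (f"{latest}.{branch_number}.1", "dead"),  # stock placeholder revision
        (f"{latest}.{branch_number}.2", "Exp"),   # the file actually added
    ]
```

For a brand new file this returns the tag 1.1.0.2 with revisions 1.1 (dead) and 1.1.2.1. For a file whose latest mainline revision is 1.3 and a branch number of 4, it returns the tag 1.3.0.4 with 1.3.4.1 (dead) and 1.3.4.2.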

This basically means that any x.1 revision in the 'dead' state doesn't really exist.
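Taking that rule at face value, a converter could filter out the placeholder revisions with something like the Python sketch below. The function name and the (revision, state) pair layout are assumptions made for this example.

```python
def real_revisions(revisions):
    """Drop placeholders: any x.1 revision in the 'dead' state."""
    return [
        (rev, state)
        for rev, state in revisions
        if not (state == "dead" and rev.split(".")[-1] == "1")
    ]
```

A dead revision that does not end in .1, such as a 1.3 that records a real deletion on mainline, is kept.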

There may be other interesting rules I'll record here too.

Current Mood: accomplished

Comments:

From:eqe
Date:August 12th, 2007 07:03 am (UTC)
You know, I'm sorry to say that this is useful to me. Also that I know what perl -pi -e command to run over a repo to change all the files from -kb to -ko.

(Yes, I do have a multi-year-old multi-branch CVS repo that I'm trying to convert to a real source control system. Whatever gave you that impression? :) )
From:omnifarious
Date:August 12th, 2007 07:12 am (UTC)
*laugh*

Well, my goal is to make hg convert actually work reasonably well for CVS and to arrange it so that it can be used to keep CVS in sync with Mercurial.

My strategy is going to be to create a revision graph for each file with a date range for when that revision is 'active'. Then I'm going to use the changelog messages, authors and 'active' windows to create a set of discrete revisions.

And, unlike cvsps, I'm going to use a hash of some deterministic, unique and unchanging property of the revision to name it. Then I will create a dependency graph between all those names like you have for any of the Monotone model distributed SCMs.

And also unlike cvsps, this will be done in Python so it won't be beset by memory errors from people who put & in the wrong place in their hand-rolled linked list code.

From:eqe
Date:August 12th, 2007 03:52 pm (UTC)
The most intimidating thing that cvsps does is the heuristic to figure out what file changes belong to the same changeset; according to the documentation, a single cvs ci command checks in its argument list sequentially, giving individual files potentially different checkin times.

In my experience so far, cvs2svn does a pretty good job of handling even our fairly grotty repo. And it's written in python, so it might have stuff for you to gank.

So where's the & bug in cvsps? Don't see it in 5 minutes of poking.
From:omnifarious
Date:August 13th, 2007 12:55 am (UTC)
Well, it may not be in the release version of cvsps, but the release version of cvsps also only allows one tag per changeset. I'm talking about the bug I fixed in changeset 1e4ea9dbbf68, based off of a mirror of this cvsps git repository.
