[cvsnt] Re: Performance problems

Tue Dec 28 11:29:14 GMT 2004

On Tue, 28 Dec 2004 12:35:33 +0200, "Nitzan Shaked"
<calius at netvision.net.il> wrote:

>> If you're only saving 1 extra revision there isn't any point in the
>> complexity of trying to handle something like that.
>
>Not one extra. One more copy whenever the number of revisions doubles
>itself. Or is 1.7 times what it used to be, or 1.3. Configurable, but
>logarithmic. Again, the idea is to have a fast "update" from old versions.
>If this is not used a lot then I suppose the current way is not bad at all.
>Somehow it still strikes me as more "elegant" to do an "update" by doing
>O(logN) patches than O(N).

The most common case is that people work near the top of the tree
rather than on years-old revisions. This means that if there is any
optimization needed it would be to make sure the closest revisions to
HEAD could be quickly retrieved.
And this is exactly what CVS does...
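
To put rough numbers on the trade-off, here is a minimal sketch. It
assumes a simplified model in which only HEAD is stored in full and each
older trunk revision is one delta further away; the function names and
the "factor" parameter are hypothetical and are not part of CVS or CVSNT.

```python
def patches_from_head(head: int, target: int) -> int:
    """Deltas to apply to reach `target` when only HEAD is stored in full.

    Assumed model: revisions are numbered 1..head on one line of
    development, and each step back from HEAD costs one delta.
    """
    if not 1 <= target <= head:
        raise ValueError("target must lie between 1 and head")
    return head - target


def extra_full_copies(revisions: int, factor: float = 2.0) -> int:
    """Full copies kept under the proposed scheme (hypothetical).

    One more copy is made each time the revision count grows by
    `factor`, so the number of copies grows logarithmically.
    """
    if factor <= 1.0:
        raise ValueError("factor must be greater than 1")
    copies = 0
    threshold = 1.0
    while threshold <= revisions:
        copies += 1
        threshold *= factor
    return copies


if __name__ == "__main__":
    head = 1000
    print(patches_from_head(head, 998))   # a revision close to HEAD
    print(patches_from_head(head, 1))     # the very first revision
    print(extra_full_copies(head, 2.0))   # copies with a doubling rule
```

In this model a revision two steps below HEAD costs two deltas no matter
how long the history is, which is why the common case is already cheap.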

.. snip ..

>> Commits are often done on thousands or tens of thousands of files in
>> large repositories.
>
>This is really out of scale for me. But even then I find it hard to believe
>that people commit thousands of files at once. Most likely most of those
>files haven't actually changed. That can be avoided, then, by not committing
>unchanged files. (Compare timestamps, or save a checksum of the checked-out
>file in the metadata dir).

CVS does *not* ever commit unchanged files (unless you add the
override flag to the command). Only edited files are ever committed,
even if you also explicitly name unchanged files in the command...
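
For illustration only, the checksum comparison suggested in the quoted
text could look something like the sketch below. This is not how CVS
itself decides what has changed; the "recorded" mapping (path to
checksum saved at checkout time) and both function names are assumptions
made up for the example.

```python
import hashlib


def file_checksum(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths, recorded):
    """Keep only the paths whose content differs from the recorded checksum.

    `recorded` is a hypothetical dict of path -> checksum stored when the
    file was checked out. A path with no recorded checksum counts as changed.
    """
    return [p for p in paths if recorded.get(p) != file_checksum(p)]
```

Naming a thousand files and passing them through such a filter would
leave only the handful that were really edited.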

I tend to agree with your observation, but on different grounds:
If a developer has a thousand edited files to commit in one single
go then he is doing something seriously wrong....

Firstly, he must have worked for a very long time without any commit
to CVS for this to happen, and that is not how I see a version
control system being used. It is meant to actually help the developer by
safeguarding the different versions of his working files, so he should
commit rather often. At our place the developers commit daily for just
this reason.
Secondly, my view is that any commit should be accompanied by a
log message describing the actual reason for the commit, down to fairly
detailed change notes on each file. Thus a commit operation would
involve a single file, or at most a few files, belonging to one
development task.
A log message covering the changes across a thousand files would be
unwieldy to say the least! So in actual fact it would probably contain
"stuff" or "changes since last commit" or other unusable garbage.

>> > Tagging at a file level is important, at least for me. But isn't there a way
>> > to do so without writing to each file? I suppose you could store the tags in
>> > a linked list in the file, so that adding a tag to the file won't have to
>>
>> How do you suggest doing this?
>
>There's an easy way to do so, but it involves modifying the file instead of
>creating a new one as you write below that you like. It *is* safe, though
>(similar to 2-phase commit) and I'd be happy to discuss it if it's really
>interesting. I myself implement something similar on a flash-based device
>which I have to protect from power failures:  I have a list of <something>,
>and when I need to append to it I don't copy the whole list on flash.
>
>> You still have to rewrite the file.  CVS *never* just modifies a file -
>> that would be unsafe on disk failure/powercut etc.  It builds a
>> completely new file (mostly by doing a copy of the unchanged elements
>> and patching the new ones in) then at the last moment does a (hopefully)
>> atomic rename of the file on top of the old one.
>
>If you are appending (and you are, I think, when I commit to the tip of
>HEAD, right?), then you can use the above idea to do so without re-writing.
>I understand I am suggesting a non-trivial thing here, and don't presume,
>again, to have thought everything over -- just an idea I'd be happy to
>develop.

Hold off!
You are suggesting changing the underlying RCS format paradigm, and
that would make CVSNT a non-CVS version control system!
One of the great things about CVS is that it has all of the file
history inside each RCS file including all revisions and all tags etc!
So the RCS file can be moved or copied while retaining the file
history intact. Storing data in separate files makes this impossible
and thus will break the CVS-RCS heritage completely....
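
For reference, the "build a completely new file, then rename it on top
of the old one" approach quoted above can be sketched roughly as
follows. This is a generic illustration and not the actual CVS or CVSNT
code; the function name is made up, and whether the final rename is
truly atomic depends on the operating system and file system.

```python
import os
import tempfile


def replace_file_atomically(path: str, new_content: bytes) -> None:
    """Write new_content to a temporary file, then rename it over path.

    The old file is never modified in place, so a crash before the
    rename leaves it untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be in the same directory (same file
    # system) as the target, otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_content)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # the last-moment rename
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the complete history lives in the one file being replaced, this
single rename is all that is needed to keep it consistent.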

/Bo
(Bo Berglund, developer in Sweden)