[cvsnt] Re: Performance problems

Tue Dec 28 10:35:33 GMT 2004

> If you're only saving 1 extra revision there isn't any point in the
> complexity of trying to handle something like that.

Not one extra. One more copy whenever the number of revisions doubles. Or
grows by a factor of 1.7, or 1.3. Configurable, but logarithmic. Again, the
idea is to have a fast "update" from old versions. If this is not used a lot
then I suppose the current way is not bad at all. Somehow it still strikes me
as more "elegant" to do an "update" by applying O(log N) patches rather than
O(N).
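
To make the "configurable, but logarithmic" part concrete, here is a rough
sketch in Python. The function name and the choice to start counting at
revision 1 are my own assumptions for illustration; nothing like this exists
in CVSNT today. It only shows how few extra copies would be kept, not how the
patches themselves would be applied.

```python
def checkpoint_revisions(total_revisions, factor=2.0):
    """Return the revision counts at which an extra full copy would be kept.

    Hypothetical helper: a new copy is kept each time the number of
    revisions has grown by `factor` since the previous copy.
    """
    if factor <= 1.0:
        raise ValueError("factor must be greater than 1")
    checkpoints = []
    next_checkpoint = 1.0
    while next_checkpoint <= total_revisions:
        rev = int(next_checkpoint)
        # Small factors can land on the same integer twice; keep it once.
        if not checkpoints or rev != checkpoints[-1]:
            checkpoints.append(rev)
        next_checkpoint *= factor
    return checkpoints
```

With a factor of 2 and 1000 revisions this keeps 10 extra copies; a smaller
factor such as 1.3 keeps more copies, but the count still grows only with the
logarithm of the number of revisions.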

> In the example of the CVSNT_2_0_x branch that's close to a worst case
> (since it's been branched for longer than most branches would normally)
> and the slowdown isn't noticeable.

There you go -- exactly my case: most of my guys use a tag that's usually
1-2 months old (on most of the files), if not more. In such a case I prefer
a quick update. If you say there's no real speed difference then we can
leave it I suppose...

> Commits are often done on thousands or tens of thousands of files in
> large repositories.

This is really out of scale for me. But even then I find it hard to believe
that people commit thousands of files at once. Most likely most of those
files haven't actually changed. That can be avoided, then, by not committing
unchanged files. (Compare timestamps, or save a checksum of the checked-out
file in the metadata dir.)
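
A minimal sketch of the checksum variant, in Python. The function names and
the `recorded` mapping (standing in for whatever would be saved in the
metadata dir at checkout) are assumptions of mine, not anything the client
does now.

```python
import hashlib


def file_checksum(path):
    """Return the SHA-1 hex digest of the file's contents."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths, recorded):
    """Return the paths whose checksum differs from the recorded one.

    `recorded` maps path -> checksum saved at checkout (hypothetical
    metadata). A file with no recorded checksum counts as changed.
    """
    return [p for p in paths if recorded.get(str(p)) != file_checksum(p)]
```

Only the files returned by `changed_files` would then need to be sent in the
commit.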

> > Tagging at a file level is important, at least for me. But isn't there a
> > way to do so without writing to each file? I suppose you could store the
> > tags in a linked list in the file, so that adding a tag to the file won't
> > have to
>
> How do you suggest doing this?

There's an easy way to do so, but it involves modifying the file in place
instead of creating a new one, which is what you say below you prefer. It
*is* safe, though (similar to 2-phase commit) and I'd be happy to discuss it
if it's really of interest. I have implemented something similar myself on a
flash-based device which I have to protect from power failures: I have a list
of <something>, and when I need to append to it I don't copy the whole list
on flash.
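
To give a flavour of what I mean, here is one common way to make appends
safe, sketched in Python. This is an illustration under my own assumptions
(the record layout, the CRC as commit marker, and all names are made up for
this mail), not a description of the flash code and not a proposal for the
RCS format. Each record is written as length, payload, then checksum; a
reader accepts a record only if the checksum is present and matches, so an
append cut off by a power failure is simply ignored.

```python
import os
import struct
import zlib

HEADER = struct.Struct("<I")   # payload length
TRAILER = struct.Struct("<I")  # CRC32 of the payload, written last


def append_record(path, payload):
    """Append one record; the trailing CRC acts as the commit marker."""
    with open(path, "ab") as log:
        log.write(HEADER.pack(len(payload)))
        log.write(payload)
        log.write(TRAILER.pack(zlib.crc32(payload)))
        log.flush()
        os.fsync(log.fileno())


def read_records(path):
    """Return every complete record, stopping at a torn or corrupt tail."""
    with open(path, "rb") as log:
        data = log.read()
    records = []
    offset = 0
    while offset + HEADER.size <= len(data):
        (length,) = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        end = start + length
        if end + TRAILER.size > len(data):
            break  # record was cut off mid-write
        payload = data[start:end]
        (crc,) = TRAILER.unpack_from(data, end)
        if crc != zlib.crc32(payload):
            break  # commit marker missing or corrupt
        records.append(payload)
        offset = end + TRAILER.size
    return records
```

The existing records are never rewritten. A real implementation would also
have to truncate a torn tail before the next append; that step is left out
here.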

> You still have to rewrite the file.  CVS *never* just modifies a file -
> that would be unsafe on disk failure/powercut etc.  It builds a
> completely new file (mostly by doing a copy of the unchanged elements
> and patching the new ones in) then at the last moment does a (hopefully)
> atomic rename of the file on top of the old one.

If you are appending (and you are, I think, when I commit to the tip of
HEAD, right?), then you can use the above idea to do so without rewriting.
I understand I am suggesting a non-trivial thing here, and don't presume,
again, to have thought everything through -- just an idea I'd be happy to
develop.

> > As an idea: the client knows which revisions of which files it is
> > currently holding. Just send that information to the server (recurse
> > over all client-side directories) and call that a tag. Put it in a file
> > of its own. Scalable, and quite fast.
>
> Not really..  you're still having to write the file, which is the slow
> part.  You're saving little or nothing on the current scheme, unless the
> RCS files are *really* big, and in that case other factors are already
> slowing you down.

I disagree here. I would write one file, and do it once, instead of writing
to multiple files and, for each one, creating a copy, renaming, deleting,
etc. Just to reiterate the idea: the client collects the version info of
all files it wants to tag, and sends that data to the server, which then
creates a file with that info. One file per tag. The idea is of course not
complete: I need to think about files which don't exist in the tag, but that
can be solved using an idea similar to your hierarchical tags. Can that be
slower than writing the tag to each of the tagged files?
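
Roughly what I have in mind on the server side, sketched in Python. The file
layout ("path revision" per line) and the function names are assumptions of
mine for illustration. It still uses the write-then-rename approach you
describe, but for one file per tag instead of once per tagged file.

```python
import os
import tempfile


def write_tag_file(tag_dir, tag_name, revisions):
    """Write one file for the tag, one 'path revision' pair per line.

    `revisions` maps file path -> revision string as sent by the client.
    The file is written under a temporary name and renamed into place,
    so a reader never sees a half-written tag.
    """
    os.makedirs(tag_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=tag_dir, prefix=tag_name + ".")
    with os.fdopen(fd, "w") as out:
        for path in sorted(revisions):
            out.write(f"{path} {revisions[path]}\n")
        out.flush()
        os.fsync(out.fileno())
    final_path = os.path.join(tag_dir, tag_name)
    os.replace(tmp_path, final_path)
    return final_path


def read_tag_file(path):
    """Return the path -> revision mapping stored in a tag file."""
    revisions = {}
    with open(path) as handle:
        for line in handle:
            name, revision = line.rstrip("\n").rsplit(" ", 1)
            revisions[name] = revision
    return revisions
```

Tagging N files then costs one write and one rename, whatever N is.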

> With a hierarchical tag, you don't recurse down directories on rtag, you
> just tag the directory with the exact moment of the tag (this requires
> high-granularity timers in the files... per-second isn't nearly good
> enough).  Every file/directory below that is deemed to have a tag that
> is the current version at that moment, unless overridden by a lower down
> tag (on a subdirectory or on the file).

Good only for rtag, though, no? I use tag exclusively, not rtag, since the
version on *my* machine is correct and works, not something on the server
(at least that's something I can verify). Using rtag would mean telling
people to stop work and check the tip of some branch on the server, or
creating a new branch, which takes as long as tagging.
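
For reference, this is how I read the hierarchical lookup you describe,
sketched in Python. The function name and the `tag_times` mapping are my own
assumptions; your actual design may well differ. A file takes the tag moment
of the closest enclosing directory, unless the file itself or a nearer
subdirectory carries its own entry.

```python
import posixpath


def effective_tag_time(path, tag_times):
    """Return the tag moment that applies to `path`, or None if untagged.

    `tag_times` maps a repository path (file or directory, "" for the
    root) to the moment the tag was applied there. The closest enclosing
    entry wins, so a lower-down tag overrides one higher up.
    """
    current = path
    while True:
        if current in tag_times:
            return tag_times[current]
        parent = posixpath.dirname(current)
        if parent == current:
            return None  # reached the root without finding a tag
        current = parent
```

The revision to use is then whatever was current in that file at the returned
moment, which is why the high-granularity timestamps matter.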

N.