Maintaining a Web-Site with CVS
by Lars Poulsen (lars@beagle-ears.com)
I am responsible for maintaining several websites, including two
at my job and a couple related to volunteer organizations,
such as my church.
Some of these are updated by multiple people, so I have developed
a system to do this safely using the
Concurrent Versions System (CVS).
CVS, the Concurrent Versions System, is a type of software known as
a source code control system. It provides a general mechanism whereby
a collection of text files (original computer programs in source code form)
can be stored in such a way that updates can be tracked, and the state
of the system at any time in the past can easily be recreated.
The "concurrent" aspect relates to the fact that CVS allows several members
of a group to make updates, and CVS will help resolve the possible
ambiguities when two different people are updating the same files
at roughly the same time.
In the regime that I have developed, the CVS repository lives on a
server somewhere on the Internet. When you need to participate in
website updates, you get an "account" on the repository server,
i.e. an identity so that the server knows who you are. This
takes the form of a username and password. You also need to install
a software bundle on your home PC. (Although CVS is available for
almost any computer, including Apple Macintosh systems, I have not
used it on Macs, and I don't have a prepackaged bundle for them.
My Mac experience has all been on older, slower 68000-based Macs,
anyway.)
Each update takes five steps:
- Pick up any updates that someone else may have made
  since the last time you worked on the website:
  - Open a command window (MS-DOS command prompt)
  - Point the command window at the working directory:
    cd \web\sitename
  - Tell CVS to get the latest versions of all files:
    cvs update
- Update the files by whichever tools you prefer
(Notepad, MS Word, Netscape Composer).
This is the part that gives beginners the most problems,
so look below for some hints.
- Proofread the updated files by means of the browser on
your own machine (File:Open Page:Browse). Correct errors
as needed.
- Send the updates to the repository server:
cvs commit
If you added any new files, the CVS server must be told
about them:
cvs add newfile.htm
cvs commit
- Wait patiently for the next update cycle on the public server
(this happens at 1:05 AM Pacific time),
then check that the files made it all the way.
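As a rough sketch, the command sequence in the steps above could be scripted like this in Python. This is only an illustration, not part of the software bundle: the helper names plan_session and run_commands are made up for this example, and the working directory is assumed to be whatever folder holds your checked-out copy of the site.

```python
import subprocess


def plan_session(new_files=()):
    """Return (before_editing, after_editing) lists of CVS commands.

    Hypothetical helper: it only builds the command lines named in the
    steps above; it does not run anything by itself.
    """
    before_editing = [["cvs", "update"]]
    # New files must be announced with "cvs add" before the commit
    # that sends them to the repository.
    after_editing = [["cvs", "add", name] for name in new_files]
    after_editing.append(["cvs", "commit"])
    return before_editing, after_editing


def run_commands(commands, workdir):
    """Run each command in the working directory, stopping on the first error."""
    for command in commands:
        subprocess.run(command, cwd=workdir, check=True)


# Show the plan for a session that adds one new page.
before, after = plan_session(["newfile.htm"])
for command in before + after:
    print(" ".join(command))
```

Nothing is executed until run_commands is called, so the plan can be inspected first. Proofreading and waiting for the nightly update on the public server remain manual steps.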
The power of the World-Wide Web is that a very simple mechanism has
been defined that is a common standard, independent of any specific tools,
and as strange as it seems to many, the best tool for beginners to
use to create and maintain a website is a straightforward text editor
such as NOTEPAD.
How can that be? There are dozens of tools, some free, some sold
for up to 500 dollars per machine they run on. If all you need is
NOTEPAD, how can they sell all this other stuff?
The answer to that, of course, is that it depends on what your goals are.
For some people, the goal is to prove how intelligent they are, by
demonstrating that they are able to do stuff that looks like it was
produced by the advertising department of a major movie studio.
For this, the tools can be very helpful. But if that is your goal,
I have no time for you. After all, any time I spend pulling you out
of the holes you dig for yourself would serve only to boost YOUR ego,
not mine.
But if your goal is to present some information in a straightforward
way, with the focus on the information rather than the presentation,
the amount of "markup" commands that need to be inserted is minimal,
and by taking the time to understand the basics, you will be able
to work much more effectively.
To get started, look at a few simple web pages such as this one,
then from your web browser's menu bar, select "View:Page Source".
This will open a new window, where the text you see is interspersed
with a modest number of commands in "angle brackets" (also known as
"less than" and "greater than" signs).
This page will explain what those
commands are.
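To make the split between text and markup concrete, here is a small Python sketch that separates the two, using the standard html.parser module. The sample page and the names MarkupSplitter and split_markup are invented for this illustration.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collect tag names and visible text separately (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Called for every opening command in angle brackets.
        self.tags.append(tag)

    def handle_data(self, data):
        # Called for the text between the commands; skip pure whitespace.
        if data.strip():
            self.text.append(data.strip())


def split_markup(html):
    """Return (tags, text_pieces) for a page given as a string."""
    splitter = MarkupSplitter()
    splitter.feed(html)
    splitter.close()
    return splitter.tags, splitter.text


# A made-up sample page, not taken from any real site.
sample = (
    "<html><body><h1>Church News</h1>"
    "<p>Services are at <b>10 AM</b> on Sunday.</p></body></html>"
)
tags, text = split_markup(sample)
print("markup commands:", tags)
print("text:", text)
```

On a simple page the list of commands is short and the text is easy to recognize, which is the point of looking at the page source.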
After you have seen a few pages like that, you may find it
instructive to look at a page created by an advertising
agency, such as the Disney Company,
CNN or
"the Avon Lady".
When you look at a page like this with "View:Page Source"
you may be completely unable to find anything resembling the
information content of the page.
In fact, it is not unusual to discover that the "text" you see
on such pages is represented by pictures of the text.
Such pages are often abusing the standardized command language
to such a degree that they cannot be made to look the same
on different browsers, leading either to a footnote that
you must use Netscape (or Internet Explorer) to view this page,
or to having different pages selected, depending on which browser
the viewer is using. In my (strongly held) opinion, this runs
completely counter to the Internet philosophy of information
sharing.
Once you understand the basics of page markup, you can start using tools.
The reason that you need to understand the basics before you can
safely use the tools is that the tools often make silly mistakes
which are easy to fix if you can read the HTML that was produced by
the tool.
If the tools are supposed to be helpful, why do they make errors?
Why did the people who made the tools not fix the errors?
These are good questions, but the answers are not so simple.
First of all, you need to understand that what you see on a single
page on your web browser is usually not a single object on the
server that sent you the page. Most pages have pictures on them,
and there are separate files for:
- The overall description of the page. On the pages that I write,
that is the text of the page, but that is not always the case.
- Optionally, there may be a reference to a STYLE SHEET, which
is a higher level description that describes what fonts,
colors, etc. are used for each level of headline, for
footnotes, etc. By putting this information in a separate
stylesheet file, it is easy to change it when you want a new
look for the website, without having to edit every separate
page. (There are about six different ways to accomplish this
overall goal, though, so style sheets are not yet all that common.
And most browsers have bugs in the way they handle style sheets.)
- Every picture on the page is a file by itself. If a picture is
broken up into a mosaic of fragments in order to allow you
to link to different pages by clicking on different parts
of the picture, then every fragment is technically a separate
picture and lives in a separate file.
- Some designers use a feature called FRAMES where the browser window
is divided into parts that can be scrolled separately. In this
case, each frame is sort of a page of its own, and lives in a separate
file, while the first file only defines how the page is divided.
(If the page is displayed on a browser that does not know
about frames, the definition of the frameset is ignored,
and then the text in the main page is actually displayed.)
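The list above can be turned into a quick check: given the main file of a page, which other files does it pull in? The following Python sketch, using the standard html.parser module, lists style sheets, pictures and frames. The class and function names and the sample pages are made up for illustration, and the sketch only looks at the three kinds of tags mentioned above.

```python
from html.parser import HTMLParser


class PagePartsFinder(HTMLParser):
    """Collect references to the separate files a page is built from."""

    # Tag name -> attribute that names the separate file.
    WATCHED = {"img": "src", "frame": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.WATCHED.get(tag)
        if attr_name is None:
            return
        attributes = dict(attrs)
        # Only <link> tags that point at a style sheet are of interest.
        if tag == "link" and (attributes.get("rel") or "").lower() != "stylesheet":
            return
        value = attributes.get(attr_name)
        if value:
            self.parts.append((tag, value))


def find_page_parts(html):
    """Return a list of (tag, file reference) pairs in document order."""
    finder = PagePartsFinder()
    finder.feed(html)
    finder.close()
    return finder.parts


# Made-up sample pages: one ordinary page and one frameset.
page = (
    '<html><head><link rel="stylesheet" href="site.css"></head>'
    '<body><img src="logo.gif"><p>Welcome</p>'
    '<img src="pics/choir.jpg"></body></html>'
)
frameset = (
    '<frameset cols="20%,80%"><frame src="menu.htm">'
    '<frame src="main.htm"></frameset>'
)
print(find_page_parts(page))
print(find_page_parts(frameset))
```

Every entry in the result is a file that has to travel to the server along with the page itself.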
The advantage of a webpage editing tool is that it will collect all
of these pieces and show them as one document. This is a lot like what
happens when you import pictures, spreadsheet pieces, etc. into a
word processing document. But when you do this in MS Word, or Star Office,
or WordPerfect, the imported pieces actually become part of the
single file that contains the document. When you do this in a web
page editor, the web page file only gets to contain a "tag", i.e. a
pointer to the imported object, so that the browser can pull it
in when you display the page. And here is the rub: Such reference tags
(whether for links or for pictures) can be done in two ways:
- a relative tag says "that picture lives in a file
  named so-and-so in the same folder as this page" (or "in
  a folder named so-and-so within the folder above the one where
  this page lives").
- an absolute tag says "that picture is in so-and-so
  place", where the so-and-so-place can be
  - "in a file named so-and-so in folder so-and-so on this
    computer"
    - file:///C|/www/cmcweb/lars/engineer/ww-html/cvsweb.htm
    - file:///f|/win98/tools/icon98.gif
  - "on web server so-and-so with URL so-and-so"
    - http://www.beagle-ears.com/lars/engineer/www-html/cvsweb.htm
    - http://www.cvshome.org/graphics.cvs.gif
In most cases, you want to use the relative form; then the links or
picture references work both when you test the pages on your own machine,
and when you move the whole folder together to the server where it
is visible to the public. But sometimes, you need an absolute reference.
So the editing tools can do either, and sometimes they get it wrong,
especially when you use point-and-click to put the references into your
document. Then your page gets an absolute reference to a file on your
PC, and when someone else displays the page on their PC, they don't have
that file in that place on their hard drive, and the browser displays a
broken picture. This is very simple to fix if you know the
tags used for links and images, but overwhelming if you don't.
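One way to catch this kind of mistake before committing is to scan a page for references that point into your own hard drive. The Python sketch below is a minimal illustration under some assumptions: it only looks at src and href attributes, it treats anything that is neither a file: reference, a drive-letter path, nor an http/https URL as relative, and the function names are invented for this example.

```python
import re
from html.parser import HTMLParser

# Matches Windows paths such as C:\web\pic.gif or C:/web/pic.gif.
DRIVE_PATH = re.compile(r"^[A-Za-z]:[\\/]")


def classify_reference(ref):
    """Classify a link or image reference (simplified, see lead-in)."""
    lowered = ref.lower()
    if lowered.startswith("file:") or DRIVE_PATH.match(ref):
        return "local-absolute"
    if lowered.startswith(("http://", "https://")):
        return "web-absolute"
    return "relative"


class LocalReferenceFinder(HTMLParser):
    """Collect src/href values that point at the author's own PC."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                if classify_reference(value) == "local-absolute":
                    self.problems.append((tag, value))


def find_local_references(html):
    """Return (tag, reference) pairs that will break on other machines."""
    finder = LocalReferenceFinder()
    finder.feed(html)
    finder.close()
    return finder.problems


sample = (
    '<img src="file:///f|/win98/tools/icon98.gif">'
    '<a href="cvsweb.htm">CVS notes</a>'
)
for tag, ref in find_local_references(sample):
    print("broken on other machines:", tag, ref)
```

Anything the scan reports is a candidate for replacement with the relative form, which can be done by hand in NOTEPAD.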
Another common class of errors is that Microsoft uses slightly different
character set tables than the standard ones, and tools produced by
Microsoft often put out HTML that converts quotes to accented vowels
or something equally silly. Again, this is easy to fix with NOTEPAD,
but only if you know enough to spot the error.
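The character set problem can be illustrated with a few lines of Python. The sketch assumes the file was written in the Windows code page 1252, where the "curly" quotes occupy byte values that other tables use for different characters. The Mac Roman table is used here as one example of such a table, and the function name and replacement table are made up for this illustration.

```python
# Code page 1252 punctuation mapped to plain ASCII equivalents.
# The table is a small, hand-picked subset for illustration.
CP1252_PUNCTUATION = {
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
}


def plain_ascii_punctuation(raw_bytes):
    """Decode bytes as code page 1252 and replace fancy punctuation."""
    # errors="replace" guards against the few byte values that
    # code page 1252 leaves undefined.
    text = raw_bytes.decode("cp1252", errors="replace")
    for fancy, plain in CP1252_PUNCTUATION.items():
        text = text.replace(fancy, plain)
    return text


# Bytes as a Windows tool might save them: 0x93 and 0x94 are curly quotes.
saved_by_tool = b"He said \x93hello\x94 and left."

# Read with a different table, the quote bytes turn into other characters.
print(saved_by_tool.decode("mac_roman"))

# Read with the right table and simplified, the text is safe everywhere.
print(plain_ascii_punctuation(saved_by_tool))
```

Replacing the fancy punctuation with plain quotes sidesteps the question of which table the reader's browser will assume.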
The three most common tools used by beginners are:
- Microsoft Word (or rather "File:Save as HTML" in the MS Office
program suite)
- Netscape Composer (which is part of the normal Netscape Communicator), and
- Microsoft Frontpage (which comes free with several Microsoft software
packages; unfortunately, it changes fairly randomly which ones it
comes with).
Of these, my recommendation is for Netscape Composer.
MS Office
The tools in the Office suite have been there since Office 4.2, when
they were a free download (called Internet Assistant for Word, Excel, etc.)
from the Microsoft Website, released at the time
Windows 95 came out. They have been included in Office 95, Office 97
and Office 2000. Every new version is better, has fewer bugs and is
easier to use. Unfortunately, the things you have to know to use them
change with each version, because each version works a little differently.
If you know HTML, you can see what is happening and what you must do,
but if not, you will have problems. For these reasons, I hesitate to
recommend them.
Early versions of the Office tools would create a folder for each
document, and save both the main HTML file, and the embedded pictures
(converted to GIF files) into that folder. The version in Office 2000
saves the HTML document in the folder you ask it to save into
(typically the folder where you had the DOC document) but maintains
a common sub-folder within that folder to contain all the pictures
from all the documents in the folder. At least I think that is what it
does. This means that when you update the HTML document, you need to
transfer everything in the subfolder to the server along with the
HTML document.
If these things were better explained (and more consistent between
versions of Office) I think this might be my preferred tool set
to recommend to beginners, but as it is, I find that they need
too much hand holding when trying to use this.
Netscape Composer
Netscape Composer feels a lot like a word processor.
I have found it easy to learn and to teach, and for beginners that
need to maintain pages with TABLES, this is the easiest way to do it.
It is not quite as easy as using Excel, but certainly easier than
setting up tables in WORD.
I very much recommend Netscape Composer.
Frontpage
Frontpage is intended to do for Internet Explorer what Composer does
for Communicator. Unfortunately, it has some major problems:
- It attempts to solve not only the problem of editing the files,
but also the problem of how to get the files uploaded to
the server. It does this, not by providing an easy to use
interface to the standard tools that exist on every web server,
but by requiring a server to have a set of non-standard tools
installed on it to accept the upload. The idea was to sell
Microsoft's web-server software and get the desktop users
to ask the internet service providers to install it.
- It simplifies certain types of pages by interacting with yet
more of the special server extensions.
- It is hard to find in many of the installation kits that include it.
For example, I have both Office97 and Windows98 (Second Edition)
installed on my PC. Frontpage is supposedly included with both
of these, but in a casual look-around I have not found it.
I am sure that almost everything that can be done with Composer
can be done about as easily with Frontpage, but I am not eager to
switch.
$Log: cvsweb.htm,v $
Revision 1.2 2001/10/26 13:28:29 lars
Replaced CMC -> Beagle-Ears
Revision 1.1 2001/01/15 07:08:11 lars
New document: Maintaining websites with CVS