What is DocBook?
Why is it good?
Who uses it?

Getting DocBook tools on Linux.
Using them.
Where to find examples.

Introduction

In this article I hope to explain a little about DocBook and how you
might start using it. Like many free software documents (and programs),
this exists because I wish I'd had it when I started investigating all
of this with the intention of writing documentation. I hope it helps.

All the introductions I read began by explaining that DocBook is a DTD
defined in terms of SGML which made use of DSSSL-compliant stylesheets 
and which was under the control of the Technical Committee of the 
Organization for the Advancement of Structured Information Standards (OASIS).
Whilst this was accurate, it didn't help me a lot.

Then there were the approaches which started with HTML and tried to
'translate' its concepts into DocBook concepts. Since my HTML is entirely
at the "open text editor, type the tags in, run it through the validator,
sigh, start again" stage and I didn't have much overall grasp of the
real nature of HTML, this didn't help much either. I actually understand
HTML better now because of using DocBook.

What finally made DocBook clear to me was understanding the process of
generating a document, because this involved pulling a bunch of tools
together and discovering what each tool did. When people talk about
"doing it in DocBook", they often mean "writing something in their
favourite text editor, using DocBook tags in it, running a conversion
tool on it, and telling the conversion tool which stylesheet to look at".
They're all different bits which fit together.

On my Linux box, the process goes like this:
	1. With a standard text editor, write your document, marking
    up concepts and content rather than marking up appearance. For example,
    a title is a piece of content. A paragraph is a piece of content.
    A combination of keys on the keyboard to press is content. A command
    for the user to try is content. Wanting a blank line to appear at the
    end is not. Writing in italics is not. You now have a document which
    is plain ASCII, which can be read via 'more' or 'less', and which has
    <sect2 id="using">, <keycombo>, <para>, </para>, <title> and </sect2>
    all through it. 
	2. Run a program on the document which converts it into something
    useful. On my machine, that program is either 'jade' (for generating
    HTML) or 'jadetex' (for generating PostScript). This program looks at
    a stylesheet which has rules for how the DocBook tags are rendered in
    different formats. For HTML, for example, it may say, "When you meet 
    the DocBook tag <para>, create an HTML tag <p>. When you meet the tag 
    <title>, create an HTML tag <h3>" and so on. It's this stylesheet which 
    decides where the blank lines and italics go.
	3(a). (Ideally) Admire the resulting HTML pages and the PostScript 
    document happily. Open the wine and ignore stages 4 and 5.
	3(b). Spot horrible mistakes.
	4. Open up your text editor on the original document and correct
    all the mistakes.
	5. Re-run jade and go back to stage 3.
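
To make stage 1 concrete, here is a small sketch of what that kind of
markup looks like. It is only an illustration: the section title and the
wording are invented, and it is a fragment rather than a whole document,
so jade would not accept it on its own.

```sgml
<sect2 id="using">
  <!-- Illustrative fragment only; the title and text are made up -->
  <title>Using the Editor</title>
  <para>
    To save your work, press
    <keycombo><keycap>Ctrl</keycap><keycap>S</keycap></keycombo>.
  </para>
  <para>
    To see what is in the current directory, type
    <command>ls</command>.
  </para>
</sect2>
```

Notice that nothing in there says "italic" or "leave a blank line". It
only says what each piece of text is.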

This made two things abundantly clear to me. First: DocBook looks 
misleadingly like HTML. But unlike HTML you can't just open a browser 
on it and expect all the tags to have an effect. Most browsers (Netscape,
Lynx, GNOME's help browser) have no idea what to do with DocBook.
DocBook source is much more like C source. You can edit it, read it, and write
it. But it's much more useful to create something with it and then play 
with the results.

Second: when people talk about "DocBook", they are generally talking about 
a set of tools which includes the DocBook DTD, the tools to convert it and 
the stylesheets those tools use. It's rather like the way hackers talk
about "having C on a machine" when in fact they mean they have the
compiler, the debugger, and (optionally) an editor which does snazzy
indenting. These are just generally assumed to be part of the deal.
Similarly, I tend to say
"DocBook tools" when in fact I'm referring to half a dozen packages. 

What is DocBook?

DocBook is a Document Type Definition or DTD. There are tons of DTDs.
DocBook is a DTD which is designed for marking up technical documents.
Other DTDs are designed for marking up recipes, for marking up chemical
structures, for marking up information about material in archives and
for all manner of other things. 

If these are definitions, something has to describe how the definitions
are made. For DocBook, that something was SGML: Standard Generalized
Markup Language. SGML isn't a coding language, and it isn't a language 
most people write in. It's more of a metalanguage: a language which is 
used to describe how DTDs are written. It's not new. It was around in
the 1980s, and its predecessors were around before that. All I understand
about SGML is that DTDs are defined according to it. Luckily, this
appears to be all you need to understand about it at first. 

Why is DocBook good?

Some good things about DocBook:

	(1) You can write one plain ASCII document and regenerate it in 
    multiple formats. You don't have to learn HTML, LaTeX, and how to
    create PDF files. (You do have to learn DocBook, of course.)
	(2) You can give other people the document, and they can regenerate
    it in their preferred format. A sample document I have is 57kb as
    the DocBook source, 140kb as the PostScript output, and 94kb as a
    directory of HTML. There is definitely an argument here for sending
    people the DocBook version.
	(3) It is specifically designed to mark up technical documentation:
    if it's related to computers and software there's probably a tag for it.
	(4) The people who define DocBook are active and responsive.
    The KDE and GNOME Documentation Projects both wanted to be able to
    include PNG graphics in their documentation, and submitted a request
    for enhancement. Lo and behold, this is one of the improvements in the
    latest version of DocBook.
	(5) It is not a proprietary binary format. 'Nuff said.
	(6) It is increasingly becoming the standard. DocBook documentation
    is showing up all over the place, from the GNOME and KDE Documentation
    Projects (where it is used both for documenting the code for hackers
    and for writing the guides for users) to the FreeBSD Documentation 
    Project. It is even showing up in parts of the Linux kernel.
	(7) It is not -that- hard to learn.

Um. Number seven is a bit tenuous. Let's just say, "In comparison with
LaTeX".

So what do I need?

Okay. If you are running a recent distribution of Linux you will probably
not have too much trouble with this. Look for the following packages on
your system:

	(1) sgml-common
	(2) docbook
	(3) jade
	(4) stylesheets

You may also want jadetex (if you have the urge to create printer-friendly
formats) and psgml (if you like using Emacs as your editor). If you don't
have these, have a poke first at your distributor's FTP site and, if
that fails, head to the canonical source of DocBook tools:
ftp://sourceware.cygnus.com/pub/docbook-tools/docware/ or a mirror such as
ftp://unix.hensa.ac.uk/mirrors/sourceware.cygnus.com/pub/docbook-tools/docware/
If you go for the RPMs rather than the sources, note that most of the 
binary RPMs are in the /noarch directory, which is why the /i386 directory
looks a little empty.

The ease of installing these tools varies. I had no trouble with my
Red Hat Linux system. Friends had no trouble with their Debian GNU/Linux
systems (I believe most of them typed 'apt-get install docbook'...). I have come
across Slackware users who have run into problems, however. If you're
using Slack, you will probably need to stop and create a number of
directories by hand in order for the installation to proceed.

According to practically every other guide I have seen, the next thing
you need is to create a little document in DocBook. I spent ages trying
to do this, and then gave up for about a month. The problem is that
when you are writing a document, about 95% of it will be the document
body. The remaining 5% will be the document head containing information
about the document itself, and the very first line of it will be an
incomprehensible line which is the key to getting jade to recognise it.
So if you take this route, you start with the most difficult bit first.

My solution, and the suggestion I would make, is: get a file that's
valid DocBook from somewhere first. There are tons on the net but in
case you don't have cheap net access, here's a file that -should- run
if you type it in. You don't have to use my style of indentation, but
any more than two spaces as an indentation means that further down the
document you are going to be starting things halfway across the screen.
I'm sorry about the first half, which is the first line and the article
information. They are tags you only need about five percent of the time.
But you can just type those in and forget about it. Better yet, save them
as a template file somewhere and cut and paste them in. 
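
A minimal file of this kind might look like the following sketch. It
assumes the DocBook 3.1 DTD is the one installed (change the version in
the first line if yours differs). The title, author name and ids are
placeholders invented for illustration.

```sgml
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<article id="sample">
  <artheader>
    <!-- Placeholder title and author; replace with your own -->
    <title>A Sample Article</title>
    <author>
      <firstname>Your</firstname>
      <surname>Name</surname>
    </author>
  </artheader>

  <sect1 id="introduction">
    <title>Introduction</title>
    <para>
      This is the first paragraph of the sample article.
    </para>

    <sect2 id="using">
      <title>Using the Program</title>
      <para>
        To save your work, press
        <keycombo><keycap>Ctrl</keycap><keycap>S</keycap></keycombo>.
        To list your files, type <command>ls</command>.
      </para>
    </sect2>
  </sect1>
</article>
```

Everything from the first line down to the closing artheader tag is the
"first half" that can be saved as a template. The part from the first
sect1 tag onwards is the document body.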

Then proof-read carefully!

All the angle brackets, slash marks and quote marks must be exactly as
typed for now. Otherwise Bad Things will happen when you do this:

Assuming you called this sample.sgml, run 'db2html sample.sgml'
on it. You should see something like this, and nothing else.

$ db2html sample.sgml 
TMPDIR is DBTOHTML_OUTPUT_DIR15304
Using stylesheet: "/usr/lib/sgml/stylesheets/cygnus-both.dsl"
input file was called sample.sgml -- output will be in sample
working on ../sample.sgml
about to copy cascading stylesheet and admon graphics to temp dir
about to rename temporary directory to sample

