From hobbit@lxorguk.ukuu.org.uk  Thu Jul 27 13:13:34 2000
Return-Path: <hobbit@lxorguk.ukuu.org.uk>
Received: from the-village.bc.nu (lightning [194.168.151.1])
	by aloss.ukuu.org.uk (8.9.3/8.9.3) with ESMTP id NAA12001
	for <hobbit@aloss.ukuu.org.uk>; Thu, 27 Jul 2000 13:13:33 +0100
Received: from hobbit by the-village.bc.nu with local (Exim 2.12 #1)
	id 13HmXD-0002Kq-00; Thu, 27 Jul 2000 12:13:07 +0000
Date: Thu, 27 Jul 2000 12:13:06 +0000
From: Telsa Gwynne <hobbit@lightning.swansea.linux.org.uk>
To: Ailbhe Leamy <ailbhe@lspace.org>
Subject: Re: Your DocBook experience
Message-ID: <20000727121305.C31917@lightning.swansea.linux.org.uk>
References: <20000727101444.B549@ossifrage.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Mailer: Mutt 0.95.4us
In-Reply-To: <20000727101444.B549@ossifrage.net>; from Ailbhe Leamy on Thu, Jul 27, 2000 at 10:14:44AM +0100
Sender: Telsa Gwynne <hobbit@lxorguk.ukuu.org.uk>
Status: RO
Content-Length: 15180
Lines: 333

On Thu, Jul 27, 2000 at 10:14:44AM +0100 or thereabouts, Ailbhe Leamy wrote:
> Hi Telsa
> 
> I'm sorry, this is a helldesk thing... feel free to answer rudely.

Laugh. You're in luck: you caught me just as I was completing dealing
with the email backlog from a week away and I'm in explanation mode.

> I'm about to start learning DocBook, have the relevant O'Reilly, and
> wondered whether there was anything you thought I ought to know, since
> your mutt and procmail stuff was all so sensible...

Argh :)

I keep meaning to write this up. Quick brain dump for you:

DocBook woes come in two varieties:

Installing the tools.
Using the tools.

Installing: 

o RH (cos I know it best)
If you have a very recent Red Hat (RH 6.2) you will find the tools on
the CD or in the 6.2 ftp directory. They install and work fine on a
6.0 or 6.1 system. Otherwise, go to 
ftp://sourceware.cygnus.com/docware/docbook-tools (or some such) and 
grab the rpms. Either the source rpms, which compile and install fine, 
or the binaries. Note that all but one (jade) are in the /noarch directory, 
_not_ the /i386 directory.

o Debian (cos I intend to learn)
I believe that apt-get install may grab the necessaries. Debian
lays things (like files and contents) out slightly differently. 
One package is also called cygnus-stylesheets in Debian instead of 
stylesheets (in RH and others).

o Slackware (cos I've encountered people stuck here before)
This is a sod. Get the tarballs from the sourceware site above, but
in order to install them, you will have to create every necessary
directory, because tar won't deal with empty directories, only
directories with something in them. (There may well be a tar option
to overcome this, but there is a reason I stick to packages rather
than tarballs when possible, and that reason is that I already have
too many options to remember for tar...)

o Recent Mandrakes have the tools on them already. Pretty much as on
Red Hat.

The packages in question:
    * sgml-common
	This is the package containing a collection of useful DTDs
	and entities. You need this. DocBook is an example of a DTD.
	(But I don't think it's included in sgml-common!) Entities are 
	exactly as in HTML: if you want to use the signs < > or &, you 
	have to use entities, because those signs are used within DocBook 
	so have to be produced some other way. (They are "&lt;" "&gt;" and 
	"&amp;" respectively.) There are shitloads of entities. They're
	quite fun when you get the hang of them: you can actually spell
	my sister's name (Siân) correctly, like this: Si&acirc;n
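
	A minimal sketch of what that looks like inside a document. The
	paragraph text is invented for illustration; only the four
	entities are the ones named above.

	```sgml
	<para>
	  To show a tag such as &lt;para&gt; literally, write the angle
	  brackets as entities. An ampersand is &amp;, as in
	  "fish &amp; chips". Accented letters work the same way:
	  Si&acirc;n.
	</para>
	```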

    * docbook
	This is the package containing the DocBook DTD itself. The
	version I have has only DocBook 3.0 and 3.1, which are defined
	in terms of SGML. DocBook 4.0 has come out: this is defined in
	terms of both SGML and XML, should you be playing buzzword
	bingo. Subsequent major number version increases will be XML
	only, but they're a good way off. If you write _good_ DocBook,
	it won't matter. All you'll need to change is the DTD at the
	top. See below :)

    * stylesheets (cygnus-stylesheets in Debian)
	When you have written your DocBook article, you want to turn it
	into something useful. Commonly, this is HTML and/or PostScript
	format. The stylesheet is what the parser looks at to see how
	DocBook tags should be rendered in various formats. On rpm-
	using systems, this package also includes a bunch of handy
	scripts to invoke the parser (jade) in easy ways: db2html is
	used to create DocBook->HTML conversions, db2ps is used to create
	DocBook->PostScript conversions, etc. (On Debian, these scripts
	are packaged in one of the other packages but I forget which.)

    * jade
	This is what takes the docbook file and turns it into different
	formats. It looks at a stylesheet to do this. It has a default
	stylesheet, or you can tell it to use different ones. It has
	notoriously complex syntax and notoriously incomplete documentation
	(the _only_ useful page in the docs is 
	/usr/doc/jade-<version>/jadedoc/jade.htm which is scary as shit)
	and is written in notoriously difficult C. It is, however, the 
	first (and currently only) free tool we have for this, and Gets
	Things Right (very hard, according to people I've spoken to who
	are into all this stuff). Replacements and/or enhancements are
	rumoured, but not yet here. It does the conversion to HTML but
	not to PS: see jadetex for that. Jade creates valid but
	unreadable-to-humans HTML: it won't wreck your terminal or
	anything, it's just horribly formatted. It's well worth getting
	the latest version of HTML-Tidy to tidy up the results: see
	http://www.w3.org/People/Raggett/tidy/ for it. Only the latest
	versions will deal with the HTML version that jade uses.

    * jadetex
	This is required only if you want to create PostScript files
	or other files that require going through a postscripty sort
	of stage. It's not hard to install and it's worth having but
	it's not necessary.

    * psgml
	This is an optional package. It's an SGML-editing mode for Emacs,
	so you would use it when editing a DocBook file. If you use Emacs, 
	you will definitely want it: it includes highlighting, prompts on 
	what tags are allowable where, and validation. If you do not use 
	Emacs, you may find yourself tempted to learn it just to make
	use of this tool. It really is impressive. It was written by
	Lennart Staflin. Norman Walsh, who co-wrote the DocBook book,
	is one of the DocBook world biggies and knows what he's doing.


Using the little bastards:

The main things to know:

* Mark up content and forget about appearance.

	If you don't like the way the HTML comes out when you generate
	it, then blame (and modify) the stylesheets. It's not a DocBook
	issue. Don't think "that form of list takes up less screen space"
	and alter what kind of list you use in DocBook, for example. You
	can't guarantee what other stylesheets will do with the one you
	think is "prettier"; and you're describing the content incorrectly
	instead. It's the content you're marking up. Not the appearance.
	(Also, viewing the same HTML in netscape, Mozilla, Lynx, links,
	w3m, konqueror and the gnome-help-browser teaches you to forget
	about worrying about rendering anyway cos they all do it 
	differently...)

* Forget about GUI editors.

	This comes up _all_ the time. There are currently no good
	usable free GUI/WYSIWYG editors for editing DocBook. There are
	very good arguments which suggest that you end up in trouble
	if you try to use them. You start marking up on appearance, not
	content. See above. That said, there is interest in getting
	Conglomerate (http://www.conglomerate.org) development restarted
	and moving its focus to DocBook. There is also a rumour that
	LyX (which as I recall requires a non-free library to run) will
	deal with DocBook. I got as far as not finding the source to
	one of the libraries and gave up and didn't bother installing
	it. Basically, your favourite text editor is your friend. You
	have enough to learn without trying to learn a new editor as
	well.

* DocBook is huge.

	There's over three hundred tags already (and I'm seriously
	considering a request for enhancement which involves adding
	at least one more). You will kill yourself if you treat it
	as a French vocabulary lesson and try to learn them all at
	once. Unless you're writing a lot of differently-focused
	docs, you won't need to. I use the GUIMenu and GUIMenuItem
	tags all the time when I'm writing GNOME docs for users but
	I don't need to know about <function> <constant> and
	<programlisting> for that. On the other hand, when I've been
	doing kernel docs (well, checking other people's :)), I need
	those last three a lot and there is not a gui-thing tag in
	sight... Expect to discover new tags and think "Damn, I 
	could have used that in the last doc!" all the time. We
	all do it :)
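
	A rough illustration of the two flavours, assuming DocBook 3.1.
	The sentences are made up; printk and KERN_INFO are simply a
	familiar kernel example, not taken from any doc mentioned here.

	```sgml
	<!-- user documentation: GUI tags -->
	<para>
	  Choose <guimenuitem>Save</guimenuitem> from the
	  <guimenu>File</guimenu> menu.
	</para>

	<!-- programmer documentation: code tags -->
	<para>
	  The <function>printk</function> function takes a log level
	  such as <constant>KERN_INFO</constant>:
	</para>
	<programlisting>
	printk(KERN_INFO "hello\n");
	</programlisting>
	```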
	
* Don't be tempted to use the SGML-specific tricks in a DocBook file.

	If and when you switch to XML they will bite you, and in addition,
	they make it much harder for other people to read later.

	Specifically:
	
	Don't use end-tag minimisation. That's when you do
	<title>Some title</> instead of <title>Some title</title>.
	I once met a doc which had seven </>s in a row -- and it
	was generating errors. I had no idea whether there was one
	too many, one too few, or what. It's evil. Fill in the end-tags :)

	Always include end tags. You can get away with no end tags in
	one or two very specific cases, such as <xref> for cross-
	referencing. But XML users will hate you.

	When you use id attributes (like the "id=" thing in
	<sect1 id="intro">) put the id all in lower case and put
	double-quotes around it.

	There's more about this in the DocBook book in one of the
	appendices.
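
	A small before-and-after sketch of those rules. The section
	name and text are invented, and the "avoid" half is deliberately
	bad style.

	```sgml
	<!-- Avoid: unquoted mixed-case id, minimised end tags -->
	<sect1 id=Intro>
	<title>Introduction</>
	<para>Some text.</>
	</>

	<!-- Prefer: quoted lower-case id, every end tag spelled out -->
	<sect1 id="intro">
	  <title>Introduction</title>
	  <para>Some text.</para>
	</sect1>
	```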

* Make templates. I have a file which just says this:

	<!DOCTYPE BOOK PUBLIC "-//Davenport//DTD DocBook V3.0//EN" []>
	<!DOCTYPE Article PUBLIC "-//OASIS//DTD DocBook V3.1//EN"[]>
	<!DOCTYPE Book PUBLIC "-//GNOME//DTD DocBook PNG Variant V1.0//EN" []>
	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
	"http://www.w3.org/TR/REC-html40/loose.dtd">

	The first is a DocBook 3.0 file in book format.
	The second is a DocBook 3.1 file in article format.
	The third is a GNOME-specific thing I need a lot.
	The final one is an HTML DTD! It goes at the top of all my web pages
	these days, so I can get them through the validator at w3.org :)
	I just cut and paste from that into my new doc instead of trying
	to remember what a DTD looks like.

	I have another file which is called artheader.sgml which is a template
	of all the tags I need in an article header so I can just fill in the
	content. I have another file which is a template of the forms of
	lists I use the most often.

	This doesn't save you all the work, but it saves on a lot of
	repetitive typing. Just grab the relevant files when you need
	them, stuff them into your doc, and fill in the blanks.
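
	For illustration only, a template of that sort might look
	something like the sketch below. This is a guess at the general
	shape for a DocBook 3.1 article, not the actual artheader.sgml;
	the upper-case words are placeholders to fill in.

	```sgml
	<!DOCTYPE Article PUBLIC "-//OASIS//DTD DocBook V3.1//EN" []>
	<article>
	  <artheader>
	    <title>TITLE</title>
	    <author>
	      <firstname>FIRSTNAME</firstname>
	      <surname>SURNAME</surname>
	    </author>
	    <pubdate>DATE</pubdate>
	    <abstract>
	      <para>ONE-PARAGRAPH SUMMARY</para>
	    </abstract>
	  </artheader>

	  <sect1 id="intro">
	    <title>SECTION TITLE</title>
	    <para>TEXT</para>
	    <itemizedlist>
	      <listitem><para>FIRST ITEM</para></listitem>
	      <listitem><para>SECOND ITEM</para></listitem>
	    </itemizedlist>
	  </sect1>
	</article>
	```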

* Know your jade errors. 
	When you invoke jade, whether by using jade directly (ugh)
	or by db2html (the neat little script), you will _hope_ to see
	something like this:

	$ db2html myfile.sgml 
        TMPDIR is DBTOHTML_OUTPUT_DIR11686
        Using stylesheet: "/usr/lib/sgml/stylesheets/cygnus-both.dsl"
        input file was called myfile.sgml -- output will be in
        myfile
        working on ../myfile.sgml
        about to copy cascading stylesheets and admon graphics to temp dir
        about to rename temporary directory to myfile
        $

	More likely, you'll get a series of errors :)
	If you get a _shitload_ of "element PARA not recognised", "element
	TITLE not recognised" scrolling up, go back to the first error and
	check it doesn't say it can't find a DTD and/or stylesheet for the
	document. Screens and screens of "element FOO not recognised" 
	usually means something's installed in the wrong place or that
	jade can't find it.

	If you get "element PARA not allowed here" or something, it probably
	means you left an end tag off. Instead of

		<para>Blah blah</para>
		<para>Foo</para>

	...you have probably typoed and done "<para>" instead of "</para>":

		<para>Blah blah<para>
		<para>Foo</para>

	You can't nest paragraphs, so the meant-to-be-end-tag is being
	picked on.

	If you get something about VI delimiters in an id attribute, it
	(bizarrely!) probably means that you did a link to a web or ftp
	site way back when in your doc and missed off a " mark at the
	end of it; i.e. you have put

		<ulink type="http" url="http://www.gnome.org>GNOME site</ulink>

	instead of

		<ulink type="http" url="http://www.gnome.org">GNOME site</ulink>

	Then the next time you used an id attribute, probably in a
	<sect1 id="foo"> or something, jade gets confused, thinks the
	start of that id attribute is the end of some great big long
	url, looks at < and > marks within this purported url, and
	blows its little head up.

* Learn through theft :)

	A note about that DocBook: the Definitive Guide book.
	I have this too, and it's now indispensable. But there's no
	way you'll learn to write DocBook from it. It's a reference
	in every sense of the word. I mostly use it for checking
	how to use particular tags or for hunting for entities for
	non-[A-Z][a-z] characters. The way I learned is the way
	everyone I know learned: find a bunch of docs in clear
	DocBook, look at them, start editing them to say what you
	want, and you'll get somewhere. Staring at a blank file
	with a DTD at the top and thinking "Now what?" isn't going
	to work :) Paul "Rusty" Russell sent some kernel docs to
	Alan with the comment that they were written in "cargo
	cult DocBook": he'd looked at other docs and tried things
	that looked the same and prayed. It's an excellent description
	and it's all too true! We actually wrote template docs for
	GNOME documentation because it was so painful to get 
	started.

Hmm. What else? Resources. Hmm.

The SelfDocBook is worth a look just for the concept. It's an intro
to DocBook written in DocBook which uses itself as its example.
* http://people.redhat.com/twaugh/docbook/selfdocbook/

Deb Richardson did a great little intro which she needs encouragement
to finish :) 
* http://www.oswg.org/static/oswg-nightly/oswg/en_US.ISO_8859-1/articles/OSWG-Catalog/oswg-catalog/documentation.html

The FreeBSD Docs folk have a great intro to DocBook. Like most docs
projects, they use a slightly enhanced form of DocBook but they have
clearly identified which parts are standard and which are not.
* http://www.freebsd.org/docproj/sgml.html
* http://www.freebsd.org/tutorials/docproj-primer/
	
The canonical intro is still that of Mark Galassi, "Get Going with DocBook".
* http://nis-www.lanl.gov/~rosalia/mydocs/docbook-intro/docbook-intro.html

The GNOME docs project has some stuff but how useful it is for non-GNOME
stuff I dunno. I've been involved in it for a while, and as we all learn,
we get more and more complicated and ambitious.
* http://www.gnome.org/gdp/handbook.html

The KDE docs folk have an _excellent_ introduction with examples (I'm
not too sure about Konqui's favourite food in their sample lists, though!)
and I have -- damnit! -- just lost the URL. If you find it, let me know!

Mailing lists exist in abundance, all with different foci. I am on
a _pile_ of them. I shall try and work out which are any use. Most
are not, for me: there is a heavy non-free and proprietary tool bias on
practically all of them except the Open Source Writers Group one
(surprise!) (http://www.oswg.org/oswg/ will have the links.) They
might help you, but I try very hard to use only free software, so
I'm a bit stuck. FrameMaker and its ilk are of no use to me :)

Phew. My hands hurt now, so I shall take it as a sign to get off the
computer.

I think that's it. And this time, I am keeping a copy of this, and I
_shall_ write it up. I get asked about DocBook a lot, and really I
dunno much, I am just learning. Still...

Telsa
-- 
 (o_         Telsa Gwynne - hobbit@lxorguk.ukuu.org.uk              _o)
 //\    Mike Scher: "What is the sound of one route flapping?"      /\\
 V_/_      Ron Echeverri: "Call Sprintlink again, dammit."         _\_V
             I *object* to unsolicited commercial email.

