txt2page (txt2page.tar.bz2) is a bash/vim script that converts
free-form plain text into HTML.
% txt2page filename.txt
creates filename.html. Here, txt2page is assumed to be already
in your PATH, and % is your Un*x command-line prompt.
Once you are beyond the most rudimentary situations, you will feel the need for some markup. I’ve chosen a troff syntax so that the document, with the help of some troff macros, can also be converted to a decent printout. Suitable troff (actually groff) macro files are included in the troff2page distribution (see below), but if you are a troff hand, you can write your own macros to taste.
.TH introduces an overall title, while .SH and .SS
introduce sections and subsections respectively. .SH and .SS
use man syntax, so if you’re using ms to convert, remember to
shadow the ms definition with a man-compatible definition.
.TH ThisIsaTitle
.SH This Is a Section
.SS This Is a Subsection
To keep compatibility with man, .TH sets its first argument as
the title, ignoring any subsequent args. If you need
multiple words in the title, enclose them in double quotes.
.TH "This Is a Multiword Title"
txt2page is clever about converting single and double quotes to
their curved Unicode versions. It will leave the enclosing
double quotes after .TH alone.
Oh, and double hyphens are converted to
em dashes — but only when appropriate. i-- and --version retain the
separated hyphens.
.** creates a section break with
no header — it produces an ornament to mark the break. This paragraph was introduced by .**.
One thing you *don’t* need to mark up is URLs, which get
recognized and converted to links automatically. URLs that are
relative pathnames can be recognized too, provided they are
prefixed with ./, which is always correct and adds very little
baggage. I think it’s OK that these URLs and
pathnames get rendered verbatim in troff. It is OK to enclose
./-style relative pathnames with double quotes, to deal with
cases where the pathname contains spaces.
.so filename
interpolates the contents of filename in-place, as in troff. However,
.so filename Some Additional Description
merely *links* to
(the HTML version of) filename
with Some Additional Description
serving as link text. In troff of course, both syntaxes cause
sourcing of filename. I did it this way because it lets you
coax a Table of Contents into the HTML without additional
markup.
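As a sketch of the Table of Contents trick, the following shell snippet writes a small contents page. The file names chapter1.txt and chapter2.txt and the link texts are made up for illustration.

```shell
# Write a hypothetical contents page. Each .so line carries extra text,
# so it should become a link rather than an in-place inclusion.
cat > toc.txt <<'EOF'
.TH "My Notes"
.so chapter1.txt Chapter 1: Getting Started
.so chapter2.txt Chapter 2: Going Further
EOF

# Then convert it as usual:
#   txt2page toc.txt
```

Dropping the description from a .so line would switch that line back to plain interpolation.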
.NAV ... anywhere in the file inserts a navigation bar at
both the head and foot of the HTML. .NAV can take one, two,
three, or four arguments. A lone argument is taken as the
next page. If a second argument is present, the first argument
is the previous page and the second is the next page. A third
argument, if any, is the table of contents page, and a fourth,
if any, is the index page.
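To make the argument order concrete, here is a sketch that writes three one-line files, each showing one form of .NAV. The page names are placeholders. The description above doesn't say whether .NAV expects source names, HTML names, or full URLs, so adjust them to whatever your own pages need.

```shell
# One argument: it is the next page.
printf '%s\n' '.NAV page2' > first.txt

# Two arguments: previous page, then next page.
printf '%s\n' '.NAV page1 page3' > middle.txt

# Four arguments: previous, next, table of contents, index.
printf '%s\n' '.NAV page1 page3 toc index' > full.txt
```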
.FS and .FE enclose footnote text. [1]
Blank lines create paragraph separations. This assumes a
suitable definition for the blank-line trap .blm in troff,
preferably one that is intelligent about when it inserts a
parindent.
As in troff, .nf and .fi can be used to enclose unfilled text, but only
the leading spaces are respected. .uu is almost a synonym for
.nf; it is needed so that troff can suppress a leading
parindent. .~ is like .uu but doesn’t need a closing .fi:
a blank line automatically terminates it.
Also as in troff, lines with leading spaces aren’t broken, so
that’s a nice way to write unmarked-up verse. However, unless you
plan on using the .lsm feature, which is only available in
recent groffs that haven’t yet propagated to the Linux and Cygwin
packages, the first line may find an unsightly parindent creeping
in. The situation is not too bad: even in older groffs, it is
easy to avoid the parindent if the unmarked-up verse starts at the
head of the document, or after a section header, or after another
block of verse.
.EX and .EE enclose monospace unfilled text, like code
fragments.
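The block-level requests above can be tried together on a small sample. This is only a sketch: the file name blocks.txt and its contents are invented for illustration.

```shell
# Write a sample exercising .nf/.fi, .~ and .EX/.EE.
cat > blocks.txt <<'EOF'
.nf
   Leading spaces on this line are kept.
This line is not joined to the one above.
.fi

.~
So much depends
  upon a blank line

to end the verse above. This sentence is ordinary prose again.

.EX
% txt2page blocks.txt
.EE
EOF

# Then convert it as usual:
#   txt2page blocks.txt
```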
.JPEG -pos imgfile can be used to include images. -pos, which
governs alignment, is optional, and can be L, C or R. If not
provided, -pos is assumed to be -C. The image doesn’t have to be
a jpeg: anything that can go into an HTML <IMG> is fine.
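For instance, the three alignments could be written as below. The image file names are hypothetical. The middle line omits -pos, so by the rule above it should be treated as -C.

```shell
# Write a sample page with left-aligned, default (centered) and
# right-aligned images. None of them has to be an actual JPEG.
cat > gallery.txt <<'EOF'
.JPEG -L left.jpg
.JPEG sunset.png
.JPEG -R right.gif
EOF
```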
Plain-text documents make for plain HTML? Not necessarily. CSS is a great way to add typographic panache to your document without cluttering the document itself with markup. txt2page will pick up one css file from the same directory. If default.css exists, that’s the one chosen. Otherwise, the alphabetically least css filename is chosen. (I obviously wanted to avoid having to specify the css filename in the document or as a txt2page argument or via an environment variable. Simplicity!)
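If you are unsure which stylesheet will win in a given directory, the rule can be mimicked in a few lines of shell. This is a sketch of the rule as described, not the code txt2page itself uses. The function name pick_css is made up, and treating "alphabetically least" as a byte-wise sort is an assumption.

```shell
# pick_css DIR: print the css filename the rule above would select in DIR.
pick_css() {
  dir=${1:-.}
  if [ -f "$dir/default.css" ]; then
    # default.css always wins when it exists.
    echo default.css
  else
    # Otherwise list the *.css names, sort them byte-wise
    # (LC_ALL=C is an assumption) and keep the first one.
    (cd "$dir" && ls -1d -- *.css 2>/dev/null) | LC_ALL=C sort | head -n 1
  fi
}
```

Calling pick_css with no argument checks the current directory. It prints nothing if there is no css file at all.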
If you want to have multiple css files, simply have the main one
@import the rest. Remember that the @import statements have to
come first in a css file. I didn’t know this starting out and
was quite puzzled when things didn’t work!
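A minimal main stylesheet along these lines might look as follows. The imported file names fonts.css and layout.css and the body rule are placeholders, not files from the distribution.

```shell
# Write a main stylesheet whose @import lines precede every other rule.
cat > default.css <<'EOF'
@import url("fonts.css");
@import url("layout.css");

body { max-width: 40em; }
EOF
```

Naming the main file default.css means it is the one picked up, whatever the imported files are called.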
Some css files are included in the distribution, but of course you’ll want your own. They’re easy to create with a little inspection, and infinitely tweakable outside your content to make it as cute as you want. You don’t even need to rerun txt2page when you make CSS changes, as long as the CSS filename used remains the same. A good place to get cloud-based fonts is Google webfonts.
txt2page is a modest tool and expects pretty straightforward
plain text: essentially a single-column stack of prose, verse,
images, however long. To convert more elaborate troff source,
try the more powerful ../troff2page, which requires Common
Lisp. If you are not a troffglodyte, but rather a TeXnician, try
../tex2page.
I essentially wrote txt2page to create simple pages on hosts that had basic Un*x but no Common Lisp, and where the pages didn’t require the full power of troff2page anyway.
(Last modified: 2011-12-31)
[1] Use an explicit symbol or number after .FS. In the body text,
use <backslash>*[^ sym] to refer to the
footnote. (The <backslash> is the single backslash character.) The syntax is complicated to keep troff happy, but to
make a virtue out of a necessity, it is usefully distinctive when
you’re editing the document in your text editor.
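Put together, a footnoted sentence might be written like this. The sample text and the file name note.txt are invented. Placing the .FS block right after the referring line follows common troff practice and is an assumption here.

```shell
# The quoted 'EOF' keeps the backslash literal, so the file contains
# exactly one backslash before *[^ 1].
cat > note.txt <<'EOF'
Troff has a long memory.\*[^ 1]
.FS 1
This is the footnote text.
.FE
EOF
```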