txt2page

txt2page (txt2page.tar.bz2) is a bash/vim script that converts free-form plain text into HTML.

  % txt2page filename.txt

creates filename.html. This assumes txt2page is already in your PATH; % stands for your Un*x shell prompt.

Once you are beyond the most rudimentary situations, you will feel the need for some markup. I’ve chosen a troff syntax so that the same document, with the help of some troff macros, can also be converted to a decent printout. Suitable troff (actually groff) macro files are included in the troff2page distribution (see below), but if you are a troff hand, you can write your own macros to taste.

Sectioning

.TH introduces an overall title, while .SH and .SS introduce sections and subsections respectively. .SH and .SS use man syntax, so if you’re using ms to convert, remember to shadow the ms definition with a man-compatible definition.

  .TH ThisIsaTitle
  .SH This Is a Section
  .SS This Is a Subsection

To keep compatibility with man, .TH sets its first argument as the title, ignoring any subsequent args. If you need multiple words in the title, enclose them in double quotes.

  .TH "This Is a Multiword Title"

txt2page is clever about converting single and double quotes to their curved Unicode versions. It will leave the enclosing double quotes after .TH alone. Oh, and double hyphens are converted to em dashes, but only when appropriate: i-- and --version keep their hyphens separate.

.** creates a section break with no header — it produces an ornament to mark the break. This paragraph was introduced by .**.

Links

One thing you *don’t* need to mark up is URLs, which are recognized and converted to links automatically. URLs that are relative pathnames can be recognized too, provided they are prefixed with ./, which is always correct and adds very little baggage. I think it’s OK that these URLs and pathnames get rendered verbatim in troff. You may enclose ./-style relative pathnames in double quotes to deal with pathnames that contain spaces.

  .so filename

interpolates the contents of filename in-place, as in troff. However,

  .so filename Some Additional Description

merely *links* to (the HTML version of) filename, with Some Additional Description serving as the link text. In troff, of course, both syntaxes cause filename to be sourced. I did it this way because it lets you coax a Table of Contents into the HTML without additional markup.
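
As a rough illustration of the Table of Contents trick, the following sketch writes a small contents page whose .so lines all carry a description, so each one should become a link rather than an inclusion. The file names (contents.txt, chapter1.txt, chapter2.txt) and the title are made up for the example, and giving the .txt source name as the .so argument is an assumption.

```shell
# Write a small contents page. All file names here are hypothetical.
cat > contents.txt <<'EOF'
.TH "Field Notes"

.so chapter1.txt Chapter 1: Getting Started
.so chapter2.txt Chapter 2: Going Further
EOF

# With txt2page in your PATH, this would then create contents.html:
#   txt2page contents.txt
```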

.NAV ... anywhere in the file inserts a navigation bar at both the head and foot of the HTML. .NAV can take one, two, three, or four arguments. A lone argument is taken to be the next page. If a second argument is present, the first argument is the previous page and the second is the next page. A third argument, if any, is the table of contents page, and a fourth, if any, is the index page.
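
The way the argument count changes the meaning of the first argument is easy to misread, so here is a small shell sketch of the rule. It is not taken from the txt2page source: nav_roles is a hypothetical helper, and the page names are placeholders.

```shell
# Hypothetical helper: report how the arguments of a .NAV line are interpreted.
nav_roles() {
  case $# in
    1) echo "next=$1" ;;
    2) echo "prev=$1 next=$2" ;;
    3) echo "prev=$1 next=$2 toc=$3" ;;
    4) echo "prev=$1 next=$2 toc=$3 index=$4" ;;
    *) echo "usage: nav_roles page [page [toc [index]]]" >&2; return 1 ;;
  esac
}

nav_roles ch2              # prints: next=ch2
nav_roles ch1 ch3          # prints: prev=ch1 next=ch3
nav_roles ch1 ch3 toc idx  # prints: prev=ch1 next=ch3 toc=toc index=idx
```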

.FS and .FE enclose footnote text.1
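
A minimal sketch of a footnote, using the reference syntax given in the footnote to the sentence above. The file name footnote-demo.txt and the wording are invented for the example, and placing the number on the .FS line itself is an assumption about what "after .FS" means.

```shell
# Write a sample document with one footnote. The heredoc delimiter is quoted
# so the shell leaves the backslash in the footnote reference untouched.
cat > footnote-demo.txt <<'EOF'
The claim in this sentence needs a caveat.\*[^ 1]

.FS 1
This is the text of the footnote.
.FE
EOF
```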

Paragraphs, lines, images

Blank lines create paragraph separations. This assumes a suitable definition for the blank-line trap .blm in troff, preferably one that is intelligent about when it inserts a parindent.

As in troff, .nf and .fi can be used to enclose unfilled text, but only the leading spaces are respected. .uu is a near-synonym for .nf; it is needed so that troff can suppress a leading parindent. .~ is like .uu but doesn’t need a closing .fi: a blank line automatically terminates it.

Also as in troff, lines with leading spaces aren’t broken, so that’s a nice way to write unmarked-up verse. However, unless you plan on using the .lsm feature, which is only available in recent groffs that haven’t yet propagated to the Linux and Cygwin packages, the first line may find an unsightly parindent creeping in. The situation is not too bad: even in older groffs, it is easy to avoid the parindent if the unmarked-up verse starts at the head of the document, or after a section header, or after another block of verse.

.EX and .EE enclose monospace unfilled text, like code fragments.

.JPEG -pos imgfile can be used to include images. -pos, which governs alignment, is optional, and can be L, C or R. If not provided, -pos is assumed to be -C. The image doesn’t have to be a jpeg: anything that can go into an HTML <IMG> is fine.
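
To see the block-level requests from this section side by side, the sketch below writes a sample input file that uses .nf/.fi, .~, .EX/.EE and .JPEG. The file name blocks-demo.txt, the image name photo.jpg and the sample text are all hypothetical.

```shell
# Write a sample document exercising the unfilled-text and image requests.
cat > blocks-demo.txt <<'EOF'
.SH Blocks

.nf
Line breaks here
   are kept as typed.
.fi

.~
This block needs no closing request;
the blank line below ends it.

.EX
ls -l *.txt
.EE

.JPEG -R photo.jpg
EOF

# With txt2page in your PATH, this would then create blocks-demo.html:
#   txt2page blocks-demo.txt
```

The -R asks for right alignment; leaving it out would center the image.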

CSS

Plain-text documents make for plain HTML? Not necessarily. CSS is a great way to add typographic panache to your document without cluttering the document itself with markup. txt2page will pick up one CSS file from the same directory. If default.css exists, that’s the one chosen. Otherwise, the alphabetically least CSS filename is chosen. (I obviously wanted to avoid having to specify the CSS filename in the document or as a txt2page argument or via an environment variable. Simplicity!)
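
The selection rule can be sketched in a few lines of shell. This is an illustration of the rule as described, not the code txt2page actually uses: pick_css is a hypothetical helper, and reading "alphabetically least" as plain byte order (LC_ALL=C) is an assumption.

```shell
# Hypothetical helper: print the name of the CSS file the rule above would
# select in directory $1 (default: current directory). Prints nothing if the
# directory contains no .css files.
pick_css() {
  dir=${1:-.}
  if [ -f "$dir/default.css" ]; then
    echo "default.css"
    return 0
  fi
  # Assumption: "alphabetically least" means first in plain byte order.
  for f in "$dir"/*.css; do
    [ -f "$f" ] || continue   # skips the unexpanded pattern when nothing matches
    basename "$f"
  done | LC_ALL=C sort | head -n 1
}

pick_css .
```

Running it in a document directory before conversion is a quick way to check which stylesheet you should expect to be picked up.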

If you want to have multiple CSS files, simply have the main one @import the rest. Remember that @import statements have to come first in a CSS file, before any other rules. I didn’t know this starting out and was quite puzzled when things didn’t work!
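
For instance, a main stylesheet that pulls in two others might be set up as follows. The imported file names (fonts.css, layout.css) and the body rule are placeholders chosen for the example.

```shell
# Write a main stylesheet whose @import lines precede every other rule.
cat > default.css <<'EOF'
@import url("fonts.css");
@import url("layout.css");

body { max-width: 40em; margin: auto; }
EOF
```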

Some CSS files are included in the distribution. But of course you’ll want your own. They’re easy to create with a little inspection, and because they live outside your content, you can tweak them endlessly to make it as cute as you want. You don’t even need to rerun txt2page when you make CSS changes, as long as the CSS filename used remains the same. A good place to get cloud-based fonts is Google webfonts.

But I Want More...

txt2page is a modest tool and expects pretty straightforward plain text: essentially a single-column stack of prose, verse, images, however long. To convert more elaborate troff source, try the more powerful ../troff2page, which requires Common Lisp. If you are not a troffglodyte, but rather a TeXnician, try ../tex2page.

I essentially wrote txt2page to create simple pages on hosts that had basic Un*x but no Common Lisp, and where the pages didn’t require the full power of troff2page anyway.

(Last modified: 2011-12-31)


1 Use an explicit symbol or number after .FS. In the body text, use <backslash>*[^ sym] to refer to the footnote. (The <backslash> is the single backslash character.) The syntax is complicated to keep troff happy, but to make a virtue out of a necessity, it is usefully distinctive when you’re editing the document in your text editor.