XSL · Grey Nicholson

The Twaddlebot has been unleashed

2004-06-07T18:40:00+00:00

Last night version 1.0 of The Twaddle went live. It uses arbitrary XML and XSLT to generate valid XHTML pages... offline.

The idea of uploading bare-bones articles and an XSLT template, allowing the browser to generate pages as they're required, was a no-go. But I managed to rig up the transformation offline, to be run as a batch.

Following the tradition of giving XML languages names that are barely-logical acronyms beginning with X, I call the language XTw, which stands for XML... Twaddle... something.

Here's how I worked the magic (borrowing liberally from a newsgroup posting I made on the subject):

This assumes no programming experience, but enough computer savvy to have created the XML and XSL files that need transforming in the first place, and a Windows (XP) machine.

First off, you'll need Xalan, available from http://xml.apache.org/xalan-j/ (and the requisite Java runtime, which you probably already have)

The actual file I downloaded was http://apache.rmplc.co.uk/dist/xml/xalan-j/xalan-j-current-bin.tar.gz

There's also http://apache.rmplc.co.uk/dist/xml/xalan-j/xalan-j-current-bin.zip if you prefer a zip.

The version I got was 2.6.0 (the Java version).

Unzip Xalan into a folder. I used C:\Program Files\xalan-j_2_6_0

Now the code from http://evc-cit.info/cit041x/batchfiles.html#transform:

echo off
java -cp h:\java\xmljar\xalan-j_2_5_1\bin\xml-apis.jar;h:\java\xmljar\xalan-j_2_5_1\bin\xercesImpl.jar;h:\java\xmljar\xalan-j_2_5_1\bin\xalan.jar;. org.apache.xalan.xslt.Process -IN %1 -XSL %2 -OUT %3 %4 %5 %6 %7 %8 %9

The only line break should be after echo off.

Copy this into a plain text editor (e.g. Notepad), and save it as filename.bat (I used ANSI encoding, if it matters)

You should now have an MS-DOS Batch File.

(Apparently some versions of Notepad append .txt to filenames, even if they contain a file extension. In these cases, quoting the filename - e.g. “filename.bat” - allegedly solves the problem)

You'll most likely have to modify the code to point to the actual locations of your Xalan installation and files.

I only plan on using one XSL stylesheet with multiple files; the input files will be filename.xml. The output files will be filename.htm and will be kept in the folder above the one where the input and XSL files are kept. So, I modified the code a little:

java -cp "c:\program files\xalan-j_2_6_0\bin\xml-apis.jar";"c:\program files\xalan-j_2_6_0\bin\xercesImpl.jar";"c:\program files\xalan-j_2_6_0\bin\xalan.jar";. org.apache.xalan.xslt.Process -IN %1.xml -XSL "c:\path\to\an\xsl\file\xsl.xml" -OUT ..\%1.htm

This should all be on one line. %1 in the code will be replaced by the first argument passed to the batch file, %2 by the second argument, etc. ..\ means up one folder. The quotation marks around the filenames cause them to be treated as one item, despite their containing spaces.
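To see the argument substitution on its own, without involving Xalan, something like the following throwaway batch file should do. The file name args.bat is made up for illustration; it only prints what the real batch file would use.

```bat
@echo off
rem args.bat: hypothetical demo of how %1 and %2 are substituted
echo Input file would be: %1.xml
echo Output file would be: ..\%1.htm
echo Second argument, if any: %2
```

Typing args article at the prompt should report article.xml as the input and ..\article.htm as the output.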

You can add @echo off (without quotes) in an empty line above, if you prefer not to have masses of textual output in the command console. e.g.:

@echo off
java -cp "c:\...

echo off turns off the display of subsequent commands; @ hides the echo off command.

To perform the transformation, open a command console (Start > Run > "cmd") and navigate to the location of your XML, XSL and batch files, by typing

cd "c:\path\to\files"

(including the quotes)

For simplicity's sake, I've shoved everything in the same folder, and used absolute paths for the programs. You could probably also mess around with relative paths or the path environment variable, but I can't be bothered.
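For anyone who can be bothered, here is a rough sketch of one way to cut down the repetition: keep the Xalan folder in a variable and reuse it. The variable name XALAN_BIN is my own invention, and the paths are the same ones assumed earlier, so adjust to taste.

```bat
@echo off
rem XALAN_BIN is a made-up variable name holding the folder with the Xalan jars
set XALAN_BIN=c:\program files\xalan-j_2_6_0\bin
rem The whole classpath is quoted as one item because the path contains spaces
java -cp "%XALAN_BIN%\xml-apis.jar;%XALAN_BIN%\xercesImpl.jar;%XALAN_BIN%\xalan.jar;." org.apache.xalan.xslt.Process -IN %1.xml -XSL "c:\path\to\an\xsl\file\xsl.xml" -OUT ..\%1.htm
```

The java command should still be on one line. If Xalan moves, only the set line needs changing.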

I ended up having to use HTML Tidy to contort the output into valid XHTML. My final batch file reads:

java -cp "c:\program files\xalan-j_2_6_0\bin\xml-apis.jar";"c:\program files\xalan-j_2_6_0\bin\xercesImpl.jar";"c:\program files\xalan-j_2_6_0\bin\xalan.jar";. org.apache.xalan.xslt.Process -IN %1.xtw -XSL "XTw2XHTML.xsl" -OUT ..\thetwaddle\%1.htm

"C:\Program Files\HTMLTidy\tidy.exe" -q -m -c --show-warnings no --output-xml yes --output-xhtml yes -latin1 --doctype strict --tidy-mark no --wrap 0 --ascii-chars no --drop-proprietary-attributes yes --fix-bad-comments no ..\thetwaddle\%1.htm

echo Done %1.

(Line breaks have been doubled for clarity.)

The input XML files are all labelled filename.xtw; the XSL stylesheet is XTw2XHTML.xsl, and the output files are cacked into the folder thetwaddle, a sibling of the folder where the batch file lives, and assigned a suffix of .htm.

Those options shown for Tidy are the result of trial and error, or rather, trial and testing and reading Tidy's Quick Reference - no warranty implied. The echo command prints out a message for each finished file.

This batch file is wrapped up in another one, which repeatedly calls the first, thus:

@echo off
echo Transforming XTw into XHTML...
call xtw2xhtml afile
call xtw2xhtml otherfiles
echo Done.

The text output is just to make the command console more interesting while the batch program is running. It also helps pinpoint any errors, such as typos, which show up as blobs of text in the command console.
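Listing every file by hand gets tedious as the number of articles grows. A possible alternative, sketched here and not what I actually run, is to let a for loop find the .xtw files. It assumes the inner batch file is saved as xtw2xhtml.bat in the same folder, and that none of the file names contain spaces.

```bat
@echo off
echo Transforming XTw into XHTML...
rem %%f is each matching file in turn; %%~nf is its name without the extension
for %%f in (*.xtw) do call xtw2xhtml %%~nf
echo Done.
```

The doubled percent signs are needed inside a batch file; typed directly at the prompt, the same loop would use %f and %~nf instead.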

The result of all this fiddling is that I can change pages' contents more easily; I've been able to, fairly easily, implement a few minor changes that would have taken effort before. The final product lives here.

In semi-related news, it turns out that PURLs such as purl.org/mooquackwooftweetmeow, without the trailing slash, are possible - it's just partial redirects that have to end with slashes. The Twaddle's now on PURLs, too - purl.org/thetwaddle - with or without the slash.

While uploading “Unleash The Twaddlebot!” (The Twaddle v1.0), I was reminded that we're approaching the 50-file limit; that's not including styles, which are kept in a separate account. This means we'll probably have to change hosts.

Fortunately, ntl provide 55 megabytes of space, so I'm planning to shift everything there. This shouldn't be too troublesome now that everything's on PURLs.

Adventures in XML

2004-04-27T17:40:00+00:00

I did manage to get the XML+XSL-based jiggery-pokery for The Twaddle working - quite nicely, actually. Getting the entire contents of the content field onto the page took a little bit of effort, as described on the mozillaZine forums.

I won't be implementing this on The Twaddle, though - for a start, Opera and KHTML don't like XSLT. And it's not half as accessible for non-standard browsers (relics, mobile devices, text browsers...) as plain, extraneous-menu-items-and-such-written-into-the-article XHTML is. Nonetheless, a working example is online for the time being.

Update: the real thing's gone live... sort of... so the prototype has been removed. Additional related blurb is contained in a later entry in this weblog.

There's a slight chance that I might implement an XML-driven article system on Mooquackwooftweetmeow, where I'm not too fussed about old and/or buggy browsers. The fact that mobile devices won't render the page is more of a concern.

Perhaps some way of pulling in external XHTML fragments could be handled in CSS3? Then again, why duplicate XSL functionality in CSS - small devices' browsers could just be taught to handle XSL.

One more Twaddle-related thing: thanks to Internet Explorer conditional comments (on which MSDN has an hilarious article), I'm now feeding IE users some nice propaganda in the foot of the front page:

You're using Internet Explorer?! You do realise that it's years out-of-date, and screws up most modern web pages, don't you? In fact it's screwing this one up right now and you don't even know it. Try a proper web browser instead.

Oh, and another tiny little piece of The Twaddle-related trivia: the version number on the front page is now in the title text of the copyright notice - it's tidier and it leaves room for a pointless codename.

Over to the Mooquackwooftweetmeow Weblog now, where, thanks to our old friend XML namespaces, and our newer friend the XSL copy-of element, proper links are now in use. I've gone back through the weblog and updated plain text URLs to be links. The more observant of you will have noticed that there has been a smattering of links throughout this post - that'll be the norm from now on.

The even more observant of you will have noticed line breaks as well. I'd use paragraphs but the XSL stylesheet inserts the content into a paragraph - I don't think the XHTML validator would like paragraphs within paragraphs (not that it'd like this Atom file at all...). And I don't think the site's CSS would like divs to hold the text, in place of paragraphs; it might - I just haven't looked at mqwtm's CSS in ages so I can't remember. Besides, line breaks are lighter on the markup than open-and-close <xhtml:p> tags.

And in a final twist of XMLish loveliness, I've chucked a few XHTML <code> tags in as well.

</epic>

Opera + XSL = Eugh

2004-04-23T14:16:00+00:00

Evidently Opera doesn't like XSL - this weblog shows up as a lot of plain text with the odd URL chucked in. The question is whether I care.

The Twaddle is more of a public offering than this weblog, so it matters a little more if it's inaccessible using Opera... but then how many readers of The Twaddle use Opera? I'd say few to none. (Checking the site stats for The Twaddle will probably show a few Opera hits - most of which are me).

IE + XML + XSL + XHTML + W3C = Get In!

2004-04-23T14:07:00+00:00

As a prelude to some major back-end renovation I'm planning for The Twaddle, I decided to see if I could get Internet Explorer 6 to display this XSL-ified weblog nicely, not accounting for IE-unsupported CSS (which is already taken care of at The Twaddle). Previously, IE displayed the DOCTYPE declaration as plain text at the top of the page; using strategic HTML commenting, I've managed to prevent it from doing so.

Actually, I bet simply removing the DOCTYPE declaration wouldn't affect either Gecko's or IE's rendering of the page, as I think XML kicks both of them into standards mode anyway.

The next step is to try this with some of The Twaddle. And I'd probably best check Opera's effort, too.

Hurrah once more!

2004-02-19T14:20:00+00:00

That w3schools (http://www.w3schools.com/) is pretty decent. I've now concocted an XSL stylesheet for this feed, so visiting its URL in a (good) web browser should display it as a nice page.

I managed to cajole XML namespaces into doing what I want with a little help from a random blog entry (http://today.icantfocus.com/blog/archives/entries/000430/) by Christopher H. Laco, and his one-size-fits-all feed stylesheet. Those who are interested can have a gander at my resulting stylesheet (http://purl.org/mooquackwooftweetmeow/weblog.xsl.xml).

Next job: rig up a klip.

Hurrah!

2004-02-19T13:00:00+00:00

Well, then... this is an Atom weblog. Why's it only in Atom format? Everything on Mooquackwooftweetmeow is done the old-fashioned way - using the human brain, a plain-text editor, and no PHP, ASP, SQL or any other fanciness. And I don't want to have to copy every entry out into an XHTML page. I'm thinking of having a bash at some XSLT, to automatically generate a fancy front for the weblog; I tried it with the RSS feed, but didn't quite manage it satisfactorily; perhaps my standards are just too high (after all, I am using a free web host).