Blogging an XML-driven DMS, III

The previous posts were a high-level overview of the process I used to work through my design and coding, in a tutorial style, though not like the strange, noddy, toy tutorials most people start out with (SitePoint example). However, a few of the features I have implemented are novel and more are to come, so the next posts will be more descriptive: accounts of some simple testing, thinking, and research. This note is about:

Foreign namespaces in Atom

Atom is the premier feed format, in my opinion (that is, the Atom Syndication Format; the name Atom bundles this and the related Atom Publishing Protocol together). The specification is very good and flexible. Content blocks come in two sorts: textual, and generic. Textual blocks (like the title) require the content to be either plain text, escaped HTML (run away) or XHTML. That is good. Generic blocks can contain (or reference externally) these, or any other sort of content as described by a MIME type. You would naturally think SVG and MathML should drop straight into this model.
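To make the two sorts of content concrete, here is a rough sketch in Python (my choice for illustration; nothing in the DMS depends on it). It builds an entry whose XHTML content carries inline SVG, and classifies a content element loosely following the processing rules in section 4.1.3 of RFC 4287. The function names and the labels they return are made up for this note.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"


def build_entry():
    """Build an atom:entry whose XHTML content embeds a small inline SVG."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    title = ET.SubElement(entry, f"{{{ATOM}}}title", {"type": "text"})
    title.text = "A circle"
    content = ET.SubElement(entry, f"{{{ATOM}}}content", {"type": "xhtml"})
    # type="xhtml" requires a single XHTML div as the wrapper element.
    div = ET.SubElement(content, f"{{{XHTML}}}div")
    p = ET.SubElement(div, f"{{{XHTML}}}p")
    p.text = "Inline SVG follows."
    svg = ET.SubElement(div, f"{{{SVG}}}svg", {"width": "20", "height": "20"})
    ET.SubElement(svg, f"{{{SVG}}}circle", {"cx": "10", "cy": "10", "r": "8"})
    return entry


def content_kind(content):
    """Classify an atom:content element (simplified; labels are my own)."""
    ctype = content.get("type", "text")
    if ctype in ("text", "html", "xhtml"):
        return ctype                  # the three textual forms
    if content.get("src") is not None:
        return "external"             # referenced by URL, element is empty
    mime = ctype.lower()
    if mime.endswith("+xml") or mime.endswith("/xml"):
        return "inline-xml"           # e.g. image/svg+xml as a child element
    if mime.startswith("text/"):
        return "inline-text"
    return "inline-base64"            # anything else is base64-encoded


if __name__ == "__main__":
    print(ET.tostring(build_entry(), encoding="unicode"))
```

Running it prints the serialised entry, which is the sort of test feed item the rest of this note is about.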

The other object of investigation is Google Reader. Reader is more than just a feed reader; it actually covers an entire engine. Most feed reading applications or services are very basic, letting you specify in some way a list of feeds to fetch, routinely getting updated copies of those, and displaying items from those feeds to you. The Reader platform is very much richer than that. Google searches, indexes, and stores essentially all syndicated content on the internet in its vast content-eating monster. Every feed entry on the internet is cleverly assigned a new Google identifier and stored by Google. Then, whenever that entry is republished, mashed up, or whatever, Google tracks the item however it arrives in different people’s inboxes. So, if a friend and I both comment on an item in a feed we both happen to follow, the item is actually stored, tagged, and tracked internally by Google so it can merge the items together and let us each see the other’s comment. Very clever. There is a rough spec for the API to hook into this (official spec; decompiled notes by Niall Kennedy). So, in conclusion, rather coolly, for every feed on the internet there is a corresponding one at<url-escaped feed URL> which re-publishes the feed according to Google’s information with its tracking codes and Reader’s Atom extensions in the xmlns:gr="" namespace (on feeds for shared folders and so on, comments and other additions are placed in this namespace, using <gr:annotation> elements for example).
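The "url-escaped feed URL" part of that address is just percent-encoding of the whole feed URL, slashes and all. A minimal sketch, with the caveat that the prefix below is a hypothetical placeholder of mine and not Reader's real address, and that the function name is made up too:

```python
from urllib.parse import quote

# Hypothetical placeholder prefix; substitute the real Reader endpoint.
READER_FEED_PREFIX = "https://reader.example/atom/feed/"


def reader_feed_url(feed_url, prefix=READER_FEED_PREFIX):
    """Append the fully percent-encoded feed URL to the given prefix."""
    # safe="" makes quote() escape "/" as well, so the feed URL
    # becomes a single path segment.
    return prefix + quote(feed_url, safe="")


if __name__ == "__main__":
    print(reader_feed_url("http://example.org/feed.atom?x=1"))
```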

My investigation then was to find out what has been done so far on support for foreign namespaces in feeds by browsers, and what Reader’s behaviour is.

Browsers turned out to be a flop. IE so far has no support for XML (coming in IE9), and I was not expecting any luck (SVG is not turned on in IE8 I believe, and there is no MathML support). Chrome still does not have the faintest idea what to do with feeds, and in general, beyond invoking the right parser, it is thoroughly useless with XML. It does not position itself as a general XML engine, nor does it really deliver as one.

Mozilla is an XML-chewing beast, and I had expected it to be brilliant. Unfortunately, it totally fails to display any non-textual content in Atom feeds, whichever syntax I tried, whether taken from the spec or made up. I have started poking around in the source code, and will work out shortly what needs to be fixed or filed. To report on what I have tried: Firefox 4 will take the feed fine, but drops externally referenced content, and will not decode image content (using the spec’s PNG examples). For XHTML, basic support using @type='xhtml' works, but by flattening any non-XHTML namespaced content to its string-value. Using MIME types does not help (e.g. asking it to treat the contents as application/xhtml+xml does not behave, as I had expected, like <object type='application/xhtml+xml' data='...'/>), nor does there seem to be any way to display SVG at all in a feed, even leaving aside the question of embedding it in XHTML. This is very disappointing, and Google and Bugzilla do not turn up much.
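To show what I mean by flattening to the string-value, here is a small Python model of the behaviour I observed. It is only a sketch of the effect, not Firefox's actual code (which I have not finished reading), and the function name is my own.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def flatten_foreign(elem):
    """Copy elem, replacing each non-XHTML child by its string-value."""
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text = elem.text
    last = None  # last XHTML child kept so far
    for child in elem:
        if child.tag.startswith(f"{{{XHTML}}}"):
            kept = flatten_foreign(child)
            kept.tail = child.tail
            out.append(kept)
            last = kept
        else:
            # String-value: all descendant text, concatenated in order,
            # followed by whatever text came after the element.
            text = "".join(child.itertext()) + (child.tail or "")
            if last is None:
                out.text = (out.text or "") + text
            else:
                last.tail = (last.tail or "") + text
    return out


if __name__ == "__main__":
    src = (
        '<div xmlns="http://www.w3.org/1999/xhtml"><p>Formula: '
        '<math xmlns="http://www.w3.org/1998/Math/MathML">'
        "<mi>x</mi><mo>+</mo><mn>1</mn></math> done</p></div>"
    )
    print(ET.tostring(flatten_foreign(ET.fromstring(src)), encoding="unicode"))
```

The MathML markup disappears and only its text survives inside the paragraph, which is exactly the loss that makes SVG and MathML pointless in a feed.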

My point is that while the spec writers have ruminated endlessly on this (discussions: ContentProblems, EscapedHTMLDiscussion, etc.), the browsers have done very little with it. In terms of what the ‘correct’ syntax is, Sam Ruby is guaranteed to be modelling correct Atom use, and his feed uses the syntax I first tried. The lack of support even by Opera does make me wonder, though, whether much more work is needed on the client side to support these things.

If the browsers will not handle foreign content in feeds, what about Reader? No luck at all. I fed it some test feeds (sorry Daniel for messing about by mistake with your lone subscription), and every one gets projected down to escaped HTML. Basically, if I continue to use Reader as my micro-blog for links and notes, I cannot use the Google platform to host my aggregated feeds. On the other hand, Reader actually supports no less, it seems, than browsers and other readers currently handle, so this is really a future worry.

Future work then will be investigating what needs to be done to make Firefox 4 more stupid. Instead of dumping namespaces it does not understand, it should just copy them straight over, and blindly map other content into object tags so that the great handling it already has for foreign content can kick in.
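As a sketch of the "more stupid" behaviour I have in mind, again in Python and purely illustrative: XHTML content is copied verbatim with its foreign children intact, and anything described by a MIME type is wrapped in an object element. The function name is hypothetical, the use of a data: URI for inline base64 content is my own assumption about how one might do it, and nothing here reflects how Firefox is actually structured.

```python
import copy
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"


def render_content(content):
    """Map an atom:content element to an XHTML element for display."""
    ctype = content.get("type", "text")
    if ctype == "xhtml":
        # Copy the wrapper div straight over, foreign namespaces included.
        return copy.deepcopy(content.find(f"{{{XHTML}}}div"))
    obj = ET.Element(f"{{{XHTML}}}object", {"type": ctype})
    src = content.get("src")
    if src is not None:
        obj.set("data", src)          # externally referenced content
        return obj
    if ctype in ("text", "html") or ctype.lower().startswith("text/"):
        # Escaped HTML would need an HTML parser; shown as plain text here.
        div = ET.Element(f"{{{XHTML}}}div")
        div.text = content.text or ""
        return div
    if len(content):
        # Inline XML media type: hand the child element over untouched.
        obj.append(copy.deepcopy(content[0]))
    else:
        # Inline base64 (assumption: expose it through a data: URI).
        payload = "".join((content.text or "").split())
        obj.set("data", f"data:{ctype};base64,{payload}")
    return obj
```

The point of the design is that the feed code never needs to understand SVG, MathML, or PNG itself; it only has to avoid throwing them away before the rest of the browser sees them.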