Thursday, December 31, 2009

XPath Axis Selectors Implemented in JavaScript

Here is a quick implementation of the XPath axis selectors in JavaScript. Each function accepts a context node as a parameter, a filtering function, and a stop function. The two function arguments are optional and may be omitted.

If the filter function is present, it will be executed for each node encountered along the requested axis. It must return true for the node to be included in the output.

If the stop function is present, it will also be executed for each node encountered. When and if it returns true, the search along the axis will stop, and whatever nodes have been accumulated up to (and including) that point will be returned.

Here it is for download. Here is a minified version, only 2k!

This would be used like so:

var node = document.getElementById('myTable'), results;

// Find all descendant nodes
results = AXIS.descendant(node);

// Find all descendant text nodes
results = AXIS.descendant(node, function (n) {
  return (n.nodeType === 3);
});

// Find the closest FORM ancestor:
function isForm(n) {
  return (n.nodeName.toLowerCase() === 'form');
}
results = AXIS.ancestor(node, isForm, isForm)[0];

// Locate all H1 items between this node and the next table:
results = AXIS.following(node, function (n) {
  return (n.nodeName.toLowerCase() === 'h1');
}, function (n) {
  return (n.nodeName.toLowerCase() === 'table');
});

These are specifically written to operate on nodes, not elements. In other words, text nodes will be included as potential return values. This is great for me, as my first use for these is to assist in determining what text the user has selected. If the inclusion of text nodes were not a requirement, then one might consider optimizing the "descendant" axis to use querySelectorAll('*') instead--at least, in modern browsers.

This should work in all browsers. I did run into one snag with IE6. Apparently, if you have a <base> tag in the source, then the resulting tree structure ends up looking something like this:

<html>
    <head>
        <base>
            <body>...</body> <-- Same body element!
        </base>
    </head>
    <body>...</body>         <-- Same body element!
</html>

That <body> tag there is not a duplicate; it's really the same element, just included in two places in the DOM. It is both a descendant and a sibling of the <head> element. This causes an infinite loop when crawling the "following" axis, as the code crawls out of the <body> into the <base>, then into the <head>, then enters the <body> again.

Conditional compilation is used to fix this specifically for IE6, so the performance of other browsers should not be affected.

Monday, December 7, 2009

Flattening recursive XML data in SQL Server 2005

I recently had to flatten an XML data set that looked something like this:


<topic id="1" name="planets">
  <topicList>
    <topic id="2" name="jupiter">
      <topicList>
        <topic id="3" name="ganymede" />
        <topic id="18" name="callisto" />
        <topic id="92" name="io">
          <topicList>
            <topic id="21" name="europa" />
          </topicList>
        </topic>
      </topicList>
    </topic>
    <topic id="7" name="saturn">
      <topicList>
        <topic id="11" name="titan" />
      </topicList>
    </topic>
  </topicList>
</topic>

I needed to produce a result set that preserved the parent/child relationship between each topic. In other words, I needed to know that Callisto belonged to Jupiter when I finished. At first, I thought this was clearly a job for recursion. I envisioned leaping deftly from each topic to each of its children in turn. Indeed, with SQL Server 2005 and common table expressions, this is quite possible:


Declare @xml XML;
Set @xml = '.... (see above) ....';
With topics (id, parent, name, children)
As (
  Select
    @xml.value('/topic[1]/@id', 'integer') As id,
    Cast(null As integer) As parent,
    @xml.value('/topic[1]/@name', 'varchar(50)') As name,
    @xml.query('/topic/topicList') As children

  Union All

  Select
    child.node.value('@id', 'integer') As id,
    topics.id As parent,
    child.node.value('@name', 'varchar(50)') As name,
    child.node.query('topicList') As children
  From topics
  Cross Apply topics.children.nodes('/topicList/topic') As child(node))
Select id, parent, name
From topics;

As with all recursive CTEs, this one is broken into two parts. The rows returned by the first half are used as starting points for the recursion in the second half. In this case, the first half returns only a single row:


Select
  @xml.value('/topic[1]/@id', 'integer') As id,
  Cast(null As integer) As parent,
  @xml.value('/topic[1]/@name', 'varchar(50)') As name,
  @xml.query('/topic/topicList') As children

Result:
idparentnamechildren
1NULLplanets{xml}

The second half handles the recursion part. The secret here is in the final column. Each topic row includes a reference to its own children. The recursive portion of the CTE works on each of these children, one by one. Each child further includes a reference to its own children, and so the cycle continues. Thus, each recursion builds on the rows produced by the previous recursion.


Results:
cycleidparentname
11NULLplanets
221jupiter
271saturn
3182callisto
332ganymede
3922io
3117titan
42192europa

As interesting as this is, it is not terribly performant. But fear not: a better solution exists! Recall that each XML node contains information concerning its position within its document; it has references to its children, siblings, and parent nodes. Rather than trying to recursively walk the XML tree, then, it is possible to simply get a list of all "topic" nodes and derive the parent information from the node itself. This results in a much simpler query:


Select
  topics.topic.value('@id', 'integer') As id,
  topics.topic.value('../../@id', 'integer') As parent,
  topics.topic.value('@name', 'varchar(50)') As name
From @xml.nodes('//topic') As topics(topic)

Yup, that's all. The @xml.nodes('//topic') finds every <topic/> node, no matter how deeply nested it is. Then the parent id is found simply by looking at each topic's ancestors: topics.topic.value('../../@id', 'integer').

Quick, easy, and simple!