I recently had to flatten an XML data set that looked something like this:
<topic id="1" name="planets"> <topicList> <topic id="2" name="jupiter"> <topicList> <topic id="3" name="ganymede" /> <topic id="18" name="callisto" /> <topic id="92" name="io"> <topicList> <topic id="21" name="europa" /> </topicList> </topic> </topicList> </topic> <topic id="7" name="saturn"> <topicList> <topic id="11" name="titan" /> </topicList> </topic> </topicList> </topic>
I needed to produce a result set that preserved the parent/child relationship between each topic. In other words, I needed to know that Callisto belonged to Jupiter when I finished. At first, I thought this was clearly a job for recursion. I envisioned leaping deftly from each topic to each of its children in turn. Indeed, with SQL Server 2005 and common table expressions, this is quite possible:
Declare @xml XML; Set @xml = '.... (see above) ....'; With topics (id, parent, name, children) As ( Select @xml.value('/topic[1]/@id', 'integer') As id, Cast(null As integer) As parent, @xml.value('/topic[1]/@name', 'varchar(50)') As name, @xml.query('/topic/topicList') As children Union All Select child.node.value('@id', 'integer') As id, topics.id As parent, child.node.value('@name', 'varchar(50)') As name, child.node.query('topicList') As children From topics Cross Apply topics.children.nodes('/topicList/topic') As child(node)) Select id, parent, name From topics;
As with all recursive CTEs, this one is broken into two parts. The rows returned by the first half are used as starting points for the recursion in the second half. In this case, the first half returns only a single row:
Select @xml.value('/topic[1]/@id', 'integer') As id, Cast(null As integer) As parent, @xml.value('/topic[1]/@name', 'varchar(50)') As name, @xml.query('/topic/topicList') As children
id | parent | name | children |
---|---|---|---|
1 | NULL | planets | {xml} |
The second half handles the recursion part. The secret here is in the final column. Each topic row includes a reference to its own children. The recursive portion of the CTE works on each of these children, one by one. Each child further includes a reference to its own children, and so the cycle continues. Thus, each recursion builds on the rows produced by the previous recursion.
cycle | id | parent | name |
---|---|---|---|
1 | 1 | NULL | planets |
2 | 2 | 1 | jupiter |
2 | 7 | 1 | saturn |
3 | 18 | 2 | callisto |
3 | 3 | 2 | ganymede |
3 | 92 | 2 | io |
3 | 11 | 7 | titan |
4 | 21 | 92 | europa |
As interesting as this is, it is not terribly performant. But fear not: a better solution exists! Recall that each XML node contains information concerning its position within its document; it has references to its children, siblings, and parent nodes. Rather than trying to recursively walk the XML tree, then, it is possible to simply get a list of all "topic" nodes and derive the parent information from the node itself. This results in a much simpler query:
Select topics.topic.value('@id', 'integer') As id, topics.topic.value('../../@id', 'integer') As parent, topics.topic.value('@name', 'varchar(50)') As name From @xml.nodes('//topic') As topics(topic)
Yup, that's all. The @xml.nodes('//topic')
finds every <topic/>
node, no matter how deeply nested it is. Then the parent id is found simply by looking at each topic's ancestors: topics.topic.value('../../@id', 'integer')
.
Quick, easy, and simple!
No comments:
Post a Comment