Showing posts with label SQL. Show all posts
Showing posts with label SQL. Show all posts

Wednesday, December 29, 2010

Notes on GetSchemaTable

I recent found myself using the SqlDataReader.GetSchemaTable method. I made some notes detailing what it will return for various data types in SQL Server 2005.

Notes on GetSchemaTable
 Numeric 
ColumnNameDataTypeColumnSizePrecisionScaleNotes
bigintInt64819255 
bitBoolean1255255 
decimalDecimal17180 
intInt32410255 
moneyDecimal819255 
numericDecimal17180 
smallintInt1625255 
smallmoneyDecimal410255 
tinyintByte13255 
floatDouble815255 
realSingle47255 
datetimeDateTime8233 
smalldatetimeDateTime4160 
charString20255255 
varcharString20255255 
textString2147483647255255isLong
ncharString20255255 
nvarcharString20255255 
varchar(max)String2147483647255255isLong
nvarchar(max)String2147483647255255isLong
ntextString1073741823255255isLong
timestampByte[]8255255IsRowVersion
xmlString2147483647255255isLong

I created these notes through the simple method of creating a table with every possible data type. Or rather, every data type that I am interested in, which is very nearly the same thing--I mean, who uses varbinary, really? I then used the GetSchemaTable method on my new table, and inspected the results.

There is a page on MSDN that appears to contain this information, but upon closer inspection, it must be talking about something else entirely. For instance, it claims that the SQL Server data type "varchar" has no mapping whatsoever, not even as a System.String.

Certainly, this makes sense upon further reflection: it's possible to put any varchar value into a string, but not the other way around. Still, it's not terribly useful if you just want to know what sort of data type you can reasonably expect to get out of a given column.

I further note that the MSDN documentation states that the NumericPrecision column should be null for non-numeric data types, but this is simply not true.

Monday, December 7, 2009

Flattening recursive XML data in SQL Server 2005

I recently had to flatten an XML data set that looked something like this:


<topic id="1" name="planets">
  <topicList>
    <topic id="2" name="jupiter">
      <topicList>
        <topic id="3" name="ganymede" />
        <topic id="18" name="callisto" />
        <topic id="92" name="io">
          <topicList>
            <topic id="21" name="europa" />
          </topicList>
        </topic>
      </topicList>
    </topic>
    <topic id="7" name="saturn">
      <topicList>
        <topic id="11" name="titan" />
      </topicList>
    </topic>
  </topicList>
</topic>

I needed to produce a result set that preserved the parent/child relationship between each topic. In other words, I needed to know that Callisto belonged to Jupiter when I finished. At first, I thought this was clearly a job for recursion. I envisioned leaping deftly from each topic to each of its children in turn. Indeed, with SQL Server 2005 and common table expressions, this is quite possible:


Declare @xml XML;
Set @xml = '.... (see above) ....';
With topics (id, parent, name, children)
As (
  Select
    @xml.value('/topic[1]/@id', 'integer') As id,
    Cast(null As integer) As parent,
    @xml.value('/topic[1]/@name', 'varchar(50)') As name,
    @xml.query('/topic/topicList') As children

  Union All

  Select
    child.node.value('@id', 'integer') As id,
    topics.id As parent,
    child.node.value('@name', 'varchar(50)') As name,
    child.node.query('topicList') As children
  From topics
  Cross Apply topics.children.nodes('/topicList/topic') As child(node))
Select id, parent, name
From topics;

As with all recursive CTEs, this one is broken into two parts. The rows returned by the first half are used as starting points for the recursion in the second half. In this case, the first half returns only a single row:


Select
  @xml.value('/topic[1]/@id', 'integer') As id,
  Cast(null As integer) As parent,
  @xml.value('/topic[1]/@name', 'varchar(50)') As name,
  @xml.query('/topic/topicList') As children

Result:
idparentnamechildren
1NULLplanets{xml}

The second half handles the recursion part. The secret here is in the final column. Each topic row includes a reference to its own children. The recursive portion of the CTE works on each of these children, one by one. Each child further includes a reference to its own children, and so the cycle continues. Thus, each recursion builds on the rows produced by the previous recursion.


Results:
cycleidparentname
11NULLplanets
221jupiter
271saturn
3182callisto
332ganymede
3922io
3117titan
42192europa

As interesting as this is, it is not terribly performant. But fear not: a better solution exists! Recall that each XML node contains information concerning its position within its document; it has references to its children, siblings, and parent nodes. Rather than trying to recursively walk the XML tree, then, it is possible to simply get a list of all "topic" nodes and derive the parent information from the node itself. This results in a much simpler query:


Select
  topics.topic.value('@id', 'integer') As id,
  topics.topic.value('../../@id', 'integer') As parent,
  topics.topic.value('@name', 'varchar(50)') As name
From @xml.nodes('//topic') As topics(topic)

Yup, that's all. The @xml.nodes('//topic') finds every <topic/> node, no matter how deeply nested it is. Then the parent id is found simply by looking at each topic's ancestors: topics.topic.value('../../@id', 'integer').

Quick, easy, and simple!

Wednesday, June 10, 2009

SQL Server Subquery Quirk

Here's an interesting quirk of SQL Server 2005 that bit me today.

Consider this fictional query:

Select pageID, givenName
From phy.Contacts
Where pageID In (
  Select pageID
  From (
    Select id
    From oth.Pages
    Where [Hidden] = 0) As [subquery])

Here there are two levels of subqueries here.  The innermost level returns a set of [id] values.  The next level up is asking the innermost level for a set of [pageID] values.  Alas, the innermost query does not contain a [pageID] column.

However, this query executes anyway, without any errors, and appears to return decent results.  This surprised me: the middle-level query is looking for a column that doesn't exist.  I would have expected an error message in this case, not a result set.

This behavior is explained in a tiny "Caution" box in MSDN:

If a column is referenced in a subquery that does not exist in the table referenced by the subquery's FROM clause, but exists in a table referenced by the outer query's FROM clause, the query executes without error. SQL Server implicitly qualifies the column in the subquery with the table name in the outer query.

So apparently, the middle-level query is pulling the pageID column from the OUTER query instead.  Fascinating.