June 2006

Some family photos.

ElephantBunny PaxSteveMV05

One of the things that kept me busy this past winter and spring was tech editing Itzik Ben-Gan’s two books in Microsoft Press’s Inside Microsoft® SQL Serverâ„¢ 2005 series (1,2). Of Itzik’s many clever solutions to programming problems, my favorite was this function that returns a table of consecutive integers. It’s blazingly fast, and it’s the best way I know of to generate a sequence on the fly – probably even better than accessing a permanent table of integers.

create function Numbers(
  @from as bigint,
  @to as bigint
) returns table with schemabinding as return
  with t0(n) as (
    select 1 union all select 1
  ), t1(n) as (
    select 1 from t0 as a, t0 as b
  ), t2(n) as (
    select 1 from t1 as a, t1 as b
  ), t3(n) as (
    select 1 from t2 as a, t2 as b
  ), t4(n) as (
    select 1 from t3 as a, t3 as b
  ), t5(n) as (
    select 1 from t4 as a, t4 as b
  ), Numbers(n) as (
    select row_number() over (order by n) as n
    from t5
    select @from + n - 1 as n
    from Numbers
    where n <= @to - @from + 1

Estimated row size in bytes is an important factor used by the SQL Server optimizer to estimate query cost, and I’ve found an anomaly in the estimated costing algorithm for the Sort operator, as well as in the actual cost of sorting long data.The estimated cost of a Sort seems to take a giant jump when the estimated row size exceeds 4000 bytes, but that jump in estimated cost doesn’t correspond to any jump in actual cost.

It’s important to note that the jump does not depend on the length of the sort key, but only on the length of the row data being carried along. The cost estimate for sorting a estimated-to-be-long row on a short key is much greater than for sorting an estimated-to-be-medium-length row on the same short key.   (more…)