<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Steve Kass &#187; SQL Server</title>
	<atom:link href="http://www.stevekass.com/category/sql-server/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.stevekass.com</link>
	<description>this is my glass container</description>
	<lastBuildDate>Thu, 02 Feb 2012 04:17:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Graphemes to Phonemes Made Easy</title>
		<link>http://www.stevekass.com/2010/09/12/lexemes-to-graphemes/</link>
		<comments>http://www.stevekass.com/2010/09/12/lexemes-to-graphemes/#comments</comments>
		<pubDate>Mon, 13 Sep 2010 04:02:37 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[Music]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://www.stevekass.com/2010/09/12/la-b%c9%94n-p%ca%81%c9%94no%cc%83sjasjo%cc%83/</guid>
		<description><![CDATA[My favorite book is Peter Ladefoged’s Vowels and Consonants, which is fitting for The Dessoff Choirs’ (self-appointed) pronunciation guru. As part of that job, I prepare International Phonetic Alphabet (IPA) transliterations of our concert music, at least when we’re singing in a language I know something about. It’s a tedious task, but lately less so, [...]]]></description>
			<content:encoded><![CDATA[<p><strike></strike>
<p>My favorite book is Peter Ladefoged’s <a href="http://books.google.com/books?id=cO9yDkqS1Y0C">Vowels and Consonants</a>, which is fitting for <a href="http://en.wikipedia.org/wiki/Dessoff_Choirs">The Dessoff Choirs</a>’ (self-appointed) pronunciation guru. As part of that job, I prepare International Phonetic Alphabet (<a href="http://en.wikipedia.org/w/index.php?title=International_Phonetic_Alphabet">IPA</a>) transliterations of our concert music, at least when we’re singing in a language I know something about. It’s a tedious task, but lately less so, thanks to the workflow system I recently cobbled together for our <a href="http://www.dessoff.org/new/">November concert</a> of French choral music.</p>
<p><strong>Goal: a database of French words and their IPA pronunciations.      <br /></strong>French is largely phonetic, so at first I considered creating a rule-based system to construct words’ approximate transliterations. The prospect became more and more complicated to imagine, and this led me to look for a downloadable lexicon that already included IPA (either the output of someone else’s rule-based system or the result of digitizing an existing dictionary).</p>
<p><strong>Dictionaries aplenty, most of them too “user-friendly.”      <br /></strong>There’s no shortage of good online dictionaries, but the ones I looked at were distinctly unhelpful. For one thing, only some of them contain IPA; for another, most of them are accessible only through a type-and-click web interface. It might have been possible to automate the web interaction and turn my source texts into a sequence of HTTP requests, but my programming skills in that area are badly dated. Back when the web was a collection of static HTML pages, I’d jury-rig something with wget and sed. Nowadays, the web is sophisticated. You don’t just go to a URL and get back a plain HTML document or file. A lot of what appears in your browser window requires client-side execution of JavaScript or similar nonsense. <a href="http://www.google.com/search?q=wget+javascript">Forget about using wget</a> in such situations. (Similar situations have frustrated me before. Someone will have kindly assembled just the data I need, and will have kindly made it available, but only via a browser form for single-item retrieval.)</p>
<p><strong>Third download’s a charm.</strong>     <br />Eventually, I found some hopeful downloads. The first two, a file for OpenOffice spellcheck, and a dictionary for WinEDT, didn’t fit the bill, but the third, <a href="http://spirit.blau.in/simon/2010/04/07/advantages-of-ralfs-french-dictionary/">Ralf’s French dictionary</a>, did. I don’t know who Ralf is, nor do I know who’s behind the <a href="http://spirit.blau.in/simon/">testing simon</a> blog, where Google Search led me to discover Ralf’s dictionary. (Simon is apparently a speech recognition system, which explains the connection to dictionaries with IPA.) Ralf’s dictionary contains hundreds of thousands of French words (lexemes) with their textual representations (graphemes, like you’re reading here) and IPA equivalents (phonemes).</p>
<p><strong>Ralf’s dictionary is not a dictionary.</strong>     <br />For nearly 25 years, my go-to dictionary for French pronunciation has been a 1980 Hachette. It provides IPA for each of its over 50,000 entries. But like most dictionaries, well, it’s a dictionary, not a lexicon. It’s full of definitions — and that’s the point. “Ralf’s dictionary” is a <a href="http://en.wikipedia.org/wiki/Lexicon">lexicon</a> that happily includes IPA. The big difference for me, today, is that a complete lexicon like Ralf’s contains all the words people utter (or sing), many of which (especially verbs in the case of French) are not dictionary “words,” but are inflected forms of dictionary words. You can find <em>parler</em> in Hachette (on page 1137), right between <em>parlementer</em> and <em>parleur,</em> and you can find it in Ralf’s (at position 259506), also between <em>parlementer</em> and <em>parleur</em>, but in Ralf’s, it’s not <em><strong>right</strong></em> between. After <em>parlementer</em> and before <em>parler</em> in Ralf’s you’ll find (though turning data pages creates no wonderful musty book smell) <em>parlementera</em>, <em>parlementerai</em>, <em>parlementeraient</em>, …, <em>parlements</em>, <em>parlementâmes</em>, …, <em>parlementé</em>, <em>parlementée</em>, and <em>parlementées</em>. And all with IPA.</p>
<p><strong>Ok, so <em>dussé</em> is missing. But <em>eut</em> is not.</strong>     <br />For years, I was never quite sure how to pronounce some inflected verb forms in French. Was the pronunciation of <em>eut</em> (not an entry in Hachette) the same as for <em>eu</em> (which is listed), or does it rhyme with <em>peut</em>? Not that I have occasion to speak <em>eut</em> often, but I’ve had occasion to sing it (in d’Indy’s delightful Madrigal, for example, which Dessoff will be singing in a choral arrangement this November). Sure, I could have asked someone, but that would mean having to ask someone. According to Ralf, the answer is yes. Both <em>eut</em> and <em>eu</em> are pronounced [y]. Ralf could be wrong (he often is — I’ll get to that later, though he <a href="http://forum.wordreference.com/showthread.php?t=584769">doesn’t appear to be</a> in this case), but the pronunciation of <em>eut</em> is a valuable fact, and he recognizes that. </p>
<p>Click <a href="http://www.youtube.com/watch?v=sKBOdeYeRxQ">here</a> to see YouTube’s divoboy perform d’Indy’s Madrigal (with outstanding French diction <strong><em>save for the incorrect pronunciation of </em>eut<em>, because it probably wasn’t in his dictionary</em></strong>).</p>
<p>One of the weirder French verb forms I do know how to pronounce is <em>dussé</em>, as in <em>“Je vais faire cela, dussé-je le regretter ensuite.”</em> By itself, <em>dussé</em> isn’t really a word, but when <em>dusse</em> (the first person imperfect subjunctive form of <em>devoir</em>) or any other verb form ending in a mute <em>e</em> appears in inversion with its pronominal subject, the spelling changes: <em>e</em> becomes <em>é</em>. Despite the <em>accent aigu</em>, however, <em>dussé-je</em> is pronounced [dus&#x025B;&#x0292;], not [duse&#x0292;]. For better or for worse, by the way, the days of <em>dussé-je</em> may be numbered. In its controversial 1990 “rectifications,” France’s <a href="http://en.wikipedia.org/wiki/Superior_Council_of_the_French_language">Superior Council of the French Language</a> (only in France, you may think, but also in Belgium and Canada) declared the correct spelling to henceforth be <em>dussè-je</em>. That makes a lot of sense, but of course this is the organization that in the same proclamation tried to change the official spelling of <em>oignon</em> to <em>ognon</em>. As you can imagine, that didn’t go over very well, so we’ll see if <em>dussè-je</em> sticks. You can read more about <em>dussé-je</em>/<em>dussè-je</em>&#160;<a href="http://forum.wordreference.com/showthread.php?t=18902">here</a>, which is where I copped the sample sentence above.</p>
<p><strong>Ok, I’ll say it: XML is not evil.</strong>     <br />Ralf’s dictionary is an XML file. I’ll admit it, I’ve got issues with XML, or more specifically with people who think XML is a database format, but Ralf used it wisely, as a self-documenting container for data exchange. CSV would have been fine, too, but XML was a better idea here, because the Unicode characters that represent IPA don’t always survive being shuttled around in less standardized text files. </p>
<p><strong>Import time.</strong>     <br />Each lexeme in Ralf’s dictionary was associated with a phoneme (the IPA I wanted), a grapheme (the lexeme written down) and sometimes a role (abbreviation, letter, name, or verb). The IPA in Ralf’s dictionary was for speech, and I ultimately needed slightly different pronunciations for singing, so I imported Ralf’s data into a table with an extra phoneme column that contained the changes I wanted.</p>
<p>My database platform of choice, as always, is Microsoft SQL Server. With a lot more trial and error than I’d have needed to import from CSV or various other formats, I finally managed to make XQuery happy. Here’s my import query.</p>
<pre style="line-height: 10pt">WITH Imported(Item,Role,Grapheme,Phoneme) AS (
  SELECT
    T1.lexeme.query('.'),
    T1.lexeme.value('./@role','nvarchar(100)') as Role,
    T1.lexeme.value('grapheme[1]','nvarchar(100)') as Grapheme,
    T1.lexeme.value('phoneme[1]','nvarchar(100)') as Phoneme
  FROM FD
  CROSS APPLY x.nodes('/lexicon/lexeme') AS T1(lexeme)
)
  INSERT INTO FrenchIPA
  SELECT
    Item,
    Role,
    Grapheme,
    Phoneme,
    replace(replace(
      Phoneme,N'?',N'?'
      ),N'??',N'o?'
    )
    as Phoneme2
  FROM Imported;</pre>
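<p>For readers who want to poke at a lexicon file of this shape outside SQL Server, here is a minimal Python sketch (not the import I actually ran). It assumes only what the query above implies: a <em>lexicon</em> root holding <em>lexeme</em> elements, each with an optional <em>role</em> attribute and <em>grapheme</em> and <em>phoneme</em> children. The two sample entries and the function names are made up for illustration.</p>

```python
import xml.etree.ElementTree as ET


def build_sample():
    """Build a tiny, made-up lexicon shaped like /lexicon/lexeme."""
    lexicon = ET.Element("lexicon")
    for grapheme, phoneme, role in [("parler", "paʁle", "verb"), ("eut", "y", None)]:
        lexeme = ET.SubElement(lexicon, "lexeme")
        if role is not None:
            lexeme.set("role", role)
        ET.SubElement(lexeme, "grapheme").text = grapheme
        ET.SubElement(lexeme, "phoneme").text = phoneme
    return ET.tostring(lexicon, encoding="unicode")


def read_lexemes(xml_text):
    """Return (role, grapheme, phoneme) tuples, one per lexeme element."""
    root = ET.fromstring(xml_text)
    rows = []
    for lexeme in root.findall("lexeme"):
        rows.append((
            lexeme.get("role"),           # None when the attribute is absent
            lexeme.findtext("grapheme"),  # first grapheme child, like grapheme[1]
            lexeme.findtext("phoneme"),   # first phoneme child, like phoneme[1]
        ))
    return rows


rows = read_lexemes(build_sample())
print(rows)
```

<p>Each tuple corresponds to one row of the Imported expression above, minus the Item column that keeps the raw XML.</p>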
<p><strong>Replacing graphemes with phonemes.</strong> </p>
<p>The source texts I had were just that — texts, text strings. In order to use the table FrenchIPA, I had to identify the individual words in my texts. While in theory, that’s harder than writing the right XQuery for import, it’s something I’ve done a gazillion times and <a href="http://groups.google.com/groups/search?q=kass+split+group%3Amicrosoft.public.sqlserver.*">helped other people do a gazillion times</a>. <a href="http://users.drew.edu/skass/sql/ListToTableProc.sql.txt">One version</a> of a query for this has been on my Drew web page for years. Cobble, cobble, cobble, and out comes this clumsy, kludgy, clunky, but effective query I used to make a first pass at word-for-word transliteration (replacing each word in the input string variable @txt with its associated phoneme).</p>
<pre style="line-height: 10pt">with Puncts(n1,n2) as (
  select
    n as n1,
    (select min(n) from Nums as N2
     where N2.n &lt;= len(@txt) and N2.n &gt;= N1.n
     and substring(@txt,N2.n,1) not like '%[a-z]%' collate Latin1_General_CI_AS
    ) as n2
  from dbo.Nums as N1
  where n &lt;= len(@txt)
), Wds(st,fn,w) as (
  select
    min(n1), n2,
    substring(@txt,min(n1),n2-min(n1)) as wd
  from Puncts
  group by n2
), Reps(i,st,fn,w,Grapheme,IPA) as (
  select row_number() over (order by st desc), st, fn, w, Grapheme, P2
  from Wds join FrenchIPA
  on lower(w) = Grapheme
), Result(i,r) as (
  select cast(0 as bigint),@txt
  union all
  select
    Reps.i, stuff(r,st,fn-st,IPA)
  from Reps join Result
  on Reps.i = Result.i+1
)
  select top 1 '['+replace(replace(r,' ','   '),'
',']
[')+']' from Result order by i desc
  option (MAXRECURSION 1000);</pre>
<p>The most kludgy part is the recursive query that replaces one word at a time with IPA. If anyone is curious about how this works, ask me.</p>
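<p>The same word-for-word idea can be sketched in a few lines of Python. This is an illustration under assumptions, not a port of the query: the three-entry lexicon stands in for the FrenchIPA table, the function name is invented, and words missing from the lexicon are simply left alone.</p>

```python
import re

# Hypothetical mini-lexicon; the FrenchIPA table plays this role in the post.
LEXICON = {"ma": "ma", "dame": "dam", "eut": "y"}


def transliterate(text, lexicon):
    """Replace each run of letters found in the lexicon with its IPA."""
    def swap(match):
        word = match.group(0)
        # Look the word up in lowercase; keep unknown words unchanged.
        return lexicon.get(word.lower(), word)

    # [^\W\d_]+ matches runs of Unicode letters, so accented words stay whole.
    return re.sub(r"[^\W\d_]+", swap, text)


result = transliterate("Ma dame eut", LEXICON)
print(result)
```

<p>Because re.sub works through the text in a single pass, there is no need for the one-replacement-per-recursion-step trick that the T-SQL version relies on.</p>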
<p><strong>Cleaning up the result. </strong></p>
<p>This doesn’t produce the final transliteration, by any means, but it’s darn close. Here’s what it yields for d’Indy’s Madrigal (and which example allows me to type the word with two apostrophes yet again).</p>
<p><em>[Note: I see garbage below in Chrome; IE is ok. And unfortunately, some combination of WordPress, MySQL, Windows Live Writer, and HTML disagrees with Unicode’s combining diacritical characters, so you’ll see meandering tildes.]</em></p>
<pre style="line-height: 10pt">[ki   &#x0292;am&#x025B;   fy   d&#x0259;   ply   &#x0283;a&#x027E;m&#x0251;&#x0303;   viza&#x0292;,]
[d&#x0259;   k&#x0254;l   ply   bl&#x0251;&#x0303;,   d&#x0259;   &#x0283;&#x0259;v&#x0153;   ply   swaj&oelig;;]
[ki   &#x0292;am&#x025B;   fy   d&#x0259;   ply   &#x0292;&#x0251;&#x0303;ti   ko&#x027E;sa&#x0292;,]
[ki   &#x0292;am&#x025B;   fy   k&#x0259;   ma   dam   &#x0254;   du   i&oelig;!]
[ki   &#x0292;am&#x025B;   y   l&#x025B;v&#x027E;   ply   su&#x027E;i&#x0251;&#x0303;t,]
[ki   su&#x027E;i&#x0251;&#x0303;   &#x027E;&#x0251;&#x0303;di   k&#x0153;&#x027E;   ply   &#x0292;waj&oelig;,]
[ply   &#x0283;ast   s&#x025B;&#x0303;   su   gimp   t&#x027E;&#x0251;&#x0303;spa&#x027E;&#x0251;&#x0303;t,]
[ki   &#x0292;am&#x025B;   y   k&#x0259;   ma   dam   &#x0254;   du   i&oelig;!]
[ki   &#x0292;am&#x025B;   y   vwa   de'&oelig;&#x0303;   ply   du   &#x0251;&#x0303;t&#x0251;&#x0303;d&#x027E;,]
[mi&#x0272;&#x0254;n   d&#x0251;&#x0303;   ki   bu&#x0283;   &#x0251;&#x0303;p&#x025B;&#x027E;l   mj&oelig;;]
[ki   &#x0292;am&#x025B;   fy   d&#x0259;   &#x027E;&#x0259;ga&#x027E;de   si   t&#x0251;&#x0303;d&#x027E;,]
[ki   &#x0292;am&#x025B;   fy   k&#x0259;   ma   dam   &#x0254;   du   i&oelig;!]</pre>
<p>All that’s left is touchup, mainly. </p>
<p style="line-height: 12pt">1. Add schwas for syllables that are silent in speech, but not in song. (Spoken, <em>Frères Jacques</em> has two syllables; sung, it has four.) </p>
<p>2. Fix some mistakes in Ralf’s dictionary, like his having gotten œ and ø backwards most everywhere. (It’s debatable whether a distinction really exists anyway.) </p>
<p>3. Indicate where there are liaisons (and check against the music to avoid marking them across rests). </p>
</p>
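<p>Step 2 is a reminder that swapping two symbols needs some care: replacing œ with ø and then ø with œ in two separate passes would undo the first replacement. Here is a small Python sketch of a one-pass swap. The function name is invented, and a blanket swap is only a first pass, since not every occurrence was backwards and the result still needs checking by hand.</p>

```python
# Translation table for the two vowels. str.translate maps every character
# in a single pass, so a swapped character is never swapped back.
SWAP = str.maketrans({"œ": "ø", "ø": "œ"})


def swap_oe(ipa):
    """Exchange œ and ø throughout an IPA string."""
    return ipa.translate(SWAP)


fixed = swap_oe("swajœ")
print(fixed)
```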
<p>After not much additional work, this is what I got:</p>
<pre style="line-height: 10pt">[ki   &#x0292;am&#x025B;   fy   d&#x0259;   ply   &#x0283;a&#x027E;m&#x0251;&#x0303;   viza&#x0292;&#x0259;]
[d&#x0259;   k&#x0254;l   ply   bl&#x0251;&#x0303;,   d&#x0259;   &#x0283;&#x0259;v&oslash;   ply   swaj&oslash;]
[ki   &#x0292;am&#x025B;   fy   d&#x0259;   ply   &#x0292;&#x0251;&#x0303;ti   ko&#x027E;sa&#x0292;&#x0259;]
[ki   &#x0292;am&#x025B;   fy   k&#x0259;   ma   dam&#x203f;o   duz&#x203f;j&oslash;]

[ki   &#x0292;am&#x025B;z&#x203f;y   l&#x025B;v&#x027E;&#x0259;   ply   su&#x027E;i&#x0251;&#x0303;t&#x0259;]
[ki   su&#x027E;i&#x0251;&#x0303;   &#x027E;&#x0251;&#x0303;di   k&oelig;&#x027E;   ply   &#x0292;waj&oslash;]
[ply   &#x0283;ast&#x0259;   s&#x025B;&#x0303;   su   g&#x025B;&#x0303;p&#x0259;   t&#x027E;&#x0251;&#x0303;spa&#x027E;&#x0251;&#x0303;t&#x0259;]
[ki   &#x0292;am&#x025B;   fy   k&#x0259;   ma   dam&#x203f;o   duz&#x203f;j&oslash;]

[ki   &#x0292;am&#x025B;z&#x203f;y   vwa   d&oelig;&#x0303;   ply   duz&#x203f;&#x0251;&#x0303;t&#x0251;&#x0303;d&#x027E;&#x0259;]
[mi&#x0272;&#x0254;n&#x0259;   d&#x0251;&#x0303;   ki   bu&#x0283;&#x203f;&#x0251;&#x0303;p&#x025B;&#x027E;l&#x0259;   mj&oslash;]
[ki   &#x0292;am&#x025B;   fy   d&#x0259;   &#x027E;&#x0259;ga&#x027E;de   si   t&#x0251;&#x0303;d&#x027E;&#x0259;]
[ki   &#x0292;am&#x025B;   fy   k&#x0259;   ma   dam&#x203f;o   duz&#x203f;j&oslash;]</pre>
<p>This makes me very happy, and, despite the time I spent writing the queries, it saved me a lot of time. In fact, it probably took more time to write this post than it did to put together the IPA for this concert. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.stevekass.com/2010/09/12/lexemes-to-graphemes/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Localization (probably) strikes again</title>
		<link>http://www.stevekass.com/2009/11/26/localization-probably-strikes-again/</link>
		<comments>http://www.stevekass.com/2009/11/26/localization-probably-strikes-again/#comments</comments>
		<pubDate>Fri, 27 Nov 2009 02:17:18 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://stevekass.com/2009/11/26/localization-probably-strikes-again/</guid>
		<description><![CDATA[Yesterday, the Italian postal service misprocessed a bunch of ATM and credit card transactions. Specifically, the virgola was shifted two places, appending two zeros to the transaction amount. There’s no telling exactly how this happened, but it wouldn’t surprise me if it had something—if not everything—to do with localization in one way or another. In [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday, the Italian postal service misprocessed a bunch of ATM and credit card transactions. Specifically, the <em>virgola</em> was shifted two places, appending two zeros to the transaction amount. There’s no telling exactly how this happened, but it wouldn’t surprise me if it had something—if not everything—to do with <a href="http://en.wikipedia.org/wiki/Localization" target="_blank">localization</a> in one way or another. In Italy, a comma (<em>virgola</em>), not a period, precedes a number’s decimal part, but software might see things otherwise.</p>
<p>Some software interprets number strings according to the operating system localization (unless overridden). Other software ignores the OS localization. SQL Server’s CAST operator, for example, only accepts a period as the decimal separator, and it disregards commas in strings intended to represent numbers.</p>
<p>At least it does this <a href="http://groups.google.com/group/microsoft.public.es.sqlserver/browse_thread/thread/602e49958909fb9b/ad6e74c54ae5abc?hl=en&amp;ie=UTF-8&amp;q=kass+isnumeric+comma+decimal#0ad6e74c54ae5abc" target="_blank">as of 2005</a>; previous versions followed a complicated set of rules in an attempt to disallow numbers that weren’t valid in the U.S., India, or China. In India (ones, thousands, lakhs, crore, thousand crore, lakhs crore, etc.), digit groups bounce between two and three digits, and 1,234,56,70,000.0 is a valid number. In China (yi1, wan4, yi4, wan4 yi4, etc.), it would be 123,4567,0000.0. Interpreting human-readable representations of numbers is no simple task. Explaining the issue isn’t much easier. </p>
<p>In all versions of SQL Server, this happens regardless of language or culture settings.</p>
<pre><code>select cast('115,00' as money) as TooMuch;

TooMuch
---------------------
11500.00</code></pre>
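<p>To make the hundredfold difference concrete, here is a small Python sketch (not SQL Server’s actual parsing rules). It assumes a number string uses only a period and a comma, one for grouping and the other as the decimal mark; the function name is invented for illustration.</p>

```python
from decimal import Decimal


def parse_amount(text, decimal_sep):
    """Parse a number string given an explicit decimal separator.

    Assumes the only other separator in use is the opposite one
    ("." or ",") for digit grouping; real locales have more variety.
    """
    group_sep = "," if decimal_sep == "." else "."
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


as_us = parse_amount("115,00", ".")       # comma treated as grouping
as_italian = parse_amount("115,00", ",")  # comma treated as the decimal mark
print(as_us, as_italian)
```

<p>The same string yields two amounts that differ by a factor of 100, which is the size of the error described above.</p>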
<p>[From <a href="http://entertainment.slashdot.org/story/09/11/25/1448218/Moving-Decimal-Bug-Loses-Money" target="_blank">Slashdot</a>, noting <a href="http://www.ilsole24ore.com/art/SoleOnLine4/Italia/2009/11/poste-italiane-disguido-addebiti-gonfiati.shtml" target="_blank">ilsole24ore.com</a>] </p>
]]></content:encoded>
			<wfw:commentRss>http://www.stevekass.com/2009/11/26/localization-probably-strikes-again/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>9/11 pager intercepts on Wikileaks</title>
		<link>http://www.stevekass.com/2009/11/26/911-pager-intercepts-on-wikileaks/</link>
		<comments>http://www.stevekass.com/2009/11/26/911-pager-intercepts-on-wikileaks/#comments</comments>
		<pubDate>Thu, 26 Nov 2009 04:56:13 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Black Tuesday]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Teaching]]></category>

		<guid isPermaLink="false">http://stevekass.com/2009/11/26/911-pager-intercepts-on-wikileaks/</guid>
		<description><![CDATA[Early this morning, Wikileaks began posting alphanumeric pager messages from four carriers (Arch, Metrocall, Skytel, and Weblink_B) that were intercepted during a 24-hour period beginning early on September 11, 2001. Alphanumeric pager messages are unencrypted, and, like communications over a public 802.11 wireless network, they’re skimmable with the right (and not exotic) software and hardware. [...]]]></description>
			<content:encoded><![CDATA[<p>Early this morning, <a href="http://911.wikileaks.org/files/index.html" target="_blank">Wikileaks began posting</a> alphanumeric pager messages from four carriers (Arch, Metrocall, Skytel, and Weblink_B) that were intercepted during a 24-hour period beginning early on September 11, 2001. Alphanumeric pager messages are unencrypted, and, like communications over a public 802.11 wireless network, they’re skimmable with the right (and not exotic) software and hardware.</p>
<ul>
<li>“Due to today&#8217;s tragic events, it makes sense to cut back wherever feasible on payroll. Expect a very light business day. Please call all stores and review payroll issues”</li>
<li>“RING ALL CHICAGO AIPORTS AND EVERY MAJOR BUILDING DOWNTOWN. BUSH IS DOING A SPEECH.&#160; THIS IS SERIOUS POOH..”</li>
<li>“Holy crap, are you watching the news.”</li>
<li>“I hope you have gone home by now. The BoA tower and space needle here are closed. I suspect tall buildings across the country will be closed. Take care my love.-cb”</li>
</ul>
<p>This might be the most interesting public data mine since <a href="http://stevekass.com/category/black-tuesday/" target="_blank">the AOL breach</a>. The total volume is far less, but unlike the AOL data, this data hasn’t been anonymized. There are full names, phone numbers, and other identifying information in the mix. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.stevekass.com/2009/11/26/911-pager-intercepts-on-wikileaks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Buy my book (from Barnes and Noble)</title>
		<link>http://www.stevekass.com/2009/04/13/buy-my-book-from-barnes-and-noble/</link>
		<comments>http://www.stevekass.com/2009/04/13/buy-my-book-from-barnes-and-noble/#comments</comments>
		<pubDate>Mon, 13 Apr 2009 22:52:13 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://stevekass.com/2009/04/13/buy-my-book-from-barnes-and-noble/</guid>
		<description><![CDATA[If you squint, you’ll see my name in tiny print under Itzik’s. He wrote most of the book, but I contributed two chapters and did most of the technical review. Click on the image to visit the book&#8217;s Barnes and Noble page. . . . . .]]></description>
			<content:encoded><![CDATA[<p><a title="Inside Microsoft SQL Server 2008: T-SQL Querying" href="http://search.barnesandnoble.com/booksearch/isbnInquiry.asp?isbn=9780735626034"><img style="margin: 0px 5px 5px 0px; display: inline" src="http://images.barnesandnoble.com/images/34720000/34725169.JPG" alt="" align="left" /></a>If you squint, you’ll see my name in tiny print under Itzik’s. He wrote most of the book, but I contributed two chapters and did most of the technical review. Click on the image to visit the book&#8217;s Barnes and Noble page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.stevekass.com/2009/04/13/buy-my-book-from-barnes-and-noble/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Read this if you serve up web pages from SQL data</title>
		<link>http://www.stevekass.com/2008/05/31/read-this-if-you-serve-up-web-pages-from-sql-data/</link>
		<comments>http://www.stevekass.com/2008/05/31/read-this-if-you-serve-up-web-pages-from-sql-data/#comments</comments>
		<pubDate>Sat, 31 May 2008 00:07:27 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://stevekass.com/2008/05/31/read-this-if-you-serve-up-web-pages-from-sql-data/</guid>
		<description><![CDATA[If you manage, write, visit, or otherwise have anything to do with a web app that connects to a SQL Server database, good guy and Microsoft Program Manager Buck Woody wants you to read this: [copied with permission from here] You might have read recently that there have been ongoing SQL injection attacks against vulnerable [...]]]></description>
			<content:encoded><![CDATA[<p>If you manage, write, visit, or otherwise have anything to do with a web app that connects to a SQL Server database, good guy and Microsoft Program Manager Buck Woody wants you to read this:</p>
<p>[copied with permission from <a title="http://blogs.msdn.com/buckwoody/archive/2008/05/30/sql-injection-attacks.aspx" href="http://blogs.msdn.com/buckwoody/archive/2008/05/30/sql-injection-attacks.aspx">here</a>]</p>
<blockquote><p>You might have read recently that there have been ongoing SQL injection attacks against vulnerable web applications occurring over the last few months.&nbsp; These attacks have received recurring attention in the press as they pop up in various geographies around the world. These attacks do not leverage any SQL Server vulnerabilities or any un-patched vulnerabilities in any Microsoft product – the attack vector is vulnerable custom applications. In fact, SQL Injection is a coding issue that can attack any database system, so it&#8217;s a good idea to learn how to defend against them.</p>
<p>In order to help you respond to and defend yourself from these attacks, Microsoft has an authoritative blog including talking points and guidance.&nbsp; You can find this at <a title="http://blogs.technet.com/swi/archive/2008/05/29/sql-injection-attack.aspx" href="http://blogs.technet.com/swi/archive/2008/05/29/sql-injection-attack.aspx">this Technet location</a>. (Retype the underlying URL if you like. I only linked it this way because it wrapped.)</p>
</blockquote>
<p>Ok, if you didn&#8217;t visit <a title="http://blogs.technet.com/swi/archive/2008/05/29/sql-injection-attack.aspx" href="http://blogs.technet.com/swi/archive/2008/05/29/sql-injection-attack.aspx">the Technet link</a>, visit it before reading on.</p>
<p>Thanks. Now I&#8217;ll add another bit of advice:</p>
<p>There&#8217;s a non-SQL injection issue here as well. The risk in question starts when a web application incorporates part of the URL into SQL and executes it blindly (SQL injection), but the risk to end users only occurs because the web app commits &#8220;HTML injection.&#8221; The web app unwittingly delivers a malicious bit of HTML that says &#8220;Hey browser, please run a script from this other web site.&#8221; That malicious bit of HTML won&#8217;t be sent to my browser if the web application doesn&#8217;t blindly incorporate table data (especially table data containing HTML tags) into the HTML pages it delivers.</p>
<p>Here&#8217;s an analogy. When you fill a prescription, you get instructions like &#8220;Take one pill twice a day for seven days.&#8221; Those instructions probably get printed out of some database. If the instructions say &#8220;Chew up all the pills and wash them down with a cup of bleach,&#8221; something&#8217;s wrong with the pharmacy&#8217;s database. Something&#8217;s also wrong with the pharmacy for not catching the bogus instructions before dispensing the prescription. And if you follow the instructions, something&#8217;s wrong with you.</p>
<p>The risk Buck is drawing our attention to is like this, and the Technet blog tells us to secure our database. Just as importantly, we should pay attention to what we dispense, and not just assume that if we&#8217;re dispensing our data, it&#8217;s good data. Browsers often render (and in the case of scripts, execute) whatever a trusted site sends them, and if trusted sites send HTML out without vetting it, well, they shouldn&#8217;t be trusted. If you&#8217;re a web developer and you want your site to be trusted, then vet what you deliver.</p>
<p>I don&#8217;t do web apps, but I don&#8217;t think a responsible web app should send me script tags that refer to third-party sites. In fact, the web app probably shouldn&#8217;t send me any table data without scrubbing it for tags, non-printing ASCII characters, etc.</p>
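<p>As a rough illustration of scrubbing on the way out, here is a minimal Python sketch that escapes table data before it goes into a page. It uses the standard library&#8217;s html.escape; the function name and the tampered value are invented, and a real application would more likely lean on its templating system&#8217;s automatic escaping.</p>

```python
import html


def render_cell(value):
    """Escape table data so the browser shows it as text, not markup."""
    return html.escape(str(value), quote=True)


# Hypothetical tampered column value. \x3c and \x3e are the two angle
# brackets, written as escapes so that this listing itself stays inert.
tampered = "Widget\x3cscript src='http://example.invalid/x.js'\x3e\x3c/script\x3e"
safe = render_cell(tampered)
print(safe)
```

<p>After escaping, the value contains no angle brackets at all, so a browser has nothing to interpret as a script tag, whatever ended up in the table.</p>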
<p>Many years ago, we thought it was funny to email people BEL characters, and then someone figured out email shouldn&#8217;t be allowed to contain BEL. Years ago bulletin boards figured out they shouldn&#8217;t allow users to put any old HTML into their posts. The threat then was still minor &#8211; jokers figured out they could mess up some bulletin board formatting by posting opening tags without closing them. Apparently this was only half fixed. Web apps typically scrub what comes in through the expected channels, but a lot of web apps (most?) apparently don&#8217;t scrub the HTML they send out. They should. In fact, they must, now that the bad guys have figured out how to exploit sloppy web apps to modify table data bypassing the expected route. The bad guys may soon find some more sloppy code and exploit it to mess with your data.</p>
<p>Just as it&#8217;s possible to scrub outgoing email for viruses, it should be possible (and routine) to scrub outgoing HTML for malicious content. While I don&#8217;t trust email attachments that have a &#8220;no viruses&#8221; sticker on them, and I wouldn&#8217;t trust a random site that tells me &#8220;this web page is safe,&#8221; I would trust Microsoft or another trustworthy source if they told me their web servers scrub all outgoing web pages for unexpected script tags. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.stevekass.com/2008/05/31/read-this-if-you-serve-up-web-pages-from-sql-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Spearman&#8217;s rho for SQL Server</title>
		<link>http://www.stevekass.com/2008/03/29/spearmans-rho-for-sql-server/</link>
		<comments>http://www.stevekass.com/2008/03/29/spearmans-rho-for-sql-server/#comments</comments>
		<pubDate>Sat, 29 Mar 2008 06:33:51 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://stevekass.com/2008/03/29/spearmans-rho-for-sql-server/</guid>
		<description><![CDATA[Before SQL Server 2005 was released, a calculation that required a ranking was both relatively difficult to express as a single query and relatively inefficient to execute. That changed in SQL Server 2005 with support for the SQL analytic functions RANK(), ROW_NUMBER(), etc., and partial support for SQL&#8217;s OVER clause. Spearman&#8217;s rho (Spearman&#8217;s correlation coefficient) [...]]]></description>
			<content:encoded><![CDATA[<p>Before SQL Server 2005 was released, a calculation that required a ranking was both relatively difficult to express as a single query and relatively inefficient to execute. That changed in SQL Server 2005 with support for the SQL analytic functions RANK(), ROW_NUMBER(), etc., and partial support for SQL&#8217;s OVER clause.</p>
<p>Spearman&#8217;s rho (Spearman&#8217;s correlation coefficient) is a useful statistic that can be calculated more easily in SQL Server 2005 than in earlier versions. Below is an implementation of Spearman&#8217;s rho for SQL Server 2005 and later.</p>
<p>SQL&#8217;s RANK() and the rank order required for the calculation of Spearman&#8217;s rho are slightly different: if, for example, four values are tied for third place, RANK() will equal 3 for all four of them. Spearman&#8217;s formula requires them all to be ranked 4.5, the average of their positions (3rd, 4th, 5th, and 6th) in an ordered list of the data. To address this difference, the code below adjusts the SQL RANK() by adding to it 0.5 for each occurrence of a data value beyond the first. I used COUNT(*) with an OVER clause for this.</p>
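<p>The tie adjustment is easy to check outside the database. The Python sketch below mirrors the idea rather than the T-SQL itself: it computes RANK() plus half of (tie count minus 1) and then applies the usual formula, 1 minus 6 times the sum of squared rank differences, divided by n(n&#178; minus 1). The function names are invented; the ten data pairs are the first sample data set used later in this post.</p>

```python
def average_ranks(values):
    """RANK() plus (tie count - 1) / 2, which averages tied positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        rank = ordered.index(v) + 1  # like RANK(): 1 + number of smaller values
        ties = ordered.count(v)      # like COUNT(*) OVER (PARTITION BY value)
        ranks.append(rank + (ties - 1) / 2.0)
    return ranks


def spearman_rho(xs, ys):
    """1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d the rank differences."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# The ten (x, y) pairs from the first sample data set.
xs = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
ys = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]
rho = spearman_rho(xs, ys)
print(rho)
```

<p>For this data set the squared rank differences sum to 194, so rho works out to 1 minus 1164/990, or about -0.176.</p>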
<p>The script below demonstrates the calculation for two data sets. The first one is from <a href="http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient" title="Accessed March 28, 2008" target="_blank">Wikipedia&#8217;s page on Spearman&#8217;s rho</a>; I made up the second data set to include duplicate data values. I haven&#8217;t tested the code thoroughly, but for a variety of small test data sets, it matches hand calculations and the result <a href="http://www.wessa.net/rankcorr.wasp" title="Accessed March 28, 2008" target="_blank">here</a> [1].</p>
<p><font face="Courier New">create table SampleData (<br />
ID int identity(1,1) primary key,<br />
x decimal(5,2),<br />
y decimal(5,2)<br />
); </font></p>
<p><font face="Courier New">insert into SampleData(x,y) values(106,7);<br />
insert into SampleData(x,y) values(86,0);<br />
insert into SampleData(x,y) values(100,27);<br />
insert into SampleData(x,y) values(101,50);<br />
insert into SampleData(x,y) values(99,28);<br />
insert into SampleData(x,y) values(103,29);<br />
insert into SampleData(x,y) values(97,20);<br />
insert into SampleData(x,y) values(113,12);<br />
insert into SampleData(x,y) values(112,6);<br />
insert into SampleData(x,y) values(110,17);<br />
go </font></p>
<p><font face="Courier New">create procedure Spearman as<br />
with RankedSampleData(ID,x,y,rk_x,rk_y) as (<br />
select<br />
ID,<br />
x,<br />
y,<br />
rank() over (order by x) +<br />
(count(*) over (partition by x) - 1)/2.0,<br />
rank() over (order by y) +<br />
(count(*) over (partition by y) - 1)/2.0<br />
from SampleData<br />
)<br />
select<br />
1e0 -<br />
(<br />
6<br />
*sum(square(rk_x-rk_y))<br />
/count(*)<br />
/(square(count(*)) - 1)<br />
)<br />
from RankedSampleData;<br />
go </font></p>
<p><font face="Courier New">exec Spearman; </font></p>
<p><font face="Courier New">go<br />
truncate table SampleData;<br />
go </font></p>
<p><font face="Courier New">insert into SampleData(x,y) values(1,3);<br />
insert into SampleData(x,y) values(3,5);<br />
insert into SampleData(x,y) values(5,8);<br />
insert into SampleData(x,y) values(3,4);<br />
insert into SampleData(x,y) values(4,7);<br />
insert into SampleData(x,y) values(4,6);<br />
insert into SampleData(x,y) values(3,4);<br />
go </font></p>
<p><font face="Courier New">exec Spearman;<br />
go </font></p>
<p><font face="Courier New">drop proc Spearman;<br />
drop table SampleData;</font></p>
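<p>As an independent cross-check of the script above, here is a minimal Python sketch of the same calculation. It is not part of the original script, and the function names <code>average_ranks</code> and <code>spearman_rho</code> are made up for illustration. It mirrors the SQL: a RANK()-style rank plus 0.5 for each additional occurrence of a value, followed by the shortcut formula based on the squared rank differences. Keep in mind that when values are tied, this shortcut formula is only an approximation to the correlation of the ranks, so other tools may report a slightly different value for the second data set.</p>

```python
def average_ranks(values):
    """Rank from 1; tied values all get the mean of the positions they occupy."""
    ordered = sorted(values)
    # ordered.index(v) + 1 plays the role of SQL RANK(); the second term adds
    # 0.5 for each occurrence of v beyond the first.
    return [ordered.index(v) + 1 + (values.count(v) - 1) / 2.0 for v in values]


def spearman_rho(xs, ys):
    """Shortcut formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# First data set from the script (no ties).
x1 = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
y1 = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]
print(round(spearman_rho(x1, y1), 4))  # -0.1758

# Second data set from the script (with ties).
x2 = [1, 3, 5, 3, 4, 4, 3]
y2 = [3, 5, 8, 4, 7, 6, 4]
print(round(spearman_rho(x2, y2), 4))  # 0.9643
```

<p>For the first data set the squared rank differences sum to 194, which gives 1 - 6*194/990, or about -0.1758. For the second they sum to 2.0, which gives about 0.9643.</p>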
<p>[1] Wessa, P. (2008), Free Statistics Software, Office for Research Development and Education, version 1.1.22-r4, URL <a href="http://www.wessa.net/">http://www.wessa.net/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.stevekass.com/2008/03/29/spearmans-rho-for-sql-server/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Elapsed time excluding nights and weekends</title>
		<link>http://www.stevekass.com/2007/12/19/elapsed-time-excluding-nights-and-weekends/</link>
		<comments>http://www.stevekass.com/2007/12/19/elapsed-time-excluding-nights-and-weekends/#comments</comments>
		<pubDate>Wed, 19 Dec 2007 20:39:54 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://stevekass.com/2007/12/19/elapsed-time-excluding-nights-and-weekends/</guid>
		<description><![CDATA[Finding elapsed time in SQL Server is easy, so long as the clock is always running: just use DATEDIFF. But you often need to find elapsed time excluding certain periods, like weekends, nights, or holidays. A fellow SQL Server MVP recently posed a variation on this problem: to find the number of minutes between two [...]]]></description>
			<content:encoded><![CDATA[<p>Finding elapsed time in SQL Server is easy, so long as the clock is always running: just use DATEDIFF. But you often need to find elapsed time excluding certain periods, like weekends, nights, or holidays. A fellow SQL Server MVP recently posed a variation on this problem: to find the number of minutes between two times, where the clock is running only from 6:00am-6:00pm, Monday-Friday. He needed this to compute how long trouble tickets stayed at a help desk that was open for those hours.</p>
<p>I came up with a function DeskTimeDiff_minutes(@from,@to) for him. It requires a permanent table that spans the range of times you might care about, holding one row for every time the clock is turned on or off, weekdays at 6:00am and 6:00pm in this case.</p>
<p>The table also holds an &#8220;absolute business time&#8221; in minutes (ABT-m): the total number of &#8220;help desk open&#8221; minutes since a fixed but arbitrary &#8220;beginning of time.&#8221; Elapsed help desk time is then simply the difference between ABT-m values. While the table only records the ABT-m 10 times a week, you can find the ABT-m for an arbitrary datetime @d easily. Find the row of the table with time d closest to @d but not later. In that row you&#8217;ll find the ABT-m at time d, and you&#8217;ll also find out whether the clock was (or will be) running or not between d and @d. If not, the ABT-m at time @d is the same as at time d. Otherwise, add the number of minutes between d and @d.</p>
<p>Here&#8217;s the code. The reference table here is good from early 2000 until well past 2050, and you can easily extend it or adapt it to other business rules. A larger permanent table of times shouldn&#8217;t affect performance, because the function only performs (two) index seek lookups on the table.</p>
<p>If you cut and paste this for your own use, watch out for &#8220;smart quotes&#8221; or other WordPress/Live Writer formatting quirks.</p>
<p><font face="Lucida Console">create table Minute_Count(<br />&nbsp; d datetime primary key,<br />&nbsp; elapsed_minutes int not null,<br />&nbsp; timer varchar(10) not null check (timer in ('Running','Stopped'))<br />); </font>
<p><font face="Lucida Console">insert into Minute_Count values ('2000-01-03T06:00:00',0,'Running');<br />insert into Minute_Count values ('2000-01-03T18:00:00',12*60,'Stopped'); </font>
<p><font face="Lucida Console">insert into Minute_Count values ('2000-01-04T06:00:00',12*60,'Running');<br />insert into Minute_Count values ('2000-01-04T18:00:00',24*60,'Stopped'); </font>
<p><font face="Lucida Console">insert into Minute_Count values ('2000-01-05T06:00:00',24*60,'Running');<br />insert into Minute_Count values ('2000-01-05T18:00:00',36*60,'Stopped'); </font>
<p><font face="Lucida Console">insert into Minute_Count values ('2000-01-06T06:00:00',36*60,'Running');<br />insert into Minute_Count values ('2000-01-06T18:00:00',48*60,'Stopped'); </font>
<p><font face="Lucida Console">insert into Minute_Count values ('2000-01-07T06:00:00',48*60,'Running');<br />insert into Minute_Count values ('2000-01-07T18:00:00',60*60,'Stopped');<br />/* any Monday-Friday week */</font>
<p><font face="Lucida Console">declare @week int;<br />set @week = 1;<br />while @week &lt; 2100 begin<br />&nbsp; insert into Minute_Count<br />&nbsp;&nbsp;&nbsp; select<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; dateadd(week,@week,d),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; elapsed_minutes + 60*@week*60,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; timer<br />&nbsp; from Minute_Count<br />&nbsp; set @week = @week * 2<br />end; </font>
<p><font face="Lucida Console">go </font>
<p><font face="Lucida Console">create function DeskTimeDiff_minutes(<br />&nbsp; @from datetime,<br />&nbsp; @to datetime<br />) returns int as begin<br />&nbsp; declare @fromSerial int;<br />&nbsp; declare @toSerial int;<br />&nbsp; with S(d,elapsed_minutes,timer) as (<br />&nbsp;&nbsp;&nbsp; select top 1 d,elapsed_minutes, timer<br />&nbsp;&nbsp;&nbsp; from Minute_Count<br />&nbsp;&nbsp;&nbsp; where d &lt;= @from<br />&nbsp;&nbsp;&nbsp; order by d desc<br />&nbsp; )<br />&nbsp;&nbsp;&nbsp; select @fromSerial =<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; elapsed_minutes +<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; case when timer = 'Running'<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; then datediff(minute,d,@from)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else 0 end<br />&nbsp;&nbsp;&nbsp; from S;<br />&nbsp; with S(d,elapsed_minutes,timer) as (<br />&nbsp;&nbsp;&nbsp; select top 1 d,elapsed_minutes, timer<br />&nbsp;&nbsp;&nbsp; from Minute_Count<br />&nbsp;&nbsp;&nbsp; where d &lt;= @to<br />&nbsp;&nbsp;&nbsp; order by d desc<br />&nbsp; )<br />&nbsp;&nbsp;&nbsp; select @toSerial =<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; elapsed_minutes +<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; case when timer = 'Running'<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; then datediff(minute,d,@to)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else 0 end<br />&nbsp;&nbsp;&nbsp; from S;<br />&nbsp; return @toSerial - @fromSerial;<br />end;<br />go<br />select MAX(d) from Minute_Count;<br />select dbo.DeskTimeDiff_minutes('2007-12-19T18:00:00','2007-12-24T17:51:00');<br />go </font>
<p><font face="Lucida Console">drop function DeskTimeDiff_minutes;<br />drop table Minute_Count;</font></p>
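<p>To see the lookup idea outside the database, here is a rough Python sketch of the same approach. It is an illustration only, not a translation of the function above: the names <code>build_table</code>, <code>abt_minutes</code> and <code>desk_time_diff_minutes</code> are invented, the table lives in sorted Python lists, and a binary search stands in for the index seek. It assumes whole-minute inputs that are not earlier than the first row of the table.</p>

```python
from bisect import bisect_right
from datetime import datetime, timedelta

OPEN_MINUTES = 12 * 60  # the desk is open 6:00am-6:00pm, Monday-Friday


def build_table(first_monday, weeks):
    """One row per clock switch: (time, ABT-m at that time, running afterwards)."""
    times, elapsed, running = [], [], []
    total = 0
    for day in range(7 * weeks):
        if day % 7 in (5, 6):  # skip Saturday and Sunday
            continue
        opening = first_monday + timedelta(days=day, hours=6)
        times += [opening, opening + timedelta(hours=12)]
        elapsed += [total, total + OPEN_MINUTES]
        running += [True, False]
        total += OPEN_MINUTES
    return times, elapsed, running


def abt_minutes(table, t):
    """Absolute business time in minutes at time t."""
    times, elapsed, running = table
    i = bisect_right(times, t) - 1  # last row whose time is not later than t
    if i == -1:
        raise ValueError("time precedes the first row of the table")
    extra = (t - times[i]) // timedelta(minutes=1) if running[i] else 0
    return elapsed[i] + extra


def desk_time_diff_minutes(table, start, end):
    return abt_minutes(table, end) - abt_minutes(table, start)


# 2000-01-03 is the Monday the SQL table starts on; 4096 weeks matches its span.
TABLE = build_table(datetime(2000, 1, 3), 4096)
print(desk_time_diff_minutes(TABLE,
                             datetime(2007, 12, 19, 18, 0),
                             datetime(2007, 12, 24, 17, 51)))  # 2151
```

<p>The sample call covers all of Thursday and Friday (720 minutes each) plus Monday from 6:00am to 5:51pm (711 minutes), for a total of 2151.</p>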
]]></content:encoded>
			<wfw:commentRss>http://www.stevekass.com/2007/12/19/elapsed-time-excluding-nights-and-weekends/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The hemisphere requirement</title>
		<link>http://www.stevekass.com/2007/11/21/the-hemisphere-requirement/</link>
		<comments>http://www.stevekass.com/2007/11/21/the-hemisphere-requirement/#comments</comments>
		<pubDate>Wed, 21 Nov 2007 22:16:47 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://stevekass.com/2007/11/21/the-hemisphere-requirement/</guid>
		<description><![CDATA[Microsoft plans to support spatial data types in SQL Server 2008, and a preview is available to the community in the latest CTP (community technology preview), available here. John O&#8217;Brien, a Windows Live Developer MVP,&#160;has been&#160;trying out the new spatial types in some cool Virtual Earth projects&#160;(John&#8217;s site is&#160;here), and&#160;in one of his projects, SQL [...]]]></description>
			<content:encoded><![CDATA[<p>Microsoft plans to support spatial data types in SQL Server 2008, and a preview is available to the community in the latest CTP (community technology preview), available <a title="http://www.microsoft.com/sql/2008/default.mspx" href="http://www.microsoft.com/sql/2008/default.mspx">here</a>.
<p>John O&#8217;Brien, a Windows Live Developer MVP,&nbsp;has been&nbsp;trying out the new spatial types in some cool Virtual Earth projects&nbsp;(John&#8217;s site is&nbsp;<a title="http://www.soulsolutions.com.au" href="http://www.soulsolutions.com.au">here</a>), and&nbsp;in one of his projects, SQL Server threw an interesting error message. When he zoomed far enough out in Virtual Earth, then tried to create a polygon from the map bounds, SQL Server reacted with:
<p>“The specified input does not represent a valid geography instance because it exceeds a single hemisphere. Each geography instance must fit inside a single hemisphere. A common reason for this error is that a polygon has the wrong ring orientation.”
<p>John&nbsp;found a workaround, dividing the map into two pieces, but he was interested to know what the SQL Server folk thought about the situation. Here’s my reply. It’s less a response to John’s inquiry than it is a ramble about geometry and what hemispheres and orientation have to do with how you can or can’t specify polygons.
<p>To begin, think of the earth’s Equator as a polygon. How would you answer the following questions?
<ul>
<li>“If I travel eastbound around the earth along the equator, have I gone clockwise or counter-clockwise?” </li>
<li>“Is the north pole inside the equator or outside the equator?” </li>
</ul>
<p>In the plane (or on a flat map of the world), a polygon or other closed non-self-intersecting curve has a well-defined “inside” and “outside”. A polygon separates the plane into two regions, one that has finite area and one that is unbounded. The finite region is deemed “inside” the polygon. On a sphere, however, a closed curve determines two finite regions, either of which might be what someone thinks of as the inside.
<p>For example, the four-sided outline of the US state of Wyoming separates the earth into what you could call “Wyoming” and “anti-Wyoming.” But are we so sure which is the inside and which is the outside? Our intuition is that the smaller region is always the inside, but there’s nothing about geometry and geography to tell us that. Maybe Wyoming is most of the world. A single geographic region could contain most of the earth’s surface within its borders, couldn’t it?
<p>Suppose Wyoming declared itself to be Great Wyoming, annexed all of North America and Europe, and continued to conquer the world. Suppose its armies crossed the equator and eventually took over almost everything: everything but Antarctica, in fact.
<p>The boundary of Great Wyoming would then be the same as the boundary of Antarctica. You would probably want Great Wyoming to be inside the boundary of Great Wyoming and Antarctica to be inside the boundary of Antarctica, but how can that work when the boundaries are the same?
<p>This is a problem. On a sphere, the naïve idea of interior/exterior isn’t well-defined. One solution would be to pass a law that every polygon on earth must fit inside a single hemisphere with room to spare. We could then <i>define</i> the interior of a polygon to be the smaller of the two regions it determines. This would place Antarctica, not Wyoming, within the borders of Great Wyoming—wrong, but unambiguous. And anyway, who would ever need to consider a region <s>bigger than 640K</s> that doesn’t fit inside a single hemisphere?
<p>Fortunately, though, we don’t have to abandon or compromise the notion of interior and exterior on the earth’s surface: Antarctica can remain outside Great Wyoming. All we need to do is be precise about the direction in which we describe a polygon. When specifying the boundary of a region, you can give a forwards/backwards or clockwise/counter-clockwise sense to the boundary by choosing the way you order the list of vertices. List them so that what you consider inside the region is on your left as you &#8220;connect the dots,&#8221; because we will adopt the convention that the left side as you walk the perimeter is the inside. What’s on the right will be interpreted as outside. Now you can describe the boundary of Great Wyoming. Just describe it as drawn from west to east, so Antarctica is on the right (exterior). (This works because a sphere is an “orientable surface.” SQL Server’s new geography data type isn’t supported on a Klein bottle, where CultureInfo.IsOrientableWorld, if such a property existed, would be false.)
<p>Once we require polygons to be oriented, there’s no need to require that they fit within a single hemisphere, but nonetheless, SQL Server 2008’s geography data type adopts the hemisphere requirement. For geography objects of type Polygon, I think this is a good idea. I’m not sure whether it’s a standard GIS requirement or just SQL Server’s, but it prevents users from accidentally entering the coordinates of Wyoming in clockwise fashion only to discover later that Perth and Addis Ababa, but not Cheyenne, are in Wyoming. [For some of the other geography types, such as LineString, I don’t see a benefit from requiring the object to fit in a hemisphere, but consistency isn’t a bad thing.]</p>
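<p>For a small region, you can check the order of a vertex list before handing it to the database. The Python sketch below is only a flat-map approximation, not anything SQL Server does internally: it treats longitude and latitude as plane coordinates and uses the signed (shoelace) area, which is positive when the vertices run counter-clockwise, that is, when the interior is on your left. The function names and the rounded Wyoming corner coordinates are assumptions for illustration, and the test is not meaningful for rings that are large or that wrap around a pole or the 180th meridian.</p>

```python
def signed_area(ring):
    """Shoelace formula on (longitude, latitude) pairs, treated as a flat map."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def interior_on_left(ring):
    """True when the vertices run counter-clockwise (positive signed area)."""
    area = signed_area(ring)
    return area != 0 and abs(area) == area


# Approximate corners of Wyoming as (longitude, latitude), starting in the
# southwest and heading east first, so the state stays on the left.
wyoming = [(-111.05, 41.0), (-104.05, 41.0), (-104.05, 45.0), (-111.05, 45.0)]

print(interior_on_left(wyoming))        # True
print(interior_on_left(wyoming[::-1]))  # False: this order describes anti-Wyoming
```

<p>Reversing the list flips the sign of the area, which is exactly the clockwise mistake described above.</p>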
]]></content:encoded>
			<wfw:commentRss>http://www.stevekass.com/2007/11/21/the-hemisphere-requirement/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Million Random Digits with 100,000 Normal Deviates</title>
		<link>http://www.stevekass.com/2006/08/09/a-million-random-digits-with-100000-normal-deviates/</link>
		<comments>http://www.stevekass.com/2006/08/09/a-million-random-digits-with-100000-normal-deviates/#comments</comments>
		<pubDate>Wed, 09 Aug 2006 04:27:08 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://www.stevekass.com/2006/08/09/a-million-random-digits-with-100000-normal-deviates/</guid>
		<description><![CDATA[Groundbreaking when it was published in 1955, the classic book &#8220;A Million Random Digits with 100,000 Normal Deviates&#8221; has been republished electronically by the RAND Corporation with permission &#8220;to duplicate this electronic document for personal use only, as long as it is unaltered and complete.&#8221; Books like these were a staple of statistical research in [...]]]></description>
			<content:encoded><![CDATA[<p>Groundbreaking when it was published in 1955, the classic book &#8220;A Million Random Digits with 100,000 Normal Deviates&#8221; has been republished electronically by the RAND Corporation with permission &#8220;to duplicate this electronic document for personal use only, as long as it is unaltered and complete.&#8221; Books like these were a staple of statistical research in the mid-20th century, and this particular one was highly revered.</p>
<p>Nowadays, there are better sources of random numbers, such as <a href="http://www.fourmilab.ch/hotbits/">HotBits</a>, and there are many ways to generate pseudorandom numbers, which are not random, but have many of the properties of random numbers and are useful for many purposes.</p>
<p>I hope it&#8217;s not a violation of the copyright for me to provide instructions on how to use SQL to load the book&#8217;s content in its published format (or any identically-formatted list) into a SQL table that can be queried for random (not pseudorandom) sequences of numbers. The script uses a few of SQL Server 2005&#8217;s new features, including the BULK rowset provider for text files, some of the new analytic functions, and TOP with a variable. You&#8217;ll also need a table-valued function called Numbers(), like the one in my previous SQL post.</p>
<p>The RAND book is available <a href="http://www.rand.org/pubs/monograph_reports/MR1418/index.html">here</a>, and my script works for the support file &#8220;Datafile: A Million Random Digits,&#8221; available for download <a href="http://www.rand.org/pubs/monograph_reports/MR1418/index.html">here</a>. The SQL Server 2005 script below assumes you&#8217;ve downloaded this file and unzipped it to C:\RAND\MillionDigits.txt.</p>
<p>The beginning of the file looks like this</p>
<p><code>00000   10097 32533  76520 13586  34673 54876  80959 09117  39292 74945<br />
00001   37542 04805  64894 74296  24805 24037  20636 10402  00822 91665<br />
00002   08422 68953  19645 09303  23209 02560  15953 34764  35080 33606<br />
00003   99019 02529  09376 70715  38311 31165  88676 74397  04436 27659<br />
00004   12807 99970  80157 36147  64032 36653  98951 16877  12171 76833<br />
00005   66065 74717  34072 76850  36697 36170  65813 39885  11199 29170<br />
00006   31060 10805  45571 82406  35303 42614  86799 07439  23403 09732<br />
00007   85269 77602  02051 65692  68665 74818  73053 85247  18623 88579<br />
00008   63573 32135  05325 47048  90553 57548  28468 28709  83491 25624<br />
00009   73796 45753  03529 64778  35808 34282  60935 20344  35273 88435</code></p>
<p>Unix-style newlines (<tt>0x0A</tt>) are used, and the million digits are organized into 200,000 five-digit integers with leading zeroes, ten to a line on 20,000 numbered lines, so the script will import the file into a table of 200,000 five-digit numbers (as char(5) data with leading zeroes). Here&#8217;s the script:  <span id="more-31"></span></p>
<p><tt>create database MillionDigits<br />
go</tt></p>
<p><tt>use MillionDigits<br />
go</tt></p>
<p><tt> </tt></p>
<p><tt>create table MillionDigitsFile (<br />
c varchar(max)<br />
)<br />
go</p>
<p>insert into MillionDigitsFile<br />
select BulkColumn<br />
from openrowset(bulk 'C:\RAND\MillionDigits.txt', SINGLE_CLOB) as D<br />
go</p>
<p>create table NumbersFromTable(<br />
position int primary key,<br />
number char(5) not null<br />
)<br />
create index NumbersFromTable_number on NumbersFromTable(number)<br />
go</p>
<p>-- The first of the five groups of two numbers each<br />
-- begins at position 9 of each line. Each of the other<br />
-- four groups on a line begins 13 characters after the<br />
-- previous one. The second number in each group<br />
-- begins 6 characters after the first.<br />
insert into NumbersFromTable<br />
select<br />
row_number() over (order by N.n,A.n,B.n) as rk,<br />
substring(c,9+72*N.n+13*A.n+6*B.n,5) as n<br />
from<br />
Numbers(0,19999) as N,<br />
Numbers(0,4) as A,<br />
Numbers(0,1) as B,<br />
MillionDigitsFile<br />
go</p>
<p>-- How random does it look? (and a sneaky way to<br />
-- aggregate over an aggregate)<br />
select top 1<br />
min(count(*)) over (),<br />
max(count(*)) over (),<br />
avg(1.00000*count(*)) over (),<br />
stdev(count(*)) over ()<br />
from NumbersFromTable<br />
group by number<br />
go</p>
<p>/* Selects a @length-long sequence of numbers from<br />
the table, where the place to start is found as<br />
follows.  Given a random integer, use % to turn<br />
it into a number's position between 1 and 200000.<br />
Reduce the number at that position % 20000 to find a starting<br />
line of the book, and reduce the following<br />
number % 10 to find the starting number on<br />
that line.<br />
*/<br />
create function RandomSequence(<br />
@seed int,<br />
@length int<br />
) returns table as return (<br />
select top (@length)<br />
row_number() over (order by position) as i,<br />
number<br />
from NumbersFromTable<br />
where position &gt;= 1 + 10 * (<br />
select number%20000<br />
from NumbersFromTable<br />
where 1+@seed%200000 = position<br />
) + (<br />
select number%10<br />
from NumbersFromTable<br />
where 1+(@seed+1)%200000 = position<br />
)<br />
order by position<br />
)<br />
go</p>
<p>-- Generate a few random sequences. You'll get different ones<br />
-- each time you run this.<br />
declare @seed int<br />
set @seed = abs(binary_checksum(newid()))%200000<br />
select * from RandomSequence(@seed,50)<br />
set @seed = abs(binary_checksum(newid()))%200000<br />
select * from RandomSequence(@seed,123)</p>
<p></tt></p>
<p><tt>-- Uncomment to clean up<br />
-- use master<br />
-- go<br />
-- drop database MillionDigits</tt></p>
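<p>The offset arithmetic in the SUBSTRING call is easy to get wrong, so here is a small Python sketch that applies the same formula (9 + 72*line + 13*group + 6*member) to the first two lines of the file shown earlier. It is only a check of the arithmetic, assuming 71 characters plus one newline per line; the function name <code>numbers_from_text</code> is invented for this illustration.</p>

```python
# The first two lines of the data file, each ending in a Unix newline.
SAMPLE = (
    "00000   10097 32533  76520 13586  34673 54876  80959 09117  39292 74945\n"
    "00001   37542 04805  64894 74296  24805 24037  20636 10402  00822 91665\n"
)


def numbers_from_text(text, line_count):
    """Extract the five-digit numbers using the script's 1-based offsets."""
    result = []
    for line in range(line_count):          # plays the role of N.n
        for group in range(5):              # plays the role of A.n
            for member in range(2):         # plays the role of B.n
                start = 9 + 72 * line + 13 * group + 6 * member
                # SQL SUBSTRING is 1-based; Python slices are 0-based.
                result.append(text[start - 1:start + 4])
    return result


numbers = numbers_from_text(SAMPLE, 2)
print(numbers[:3])   # ['10097', '32533', '76520']
print(numbers[10])   # 37542
```

<p>Two lines yield twenty numbers, so the full file of 20,000 lines yields the 200,000 rows that RandomSequence indexes by position.</p>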
]]></content:encoded>
			<wfw:commentRss>http://www.stevekass.com/2006/08/09/a-million-random-digits-with-100000-normal-deviates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to generate a sequence on the fly</title>
		<link>http://www.stevekass.com/2006/06/03/how-to-generate-a-sequence-on-the-fly/</link>
		<comments>http://www.stevekass.com/2006/06/03/how-to-generate-a-sequence-on-the-fly/#comments</comments>
		<pubDate>Sat, 03 Jun 2006 14:44:10 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://www.stevekass.com/2006/06/03/how-to-generate-a-sequence-on-the-fly/</guid>
		<description><![CDATA[One of the things that kept me busy this past winter and spring was tech editing Itzik Ben-Gan&#8217;s two books in Microsoft Press&#8217;s Inside Microsoft® SQL Server™ 2005 series (1,2). Of Itzik&#8217;s many clever solutions to programming problems, my favorite was this function that returns a table of consecutive integers. It&#8217;s blazingly fast, and it&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>One of the things that kept me busy this past winter and spring was tech editing Itzik Ben-Gan&#8217;s two books in Microsoft Press&#8217;s Inside Microsoft® SQL Server™ 2005 series (<a href="http://www.microsoft.com/MSPress/books/9615.asp">1</a>,<a href="http://www.microsoft.com/MSPress/books/8564.asp">2</a>).  Of Itzik&#8217;s many clever solutions to programming problems, my favorite was this function that returns a table of consecutive integers. It&#8217;s blazingly fast, and it&#8217;s the best way I know of to generate a sequence on the fly &#8211; probably even better than accessing a permanent table of integers.</p>
<p><code>create function Numbers(<br />
&nbsp;&nbsp;@from as bigint,<br />
&nbsp;&nbsp;@to   as bigint<br />
) returns table with schemabinding as return<br />
&nbsp;&nbsp;with t0(n) as (<br />
&nbsp;&nbsp;&nbsp;&nbsp;select 1 union all select 1<br />
&nbsp;&nbsp;), t1(n) as (<br />
&nbsp;&nbsp;&nbsp;&nbsp;select 1 from t0 as a, t0 as b<br />
&nbsp;&nbsp;), t2(n) as (<br />
&nbsp;&nbsp;&nbsp;&nbsp;select 1 from t1 as a, t1 as b<br />
&nbsp;&nbsp;), t3(n) as (<br />
&nbsp;&nbsp;&nbsp;&nbsp;select 1 from t2 as a, t2 as b<br />
&nbsp;&nbsp;), t4(n) as (<br />
&nbsp;&nbsp;&nbsp;&nbsp;select 1 from t3 as a, t3 as b<br />
&nbsp;&nbsp;), t5(n) as (<br />
&nbsp;&nbsp;&nbsp;&nbsp;select 1 from t4 as a, t4 as b<br />
&nbsp;&nbsp;), Numbers(n) as (<br />
&nbsp;&nbsp;&nbsp;&nbsp;select row_number() over (order by n) as n<br />
&nbsp;&nbsp;&nbsp;&nbsp;from t5<br />
&nbsp;&nbsp;)<br />
&nbsp;&nbsp;&nbsp;&nbsp;select @from + n - 1 as n<br />
&nbsp;&nbsp;&nbsp;&nbsp;from Numbers<br />
&nbsp;&nbsp;&nbsp;&nbsp;where n &lt;= @to - @from + 1<br />
</code></p>
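<p>To see why five levels of cross joins are enough, here is a small Python sketch of the row counts and of the final SELECT. It only models the arithmetic and says nothing about how SQL Server evaluates the query; the function names <code>cte_row_counts</code> and <code>numbers</code> are made up for illustration.</p>

```python
from itertools import count, islice


def cte_row_counts(levels=5):
    """Rows in t0, t1, ..., t5: each level cross joins the previous one with itself."""
    rows = 2                     # t0 is two rows: select 1 union all select 1
    counts = [rows]
    for _ in range(levels):
        rows = rows * rows       # a self cross join squares the row count
        counts.append(rows)
    return counts


def numbers(start, end):
    """Mimic the final SELECT: start + n - 1 for n = 1 .. end - start + 1."""
    wanted = max(end - start + 1, 0)
    return [start + n - 1 for n in islice(count(1), wanted)]


print(cte_row_counts())   # [2, 4, 16, 256, 65536, 4294967296]
print(numbers(0, 4))      # [0, 1, 2, 3, 4]
```

<p>So t5 can supply up to 4,294,967,296 rows, which caps the length of the sequence a single call can return.</p>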
]]></content:encoded>
			<wfw:commentRss>http://www.stevekass.com/2006/06/03/how-to-generate-a-sequence-on-the-fly/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

