Groundbreaking when it was published in 1955, the classic book “A Million Random Digits with 100,000 Normal Deviates” has been republished electronically by the RAND corporation with permission “to duplicate this electronic document for personal use only, as long as it is unaltered and complete.” Books like these were a staple of statistical research in the mid-20th century, and this particular one was highly revered.

Nowadays, there are better sources of random numbers, such as HotBits, and there are many ways to generate pseudorandom numbers, which are not random, but have many of the properties of random number and are useful for many purposes.

I hope it’s not a violation of the copyright for me to provide instructions on how to use SQL to load the book’s content in its published format (or any identically-formatted list) into a SQL table that can be queried for random (not pseudorandom) sequences of numbers. The script uses a few of SQL Server 2005’s new features, including the BULK rowset provider for text files, some of the new analytic functions, and TOP with a variable. You’ll also need a table-valued function called Numbers(), like the one in my previous SQL post.

The RAND book is available here, and my script works for the support file “Datafile: A Million Random Digits,” available for download here. The SQL Server 2005 script below assumes you’ve downloaded this file and unzipped it to C:\\RAND\\MillionDigits.txt.

The beginning of the file looks like this

00000   10097 32533  76520 13586  34673 54876  80959 09117  39292 74945
00001   37542 04805  64894 74296  24805 24037  20636 10402  00822 91665
00002   08422 68953  19645 09303  23209 02560  15953 34764  35080 33606
00003   99019 02529  09376 70715  38311 31165  88676 74397  04436 27659
00004   12807 99970  80157 36147  64032 36653  98951 16877  12171 76833
00005   66065 74717  34072 76850  36697 36170  65813 39885  11199 29170
00006   31060 10805  45571 82406  35303 42614  86799 07439  23403 09732
00007   85269 77602  02051 65692  68665 74818  73053 85247  18623 88579
00008   63573 32135  05325 47048  90553 57548  28468 28709  83491 25624
00009   73796 45753  03529 64778  35808 34282  60935 20344  35273 88435

Unix-style newlines (0x0A) are used, and the million digits are organized into 20,000 five-digit integers with leading zeroes, so the script will import the file into a table of 20,000 five-digit numbers (as char(5) data with leading zeroes). Here’s the script:  

create database MillionDigits
go

use MillionDigits
go

create table MillionDigitsFile (
c varchar(max)
)
go

insert into MillionDigitsFile
select BulkColumn
from openrowset(bulk 'C:\\RAND\\MillionDigits.txt\\', SINGLE_CLOB) as D
go

create table NumbersFromTable(
position int primary key,
number char(5) not null
)
create index NumbersFromTable_number on NumbersFromTable(number)
go

-- The first of the five groups of two numbers each
-- begins at position 9 of each line. Each of the other
-- four groups on a line begins 13 characters after the
-- previous one. The second number in each group
-- begins 6 characters after the first.
insert into NumbersFromTable
select
row_number() over (order by N.n,A.n,B.n) as rk,
substring(c,9+72*N.n+13*A.n+6*B.n,5) as n
from
Numbers(0,19999) as N,
Numbers(0,4) as A,
Numbers(0,1) as B,
MillionDigitsFile
go

-- How random does it look? (and a sneaky way to
-- aggregate over an aggregate)
select top 1
min(count(*)) over (),
max(count(*)) over (),
avg(1.00000*count(*)) over (),
stdev(count(*)) over ()
from NumbersFromTable
group by number
go

/* Selects a @length-long sequence of numbers from
the table, where the place to start is found as
follows. Given a random integer, use % to turn
it into a number's position between 1 and 200000.
Reduce that position % 20000 to find a starting
line of the book, and reduce the following
number % 10 to find the starting number on
that line.
*/
create function RandomSequence(
@seed int,
@length int
) returns table as return (
select top (@length)
row_number() over (order by position) as i,
number
from NumbersFromTable
where position >= (
select number%20000
from NumbersFromTable
where 1+@seed%200000 = position
) + (
select number%10
from NumbersFromTable
where 1+(@seed+1)%200000 = position
)
order by position
)
go

-- Generate a few random sequences. You'll get different ones
-- each time you run this.
declare @seed int
set @seed = abs(binary_checksum(newid()))%200000
select * from RandomSequence(@seed,50)
set @seed = abs(binary_checksum(newid()))%200000
select * from RandomSequence(@seed,123)

-- Uncomment to clean up
-- use master
-- go
-- drop database MillionDigits