From http://www.sqlservercentral.com/articles/Datetime+Manipulation/61822/
Find Missing Numbers
It is obvious that, to find missing numbers, we need to have a table or resultset which contains all the numbers. Then we can apply a NOT IN clause and identify the missing numbers.
If you are wondering, why I needed missing numbers, here is the reason. In one of my applications, there is an entity that we call coordinator. Each coordinator has a 4 digit numeric code. The code starts at 1000 and ends at 9999. (There is no chance that we will have more than 9000 coordinators at any point of time, so a 4 digit code is sufficient). When the user creates a new coordinator, the system automatically generates a new coordinator number. However, the user can still edit it. The system will allow the user to edit the coordinator number (during a new entry) as long as it does not produce a duplicate code.
With the above functionality, we needed a way to reuse un-used/missing numbers. When we create a new coordinator, we need to find the lowest unused coordinator code. For this purpose we needed a way to identify the missing numbers.
As mentioned earlier in this article, I did not want to keep a temp table. I want to generate the numbers on the fly. There are several ways to do this. I like to use a CTE to do this.
In one of my previous articles, I had presented a way to generate a sequence of numbers by using a CTE. It was fast. But a faster code fragment was posted by Kathi Kellenberger in the discussion forum. It was taken from Itzik Ben-Gan’s book “Inside Microsoft SQL Server 2005 T-SQL Querying”. Here is the code fragment which generates 1 million records in the fraction of a second. [code]
WITH
L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows
L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows
L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows
L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows
L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows
num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)
SELECT N FROM NUM WHERE N <= 1000000;
This code is VERY VERY fast and I find it to be the best candidate for generating a sequence of numbers on the fly. I wanted to re-use this code and hence I created a function that takes a Minimum and Maximum value and returns a resultset containing sequence numbers within the given range. [code]
CREATE FUNCTION dbo.GetNumbers
(
@Start BIGINT,
@End BIGINT
)
RETURNS @ret TABLE(Number BIGINT)
AS
BEGIN
WITH
L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows
L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows
L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows
L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows
L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows
num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)
INSERT INTO @ret(Number)
SELECT N FROM NUM WHERE N BETWEEN @Start AND @End
RETURN
END
Once we have the function to generate the sequence number, we can easily write the code to identify the missing numbers. [code]
1 SELECT MIN(Number)
2 FROM dbo.GetNumbers(1000, 9999)
3 WHERE Number NOT IN (SELECT CoordinatorID FROM Coordinators)
Find Missing Dates
I guess the requirement to find missing dates must be more common than the missing number requirement I mentioned above. To find the missing dates (Dates at which no sales took place, Dates at which a given employ was absent etc) we need to have a table or resultset which contains all the dates within a given range.
The following example shows how to generate a sequence of dates using the same logic we discussed earlier. [code]
--Generate a sequence of all the dates for the month of October, 2007
SELECT
CAST('2007-10-01' AS DATETIME) + Number-1
FROM dbo.GetNumbers(1, 30)
--Generate a sequence of all the dates for year 2007
SELECT
CAST('2007-01-01' AS DATETIME) + Number-1
FROM dbo.GetNumbers(1, 365)
Once we have a sequence of dates, we can easily apply a NOT IN clause and find the missing dates based on the specific requirement.
Leave a Reply