I learned a new trick for dealing with deleting rows after the nth index from a table. First, some backstory:
I’m in the gaming industry with clients overseas, and our client has an online casino and lottery house that runs games like slots, blackjack and poker. To launch these games, we use a token system. The customer launches a URL with a GUID token, which is immediately swapped once the game loads. The client continues to request tokens to perform actions behind the scenes, and the token must continually change to prevent someone who got access to a token – either by random guess, or worse.
Token rotation systems aren’t the simplest thing in the world – the token system has to handle multiple tokens for a given customer, because if the client application sends a request to our server that is processed successfully, changing the token, but the client never gets a response, the client is left with an old token and cannot reference that particular customer anymore. The system I set up for this vendor manages 5 tokens simultaneously, with them expiring after 10 minutes. My token table looks like this:
ID CreatedOn CustomerID GamingVendorID Token CountOfUses LastUsedOn IsExpired 321 2014-02-13 12:09:48.337 1151 4 73F636BC-F6C4-4301-83A3-3A24744E0C65 1 2014-02-13 11:59:48.347 0 324 2014-02-13 11:59:48.363 1151 4 12C33AB2-44CA-4946-9CDC-AE0934FA8DBC 0 NULL 0
However, in production there will be tons of tokens, and each vendor is configured to allow a maximum number of tokens simultaneously – this vendor has 5. Without some sort of cleanup routine, there will be a lot more than 5 unexpired tokens in the table. I have a stored procedure to clean up these excess tokens, but since it has to run on every call to validate and rotate a token, I can’t have it look over the data by customer ID and vendor ID, as it would be too slow. I also didn’t want to clean up just the requesting vendor’s and customer’s tokens and leave everything else alone; I wanted the table to have as few rows as possible. So, how can it be done with a single batch and no looping, temporary tables or cursors?
Enter the Common Table Expression (CTE):
WITH cteTokens AS ( SELECT t.ID, ROW_NUMBER() OVER (PARTITION BY g.ID, t.CustomerID ORDER BY t.CreatedOn DESC) AS RowNumber, g.[AllowedNumberOfTokensKeptBesidesCurrentToken] AS CountToKeep FROM CustomerRotatingTokens t LEFT OUTER JOIN GamingVendors g ON (g.ID = t.GamingVendorID) WHERE t.CountOfUses >= g.[AllowedNumberOfTokenReuses] OR ([TokensExpireAfterNumberOfSeconds] IS NOT NULL AND DATEADD(SECOND, g.[TokensExpireAfterNumberOfSeconds], t.CreatedOn) < GETDATE()) ), cteTokensToExpire AS ( SELECT ID FROM cteTokens WHERE RowNumber > CountToKeep ) UPDATE CustomerRotatingTokens SET IsExpired = 1 WHERE ID IN (SELECT ID FROM cteTokensToExpire);
Here’s an overview of the interesting parts of the code:
- WITH cteTokens AS – this names our expression (think of it as a table alias)
- ROW_NUMBER() OVER (PARTITION BY g.ID, t.CustomerID ORDER BY t.CreatedOn DESC) AS RowNumber – this adds a sequential row number to each row, starting at 1. If you specify ROW_NUMBER() OVER (ORDER BY t.CreatedOn DESC) (this is the bare minimum syntax needed to do this), it will add a sequential number to each row. We need to delete any tokens after the 5th newest token, so we need the tokens to be “grouped” by the vendor’s ID and the customer ID. That’s where the PARTITION part comes in – think if it almost like a GROUP BY statement so ROW_NUMBER() starts counting over when it reaches a new combination of those fields.
- cteTokensToExpire AS – this next CTE selects a new table containing only those tokens whose index (the ROW_NUMBER()) is greater than what the vendor is allowed to keep – 5.
- UPDATE… WHERE ID IN (SELECT ID FROM cteTokensToExpire) – expires any tokens that were selected by cteTokensToExpire.
Part of the client requirements is that we differentiate between invalid and expired tokens, so I can’t just delete them from the table immediately, but I have another CTE that blows out expired rows that are a bit older than when they expired.
The speed of the CTE is amazing. It’s most definitely faster than building a temporary table to aggregate the results and then expire and delete the rows, which would not have flown since this has to run a LOT.