Detecting changes in a table

Suppose I want to sync with my clients a bunch of information when the program has started and at specific intervals throughout the day. However, I don’t want the server to send over a huge list of all the information every single time it is requested, as it’s a waste of bandwidth and time. I should mention that a lot of clients I work with have spotty internet connections, so it’s important that the apps function well offline as well as transfer only the data that is needed to be transferred.)

How do I know that the client has the latest updated copy of data from the database? We do this by generating a checksum of the data that we would transfer. The client would send over its last known checksum value, or NULL if it does not have one stored, and the server only pushes data over if the checksum value passed back to the server is NULL or differs from the checksum the server generates. It’s ridiculously fast, at least for the data sets I send over – a few hundred to a few thousand records per set.

Here are some functions I created to get the data that needs to be synced:

CREATE FUNCTION FranchiseInfo_Franchises (@franchiseID INT) AS ...<br />CREATE FUNCTION FranchiseInfo_Items (@franchiseID INT) AS ...<br />CREATE FUNCTION FranchiseInfo_Staff (@franchiseID INT) AS ...<br />CREATE FUNCTION FranchiseInfo_Locations (@franchiseID INT) AS ...<br />CREATE FUNCTION FranchiseInfo_PrepaidCards (@franchiseID INT) AS ...

Et cetera.

Each function returns columns of data that I push to the client. You don’t specifically NEED to use table-valued functions, but in this scenario it definitely makes it easier to do the following steps. (Specifically, if I had used just stored procs, I would need to copy and paste the same SQL used for the procs in the checksum procedure, which reduces maintainability.)

What we want is some sort of a checksum that would change when the data in the table has changed.

CREATE PROCEDURE FranchiseInfo_GetChecksums
@franchiseID INT
SELECT 'Franchise' AS DataType, (SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM dbo.FranchiseInfo_Franchises(@franchiseID)) AS [Checksum]
UNION SELECT 'Items', (SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM dbo.FranchiseInfo_Items(@franchiseID))
UNION SELECT 'Locations', (SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM dbo.FranchiseInfo_Locations(@franchiseID))<br /> UNION SELECT 'PrepaidCards', (SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM dbo.FranchiseInfo_PrepaidCards(@franchiseID))<br /> UNION SELECT 'Staff', (SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM dbo.FranchiseInfo_Staff(@franchiseID))<br /> ;

(Later I changed it so each one returned the checksum value in a column, so the procedure would return only one row with the checksum values. It was much easier to do that, rather than parse the row values. I’ll leave this as an exercise to the reader.)

BINARY_CHECKSUM() takes in a list of columns, in this case, the columns returned from the stored procedure. Yes, using * for columns is frowned upon but I don’t care, as it makes it easier to update the functions and know that it’s including that in the aggregation. CHECKSUM_AGG aggregates the other checksums into one checksum.

In my client code, I provide a list of the checksums I already have – as Nullable<long> / long? datatype, and if I have the checksum stored, I pass it in, otherwise, I leave the value as NULL. The server calls the function above and compares the checksums returned. If the checksum is different than what was passed in, then I call the appropriate procedure to return updated data to the client. This way, I’m only passing down the data set only if it has changed.


Wow. I learned something very important about SQL today.

Consider this SQL statement, containing a inline table-valued function (TVF)

FROM dbo.Games_StandardLotto_Summaries(
dbo.Date(DATEADD(D, -1, GETDATE())), 
dbo.EndOfDay(DATEADD(D, 1, GETDATE())), 

It’s taking 17 seconds to run, and I cannot figure out WHY. I copied the guts of the TVF from a sql script I was tweaking for the good part of an hour. It’s a single line with CTEs that runs a fairly complex query in about a second or two – vast improvement from getting the same data from temp tables in 15-20 times the time. But now, it’s back to square one, it seems.


SELECT @sd = dbo.Date(DATEADD(D, -1, GETDATE())), 
    @ed = dbo.EndOfDay(DATEADD(D, 1, GETDATE()));
SELECT * FROM dbo.Games_StandardLotto_Summaries(@sd, @ed, NULL, NULL, NULL);

…runs in under 1 second for me.

So what SQL is telling me here is that for TVFs, and I’m sure more than that, it uses the parameters by reference – meaning it executes the representation of the expression, rather than the value of the expression at the time the TVF is called.

The lesson learned here is that you should always declare and set your variables and use those to execute the TVF, rather than calling it with complex statements. As you can see above, I’m calling GETDATE(), then adding or subtracting a day from that value, and then running a scalar function to make it midnight or 11:59:59:999, twice per row minimum. Storing it into variables knocked the time down to next to nothing. Changing the SQL code above that calls the TVF resulted in a drop from 16.766 seconds to 0.275 seconds. That’s almost a 61x improvement!