Recently, I’ve been developing a SQL Server Data Warehouse solution where I needed to remove all the “unused” rows from particular dimension tables, that is, rows that weren’t referenced by a foreign key relationship.
In itself, this isn’t particularly tricky. I had code that did something like this (using the AdventureWorks2008 sample database):
DELETE p FROM [Person].[Address] p WHERE NOT EXISTS ( SELECT 1 FROM [Person].[BusinessEntityAddress] f WHERE f.[AddressID] = p.[AddressID]) |
I added a NOT EXISTS
clause for each foreign key relationship and table.
Now, as you would expect, as the number of tables and relationships grew, this became cumbersome and prone to error. I reasoned that SQL Server “knows” what can and can’t be deleted and if you try to delete a row that is referenced from another table will report an error such as:
The DELETE statement conflicted with the REFERENCE constraint “FK_BusinessEntityAddress_Address_AddressID”
As a result, I developed a script that will check the foreign key relationships on a given table and only attempt to delete the rows that are not referenced. If tables and relationships are added at a later date, I no longer needed to worry about updating my code to take this into account.
The script (for SQL Server 2008) is shown here:
-- Set the name of the schema and table here DECLARE @SCHEMA sysname SET @SCHEMA = 'Person' DECLARE @TABLE sysname SET @TABLE = 'Address'; -- Determine unused rows for the above specified table DECLARE @SQL nvarchar(max); WITH ForeignKeys AS ( SELECT ROW_NUMBER() OVER(ORDER BY fk.object_id) as RowNumber, SCHEMA_NAME(fk.schema_id) as ReferencingSchemaName, OBJECT_NAME(fk.parent_object_id) AS ReferencingTableName, COL_NAME(col.parent_object_id, col.parent_column_id) AS ReferencingColumnName, COL_NAME(col.referenced_object_id, col.referenced_column_id) AS ReferencedColumnName FROM sys.foreign_keys AS fk INNER JOIN sys.foreign_key_columns AS col ON fk.object_id = col.constraint_object_id WHERE fk.referenced_object_id = OBJECT_ID(@SCHEMA + '.' + @TABLE) AND is_disabled = 0 ) SELECT @SQL = ISNULL(@SQL, 'DELETE p FROM [' + @SCHEMA + '].[' + @TABLE + '] p WHERE 1=1') + ' AND NOT EXISTS (SELECT 1 FROM [' + ReferencingSchemaName + '].['+ ReferencingTableName + '] f' + cast(RowNumber as varchar) + ' WHERE f' + cast(RowNumber as varchar) + '.[' + ReferencingColumnName + '] = p.[' + ReferencedColumnName + '])' FROM ForeignKeys EXEC sp_ExecuteSQL @SQL |
Currently, it is only suitable for tables that are referenced using a single column key. I’ll expand it to handle multiple columns in another post.