Initializing a newly added column in a very large table
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- Adding a new column, which you want to initialize with a default value (often true with bit columns) | |
-- can be tricky. Our gut feeling is to write a quick update statement, or use a construct like the following. | |
ALTER TABLE dbo.a_very_large_table | |
ADD test INT NOT NULL DEFAULT(0) | |
GO | |
/* | |
Both these methods should be avoided at all cost. They create a very large transaction, creating a situation of blocking or getting blocked. If blocked, rollback process can take hours. | |
The following script proposes two methods. | |
First one uses a controlled update using row count and the second one uses a cursor. | |
The cursor method is prone to least blocking because it updates one row at a time, | |
but the update can take a long time. Please read the conclusion at the bottom of the script. | |
*/ | |
--##### Mass updating of a column in a large table | |
--##### TEST1 Using controlled rowcount ###### | |
begin try | |
alter table dbo.a_very_large_table | |
drop column test | |
end try | |
begin catch | |
end catch | |
go | |
begin try | |
alter table dbo.a_very_large_table | |
add test bit | |
end try | |
begin catch | |
end catch | |
go | |
set nocount on; | |
set transaction isolation level read uncommitted; -- make sure to include it to avoid block other processes or getting blocked. | |
print 'About to start updating null values to 0 for test' | |
select current_timestamp | |
set rowcount 10000 | |
while (1=1) | |
begin | |
update dbo.a_very_large_table set | |
test=0 | |
where | |
test is null | |
if @@rowcount=0 | |
break | |
end | |
select current_timestamp | |
print 'done' | |
go | |
set rowcount 0 | |
go | |
-- Row count 10000 (Not so good) | |
-- Start: 2014-02-12 10:44:29.550 | |
-- End: 2014-02-12 10:45:41.753 | |
-- Row count 100000 (Better) | |
-- Start: 2014-02-12 10:29:38.397 | |
-- End: 2014-02-12 10:31:02.823 | |
-- Row count 200000 (Even better) | |
-- Start: 2014-02-12 10:41:50.973 | |
-- End: 2014-02-12 10:42:02.850 | |
-- Row count 300000 (Best) Potential Problem: Larger rowcount will create a large transaction. Potential rollback may take a long time. May block other processes as well. | |
-- Start: 2014-02-12 10:43:34.890 | |
-- End: 2014-02-12 10:43:52.370 | |
--##### TEST2 Using Cursors ###### | |
USE ServiceBook | |
go | |
begin try | |
alter table dbo.a_very_large_table | |
drop column test | |
end try | |
begin catch | |
end catch | |
go | |
begin try | |
alter table dbo.a_very_large_table | |
add test bit | |
end try | |
begin catch | |
end catch | |
go | |
set rowcount 0 | |
set nocount on | |
print 'about to start updating null values to 0 for test using cursor' | |
select current_timestamp | |
declare @crsr cursor | |
set @crsr=cursor local for | |
select | |
e.a_very_large_tableid | |
from | |
dbo.a_very_large_table e | |
where | |
e.test is null | |
declare @pk int | |
open @crsr | |
while (1=1) | |
begin | |
fetch NEXT FROM @crsr INTO | |
@pk | |
IF @@fetch_status <> 0 | |
break | |
UPDATE dbo.a_very_large_table SET | |
test=0 | |
WHERE | |
a_very_large_tableId=@pk | |
end | |
close @crsr | |
deallocate @crsr | |
select current_timestamp | |
print 'done' | |
go | |
set rowcount 0 | |
go | |
-- Cursor method: Killed it after 5 mins. | |
-- Conclusion1: Rowcount controlled loops are the best way to mass update columns in large tables. | |
-- Conclusion2: Setting of rowcount should be picked carefully. | |
-- Larger rowcount will create a large transaction. Potential rollback may take a long time. May block other processes as well. | |
-- Divide the total# of rows by a number between 30-50, and use that to control the rowcount. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment