Skip to content

Instantly share code, notes, and snippets.

@jujiro
Created May 20, 2019 17:29
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jujiro/a7daeea6de6f6bde34f77608906ae665 to your computer and use it in GitHub Desktop.
Save jujiro/a7daeea6de6f6bde34f77608906ae665 to your computer and use it in GitHub Desktop.
Initializing a newly added column in a very large table
-- Adding a new column, which you want to initialize with a default value (often true with bit columns)
-- can be tricky. Our gut feeling is to write a quick update statement, or use a construct like the following.
ALTER TABLE dbo.a_very_large_table
ADD test INT NOT NULL DEFAULT(0)
GO
/*
Both these methods should be avoided at all cost. They create a very large transaction, creating a situation of blocking or getting blocked. If blocked, rollback process can take hours.
The following script proposes two methods.
First one uses a controlled update using row count and the second one uses a cursor.
The cursor method is prone to least blocking because it updates one row at a time,
but the update can take a long time. Please read the conclusion at the bottom of the script.
*/
--##### Mass updating of a column in a large table
--##### TEST1 Using controlled rowcount ######
begin try
alter table dbo.a_very_large_table
drop column test
end try
begin catch
end catch
go
begin try
alter table dbo.a_very_large_table
add test bit
end try
begin catch
end catch
go
set nocount on;
set transaction isolation level read uncommitted; -- make sure to include it to avoid block other processes or getting blocked.
print 'About to start updating null values to 0 for test'
select current_timestamp
set rowcount 10000
while (1=1)
begin
update dbo.a_very_large_table set
test=0
where
test is null
if @@rowcount=0
break
end
select current_timestamp
print 'done'
go
set rowcount 0
go
-- Row count 10000 (Not so good)
-- Start: 2014-02-12 10:44:29.550
-- End: 2014-02-12 10:45:41.753
-- Row count 100000 (Better)
-- Start: 2014-02-12 10:29:38.397
-- End: 2014-02-12 10:31:02.823
-- Row count 200000 (Even better)
-- Start: 2014-02-12 10:41:50.973
-- End: 2014-02-12 10:42:02.850
-- Row count 300000 (Best) Potential Problem: Larger rowcount will create a large transaction. Potential rollback may take a long time. May block other processes as well.
-- Start: 2014-02-12 10:43:34.890
-- End: 2014-02-12 10:43:52.370
--##### TEST2 Using Cursors ######
USE ServiceBook
go
begin try
alter table dbo.a_very_large_table
drop column test
end try
begin catch
end catch
go
begin try
alter table dbo.a_very_large_table
add test bit
end try
begin catch
end catch
go
set rowcount 0
set nocount on
print 'about to start updating null values to 0 for test using cursor'
select current_timestamp
declare @crsr cursor
set @crsr=cursor local for
select
e.a_very_large_tableid
from
dbo.a_very_large_table e
where
e.test is null
declare @pk int
open @crsr
while (1=1)
begin
fetch NEXT FROM @crsr INTO
@pk
IF @@fetch_status <> 0
break
UPDATE dbo.a_very_large_table SET
test=0
WHERE
a_very_large_tableId=@pk
end
close @crsr
deallocate @crsr
select current_timestamp
print 'done'
go
set rowcount 0
go
-- Cursor method: Killed it after 5 mins.
-- Conclusion1: Rowcount controlled loops are the best way to mass update columns in large tables.
-- Conclusion2: Setting of rowcount should be picked carefully.
-- Larger rowcount will create a large transaction. Potential rollback may take a long time. May block other processes as well.
-- Divide the total# of rows by a number between 30-50, and use that to control the rowcount.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment