Hints about a database cleanup job to delete old records from a time-series table

First of all, the best approach (my own idea) is to use partitioning in databases that support it. If that is not possible (e.g. SQLite), you have to pay close attention to performance. In such cases, if the table has large dependent files, one solution is to first use a logical (soft) delete so the dependent files and images can be removed, and then perform the physical delete at longer intervals, when the system load is lighter.
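
A rough sketch of that soft-delete idea, written in T-SQL to match the rest of these notes (the same pattern applies to SQLite); the table and column names are hypothetical:

-- Hypothetical schema: a time-series table with an IsDeleted flag.
-- Step 1 (cheap, runs often): soft-delete rows older than 90 days.
UPDATE dbo.SensorReadings
SET IsDeleted = 1
WHERE IsDeleted = 0
  AND RecordedAt < DATEADD(DAY, -90, SYSUTCDATETIME());

-- Step 2 (separate job): remove the attached files and images for
-- soft-deleted rows, e.g. from application code or a file-cleanup task.

-- Step 3 (heavy, runs during off-hours): physically delete the rows.
DELETE TOP (1000) FROM dbo.SensorReadings
WHERE IsDeleted = 1;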

In any case, the following points are important when implementing the delete process:

If your query affects 5,000 or more rows in the same table, that table gets locked for the duration of the operation. This is standard SQL Server behavior: every DELETE takes a lock on the affected row, and once about 5,000 row locks accumulate on the same table, SQL Server escalates from row locks to a table lock. Deleting in batches smaller than that threshold avoids the escalation:

DECLARE @SomeCriteria int = 42;   -- placeholder for the actual filter value
DECLARE @MoreRowsToDelete bit = 1;

WHILE @MoreRowsToDelete = 1
BEGIN
    -- Delete in batches of 1000 rows, well below the ~5000-lock
    -- escalation threshold.
    DELETE TOP (1000) FROM MyTable WHERE MyColumn = @SomeCriteria;

    -- Stop once no matching rows remain.
    IF NOT EXISTS (SELECT 1 FROM MyTable WHERE MyColumn = @SomeCriteria)
        SET @MoreRowsToDelete = 0;
END

One note: you need an index on the column by which you are searching for the rows to delete. Without one, SQL Server acquires many more locks while it scans for the rows to delete.
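
For example, a minimal sketch (the index and table names are illustrative, reusing the placeholders above):

-- Index the column used in the DELETE's WHERE clause so SQL Server can
-- seek to the matching rows instead of scanning the whole table.
CREATE NONCLUSTERED INDEX IX_MyTable_MyColumn
    ON dbo.MyTable (MyColumn);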

When you delete a large number of records, several things can happen.

First, you may have cascading deletes turned on, in which case deleting 10,000 records from the main table could end up deleting millions of records once all the child tables are taken into account. Even without cascading deletes, if there are many child tables, each one has to be checked for referencing records before the delete can take place.
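
If you are not sure whether cascading deletes are involved, SQL Server's catalog views can show which foreign keys reference the table and what their delete action is; a sketch, with dbo.MyTable as a placeholder:

-- List foreign keys that point at MyTable and their ON DELETE behavior.
SELECT fk.name                            AS foreign_key,
       OBJECT_NAME(fk.parent_object_id)   AS child_table,
       fk.delete_referential_action_desc  AS on_delete
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.MyTable');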

Further, a delete like this might escalate the locking on the table so that no one else can use it while the delete is taking place. This is even more of a problem with cascading deletes, since the process takes longer and many more tables might get locked.
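
If escalation itself becomes the bottleneck, SQL Server 2008 and later can disable lock escalation per table. This trades one table lock for many more row locks (and the memory they consume), so treat it as an option to test, not a default:

-- Optional: prevent row locks on MyTable from escalating to a table lock.
ALTER TABLE dbo.MyTable SET (LOCK_ESCALATION = DISABLE);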

Another thing that can make a large delete especially slow is triggers on the table, particularly triggers that contain some sort of loop.
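
A quick way to check whether the table has triggers that could slow the delete down, again with dbo.MyTable as a placeholder:

-- List triggers defined on MyTable and whether they are enabled.
SELECT t.name, t.is_disabled
FROM sys.triggers AS t
WHERE t.parent_id = OBJECT_ID(N'dbo.MyTable');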

In general, this technique is used when deleting a large number of records in one statement would take too long and cause blocking in the meantime. You go by batches because that is generally faster than deleting one row at a time.

This is not a technique you need to use every time you delete a batch of records. If the delete would take only a few seconds on a table with no child records and no triggers, there is no need. Likewise, if the table is not one that others would normally be accessing at the same time as the delete (for example, a job that runs during off hours), you can skip it.

It should be mentioned that deleting everything at once is usually fastest, and the concerns above are still valid: fastest is good most of the time, except when it causes blocking. In that case it is better to have a slower process that lets others keep working.

As well as preventing lock escalation, keeping the batches small can also prevent the transaction log from having to grow if the database is in simple recovery mode.
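
A sketch of that, reusing the placeholder names from the loop above (the CHECKPOINT and the delay are my own optional additions, not part of the original answers): each batch commits as its own statement, so the log space can be reused between batches in simple recovery mode.

DECLARE @SomeCriteria int = 42;   -- placeholder filter value, as above

WHILE 1 = 1
BEGIN
    -- Each batch commits separately, keeping the active log small.
    DELETE TOP (1000) FROM MyTable WHERE MyColumn = @SomeCriteria;

    IF @@ROWCOUNT = 0 BREAK;      -- stop once a batch deletes nothing

    CHECKPOINT;                   -- allow log space reuse (simple recovery)
    WAITFOR DELAY '00:00:01';     -- give other sessions a chance to run
END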

References:
https://stackoverflow.com/q/20499270/4983201
https://stackoverflow.com/a/19406437/4983201
