Friday, December 9, 2011

Deleting Duplicate Records from a Table

Many Time DBA faces a situation , in which they have to remove duplicate records from a table. This is somewhat difficult but we have some methods  that will make it somewhat easier to resolve.


Method 1:
Delete all rowids that is BIGGER than the SMALLEST rowid value (for a given key):
SQL> DELETE FROM table_name A 
  2  WHERE ROWID > ( SELECT min(rowid) 
  3                  FROM table_name B
  4                  WHERE A.key_values = B.key_values );
Method 2:
This method is usually faster. However, remember to recreate all indexes, constraints, triggers, etc. on the table when done.
SQL> create table table_name2 as select distinct * from table_name1;
SQL> drop table table_name1;
SQL> rename table_name2 to table_name1;
Method 3:
SQL> DELETE FROM my_table t1
  2  WHERE EXISTS ( SELECT 'x' FROM my_table t2
  3                 WHERE t2.key_value1 = t1.key_value1
  4                   AND t2.key_value2 = t1.key_value2
  4                   AND t2.rowid      > t1.rowid );
Note: One can eliminate  unnecessary operations by creating an index on the joined fields in the inner loop (no need to loop through the entire table on each pass by a record). This will speed-up the deletion process.
Note 2: If you are comparing NULL columns, use the NVL function. Remember that NULL is not equal to NULL. This should not be a problem as all key columns should be NOT NULL by definition.
Method 4:
This method collects the first row (order by rowid) for each key values and delete the rows that are not in this set:
SQL> DELETE FROM my_table t1
  1  WHERE rowid NOT IN ( SELECT min(rowid) 
  2                       FROM my_table t2
  3                       GROUP BY key_value1, key_value2 );
Note: IF key_value1 is null or key_value2 is null, this still works correctly.
Method 5:
This method is the easiest way to remove duplicate records from a table. But this is not feasible method for large table
1) Put duplicates in a temp table.
 
Create Table table_dups as Select * From evil_table
Group  By col1, col2, col3.... Having count(*) > 1;
2) Remove duplicates from table.
Delete From evil_table where  (col1, col2, col3...) in (Select col1, col2,
col3  From   table_dups);

3) Put duplicate rows back.
Insert into evil_table (col1, col2, col3...) Select col1, col2, col3...
From   table_dups;



No comments:

Post a Comment

Followers