
Part 4 Delete duplicate rows in sql

1002 ratings | 364677 views
Link for all dot net and sql server video tutorial playlists
http://www.youtube.com/user/kudvenkat/playlists

Link for slides, code samples and text version of the video
http://csharp-video-tutorials.blogspot.com/2014/05/part-4-delete-duplicate-rows-in-sql.html

In this video, we will discuss deleting all duplicate rows except one from a SQL Server table.

SQL script to create the Employees table

    Create table Employees
    (
        ID int,
        FirstName nvarchar(50),
        LastName nvarchar(50),
        Gender nvarchar(50),
        Salary int
    )
    GO

    Insert into Employees values (1, 'Mark', 'Hastings', 'Male', 60000)
    Insert into Employees values (1, 'Mark', 'Hastings', 'Male', 60000)
    Insert into Employees values (1, 'Mark', 'Hastings', 'Male', 60000)
    Insert into Employees values (2, 'Mary', 'Lambeth', 'Female', 30000)
    Insert into Employees values (2, 'Mary', 'Lambeth', 'Female', 30000)
    Insert into Employees values (3, 'Ben', 'Hoskins', 'Male', 70000)
    Insert into Employees values (3, 'Ben', 'Hoskins', 'Male', 70000)
    Insert into Employees values (3, 'Ben', 'Hoskins', 'Male', 70000)

The delete query should delete all duplicate rows except one. Here is the SQL query that does the job. PARTITION BY divides the query result set into partitions.

    WITH EmployeesCTE AS
    (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS RowNumber
        FROM Employees
    )
    DELETE FROM EmployeesCTE WHERE RowNumber > 1
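Before running the delete, it can help to see which rows it would remove. The following is a minimal sketch, assuming the Employees table and sample rows from the description: it reuses the same CTE but swaps the DELETE for a SELECT, so nothing is changed.

```sql
-- Preview only: this statement does not modify the Employees table.
WITH EmployeesCTE AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS RowNumber
    FROM Employees
)
SELECT *
FROM EmployeesCTE
WHERE RowNumber > 1;   -- the rows the DELETE would remove
```

Each partition (one per ID) numbers its rows 1, 2, 3, and so on. Row 1 of every partition is the copy that is kept, and every row numbered above 1 is treated as a duplicate.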
Text Comments (122)
Aravind Kumar Eriventy (3 years ago)
Great, it works for me. Saved a lot of time. Thank you very much.
Sql Server (11 months ago)
Try this ... https://youtu.be/yC8pZabO5Sg
Rajeev Kumar Jha (2 years ago)
Yes, your 2nd and 3rd records are also deleted. Keep it up.
kudvenkat (3 years ago)
+aravind kumar eruventy Thanks a bunch for taking your valuable time to give feedback. Means a lot. Glad you found the videos useful.
Dot Net & SQL Server videos for aspiring software developers: https://www.youtube.com/user/kudvenkat/playlists?view=1&sort=dd
If you need videos for offline viewing, you can order them using the link below: http://www.pragimtech.com/Order.aspx
Code Samples, Text Version of the videos & PPTS on my blog: http://csharp-video-tutorials.blogspot.com
Tips to effectively use our channel: https://www.youtube.com/watch?v=y780MwhY70s
Want to receive email alerts when new videos are uploaded? Please subscribe to our channel using the link below: http://www.youtube.com/subscription_center?add_user=kudvenkat
Please click the THUMBS UP button below the video if you liked them. Thank you for sharing these links with your friends.
Best
Venkat
Ani Kos (2 months ago)
Thanks, this helped.
bob waithaka (3 months ago)
Simple and working fine.
kiran kumar (4 months ago)
u r hunting the quires....straighht forward and indirectly ....great
pavani patchipulusu (4 months ago)
sir, can we delete duplicate rows without using cte?
shankar jadhav (5 months ago)
Great work thank you..
Jansi Rani (5 months ago)
Can you explain differences between union and union all, coalesce and isnull, how to optimise sql queries.
Mohammed Rafeeq (6 months ago)
Thanks, very good video, but my database is MySQL/MariaDB, which doesn't support ROW_NUMBER() and the OVER clause. How do I upgrade to Oracle database server without losing data?
Dan Rowe (6 months ago)
Thanks for the video. You provided exactly what I needed to get started with my query.
Lester M (7 months ago)
Bojan Terzija (7 months ago)
Thank you very much!
Giancarlo Herrera (8 months ago)
It works for me, Thanks!
Nirav Shah (8 months ago)
What if ID is identity Column? is there any way we can do this with identity column?
valdy 13 (8 months ago)
This really happening
Rohan Kayande (8 months ago)
I have a problem with this query:
    with rk_cte as ( select ROW_NUMBER() over (PARTITION by name order by name) as result from rk )
    delete from rk where result > 1
Error: Invalid column name 'result'. Please check it out. I have only one column in my table, i.e. name, with values like a, a, b, b, c, c, c, d.
Tech Info (10 months ago)
Note: #1831 Duplicate index `idauteur_4`. This is deprecated and will be disallowed in a future release. What is the solution to this problem, please?
Lokeswaran .N (11 months ago)
If the data does not contain an ID, then how should we delete?
Nice Informations share here...I also share something...http://bit.ly/2k7v01q
Azman Amir (11 months ago)
"relation "employeecte" does not exist" I got this error in postgres database when I try to delete. Can someone help me. Thank you!
Agent M (4 months ago)
I can select * from CTE but I can't DELETE FROM CTE which is so dumb. why
sowmya mukthavaram (11 months ago)
Can u pls help me how to delete duplicates in prod
Stalin S.P (7 months ago)
Either prod or dev query is same
Mahesh Gupta (1 year ago)
Hi Venkat , I have a question ,I want to keep a track of the duplicate records I am deleting and I want to store that in a table before delete is that possible using CTE ,IF not then what should be the best way to do it in SQL ? Thanks
Semi Kolon (1 year ago)
Palanivelu samudi (1 year ago)
Thanks, very helpful
Jayjay F (1 year ago)
Can’t you just use distinct?
Dan Philip Minguito (1 year ago)
Thank you
Pavan N (1 year ago)
NIKHIL DEVNANI (1 year ago)
can i removes these tuples with the help of self join
darpan waghchawre (1 year ago)
how to load specific record by dts in database ? can you please explan me
Sunil Singh (1 year ago)
what will happen if there is no any ID column in table and how could I remove duplicacy. And partition concept is not clear
saif khan (1 year ago)
how can i download this videos please let me know
Abhinay kanaparthi (1 year ago)
when am trying to execute this am getting an error "invalid object name employees" can u plz explain
BINGOVIDS (1 year ago)
Change the database name, where the table employees is created.
Aravind Reddy (1 year ago)
Brilliant Sir :-) kudos
Explained better in this video about DELETE, TRUNCATE & DROP https://www.youtube.com/watch?v=t2qoqI5TjSk&t=4s
Irwan Hermawan (1 year ago)
Thanks for the tutorial
Eng Hazymeh (1 year ago)
very useful example , have a great day
Alok Tripathi (1 year ago)
Thank you Venkut Sir, I learn a lot from your videos
Dejen Wogayehu (1 year ago)
The WITH clause is not recognized in a MySQL database. What is the alternative?
Billion Dollars idea (1 year ago)
can you please write a code for the mysql, i have tried a lot on YouTube but nothing helpful
arjun yadav (1 year ago)
Just one thing I don't understand: You're deleting from the EmployeeCTE, but how come it delete the row in the Employee table?
murali subramanian (1 year ago)
What if there is no ID...
sakshi singh rawat (1 year ago)
this is very useful video and understandable.... Thanku so much
Hi Venkat, you are a rock star for beginners learning SQL. Without the basics there is nothing in any course. With your course and videos about SQL I got placed in Deloitte. In every interview, interviewers never go to the advanced level without asking basic-level questions. If you are very strong in the basics, you can manage and convince the interviewer with basic knowledge, and that creates a great impact. Your videos are a reference for any SQL interview and for every concept. Thank you so much for your help.
Aishwarya Saran (2 years ago)
There are 4 tables named january, february, march and april holding the month-wise salary of 5 employees. I want the result in one table that shows each employee's name and the sum of his salary over all 4 months (jan+feb+march+april). Sir, help me with this query.
Devidas Devadig (1 year ago)
Sarvajit Kumar (2 years ago)
you are great
gowtham asohan (2 years ago)
HI, how to remove the duplicate values in listagg fn in sql...
Hiraji Jadhav (2 years ago)
Hi Venkat sir, I recently attended an interview where the interviewer asked me a SQL query question. There are two tables, Item and ItemRate.

Item table:
ItemId  ItemName
1       Parker
2       Marie
3       Good Day
4       Monaco

ItemRate table:
ItemId  ItemRate
1       5
3       20

The query was:
    Select I.ItemName from Item I left join ItemRate R on I.ItemId != R.ItemId where R.ItemRate > 5
What is the output of the above query? Please explain.
hareped (2 years ago)
    CREATE TABLE Item
    (
        ItemId INT NOT NULL,
        ItemName nvarchar(25)
    )
    INSERT INTO Item VALUES (1, 'Parker'), (2, 'Marie'), (3, 'Good Day'), (4, 'Monaco');

    CREATE TABLE ItemRate
    (
        ItemId INT NOT NULL,
        ItemRate INT
    )
    -- ItemRate rows as given in the interview question: (1, 5) and (3, 20)
    INSERT INTO ItemRate VALUES (1, 5), (3, 20);

    SELECT * FROM Item
    SELECT * FROM ItemRate

    Select I.ItemName from Item I left join ItemRate R on I.ItemId != R.ItemId where R.ItemRate > 5
suresh kumar (2 years ago)
Hi Hiraji Jadhav, there are 2 things happening here.
1) When the left join is applied with the "not equal to" sign, every Item row is paired with every ItemRate row whose ItemId is different. Only the pairs with equal IDs (1=1 and 3=3) are left out. The result looks like the below.
(Query: select * from #Item1 I left join #ItemRate R ON I.itemid != R.itemid)

ItemID  ItemName  ItemID  ItemRate
1       Parker    3       20
2       Marie     1       5
2       Marie     3       20
3       Good Day  1       5
4       Monaco    1       5
4       Monaco    3       20

2) The WHERE clause has "R.ItemRate > 5", so only the rows paired with ItemRate 20 remain. That leaves 3 items, i.e. Parker, Marie & Monaco. Good Day is only paired with ItemRate 5, so it is filtered out.
Hope this helps.
Hiraji Jadhav (2 years ago)
suresh kumar Yup But I want to know how it comes
suresh kumar (2 years ago)
Yeah Hiraji Jadhav, the answer is all but 'Good Day', that is, 'Parker, Marie & Monaco'. I overlooked "Not Equal" Condition in "I.ItemID!=R.ItemID" and framed "Equal" condition in my earlier query.
Hiraji Jadhav (2 years ago)
suresh kumar Appreciate your reply sir but answer is different actually condition is I.itemId != R.ItemId
fisal eljadwi (2 years ago)
Thanks a lot for this video, but I have a question. You've deleted rows based on duplicated IDs. In a real environment, duplicate rows are supposed to be identified on multiple columns. Am I right?
Richard Cooper (2 years ago)
WOW thank you so much for this video!!! I always knew how to select on recordsets with duplicates using a distinct statement but had never thought about using a CTE to delete the duplicates. You're a genius!! :-)
Ott Miller (2 years ago)
Thank you Kudvenkat
Danilo Sales (2 years ago)
That really helped me! Thx a lot.
Dan Cacovean (2 years ago)
cool stuff!!!
Rohit Patil (2 years ago)
hi great videos ....can you please share MSBI videos please....
sarbrinder singh (2 years ago)
Doesn't work shows error on WITH
vinod chandak (2 years ago)
CTE will not work when the data volume is huge.In that case,how do we write the code?
Siyamand Rashid (2 years ago)
Thanks a lot for all these efforts. I have a question regarding finding duplicate values. In most cases the ID is the PK and a sequential number, which does not allow duplicates. So how do you find duplicates in your example if the name, gender and salary are the same but the rows have different IDs? Many thanks
Alex/Nate Tarantino (2 years ago)
Very helpful tutorial! Thank you +kudvenkat!
Hareesh A (2 years ago)
This is a good example. I have a question. Consider Production Database scenario example my table have a million records. it is difficult to find which rows are duplicate. before deleting duplicate rows I wanted to know which rows have the duplicate. How to find duplicate records? Thanks
hareped (2 years ago)
ADIL KEVIN (2 years ago)
JavaScript interview questions I want
shri kant verma (2 years ago)
you are excellent instructor! example was very helpful. Thanks a lot!
TallCoolDrink (2 years ago)
So, when you delete from the CTE, you're actually deleting from the Employee table?
Shanmukh shan (2 years ago)
hi please explain about surrogate key
Shanmukh shan (2 years ago)
great job, nice videos and good explanation........thank u
bcollender (2 years ago)
Not all heroes wear capes! thanks buddy
Afif Khaja (2 years ago)
Mr. Kudvenkat, you are an excellent instructor! The code was very well explained and the example was very helpful. Thanks!
sumit kumar (3 years ago)
Sir I have a table with 6 columns no id and no constraints . I have some data in table like 5 out of those 6 columns are identical . How to delete those data ? Like a columns is flag . SO I have to delete those data which belongs to a particular value of that flag column .
hareped (2 years ago)
Dear Sumit Kumar, great question. Try this:

    WITH EmployeesCTE AS
    (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3, Col4, Col5 ORDER BY Col1) AS RowNumber
        FROM Employees
    )
    DELETE FROM EmployeesCTE WHERE RowNumber > 1

Do the partition by 5 different columns.

    CREATE TABLE Employees
    (
        ID int,
        FirstName nvarchar(50),
        LastName nvarchar(50),
        Gender nvarchar(50),
        Salary int
    )
    GO

    INSERT INTO Employees VALUES (1, 'Mark', 'Hastings', 'Male', 60000);
    INSERT INTO Employees VALUES (1, 'Mark', 'Hastings', 'Male', 60000);
    INSERT INTO Employees VALUES (1, 'Mark', 'Hastings', 'Male', 60000);
    INSERT INTO Employees VALUES (2, 'Mary', 'Lambeth', 'Female', 30000);
    INSERT INTO Employees VALUES (2, 'Mary', 'Lambeth', 'Female', 30000);
    INSERT INTO Employees VALUES (3, 'Ben', 'Hoskins', 'Male', 70000);
    INSERT INTO Employees VALUES (3, 'Ben', 'Hoskins', 'Male', 70000);
    INSERT INTO Employees VALUES (3, 'Ben', 'Hoskins', 'Male', 70000);
    INSERT INTO Employees VALUES (1, 'Mark', 'Hastings', 'Male', 60000);
    INSERT INTO Employees VALUES (2, 'Mary', 'Lambeth', 'Female', 30000);
    INSERT INTO Employees VALUES (3, 'Ben', 'Hoskins', 'Male', 70000);
    INSERT INTO Employees VALUES (25, 'Mark', 'Hastings', 'Male', 60000);
    INSERT INTO Employees VALUES (10, 'Mark', 'Hastings', 'Male', 60000);
    INSERT INTO Employees VALUES (11, 'Mark', 'Hastings', 'Male', 60000);
    INSERT INTO Employees VALUES (21, 'Mary', 'Lambeth', 'Female', 30000);
    INSERT INTO Employees VALUES (22, 'Mary', 'Lambeth', 'Female', 30000);
    INSERT INTO Employees VALUES (30, 'Ben', 'Hoskins', 'Male', 70000);
    INSERT INTO Employees VALUES (31, 'Ben', 'Hoskins', 'Male', 70000);
    -- The statement before WITH must end with a semicolon
    INSERT INTO Employees VALUES (33, 'Ben', 'Hoskins', 'Male', 70000);

    WITH EmployeesCTE AS
    (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY LastName, FirstName, Gender, Salary ORDER BY LastName) AS RowNumber
        FROM Employees
    )
    DELETE FROM EmployeesCTE WHERE RowNumber > 1;

    SELECT * FROM Employees;
vaibhav vishe (3 years ago)
This should not delete rows when only the ID is duplicated and the other columns differ. A row is a duplicate only when all of its column values match another row. If only one column is duplicated and the others are not, the row should not get deleted, but here it gets deleted anyway. So please check once again with data where only the ID value is duplicated and the name and other columns are not.
sweet yogurt (2 years ago)
Hi, I think the query needs a correction. If 2 rows have the same "ID" but different values in the other columns, one of them will still get deleted. Correction: add all column names to the PARTITION BY clause.
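The correction suggested above can be sketched as follows, assuming the five-column Employees table from the video description. Because every column is listed in PARTITION BY, two rows land in the same partition only when all of their values match.

```sql
WITH EmployeesCTE AS
(
    SELECT *,
           ROW_NUMBER() OVER (
               -- Rows are duplicates only if every column matches
               PARTITION BY ID, FirstName, LastName, Gender, Salary
               ORDER BY ID
           ) AS RowNumber
    FROM Employees
)
DELETE FROM EmployeesCTE
WHERE RowNumber > 1;
```

With this version, two rows that share an ID but differ in, say, Salary fall into separate partitions, each gets RowNumber 1, and neither is deleted.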
Tanmay Patil (3 years ago)
Thanks so much!
Abhishek Maurya (3 years ago)
Great man your all the videos are Awesome Hats off kudvenkat
Ashwini Bandgar (3 years ago)
Hello Sir, I am trying to write the same query using a subquery, but it gives an error.
    SELECT * FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS RowNumber FROM Employees) result
    DELETE FROM result WHERE RowNumber > 1
It gives the error: Invalid object name 'result'. Will you please help me see where I am going wrong? Thanks, Ashwini B
Dibyajyoti Chowdhury (2 years ago)
A subquery exists only for a single statement, not for multiple statements. For multiple statement execution, temp tables or a CTE are used.
El Cubano (3 years ago)
+Ashwini Bandgar Use Results instead of Result and it will work.
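For the subquery question in this thread, one possible approach in SQL Server is to keep the derived table and the DELETE in a single statement, so the alias is still in scope when the delete runs. This is a sketch against the Employees table from the description, not the method shown in the video.

```sql
-- The alias "result" is only visible inside this one statement,
-- so the DELETE and the derived table must be written together.
DELETE result
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS RowNumber
    FROM Employees
) AS result
WHERE result.RowNumber > 1;
```

The original attempt failed because the SELECT and the DELETE were two separate statements, and the alias from the first no longer existed when the second one ran.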
Kris Maly (3 years ago)
Revisiting to refresh my memory. I enjoyed watching this video and recommend others. Thanks for educating the community and appreciate all your efforts
Mks (3 years ago)
great effort in all video.....very clear explanation
Elio Rivas (3 years ago)
hey bro, i have a table called pacgold and colums `_url`, `_position`, `empresa`, `objetivo`, `ubicacion`, `email`, `numero`, `website` how can i do it? i'm using phpmyadmin
William Xu (3 years ago)
Just one thing I don't understand: You're deleting from the EmployeeCTE, but how come it delete the row in the Employee table?
Rahul Kohli (1 year ago)
Also, another question could be: what if the CTE joins two tables and the user decides to delete rows from that CTE? In this case SQL Server will give an error.
Rahul Kohli (1 year ago)
That's a good question. When working with CTEs you have to understand that they are not normal tables but just "table expressions". That means a CTE is defined over an already existing normal table. In other words, a CTE always works over the underlying table, so any modification of the CTE's rows is actually performed on the underlying table.
Shubham Kumar (1 year ago)
William Xu CTE is an inbuilt function. Common Table Expression. It doesn't affect table name.
arjun yadav (1 year ago)
same doubt to me also....have u got answer for this kindly share......
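To see that deleting through the CTE really changes the base table, the row count of Employees can be checked before and after. This is a small sketch assuming the Employees table and sample rows from the description.

```sql
-- Row count of the base table before the delete
SELECT COUNT(*) AS RowsBefore FROM Employees;

-- The CTE is only a named query over Employees, so the DELETE
-- is applied to the rows of Employees itself.
WITH EmployeesCTE AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS RowNumber
    FROM Employees
)
DELETE FROM EmployeesCTE
WHERE RowNumber > 1;

-- Row count of the base table after the delete
SELECT COUNT(*) AS RowsAfter FROM Employees;
```

With the eight sample rows from the description, the count should drop from 8 to 3, one row per ID.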
Manish Vasuki (3 years ago)
awesome,thanks a lot for your explanation.Can u post the video for lock in sql server.
Anuj Pal (3 years ago)
When I run this command it shows an error like: Recursive common table expression 'Tbl_DeleteMulRow' does not contain a top-level UNION ALL operator
Kris Maly (3 years ago)
Awesome  Q & A Revisiting I enjoyed watching this video and recommend others to watch Thanks a lot
Mohit Chawla (4 years ago)
Good job!! But in this case we partitioned on the basis of ID, which was the same for the records with the same values. What if the values of the records are the same but the IDs are unique, and we want to remove them, and we don't have any such column that we can partition on correctly?
George Roman (4 years ago)
You are great!!! Thanks a lot,  I save a lot of time of my life, thanks to you..... :)
Adlai King (4 years ago)
This is a very interesting video, keep up good job.
Anil Vemana (4 years ago)
Sir, we can directly use "Select Distinct * from Employee" instead of using a CTE. Using DISTINCT is easy compared to a CTE, I think. What is your opinion? Please reply. Thanks
Anil Vemana (2 years ago)
Thanks fur your comments
missyneon999 (3 years ago)
+Anil Vemana That's what I was thinking at first, but I think a CTE is better if you have a result set containing different rows with the same ID but different values, i.e. rows with the same ID value but different salary values. DISTINCT won't work in this case because it only removes duplicate rows that are completely identical, i.e. with identical values in every column. So, a CTE will work if you use either the ROW_NUMBER or DENSE_RANK ranking functions, depending on your requirements.
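To make the DISTINCT discussion concrete: DISTINCT only filters the output of a SELECT and leaves the table itself unchanged. One way to use it for de-duplication is to copy the distinct rows into a new table. This is a sketch only, and the table name Employees_Distinct is a made-up name for illustration.

```sql
-- Copy one instance of each fully identical row into a new table.
-- Employees_Distinct is a hypothetical name; it must not exist yet.
SELECT DISTINCT *
INTO Employees_Distinct
FROM Employees;

-- DISTINCT compares every column, so rows that share an ID but
-- differ in another column (for example Salary) are all kept.
SELECT * FROM Employees_Distinct;
```

This works when duplicates are identical in every column. When "duplicate" means matching on only some columns, the ROW_NUMBER approach lets you choose exactly which columns define a duplicate.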
itsme4you (4 years ago)
Does it work on Oracle? Oracle CTE is only supporting SELECT statement. It throws error when I attempt to delete rows from CTE.
Saurabh Singh Maurya (3 months ago)
I also tried the same delete in Oracle and it throws an error. Please let me know if you found a solution.
Kris Maly (4 years ago)
Harambe (4 years ago)
thanks, was very helpful.
Kris Maly (4 years ago)
Good video
Girijesh Kumar (4 years ago)
thanks sir.........
ola Odusanya (4 years ago)
Many thanks Venkat,you are a genius! Bless you.Cheers
Pradeep Kumar (4 years ago)
Difference between the Stored Procedures and Functions? Can you please record this video and upload. I face this question many times....  Thanks in advance.......!
#Stuti# #Tehri# (4 years ago)
Saagar Soni (4 years ago)
thnx a lot sir
Joe S (4 years ago)
Great video, thanks!! Question: in the video, you are assuming that you know that there are duplicate rows in the table.. How can we alter the query to check first for duplicate rows? Meaning, based on your example, what if we had rows with same value for ID but different values for other columns.. In that case they would not be duplicates.. How to handle such scenario? Thanks again
KSREDDY K (4 years ago)
+kudvenkat By using physical location(%%PHYSLOC%%) can we find which row inserted first and which one is new up to now in a table ...,so that based can we delete latest or else old records based on requirement??? is there any possibility like that........,
Mr Irrepressible (4 years ago)
you could use this query to check for duplicate rows (you'll need to alter the column name to suit your table): SELECT UserName,  COUNT(UserName) AS NumOccurrences FROM MyTable GROUP BY UserName HAVING ( COUNT(UserName) > 1 )
kudvenkat (4 years ago)
Very good question. As a developer my next question will be what are the conditions then for considering the rows to be duplicate, based on which the query logic depends. Hope this makes sense to you.
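Following on from the replies in this thread, a duplicate check that treats rows as duplicates only when every column matches could look like the sketch below. It assumes the Employees table from the description. Which columns belong in the GROUP BY depends on your own definition of a duplicate.

```sql
-- List each fully identical row that occurs more than once,
-- together with how many times it occurs.
SELECT ID, FirstName, LastName, Gender, Salary,
       COUNT(*) AS NumOccurrences
FROM Employees
GROUP BY ID, FirstName, LastName, Gender, Salary
HAVING COUNT(*) > 1;
```

Rows with the same ID but different values in the other columns end up in different groups, so they are not reported as duplicates by this query.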
Pradeep Kumar (4 years ago)
Thank You very much for your response in uploading this video sir.........
Khalid Afridi (4 years ago)
Excellent solution to delete the duplicate data using partition function. You are awesome dear, not only this one but all the other videos specially the SQL are fabulous and very much informative and very easy to understand. We learn a lot from your video series. Be blessed and thanks alot.  
