Topic: deleting duplicate data in many rows (Excel 2000) @ AskWoody

deleting duplicate data in many rows (Excel 2000)
Home » Forums » AskWoody support » Productivity software by function » Visual Basic for Applications » deleting duplicate data in many rows (Excel 2000)
- This topic has 12 replies, 5 voices, and was last updated 23 years ago.
Author

Topic
New Reply

WSjujuraf
AskWoody Lounger

June 11, 2002 at 9:47 pm #372163

Hi all,
I have 20,000+ rows of data in multiple cols. I need to be able to occasionally go through this data and delete duplicates.

In this case a duplicate is defined as any occurrence of the same data in more than one row. In any given row, the data can occur more than once in one or more columns, however.

Example (3 data rows)
This is OK.
AA, BB, AA, DD
EE, KK, LL, MM
ZZ, FF, FF, WW

This is NOT ok: (3 data rows)
AA, BB, AA, DD
EE, KK, LL, AA
ZZ, FF, FF, AA

Rows 2 and 3 contain data that already exist in previous rows (AA is found in row 1). These last two rows would get deleted.

I think I need VBA here, not a formula. It’d have to be some triply nested loop I think. One for the whole data set, one within sequencial dupes in one column, and then one to check all columns for a pair of cells. The code would also have to keep track of the “used” data so that it searches through the list of previous data and if any matches are found (for the new rows) that row is deleted.

This is more than a search/replace since there is no known data to search for (it’s any duplicate data).

I’ll start working on the algorithm now but wanted to post this to the group in case someone comes up with a good methodology. I’ll post what I came up with when I get it working.

Thnx, Deb

Reply | Quote

Viewing 1 reply thread
Author

Replies
- H. Legare Coleman
  AskWoody Plus
  
  June 11, 2002 at 10:16 pm #593947
  
  A couple of questions:
  
  1- How many columns are involved in the check for duplicates (it looks like four from your example)?
  
  2- What should happen in this case:
  
  AA, BB, AA, DD
  EE, KK, LL, AA
  ZZ, FF, FF, DD
  EE,GG,GG,HH
  
  Reply | Quote
- WSjujuraf
  AskWoody Lounger
  
  June 11, 2002 at 10:31 pm #593948
  
  The # of cols is not fixed, but I was going to assume that either the source data is the current worksheet or a selected range before starting the search. I’d then just query for the # of cols that contain data for the known rows.
  
  I was thinking of first making a pass through the data to gather all the possible data and store them in a collection (which would by definition get rid of dups for purpose of searching). This collection would be my compare list when I’m checking if any given data for a cell has already been found. If yes, I’d increment a counter by 1 and that would tell me that the data has already occured. I was going to create a user defined data type to store in the collection (e.g., DataFound as variant, DataCount as integer, DataRow as long). If I’m still on the same row, DataCount can be > 1 as long as DataFound has not already been found in a previous row. (I know this because DataCount = 1 and DataRow current row for the active cell.) This is quite confusing I know.
  
  To answer your 2nd question, rows 2,3,4 would be deleted since they all contain at least one data element that exists in a previous row. (For row 2, AA appears in row 1; for row 3, DD appears in row 1; for row 4, EE appears in row 2).
  
  I haven’t started this algorithm yet as I’m working on something else but will try it later tonight.
  
  Thanks for looking into this,
  Deb
  
  Reply | Quote
  
  WSjscher2000
  AskWoody Lounger
  
  June 12, 2002 at 2:03 am #593992
  
  This is some strange data. If you don’t care what row the data is (was) in, or how it was grouped with anything else, you might take each non-empty cell, copy it to a single column on a separate sheet, sort that, and then the work of killing duplicates should be simple.
  
  Reply | Quote
  
  H. Legare Coleman
  AskWoody Plus
  
  June 12, 2002 at 3:04 am #594003
  
  But if 2,3 and 4 should be deleted, then EE would not appear in the results. Is that what you want? Or, would row four not be deleted because row two got deleted before row four is checked?
  
  Reply | Quote
  
  WSjujuraf
  AskWoody Lounger
  
  June 12, 2002 at 6:27 pm #594246
  
  Ok, I now know what I’m talking about
  
  I talked to the user and I was making it too complicated. They need to delete duplicate rows where ALL the cells in a given row match 100% of the cells in any previous row.
  
  They tried doing it with MapPoint and it worked but took 30 minutes for each data set (which contains 100,000 data ponts) and there are 120 data sets. (It was on a slow 400MHz PC with 15GB disk).
  
  So, back to the beginning…If my data is
  
  row1: AA, BB, AA, DD
  row2: AA, DD, CC, AA
  row3: BB,CC,AA, DD
  row4: AA, DD, CC, AA
  
  only row 4 would be deleted.
  
  Now I wonder if there is a formula to do this as well. A VBA solution would be fine too.
  
  Thanks all for the time spent to figure this one out.
  Deb
  
  Reply | Quote
  
  H. Legare Coleman
  AskWoody Plus
  
  June 12, 2002 at 7:06 pm #594277
  
  OK, try this code:
  
  Public Sub DeleteDeptNums() Dim lLastRow As Long, lLastCol As Long, I As Long, J As Long, K As Long lLastRow = Worksheets("Sheet1").Range("A65536").End(xlUp).Row - 1 lLastCol = Worksheets("Sheet1").UsedRange.Columns.Count - 1 For I = 0 To lLastRow - 1 For J = lLastRow To I + 1 Step -1 For K = 0 To lLastCol If Worksheets("Sheet1").Range("A1").Offset(I, K).Value _ Worksheets("Sheet1").Range("A1").Offset(J, K).Value Then Exit For End If Next K If K > lLastCol Then Worksheets("Sheet1").Range("A1").Offset(J, 0).EntireRow.Delete End If Next J Next I End Sub
  
  Reply | Quote
  
  WSSammyB
  AskWoody Lounger
  
  June 13, 2002 at 10:20 am #594445
  
  Deb, you don’t need VBA to do this, just use the Data | Filter | Advance Filter menu and use a blank criteria and check the box “Unique records only.” HTH –Sam
  
  Reply | Quote
  
  WSjujuraf
  AskWoody Lounger
  
  June 13, 2002 at 4:08 pm #594594
  
  Hey that actually works! This is great. I obviously have never look at that menu before. Thanks for filling me in.
  
  Deb
  
  Reply | Quote
  
  WSjujuraf
  AskWoody Lounger
  
  June 13, 2002 at 7:10 pm #594639
  
  Oopps, I spoke too soon or I’m not using this feature right.
  
  I entered this data in A1:D7
  AA BB AA DD
  DD DD AA BB
  AA BB AA DD
  ZZ HH KK LL
  DD DD AA BB
  
  I selected the area, did Data/Filter. Advanced Filter. I checked ‘unique records’ only and did the blank criteria (it prompts for a cell so I selected an empty cell).
  
  I got this:
  AA BB AA DD
  DD DD AA BB
  AA BB AA DD
  ZZ HH KK LL
  
  So it did not remove duplicates, I still have two copies of “AA BB AA DD”
  
  What did I do wrong?
  
  AAH, I did it again and found out it ONLY works on numbers, not text!!! I must have headers on each column.
  
  When I changed the data to numbers with text labels it worked:
  
  A, B, C, D <—- labels
  11,22,33,44
  44,33,11,22
  11,22,33,44
  77,88,99,77
  11,22,33,44
  77,88,99,77
  
  This created:
  11,22,33,44
  44,33,11,22
  77,88,99,77
  
  which is correct. My data is a mix of text and numbers so I'll have to figure a way to turn it into numbers.
  
  This all helps though… At least I have a start on the solution (and there's always the code another lounger came up with).
  
  Deb
  
  Reply | Quote
  
  WSSammyB
  AskWoody Lounger
  
  June 13, 2002 at 7:39 pm #594651
  
  It works on everything, it’s just tough to figure out the first time. To make it easier, we will use two sheets, one for the original data and one for the criteria.
  
  First, make sure that your data has a row of headers, one for each column. You can just make them up: C1, C2, etc. if you want.
  
  Next, copy the row of headers and paste it on sheet 2, cell A1.
  
  Now, you are ready to do the Advanced filter. Click in the original table and do Data | Filter | Advanced Filter. Excel tries to read your mind, so hopefully it will already have you original data selected and entered in the list range. If not, enter it there.
  
  Now for the criteria range, select the first row of headers and a blank row beneath the headers. Finally, check the “unique records only,” and press OK.
  
  Sometimes, Excel cannot figure out your header row, if you get this alert, just press OK to use the first row as headers.
  
  Since this was done in place, Excel has hidden the duplicate rows, so, if it looks OK, you can do Edit | GoTo | Special, Visible cells only, copy this data and paste it on another worksheet. If you don’t like the filtered data, you can do a Data | Filter | Show All. HTH –Sam
  
  Reply | Quote
- macropod
  AskWoody_MVP
  
  June 12, 2002 at 5:52 am #594032
  
  Hi Deb,
  See if this meets your needs. It simply trawls through the active sheet finding rows with duplicates and deleting them. I’ve also included an alternative for deleting just the duplicate cells, in case that’s what you’re after.
  
  Sub ClearDuplicateRowsInColumns()
  
  Dim A As Long
  Dim B As Long
  Dim C As Range
  Dim D As Range
  
  On Error GoTo Abort
  Application.ScreenUpdating = False
  Application.Calculation = xlCalculationManual
  
  Set D = ActiveSheet.UsedRange.Columns
  Set C = ActiveSheet.UsedRange.Rows
  For B = D.Columns.Count To 1 Step -1
  For A = C.Rows.Count To 1 Step -1
  If Application.WorksheetFunction.CountIf(Range(Rows(1).Columns(1), Rows(A).Columns(), C.Rows(A).Columns() > 1 Then
  ‘If you only want to delete the offending cell, use: If Application.WorksheetFunction.CountIf(Columns(, C.Rows(A).Columns() > 1 Then
  C.Rows(A).EntireRow.Delete
  ‘If you only want to delete the offending cell, use: Rows(A).Columns(.Delete Shift:=xlUp
  End If
  Next A
  Next B
  
  Abort:
  Application.ScreenUpdating = True
  Application.Calculation = xlCalculationAutomatic
  End Sub
  
  Cheers
  
  Cheers,
  Paul Edstein
  [Fmr MS MVP - Word]
  
  Reply | Quote
- WSjujuraf
  AskWoody Lounger
  
  June 12, 2002 at 6:28 pm #594247
  
  Hey I like that! Cool. I did try it and it seemed to work but now I find that my requirement is not as complicated as I had thought. See my updated post which says only the entire row needs to be a duplicate of a previous row before it’s deleted.
  
  Thanks a lot. Deb
  
  Reply | Quote
Viewing 1 reply thread

Reply To: Reply #594594 in deleting duplicate data in many rows (Excel 2000)
You can use BBCodes to format your content.
Your account can't use all available BBCodes, they will be stripped before saving.

Your information:
Name (required):

Mail (will not be published) (required):

Website:

Cancel

Plus Membership

Donations from Plus members keep this site going. You can identify the people who support AskWoody by the Plus badge on their avatars.

AskWoody Plus members not only get access to all of the contents of this site -- including Susan Bradley's frequently updated Patch Watch listing -- they also receive weekly AskWoody Plus Newsletters (formerly Windows Secrets Newsletter) and AskWoody Plus Alerts, emails when there are important breaking developments.

Welcome to our unique respite from the madness.

It's easy to post questions about Windows 11, Windows 10, Win8.1, Win7, Surface, Office, or browse through our Forums. Post anonymously or register for greater privileges. Keep it civil, please: Decorous Lounge rules strictly enforced. Questions? Contact Customer Support.

deleting duplicate data in many rows (Excel 2000)

Plus Membership

Search Newsletters

Search Forums

View the Forum

Search for Topics

Recent Topics

Recent blog posts

My Profile

Key Links

Remembering Woody

deleting duplicate data in many rows (Excel 2000)

Plus Membership

Search Newsletters

Search Forums

View the Forum

Search for Topics

Recent Topics

Recent blog posts

My Profile

Login and Registration

Key Links

Remembering Woody