Hi all,
I have 20,000+ rows of data in multiple cols. I need to be able to occasionally go through this data and delete duplicates.
In this case a duplicate is defined as any occurrence of the same data in more than one row. In any given row, the data can occur more than once in one or more columns, however.
Example (3 data rows)
This is OK.
AA, BB, AA, DD
EE, KK, LL, MM
ZZ, FF, FF, WW
This is NOT ok: (3 data rows)
AA, BB, AA, DD
EE, KK, LL, AA
ZZ, FF, FF, AA
Rows 2 and 3 contain data that already exist in previous rows (AA is found in row 1). These last two rows would get deleted.
I think I need VBA here, not a formula. It’d have to be some triply nested loop I think. One for the whole data set, one within sequencial dupes in one column, and then one to check all columns for a pair of cells. The code would also have to keep track of the “used” data so that it searches through the list of previous data and if any matches are found (for the new rows) that row is deleted.
This is more than a search/replace since there is no known data to search for (it’s any duplicate data).
I’ll start working on the algorithm now but wanted to post this to the group in case someone comes up with a good methodology. I’ll post what I came up with when I get it working.
Thnx, Deb