• Remove duplicates based on two fields

    • This topic has 11 replies, 4 voices, and was last updated 13 years ago.
    Author
    Topic
    #482883

    I know that I easily can remove duplicates based upon one field’s records, and that I can do so with respect to more than one field at one time. However, I want to remove duplicates if, for example, A1=A2 and B1=B2. In that case, I want record 1 or 2 removed. I attached a spreadsheet as an example. In the example, the duplicates would be records 2 and 4, so I want one those records removed. I think that I could do this through a couple of queries, but I’m looking for something easier. Thanks!

    Viewing 5 reply threads
    Author
    Replies
    • #1330147

      Jimmy,

      This code should solve your problem.

      Code:
      Option Explicit
      
      Sub RemoveDups()
      
        Dim lCurRow        As Long
        Dim lTestRowOffset As Long
        
        lCurRow = 2
        
        Do
        
          lTestRowOffset = 1
       
          Do
          
            If Cells(lCurRow, 1).Value = Cells(lCurRow + lTestRowOffset, 1).Value And _
               Cells(lCurRow, 2).Value = Cells(lCurRow + lTestRowOffset, 2).Value Then
               
               Rows(lCurRow + lTestRowOffset).EntireRow.Delete
               
            Else
               lTestRowOffset = lTestRowOffset + 1
            End If
          
          Loop Until Cells(lCurRow + lTestRowOffset, 1).Value = ""
          
          lCurRow = lCurRow + 1
          
        
        Loop Until Cells(lCurRow, 1).Value = ""
        
      End Sub
      

      Please note that I solved the problem as indicated. If the data had been sorted first the code could be a little more efficient but as it stands you have to loop through the entire data set for each itteration to make sure you have all the duplicates. Of course if your dataset {table} isn’t too big this isn’t a real problem. :cheers:

      Test Data
      30689-TestData
      Results
      30690-Results

      May the Forces of good computing be with you!

      RG

      PowerShell & VBA Rule!
      Computer Specs

    • #1330148

      Thanks! If I may impose with a couple of questions, can I edit the code if the two fields are not contiguous, e.g., A2 and C2? Would it then be 2TestRowOffset? I also wonder about date/time granularity. Two times may be formatted to appear as 13:10, but really may be 13:10:45 and 13:10:55, etc. Does the code use the formatted time or the actual numeric time? (My worksheet is sorted by date/time, descending order.)

    • #1330181

      Jimmy,

      1. Can I edit the code if the two fields are not contiguous, e.g., A2 and C2? Yes just change the Column Numbers, they are the hard coded numbers. e.g. A=1 B=2 C=3 thus A2 & C2 would be

      Code:
            If Cells(lCurRow, 1).Value = Cells(lCurRow + lTestRowOffset, 1).Value And _ 
               Cells(lCurRow, 3).Value = Cells(lCurRow + lTestRowOffset, 3).Value Then 

      2. Does the code use the formatted time or the actual numeric time? Actual Time
      Although the code could be modified to use formatted time.

      Code:
        
          If Format(Cells(lCurRow, 1).Value, "hh:mm") = _
             Format(Cells(lCurRow + lTestRowOffset, 1).Value, "hh:mm") And _
             Cells(lCurRow, 2).Value = Cells(lCurRow + lTestRowOffset, 2).Value Then
      
      

      :cheers:

      May the Forces of good computing be with you!

      RG

      PowerShell & VBA Rule!
      Computer Specs

      • #1330312

        Hi
        You can solve this without having to resort to VBA.
        In C2 enter
        =A2&”|”&B2
        In D2 enter
        =COUNTIF($C$2:C2,C2)>1
        Copy both down as far as required

        Filter on column D for TRUE and delete the selected rows

        Regards
        Roger Govier
        Microsoft Excel MVP

    • #1330445

      Roger,

      Nice! However the VBA is more robust since it will work if the records are sorted or not. I tried your solution on my test data and it didn’t get the right answer. Then I remembered the OP has said his data was sorted by date/time decending, my test data was not. Upon sorting your solution worked perfectly and is an interesting approach to the problem.

      I notice you’re a fairly new poster here so let me take this opportunity to welcome you to the Lounge as a poster. It’s always nice to have more talent, especially MVPs, to add new perspectives to the Lounge’s talent base. :cheers:

      May the Forces of good computing be with you!

      RG

      PowerShell & VBA Rule!
      Computer Specs

      • #1330465

        Hi
        Many thanks for the welcome.

        However, I fear you are incorrect in your assertion regarding sorting.
        The formula solution will work with either sorted or unsorted data, as the attached workbook shows. I added some more data to jumble the order up a bit.

        The formula in D2
        =COUNTIF($C$2:C2,C2)>1
        which becomes
        =COUNTIF($C$2:C14,C14)>1
        by the time it gets to D14
        is not concerned with any order, or referencing the row above, it is doing a Countif on an ever expanding range to see if there is more than one occurrence of the concatenated value.

    • #1330469

      Roger,

      I’m still getting the same results. :confused:
      Here’s my test workbook with both your formula and my macro, maybe you can see what I am doing wrong. :cheers:

      May the Forces of good computing be with you!

      RG

      PowerShell & VBA Rule!
      Computer Specs

      • #1330500

        Hi

        You have the wrong formula in D2

        You have
        =COUNTIF($C$2:C2,C1)>1
        It should be
        =COUNTIF($C$2:C2,C2)>1

        • #1330568

          If you’re using Excel 2007 or greater you can use the remove duplicates function once you’ve created a key field through concatenation (so the count formula is unecessary).

          FWIW it’s not necessary to delimit the fields (in your concatenated key field) as this is a reference field only, ie

          In C2 enter
          =A2&B2

          note that it can easily be extended to cover as many fields as necessary (and they don’t need to be contiguous)

          and the count field should cover the entire column so that the data doesn’t have to be sorted

          In D2 enter
          =COUNTIF(C:C,C2)

          • #1330579

            Hi Simon

            You are quite right that there is a remove duplicate function in later versions, I was trying to give a generic solution.
            With this particular set of data you are correct, a separator is not required in the concatenation.
            However, I do it as a matter of course with all concatenations, because there are occasions when you can get tripped up with different numbers of characters in the concatenated fields.

            On the matter of the Countif function, however, you are totally wrong.
            Your formula will count the number of occurrence in the whole data set, and that would not provide a list of uniques.
            You would not be able to select the rows you wanted to delete, as it would also contain the first occurrence of the value.

            The formula I proposed, does not need to be sorted.

    • #1330682

      Just to keep you all updated Roger & I had taken our discussion offline to try to resolve why we were getting different results.
      It turns out it was due to my final version of code checking the time in HH:MM format vs his solution checking in HH:MM:SS format. Roger offered the following to make his do the same thing.

      [noparse]=TEXT(A2,”hh:mm”)&”|”&B2[/noparse]

      I’d like to take this opportunity to thank Roger for hanging with me through this and discovering the problem, and of course for his elegant solution.:clapping:

      Thanks to him we all now have a new technique to add to our Excel toolbox. :cheers:

      May the Forces of good computing be with you!

      RG

      PowerShell & VBA Rule!
      Computer Specs

    Viewing 5 reply threads
    Reply To: Remove duplicates based on two fields

    You can use BBCodes to format your content.
    Your account can't use all available BBCodes, they will be stripped before saving.

    Your information: