• Macro to Remove Duplicate Entries (MS Word 2003)

    Home » Forums » AskWoody support » Productivity software by function » MS Word and word processing help » Macro to Remove Duplicate Entries (MS Word 2003)

    • This topic has 21 replies, 2 voices, and was last updated 20 years ago.
    Author
    Topic
    #419628

    I receive documents that create many duplicated entries. There are a lot of “garbage” at the top of the document so I have to be at the start of where the entries begin.

    The original document is in RTF format and I import it into Word. I am using MS Word 2003.

    I work with 3 documents. Doc 1 is the original, Doc 2 is the new document that will be checked against it, and then Doc 3 is where the duplicated entries are placed. I manually make sure that the entries that are duplicated on Doc 2 are placed at the end of Doc 3 so that I can keep track of how many times the same thing is reported (I sort the entries after they are in). I then copy Doc 2 into Doc 1 with the duplicated entries removed.

    I am looking for a macro to remove the duplicated entries from Doc 2 and place it at the end of Doc 3. It takes an awful long time to go through all the hundreds of pages that I sometimes get. After a while I do get tired and miss entries and that is definitely a nuisance. (I cannot sort Doc 1. The new entries are placed at the end of Doc 1.) That

    Viewing 0 reply threads
    Author
    Replies
    • #948093

      This sounds like a problem best addressed with an RTF parser (or at least an RTF tokenizer), but I’m assuming you already know that, and have reasons for asking for what you’re asking for (If you do want any tips on finding a good RTF tokenizer, let me know). Clearly, these are RTF command sequences in your document.

      It was easiest to split this into three separate macros (mostly to make up for functions missing in VBA), but it’s still fairly concise. I wasn’t sure how you wanted to handle the final output, but the result of the main macro is (1). Duplicated entries are removed, and (2). Duplicate entries are stored in an array, ready to be inserted wherever you need them. I commented out a simple example; you may need to modify it to suit your needs.

      Sub RemoveDupEntries()
      Dim para As Paragraph
      Dim doc As Document
      Dim vEntries() As Variant
      Dim vDupEntries() As Variant
      Dim sParaText As String
      ReDim vEntries(0)
      ReDim vDupEntries(0)
      Set doc = ActiveDocument
      For Each para In doc.Paragraphs
          sParaText = Left(para.Range.Text, para.Range.Characters.count - 1)
          If IsMember(vEntries, sParaText) Then
              If IsMember(vDupEntries, sParaText) = False Then
                  Push vDupEntries, sParaText
              End If
              para.Range.Delete
          Else
              Push vEntries, sParaText
          End If
      Next para
      ' Now vDupEntries contains any items that were duplicated.
      ' You can insert it at the end of doc 3 as needed
      ' For example:
      ' Documents("Doc3").Content.InsertAfter Join(vDupEntries, vbCr)
      End Sub
      '
      Function Push(ByRef vArray As Variant, ByVal str As String)
        ReDim Preserve vArray(UBound(vArray) + 1)
        vArray(UBound(vArray)) = str
      End Function
      '
      Function IsMember(vArray As Variant, ByVal str As String)
      Dim v As Variant
      For Each v In vArray
          If v = str Then
              IsMember = True
              Exit Function
          End If
      Next v
      IsMember = False
      End Function
      
      • #948156

        Thank you so much for the SUPER fast reponse.

        The macro runs but it keeps bringing the dialog box

        • #948179

          Doc3 would need to be open when you run the macro as posted. Otherwise, you’ll have to explicitly open it:

          Dim doc3 as Document
          Set doc3 = Documents.Open("C:My DocumentsSomePathDoc3.doc")
          doc3.Content.InsertAfter ...' and so forth
          
          • #948240

            I did change the macro as you suggested:

            Dim doc3 As Document
            Set doc3 = Documents.Open(“C:Documents and Settings……Doc3.doc”)
            doc3.Content.InsertAfter Join(vDupEntries, vbCr)

            The macro opens “Doc3.doc” but the result is a blank document.

            If I use the original instead of the above (with the document open):
            Documents(“Doc3”).Content.InsertAfter Join(vDupEntries, vgCr)
            I still get nothing on Doc3.

            So, both ways I am getting a blank Doc3.

            I even created two other documents with with some identical entries to make sure that I was not doing something wrong, but still I get a blank Doc3.

            I know I am missing something.

            Thank you again.

            • #948249

              Try recording a macro of yourself opening the correct Doc3, then take a look at the recorded macro to inspect the path that Word recorded.

            • #948319

              This is what I have in that section of the program:

              Dim doc3 As Document

              Documents.Open FileName:=”Doc3.doc”, ConfirmConversions:=True, ReadOnly:= _
              False, AddToRecentFiles:=False, PasswordDocument:=””, PasswordTemplate:= _
              “”, Revert:=False, WritePasswordDocument:=””, WritePasswordTemplate:=””, _
              Format:=wdOpenFormatAuto, XMLTransform:=””

              doc3.Content.InsertAfter Join(vDupEntries, vbCr)
              End Sub

              Now the macro opens the document, still with nothing in it, with a message box: Object variable or With block variable not set. When I click on OK the line doc3.Content.InsertAfter Join(vDupEntries, vbCr) is highlighted in yellow in the de######.

              I am sorry I am causing such ruckus.

            • #948321

              Looks like you need to assign the document to the doc3 variable:

              Dim doc3 as Document
              Set doc3 = Documents.Open(FileName:="Doc3.doc", ConfirmConversions:=True, ReadOnly:= _
              False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
              "", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
              Format:=wdOpenFormatAuto, XMLTransform:="")
              
            • #948381

              Thank you so much for all your help, however, I made the recommended changes. Now the macro says “File could not be found. Check the spelling. Try different name. (Doc3.doc).” and brings me back to the de######. There’s got to be an easier way.

            • #948383

              Can you post the exact code you’re using?

            • #948450

              Sub RemoveDupEntries()
              Dim para As Paragraph
              Dim doc As Document
              Dim vEntries() As Variant
              Dim vDupEntries() As Variant
              Dim sParaText As String
              ReDim vEntries(0)
              ReDim vDupEntries(0)

              Set doc = ActiveDocument

              For Each para In doc.Paragraphs
              sParaText = Left(para.Range.Text, para.Range.Characters.Count – 1)
              If IsMember(vEntries, sParaText) Then
              If IsMember(vDupEntries, sParaText) = False Then
              Push vDupEntries, sParaText
              End If
              para.Range.Delete
              Else
              Push vEntries, sParaText
              End If

              Next para

              Dim doc3 As Document
              Set doc3 = Documents.Open(FileName:=”Doc3.doc”, ConfirmConversions:=True, _
              ReadOnly:=False, AddToRecentFiles:=False, PasswordDocument:=””, PasswordTemplate:= _
              “”, Revert:=False, WritePasswordDocument:=””, WritePasswordTemplate:=””, _
              Format:=wdOpenFormatAuto, XMLTransform:=””)
              doc3.Content.InsertAfter Join(vDupEntries, vbCr)

              End Sub
              Function Push(ByRef vArray As Variant, ByVal str As String)
              ReDim Preserve vArray(UBound(vArray) + 1)
              vArray(UBound(vArray)) = str
              End Function
              Function IsMember(vArray As Variant, ByVal str As String)
              Dim v As Variant

              For Each v In vArray
              If v = str Then
              IsMember = True
              Exit Function
              End If
              Next v
              IsMember = False
              End Function

            • #948541

              When you only give a relative pathname “doc3.doc”, instead of a full pathname (“C:My Documentsdoc3.doc”), Word looks for the file in the “current” directory, which may or may not be what you think is the “current” directory. It’s whatever folder appears when you choose File->Open.

              Next time you get the “File could not be found” error, stop the macro, go back to Word, choose File->Open, and make sure you’re in the same folder as doc3. Then just cancel the Open dialog, and try running the macro again. Alternately, give the macro the full path to doc3.doc.

            • #948583

              I did put the full path on the macro and now it opens the document OK. However, there are no entries pasted on Doc3.

            • #948598

              Perhaps I should have been clearer from the get-go that you’d need to modify the code to suit your particular situation (the location of the documents, their names, etc.). I used the ActiveDocument as the document with the list of items in it (what you call “Doc2”) because it was easiest while experimenting with the macro.

              More properly, instead of this:

              Set doc = ActiveDocument

              you’d want:

              Set doc = Documents.Open("C:DocumentsDoc2.doc") ' or whatever the path is for Doc2

              or

              Set doc = Documents("Doc2.doc") ' if doc2 is alread open

              You may want to experiment with the variables to ensure you’ve correctly “got” both of them:

              Sub RemoveDupEntries()
              Dim para As Paragraph
              Dim doc2 As Document
              Dim doc3 As Document
              Dim vEntries() As Variant
              Dim vDupEntries() As Variant
              Dim sParaText As String
              ReDim vEntries(0)
              ReDim vDupEntries(0)
              
              Set doc2 = Documents.Open("C:DocumentsDoc2.doc") 
              Set doc3 = Documents.Open("C:DocumentsDoc3.doc") 
              
              ' Remove (or comment out) the two lines following this comment
              ' when you're sure things are working correctly:
              Msgbox "doc2 is now:  " & doc2.Name & ", and doc3 is now: " & doc3.Name
              Exit Sub
              
              For Each para In doc2.Paragraphs
               sParaText = Left(para.Range.Text, para.Range.Characters.Count - 1)
               If IsMember(vEntries, sParaText) Then
               If IsMember(vDupEntries, sParaText) = False Then
               Push vDupEntries, sParaText
               End If
               para.Range.Delete
               Else
               Push vEntries, sParaText
               End If
              Next para
              
              ' And remove (or comment out) the two lines  following this comment
              ' when you're sure the list is correctly populating
              MsgBox Join(vDupEntries, vbCr)
              Exit Sub
              
              doc3.Content.InsertAfter Join(vDupEntries, vbCr)
              End Sub
              
            • #948695

              I made the changes, which I should have known better.

              I ran the macro and the result is: Doc1 (original entries), Doc2 (some duplicates), and Doc3 (duplicates should be) all open, however, there are no duplicate entries in Doc 3. Yes, there are duplicate entries on Doc2

            • #948701

              Perhaps I misunderstood your initial posting. The sample you posted doesn’t have any duplicate entries, so doc3 would then be blank. When I manually duplicated one of the entries in the file you posted, the duplicate was correctly deleted from doc2, and inserted into doc3. Are you sure the file you’re using for doc2 contains duplicates? Or have I somehow misunderstood your initial post?

              I ran the following, with two files — the one you posted (saved as “doc2.doc”) and a separate, blank document (saved as “doc3.doc”). Both were open and in the same folder on the desktop when I ran the macro. Again, running on your doc as posted produced no results, because there are no duplicate entries. Running after manually duplicating one (or more) of the entries, resulted in those duplicates correctly being removed from doc2, and inserted into doc3.

              Sub RemoveDupEntries()
              Dim para As Paragraph
              Dim docEntries As Document
              Dim docDupsList As Document
              Dim vEntries() As Variant
              Dim vDupEntries() As Variant
              Dim sParaText As String
              ReDim vEntries(0)
              ReDim vDupEntries(0)
              
              Set docEntries = Documents("doc2.doc")
              Set docDupsList = Documents("doc3.doc")
              
              For Each para In docEntries.Paragraphs
                sParaText = Left(para.Range.Text, para.Range.Characters.count - 1)
                If IsMember(vEntries, sParaText) Then
                  If IsMember(vDupEntries, sParaText) = False Then
                      Push vDupEntries, sParaText
                  End If
                  para.Range.Delete
                Else
                  Push vEntries, sParaText
                End If
              Next para
              docDupsList.Content.InsertAfter Join(vDupEntries, vbCr)
              
              End Sub
              
            • #948757

              My original post was not clear.

              I have two documents.

              Document 1 is to be checked against number 2.
              If document number 2 contains entries that are also in document 1; therefore duplicates.
              I want the duplicate entries of document 2 to be placed on document 3.

              Again, thank you for taking so much time with this.

            • #948783

              (Edited by Andrew77 on 19-May-05 20:52. Fixed variant array declaration)

              Ah ha! Thanks for clearing that up. In that case, this should do the trick:
              (You’ll still need the functions Push and IsMember)

              Sub RemoveDupEntries()
              Dim para As Paragraph
              Dim doc1 As Document
              Dim doc2 As Document
              Dim doc3 As Document
              Dim vEntries() As Variant
              Dim vDupEntries() As Variant
              Dim sParaText As String
              
              ReDim vEntries(0)
              ReDim vDupEntries(0)
              
              Set doc1 = Documents("doc1.doc")
              Set doc2 = Documents("doc2.doc")
              Set doc3 = Documents("doc3.doc")
              
              For Each para In doc1.Paragraphs
                  sParaText = Left(para.Range.Text, para.Range.Characters.count - 1)
                  If Not IsMember(vEntries, sParaText) Then Push vEntries, sParaText
              Next para
                  
              For Each para In doc2.Paragraphs
                  sParaText = Left(para.Range.Text, para.Range.Characters.count - 1)
                  If IsMember(vEntries, sParaText) Then Push vDupEntries, sParaText
              Next para
              
              doc3.Content.InsertAfter Join(vDupEntries, vbCr)
              
              End Sub
              
            • #948822

              I can see the macro opening the documents; however, after the documents are opened these come up and bring me to the de######:

              ReDim vEntries(0) Array already dimensioned
              ReDim vDupEntries(0) Array already dimensioned

              ReDim Preserve vArray(UBound(vArray) + 1) This array is fixed or temporarily locked

              No entries obviously! And you thought I wasn’t going to bug anymore.

            • #948824

              Sorry about that. Leave off the 0 in the variant array declarations. I’ve edited the previous post to reflect the change.

            • #948847

              You are brilliant! Thank you. Thank you. Thank you. You have saved me thousands of hours of work. Thank you again.

      • #948269

        As a sidebar to this thread …

        This particular task is a great example of how handy it can be to have at least some familiarity with one of the big three so-called “scripting languages”, Perl, Python, or Ruby.

        Obviously, VBA is the most appropriate solution for 95% of Word scripting tasks, but occassionally certain ones come along that can really take advantage of some of the strengths of those other languages. What takes dozens of lines to accomplish in VBA (see above) can be done in just a handful of Ruby:

        require 'win32ole'
        wrd = WIN32OLE.connect('Word.Application')
        doc3 = wrd.Documents("Doc3.doc")
        doc3.Content.Text = wrd.ActiveDocument.Content.Text.split("r").uniq!.join("r")
        

        Sorting and comparing are areas where some of these other languages really excel, and can offer simpler solutions than even native VBA macros.

        Here’s another quick example (I’ll use Ruby again, cuz’ it’s my favorite, though Python and Perl are also well equipped here). Say you wanted to delete any paragraph in a document whose style begins with a lowercase letter or a number, or ends with a number (hey, I’ve done stranger things …):

        require 'win32ole'
        wrd = WIN32OLE.connect('Word.Application')
        doc = wrd.ActiveDocument 
        doc.Paragraphs.each {|p| p.Range.Delete if p.Style.NameLocal.match(/(^([a-z]|d)|d$)/) }
        

        Don’t get me wrong, I’m not knocking VBA here. Again, it’s obviously the most appropriate tool in most cases. But I’d also say that it only takes a day or two to become familiar enough with one of these other languages to make it worth the effort for those other 5% of times when VBA’s just not up to the task. (This second example’s actually fairly easy to do using the VBScript RegExp object, but still requires more lines than the Ruby version.)

        Sorry for the random sidebar, but wasn’t sure where else it belonged.

        Cheers!

    Viewing 0 reply threads
    Reply To: Macro to Remove Duplicate Entries (MS Word 2003)

    You can use BBCodes to format your content.
    Your account can't use all available BBCodes, they will be stripped before saving.

    Your information: