• Finding Duplicate Files and Keyword Searching Large Number of Files

    #500794

    I’ve got 60,000 files from which I need to eliminate duplicates. They are mainly TIFF and PDF files. The problem is that they are sequentially numbered, so I cannot use the file names to find the duplicates.

    I have Adobe Acrobat Professional so I can use OCR to search inside the file for content. But, opening 60k files manually is not practical.

    Ideally, I’m looking for two solutions for automating the processes of (1) identifying duplicate files and (2) searching the file content for keywords. Does anyone have any suggestions?

    • #1513836

      Easy Duplicate Finder seems to work; it didn’t take long to find the duplicates among my 23k or so picture files.
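      For anyone who would rather script it, here is a minimal Python sketch of one common approach: compare files by content hash rather than by name. This is not how Easy Duplicate Finder works internally (I don't know that); the function names `file_digest` and `find_duplicates` are made up for illustration. It only catches byte-for-byte identical files, so two separate scans of the same page would not match.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups of paths under root whose contents are identical."""
    # Step 1: group by size; files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_hash[file_digest(path)].append(path)

    # Keep only hashes seen more than once; the file names play no role.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

      Calling `find_duplicates` on the top-level folder returns a list of groups; every file in a group has the same content, so all but one per group are candidates for removal. Review the groups before deleting anything.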

      Eliminate spare time: start programming PowerShell

    • #1513954

      Thanks to access-mdb for addressing the duplication issue.

      Regarding the second part of the question, I’ve got all 60k file names in an Excel file with hyperlinks to all of the files. Is it possible to write code (VBA, scripting, etc.) that opens each file individually in Acrobat, runs the OCR conversion, and then saves the converted file? I believe I could then use the Windows Explorer search tool to find words inside the files.

    • #1513955

      Zeno,

      If you want to search your files using Windows Search, you need the Adobe iFilter. HTH
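      If Windows Search turns out not to fit, a different technique is a small script. The Python sketch below assumes the OCR'd text of each document has already been exported to a plain `.txt` file in some folder; that export step is an assumption and is not covered here. The function name `search_keywords` is made up for illustration.

```python
import os


def search_keywords(text_dir, keywords):
    """Map each keyword to the .txt files under text_dir that contain it.

    Matching is a case-insensitive substring test.
    """
    needles = [k.lower() for k in keywords]
    hits = {k: [] for k in keywords}
    for dirpath, _dirs, names in os.walk(text_dir):
        for name in sorted(names):
            if not name.lower().endswith(".txt"):
                continue  # only the exported text files are searched
            path = os.path.join(dirpath, name)
            # OCR output can contain odd bytes, so undecodable ones are skipped.
            with open(path, encoding="utf-8", errors="ignore") as f:
                text = f.read().lower()
            for keyword, needle in zip(keywords, needles):
                if needle in text:
                    hits[keyword].append(path)
    return hits
```

      The result is a dictionary keyed by keyword, so a keyword that maps to an empty list simply was not found in any of the text files.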

      May the Forces of good computing be with you!

      RG

      PowerShell & VBA Rule!