  • Wanting to find duplicate files

      • #2362494
        WCHS
        AskWoody Plus

        I am on Win10 Pro, Version 20H2. I upgraded from 1909 to 20H2 on April 9.

        I just noticed that part of a folder, but not all of it, has been copied as a sub-folder with the same folder name somewhere else in my Documents. This happened five months ago, while I was still on Version 1909, and I have no idea what could have caused it. (I can imagine how an entire folder could be copied, but not part of a folder; it is very mysterious. The copied subfolder is about 75MB.)

        Now since I have discovered this one instance of an untoward partially-copied folder, I want to be able to see if there are other duplicate folders/files like this anywhere else on my device. So, I am looking for a program that will do this. I am not interested in anything fancy … just a listing of duplicate files so I can delete the copied folders manually, once I can identify them and see if they are candidates for deletion.

        I have done a search for “duplicate files” and many posts say something about having identified/deleted duplicate files, but there is barely a mention of what program was used to do this. I have found a mention of ALLDUP, but it seems very complicated. All I want is a listing so that I can see if this duplication happened elsewhere on my device.

        Can someone make a recommendation?

      • #2362511
        SteveTree
        AskWoody Lounger

        You might try CloneSpy.

        If that doesn’t do what you want I’d be surprised but you’ll find other Open Source alternatives via Alternativeto.net

        From memory DupeGuru in that list works well.

        Group A (but Telemetry disabled Tasks and Registry)
        Win 7 64 Pro desktop
        Win 10 64 Home portable

      • #2362513
        Paul
        AskWoody Lounger

        You might try SwiftSearch https://sourceforge.net/projects/swiftsearch/
        It’s a file finder that, I think, can easily help you find duplicate files. Without any indexing, it searches for file names or directory names almost instantly, on all NTFS drives or selected drives. Gives you a clickable list with full path.  You don’t need to install SwiftSearch; just download and run.

      • #2362514
        b
        AskWoody MVP

        Four options were discussed by Fred Langa in the AskWoody Plus Newsletter a year ago:

        Duplicate Finder Apps

        Windows 10 Pro version 21H1 build 19043.1052 + Microsoft 365 (group ASAP)

        • #2362517
          WCHS
          AskWoody Plus

          Didn’t think to check the newsletter. 🙂 Appreciate the help!!

      • #2362586
        PaulK
        AskWoody Lounger

        How many folders and files do you have to check?

        My \Users\Paul has >43K Files and >7500 Folders.
        But the majority of these are in AppData: >24K Files and >6600 Folders.

        My \Users\Paul\Documents has about 2K Files and 195 Folders.
        (The \Users\Public\Documents is larger.)

        How comfortable are you with using the Command Prompt?

        The following is slightly cumbersome, but may be worth trying.

        Key in each of the following in a Command Prompt.
        (You may type either in CAPS or in lower-case; I use CAPS here for clarity.)

        To review the syntax for these commands:
        FIND /?
        SORT /?

        Navigate to the highest level directory (folder) that you need to check.
        e.g.: CD C:\Users\WCHS\Documents
        – Adjust as needed; substitute the correct folder name for WCHS.

        Extract the names of the folders and files. And sort them.
        DIR /A /S *.* | FIND /V " ." > WCHS1.TXT
        SORT /+40 /REC 400 WCHS1.TXT /O WCHS2.TXT
        Notes:
        1) If you are doing a Copy/Paste from here, ensure that the quotation marks are ‘straight’ not smart/curly. This has been a problem in the past, but may not be now.
        2) Careful: the string following /V is exactly “spaceperiod”

        Now, using two copies of an editor – Notepad is fine – Open both WCHS1.TXT and WCHS2.TXT.
        In WCHS2, visually scan for duplicates. Folder names are intermixed with File names.

        For each duplicate in WCHS2, in WCHS1 do an Edit>Find (Ctrl+F) for the name of interest.
        Find all duplicates in WCHS1, and deal appropriately.
        You may wish also to have (an) Explorer window(s) open to handle the deletions.
        I suggest that Deletions be to the Recycle Bin, not immediate deletes, just-in-case.
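
For a list too long to scan comfortably by eye, the grouping step can be scripted. The following is only a sketch, assuming Python 3 is available (Python is not part of the Command Prompt method above, and `find_duplicate_names` is a made-up helper name). Like the DIR/SORT listing, it compares names only, not contents, and it folds case because Windows ignores case when locating a file.

```python
import os
from collections import defaultdict


def find_duplicate_names(root):
    """Map each repeated file or folder name under root to the paths that use it."""
    paths_by_name = defaultdict(list)
    for folder, subfolders, files in os.walk(root):
        for name in subfolders + files:
            # Compare lower-cased names so "Notes.TXT" and "notes.txt" count as the same.
            paths_by_name[name.lower()].append(os.path.join(folder, name))
    # Keep only the names that occur in more than one place.
    return {name: sorted(paths)
            for name, paths in paths_by_name.items() if len(paths) > 1}
```

Calling `find_duplicate_names(r"C:\Users\WCHS\Documents")` would return a dictionary whose keys are the repeated names and whose values are the full paths, so same-named folders under different parents are easy to tell apart.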

        • #2362788
          WCHS
          AskWoody Plus

          To review the syntax for these commands:
          FIND /?
          SORT /?

          I did a DIR /? and learned about all of the attributes. But, I didn’t realize that the attribute comes right after the “/A”, if you want to specify that (for example, /AR). Thanks for that.

          One question, though: after the list of possible attributes, the information says “prefix meaning not”. Can you explain how that works with an example??

      • #2362646
        WCHS
        AskWoody Plus

        Thanks for this information. I will check it out. My Documents folder has 4k files and 555 folders. I’ll see if it’s manageable. I have a few questions first:
        1. In the DIR arguments, why do you use /A, if there are no attributes indicated?
        2. In the SORT argument /+n, why do you begin the comparison with the 40th character? i.e., why 40?
        3. about the SORT argument /REC 400 for specifying the number of characters in a record. Does that mean the number of characters in filename? or the number of characters in the content of a file?
        4. where will the WCHS1.txt and WCHS2.txt land?

        This is not going to find the duplicates, right? In other words, it will just provide a sorted listing so that you can use CTL-F or eyeball the sorted output for duplicates.

        Is the entire pathname, including filename, listed in the sorted output. I ask because I have many folders named for the same year, but the folder above it is unique, i.e., Documents\Plans\2021 and Documents\Reports\2021.

      • #2362706
        PaulK
        AskWoody Lounger

        Good questions.
        0 – 4K FILES? You’re probably pushing the limits of patience. But it is worth a try just for the educational experience.

        1 – Just plain [ /A ] means: include all entities, regardless of their attributes.
        If one specifies something, for instance /AR, then ONLY Read-only records are listed.
        Or, /AH, then ONLY Hidden files are listed.
        Omitting /A altogether will list only those items that are NOT Hidden or System; Read-only items are still listed.

        2. Column 40 is the fixed location where, in a Directory Listing, the names (of Files, Directories) start.

        3. It is not documented well. This appears to be an indication to the Sort program as to the maximum record length to be expected, in order to set up internal variables. The default is 4096. When I was experimenting for this, I had set 200 for one run. But the Sort aborted because one record (” Directory of C:\Users\Paul\AppData\etc.” ) was 221 characters long. You most likely can set your number much lower; I listed 400 just to be safe. The SORT command can sort any records-length file, up to 64K-1.

        This Sort command originated way back in the early DOS days, and apparently has not been enhanced much since then in flexibility. Records are sorted beginning at the /+n column, through the rest of the length of each record. (Other programs and applications support the specification of multiple sets of starting+ending column numbers upon which to sort.)

        4. The way that DOS and Windows Commands work is that the default location of a file is in the “current directory”. Hence, if I am in C:\Users\Paul\Documents, then – unless I otherwise specify a Path designation – any entity that I create will be there too.

        So, yes, when you are through with this exercise you’ll want to go back to the directory (or directories) where you were, and purge these working files.

        5. Correct. This will group all duplicate file names together. Eyeball+brain are needed to determine what to do with these. The CTRL+F applies within Explorer windows in order to make the determination that ‘these files are true duplicates, and one needs to go’ and ‘these files are valid duplicate names, but different valid folders, and need to be retained’. ‘True duplicates’ likely will have identical file dates/times and sizes; these are good discriminatory indications.

        6. No. ‘What you see (in a DIR listing) is what you get’. One advanced editing program that I have DOES include the full path information. And it supports multiple sort specifications.

        The method outlined here – DIR/FIND, SORT – is a brute force attack, best limited to relatively small sets of names to be reviewed. It is cheap and easy.

        If you want to experiment with sorting on other fields, here are the column numbers:
        1 – Month part of Date
        4 – Date (in month) of Date
        7 – Year part of Date
        – Note that the above assumes the format MM/DD/YYYY.
        13 – time
        19 – AM/PM
        25 – DIR, if present (There are angle brackets too, but they corrupt this entry if I show them.)
        30 – file size
        40 – file name
        But remember, a sort starts at the specified column, and continues all the way through the end of each record.
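
To make the "starts at the specified column and continues to the end of the record" behaviour concrete, here is a minimal sketch in Python (an assumption; the thread itself uses only SORT). `sort_from_column` is a hypothetical helper, and lower-casing the key is only an approximation of how SORT orders text, which in reality also depends on locale.

```python
def sort_from_column(lines, column=40):
    """Order lines by the text running from `column` (1-based) to the end of each line."""
    start = column - 1  # column numbers are 1-based, Python slices are 0-based
    # Everything before `column` is ignored; everything after it takes part in the comparison.
    return sorted(lines, key=lambda line: line[start:].lower())
```

With the default of 40 the lines are ordered by name, so identical names end up next to each other. Passing `column=1` orders by the whole line, which in the MM/DD/YYYY layout means month first, then day, then year.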

      • #2362718
        rd23
        AskWoody Plus

        “Find and Delete Duplicate Files and Photos in Windows” at http://www.winhelponline.com mentions my favorite quick and dirty approach in “Method 2: Using HashMyFiles”.

        We should expect that Nirsoft has at least one utility that does this.

        (Once again, my original post was blocked as spam. I have to remove the exact link to the article.)

      • #2362721
        WCHS
        AskWoody Plus

        I’ve experimented with your commands and I like what I am seeing. I want to keep WCHS2.txt and WCHS3.txt, but I think that I need to do another step: to identify in WCHS2.txt all the lines that have <DIR> in them because the duplicated files that I am trying to identify are all in duplicated folders/directories. This would shorten the list I would be examining considerably, because I could spot duplicated folders/directories easily. Is it possible to do a FIND command on WCHS2.txt?

        If so, I think I understand how to use the FIND command and I think I understand how to specify the “string”, and I think I see how I specify the file to do the search in (C:\Users\WCHS\Documents\WCHS2.txt) but I am not sure how to specify where I want the output to go (let’s say it is supposed to go to C:\Users\WCHS\Documents\WCHS3.txt). So, what would the FIND command look like, if it were told to look for “<DIR>” in the file WCHS2.txt and to put the results in WCHS3.txt? Can I presume that the output listing would stay sorted, since the input listing was sorted?

        Let’s say that I am issuing the FIND commands after I do a CD C:\Users\WCHS\Documents at the command prompt.

      • #2362748
        PaulK
        AskWoody Lounger

        Congratulations, you are doing famously well.

        Background info:
        You have been using the ‘pipe’ operator [ | ]. In context, it means “take the output of the operation that precedes it, and pass it to the operation that follows”.
        Another operator that is very useful is: “take that which precedes me, and put it in the location that I specify next”. This is the redirection operator: [ > ].
        And another command that is useful is TYPE. It “types out” (by default to the console, your display) a file. But the default destination can be changed by piping or redirecting.

        Will this work?
        [Attached image: FindDIR]

        The output sequence is not changed; only a[nother] Sort does that.

        NB – For some reason in these AW postings I cannot insert a string with the word DIR surrounded by the double-quotation symbols and angle brackets (less-than and greater-than) symbols. The result is a broken mess. Hence I’ve had to insert a png file of the desired line.
        How did YOU manage to post it above: paragraph 2, line 5?

      • #2362753
        Stephanie_Sy
        AskWoody Lounger

        You cannot find duplicates without a dedicated application.

        • #2362757
          PaulK
          AskWoody Lounger

          Do you have one to recommend?

        • #2362773
          WCHS
          AskWoody Plus

          Well, I found duplicates using the Command Prompt commands that @PaulK outlined.

          • #2362797
            dg1261
            AskWoody_MVP

            I agree with Stephanie … though it really depends on how you want to define “duplicates”.

            Your command line method is only searching for files with the same name but does not compare file contents. It will miss duplicates that have a different filename, and will find false positives on files that may appear to be duplicates by filename and size but actually have different contents.

            If you want to check contents, you need a third-party app. The better apps will also search for files with identical contents but different filenames — a not infrequent problem with camera photo imports. Dedicated apps also filter out all the non-duplicates, which your command line method does not.

            In your case of searching merely for an erroneously copied folder, your method is probably sufficient, but do be aware of its limitations. It’s not finding duplicates, it’s searching for the names of some files that *might* be duplicates.

            When searching for true duplicates, I tend to favor portable or “no-install” apps such as “Easy Duplicate File Finder” or “Duplicate Files Finder Portable” rather than programs that require installation.
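
To illustrate what "comparing contents" means in practice, here is a minimal sketch in Python (an assumption; it is not how any of the apps named above are known to work). It groups files by a SHA-256 digest of their bytes, so files with identical contents are reported together even when their names differ. `file_digest` and `find_identical_files` are hypothetical helper names.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_files(root):
    """Group the files under root whose contents hash identically, whatever their names."""
    paths_by_digest = defaultdict(list)
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                paths_by_digest[file_digest(path)].append(path)
            except OSError:
                continue  # unreadable file (locked, no permission): skip it
    return [sorted(paths) for paths in paths_by_digest.values() if len(paths) > 1]
```

Every file is read in full, so this is slow on large folders; it trades speed for not being fooled by matching names or sizes.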

            • #2362801
              WCHS
              AskWoody Plus

              Thanks for the clarification. Indeed, I was searching for copied folders (with lots of subfolders and subsubfolders) all with the same foldernames. By serendipity, I discovered that this had happened to one master folder. And I wondered if it had happened to other master folders, due to some system mishap rather than a single instance of an inadvertent “copy” operation by me. So, in my case, foldernames and file sizes are identical. Folder contents are not necessarily the same, because the content of the copied folder is static (thus, older), whereas the contents of the master folder has in many instances changed (files added, modified, or deleted). The copied folder is a fluke and I need to delete it since it’s taking up a lot of unnecessary space.

              But, I will keep in mind that there are better methods for finding duplicate files when the contents are the same but the foldernames are different.

        • #2363101
          Chris Greaves
          AskWoody Plus

          I agree.

          I am currently testing a Word/VBA program I wrote to eliminate duplicate MP3 tracks from a collection of 20,000 files in 1,800 folders, using the contents of the files rather than the file names. The same will apply to my collection of image files and, I suppose, documents, workbooks and text files (txt/bas/cls/frm)

          The way to eliminating duplicates is littered with hazards:-

          1. Illegal filenames; Win10 allows me to import badly-named files from an Android (TubeMate) application. Because the filenames are illegal the program can see the files, but cannot manipulate the names!
          2. An edited MP3 track (removing the 90-seconds of applause at the end, or at the beginning) means that a straight binary match is insufficient
          3. Optimizing algorithms; if two files' contents are equal, it follows that their FileLen() must be equal, so examine only the subset of files with matching lengths.

          And so on. It is an interesting journey!
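
The length shortcut in point 3 can be sketched as follows. This is Python rather than the Word/VBA program described above (which is not shown in this thread), so treat it as an illustration under that assumption; `candidate_groups_by_size` and `confirmed_duplicates` are hypothetical names. Only files that share a size are ever compared byte for byte.

```python
import filecmp
import os
from collections import defaultdict


def candidate_groups_by_size(root):
    """Map each file size that occurs more than once under root to the files of that size."""
    paths_by_size = defaultdict(list)
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                paths_by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # file vanished or is inaccessible: skip it
    return {size: sorted(paths)
            for size, paths in paths_by_size.items() if len(paths) > 1}


def confirmed_duplicates(paths):
    """Split same-size paths into groups of byte-identical files."""
    groups = []
    for path in paths:
        for group in groups:
            # shallow=False forces a comparison of the actual contents.
            if filecmp.cmp(group[0], path, shallow=False):
                group.append(path)
                break
        else:
            groups.append([path])
    return [group for group in groups if len(group) > 1]
```

Files with a unique length are never opened at all, which is where the saving comes from. An edited track (hazard 2) has a different length and different bytes, so it would not be caught by this approach.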

          Cheers

          Chris

          Unless you're in a hurry, just wait.

      • #2362762
        Biiljoy
        AskWoody Lounger

        Total commander I’m pretty sure can do it.

      • #2362769
        WCHS
        AskWoody Plus

        I tried FIND "<DIR>" WCHS2.txt > WCHS3.txt and it worked.

        Thanks for all of your help. I learned a lot.

        P.S. I just typed "<DIR>" from the keyboard.

        • #2362782
          WCHS
          AskWoody Plus

          I tried FIND "<DIR>" WCHS2.txt > WCHS3.txt and it worked.

          Curiously, it worked, although the extension of the WCHS2 file was actually TXT!!
          Here, it did not pay attention to case. On the other hand, the extension created for the WCHS3 file was txt. Here it DID pay attention to case.

          Another thing: in my Documents folder, I have a subfolder and a sub-subfolder named .NET Framework. The sub-subfolder is a duplicate of the subfolder. But, neither WCHS1.TXT nor WCHS2.TXT has any line that says "<DIR> .NET Framework". Do you have a speculation as to why? Is it because of the /V " ." argument in the piped FIND command? If so, why did you use this argument in yours?

      • #2363023
        PaulK
        AskWoody Lounger

        Excellent observations.
        1 – Concerning the CaSe of the parts of a file-id.
        In old DOS, everything was upper case. (Historically this was because old (pre big/main frame) computers used a limited character set that consisted only of ‘capital’ letters, plus digits and some symbols. As time and technology and user-friendliness progressed, character sets expanded to include lower-case letters and more characters. And now, more character sets also support other ‘non-Latin-character’ characters. But we’re on the edge of the topic now.)

        When one keys a letter into a command that results in an output name, the case of that letter is honored in the result. But Windows ignores the case when using it to locate a file. Note, however, that case IS respected when it is pertinent. (Other operating systems DO respect case in file ids, so one needs to be fluent in them, and give them respect.)

        2. In the Find command, why is [ FIND /V " ." ] included? A question which leads to another dissertation (or distraction?). Look at this sequence. Explanations follow.
        [Attached image: periods]

        Perhaps you have noticed that the ‘raw’ result of a DIR always has two lines at the top: one has a name of just one period, the other has two periods. These have a special meaning when used in a command that includes a path specification.

        Two periods = back up one level in the path; that is, to ‘my’ parent directory.
        One period = this directory where I am now.

        For your screening you didn’t need to see hundreds of these redundant repeated lines, so you eliminated them with that Find.

        (The [ CD . ] above is only an illustration of what its effect is, and is not useful as-is. However it is used in some batch command applications.)

        To answer your question (thought I’d never get around to it?) – Yes, the dot-NET lines WERE filtered out because of this specification. Candidly, I didn’t think of them. You might then ask – why not change the find to [ “. ” ], that is, period+space? The problem here is that, for those lines, the last period is the end of the line: there is no trailing space IN THE RECORD. On the screen we see blanks, but the actual line has already ended. Try it.
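
The collateral damage is easy to reproduce outside the Command Prompt. The sketch below is in Python (an assumption; nothing in this thread uses it) with two hypothetical helpers: `find_v` imitates the space+period filter, and `drop_dot_entries` shows one possible alternative that looks only at the name field, assumed to begin at column 40 as in the listing layout discussed earlier.

```python
def find_v(lines, text):
    """Keep the lines that do NOT contain text, roughly what FIND /V "text" does."""
    return [line for line in lines if text not in line]


def drop_dot_entries(lines, name_column=40):
    """Drop only the entries whose name is exactly '.' or '..'."""
    start = name_column - 1  # column numbers are 1-based, slices are 0-based
    return [line for line in lines if line[start:].strip() not in (".", "..")]
```

Given a listing that contains the `.`, `..` and `.NET Framework` entries, `find_v(lines, " .")` removes all three, because each name is preceded by a space and starts with a period. `drop_dot_entries(lines)` removes only the first two and keeps `.NET Framework`.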
        = = = = =
        The replies by others (concerning the limitations here) are quite correct. That is why

        ‘True duplicates’ likely will have identical file dates/times and sizes; these are good discriminatory indications.

        was stated in paragraph 5.
        = = = = =
        To answer

        One question, though: after the list of possible attributes, the information says “prefix meaning not”. Can you explain how that works with an example??

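        In DIR itself, the prefix is a hyphen placed in front of the attribute letter: [ /A-H ] lists everything that is not Hidden, and [ /A-D ] lists everything that is not a Directory (that is, files only). You have already used the same ‘not’ idea in another form. In your Find, the [ /v ] said: Find all lines that do-not have a [ space+immediate-period ]. This dropped all the lines discussed above. Were you to do a Find WITHOUT the /v, the result would eliminate everything except those (repeated) two lines for each folder. Try it. In any folder: [ dir /s | find " ." ]. Hint: to abort the hundreds or thousands of lines scrolling down your monitor, press [ Ctrl+C ].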

        • #2363043
          WCHS
          AskWoody Plus

          Now that I understand that /V " ." (that’s a spaceperiod) is intended to eliminate <DIR> . and <DIR> .. in the listing, I just changed the argument to /V "" (that’s two double-quotes in a row).
          So, that first command reads this way: DIR /A /S *.* | FIND /V "" > WCHS1.TXT (which would be the same as DIR /A /S *.* > WCHS1.TXT)

          The final command is FIND "<DIR>" WCHS2.TXT > WCHS3.TXT, which lists the lines that have <DIR> in them, since I know that I want only a listing of folder names (and no files). Consequently, with that change in the /V argument, I get in the final output file a listing that has lots of lines at the top that begin with <DIR>. and then <DIR>..
          And after that comes .NET FRAMEWORK and any other foldernames that begin with dot(s).

          That’s OK. I can just highlight all of those <DIR>. and <DIR>.. lines and cut them out.
          I will also manually cull out any foldernames that are not duplicates. And then, I end up with working list of duplicate foldernames that are worth investigating. It turns out in my case that the inadvertently copied very large folder to a subfolder elsewhere in \Documents happened only once (i.e., those duplicate folders have the same time-stamp– likely my mismanagement of the mouse and failure to notice what was going to happen).

      • #2363062
        PaulK
        AskWoody Lounger

        How about doing this in two ‘batches’?
        One of them is the original – deleting all the ‘space-dot’ names.

        The other run: Make this change: FIND " .NET"
        NOTE: I just tested, and the case of the content within the quotation marks IS respected.
        Thus: " .NET" does not find " .net", and vice versa.
        Also, if there is mixed case in some names, one will have to be careful to match it.
        However, IF the word ‘Framework’ is always present: FIND "Framework" ?
        Again, verify the precise case of the spelling of the word.
        And if there are other folders with ‘Framework’ as part of the name … collateral damage.

        Note also that finds may be chained: FIND … | FIND … | FIND … . This is useful if one is not interested in saving the results of each individual stage of filtering.

        Now, as a bonus. If you DO have ‘intermediate’ files (one run w/o periods; one run for .NETs), you can combine them back into one file. Look at the syntax of COPY /? .
        COPY filein2 + filein1 fileout
        The spaces around the plus sign are not required.
        So [ COPY filein2+filein1 fileout ] is the same. User’s choice for readability.
        (I’ve shown filein2 before filein1, assuming that ‘2’ is the .NETs, which is where they would naturally sort to if they hadn’t been dropped in the first run.)

        • #2363067
          WCHS
          AskWoody Plus

          Now that I know about the COPY command and that one can chain FIND, there are many ways to “skin the cat”. But, indisputably, a foldername beginning with a ‘dot’ does create some twists and turns.

          Right now, I’ve got what I need. And, I hope I don’t fall in this hole again, but if I do … I can always re-read this topic.

          Thanks for pulling me out this time. 🙂

      • #2363202
        RetiredGeek
        AskWoody MVP

        Hey Y’all,

        Here’s a short PowerShell script I created to create a file that lists all the files of a specified type in my Documents Directory. It sorts the files by length, which makes it easy to scan down the list for duplicates. It also provides the full path to make finding them easy.

        Here’s a sample of my results, note I’ve edited the data to remove like duplicates and non-duplicates.

        File Name                 Bytes Full Path                          
        ---------                 ----- ---------                          
        powershell-cheat-she      63136 G:\xxxDocs\Courses\PowerShell\power
        et.pdf                          shell-cheat-sheet.pdf              
        Viking Cruise             63316 G:\xxxDocs\Travel Info\Viking      
        Itinerary 2022.pdf              Cruise Itinerary 2022.pdf          
        ...
        3-1 Principle Basic       70952 G:\xxxDocs\Samsung\SmartSwitch\back
        Spot Enforcement                up\SM-N910A\SM-N910A_18034783331\SM
        .pdf-1527286803.pdf             -N910A_20171003202427\Docs\Download
                                        \3-1 Principle Basic Spot          
                                        Enforcement .pdf-1527286803.pdf    
        3-1 Principle Basic       70952 G:\xxxDocs\Samsung\SmartSwitch\back
        Spot Enforcement                up\SM-N910A\SM-N910A_18034783331\SM
        .pdf-1527286803.pdf             -N910A_20181126124126\Docs\Download
                                        \3-1 Principle Basic Spot          
                                        Enforcement .pdf-1527286803.pdf    
        3-1 Principle Basic       70952 G:\xxxDocs\Samsung\SmartSwitch\back
        Spot Enforcement                up\SM-N960U1\SM-N960U1_18034783331\
        .pdf-1527286803.pdf             SM-N960U1_20200207103050\Docs\Samsu
                                        ng\SmartSwitch\backup\SM-N910A\SM-N
                                        910A_18034783331\SM-N910A_201811261
                                        24126\Docs\Download\3-1 Principle  
                                        Basic Spot Enforcement             
                                        .pdf-1527286803.pdf  
        ...								
        Excel - Hack60.pdf        97433 G:\xxxDocs\Courses\Excel\How To    
                                        Articles\Excel - Hack60.pdf        
        hack60.pdf                97433 G:\xxxDocs\Courses\Excel\How To    
                                        Articles\hack60.pdf                
        ...
        Excel - Hack81.pdf        99693 G:\xxxDocs\Courses\Excel\How To    
                                        Articles\Excel - Hack81.pdf        
        hack81.pdf                99693 G:\xxxDocs\Courses\Excel\How To    
                                        Articles\hack81.pdf                
        ...
        10 Things To Do          111330 G:\xxxDocs\Courses\Windows XP and  
        When XP Wont                    Internet\10 Things To Do When XP   
        Boot.pdf                        Wont Boot.pdf                      
        10_things_winxp_boot     111330 G:\xxxDocs\Courses\Windows XP and  
        .pdf                            Internet\10_things_winxp_boot.pdf 
        ... 
        2020 Q1 Vanguard         167132 G:\xxxDocs\ACTS\Financial          
        Statement.pdf                   Disclosure\Final\2020 Q1 Vanguard  
                                        Statement.pdf                      
        2020 Q1 Vanguard         167132 G:\xxxDocs\ACTS\Financial          
        Statement-Final.pdf             Disclosure\2020 Q1 Vanguard        
                                        Statement-Final.pdf 
        ...								
        Excel - Unique           212460 G:\xxxDocs\Courses\Excel\How To    
        Lists.pdf                       Articles\Excel - Unique Lists.pdf  
        Excel Unique             212460 G:\xxxDocs\Courses\Excel\How To    
        Lists.pdf                       Articles\Excel Unique Lists.pdf    
        ...
        xxx Passport             325844 G:\xxxDocs\Samsung\SmartSwitch\back
        2016.pdf                        up\SM-N910A\SM-N910A_18034783331\SM
                                        -N910A_20171003202427\Docs\Document
                                        s\xxx Passport 2016.pdf            
        xxx Passport             325844 G:\xxxDocs\Samsung\SmartSwitch\back
        2016.pdf                        up\SM-N910A\SM-N910A_18034783331\SM
                                        -N910A_20181126124126\Docs\_EXTERNA
                                        L_\PDFs\xxx Passport 2016.pdf      
        xxx Passport             325844 G:\xxxDocs\Samsung\SmartSwitch\back
        2016.pdf                        up\SM-N960U1\SM-N960U1_18034783331\
                                        SM-N960U1_20200207103050\Docs\_EXTE
                                        RNAL_\Legal\xxx Passport 2016.pdf  
        xxx Passport             325844 G:\xxxDocs\Samsung\SmartSwitch\back
        2016.pdf                        up\SM-N960U1\SM-N960U1_18034783331\
                                        SM-N960U1_20200207103050\Docs\_EXTE
                                        RNAL_\PDFs\xxx Passport 2016.pdf   
        xxx Passport             325844 G:\xxxDocs\Samsung\SmartSwitch\back
        2016.pdf                        up\SM-N960U1\SM-N960U1_18034783331\
                                        SM-N960U1_20200207103050\Docs\Samsu
                                        ng\SmartSwitch\backup\SM-N910A\SM-N
                                        910A_18034783331\SM-N910A_201811261
                                        24126\Docs\_EXTERNALDATA_\PDFs\xxx 
                                        Passport 2016.pdf                  
        xxx Passport             325844 G:\xxxDocs\Word\Legal              
        2026.pdf                        Files\Scanned Docs\xxx Passport    
                                        2026.pdf                           
        ...
        Covid 19                 478030 G:\xxxDocs\Word\Health             
        Vaccination                     Information\Covid 19 Vaccination   
        Cards.pdf                       Cards.pdf                          
        Covid 19                 478030 G:\xxxDocs\Word\Legal              
        Vaccination                     Files\Scanned Docs\Covid 19        
        Cards.pdf                       Vaccination Cards.pdf              
        ...
        RVadventure2008.1.pd    1171930 G:\xxxDocs\HTML\bkpresents\pdfs\RVa
        f                               dventure2008.1.pdf                 
        RVadventure2008.pdf     1171930 G:\xxxDocs\HTML\bkpresents\pdfs\RVa
                                        dventure2008.pdf                   
        ...
        

        Here’s the Code:

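        Clear-Host

        # Column layout for the report: name, size in bytes, and full path.
        $fmtInfo = @{Expression={$_.Name};
                     Label="File Name";Width=20},
                   @{Expression={$_.Length};
                     Label="Bytes"    ;Width=10},
                   @{Expression={$_.FullName};
                     Label="Full Path";Width=35}

        # Collect every PDF below the starting folder (-File skips folders),
        # sorted by size so that same-size files land on adjacent rows.
        $x = Get-ChildItem -Path G:\BEKDocs -Filter "*.pdf" -File -Recurse |
              Sort-Object Length,Name,FullName

        # Write the wrapped table to a text file outside the folder being searched.
        $x | Format-Table $fmtInfo -Wrap > G:\CompareFiles.txt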
        

        Notes:

        • You can change the extension to an * if you want all the files.
        • Of course you’ll change the path to search as well as the one on the output file.
        • It’s a good idea to place the output file somewhere you’re not searching.
        • You could also switch the $fmtInfo to display the Length first if you find that better.

        HTH 😎

        May the Forces of good computing be with you!

        RG

        PowerShell & VBA Rule!

      • #2363212
        WCHS
        AskWoody Plus

        Does the principle behind this code depend on one’s knowing in advance what type of file might be duplicated?

        If so, what modifications would you make to the code so that all file types are listed?

        Can I presume that if you wanted the sorting to be Name, Length, and Fullname, you would just change the penultimate line to read Sort Name,Length,FullName?

        Is the PowerShell command to set the directory the same one that you use in the Command Prompt?
        i.e. CD C:\USERS\USERNAME\DOCUMENTS (for a search of the Documents directory)?
        If so, do you have to be attentive to case? I ask because File Explorer shows “Documents” for that folder.

        • #2363221
          RetiredGeek
          AskWoody MVP

          WCHS,
          It’s in the notes but the following answers your questions.

          Clear-Host
          
          $fmtInfo = @{Expression={$_.Name};
                       Label="File Name";Width=20},
                     @{Expression={$_.Length};
                       Label="Bytes"    ;Width=10},
                     @{Expression={$_.FullName};
                       Label="Full Path";Width=35}
          
          $GCIArgs = @{
            Path    = "$([environment]::getfolderpath("mydocuments"))\" 
            Filter  = "*.pdf" #Change extension or * for ALL, e.g. "*.*"
            File    = $True
            Recurse = $True
                      }
          $x = (Get-ChildItem @GCIArgs ) | 
                Sort Length,Name,FullName
          $x | Format-Table $fmtInfo -Wrap > C:\CompareFiles.txt
          
          

          I improved the script to automatically find the Documents folder. Of course you can hard code this if you wish. I also noted how to change the filter. Finally, I changed the output file to write to the root of the C: drive as that should be out of the way.
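
The listing produced above leaves it to the reader to spot neighbouring rows with the same byte count. A minimal sketch of one way to narrow that down, assuming PowerShell 4 or later (needed for Get-FileHash): group the files by size, then hash only the files that share a size. The function name Find-DuplicateFile and its parameters are made up for illustration and are not part of the script above.

```powershell
# Hypothetical helper, not part of the original script.
# Step 1: files with a unique size cannot be duplicates, so only same-size groups are kept.
# Step 2: within each same-size group, files with the same SHA256 hash have the same content.
function Find-DuplicateFile {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [string] $Filter = '*'
    )

    Get-ChildItem -Path $Path -Filter $Filter -File -Recurse |
        Group-Object -Property Length |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            $_.Group |
                Get-FileHash -Algorithm SHA256 |
                Group-Object -Property Hash |
                Where-Object { $_.Count -gt 1 }
        } |
        ForEach-Object {
            [pscustomobject]@{
                Hash  = $_.Name
                Count = $_.Count
                Paths = $_.Group.Path
            }
        }
}

# Example: duplicates among all files in the Documents folder
# Find-DuplicateFile -Path ([Environment]::GetFolderPath('MyDocuments'))
```

Note that this reports only files whose contents are identical, so a folder that was partly copied and then edited afterwards would show up only for the files that were left unchanged. Empty files all hash the same and will be reported as duplicates of each other.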

          HTH 😎

          RG

          • #2363230
            PaulK
            AskWoody Lounger

            Finally, I changed the output file to write to the root of the C: drive as that should be out of the way.

            (emphasis added)

            Great idea!
            But a Standard user may run into a problem writing to C:\.

            On Windows 7, and I presume on Windows 10 too, one can
            – Make a Directory, MD  –  (New folder)
            – Copy an existing file there, either
            — from elsewhere, or that was originally there
            but one cannot [ “Access denied” ]
            – directly create a file there
            – Save[As] there a file that was already there, even if unchanged

            Two workarounds that I’ve tested are:
            – Standard user
            — MD a directory
            — Save to, or Copy to, this directory
            — ‘Run as administrator’ the program that is to write to C:\
            – Administrative user – has no restriction

            So, to use Mr. Geek’s Script, I suggest that one
            – first, create a folder there (such as: MD C:\Comps )
            – modify part of the last line of the script:
            — old: C:\CompareFiles.txt
            — new: C:\Comps\CompareFiles.txt
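
The two steps above can also be done from inside the script. This is only a sketch: the function name Get-OutputFile is hypothetical, and the defaults are simply the folder and file name from this post.

```powershell
# Hypothetical helper: make sure the output folder exists, then return the full output file path.
function Get-OutputFile {
    param(
        [string] $Folder = 'C:\Comps',
        [string] $Name   = 'CompareFiles.txt'
    )

    # Same effect as MD C:\Comps, but skipped when the folder is already there.
    if (-not (Test-Path -LiteralPath $Folder)) {
        New-Item -ItemType Directory -Path $Folder | Out-Null
    }

    Join-Path $Folder $Name
}

# The last line of the script would then become:
# $x | Format-Table $fmtInfo -Wrap > (Get-OutputFile)
```

Any folder the user can already write to (Desktop, a temp folder) would serve the same purpose and avoids touching the C: root at all.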

            Edit addition:
            I’m always a bit nervous when working directly with C:-root.
            Crossed fingers for luck are acceptable, but crossed hands can lead to a disaster.
            It is far too easy to bypass the UAC caution when one is focusing on something else.

            • #2363233
              RetiredGeek
              AskWoody MVP

              Simple solution, run PowerShell as Admin. Just tested this so I know it works.  HTH 😎

              RG

      • #2363224
        anonymous
        Guest

        If you want the ability to work through the list and easily delete, a 3rd party program makes life easier. Music and photos present their own challenge. 3rd party programs may be capable of finding music recorded at different bitrates and similar images.

        I use this for similar images

        Note: similar, not identical. Be aware that the list can be confusing in that multiple similar files occupy a line each. Pay careful attention to the file path.

        I use a very rigid naming convention for music and can find those by file name.

        For the rest, I persevere with AllDup, despite it being far from intuitive to use.

      • #2363437
        WCHS
        AskWoody Plus

        OK. I have two laptops. One is older than the other one. I use the older one as the “master” laptop and I sync all of my Documents from the older laptop to the newer one. Therefore, the newer and older laptops both have the same duplicated folder in the same path.

        When I deleted the offending duplicate folder from the older laptop, it appeared in the Recycle Bin as one big superfolder. However, when I tried to delete the offending duplicate folder from the newer laptop, I received a dialog box saying something about filenames being too long and not being able to do the deletion. Has anyone seen this kind of dialog box when doing a folder deletion?

        Why would one laptop do the deletion without a hitch, deleting the one big superfolder in one fell swoop, whereas the other one raised an objection about filenames being too long for the Recycle Bin?

        P.S. I did manage to get the offending duplicated folder deleted by using a file management program, and there was no objecting dialog box. However, when I checked the Recycle Bin, all of the files within folders had been deleted first, and then the empty folders were deleted. I suppose this was a way to get around the ‘filenames too long’ problem.

        What is it about that laptop that forced the deletion to be done in steps, whereas the other laptop did it all at once?

        The one that deleted the superfolder in one fell swoop is an older laptop and has an i7 processor 6th generation (6500U–Skylake U). The laptop that had to do it step by step is a newer laptop and has an i7 processor, too, but is 8th generation (8565U–Whiskey Lake U.)

        Seems strange to me. Is there an explanation?

        • #2363590
          Paul T
          AskWoody MVP

          Windows Explorer has always had trouble with file path/names longer than about 260 characters (the MAX_PATH limit, often quoted as 255). Windows itself lost that limit many years ago, which is why a 3rd party tool had no trouble with the deletion.

          Sometimes moving/renaming a top level folder in Explorer is all you need to do to get under that limit.

          cheers, Paul

          • #2363641
            WCHS
            AskWoody Plus

            But, my question was why one laptop would object to the deletion because of long filenames and the other did not. Both are Windows 20H2, build 928.

            • #2363779
              RetiredGeek
              AskWoody MVP

              There is a registry setting for this.

              HKLM\system\CurrentControlSet\Control\FileSystem\
              LongPathsEnabled

              You can use my CMsStdSettingsForm.ps1 program to check and adjust this setting.
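
For anyone who would rather look at the value by hand, a minimal sketch using the built-in registry provider follows. It assumes Windows 10 version 1607 or later, where this value is supported; the variable names are just for illustration.

```powershell
$key = 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem'

# Read the current value: 1 means long paths are enabled, 0 or empty means they are not.
$current = (Get-ItemProperty -Path $key -Name LongPathsEnabled -ErrorAction SilentlyContinue).LongPathsEnabled
"LongPathsEnabled = $current"

# To enable it, run this line from an elevated (Run as administrator) PowerShell:
# Set-ItemProperty -Path $key -Name LongPathsEnabled -Value 1 -Type DWord
```

Reading the value changes nothing. The Set-ItemProperty line is left commented out on purpose, and individual programs still have to support long paths themselves for the setting to make a difference.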

              HTH 😎

              RG

              • #2364862
                WCHS
                AskWoody Plus

                @RetiredGeek:
                I’d like to use your CMsStdSettingsForm.ps1 program to set the registry to enable long paths.

                I am a newbie, especially when it comes to doing anything with the Registry. In fact, having been always reminded of the dangers of monkeying with the registry, I have never made any modifications to the registry myself. So, I have a few questions about your program.
                1. My device is using O/S Windows 10, 20H2, Build 928. It has a 512 GB SSD and an 8th generation i7-8565U–Whiskey Lake U processor. Will it work on this device?
                2. When I go to the link, I see only the image below. What do I do next?
                3. I don’t know what to expect once the program starts running. You say “check and adjust this setting”. How do I do that?
                4. Are there any precautions I should be aware of? (Nervous about registry changes, even if automated)
                Thanks.

              • #2365031
                RetiredGeek
                AskWoody MVP

                WCHS, double click the icon to open the folder. You’ll then see all my programs available for download, as well as a document with the file hashes to verify your download.
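
Checking a download against a published hash can be done with the built-in Get-FileHash cmdlet. The sketch below is an illustration only: the function name Test-FileHash is hypothetical, and SHA256 is an assumed default, so use whichever algorithm the hash document actually lists.

```powershell
# Hypothetical helper: returns True when the file's hash matches the published one.
function Test-FileHash {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Expected,
        [string] $Algorithm = 'SHA256'   # assumption; match the algorithm in the hash document
    )

    $actual = (Get-FileHash -LiteralPath $Path -Algorithm $Algorithm).Hash

    # -eq on strings ignores case, so upper- and lower-case hex both match.
    $actual -eq $Expected.Trim()
}

# Example (paste the hash from the document in place of the placeholder):
# Test-FileHash -Path "$HOME\Downloads\CMsStdSettingsForm.ps1" -Expected '<hash from the document>'
```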

                RG

      • #2364957
        Paul T
        AskWoody MVP

        According to MS, File Explorer still has that limit.
        LongPathsEnabled – Microsoft Q&A

        Are you sure the paths were exactly the same?
        Did you put the folder under a user ID?

        cheers, Paul

      • #2364993
        WCHS
        AskWoody Plus

        @PaulT:
        Not sure if this answers your question:
        Original Location: c:\Users\Username\Documents\Folder X. Note: Folder X has many more subfolders and many more files than the duplicate folder, because subfolders and files were added after Folder X was, unbeknownst to me, duplicated in another location.

        Duplicate Location: C:\Users\Username\Documents\Folder X\HOW-TO for Windows 10, 1909\Reading about 1909\Folder X. Note: the duplicated folder and its subfolders all have the same time-stamp, which is the day and time the duplication occurred.
        ————————————————-
        Machine A: Deleted duplicate folder is one big superfolder in the Recycle bin; thus, the deleted superfolder Folder X is the only entry in the Recycle bin.

        Machine B: The entries in the Recycle bin are in this order: files in the subfolders, subfolders in the superfolder, the superfolder; thus, a) all of the files in subfolders of the deleted duplicate Folder X were deleted, b) then the subfolders from which the files had been deleted were deleted (subfolders are empty because the files in step a had been deleted), and c) then the superfolder Folder X was deleted (it is empty because all of the subfolders in step b had been deleted). Deletion into the Recycle bin occurred in steps a, b, and c because machine B said the filenames were too long to delete Folder X in one fell swoop.

      • #2365906
        Paul T
        AskWoody MVP

        Both machines had “c:\Users\Username”?
        Did they have an actual user name?

        cheers, Paul

        • #2365917
          WCHS
          AskWoody Plus

          Machine A has UsernameA and Machine B has UsernameB. They are actual usernames, but I give them a “covername” here. UsernameA is not the same as UsernameB.

          • #2366375
            Paul T
            AskWoody MVP

            I suspect user usernameB is longer than usernameA and breaks the 255 character limit.

            cheers, Paul

            • #2366438
              WCHS
              AskWoody Plus

              Is that the character limit on just a filename? or on the path, including the filename?

              • #2367552
                Paul T
                AskWoody MVP

                Full path including file name. That’s why shortening a folder name often fixes the issue.
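
Since the limit applies to the full path, it can help to list which paths are at or over it before deciding which folder to shorten. A minimal sketch follows; the function name Get-LongPath is hypothetical, and the default of 260 is the classic Windows MAX_PATH figure (pass -Limit 255 to use the number quoted in this thread). Older Windows PowerShell versions may themselves skip paths they cannot read, so treat the result as a starting point.

```powershell
# Hypothetical helper: list files and folders whose full path length is at or above a limit.
function Get-LongPath {
    param(
        [Parameter(Mandatory)] [string] $Root,
        [int] $Limit = 260
    )

    Get-ChildItem -LiteralPath $Root -Recurse -Force -ErrorAction SilentlyContinue |
        Where-Object { $_.FullName.Length -ge $Limit } |
        Select-Object @{ Name = 'Length'; Expression = { $_.FullName.Length } }, FullName |
        Sort-Object -Property Length -Descending
}

# Example: longest paths first, under the Documents folder
# Get-LongPath -Root ([Environment]::GetFolderPath('MyDocuments')) | Format-Table -Wrap
```

Running this on both machines would show directly whether the longer user name pushes any path over the limit on one machine but not the other.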

      • #2367543
        Alex5723
        AskWoody Plus

        Ghacks :

        Find duplicate files and more with open source cross-platform tool Czkawka

        Czkawka is a free open source tool to find duplicate images, broken files and more. It is written in Rust and available for Windows, Linux and Mac OS devices…

        The following operations are supported by Czkawka:

        Find duplicate files — searches for dupes based on file name, size, hash or first Megabyte hash.
        Empty folders — finds folders without content.
        Big files — displays the biggest files, by default the top 50 biggest files.
        Empty files — finds empty files, similarly to empty folders.
        Temporary files — finds temporary files with certain file extensions.
        Similar images — finds images which are not exactly the same, e.g. images with different resolutions.
        Zeroed files — finds files that are zeroed.
        Same music — finds music from the same artist, album and other search parameters.
        Invalid symbolic links — finds symbolic links that point to files or directories that are missing.
        Broken files — finds files with invalid extensions and files that are corrupted…

      • #2367615
        RetiredGeek
        AskWoody MVP

        Please see this article from 2016. Please note it only applies to NTFS formatted volumes.

        HTH 😎

        RG
