  • Compare Contents of Two Documents


    Viewing 18 reply threads
      • #2299232
        Nathan Parker
        AskWoody_MVP

        I have two long lists of book titles I need to compare the differences between and see if one list has all of the titles from the second list or is missing any titles.

        In terms of word processors, I have Word, BBEdit, TextEdit, Nota Bene, Google Docs, Mellel, Nisus Writer Pro, and Pages.

        Is there a way to accomplish this, or would I have to simply comb through the lists manually?

        Thanks!

        Nathan Parker

      • #2299244
        access-mdb
        AskWoody MVP

        Personally I would use Access to import the list as two tables and then use the query wizard to list what’s in one and not in the other. I suspect that as you’re on a Mac this isn’t available, but do you have a relational database app?

        Otherwise use Excel (or similar). Copy the two lists into separate columns and sort them (separately). Go to the midpoint of the list (e.g. 100 items, go to line 50). If they match there’s no difference in lines 1-50. Delete them and repeat. If you find a non match then the extra ones are in the first 50 lines (using the example). Go to the midpoint of those and compare. Just rinse and repeat. You don’t say how long your lists are, so this method might take longer for more. But it’s actually quite efficient. And it will find differences in both columns.
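For anyone without Access, the "what's in one and not in the other" comparison can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: each list is already available with one title per entry, and the function name `only_in_each` and the sample titles are made up for illustration.

```python
def only_in_each(list_a, list_b):
    """Return (titles only in list_a, titles only in list_b), each sorted."""
    # Strip surrounding whitespace and skip blank entries before comparing.
    set_a = {t.strip() for t in list_a if t.strip()}
    set_b = {t.strip() for t in list_b if t.strip()}
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Hypothetical sample data, one title per entry.
web_list = ["Dune", "Emma", "Ulysses"]
app_list = ["Emma", "Dune ", "Walden"]

missing_from_app, missing_from_web = only_in_each(web_list, app_list)
print("Only in the web list:", missing_from_app)
print("Only in the app list:", missing_from_web)
```

To try it on real data, each list could be read from a plain text file with something like `open("list1.txt", encoding="utf-8").read().splitlines()`, where `list1.txt` is a placeholder name. The comparison is exact, so titles that are spelled differently in the two lists will show up as differences.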

        Every day is the dawn of a new error

      • #2299252
        anonymous
        Guest

        You haven’t said what type of files the long lists are in, but if they are .txt text files you could try “WinMerge” (see https://winmerge.org/downloads/?lang=en).

        There is a portable version of WinMerge (scroll down to the unofficial PortableApps link at the page above) so you don’t need to install it if this is just a temporary thing.

        I have a combination of 2 programs set up as described at https://dottech.org/167404/how-to-add-compare-to-and-compare-later-options-to-context-menu-in-windows-tip/  which achieves the same kind of thing i.e. a side by side comparison of files.

        I only continue with this method because I am familiar with it, but portable WinMerge might be better if you are starting from scratch.

         

      • #2299254
        Kathy Stevens
        AskWoody Lounger

        It would help to have a little more information regarding how the lists were created.

        Are both lists in the same format?

        Are they variations of the same MS Word document? Are they in Excel? Etc.

        If they are variations of the same MS Word document you can use Word’s compare tool.

        If not, and assuming that the lists are in the same format, you can paste the lists into two separate Word documents and format each list so that each title is on a separate line using the same order (title, author, etc.). Separate the components of each title using the tab key. Then cut and paste each document into an Excel sheet. Sort the columns of each Excel sheet by title. Then cut and paste the titles from each list into a third Excel sheet, sort the columns alphabetically, and compare.

        • #2299279
          mn–
          AskWoody Lounger

          … are the lists ordered?

          A lot of the usual “diff” tools will be a lot less useful if ordering is also significant. Oh and numbered lists are a bit of a bother requiring some additional work.

          The traditional way for plaintext files on Unix/Linux is to use the sort command line tool (forcing a fixed order for entries in a file), and then diff.

          Also, if you only need to know which entries appear in only one of the lists, you can concatenate and sort the files with “sort file1 file2 | uniq -u” to report any lines that occur only once across both files. This won’t tell you which file any given line was in, though.

          (“sort -df …” to ignore case differences and non-alphanumeric characters in sort, “diff -iw” to ignore case and whitespace on a line in diff, “uniq -i” will also ignore case.)
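For readers without a Unix shell, the idea behind “sort file1 file2 | uniq -u” can be approximated in Python. This is a rough sketch rather than an exact equivalent: the function name `lines_seen_once` and the sample data are invented for illustration, and unlike `uniq` it also trims surrounding whitespace.

```python
from collections import Counter


def lines_seen_once(*line_lists, ignore_case=False):
    """Return, sorted, the lines that occur exactly once across all the lists."""
    def key(line):
        line = line.strip()
        # With ignore_case, lines are compared (and returned) in case-folded form.
        return line.casefold() if ignore_case else line

    counts = Counter(
        key(line) for lines in line_lists for line in lines if line.strip()
    )
    return sorted(line for line, n in counts.items() if n == 1)


file1 = ["Dune", "Emma", "Walden"]   # hypothetical contents of file1
file2 = ["emma", "Dune", "Ulysses"]  # hypothetical contents of file2

print(lines_seen_once(file1, file2))
print(lines_seen_once(file1, file2, ignore_case=True))
```

As with the shell version, the result does not say which list a line came from, and a title that is duplicated inside a single list is counted twice and so drops out of the report.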

      • #2299255
        anonymous
        Guest
      • #2299267
        Ascaris
        AskWoody_MVP

        I use the excellent Meld program for this. While it is primarily a Linux tool, it also has a Windows build and an unofficial Mac build (unofficial so far; from the way they put it, it sounds like they may be planning Mac support, but not yet).

         

        Group "L" (KDE Neon Linux 5.21.4 User Edition)

        • #2299803
          geekdom
          AskWoody Plus

          Exactly what I’ve been looking for. This site provides nice, useful resource software information.

          On Hiatus {with backup and coffee}
          offline▸ Win10Pro 2004.19041.572 x64 i3-3220 RAM8GB HDD Firefox83.0b3 WindowsDefender TRV=1909 WuMgr
          offline▸ Win10Pro 20H2.19042.685 x86 Atom N270 RAM2GB HDD WindowsDefender WuMgr GuineaPigVariant
          online▸ Win10Pro 20H2.19042.804 x64 i5-9400 RAM16GB HDD Firefox86.0 WindowsDefender TRV=20H2 WuMgr
      • #2299276
        Kathy Stevens
        AskWoody Lounger

        Then there is the no-brainer approach.

        Open both files.

        Block and copy the first title in the “base document” and do a search for it in the “second document”.

        If the search reveals that a title appears in both documents highlight it in both documents and move on to the second title in the “base document” and repeat the process.

        At the end of the process, the titles that are not highlighted appear in only one of the two files.

      • #2299392
        Nathan Parker
        AskWoody_MVP

        One list is on a web page at the moment, but I can copy the entire list into any form of document type (so I could do Word, TXT, RTF, etc).

        The second is a list inside a software app, but it gives me an option to copy all where I could again paste it into any document type.

        They don’t seem to be in the same order, even though they should be since they’re similar lists of books.

        Nathan Parker

      • #2299448
        anonymous
        Guest

        Turn both versions into Word documents.  Save both.

        I have Word 2019 – open one document, then Review/Compare with the second?

        I recall that earlier versions of Word and WordPerfect had a similar tool.

      • #2299525
        Vincenzo
        AskWoody Lounger

        I would turn them both into Word docs and use the built-in tools to alphabetize them (they may have to be in tables in order to alphabetize; I can’t quite remember).

        Then go to the Review tab, then Compare/Two Versions of a doc (legal blackline).

        I think you can do it in Excel too, but I’ve not done that.

        • This reply was modified 6 months, 2 weeks ago by Vincenzo.
      • #2299720
        Nathan Parker
        AskWoody_MVP

        About 1,683 titles.

        Nathan Parker

      • #2299787
        access-mdb
        AskWoody MVP

        A way to do this in Excel is on https://trumpexcel.com/compare-two-columns/, look for

        Example: Compare Two Columns and Highlight Mismatched Data

        This works with a small sample by highlighting all in column 1 that are not in column 2 and vice versa and using styles to show differences. The lists don’t have to be sorted. I shall be using this in future when I have lists to compare.

        Google Sheets doesn’t appear to have this, but MS Excel online probably does. I’ll have a look later.

        Every day is the dawn of a new error

      • #2299793
        access-mdb
        AskWoody MVP

        Well I’ve found an even simpler way in Excel. Put your data into one column, select it. Now go to Data/remove duplicates. All done! Well it was in my simple test. Of course, you won’t find duplicates like ‘Lord of the Rings, The’ and ‘The Lord of the Rings’.
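Duplicates like ‘Lord of the Rings, The’ and ‘The Lord of the Rings’ can be caught if each title is first reduced to a comparison key. The sketch below shows one possible rule set, chosen for illustration only (lower-casing, moving a trailing article to the front, dropping punctuation); the function name `normalize_title` is hypothetical and real lists may need different rules.

```python
import re


def normalize_title(title):
    """Reduce a title to a comparison key (an illustrative rule set, not a standard)."""
    key = title.casefold().strip()
    # Move a trailing ", The" / ", A" / ", An" back to the front.
    match = re.fullmatch(r"(.*),\s*(the|a|an)", key)
    if match:
        key = f"{match.group(2)} {match.group(1)}"
    # Drop punctuation, then collapse runs of whitespace.
    key = re.sub(r"[^\w\s]", "", key)
    return " ".join(key.split())


print(normalize_title("Lord of the Rings, The"))
print(normalize_title("The Lord of the Rings"))
```

Both calls print the same key, so comparing the keys rather than the raw titles would treat the two spellings as one book.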

        Every day is the dawn of a new error

        • This reply was modified 6 months, 2 weeks ago by access-mdb.
      • #2299808
        doriel
        AskWoody Lounger

        I use Notepad++.
        Download the “Compare” plugin; it takes 10 seconds.
        The output is crystal clear. I added two lines:

        [Attached screenshot: 002-2]

        But this is not for DOCX documents, just for plain text, source codes, etc.

        Dell Latitude E6530, Intel Core i5 @ 2.6 GHz, 4GB RAM, W10 1809 Enterprise

        HAL3000, AMD Athlon 200GE @ 3,4 GHz, 8GB RAM, Fedora 29

        • #2299825
          access-mdb
          AskWoody MVP

          Thanks Doriel, I had forgotten that Notepad++ had a compare plugin. The only thing I would say is that I had more differences than you and the output is a bit confusing (better learn how to interpret it). Nathan has said he has over 1600 titles, so this might not be easy (unless there aren’t many differences). I think using Remove Duplicates in Excel is much easier, but it’s always good to have more than one way of doing things!

          [Attached screenshots: Notepad, Excel difference, Excel remove dups, Excel result]

          Every day is the dawn of a new error

          • #2299831
            doriel
            AskWoody Lounger

            I agree, you are absolutely right. Notepad++ is just for basic tasks. When comparing so many titles, this is neither the best nor the easiest way.

            But it can help sometimes when performing basic tasks. Hope this could help somebody.
            Thank you for the Excel how-to!

            Dell Latitude E6530, Intel Core i5 @ 2.6 GHz, 4GB RAM, W10 1809 Enterprise

            HAL3000, AMD Athlon 200GE @ 3,4 GHz, 8GB RAM, Fedora 29

      • #2306761
        Nathan Parker
        AskWoody_MVP

        I managed to compare the content of both documents using Excel. Bad news is it turns out that both lists display the titles differently (some titles are listed with minor differences between the two lists), so there are a ton of “false differences”. I’ll still have to comb through them to see the real differences between the two.
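One way to thin out such “false differences” is approximate matching, which pairs each title with its closest counterpart in the other list. The following is a sketch using `difflib` from Python's standard library; the function name `pair_near_matches`, the 0.8 cutoff, and the sample titles are assumptions for illustration, and the cutoff would need tuning on real data.

```python
import difflib


def pair_near_matches(list_a, list_b, cutoff=0.8):
    """For each title in list_a, return its closest match in list_b, or None."""
    pairs = {}
    for title in list_a:
        # get_close_matches returns up to n candidates scoring at least cutoff.
        candidates = difflib.get_close_matches(title, list_b, n=1, cutoff=cutoff)
        pairs[title] = candidates[0] if candidates else None
    return pairs


# Hypothetical sample data.
web_list = ["The Hobbit", "Moby-Dick; or, The Whale", "Walden"]
app_list = ["Hobbit, The", "Moby Dick or The Whale", "Emma"]

for title, match in pair_near_matches(web_list, app_list).items():
    print(f"{title!r} -> {match!r}")
```

Titles left with `None` are the ones worth checking by hand. Punctuation differences tend to score well, but reordered titles (an article moved to the end, for example) can score poorly, so normalizing both lists before matching may help.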

        Nathan Parker

        • #2306794
          Kathy Stevens
          AskWoody Lounger

          Nathan

          When you set up your Excel sheets, did you set up one column of book titles, one column of authors’ last names, one column of authors’ other names, etc.?

          If you did, you can sort your sheet by author and title.

          What I do when I have a problem similar to yours is to block, copy, and paste each list into a Word document using text-only paste. I then go into each title and separate each component of the listing by tabs (title, tab, author’s last name, tab, author’s other names, etc.). Once the entire Word document is formatted, you can simply copy and paste the material into an Excel sheet, and each component of the list that is separated by a tab will fall into a different column. Once an Excel sheet is complete, you can sort your list of titles by author’s last name, title, etc.

          Do the same for the second list of the books.

          Then standardize the columns between the two Excel sheets, and then you can copy and paste the second Excel sheet into the first and then sort by author, title, etc.

          By sorting by author you should be able to reconcile the variations between the two lists of publications.
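The tab-separated layout described above can also be sorted outside Excel. As a rough sketch, assuming exactly three tab-separated columns in the order title, author's last name, author's other names (the function name `sort_by_author` and the sample rows are invented for illustration):

```python
import csv
import io


def sort_by_author(tab_separated_text):
    """Parse tab-separated lines and sort them by author, then title."""
    reader = csv.reader(io.StringIO(tab_separated_text), delimiter="\t")
    # Keep only rows that have all three expected columns.
    rows = [row for row in reader if len(row) >= 3]
    # Sort by last name, then other names, then title, ignoring case.
    rows.sort(key=lambda r: (r[1].casefold(), r[2].casefold(), r[0].casefold()))
    return rows


# Hypothetical pasted text: title, last name, other names.
pasted = (
    "Walden\tThoreau\tHenry David\n"
    "Emma\tAusten\tJane\n"
    "Persuasion\tAusten\tJane\n"
)

for title, last, other in sort_by_author(pasted):
    print(f"{last}, {other}: {title}")
```

Running both lists through the same sort puts each author's books next to each other, which makes title variations easier to spot by eye.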

          • #2306796
            Kathy Stevens
            AskWoody Lounger

            Nathan

            One more thing.

            The information for each book should be on a separate line in the Word document.

      • #2306775
        OscarCP
        AskWoody Plus

        Nathan: “I have two long lists of book titles I need to compare the differences between and see if one list has all of the titles from the second list or is missing any titles.”

        If you can copy the two documents to a plain text file each (i.e. two ASCII files), then you can open Terminal and type in the command line:

        $ diff file1 file2 > difference

        where “file1”, “file2” are whatever the files are actually called, while the file “difference” has all the, well, differences between the two lists.

        You could pipe the output of diff to some formatting utility to get a nicer listing of the differences, but I wouldn’t bother:

        $ diff file1 file2 | whatever > difference

        There are also different options in “diff” that make it possible to deal with binary files, mixed binary/ASCII, etc. and to learn about them and how to use them the command is:

        $ man diff

        That opens a list of options with a description of what each one does. Press the space bar to move from the current page to the next one.

        Then the command with one or more of those options would be:

        $ diff option_1 option_2 option_3 … file1 file2 > difference

        or else:

        $ … | whatever > difference

        (“$” is whatever screen prompt you get in Terminal, and the command is to be typed right after the prompt.)

        Windows 7 Professional, SP1, x64 Group W (ex B) & macOS Mojave + Linux (Mint)

      • #2306783
        Nathan Parker
        AskWoody_MVP

        Would this work even with the major title differences between the two lists?

        Nathan Parker

        • #2306787
          OscarCP
          AskWoody Plus

          I just tried to show you an example, but this got mysteriously blocked by Waterfox.

          Now I am trying again, with “Chrome”, see how this goes:

          Anyhow, you get a list of line comparisons with “<” indicating a line from the first file and “>” indicating one from the second file, both listed one immediately after the other, separated by a line “---” whenever they are different.

          If there is one line in one file without a counterpart in the other, that unpaired line gets listed by itself (the difference here is between it and a line that does not exist).

          Lines that are identical in both files are not listed.

          And so on.

          It does not matter whether a line is a title or not: all lines are considered to be just “lines.”
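The “<”, “---” and “>” listing described above can be imitated in Python for anyone who wants to experiment without a terminal. This is only an approximation built on the standard `difflib` module, not a reimplementation of `diff`: it omits the line-number headers that the real tool prints, and the function name `diff_like` and the sample lists are made up for illustration.

```python
import difflib


def diff_like(lines_a, lines_b):
    """Yield '<' / '>' lines roughly in the style of diff's default output."""
    matcher = difflib.SequenceMatcher(a=lines_a, b=lines_b, autojunk=False)
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical lines are not listed
        for line in lines_a[a1:a2]:
            yield f"< {line}"
        if tag == "replace":
            yield "---"  # separates the two sides of a changed stretch
        for line in lines_b[b1:b2]:
            yield f"> {line}"


# Hypothetical sample data, already sorted.
first = ["Dune", "Emma", "Walden"]
second = ["Dune", "Emma: A Novel", "Walden", "Ulysses"]

for line in diff_like(first, second):
    print(line)
```

As with `diff` itself, the comparison is line by line and order-sensitive, so both lists should be sorted the same way first.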

          Windows 7 Professional, SP1, x64 Group W (ex B) & macOS Mojave + Linux (Mint)

          • #2306793
            OscarCP
            AskWoody Plus

            Added to my previous comment: The length of the lists has been mentioned here as a possible problem when they are “long.”

            It does not matter how long the lists being compared with “diff” are, because it can compare gigabyte-size files (though that takes some time). With more normal-size files, say up to a megabyte, it takes practically the blink of an eye (or two blinks, depending on the machine one is using).

            Windows 7 Professional, SP1, x64 Group W (ex B) & macOS Mojave + Linux (Mint)

      • #2306795
        PaulK
        AskWoody Lounger

        If in Windows, use the FC (File Compare) command. See [ FC /? ] for syntax. Some of the switches support really powerful functionality. As others have commented, it is almost imperative that the data be in textual format.

        I do [ FC file1.txt file2.txt > file12.txt ]. (‘12’ means the differences file of ‘1’ vs ‘2’.)

        To clear false differences (due to formatting?), edit file1.txt, save as file3.txt. If necessary, edited file2.txt becomes file4.txt. Repeat recursively.

        • #2306797
          OscarCP
          AskWoody Plus

          PaulK: Yes, thanks for pointing this out.

          Like “diff”, “FC” is a very handy DOS file-compare application, at least for ASCII files (I’ve only used it with that type of file).

          The idea in FC is the same as that in “diff” found in macOS, Linux, Unix, FreeBSD, etc. I think that several DOS commands were actually ports, from UNIX to IBM-clone Intel PCs, of “Terminal” line commands with their names and the exact way they are written somewhat modified. In the Macs and in Linux PCs “diff” is one of the operating system’s tools. I believe that, at least in the days when Win 7 was young, it was also part of “Windows Tools” along with other UNIX commands. It used to be available in Linux-emulation software for Windows, such as Cygwin. Linux emulation might be done differently in Windows 10, now that MS has decided to participate actively in the Linux universe.

          By the way, the file where the differences are saved can have any name the user likes. I wrote “difference” but it can be anything else, as long as it is a normal text file name.

          Windows 7 Professional, SP1, x64 Group W (ex B) & macOS Mojave + Linux (Mint)

      • #2306965
        Nathan Parker
        AskWoody_MVP

        I’ll take a look at this further and report back. Thanks!

        Nathan Parker
