• Faster and FASTER searches (Word/Excel 2000)


    #451319

    (attached full text and VBA code)
    I don’t get it. I’m brilliant, but not that brilliant.
    Why is my Word/VBA-written search faster than Microsoft’s by a factor of about 45?
    Attached is a very stripped-down version of a humungous search and move/copy tool I’ve written for a client. I’m testing the full version by making backups & archives across my network.
    A second client, with 5,000 English-language and 1,700 French-language documents, requests a service, and I use my first client’s tool to search the second client’s documents.
    I perform the tests both in Windows Explorer and using my stripped-down tool (“Findr”).

                         English    French
    Folders                  164       151
    Files                  5,034     1,760
    MBytes                   222        40
    Explorer run time     17m 0s    7m 15s
    Explorer finds           300         6
    Findr run time        0m 28s     0m 8s
    Findr finds              442        10

    I find more files with “Temperature” because I force strings into upper-case, and hence some accented characters become un-accented and are available to me. (A bonus when I’m asked to strip “Detail” and “Garantie” from files).

    The document sets are regular documents, all Word 2000 *.doc, no graphics, no tables, ranging in size from 21KB to 34KB.
    I play no games while I’m running the Explorer job, no anti-virus running in the background, no Internet browsing, no LAN activity (e.g. backups). I run a check disk and defrag weekly; my disks are clean and rebooted.
    I know that Microsoft software can be appalling, but I’m having trouble explaining how it can be this bad.

    • #1110794

      Hi Chris
      I know that I suffer from refrigerator blindness but I cannot find the code.

      • #1110796

        >I cannot find the code.

        Don, you’ll find it on Pages 3-8 of the document body itself. You can paste the code into a code module in Word or Excel or, A.F.A.I.K., PPT, Access, Outlook, Project etc.

        You are not alone in “refrigerator blindness” (although I note with more than passing interest that the paragraph “Comments” in your referenced link introduces a trailing comma within the quoted string. Nice Touch!), for I often stand alone in aisle 5 of a supermarket, staring at all the dog food, searching for just a small bag of treats for He-Who-Must-Be-Cuddled. I walk the entire length, twice; check the overhead signs again (“Dog and cat food”), and, as the tears slowly run down my face, tackle a red-shirted shelf-stacker, pleading. He usually tells me I’m standing with my back to it! They put the cat food, treats, litter, toys, a cornucopia, on the other side of the aisle!
        On purpose, I sometimes think.

    • #1110831

      >I force strings into upper-case

      could have something to do with it. It would be interesting to test run your code with the case conversion commented out.

      But I’d be surprised if that was enough to account for even half the difference.

      • #1110845

        CG>I force strings into upper-case
        > could have something to do with it.
        I don’t think so.
        I put the Force Case flag in this morning after first puzzling over the speed increase Monday night. I ran several tests during Tuesday as my schedule permitted. This (Wednesday) morning I re-checked the extra files found by me, decided to institute the case flag, and re-ran both sets (Eng/Fr) both ways. Not a lot of difference on this machine (see next reply).

        >But I’d be surprised if that was enough to account for even half the difference.
        This is the root of my puzzle; being 20% faster or slower than someone else is understandable, but 30-50 times faster caused me much amazement, and a bit of fear (“How have I gone wrong ….?”)

        I see that Hans has some answers…

    • #1110833

      Your code is remarkably fast, but I don’t see such a huge difference as you did – it is about a factor of 4 faster than Windows native search in my experiments using Word 2000 on a 2.5 GHz Pentium IV. I tried it on a set of folders with mixed content.
      Some factors that might contribute to the difference:

    • As far as I can tell, you open each file as a binary file and search for the text string. Windows tries to search for “real” text content. When I used a short search string, your algorithm found it in lots of .jpg files.
    • As far as I can tell, you don’t take Unicode into account (where characters take up 2 bytes); I think Windows does.
    • By default, Windows searches in .zip files too, meaning that it has to unzip the contents temporarily.
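To make the first two points concrete, here is a minimal VBA sketch of a raw binary content search that looks for the string both as one byte per character and as two bytes per character. It is not the code from Findr: the function name blnFileContains and its parameters are hypothetical, and it assumes the file is small enough to read into memory in one go.

```vba
' Hypothetical helper, not taken from Findr.
' Returns True if strFind occurs anywhere in the file's raw bytes,
' either as single-byte (ANSI) text or as 2-byte (UTF-16LE) text.
Function blnFileContains(ByVal strFullName As String, _
                         ByVal strFind As String) As Boolean
    Dim intFile As Integer
    Dim bytData() As Byte
    Dim strData As String
    Dim strAnsi As String

    If Len(strFind) = 0 Then Exit Function
    If FileLen(strFullName) = 0 Then Exit Function

    ' Read the whole file into a byte array in one operation.
    intFile = FreeFile
    Open strFullName For Binary Access Read As #intFile
    ReDim bytData(0 To LOF(intFile) - 1)
    Get #intFile, , bytData
    Close #intFile

    ' Assigning a byte array to a String copies the bytes unchanged.
    strData = bytData

    ' One byte per character, using the system code page.
    strAnsi = StrConv(strFind, vbFromUnicode)

    ' InStrB compares raw bytes, so this search is case-sensitive
    ' and has no idea whether the bytes are really text.
    If InStrB(1, strData, strAnsi, vbBinaryCompare) > 0 Then
        blnFileContains = True      ' found as 8-bit text
    ElseIf InStrB(1, strData, strFind, vbBinaryCompare) > 0 Then
        blnFileContains = True      ' found as 16-bit text (VBA strings are UTF-16)
    End If
End Function
```

A call might look like `Debug.Print blnFileContains("C:\Docs\Sample.doc", "Temperature")`, where the path is only a placeholder. Because the search sees bytes rather than text, a short search string can match by accident inside images or executables, which is the .jpg effect described above.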

      Still, I’m impressed!

    • #1110847

      > 2.5 GHz Pentium IV

      I should have added that I’m running this on a 2-year-old laptop 2GB/2GHz/100GB, still reasonably
      fast, for its age. Overall speed ought not to have a lot to do with it, since both Explorer and Findr
      are using the same CPU and hard drive.

      >set of folders with mixed content.
      My sample is a set of 7,000 Word documents, 20K-34K, and those could be branded as easy – however,
      I’ve been running this code in the humungous Bulker with similar results.
      Typically I sit at the client’s machine and we run some find and move/copy tests, check the results, and continue.
      It’s when I go to confirm “number of files found” with Explorer that I’m staggered.
      That is, the table in my first post was trumped up, in a sense, but even so, a factor of 4x faster seems odd to me.
      Always makes me wonder what I have done wrong.

      Your three bulleted points are all correct (although I thought that on my machine I’d turned off the ZIP/CAB DLLs).

      For most needs I suspect that the (text) search string will be long enough to
      avoid false hits in JPG, GIF, ZIP etc., although we could explore some COM and EXE or DLL.
      Unicode could be interesting. I could run that as a separate test, after the plain-text test.
      Below I show the nested test procedures; the testing is ANDed, so I’d be doing the lengthy Unicode test
      only if I’d passed through the plain-text test.

      If blnFilterNameExtent(flt, strNameExtent) Then
        If blnFilterAttributes(flt, strFullName) Then
          If blnFilterContent(flt, strFullName) Then
            If blnFilterDateModified(flt, strFullName) Then
              If blnFilterDateAccessed(flt, strFullName) Then
                If blnFilterDateCreated(flt, strFullName) Then
                  If blnFilterLength(flt, strFullName) Then
                    If blnFilterLOB(flt, strFullName) Then
                      If blnFilterActiveRetainDelete(flt, strFullName) Then
                        If blnFilterRetainDeleteDate(flt, strFullName) Then
                          If blnFilterOpenClosed(flt, strFullName) Then
                            If blnFilterOriginalCopy(flt, strFullName) Then
                              blnFilter = True
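
The nesting matters because VBA's And operator evaluates both operands; it does not short-circuit. A small sketch of the difference, using two hypothetical stand-in filters (blnCheap and blnExpensive are illustrative names, not part of the tool above):

```vba
' Hypothetical stand-ins for a cheap filter and an expensive one.
Private mlngExpensiveCalls As Long

Function blnCheap(ByVal strName As String) As Boolean
    ' Cheap test: look at the file extension only.
    blnCheap = (LCase$(Right$(strName, 4)) = ".doc")
End Function

Function blnExpensive(ByVal strName As String) As Boolean
    ' Stands in for a slow test such as a content search.
    mlngExpensiveCalls = mlngExpensiveCalls + 1
    blnExpensive = True
End Function

Sub DemoShortCircuit()
    Dim blnHit As Boolean

    ' And calls both functions, even though the first returns False.
    mlngExpensiveCalls = 0
    blnHit = blnCheap("photo.jpg") And blnExpensive("photo.jpg")
    Debug.Print mlngExpensiveCalls   ' prints 1

    ' Nested Ifs stop at the first filter that fails.
    mlngExpensiveCalls = 0
    If blnCheap("photo.jpg") Then
        If blnExpensive("photo.jpg") Then blnHit = True
    End If
    Debug.Print mlngExpensiveCalls   ' prints 0
End Sub
```

With nested Ifs, putting the cheapest filters outermost means the slow ones run only for files that have already passed everything before them.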

      Most filters accept an array of ranges, so that one can ask for files that are DateModified within
      any one of three non-contiguous ranges, or files of length 20K-40K, 100K-130K or 200K-220K, and so on.
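
As an illustration of the array-of-ranges idea, here is a minimal sketch. The function name blnInAnyRange and the Array-of-pairs layout are assumptions made for the example; the real filters may store their ranges quite differently.

```vba
' Hypothetical sketch: varRanges is an array of (low, high) pairs.
' Returns True if dblValue lies inside any one of the ranges.
Function blnInAnyRange(ByVal dblValue As Double, _
                       ByVal varRanges As Variant) As Boolean
    Dim i As Long
    Dim varPair As Variant

    For i = LBound(varRanges) To UBound(varRanges)
        varPair = varRanges(i)
        ' LBound/UBound keep this correct under either Option Base setting.
        If dblValue >= varPair(LBound(varPair)) And _
           dblValue <= varPair(UBound(varPair)) Then
            blnInAnyRange = True
            Exit Function
        End If
    Next i
End Function

Sub DemoLengthFilter()
    Dim varRanges As Variant

    ' Three non-contiguous length ranges, in bytes.
    varRanges = Array(Array(20000, 40000), _
                      Array(100000, 130000), _
                      Array(200000, 220000))

    Debug.Print blnInAnyRange(25000, varRanges)   ' prints True
    Debug.Print blnInAnyRange(50000, varRanges)   ' prints False
End Sub
```

The same function would serve for dates, since VBA stores a Date as a Double under the hood.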

      >Still, I’m impressed!
      You could have used the word “Brilliant”, ….. (grin!)
      I’m not. Well, OK, I am, but I suspect that Explorer is using old and inefficient code from 1983 or thereabouts.

      • #1110870

        You wrote:

        >Still, I’m impressed!
        >You could have used the word “Brilliant”, ….. (grin!)

        OK, here you go:

        Still, I’m Brilliant! (evil grin)
