• Excel vba – slow down while reading text file (2000)

    Home » Forums » AskWoody support » Productivity software by function » Visual Basic for Applications » Excel vba – slow down while reading text file (2000)

    Author
    Topic
    #422699

    Hi Folks,

    Attached is some code I’ve cobbled together for analysing a large amount of data (about 12million records) in a series of 8 text files, varying from ~120,000 records to 4,250,000 records each. The largest file is about 78Mb. Basically the code reads in a block of data, typically around 500 records, processes it, clears the worksheet and starts over with the next set of records.

    The problem is that the time taken to process the files (on my system – AMD 2500 Athlon, Win 2K & 512Mb RAM) increases by around 22 seconds per 100,000 records – eg the 1st 100,000 takes about 26 seconds, the 2nd takes 48 seconds, the 3rd takes 70 seconds, and so on.

    I’d be grateful if someone could take a look at the code and suggest some efficiencies – especially with a view to eliminating the increasing cycle times.

    Cheers

    Cheers,
    Paul Edstein
    [Fmr MS MVP - Word]

    Viewing 0 reply threads
    Author
    Replies
    • #964864

      The line Do While Seek(FileNum) <= LOF(FileNum) is very inefficient, it forces VBA to calculate the position within the file continuously. Instead, use Do While Not EOF(FileNum). It's about 7 times as fast on my PC.

      Your code now fills a worksheet, then clears the UsedRange repeatedly. I wonder if ActiveSheet.Cells.Clear might be faster than ActiveSheet.UsedRange.ClearContents, and also whether creating, filling and discarding a new workbook for each run might be better. I haven't tested these ideas.

      • #964868

        Hi Hans,

        I’ve just run a few tests, using a 300,000 record text file. What I’ve found is that using:
        . Do While Not EOF(FileNum) with ActiveSheet.Cells.Clear is about 4 seconds faster than what I was already getting.
        . Do While Not EOF(FileNum) with ActiveSheet.UsedRange.ClearContents (which is what I was originally using) is about 7 seconds faster than what I was already getting.
        This would seem to confirm what you said about Do While Not EOF(FileNum) being more efficient than Do While Seek(FileNum) <= LOF(FileNum). Conversely, it seems that ActiveSheet.UsedRange.ClearContents is more efficient than ActiveSheet.Cells.Clear.

        The faster execution is also accompanied by an apparent overall reduction in the increasing cycle times, which is a good thing. I'd have to test with a much larger file to confirm this.

        Cheers

        Cheers,
        Paul Edstein
        [Fmr MS MVP - Word]

        • #964951

          This code is a bit complicated. smile

          Does Excel have an Undo buffer you can clear? I wonder whether an accumulation of changes could explain part of the “slowing down” phenomenon.

          Also, if the file reading has anything to do with the slowing down, you could try using a TextStream object and its ReadLine method (part of the Scripting library’s FileSystemObject family of tools).

          • #964969

            Hi Jefferson,

            Thanks for the feedback. AFAIK Excel doesn’t have anything equivalent to Word’s UndoClear statement. In any event, since Excel’s Undo only remembers the last 16 actions, I doubt that’s having a material effect.

            Unfortunately, I don’t know enough about the “TextStream object and its ReadLine method” to make use of, let alone whether it might make a difference.

            Cheers

            Cheers,
            Paul Edstein
            [Fmr MS MVP - Word]

            • #964982

              The TextStream object is part of the Scripting library. Working with it is generally slower than using the “old-fashioned” file handling methods, so there is no point in trying it.

            • #965062

              Hi Hans,

              I decided to try a different approach, by reading the source data into an array, instead of into a worksheet, evaluating the items in the array and only writing out the items I am interested in.

              For what turned out to be 11.6 million records, the process took 16 minutes to complete (on a slower PC than the one in my original post) compared to 7.6 hours for only 4.9 million of the same records on the faster PC.

              Obviously, true array processing is much faster than using a worksheet as an array …

              Cheers

              Cheers,
              Paul Edstein
              [Fmr MS MVP - Word]

            • #965086

              Excellent!

            • #965112

              Better still, the process takes barely 5 minutes to complete processing 11.6 million records on the PC in my original post, or just 129 seconds for the previously-timed 4.9 million records, making the new code over 200 times faster than the original code!

              Cheers,
              Paul Edstein
              [Fmr MS MVP - Word]

    Viewing 0 reply threads
    Reply To: Excel vba – slow down while reading text file (2000)

    You can use BBCodes to format your content.
    Your account can't use all available BBCodes, they will be stripped before saving.

    Your information: