• Import TXT file into Excel (formatting issues)

    Home » Forums » AskWoody support » Productivity software by function » MS Excel and spreadsheet help » Import TXT file into Excel (formatting issues)

    Author
    Topic
    #467677

    I am looking for the easiest way to import the attached file into Excel so that analysis can be performed. The file was created by another application and has all the pertinent data. What I am struggling with is how to get the two lines of data onto one line in order to easily import it into Excel using the Text to Data feature. If you look at the copy/paste below, you’ll notice the header and the subsequent data is coming across on two lines. I cannot do a clean import into Excel and this is quite a big file. All suggestions are welcome.

    CHECK
    BANK CHECK # CHECK AMT CHECK DATE VENDOR # ADDRS # VENDOR NAME STATUS
    WO # AMOUNT G/L ACCT # DESCRIPTION INVOICE # INV VEND
    ————————————————————————————————————————————

    1 1 $38742.79 05/21/09 INT01 0 INTERNAL REVENUE SERVICE OUTSTANDING
    38742.79 01-00000-20600-00000-000 PAYROLL TAXES FOR PE 4/24/09 PPE 4/24/09

    1 2 $552.60 05/21/09 CON06 0 CONEXIS OUTSTANDING
    552.60 90-00000-23327-00000-000 FLEX 125-DEPENDENT CARE REIMBS 051509

    Thanks.

    Viewing 12 reply threads
    Author
    Replies
    • #1214942

      I think the easiest thing to do would be to have the program exporting this text file, export it in a more appropriate way. Some programs can export into an excel worksheet instead of using text.

      If it is only this file, I would manually work on it, trying to determine what “type” the line of text is. If you are only interested in the data and not the header, I would add first add a column numbering the items (to always be able to sort to the original list) then a column with a 0, 1, 2, 3 (0 for header, 1 for line1, 2 for line2, 3 for blank) then you can sort by type and then original number, delete the rows with 0 and 3, and then do a text to column with type 1 and type 2 and then move type 2 columns into the columns in the same rows as the type 1

      If you have to do this a lot a macro could be written to do it, but how complex will depend on how much it may change from time to time

      Steve

    • #1214946

      Hi aapke,

      Before loading the file into Excel, load it into Word and run the following macro against it, then re-save the file and load into Excel. From there you can use the Excel text-to-columns function, with fixed-with splits, to parse the data.

      Code:
      Sub StatementReformat()
      Application.ScreenUpdating = False
      With ActiveDocument.Range.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
        'First Page
        .Text = "REPORT*EXTRACT FILE : [A-Z0-9]{1,}^13^12"
        .Replacement.Text = ""
        .Execute Replace:=wdReplaceOne
        'Column Headers (except New First Page)
        .Text = "^12*VEND^13*[-]{1,}^13"
        .Execute Replace:=wdReplaceAll
        'Header Underline on First Page
        .Text = "[-]{1,}^13"
        .Execute Replace:=wdReplaceAll
        'Last Page
        .Text = "          TOTAL #*^12REPORT*{1,255}PAGE*{1,255}FUND TOTALS*[=]{1,}*^13*^13"
        .Execute Replace:=wdReplaceAll
        'Unwanted Header Lines - First Page
        .Text = "REPORT*CHECK^13"
        .Execute Replace:=wdReplaceAll
        'Empty Paragraphs
        .Text = "[ ]{1,}^13"
        .Execute Replace:=wdReplaceAll
        'Header Line Wrap - First Page
        .Text = "(STATUS)^13"
        .Replacement.Text = "1"
        'Record Line Wraps
        .Execute Replace:=wdReplaceAll
        .Text = "(OUTSTANDING)^13"
        .Execute Replace:=wdReplaceAll
        .Text = "(CLEARED)^13"
        .Replacement.Text = "1    "
        .Execute Replace:=wdReplaceAll
        .Text = "(VOIDED)^13"
        .Replacement.Text = "1     "
        .Execute Replace:=wdReplaceAll
        'Pad Records with Multiple Rows
        .Text = "(^13)([!^13]{1,120}^13)"
        .Replacement.Text = "1                                                                                                                               2"
        .Execute Replace:=wdReplaceAll
        .Execute Replace:=wdReplaceAll
      End With
      Application.ScreenUpdating = True
      End Sub

      Update:
      I noticed that negative values in the data have the ‘-‘ sign trailing the values concerned. Excel doesn’t handle those too well. You can convert them to values with a preceding ‘-‘ sign by adding the following four lines immediately before the ‘End With’ statement in the macro:

      Code:
        'Reformat -ve values
        .Text = "( )([0-9]{1,}.[0-9]{2})(-)"
        .Replacement.Text = "321"
        .Execute Replace:=wdReplaceAll

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

    • #1215159

      Okay, that was pretty cool to just watch. Thank YOU! Do I need to run it more than once to fix the issue with the multiple lines for one record?

      1 3000022 $697.49 04/16/09 COA05 0 COASTAL BLUE, INC OUTSTANDING 24.84 50-00000-62250-07203-000 FILM/MAPS/BLUEPRINTS 423225

      25.91 25-00000-63608-80028-000 FILMS/MAPS/BLUEPRINTS 423272
      12.34 50-00000-62250-09207-000 FILMS/MAPS/BLUEPRINTS 423458

      20.10 25-00000-63608-80058-000 FILMS/MAPS/BLUEPRINTS 423527
      19.22 50-00000-62250-09207-000 FILMS/MAPS/BLUEPRINTS 423950

      59.86 25-00000-63608-80068-000 FILMS/MAPS/BLUEPRINTS 423980
      51.56 25-00000-63608-80064-000 FILMS/MAPS/BLUEPRINTS 424065

      77.85 01-62000-63608-00000-000 FILMS/MAPS/BLUEPRINTS 424076
      30.44 50-00000-62250-07104-000 FILMS/MAPS/BLUEPRINTS 424187

      77.58 50-00000-62250-07107-000 FILMS/MAPS/BLUEPRINTS 424191
      37.34 50-00000-62250-07101-000 FILMS/MAPS/BLUEPRINTS 424320

      47.02 01-62000-63608-00000-000 FILMS/MAPS/BLUEPRINTS 424439
      20.10 60-94030-63608-00000-000 FILMS/MAPS/BLUEPRINTS 424479

      29.56 60-94030-63608-00000-000 FILMS/MAPS/BLUEPRINTS 424480
      53.71 60-94030-63608-00000-000 FILMS/MAPS/BLUEPRINTS 424620

      34.53 50-00000-62250-07104-000 FILMS/MAPS/BLUEPRINTS 424695
      75.53 25-00000-63608-80065-000 FILMS/MAPS/BLUEPRINTS 425067

      Thanks.

    • #1215179

      Hi aapke,

      You shouldn’t need to. The two consecutive ‘.Execute Replace:=wdReplaceAll’ statements near the end of the first code module I posted should take care of that – it did in my testing.

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

    • #1215258

      I just knew I did something incorrectly. Basic steps I used are: Generate the report again in the finance system. Saved the generated file as .TXT. Opened it up in Word. Inserted your totally cool code into a module. Saved the file as a .docm. Ran the macro, sat back, and watched it work it’s magic. Saved the file as a .TXT again. Imported it into Excel using the Text to Data feature. And yes, I did incorporate the additional 4 lines.

      Did I miss a step along the way? Either way, the new file I have to work with is so much cleaner than the old one, it won’t take more than a few hours to clean it up manually if needed! Thank you so much!

      aapke

    • #1215261

      Hi aapke,

      I suspect what you’re seeing is line-wrapping effects. Attached is a Word file containing the full macro, with the prcoessed data and the page format set up so that each record takes a single line in the document. As you can see, the multi-entry records are fully padded.

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

    • #1215280

      Hi macropod,

      I’ve copied the steps exactly (I believe) and am still getting the line wrapping. Can you tell from a quick glance what I am doing incorrectly?

      Thanks again,
      aapke

    • #1215299

      Hi aapke,

      I don’t understand why your system isn’t behaving the same as mine. Is your copy of Word of Word 2007 fully updated? There were some problems with the vba implementation in the earlier releases. I’ve run the code with both Word 2000 SP3 and Word 2007 SP2 and it works fine. Nevertheless, it seems what you’re ending up with is not so much a line wrapping issue as unwanted line breaks in every 2nd padded line. You should be able to fix that by adding the following extra lines to your code:

      Code:
        'Fix Broken Lines
        .Text = "( )(^13)( )"
        .Replacement.Text = "13"
        .Execute Replace:=wdReplaceAll

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

    • #1215373

      Ah. That may just be it. My copy of Word 2007 shows as “MS Word 2007 (12.0.6514.5000) SP2 MSO (12.0.6521.5000). Hopefully that’ll fix everything!

      Thank you again for ALL the assistance.
      aapke

    • #1215378

      Hi aapke,

      My copy of Word 2007 shows as “MS Word 2007 (12.0.6514.5000) SP2 MSO (12.0.6521.5000).

      In that case, you do indeed have Word 2007’s SP2 update installed. Strange – perhaps it’s just some little glitch lurking somewhere else in your system.

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

    • #1215381

      Hi aapke,

      Instead of adding the extra lines of code for repairing the broken data lines that I suggested earlier, try changing:

      Code:
        'Pad Records with Multiple Rows
        .Text = "(^13)([!^13]{1,120}^13)"
        .Replacement.Text = "1                                                                                                                               2"
        .Execute Replace:=wdReplaceAll
        .Execute Replace:=wdReplaceAll
      

      to

      Code:
        'Pad Records with Multiple Rows
        .Text = "(^13)([ ]{20})([ ]{1,8}[0-9]{1,8}.[0-9]{2})"
        .Replacement.Text = "12222222       3"
        .Execute Replace:=wdReplaceAll

      You’ll notice that the new block has only one ‘.Execute Replace:=wdReplaceAll’ statement. That’s all I found necessary and it should make the overall execution just that little bit quicker – as should not needing those other extra lines of code for repairing the broken data lines .

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

      • #1581397

        Hi aapke,

        Instead of adding the extra lines of code for repairing the broken data lines that I suggested earlier, try changing:

        Code:
          'Pad Records with Multiple Rows
          .Text = "(^13)([!^13]{1,120}^13)"
          .Replacement.Text = "1                                                                                                                               2"
          .Execute Replace:=wdReplaceAll
          .Execute Replace:=wdReplaceAll
        

        to

        Code:
          'Pad Records with Multiple Rows
          .Text = "(^13)([ ]{20})([ ]{1,8}[0-9]{1,8}.[0-9]{2})"
          .Replacement.Text = "12222222       3"
          .Execute Replace:=wdReplaceAll

        You’ll notice that the new block has only one ‘.Execute Replace:=wdReplaceAll’ statement. That’s all I found necessary and it should make the overall execution just that little bit quicker – as should not needing those other extra lines of code for repairing the broken data lines .

        Hi,
        I also tried this code and got really impressed the way you have used regular expressions within the with…end with block. I am an absolute beginner as far as RegEx is concerned. Just want to know if you could allow me to share a similar file with you that I have from our database. That way looking at that code that you will assist me with, will give me a better understanding how RegEx could help me with kind of report I have to deal with on regular basis.

        I would be waiting for your feedback.

        Kind regards
        Syed

    • #1215434

      Will do!

      Thank you again.
      aapke

    • #1581679

      Hi Paul,
      OK now I understand this is not RegEx but wildcards within MS Office that are similar to RegEx. However, even though I have looked up some reference material but still need your help. Only if you could explain the logic behind following few lines extracted from your above code. Note that to make my query more understandable and precise I have avoided With…End With block and have re-written it otherwise:

      Code chunk #1:
      ActiveDocument.Range.Find.Text = ” TOTAL #*^12REPORT*{1,255}PAGE*{1,255}FUND TOTALS*[=]{1,}*^13*^13″

      Code chunk #2:
      ActiveDocument.Range.Find.Text = “(CLEARED)^13”
      ActiveDocument.Range.Find.Replacement.Text = “1 ”
      ActiveDocument.Range.Find.Execute Replace:=wdReplaceAll

      Code chunk #3:
      ‘Pad Records with Multiple Rows
      ActiveDocument.Range.Find.Text = “(^13)([ ]{20})([ ]{1,8}[0-9]{1,8}.[0-9]{2})”
      ActiveDocument.Range.Find.Replacement.Text = “12222222 3”
      ActiveDocument.Range.Find.Execute Replace:=wdReplaceAll

      Question referring to chunk# 1: I need your help as to what all those curly braces and numbers mean? I am not sure about spaces before word “TOTAL” in your code. Also the CHECKREG.TXT does not have word “FUND TOTALS”, so I am not sure what does this part of code line mean?

      Question referring to chunk# 2 : Understand that code looks for word “CLEARED” followed by paragraph mark but I need some clarity on the next two lines. What these last two lines are doing?

      Question referring to chunk# 3: I am totally clueless to these last 3 lines. Can you please explain about their function?

      It is hoped you will appreciate my intent to learn some code from your already created set of code instead of digging into something from scratch.

      I would be waiting for your response.

      REGARDS,
      SMI

    Viewing 12 reply threads
    Reply To: Import TXT file into Excel (formatting issues)

    You can use BBCodes to format your content.
    Your account can't use all available BBCodes, they will be stripped before saving.

    Your information: