• Copying text from a PDF into a Word document

    #487920

    I attached the file in case my question relates only to specific types of PDF files. From the attached file, I am trying to select text and paste it into a Word document. What I get, however, is text broken up by extraneous inserted returns, such as:

    33213-Text

    I have tried with both Nitro PDF and Foxit and gotten the same result. Isn’t it possible to copy text directly from a PDF unformatted?

    Regards,
    Chuck Billow

    • #1376084

      This is what I get copying it from PDF-XChange. There is a hard return after each line in the original PDF.

      • #1376085

        So what’s different? Yours came out as it should.

        Chuck

    • #1376093

      Chuck,

      When I copied and pasted, I got what you got. When I used my pdf software (Nitro 8) to convert the pdf to Word, the result looked a lot like jwitalka’s.

      • #1376206

        So then Pam, I would need a full-blown PDF editor to really do what I need? I’m surprised that there isn’t a way to just copy the text. I would get it if it were losing empty lines, but *adding* them? Pooh!

        Chuck

        • #1376217

          So then Pam, I would need a full-blown PDF editor to really do what I need? I’m surprised that there isn’t a way to just copy the text. I would get it if it were losing empty lines, but *adding* them? Pooh!

          Chuck

          Rather than consider a full blown PDF editor, re-consider Office 2013. See post #5, #7 & #8. You can do what you want in Word 2013.

          Joe

          • #1376243

            You might be right Joe, and I might end up going that way. But I gotta tell you that, firstly, it would bug the hell out of me to pay ad infinitum for just a feature or so, and secondly, perhaps unimportant to most, I think 2013 is even less visually pleasant than 2010, which, to me, was a definite decline from 2007.

            I dunno…I was just hoping, I suppose, that there was an “easy” answer here that wouldn’t be quite such a major change or investment. So far, in the little reading I have done, editing PDFs is the only truly useful/necessary change or added capability for the average user, and even that is a use the average person wouldn’t necessarily ever apply.

            Maybe I’m missing something…have to look again I suppose.

            Thanks,
            Chuck

        • #1376260

          So then Pam, I would need a full-blown PDF editor to really do what I need? I’m surprised that there isn’t a way to just copy the text. I would get it if it were losing empty lines, but *adding* them? Pooh!

          Chuck

          I don’t understand. I thought you just wanted to copy text from a PDF into Word. PDF-XChange and, according to Pam, Nitro 8 did that. What else do you need?

          Jerry

          • #1376262

            Jerry, I thought both of these did so too, but as yet I haven’t been able to get either to do it.

            Chuck

          • #1377174

            I’m sure most know this quick trick for getting rid of the paragraph marks at the end of each line in the Microsoft Word document (the document you created when you copied text from a PDF). In case someone doesn’t, I’ll go through it.

            First, it is helpful to turn on “paragraph marks” in the Word document: in the “Paragraph” section of the Ribbon (Word 2007), click the button that looks something like a backward capital letter P, the paragraph mark (¶). Be sure you can see the paragraph marks throughout your document.

            Now, to quickly get rid of all the pesky extra paragraph marks, press Ctrl-H in Word to open the “Find and Replace” dialog. Enter “^p” (without the quotes; that’s “caret-lower-case-p”) in the “Find what” box. Enter “ ” (that’s a space; one tap of the space bar) in the “Replace with” box. Press Alt-A to “Replace All.” Now everything is in one paragraph. You’ll have to go through and manually enter new paragraphs (“Enter” key) where desired.

            There is an advanced quick trick that makes preserving the original paragraphs a little bit easier. Before removing all the paragraph marks as described above, go through the Word document and locate each paragraph mark that you want to keep (the ones marking the ends of paragraphs). These should be easy to spot, because they will frequently be short lines. Each time you find one, enter a character sequence not found elsewhere in your document. (An easy one is “zzz”)

            Now, replace all paragraph marks, as described in the first paragraph of my message. (Once you’re comfortable with the keystrokes involved, it really takes only four or five seconds.)

            Having done that, go through the same procedure, but replace “zzz” with “^p” to begin a new paragraph. Or replace “zzz” with “^p^p” to do a double-space between paragraphs in your Word document.

            One more tip: If the Word document already has two paragraph-marks between paragraphs (causing a double-space between paragraphs), it’s even easier. Before doing anything else, Ctrl-H and in “Find what” type “^p^p” (without the quote marks). In “replace with” type “zzz” (without the quote marks.) Now Alt-A to Replace All. This differentiates between the single paragraph marks and the double-space double paragraph marks, by turning the double paragraph marks into something unique: “zzz” or whatever. Don’t panic at the looks of your document!

            Now replace “^p” (the single paragraph marks) with ” ” (a space).

            Finally, replace “zzz” with “^p^p” to restore your double-space between paragraphs.
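
            For anyone who would rather try the same clean-up outside Word, here is a minimal Python sketch of the three Find/Replace passes described above. It assumes the pasted text is a plain string in which each Word paragraph mark (“^p”) appears as a newline; the function name unwrap_lines and the default marker are made up for illustration and are not part of Word or of the steps in this thread.

```python
def unwrap_lines(text: str, marker: str = "zzz") -> str:
    """Join hard-wrapped lines while keeping blank-line paragraph gaps.

    Assumption: paragraphs are separated by exactly two newlines and
    wrapped lines inside a paragraph by a single newline.
    """
    if marker in text:
        # The marker must not already occur in the text, or step 3
        # would insert paragraph breaks in the wrong places.
        raise ValueError("marker already occurs in the text; pick another")
    # Step 1: protect the double breaks that separate paragraphs.
    protected = text.replace("\n\n", marker)
    # Step 2: every remaining single break is a wrapped line; make it a space.
    joined = protected.replace("\n", " ")
    # Step 3: restore the paragraph gaps.
    return joined.replace(marker, "\n\n")


if __name__ == "__main__":
    sample = "The quick brown\nfox jumps.\n\nSecond paragraph\nends here."
    print(unwrap_lines(sample))
```

            The marker check mirrors the advice to pick a character sequence not found elsewhere in the document. Runs of three or more breaks are not handled specially here, so a stray leading space can remain after such a gap.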

            • #1377181

              I’m sure most know this quick trick for getting rid of the paragraph marks at the end of each line in the Microsoft Word document — the document you created when you copy text from a PDF. In case someone doesn’t, I’ll go through it.

              One could do that, or one could use the macro I supplied and have it done in an instant (almost).

              Your suggestion of deleting all paragraph breaks at the start (or editing the document by adding ‘zzz’ to the ends you want to restore) actually creates a lot of unnecessary work. And, if you don’t like using macros, you’ll find the manual equivalent of the four Find/Replace processes in the macro are both much simpler and more thorough.

              Cheers,
              Paul Edstein
              [Fmr MS MVP - Word]

    • #1376137

      BTW, maybe another reason to re-consider Office 2013. You can open PDFs for editing in Word. If the PDF contains many graphics it may not look the same in Word as it is converted to enable editing the text.

      Joe

    • #1376142

      You have to select the text you want to copy with a text select tool first and then do the copy. It looked like you selected only part way horizontally.

      Jerry

    • #1376190

      In this case, W2013 isn’t the better choice. The pdf is an image that has been OCR’d to make it searchable. When W2013 opened it, both the image and the OCR text appeared on the page. The strange, spurious returns weren’t there, though.

    • #1376196

      @Pam,

      True about the image, but if you need to, you can go page by page and easily remove the image, leaving the text.

      Joe

      • #1376204

        I opened a PDF file (one I have on my computer) with Adobe Reader, highlighted a section, and pressed Ctrl-C. I then opened Word 2010 and pasted the copied text with Ctrl-V; it went in just as it reads in the PDF file.

    • #1376244

      If you want to know what’s new from Microsoft’s perspective see What’s new in Office 2013. At the bottom of the page there are links to “What’s New” for individual programs.

      Joe

      • #1376248

        Thanks Joe. Maybe I’ll take a look-see. Maybe there’s something there to justify the cost.

        Chuck

    • #1376265

      Did you click on the text select tool in PDF-XChange before the copy? After clicking on the text select tool, you can press Ctrl-A to select the whole document.
      33227-pdfxchange

      Jerry

      • #1376267

        I thought I did Jerry, but then…I’ll look again…

        Chuck

    • #1376270

      So then Pam, I would need a full-blown PDF editor to really do what I need? I’m surprised that there isn’t a way to just copy the text. I would get it if it were losing empty lines, but *adding* them? Pooh!

      Chuck

      I just tried Adobe Reader XI. You have to pay to have it convert the file to Word, but copy and paste from it into Word has returns only at line endings, which is what you likely expected. Upgrading the reader is free…still.

    • #1376572

      If copying & pasting the text is sufficient, apart from the messed-up formatting, you can easily enough clean it up with a macro. I use the following for that purpose. The only ‘rule’ you need to pay particular attention to is that the macro assumes lines within a paragraph will be separated by no more than one paragraph break or manual line break, whilst paragraphs will be separated by at least two paragraph breaks or manual line breaks (I use the macro for content pasted from websites & emails as well).

      Code:
      Sub CleanUpPastedText()
      Dim TrkStatus As Boolean      ' Track Changes flag
      ' Turn Off Screen Updating
      Application.ScreenUpdating = False
      ' Store current Track Changes status, then switch off
      With ActiveDocument
        TrkStatus = .TrackRevisions
        .TrackRevisions = False
      End With
      With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
        'Replace single paragraph breaks and manual line breaks with a space
        .Text = "([!^13^11])([^13^11])([!^13^11])"
        .Replacement.Text = "\1 \3"
        .Execute Replace:=wdReplaceAll
        'Replace all double spaces with single spaces
        .Text = "[ ]{2,}"
        .Replacement.Text = " "
        .Execute Replace:=wdReplaceAll
        'Delete hyphens in hyphenated text formerly split across lines
        .Text = "([a-z])-[ ]{1,}([a-z])"
        .Replacement.Text = "\1\2"
        .Execute Replace:=wdReplaceAll
        'Limit paragraph breaks and manual line breaks to one 'real' paragraph per set.
        .Text = "[^13^11]{1,}"
        .Replacement.Text = "^p"
        .Execute Replace:=wdReplaceAll
      End With
      ' Restore original Track Changes status
      ActiveDocument.TrackRevisions = TrkStatus
      ' Restore Screen Updating
      Application.ScreenUpdating = True
      End Sub

      If you want to be able to run the macro against a selected range only, change ‘ActiveDocument.Content’ to ‘Selection’ and change ‘.Wrap = wdFindContinue’ to ‘.Wrap = wdFindStop’.

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]
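
      The four passes in the macro above can also be tried outside Word. The following Python sketch is a rough, unofficial port and not the macro author's code. It assumes the pasted text is a plain string in which paragraph breaks arrive as "\r", "\r\n" or "\n" and manual line breaks as "\v" (character 11); the function name clean_up_pasted_text is hypothetical. Pass 1 uses lookarounds instead of the macro's three capture groups, which is a substitution on my part.

```python
import re


def clean_up_pasted_text(text: str) -> str:
    """Unwrap hard-wrapped lines, assuming real paragraphs are
    separated by at least two breaks (the same rule as the macro)."""
    # Normalise every kind of break to a single "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\v", "\n")
    # Pass 1: a lone break between two non-break characters becomes a space.
    text = re.sub(r"(?<=[^\n])\n(?=[^\n])", " ", text)
    # Pass 2: collapse runs of spaces into one space.
    text = re.sub(r" {2,}", " ", text)
    # Pass 3: re-join lowercase words hyphenated across a former line end.
    text = re.sub(r"([a-z])- +([a-z])", r"\1\2", text)
    # Pass 4: collapse each remaining run of breaks into one paragraph break.
    text = re.sub(r"\n+", "\n", text)
    return text


if __name__ == "__main__":
    pasted = (
        "First line of a para-\ngraph that was  wrapped\nby the PDF.\n\n"
        "Second paragraph\nhere."
    )
    print(clean_up_pasted_text(pasted))
```

      As with the macro, pass 3 cannot tell a word split across lines from a genuinely hyphenated compound that happened to fall at a line end, so the result is worth a quick read-through.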

    • #1377103

      I just downloaded your document and converted it with no worries using the deskUNPDF program. The AHACHMENT.docx file was opened in OpenOffice.

      Having installed my paid version of Office 2000 Premium, I found I was unable to convert Word documents to PDF because the required MS driver wasn’t bundled with MS Office until version 2003.
      Having sought support from the deskUNPDF program people, I found I needed to download OpenOffice to obtain the driver that runs that function, and that system configuration works fine.

      • #1377125

        Having installed my paid version of Office 2000 Premium, I found I was unable to convert Word documents to PDF because the required MS driver wasn’t bundled with MS Office until version 2003.

        No Office version up to and including 2010 has included a filter for reading PDFs. Office 2013 (not 2003) is the first Office version to support reading & editing of PDFs.

        Cheers,
        Paul Edstein
        [Fmr MS MVP - Word]

    • #1377129

      If you open a PDF file in Cute PDF there is the option to view the PDF as text. A simple copy and paste into Word is then possible. It’s clean and simple, but some editing may be needed to get the spacing to look OK.
      I tried to attach a sample but am not sure it loaded.
      Cute PDF has a free version as well as a paid. I am using an early evaluation copy.

    • #1377216

      This is all very interesting, but seems a little too one-sided. First, using a macro to strip out the stuff you don’t want from the PDF is more than beginner level, and totally unnecessary for simple copy-paste usage. Second, all the responders are concentrating on only M$$ Office in its various versions. What about LibreOffice? The latest version, still free, is LibreOffice 4.0.0.3. It has the ability to open, edit, and convert to Word any PDF, and has since its days as OpenOffice. It is a fairly large download (nearly 200MB) but installs faster than previous versions, it’s pretty much compatible with all recent versions of M$$ Office, and I use it for simple conversions between formats quite often (although not so much for PDFs, as I have Acrobat Pro).

      Try it, you’ll like it!

      Rob

      • #1377365

        First, using a macro to strip out the stuff you don’t want from the PDF is more than beginner level, and totally unnecessary for simple copy-paste usage.
        Sure it’s more than beginner level, but who wants to stay there? Besides, once installed (which is very easy), the macro can be used on content pasted not only from PDFs, but also from any web pages, emails and so on where the source inserts line breaks at the end of every line. Installed, the macro (which works in any version of Word) is no more difficult to use than anything else you might want to access via a toolbar or a Ribbon tab’s dropdown and can be assigned to a shortcut key. And maybe you hadn’t noticed, but Chuck isn’t exactly a beginner anyway.

        Second — all the responders are concentrating on only M$$ Office in its various versions. What about LibreOffice?

        Maybe that’s because they’re paying attention to the thread subject: “Copying text from a PDF into a Word document“. As for downloading and installing LibreOffice, doing so is rather like using a wrecking ball to drive in a screw. And here you are, saying using a macro is “totally unnecessary” …

        Cheers,
        Paul Edstein
        [Fmr MS MVP - Word]

        • #1377452

          Maybe that’s because they’re paying attention to the thread subject: “Copying text from a PDF into a Word document“. As for download and installing LibreOffice, doing so is rather like using a wrecking ball to drive in a screw. And here you are, saying using a macro is “totally unnecessary” …

          LibreOffice handles “Word” documents OK and maybe the suggestion is to use it for the full process rather than both MS Word and Foxit. It’s also a very good program.

          • #1377454

            LibreOffice handles “Word” documents OK and maybe the suggestion is to use it for the full process rather than both MS Word and Foxit.

            Completely irrelevant – The OP uses Word and asked for help with that. Telling someone they should load up their system with unnecessary software is just dumb.

            Cheers,
            Paul Edstein
            [Fmr MS MVP - Word]

    • #1431883

      Have you tried converting the PDF to text (http://pdftoword.pro/)? Then you can copy and paste the text from the text document directly 🙂
      Hope this helps ^^

    • #1431885

      Thanks Saintsraw

      I have now found a very good solution. I downloaded the free Acrobat Pro 7 from the Adobe site, which worked very well: it loads all the text into memory and then you can paste it where you want it.

      The other day I found a free copy of Acrobat Pro 8 at http://www.pcadvisor.co.uk/downloads/ so I will be upgrading to that.

      Thanks again for coming back to me.

      • #1432109

        Thank you okln. The link you gave requires a search. Here is the link to the download for Acrobat Pro 8. Be sure to read the admonitions because this is an older program.

        Two points:

        One admonition to anyone reading this thread: Converted documents can look great, but the formatting is almost always trash, regardless of which program does the conversion. Even a Word document saved as pdf and then converted back to Word is going to have junk formatting. These documents should not be used as the basis for new documents. Instead, just paste the plain text and do any formatting in Word (preferably using Styles).

        PDFs can have embedded text or can just be a picture. If just a picture, an OCR conversion is needed. You can tell by whether you can select individual words in a pdf or just get an entire page selected. If the latter, you need to do the OCR conversion before trying to paste into Word.
