• Macro needed to find embedded MS Word graphics and save out to a file name

    Home » Forums » AskWoody support » Productivity software by function » MS Word and word processing help » Macro needed to find embedded MS Word graphics and save out to a file name

    Author
    Topic
    #482018

    Hello,

    I have several hundred pages of embedded graphics that I need to save out into file names. I would like the macro to:

    1. Point to a folder full of graphics.
    2. Look at a style called: Figure_Title and then copy the name of the figure title to memory.
    3. Look at the style called Figure (right above the figure title) and then save that associated graphic out as a .JPEG file (e.g. C:Temp1) with the figure title
    name.
    4. Loop through the rest of that document, followed by the rest of the documents in the folder and repeat steps 2 and 3.

    If this is possible, it will save hours of having to right click and rename each graphic to a xyz.jpg file name (based on the figure title name).

    I am using MS Word 2010.

    Thanks for any assistance in advance.

    Regards,

    Jim

    Viewing 14 reply threads
    Author
    Replies
    • #1323516

      I don’t have the time to create the macro at present but it is easy to extract the images from docx/docm files without a macro.
      1. Rename the file to end with .zip
      2. Open the zip file and go into the folder word/media
      3. Copy all the graphics from there into a new folder (not in the zip file)
      4. Close the zip file and change its name back to what is was

      The graphics should all be named in order of appearance in the Word document (although I’m not sure of where the header/footer graphics appear).

      • #1323536

        Hello Andrew,

        Thanks for the rename the .docx file to zip. That works really well, and gives me a chronological listing of all the original graphics that were embedded in the MS Word document. Thank you for that. One last question on this topic. Is there a simple Macro out there that will: 1. Point to a folder full of documents / 2. Look for each .docx file and rename to .zip. 3. Repeat with every single document in the folder until all are renamed as .zip? If I was somehow able to rename every .docx file as a zip and then was able to combine all the .zip files into one large zip, would it put all the graphics from all the documents into one manageable zip file, from which they could be extracted? My goal is to pull out all the graphics from all the source documents and we could be talking about 5000 graphics. If I could get them all into one folder, then I could rename them with a spreadsheet macro that is on this forum, under the Excel section. That spreadsheet looks at the original graphic name in column A and then looks at the desired name in column B and then renames the graphic in the folder to the column B name. The zip method above will essentially give me all the graphics, that I can list out in the first column and then I just need to decide on new names for column B. This is very interesting to think about. Anyway, thanks again for taking the time to give a response to my question. It will certainly (even if renaming one .docx file to a .zip and extracting the graphics) save me many hours of manual work. Regards, Jim

    • #1323548

      Hi James,

      Another approach is to save the documents in HTML format. For each document, you’ll get a folder named after the document and, in that folder, will be all the images. A key difference with this approach is that the resolution of the images needs to be specified beforehand. This can be both a benefit and a drawback, in that doing so determines how much disc space each image will require, but it can also mean some images will become quite grainy if enlarged/printed and a sufficiently high resolution hadn’t been specified.

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

      • #1323556

        Thanks Paul. I appreciate that information as well. Do you know if there is a way to gather all of the embedded graphics into one folder? The rename as zip method works great, as it creates a media subfolder in the zip, containing all the graphics, but that is just one document. I have two hundred documents (some with graphics and some with not). It would be very, very helpful if I could somehow get one large folder with all the embedded graphics from all the zip or html files (that came from MS Word 2010). Thanks for any suggestions you (and Andrew) may have. It is truly appreciated. Regards, Jim

    • #1323660

      Hi James,

      Whether you’re working from the zip files or html folder, you can’t simply copy/move the image files to a single folder. That’s because their names will, for the most part, overlap. If you check out the zip or html folder, you’ll see their basically sequentially-numbered files. So, if you simply copy the lot into a single folder, what you’ll end up with is however many images were in the file that had the most, not a set of images per file, but the images you’ll have will quite probably be from a number of different files – not just from the one that had the most. So, if you have a hundred files, with one image each, all you’ll end up with is a folder containing one image – the last one copied/moved. Similarly, from a set of documents, one of which had 5 files, you’ll end up with a folder containing five files of (probably) mixed parentage.

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

      • #1323666

        Hi Paul,

        I understand what you are saying. I am just trying to find a way of getting all the graphics named, without having to rename 5000 graphics or more (for example) manually. I could at the very least run this excel macro that will rename for example the 30 graphics I have in my first document, to an appropriate name (e.g. replace column a with column b in the macro spreadsheet that points to the graphics folder), but I still have to repeat that process for another 200 documents containing graphics. It sounds like that is the best option at this point.

        Thanks for the feedback and suggestions. I appreciate it.

        Regards,

        Jim

    • #1323667

      Hi Jim,

      It should be possible to write a macro that:
      • navigates to a folder
      • opens a document
      • saves the document as html
      • copies the images from the document’s html folder to another folder
      • prefixes each copied image name with the document name
      • repeats the above opening – prefixing steps until all files are done
      The actual order of the copying & prefixing might vary, but you get the general idea. Such an approach retains a naming connection between the images as the documents they come from. If you don’t care about that, the images could simply be re-numbered sequentially. Is that the kind of thing you’re after?

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

      • #1323726

        Hi Paul,

        Yes, that approach would definitely work. I could always change the graphics afterwards by putting them all in one folder and running an excel macro (for instance) that takes a listing of row A1 (old graphic name) and names the graphic in the folder based on the name in row B1 (new graphic name), looping down the Excel list until all the graphics have been renamed. But for now, the approach you mentioned would be really great. Some Word 2010 (.docx) files have graphics in them and some do not. I would not know offhand until I opened each MS Word document, but I guess the Macro could just move on to the next MS Word document, if it did not find graphics in a particular document. So yes, the Macro you suggest would certainly do the job. Regards, Jim

    • #1324196

      Hi James,

      Try the following. After selecting the folder to process, the code extracts the images from all docx & docm files in that folder and outputs them to a new ‘DocMedia’ folder in that folder. Each output file’s name is prefixed with the parent document’s name. If the files have media other than images embedded, these will be extracted too. Note that, the macro only processes docx & docm files – doc files can’t be processed this way.

      Code:
      Sub ExtractDocMedia()
      Application.ScreenUpdating = False
      Dim StrInFold As String, StrOutFold As String, StrTmpFold As String
      Dim StrDocFile As String, StrZipFile As String, Obj_App As Object
      Dim StrFile As String, StrFileList As String, StrMediaFile As String, i As Long
      StrInFold = GetFolder
      If StrInFold = "" Then Exit Sub
      StrOutFold = StrInFold & "DocMedia"
      StrTmpFold = StrInFold & "Tmp"
      'Test for existing tmp & output folders, create they if they don't already exist
      If Dir(StrTmpFold, vbDirectory) = "" Then MkDir StrTmpFold
      If Dir(StrOutFold, vbDirectory) = "" Then MkDir StrOutFold
      'Create a Shell App for accessing the zip archives
      Set Obj_App = CreateObject("Shell.Application")
      'Look for docx files to process
      StrFile = Dir(StrInFold & "*.doc?", vbNormal)
      'Build the file list
      While StrFile  ""
        StrFileList = StrFileList & "|" & StrFile
        StrFile = Dir()
      Wend
      'process the file list
      For i = 1 To UBound(Split(StrFileList, "|"))
        'ID the document to process
        StrDocFile = StrInFold & "" & Split(StrFileList, "|")(i)
        'Define the zip name
        StrZipFile = Split(StrDocFile, ".")(0) & ".zip"
        'In case the file is in use or zip file has no media
        On Error Resume Next
        'Create the zip file, by simply copying to a new file with a zip extension
        FileCopy StrDocFile, StrZipFile
        'Extract the zip archive's media files to the temporary folder
        Obj_App.NameSpace(StrTmpFold & "").CopyHere Obj_App.NameSpace(StrZipFile & "wordmedia").Items
        'Delete the zip file - the loop takes care of timing issues
        Do While Dir(StrZipFile)  ""
          Kill StrZipFile
        Loop
        'Restore error trapping
        On Error GoTo 0
        'Get the temporary folder's file listing
        StrMediaFile = Dir(StrTmpFold & "*.*", vbNormal)
        'Process the temporary folder's files
        While StrMediaFile  ""
          'Copy the file to the output folder, prefixed with the source file's name
          FileCopy StrTmpFold & "" & StrMediaFile, StrOutFold & "" & Split(Split(StrFileList, "|")(i), ".")(0) & StrMediaFile
          'Delete the media file
          Kill StrTmpFold & "" & StrMediaFile
          'Get the next media file
          StrMediaFile = Dir()
        Wend
      Next
      'Delete the temporary folder
      RmDir StrTmpFold
      Application.ScreenUpdating = True
      End Sub
      
      Function GetFolder() As String
      Dim oFolder As Object
      GetFolder = ""
      Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
      If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
      Set oFolder = Nothing
      End Function

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

      • #1324375

        Hi Paul,
        Thanks so much!!! I have to say that is probably one of the best macros I have ever seen in this lounge. It worked perfectly, going through 200 documents and saving out the embedded images in each document to the media folder with the name of the document itself. In a few minutes it pulled out almost 1000 graphics across 200 MS Word documents. If I get down to Australia one of these days, count yourself in for a couple rounds of beer on me. ๐Ÿ™‚ Thanks again! Jim

    • #1324378

      Hi Jim,

      Thanks for the feedback!

      I imagine that, with so many documents, it took a while to process. It’s a simple matter to add some code to report the code’s progress on the status bar:

      Code:
      Sub ExtractDocMedia()
      Application.ScreenUpdating = False
      Dim SBar As Boolean           ' Status Bar flag
      Dim StrInFold As String, StrOutFold As String, StrTmpFold As String
      Dim StrDocFile As String, StrZipFile As String, Obj_App As Object, i As Long
      Dim StrFile As String, StrFileList As String, StrMediaFile As String, j As Long
      StrInFold = GetFolder
      If StrInFold = "" Then Exit Sub
      ' Store current Status Bar status, then switch on
      SBar = Application.DisplayStatusBar
      Application.DisplayStatusBar = True
      StrOutFold = StrInFold & "DocMedia"
      StrTmpFold = StrInFold & "Tmp"
      'Test for existing tmp & output folders, create they if they don't already exist
      If Dir(StrTmpFold, vbDirectory) = "" Then MkDir StrTmpFold
      If Dir(StrOutFold, vbDirectory) = "" Then MkDir StrOutFold
      'Create a Shell App for accessing the zip archives
      Set Obj_App = CreateObject("Shell.Application")
      'Look for docx files to process
      StrFile = Dir(StrInFold & "*.doc?", vbNormal)
      'Build the file list
      While StrFile  ""
        StrFileList = StrFileList & "|" & StrFile
        StrFile = Dir()
      Wend
      'process the file list
      j = UBound(Split(StrFileList, "|"))
      For i = 1 To j
        'ID the document to process
        StrDocFile = StrInFold & "" & Split(StrFileList, "|")(i)
        ' Report progress on Status Bar.
        Application.StatusBar = "Processing file " & i & " of " & j & ": " & StrDocFile
        'Define the zip name
        StrZipFile = Split(StrDocFile, ".")(0) & ".zip"
        'In case the file is in use or zip file has no media
        On Error Resume Next
        'Create the zip file, by simply copying to a new file with a zip extension
        FileCopy StrDocFile, StrZipFile
        'Extract the zip archive's media files to the temporary folder
        Obj_App.NameSpace(StrTmpFold & "").CopyHere Obj_App.NameSpace(StrZipFile & "wordmedia").Items
        'Delete the zip file - the loop takes care of timing issues
        Do While Dir(StrZipFile)  ""
          Kill StrZipFile
        Loop
        'Restore error trapping
        On Error GoTo 0
        'Get the temporary folder's file listing
        StrMediaFile = Dir(StrTmpFold & "*.*", vbNormal)
        'Process the temporary folder's files
        While StrMediaFile  ""
          'Copy the file to the output folder, prefixed with the source file's name
          FileCopy StrTmpFold & "" & StrMediaFile, StrOutFold & "" & Split(Split(StrFileList, "|")(i), ".")(0) & StrMediaFile
          'Delete the media file
          Kill StrTmpFold & "" & StrMediaFile
          'Get the next media file
          StrMediaFile = Dir()
        Wend
      Next
      'Delete the temporary folder
      RmDir StrTmpFold
      ' Clear the Status Bar
      Application.StatusBar = False
      ' Restore original Status Bar status
      Application.DisplayStatusBar = SBar
      Application.ScreenUpdating = True
      End Sub
      
      Function GetFolder() As String
      Dim oFolder As Object
      GetFolder = ""
      Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
      If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
      Set oFolder = Nothing
      End Function

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

    • #1346590

      I have a similar problem. I would like to streamline a job I have to take many documents and save the pictures in them with names that describe their use in the text. I thought of the following flow:
      1. I manually add meaningful text in the Alt Text dialog of each image (inside Format Picture dialog).
      2. Run a macro that loops through all the images in the document and uses the SavePictureAs command and the AltText property (I presume there is one) to save all the pictures in .png format using the names provided in the Title of the AltText dialog.
      I am not very familiar with VBA. I tried to record a single picture macro but when I looked inside there seemed to be nothing there besides the name. (I placed the cursor next to an image and used the keyboard to select it before opening and adding text).
      Can you help?
      Thanks.

    • #1346619

      Hi Stanley,

      There is no way for the process used in the above posts to retrieve the additional data you want. And no, there is no ‘Alt Text’ property for inserted graphics such as jpg & png files.

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

    • #1346700

      Paul – I think you may be mistaken about the lack of an Alt Text property on inserted graphics. If I insert a jpg and then right click it to choose Format Picture, the Alt text tab allows me to edit two fields there (Title and Description). The Description field corresponds to the AlternativeText property of the graphic.

      Stanley
      The code that Paul provided works externally to the document (by extracting media files from the zip format) and it would be tricky to modify to extract the Alt Text tag on your graphics. Conceptually, the Alt text information would also be accessible inside the zip file but the code would have to parse the relevant xml file(s) to align each graphic with the associated alt tag.

      On the other hand, it should be more straightforward to create VBA to open the word document and then iterate through each graphic, export the image and save it using the contents of the Alt Tag. However, I didn’t spot any method in VBA of saving the current graphic and recording a macro of the GUI doing it doesn’t capture the relevant VBA. It would still be doable using SendKeys but that isn’t going to be very elegant. Code that almost gets there is attached below – perhaps someone else can work out the missing link to SaveAs or Export the graphic. If your graphics are floating then you would need to include a similar structure for Shapes.

      Code:
      Sub Temp1()  Dim aPict As InlineShape, sPath As String, sName As String
        sPath = “D:Work”
        For Each aPict In ActiveDocument.InlineShapes
          sName = sPath & aPict.AlternativeText & “.png”
          ‘Code to export picture would go here
          
        Next aPict
      End Sub
      

      An alternative piece of coding for extracting an image is available from http://www.lebans.com/msword.htm

      • #1347245

        Thanks, Andrew, for the explanation and sample code. I tried the lebans website but there was only an image there! Did I misunderstand something?
        As you figured out, I was looking for a way to do this work with a Word macro. I also tried recording a macro but it would not record the Save Picture command. On the other hand, I have found the ObjectSaveAsPicture control name in a Microsoft WordControls resource file (Office 2010 Help Files: Office Fluent User Interface Control Identifiers) . So I presume it can be used in vba. The same resource listed a button named AlternativeText which is probably what opens the AltText dialog box. That leaves the question of how to get a handle on the text that is IN the dialog box…
        I am researching this info for someone else who writes macros, so I have not tested anything. Thanks for everything so far.

    • #1346701

      hi Andrew,

      Thanks for that – I was looking for them via the Ribbon.

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

    • #1347295

      The ObjectSaveAsPicture command looks to be the thing you need. The trick is now working out the name of the dialog box it calls so we can hijack it.

      I don’t have the time to work this out at present but may be able to spare some time on the weekend to have another look at it.

    • #1347312

      I’m new to this game. It seems unbelievable that there is no resource available for finding this out – some sort of reference guide or documenetation generated form the code, like Javadocs.
      Thanks for whatever time you have spent.

    • #1347314

      Hi Stanley,

      Just because something can be done through the user interfsace in Word doesn’t mean there’s a vba interface for it. Do bear in mind that Word is a word processor and what you’re asking for isn’t something normally expected of one.

      As Andrew said, it’s technically possible to extract the alternative text from the document.xml file inside the zip archive, but that would require extracting that file also, then parsing it and updating the image filenames, a task complicated by the fact the the image names in the document.xml file aren’t necessarily the same as those assigned to the images themselves. For example, in a test file I created, the image named ‘image1.jpeg’ appears in the xml file as:

      (plus some other lines that never mention ‘image1’). Interestingly, the document.xml file also holds the original image names (assuming they’re copied from a file, at least), though not their paths.

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

    • #1347403

      Thanks for the explanation.
      I think we can get around the Alt Text by simply placing a “tag” with the image name immediately before or after the image in the text. (We could even make it hidden text). I presume the function of searching for the next image and accessing the text immediately surrounding it is not a problem.
      About the dialog box, I’m not clear. Does the ObjectSaveAsPicture command not have an API where one could select the output extension and supply a pathname?

    • #1347406

      Does the ObjectSaveAsPicture command not have an API where one could select the output extension and supply a pathname?

      One may be able to invoke it via the Word APIs, but I have no idea what the code would be. What I was saying is that it doesn’t appear to be available to vba via, say, Application.Dialogs and, even if you add it to the QAT, you can’t record its use.

      Cheers,
      Paul Edstein
      [Fmr MS MVP - Word]

    Viewing 14 reply threads
    Reply To: Reply #1323666 in Macro needed to find embedded MS Word graphics and save out to a file name

    You can use BBCodes to format your content.
    Your account can't use all available BBCodes, they will be stripped before saving.

    Your information:




    Cancel