Topic: MSHTML, VBA and HttpResponse code (IE 6) @ AskWoody

MSHTML, VBA and HttpResponse code (IE 6)
- This topic has 8 replies, 2 voices, and was last updated 21 years, 5 months ago.
WSpeterl
AskWoody Lounger

January 20, 2004 at 4:11 am #399370

I’m scraping a website using VBA (from Access) and MSHTML (via IE 6), similar to what is covered in the thread starting with post 292415.
My question is: how do I tell what response code I received? How do I tell whether I got a 200 (OK) or a 404 (Not Found)? The actual page returned for those codes varies depending on the web server, but I should be able to get the HttpResponse code. My basic code is:

Dim objMSHTML As New MSHTML.HTMLDocument
Dim objDocument As MSHTML.HTMLDocument

'This function is only available with Internet Explorer 5 and later
Set objDocument = objMSHTML.createDocumentFromUrl(sURL, vbNullString)

'Tricky, to make the function wait for the document to complete, usually the
'transfer is asynchronous. Note that this string might be different if you have
'another language than English for Internet Explorer on the machine where the code is
'executed.
While objDocument.readyState <> "complete"
    DoEvents
Wend

'OK, now we've got the page
If objDocument.Title = "404 Not Found" Then
    'This is not a robust solution
    'Need to get "objDocument.HttpResponseCode" or similar
    '...

- WSjscher2000
  AskWoody Lounger
  
  January 20, 2004 at 5:26 am #770954
  
  Interesting and tough problem. It appears that you need to do quite a bit of low-level API work to get this information, using several functions before you can request the status line with HttpQueryInfo. Some resources for you:
  
  Win32 Internet HTTP Functions in Visual Basic, MSDN, Sept. 1996
  FIX: Internet Transfer Control 5.0 Has Bug with “HEAD” Request, MSKB #171271
  HTTP Status Codes (Platform SDK: Windows Internet), MSDN
  
  I look forward to seeing the solution.
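
  For readers following along, here is a rough sketch of what the WinINet route might look like in VBA. It is not taken from the linked articles: the function name GetHttpStatus and the agent string are made up for illustration, and the Declare statements assume 32-bit VBA (as with Access and IE 6; 64-bit Office would need PtrSafe and LongPtr handles). It opens the URL and asks HttpQueryInfo for the numeric status code without reading the body.

  ```vba
  Option Explicit

  ' WinINet declarations, 32-bit VBA assumed.
  Private Declare Function InternetOpen Lib "wininet.dll" Alias "InternetOpenA" ( _
      ByVal sAgent As String, ByVal lAccessType As Long, _
      ByVal sProxyName As String, ByVal sProxyBypass As String, _
      ByVal lFlags As Long) As Long
  Private Declare Function InternetOpenUrl Lib "wininet.dll" Alias "InternetOpenUrlA" ( _
      ByVal hInternet As Long, ByVal sUrl As String, _
      ByVal sHeaders As String, ByVal lHeadersLength As Long, _
      ByVal lFlags As Long, ByVal lContext As Long) As Long
  Private Declare Function HttpQueryInfo Lib "wininet.dll" Alias "HttpQueryInfoA" ( _
      ByVal hRequest As Long, ByVal lInfoLevel As Long, _
      ByRef lpBuffer As Any, ByRef lBufferLength As Long, _
      ByRef lIndex As Long) As Long
  Private Declare Function InternetCloseHandle Lib "wininet.dll" ( _
      ByVal hInternet As Long) As Long

  Private Const INTERNET_OPEN_TYPE_PRECONFIG As Long = 0
  Private Const INTERNET_FLAG_RELOAD As Long = &H80000000
  Private Const HTTP_QUERY_STATUS_CODE As Long = 19
  Private Const HTTP_QUERY_FLAG_NUMBER As Long = &H20000000

  ' Hypothetical helper: returns the HTTP status code (200, 404, ...),
  ' or 0 if the request could not be made at all.
  Public Function GetHttpStatus(ByVal sURL As String) As Long
      Dim hSession As Long, hRequest As Long
      Dim lStatus As Long, lLen As Long, lIndex As Long

      hSession = InternetOpen("VBA status check", INTERNET_OPEN_TYPE_PRECONFIG, _
                              vbNullString, vbNullString, 0)
      If hSession = 0 Then Exit Function

      hRequest = InternetOpenUrl(hSession, sURL, vbNullString, 0, _
                                 INTERNET_FLAG_RELOAD, 0)
      If hRequest <> 0 Then
          lLen = 4   ' size of the Long that receives the status code
          If HttpQueryInfo(hRequest, HTTP_QUERY_STATUS_CODE Or HTTP_QUERY_FLAG_NUMBER, _
                           lStatus, lLen, lIndex) <> 0 Then
              GetHttpStatus = lStatus
          End If
          InternetCloseHandle hRequest
      End If
      InternetCloseHandle hSession
  End Function
  ```

  A call such as lCode = GetHttpStatus(sURL) could then be compared against 200 or 404. Note that this issues its own request, separate from whatever MSHTML does.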
  
- WSpeterl
  AskWoody Lounger
  
  January 20, 2004 at 9:57 am #771032
  
  Oh that I was using Java or Perl!
  I’ll dive into the problem with these leads – thank you very much.
  I’ll post the solution when I get it, but it might not be today!
  Peter
  
  
  WSpeterl
  AskWoody Lounger
  
  January 21, 2004 at 10:53 pm #771985
  
  The code in the first of your links works, although the MS site is missing the sample application.
  The second link applies to the Internet Transfer Control, which the code doesn’t actually need.
  Unfortunately, the InternetReadFile function retrieves the page as text, not as an MSHTML object. So I’d need to either rewrite my system to parse the document itself (too hard), create an MSHTML object from the text (too likely to bomb with bad HTML, and I don’t know whether this is possible), or make multiple requests (too much traffic).
  So my problem is now refined to: how do I get the HTTP status code for an MSHTML request?
  I’ve redefined part of my code so that the effect of a 404 is minimised, so I don’t need the answer to this problem now.
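
  On the "create an MSHTML object from the text" option, a minimal sketch follows. It swaps in a different technique from the WinINet code discussed in this thread: it assumes MSXML (the MSXML2.XMLHTTP object) is installed, fetches the page once, reads the status from that same request, and then loads the text into an MSHTML document via body.innerHTML. The names HtmlFromText and FetchDocument are hypothetical. Treat it as untested against badly formed pages; content loaded this way goes into the body, so head elements such as the title may not survive, and send raises a run-time error if the server cannot be reached.

  ```vba
  Option Explicit
  ' Requires a reference to "Microsoft HTML Object Library" (MSHTML),
  ' as in the original code.

  ' Hypothetical helper: builds an MSHTML document from a string of HTML.
  Public Function HtmlFromText(ByVal sHtml As String) As MSHTML.HTMLDocument
      Dim objDoc As MSHTML.HTMLDocument
      Set objDoc = New MSHTML.HTMLDocument
      objDoc.body.innerHTML = sHtml
      Set HtmlFromText = objDoc
  End Function

  ' Hypothetical helper: one request yields both the status code and the page.
  ' Returns Nothing unless the server answered 200; lStatus receives the code.
  Public Function FetchDocument(ByVal sURL As String, _
                                ByRef lStatus As Long) As MSHTML.HTMLDocument
      Dim objHttp As Object
      Set objHttp = CreateObject("MSXML2.XMLHTTP")
      objHttp.Open "GET", sURL, False     ' False = synchronous
      objHttp.send
      lStatus = objHttp.Status            ' e.g. 200 or 404
      If lStatus = 200 Then
          Set FetchDocument = HtmlFromText(objHttp.responseText)
      End If
  End Function
  ```

  The caller would test the returned object against Nothing and inspect lStatus to decide how to handle a 404.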
  
  [philosophy]
  I’m still interested from a curiosity point of view. Why would MS hide (so effectively) this standard piece of information? On the web, headers are almost as important as content (from a program’s point of view).
  [/philosophy]
  
  Thanks for your help.
  
  
  WSjscher2000
  AskWoody Lounger
  
  January 22, 2004 at 12:33 am #772013
  
  I realize it’s terribly cumbersome, but I thought you could use the API calls for the sole purpose of obtaining the header information, but continue to use the rest of your code “as is.” As for why it isn’t part of the MSHTML document object model, good question!!
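
  One way that idea might be sketched: check the status with a separate, lightweight request, then run the existing MSHTML code unchanged. The version below is an assumption-laden illustration, not the API code from the links. It uses a HEAD request through MSXML2.XMLHTTP instead of WinINet, the names UrlStatus and LoadIfOk are hypothetical, and it does cost a second round trip (some servers also answer HEAD differently from GET).

  ```vba
  Option Explicit
  ' Requires a reference to "Microsoft HTML Object Library" (MSHTML).

  ' Hypothetical helper: status code from a HEAD request, or 0 on any failure.
  Public Function UrlStatus(ByVal sURL As String) As Long
      Dim objHttp As Object
      On Error GoTo Failed
      Set objHttp = CreateObject("MSXML2.XMLHTTP")
      objHttp.Open "HEAD", sURL, False    ' synchronous, headers only
      objHttp.send
      UrlStatus = objHttp.Status
      Exit Function
  Failed:
      UrlStatus = 0
  End Function

  ' Hypothetical wrapper: loads the page with MSHTML only when the server says 200.
  Public Function LoadIfOk(ByVal sURL As String) As MSHTML.HTMLDocument
      Dim objMSHTML As New MSHTML.HTMLDocument
      Dim objDocument As MSHTML.HTMLDocument

      If UrlStatus(sURL) <> 200 Then Exit Function   ' returns Nothing

      Set objDocument = objMSHTML.createDocumentFromUrl(sURL, vbNullString)
      While objDocument.readyState <> "complete"
          DoEvents
      Wend
      Set LoadIfOk = objDocument
  End Function
  ```

  The rest of the scraping code stays "as is"; only the call that obtains the document changes.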
  