A website visitor wants help on how to automate the extraction of a specific hyper-link from web pages using Excel VBA
‘Hello Dr. Takyar,
It was pleasure talking to you the other day. Hope you and your friends/relatives were safe from the recent storm.
Here is the task that I would like to automate with Excel VBA and seeking help on. I liked your code for extracting table data from multiple web pages and creating and pasting it in respective worksheets – http://www.youtube.com/watch?v=qbOdUaf4yfI However my task is a bit different which is the following:
1) There are several (100s or 1000s) web-links in column A.
2) A connection is established with a web-link’s webpage.
3) On the connected webpage, the first HREF link that contains the text “About” is copied and it’s web-link is pasted in the adjacent cell in column B. If “About” HREF
link is not found on connected webpage, then the cell is populated with “NOT FOUND”.
4) The connection is closed and steps 1 – 3 are repeated until there are no more web-links in column A.
Some examples we can try with are the following web-links in
Column A Column B
1) https://www.google.com/ https://www.google.com/intl/en/about/
2) https://www.facebook.com/ https://www.facebook.com/facebook
3) http://www.youtube.com/ http://www.youtube.com/yt/about/
4) http://www.yahoo.com/ http://info.yahoo.com/
5) http://www.wikipedia.org/ NOT FOUND
6) http://www.linkedin.com/ http://www.linkedin.com/about-us
1) When a respective “About” HREF link is copied from the webpage of a web-link in Column A, and the same is pasted in
Column B, the web-link of the “About” HREF link is displayed in Column B.
2) “About” HREF links on webpages can be one of several variations like – About, About Us, About “Company_Name”, etc. Whatever it is, we want to capture the first HREF link that contains the text – “About”.
Please feel free to email me with any questions or if you need me to call you.
Here is the complete VBA with proper remarks:
‘First define all the variables
Dim ie As Object ‘Internet Explorer
Dim html As Object ‘ HTML document
Dim myLinks As Object ‘ Links collection
Dim myLink As Object ‘Single Link
Dim result As String
Dim myURL As String ‘Web Links on worksheet
Dim LastRow As Integer ‘ VBA execution should stop here
Set ie = CreateObject(“InternetExplorer.Application”)
LastRow = Sheet1.Cells(Rows.Count, “A”).End(xlUp).Row
‘Loop through all the web links on the worksheet one by one and then do some things
For i = 2 To LastRow
‘Get the link from the worksheet and assign it to the variable
myURL = Sheet1.Cells(i, 1).Value
‘Now go to the website
‘Keep the internet explorer visible
ie.Visible = True
‘Ensure that the web page has downloaded completely
While ie.readyState <> 4
‘Get the data from the web page that is in the links and assign it to the variable
result = ie.document.body.innerHTML
‘create a new html file
Set html = CreateObject(“htmlfile”)
‘now place all the data extracted from the web page into the new html document
html.body.innerHTML = result
Set myLinks = html.getElementsByTagName(“a”)
‘loop through the collected links and get a specific link defined by the conditions
For Each myLink In myLinks
If Right$(myLink, 13) = “about-us.html” Or Right$(myLink, 10) = “about.html” Or Right$(myLink, 8) = “about-us” Or Right$(myLink, 5) = “about” Then
Sheet1.Cells(i, “B”).Value = myLink
‘go to the next link
‘once the last web link on the sheet has been visited close the internet explorer
If i = LastRow Then
‘ go to the next web link on the worksheet
Watch the video:
Using Web Queries and a Loop to Download 4000 Database Entries from 4000 Web Pages