How to Get hrefs from a Class in BeautifulSoup (Easy Method Explained)

I was scraping a site and needed the hrefs inside a particular div class, but I couldn't find a clear answer online.

The solution turned out to be simple: find the element by its class with BeautifulSoup, then search inside that element for the links.

Take the HTML below. We want the hrefs, but not every one on the page, only those inside a specific class.

HTML

<div class="menu-header-234123">
	<a href="/home">home</a>
	<a href="/about">about</a>
</div>
<div class="product-grid-1234">
	<div class="product-235123">
		<a href="/product/apple">
			<h4>Apple</h4>
		</a>	
	</div>
	<div class="product-425123">
		<a href="/product/orange">
			<h4>Orange</h4>
		</a>	
	</div>
	<div class="product-135123">
		<a href="/product/pine">
			<h4>Pine</h4>
		</a>	
	</div>
</div>

Python Code

To get only the hrefs inside product-grid-1234, do the following.


import requests
from bs4 import BeautifulSoup

URL = "https://www.url.com"  # put your own URL here
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

# Step 1: find the container element by its class name
container = soup.find(class_="product-grid-1234")

# Step 2: search only inside that container for <a> tags that have an href
# For just the first link: container.find("a", href=True)["href"]
links = container.find_all("a", href=True)

for link in links:
    print("Found the URL:", link["href"])

Results

Running the code prints the following. (To reproduce this yourself, copy the full code in the next section.)

Found the URL: /product/apple
Found the URL: /product/orange
Found the URL: /product/pine
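
The hrefs printed above are relative paths. If you need full URLs, one option is urljoin from Python's standard library. This is a minimal sketch, assuming the pages live under the placeholder https://www.url.com used earlier; base_url and relative_links are illustrative names, not part of the original code.

```python
from urllib.parse import urljoin

base_url = "https://www.url.com"  # assumption: the placeholder URL from the code above
relative_links = ["/product/apple", "/product/orange", "/product/pine"]

# Join each relative href onto the base URL to get an absolute URL
absolute_links = [urljoin(base_url, path) for path in relative_links]

for url in absolute_links:
    print("Full URL:", url)
```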

All Code for Testing It Locally

If you want to test this code and get the same results, copy the code below.

Make sure you have BeautifulSoup installed. If not, run 'pip install beautifulsoup4' in a terminal.

html = '''
<div class="menu-header-234123">
	<a href="/home">home</a>
	<a href="/about">about</a>
</div>
<div class="product-grid-1234">
	<div class="product-235123">
		<a href="/product/apple">
			<h4>Apple</h4>
		</a>
	</div>
	<div class="product-425123">
		<a href="/product/orange">
			<h4>Orange</h4>
		</a>
	</div>
	<div class="product-135123">
		<a href="/product/pine">
			<h4>Pine</h4>
		</a>
	</div>
</div>
'''

# requests is not needed here because the HTML comes from the string above
# import requests
from bs4 import BeautifulSoup

# URL = "https://www.url.com"
# page = requests.get(URL)

soup = BeautifulSoup(html, "html.parser")

# Step 1: find the container element by its class name
container = soup.find(class_="product-grid-1234")

# Step 2: search only inside that container for <a> tags that have an href
# For just the first link: container.find("a", href=True)["href"]
links = container.find_all("a", href=True)

for link in links:
    print("Found the URL:", link["href"])
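
The same two-step lookup can also be written as a single CSS selector with BeautifulSoup's select method. This is only an alternative sketch, not the method used above; the shortened HTML and the name selected_hrefs are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened sample HTML (assumption: same structure as the example above)
html = """
<div class="menu-header-234123">
    <a href="/home">home</a>
</div>
<div class="product-grid-1234">
    <div class="product-235123"><a href="/product/apple"><h4>Apple</h4></a></div>
    <div class="product-425123"><a href="/product/orange"><h4>Orange</h4></a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div.product-grid-1234 a[href]" selects every <a> that has an href
# and sits anywhere inside the div with class product-grid-1234
selected_hrefs = [a["href"] for a in soup.select("div.product-grid-1234 a[href]")]

print(selected_hrefs)
```

The menu link /home is skipped because it sits outside the product-grid-1234 div.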

Tip

Use Chrome's Inspect tool and press Ctrl + F inside it to search for classes with similar names. This saves time. Most products share the same class-name pattern even though the names look different at first glance (for example, product-235123 and product-425123).
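
When class names share a pattern like this, a regular expression passed to class_ can match all of them at once. The sketch below assumes the product classes always look like "product-" followed by digits; that pattern is a guess based on the sample HTML, and product_class and product_hrefs are illustrative names.

```python
import re
from bs4 import BeautifulSoup

# Shortened sample HTML (assumption: same structure as the example above)
html = """
<div class="menu-header-234123">
    <a href="/home">home</a>
</div>
<div class="product-grid-1234">
    <div class="product-235123"><a href="/product/apple"><h4>Apple</h4></a></div>
    <div class="product-425123"><a href="/product/orange"><h4>Orange</h4></a></div>
    <div class="product-135123"><a href="/product/pine"><h4>Pine</h4></a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Matches product-235123 and similar, but not product-grid-1234 or menu-header-234123
product_class = re.compile(r"^product-\d+$")
product_divs = soup.find_all("div", class_=product_class)

# Collect the hrefs from every link inside each matching product div
product_hrefs = [a["href"] for div in product_divs for a in div.find_all("a", href=True)]

print(product_hrefs)
```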
