
I have been a Python guy since my earliest programming days, and I like finding new ways to solve a problem. I also enjoy doing fun things with Python, such as playing with libraries like pandas and NumPy. Somewhere on YouTube I came across web scraping with Python and learned that the requests and bs4 (Beautiful Soup) libraries are widely used for it. So, like any regular Python guy, I installed both with pip and started reading their documentation to get going.
So, let me explain what web scraping actually is. It is a programming technique for retrieving data from a website so that we can do something meaningful with it. First we request the web page, then we parse the response we get back, and finally we pull out the pieces of data we want. Below is some code I wrote for fun 😛
This code retrieves the names of all the repositories on my GitHub profile.
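import requests
import bs4


def web_scrapping(webUrl):
    # Request the page and parse the returned HTML.
    res = requests.get(webUrl)
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    # Select every link inside an <h3> that sits directly under a .d-inline-block element.
    for i in soup.select('.d-inline-block > h3 > a'):
        # strip() drops the whitespace that surrounds the link text in the markup.
        print(i.text.strip())


web_scrapping('https://github.com/aniruddha2000?tab=repositories')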
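If you want to try the parse-and-select part without touching the network, here is a minimal sketch. The HTML snippet and the `extract_names` helper are made up for this illustration; the real GitHub page is far larger and its markup can change at any time. Only the CSS selector is taken from the code above.

```python
import bs4

# Hypothetical HTML, invented for this example to mimic the structure
# that the selector '.d-inline-block > h3 > a' expects.
SAMPLE_HTML = """
<div class="d-inline-block"><h3><a href="/user/repo-one"> repo-one </a></h3></div>
<div class="d-inline-block"><h3><a href="/user/repo-two">repo-two</a></h3></div>
<div class="other"><h3><a href="/about">About</a></h3></div>
"""


def extract_names(html):
    """Return the stripped text of every link matched by the selector."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    links = soup.select('.d-inline-block > h3 > a')
    return [a.get_text(strip=True) for a in links]


names = extract_names(SAMPLE_HTML)
print(names)  # ['repo-one', 'repo-two']
```

The third link is skipped because its parent `<div>` does not carry the `d-inline-block` class, which is exactly how the selector filters the real page.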
Next, I retrieved laptop names and their short descriptions from Flipkart 😀
# Reuses the requests and bs4 imports from the first example.
def web_scrapping_flipkart(webUrl):
    res = requests.get(webUrl)
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    # Print the text of every <div> that has the class '_3wU53n'.
    for link1 in soup.find_all('div', {'class': '_3wU53n'}):
        print(link1.text)


web_scrapping_flipkart('https://www.flipkart.com/search?q=laptop')
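Both functions above assume the request always succeeds. As a rough sketch of how one might make that step a little safer, the version below adds a timeout, checks the HTTP status, and returns the results instead of printing them. The helper names `fetch_html`, `extract_text`, and `scrape`, the 10-second timeout, and the User-Agent string are all my own assumptions for illustration, not anything the sites require.

```python
import requests
import bs4


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on network or HTTP errors."""
    # Placeholder User-Agent chosen for this example only.
    headers = {'User-Agent': 'my-scraper-example/0.1'}
    res = requests.get(url, headers=headers, timeout=timeout)
    res.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return res.text


def extract_text(html, selector):
    """Return the stripped text of every element matching a CSS selector."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


def scrape(url, selector):
    """Fetch a page and extract matching text; return [] if the request fails."""
    try:
        html = fetch_html(url)
    except requests.RequestException as err:
        print(f'Request failed: {err}')
        return []
    return extract_text(html, selector)
```

With these in place, something like `scrape('https://www.flipkart.com/search?q=laptop', 'div._3wU53n')` would do the same job as the Flipkart function above. Keep in mind that sites change their class names, so an empty list may just mean the selector is out of date.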
Thank you. 🙂