Fun with the Python requests module

I was reading this part of the pym book and thought I should change the code a little so that it does something better: instead of storing the content in a text file, it stores it in an HTML file, and it also checks whether a file with the same name already exists in the directory. Here is the code:

import os.path
import requests

def download(url):
    """
    Download the given URL and save it to the current directory.
    :arg url: URL of the file to be downloaded.
    """
    req = requests.get(url)
    if not req.ok:  # any 4xx/5xx response, not only 404
        print('Could not download %s (HTTP %s)' % (url, req.status_code))
        return
    fileName = url.split('/')[-1].split('.')[0] + '.html'
    print(fileName)
    if os.path.isfile(fileName):
        print('A file with the same name already exists')
    else:
        with open(fileName, 'wb') as fobj:
            fobj.write(req.content)
        print('Download over')

if __name__ == "__main__":
    url = input("Enter a URL: ")
    download(url)

Above, we fetch the content of the URL with requests.get(url). Then we check the status code to see whether the URL is valid. If it is, we build the file name with split(): first we split the URL on "/" and take the last item of the list, then we split that item on "." and take the first item, and finally we add ".html" to it. After that we check whether a file with that name already exists. If it does not, we create the file and write the content into it.
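The file name and "does it already exist" steps can also be sketched on their own, without any network access. This is only a rough variation, not the code from the book: the helper names html_filename and save_once, the example.com URL and the 'index' fallback are all my own assumptions. It uses urllib.parse so that a query string does not end up in the file name, and the 'x' open mode so that an existing file is never overwritten.

```python
import os
import tempfile
from urllib.parse import urlparse


def html_filename(url):
    """Build '<last path segment without extension>.html' from a URL."""
    path = urlparse(url).path            # drops the query string and fragment
    last = path.rstrip('/').split('/')[-1]
    stem = last.split('.')[0]
    return (stem or 'index') + '.html'   # 'index' is an assumed fallback name


def save_once(path, data):
    """Write data to path and return True, or return False if path exists."""
    try:
        # 'x' mode creates the file and fails if it is already there
        with open(path, 'xb') as fobj:
            fobj.write(data)
        return True
    except FileExistsError:
        return False


if __name__ == "__main__":
    name = html_filename('https://example.com/docs/page.php?id=3')
    print(name)
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, name)
        print(save_once(target, b'<html></html>'))  # first write
        print(save_once(target, b'<html></html>'))  # same name again
```

Doing the check and the write in one open() call means there is no gap between "is the file there?" and "create it", which is the main reason to prefer it over a separate os.path.isfile() test.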
Thank you 🙂

Exploring web scraping with requests and bs4

Photo by Rahul Nayak on Medium

From the beginning of my programming days I have been a Python guy, and I like finding new ways to solve a problem. I also like doing fun stuff with Python, such as playing with libraries like pandas and numpy. Somewhere on YouTube I saw a video about web scraping with Python and found that the requests and bs4 libraries are widely used for this work. So, as a regular Python guy, I installed those libraries with pip and started looking into their documentation to get started.

So, let me tell you what exactly web scraping is. It is a programming technique for retrieving data from a website so that we can do something meaningful with it. First we request the web page, then we parse the response, and finally we pull out the data we want. Below is some code that I have written for fun 😛
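The "parse" and "pull out" steps can be tried without touching the network. The sketch below is only an illustration under some assumptions: it uses the standard library's html.parser instead of bs4 so that it runs with nothing installed, and the page in SAMPLE_HTML, the HeadingLinkParser class and the extract_names function are all made up for this example.

```python
from html.parser import HTMLParser

# Made-up page standing in for the body of a real HTTP response
SAMPLE_HTML = """
<html><body>
  <h3><a href="/demo/first">first-repo</a></h3>
  <h3><a href="/demo/second">second-repo</a></h3>
  <p><a href="/about">About</a></p>
</body></html>
"""


class HeadingLinkParser(HTMLParser):
    """Collect the text of every <a> that sits inside an <h3>."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.in_link = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h3':
            self.in_h3 = True
        elif tag == 'a' and self.in_h3:
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == 'h3':
            self.in_h3 = False
        elif tag == 'a':
            self.in_link = False

    def handle_data(self, data):
        # Only text inside an <h3><a>...</a></h3> is kept
        if self.in_link:
            self.names.append(data.strip())


def extract_names(html):
    """Parse the HTML text and return the collected link names."""
    parser = HeadingLinkParser()
    parser.feed(html)
    parser.close()
    return parser.names


if __name__ == "__main__":
    print(extract_names(SAMPLE_HTML))
```

With bs4 the whole class shrinks to a single CSS selector, which is why it is the more comfortable tool for this job.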

In this code I retrieve all the repository names from my GitHub profile.

import requests
import bs4

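def web_scrapping(webUrl):
    """Print the name of every repository listed on the given GitHub page."""
    res = requests.get(webUrl)
    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    # The selector targets the repository links; it may need updating if GitHub changes its markup
    for i in soup.select('.d-inline-block > h3 > a'):
        print(i.text.strip())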

web_scrapping('https://github.com/aniruddha2000?tab=repositories')

In this case I retrieve the laptop names and their short descriptions from Flipkart 😀

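# Uses the requests and bs4 imports from the previous snippet
def web_scrapping_flipkart(webUrl):
    """Print the name and short description of every laptop on the page."""
    res = requests.get(webUrl)
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    # '_3wU53n' is a generated class name, so it may stop matching if Flipkart changes its markup
    for link1 in soup.find_all('div', {'class': '_3wU53n'}):
        print(link1.text)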

web_scrapping_flipkart('https://www.flipkart.com/search?q=laptop')

Thank you. 🙂