In this article, we'll explore how to perform web scraping in Python using the Beautiful Soup library.
Introduction to Web Scraping
Web scraping is the process of extracting data from websites. It involves fetching HTML content from web pages and then parsing and extracting the desired information.
Using Beautiful Soup
Beautiful Soup is a Python library for parsing HTML and XML documents. It provides a simple interface for navigating and manipulating the parse tree.
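To make "navigating and manipulating the parse tree" concrete, here is a minimal sketch that works on an inline HTML string rather than a live page. The markup, the id "content", and the class "intro" are made up for illustration; only the Beautiful Soup calls themselves (find, find_all, get_text, and attribute access such as soup.title) come from the library.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration
html = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <div id="content">
      <h1>Welcome</h1>
      <p class="intro">First paragraph.</p>
      <p>Second paragraph with a <a href="/about">link</a>.</p>
    </div>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Navigate by tag name: soup.title is the first <title> element
page_title = soup.title.string

# Search the tree: find() returns the first match, find_all() returns every match
content = soup.find("div", id="content")
intro = content.find("p", class_="intro")
paragraph_texts = [p.get_text() for p in content.find_all("p")]

# Move between related nodes and read attributes
parent_name = intro.parent.name
link_href = soup.a["href"]

# Modify the tree: replace the heading's text in place
soup.h1.string = "Hello"
heading = soup.h1.get_text()

print(page_title, paragraph_texts, parent_name, link_href, heading)
```

Working from a string like this is a convenient way to try selectors before pointing the same code at a real page.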
Example:
import requests
from bs4 import BeautifulSoup
# Fetch HTML content from a web page
response = requests.get('https://example.com', timeout=10)
response.raise_for_status()  # Stop early on HTTP errors such as 404 or 500
html_content = response.text
# Parse the HTML content using Beautiful Soup
soup = BeautifulSoup(html_content, 'html.parser')
# Find and extract specific elements
title = soup.title.text if soup.title else ''  # soup.title is None if the page has no <title>
paragraphs = soup.find_all('p')
# Print the extracted data
print("Title:", title)
print("Paragraphs:")
for paragraph in paragraphs:
    print(paragraph.text)
In this example, we fetch HTML content from a web page using the requests library, parse it using Beautiful Soup, and then extract the title and paragraphs from the page.
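One possible refinement, sketched here as an assumption rather than part of the example above, is to put the parsing and extraction steps into a function that takes an HTML string. The function name extract_page_data and the sample markup are hypothetical. Keeping the network call outside the function means the extraction logic can be tried on any string, and a page without a title element is handled instead of raising an error.

```python
from bs4 import BeautifulSoup


def extract_page_data(html):
    """Return the title and paragraph texts found in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when the document has no <title> element
    title = soup.title.get_text(strip=True) if soup.title else None
    # strip=True trims surrounding whitespace from each paragraph's text
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    return {"title": title, "paragraphs": paragraphs}


# Hypothetical markup standing in for a fetched page
sample_html = (
    "<html><head><title> Demo </title></head>"
    "<body><p>One</p><p> Two </p></body></html>"
)
data = extract_page_data(sample_html)
print(data)
```

With a live page, the same function could be called as extract_page_data(response.text) once the page has been fetched with requests.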
Conclusion
Beautiful Soup is a powerful tool for web scraping in Python, allowing you to extract data from web pages easily. By combining Beautiful Soup with other libraries like requests for fetching web pages, you can automate the process of gathering data from the web.