From a7b9de8e51533216c54253d6f1a3c077c84ab697 Mon Sep 17 00:00:00 2001 From: Durgesh Vaigandla Date: Sun, 16 Jun 2024 11:32:24 +0530 Subject: [PATCH 1/3] fix: Update image source path for Power BI blog post --- index.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/index.html b/index.html index a5bcd5c..9679a7c 100644 --- a/index.html +++ b/index.html @@ -205,7 +205,7 @@

- Professional LinkedIn account creation
April 25, 2024
From 30edc3c63b508533ee93d75d5a6a9d8ee2ee144b Mon Sep 17 00:00:00 2001 From: Durgesh Vaigandla Date: Mon, 17 Jun 2024 10:03:41 +0530 Subject: [PATCH 2/3] Add new URLs to sitemap.xml --- .../Building-chatBot-with-NLTK.html | 331 ------------------ ...ampleOfBuildingWebScrapingApplication.html | 244 ------------- .../WebScrapingWithBeautifulSoup.html | 328 ----------------- posts/python/Building-chatBot-with-NLTK.html | 318 +++++++++++++++++ ...ampleOfBuildingWebScrapingApplication.html | 218 ++++++++++++ .../python/WebScrapingWithBeautifulSoup.html | 326 +++++++++++++++++ sitemap.xml | 9 + 7 files changed, 871 insertions(+), 903 deletions(-) delete mode 100644 posts/ChatBot with NLTK/Building-chatBot-with-NLTK.html delete mode 100644 posts/Web Scraping tool with BeautifulSoup/ExampleOfBuildingWebScrapingApplication.html delete mode 100644 posts/Web Scraping tool with BeautifulSoup/WebScrapingWithBeautifulSoup.html create mode 100644 posts/python/Building-chatBot-with-NLTK.html create mode 100644 posts/python/ExampleOfBuildingWebScrapingApplication.html create mode 100644 posts/python/WebScrapingWithBeautifulSoup.html diff --git a/posts/ChatBot with NLTK/Building-chatBot-with-NLTK.html b/posts/ChatBot with NLTK/Building-chatBot-with-NLTK.html deleted file mode 100644 index 6be67a8..0000000 --- a/posts/ChatBot with NLTK/Building-chatBot-with-NLTK.html +++ /dev/null @@ -1,331 +0,0 @@ - - - - - - - - - ChatBot with NLTK - - - - - - - - - - -
-
- -
-

- Building an Intelligent Chatbot with NLTK -

- -
- ChatBot-with-NLTK -
-
-
- -

- Creating a chatbot involves understanding and processing human language, which can be achieved through Natural Language Processing (NLP). Python’s NLTK library is a powerful tool for NLP that provides easy-to-use interfaces to over 50 corpora and lexical resources. -

-
-

Step-by-Step Guide:


- flowchat-chatbot -
-
-
    -
  1. Environment Setup:
  2. -
      -
    • Python installation: Ensure you have Python installed on your system. Python 3.x versions are recommended.
    • -
    • NLTK Installation: Install the NLTK package using pip:
      - - pip install nltk
      -
      -
    • -
    • Data Sets and Tokenizers: Download necessary NLTK data sets and tokenizers which are essential for processing natural language:
      -
      Python Code
      -
      -import nltk
      -nltk.download('popular')
      -                                        
      -
      -
    • -
    -
  3. Designing Conversaiton Patters:
  4. -
      -
    • Patterns and Intents: Define a dictionary with various intents such as ‘greetings’, ‘goodbyes’, and ‘faq’. Each intent contains a list of possible patterns and responses:
      -
      Python Code
      -
      -CONVERSATION_PATTERNS = {
      -    "greetings": {
      -        "patterns": ["hello", "hi", "hey"],
      -        "responses": ["Hello!", "Hi there!", "Hey!"]
      -    },
      -    "goodbyes": {
      -        "patterns": ["bye", "goodbye", "see you"],
      -        "responses": ["Goodbye!", "See you later!", "Bye!"]
      -    },
      -    // Add more intents as needed
      -}
      -                                        
      -
    • -
    -
  5. Text Processing:
  6. -
      -
    • - Tokenization: Split the text into individual words or tokens. -
    • -
    • Stemming and Lemmatization: Reduce words to their root form to understand the general meaning without tense or plurality.
      -
      Python Code
      -
      -from nltk.stem import WordNetLemmatizer
      -lemmatizer = WordNetLemmatizer()
      -
      -def process_input(input_text):
      -    tokens = nltk.word_tokenize(input_text)
      -    lemmas = [lemmatizer.lemmatize(token.lower()) for token in tokens]
      -    return lemmas
      -                                      
      -
    • -
    -
  7. Classification Model:
  8. -
      -
    • Training Data Preparation: Prepare the dta for training by associating each pattern with its corresponding intent.
    • -
    • Model Training: Use a classification algorithm like Naive Bayes to train the model on the prepared data.
      -
      Python Code
      - -
      -from nltk import NaiveBayesClassifier
      -
      -def train_classifier(patterns):
      -    training_data = []
      -    for intent, data in patterns.items():
      -        for pattern in data['patterns']:
      -            tokens = process_input(pattern)
      -            training_data.append((tokens, intent))
      -    classifier = NaiveBayesClassifier.train(training_data)
      -    return classifier
      -                                      
    • -
    -
  9. Response Generation:
  10. -
      -
    • Response Selection: Based on the classified intent, select an appropriate response from the predefined list.
    • -
    • Chatbot Functionality: Implement the chatbot functionality that takes user input, processes it, classifies it, and then generates a response. -

      Python Code
      - -
      -import random
      -
      -def generate_response(classifier, user_input):
      -    category = classifier.classify(process_input(user_input))
      -    if category in CONVERSATION_PATTERNS:
      -        return random.choice(CONVERSATION_PATTERNS[category]['responses'])
      -    else:
      -        return "I'm not sure how to respond to that."
      -
      -// Example usage:
      -classifier = train_classifier(CONVERSATION_PATTERNS)
      -user_input = "hello"
      -print(generate_response(classifier, user_input))
      -
    • -
    -
-
-
-
-

Integrating Web Scraping:

-

With your interest in web scraping and HTML parsing, you can enhance your chatbot by integrating real-time data extraction. For instance, you could use BeautifulSoup to scrape news headlines or weather information and provide it as part of the chatbot’s responses.

- -

Conclusion:

-

Building a chatbot with NLTK is an enriching experience that hones your skills in NLP. It lays the groundwork for more complex AI projects and opens up possibilities for integrating various functionalities like web scraping.

-
-
-
-
- -
- -
-
Search
-
-
- - -
-
- -
-
- -
-
Categories
-
-
-
- -
-
- -
-
-
-
- -
-
Recent Posts
-
-

Coming Soon..!

-
-
-
-
- - - - - - - - - -
-
-
-
-
- -
-
-

- Copyright © CSEdge Learn 2024 -

-
-
- - - - - - - - -
-
- - \ No newline at end of file diff --git a/posts/Web Scraping tool with BeautifulSoup/ExampleOfBuildingWebScrapingApplication.html b/posts/Web Scraping tool with BeautifulSoup/ExampleOfBuildingWebScrapingApplication.html deleted file mode 100644 index 012c2fa..0000000 --- a/posts/Web Scraping tool with BeautifulSoup/ExampleOfBuildingWebScrapingApplication.html +++ /dev/null @@ -1,244 +0,0 @@ - - - - - - - - Example of Web Scraping application with Beautifulsoup - CSEdge - - - - - - - - - - - -
-
- -
-

Building a Web Scraping Application with BeautifulSoup

- -
- WebScraping -
-
-
- Web scraping is the process of extracting data from websites. It allows you to gather information from various web pages and present it in a structured format. In this article, we’ll explore how to create a simple web scraping application using Python and the BeautifulSoup library. -

Prerequisites

- Before we begin, make sure you have the following installed: -
    -
  • Python: You’ll need Python installed on your system. You can download it from the official Python website.
  • -
  • BeautifulSoup: Install BeautifulSoup using pip:
    - pip install beautifulsoup4
  • -
-

Steps to Create the Web Scraping Application

-
    -
  • Choose a Website to Scrape: Decide which website you want to scrape. For this example, let’s scrape product information from an e-commerce site.
  • -
  • Inspect the HTML Structure: Open the website in your browser and inspect the HTML structure. Identify the elements (tags, classes, or IDs) that contain the data you want to extract.
  • -
  • Write Python Code: Create a Python script to fetch the HTML content of the webpage and parse it using BeautifulSoup. Here’s a basic example: -
    - - import requests
    -   from bs4 import BeautifulSoup
    -
    -   # URL of the website to scrape
    -     url = 'https://example.com/products'
    -
    -   # Send an HTTP request to the website
    -     response = requests.get(url)
    -
    -   # Parse the HTML content
    -     soup = BeautifulSoup(response.content, 'html.parser')
    -
    -   # Find relevant elements (e.g., product names, prices)
    -     product_names = soup.find_all('h2', class_='product-name')
    -     product_prices = soup.find_all('span', class_='product-price')
    -
    -   # Extract data and store it (e.g., in a CSV or JSON file)
    -     for name, price in zip(product_names, product_prices):
    -     print(f"Product: {name.text.strip()}, Price: {price.text.strip()}")
    -
    -   # You can save this data to a CSV or JSON file
  • -
  • Run the Script: Execute your Python script, and it will scrape the product information from the specified website.
  • -
  • Data Storage: Depending on your requirements, you can store the extracted data in a CSV file, JSON file, or a database.
  • -
-
-
-

Conclusion:

-

Web scraping with BeautifulSoup is a powerful technique for extracting data from websites. Remember to respect the website’s terms of use and robots.txt file.
Happy scraping!

-
-
-
-
-
- -
- -
-
Search
-
-
- - -
-
- -
-
- -
-
Categories
-
-
-
- -
-
- -
-
-
-
- -
-
Recent Posts
-
-

Coming Soon..!

-
-
-
-
- - - - - - - - - -
-
-
- -
-
-

- Copyright © CSEdge Learn 2024 -

-
-
- - - - - - diff --git a/posts/Web Scraping tool with BeautifulSoup/WebScrapingWithBeautifulSoup.html b/posts/Web Scraping tool with BeautifulSoup/WebScrapingWithBeautifulSoup.html deleted file mode 100644 index e183c6f..0000000 --- a/posts/Web Scraping tool with BeautifulSoup/WebScrapingWithBeautifulSoup.html +++ /dev/null @@ -1,328 +0,0 @@ - - - - - - - - Web Scraping with Beautifulsoup - - - - - - - - - - - -
-
- -
-

Web Scraping with Beautiful Soup: A Beginner’s Guide

- -
- Flow of WebScraping -
-
-
- -

- Web scraping is the extraction of data from websites. It lets you to collect information from websites, analyze it, and utilize it for a variety of reasons. Python's Beautiful Soup module is a sophisticated online scraping tool that allows you to easily traverse and extract data from HTML and XML pages. -

-

- In this article, we’ll explore how to use Beautiful Soup for web scraping. We’ll cover the following topics: -

- -
    -

  1. Installation and Setup:
  2. -
      -
    • - First, make sure you have Python installed on your system. You can download it from the official Python website. -
    • -
    • - Next, install Beautiful Soup using pip: -
    • - pip install beautifulsoup4 -
    • -
    - -

  3. Understanding HTML Structure:
  4. -
      -
    • - Before scraping a website, inspect its HTML structure. Use your browser’s developer tools (usually accessible via right-clicking and selecting “Inspect” or “Inspect Element”) to explore the page’s elements. -
    • -
    • - Identify the tags, classes, and IDs that contain the data you want to extract. -
    • -
    -

  5. Creating a Beautiful Soup Object:
  6. -
      -
    • - Import Beautiful Soup and the requests library (for fetching web pages) in your Python script: -
    • - Python
      - import requests
      - from bs4 import BeautifulSoup -
      -
    • - Fetch the web page using requests.get(url) and create a Beautiful Soup object: -
    • - Python
      - url = 'https://example.com'
      - response = requests.get(url)
      - soup = BeautifulSoup(response.content, 'html.parser')
      -
      -
    -

  7. Navigating the HTML Tree:
  8. -
      -
    • - Use Beautiful Soup’s methods to navigate the HTML tree: -
        -
      • soup.find(tag, attrs) finds the first occurrence of a tag with specified attributes.
      • -
      • soup.find_all(tag, attrs) finds all occurrences of a tag with specified attributes.
      • -
      -
    • -
    • - Example:
      - Python:
      - title_tag = soup.find('title')
      - print(title_tag.text)
      -
      -
    • -
    -

  9. Extracting Data:
  10. -
      -
    • - Once you’ve located the relevant tags, extract the data: -
    • - - Python:
      - # Extract all links
      -   links = soup.find_all('a')
      -   for link in links:
      -   print(link['href'])
      -
      - # Extract text from a specific element
      -   paragraph = soup.find('p', class_='content')
      -   print(paragraph.text)
      -
      -
    -

  11. Storing Data:
  12. -
      -
    • - You can store the extracted data in various formats: -
        -
      • - CSV: Use the csv module to write data to a CSV file. -
      • -
      • - JSON: Convert data to a JSON object using json.dumps(). -
      • -
      • - Example (CSV):
        - Python:
        - import csv
        - - with open('data.csv', 'w', newline='') as csvfile:
        -    writer = csv.writer(csvfile)
        -    writer.writerow(['Title', 'Link'])
        -    for link in links:
        -    writer.writerow([link.text, link['href']])
        - -
        -
      • -
      -
    • -
    -

  13. Handling Errors and Edge Cases:
  14. -
      -
    • - Websites may change their structure or block scrapers. Handle exceptions and adapt your code accordingly. -
    • -
    • - Respect website terms of use and robots.txt files. -
    • -
    -
-
-
-

Conclusion:

-

- Beautiful Soup simplifies web scraping by providing an intuitive interface for parsing HTML and extracting data. - Explore more Beautiful Soup methods and customize your scraping based on specific requirements. - -

-
-
-
-
-
- -
- -
-
Search
-
-
- - -
-
- -
-
- -
-
Categories
-
-
-
- -
-
- -
-
-
-
- -
-
Recent Posts
-
-

Coming Soon..!

-
-
-
-
- - - - - - - - - -
-
-
- -
-
-

- Copyright © CSEdge Learn 2024 -

-
-
- - - - - - diff --git a/posts/python/Building-chatBot-with-NLTK.html b/posts/python/Building-chatBot-with-NLTK.html new file mode 100644 index 0000000..4921576 --- /dev/null +++ b/posts/python/Building-chatBot-with-NLTK.html @@ -0,0 +1,318 @@ + + + + + + + + + + ChatBot with NLTK + + + + + + + + + + + + +
+
+ +
+

+ Building an Intelligent Chatbot with NLTK +

+ +
+ ChatBot-with-NLTK +
+
+
+ +

+ Creating a chatbot involves understanding and processing human language, which can + be achieved through Natural Language Processing (NLP). Python’s NLTK library is a + powerful tool for NLP that provides easy-to-use interfaces to over 50 corpora and + lexical resources. +

+
+

Step-by-Step Guide:


+ flowchat-chatbot +
+
+
    +
  1. +
    Environment Setup:
    +
  2. +
      +
    • Python installation: Ensure you have Python installed on + your system. Python 3.x versions are recommended.
    • +
    • NLTK Installation: Install the NLTK package using pip:
      + + pip install nltk
      +
      +
    • +
    • Data Sets and Tokenizers: Download necessary NLTK data sets + and tokenizers, which are essential for processing natural language:
      +
      Python Code
      +
      +import nltk
      +nltk.download('popular')
      +                                        
      +
      +
    • +
    +
  3. +
    Designing Conversation Patterns:
    +
  4. +
      +
    • Patterns and Intents: Define a dictionary with various + intents such as ‘greetings’, ‘goodbyes’, and ‘faq’. Each intent contains a + list of possible patterns and responses:
      +
      Python Code
      +
      +CONVERSATION_PATTERNS = {
      +    "greetings": {
      +        "patterns": ["hello", "hi", "hey"],
      +        "responses": ["Hello!", "Hi there!", "Hey!"]
      +    },
      +    "goodbyes": {
      +        "patterns": ["bye", "goodbye", "see you"],
      +        "responses": ["Goodbye!", "See you later!", "Bye!"]
      +    },
      +    # Add more intents as needed
      +}
      +                                        
      +
      +
    • +
    +
  5. +
    Text Processing:
    +
  6. +
      +
    • + Tokenization: Split the text into individual words or + tokens. +
    • +
    • Stemming and Lemmatization: Reduce words to their root + form to understand the general meaning without tense or plurality.
      +
      Python Code
      +
      +from nltk.stem import WordNetLemmatizer
      +lemmatizer = WordNetLemmatizer()
      +
      +def process_input(input_text):
      +    tokens = nltk.word_tokenize(input_text)
      +    lemmas = [lemmatizer.lemmatize(token.lower()) for token in tokens]
      +    return lemmas
      +                                      
      +
      +
    • +
    +
  7. +
    Classification Model:
    +
  8. +
      +
    • Training Data Preparation: Prepare the data for training by + associating each pattern with its corresponding intent.
    • +
    • Model Training: Use a classification algorithm like Naive + Bayes to train the model on the prepared data.
      +
      Python Code
      + +
      +from nltk import NaiveBayesClassifier
      +
      +def train_classifier(patterns):
      +    training_data = []
      +    for intent, data in patterns.items():
      +        for pattern in data['patterns']:
      +            tokens = process_input(pattern)
      +            # NLTK classifiers expect (feature-dict, label) pairs
      +            features = {token: True for token in tokens}
      +            training_data.append((features, intent))
      +    classifier = NaiveBayesClassifier.train(training_data)
      +    return classifier
      +                                      
      +
    • +
    +
  9. +
    Response Generation:
    +
  10. +
      +
    • Response Selection: Based on the classified intent, select + an appropriate response from the predefined list.
    • +
    • Chatbot Functionality: Implement the chatbot functionality + that takes user input, processes it, classifies it, and then generates a + response. +

      Python Code
      + +
      +import random
      +
      +def generate_response(classifier, user_input):
      +    # Build the same feature-dict shape that the classifier was trained on
      +    features = {token: True for token in process_input(user_input)}
      +    category = classifier.classify(features)
      +    if category in CONVERSATION_PATTERNS:
      +        return random.choice(CONVERSATION_PATTERNS[category]['responses'])
      +    else:
      +        return "I'm not sure how to respond to that."
      +
      +# Example usage:
      +classifier = train_classifier(CONVERSATION_PATTERNS)
      +user_input = "hello"
      +print(generate_response(classifier, user_input))
      +
      +
    • +
    +
+
+
+
+

Integrating Web Scraping:

+

With your interest in web scraping and HTML parsing, you can enhance your chatbot by + integrating real-time data extraction. For instance, you could use BeautifulSoup to + scrape news headlines or weather information and provide it as part of the chatbot’s + responses.
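That idea can be sketched concretely. The snippet below is an illustrative example, not part of the original post: it parses headlines out of an HTML document with BeautifulSoup and formats them as a chatbot reply. The markup and the `headline` class name are hypothetical, and the HTML is inlined so the sketch runs without a network request; a real integration would fetch the page first, e.g. with `requests.get(url).text`.

```python
from bs4 import BeautifulSoup

# Inline HTML stands in for a fetched news page (hypothetical markup).
NEWS_HTML = """
<html><body>
  <h2 class="headline">Python 3 adoption keeps growing</h2>
  <h2 class="headline">NLTK ships new corpora</h2>
</body></html>
"""

def scrape_headlines(html):
    """Extract headline text from the page markup."""
    soup = BeautifulSoup(html, "html.parser")
    return [h.get_text(strip=True) for h in soup.find_all("h2", class_="headline")]

def news_response(html):
    """Format scraped headlines as a chatbot reply."""
    headlines = scrape_headlines(html)
    if not headlines:
        return "Sorry, I couldn't find any news right now."
    return "Here are the latest headlines: " + "; ".join(headlines)

print(news_response(NEWS_HTML))
```

The resulting string could then be registered as the response for a hypothetical "news" intent alongside the other entries in CONVERSATION_PATTERNS.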

+ +

Conclusion:

+

Building a chatbot with NLTK is an enriching experience that hones your skills in + NLP. It lays the groundwork for more complex AI projects and opens up possibilities + for integrating various functionalities like web scraping.

+
+
+
+
+
+ +
+ +
+
Search
+
+
+ + +
+
+ +
+
+ +
+
Categories
+
+
+
+ +
+
+ +
+
+
+
+ +
+
Recent Posts
+
+

Coming Soon..!

+
+
+
+
+ + + + + + + + + +
+
+
+
+
+ +
+
+

+ Copyright © CSEdge Learn 2024 +

+
+
+ + + + + + + + + +
+
+ + + \ No newline at end of file diff --git a/posts/python/ExampleOfBuildingWebScrapingApplication.html b/posts/python/ExampleOfBuildingWebScrapingApplication.html new file mode 100644 index 0000000..527fa79 --- /dev/null +++ b/posts/python/ExampleOfBuildingWebScrapingApplication.html @@ -0,0 +1,218 @@ + + + + + + + + + Example of Web Scraping application with Beautifulsoup - CSEdge + + + + + + + + + + + +
+
+ +
+

Building a Web Scraping Application with BeautifulSoup

+ +
+ WebScraping +
+
+
+ Web scraping is the process of extracting data from websites. It allows you to gather + information from various web pages and present it in a structured format. In this + article, we’ll explore how to create a simple web scraping application using Python and + the BeautifulSoup library. +

Prerequisites

+ Before we begin, make sure you have the following installed: +
    +
  • Python: You’ll need Python installed on your system. You can + download it from the official Python website.
  • +
  • BeautifulSoup: Install BeautifulSoup using pip:
    + pip install beautifulsoup4 +
  • +
+

Steps to Create the Web Scraping Application

+
    +
  • Choose a Website to Scrape: Decide which website you want to + scrape. For this example, let’s scrape product information from an e-commerce + site.
  • +
  • Inspect the HTML Structure: Open the website in your browser + and inspect the HTML structure. Identify the elements (tags, classes, or IDs) + that contain the data you want to extract.
  • +
  • Write Python Code: Create a Python script to fetch the HTML + content of the webpage and parse it using BeautifulSoup. Here’s a basic example: +
    + + import requests
    +   from bs4 import BeautifulSoup
    +
    +   # URL of the website to scrape
    +     url = 'https://example.com/products'
    +
    +   # Send an HTTP request to the website
    +     response = requests.get(url)
    +
    +   # Parse the HTML content
    +     soup = BeautifulSoup(response.content, 'html.parser')
    +
    +   # Find relevant elements (e.g., product names, prices)
    +     product_names = soup.find_all('h2', class_='product-name')
    +     product_prices = soup.find_all('span', class_='product-price')
    +
    +   # Extract data and store it (e.g., in a CSV or JSON file)
    +     for name, price in zip(product_names, product_prices):
    +     print(f"Product: {name.text.strip()}, Price: {price.text.strip()}")
    +
    +   # You can save this data to a CSV or JSON file
    +
  • +
  • Run the Script: Execute your Python script, and it will scrape + the product information from the specified website.
  • +
  • Data Storage: Depending on your requirements, you can store + the extracted data in a CSV file, JSON file, or a database.
  • +
+
+
+

Conclusion:

+

Web scraping with BeautifulSoup is a powerful technique for extracting data from + websites. Remember to respect the website’s terms of use and robots.txt file.
+ Happy scraping!
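Checking robots.txt before scraping can be automated with the standard library. The sketch below is illustrative and the rules shown are hypothetical: it feeds an inline robots.txt to urllib.robotparser so it runs offline; in practice you would point RobotFileParser at the site's real robots.txt via set_url() and read().

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch
# https://example.com/robots.txt instead of using an inline string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check each path before requesting it.
print(parser.can_fetch("*", "https://example.com/products"))          # allowed
print(parser.can_fetch("*", "https://example.com/private/secret"))   # disallowed
```

Calling can_fetch before every request keeps the scraper within the site's stated crawling rules.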

+
+
+
+
+
+ +
+ +
+
Search
+
+
+ + +
+
+ +
+
+ +
+
Categories
+
+
+
+ +
+
+ +
+
+
+
+ +
+
Recent Posts
+
+

Coming Soon..!

+
+
+
+
+ + + + + + + + + +
+
+
+
+
+ + +
+
+

+ Copyright © CSEdge Learn 2024 +

+
+
+ + + + + + + \ No newline at end of file diff --git a/posts/python/WebScrapingWithBeautifulSoup.html b/posts/python/WebScrapingWithBeautifulSoup.html new file mode 100644 index 0000000..b914a5d --- /dev/null +++ b/posts/python/WebScrapingWithBeautifulSoup.html @@ -0,0 +1,326 @@ + + + + + + + + + Web Scraping with Beautifulsoup + + + + + + + + + + + +
+
+ +
+

Web Scraping with Beautiful Soup: A Beginner's Guide

+ +
+ Flow of WebScraping +
+
+
+ +

      Web scraping is the extraction of data from websites. It lets you collect + information from web pages, analyze it, and use it for a variety of purposes. + Python's Beautiful Soup library is a powerful web scraping tool that allows + you to easily traverse and extract data from HTML and XML documents.

+

+ In this article, we’ll explore how to use Beautiful Soup for web scraping. We’ll + cover the following topics: +

+ +
    +

    +
  1. Installation and Setup:
  2. +

    +
      +
    • + First, make sure you have Python installed on your system. You can download + it from the official Python website. +
    • +
    • + Next, install Beautiful Soup using pip: +
    • +
    • + pip install beautifulsoup4 +
    • +
    + +

    +
  3. Understanding HTML Structure:
  4. +

    +
      +
    • + Before scraping a website, inspect its HTML structure. Use your browser’s + developer tools (usually accessible via right-clicking and selecting + “Inspect” or “Inspect Element”) to explore the page’s elements. +
    • +
    • + Identify the tags, classes, and IDs that contain the data you want to + extract. +
    • +
    +

    +
  5. Creating a Beautiful Soup Object:
  6. +

    +
      +
    • + Import Beautiful Soup and the requests library (for fetching web pages) in + your Python script: +
    • + Python
      + import requests
      + from bs4 import BeautifulSoup +
      +
    • + Fetch the web page using requests.get(url) and create a Beautiful Soup + object: +
    • + Python
      + url = 'https://example.com'
      + response = requests.get(url)
      + soup = BeautifulSoup(response.content, 'html.parser')
      +
      +
    +

    +
  7. Navigating the HTML Tree:
  8. +

    +
      +
    • + Use Beautiful Soup’s methods to navigate the HTML tree: +
        +
      • soup.find(tag, attrs) finds the first occurrence of a + tag with specified attributes.
      • +
      • soup.find_all(tag, attrs) finds all occurrences of a + tag with specified attributes.
      • +
      +
    • +
    • + Example:
      + Python:
      + title_tag = soup.find('title')
      + print(title_tag.text)
      +
      +
    • +
    +

    +
  9. Extracting Data:
  10. +

    +
      +
    • + Once you've located the relevant tags, extract the data: +
    • + + Python:
      + # Extract all links
      +   links = soup.find_all('a')
      +   for link in links:
      +   print(link['href'])
      +
      + # Extract text from a specific element
      +   paragraph = soup.find('p', class_='content')
      +   print(paragraph.text)
      +
      +
    +

    +
  11. Storing Data:
  12. +

    +
      +
    • + You can store the extracted data in various formats: +
        +
      • + CSV: Use the csv module to write data to a CSV file. +
      • +
      • + JSON: Convert data to a JSON object using json.dumps(). +
      • +
      • + Example (CSV):
        + Python:
        + import csv
        + + with open('data.csv', 'w', newline='') as csvfile:
        +    writer = csv.writer(csvfile)
        +    writer.writerow(['Title', 'Link'])
        +    for link in links:
        +    writer.writerow([link.text, link['href']])
        + +
        +
      • +
      +
    • +
    +

    +
  13. Handling Errors and Edge Cases:
  14. +

    +
      +
    • + Websites may change their structure or block scrapers. Handle exceptions and + adapt your code accordingly. +
    • +
    • + Respect website terms of use and robots.txt files. +
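The exception-handling advice above can be made concrete with a small defensive fetch helper. This is a sketch using the standard library's urllib so it stays dependency-free; with requests, the same pattern is a try/except around requests.get(url, timeout=...). The URL in the demo is deliberately unresolvable.

```python
import urllib.request

def fetch_html(url, timeout=10):
    """Fetch a page's HTML, returning None instead of crashing on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # urllib.error.URLError (a subclass of OSError) covers DNS failures,
        # refused connections, and timeouts; ValueError covers malformed URLs.
        return None

# An unresolvable host is handled gracefully instead of raising.
print(fetch_html("http://nonexistent.invalid/"))  # None
```

Callers can then skip or retry pages whose fetch returned None rather than letting one bad request stop the whole run.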
    • +
    +
+
+
+

Conclusion:

+

+ Beautiful Soup simplifies web scraping by providing an intuitive interface for + parsing HTML and extracting data. + Explore more Beautiful Soup methods and customize your scraping based on specific + requirements. + +

+
+
+
+
+
+ +
+ +
+
Search
+
+
+ + +
+
+ +
+
+ +
+
Categories
+
+
+
+ +
+
+ +
+
+
+
+ +
+
Recent Posts
+
+

Coming Soon..!

+
+
+
+
+ + + + + + + + + +
+
+
+
+
+ +
+
+

+ Copyright © CSEdge Learn 2024 +

+
+
+ + + + + + + \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml index 8ad17f3..db127df 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -30,4 +30,13 @@ https://learn.csedge.courses/posts/technology/How-to-build-rest-api-using-node-and-mongo-db.html + + https://learn.csedge.courses/posts/python/WebScrapingWithBeautifulSoup.html + + + https://learn.csedge.courses/posts/python/ExampleOfBuildingWebScrapingApplication.html + + + https://learn.csedge.courses/posts/python/Building-chatBot-with-NLTK.html + \ No newline at end of file From ce541a469e29ea7df27ecfcde374732ff739e9fe Mon Sep 17 00:00:00 2001 From: Durgesh Vaigandla Date: Mon, 17 Jun 2024 14:32:56 +0530 Subject: [PATCH 3/3] Add Google Analytics tracking code to HTML files --- index.html | 118 ++++++++++-------- posts/python/Building-chatBot-with-NLTK.html | 63 ++++++---- ...ampleOfBuildingWebScrapingApplication.html | 9 ++ .../python/WebScrapingWithBeautifulSoup.html | 9 ++ 4 files changed, 119 insertions(+), 80 deletions(-) diff --git a/index.html b/index.html index e82c4d1..aefc6e0 100644 --- a/index.html +++ b/index.html @@ -2,6 +2,15 @@ + + + @@ -461,7 +470,9 @@
About Us
Simplify your internship experience with our easy-to-follow articles and docs.

- Subscribe on LinkedIn + Subscribe on LinkedIn
@@ -502,63 +513,64 @@
Quick Links
- - - - -
- -
- - -
-
-
- -
-
-
+ + + + +
+ + +
+
+
+ +
+ + + + diff --git a/posts/python/Building-chatBot-with-NLTK.html b/posts/python/Building-chatBot-with-NLTK.html index 4921576..e171777 100644 --- a/posts/python/Building-chatBot-with-NLTK.html +++ b/posts/python/Building-chatBot-with-NLTK.html @@ -2,6 +2,15 @@ + + + - - +
diff --git a/posts/python/ExampleOfBuildingWebScrapingApplication.html b/posts/python/ExampleOfBuildingWebScrapingApplication.html index 527fa79..68e65e4 100644 --- a/posts/python/ExampleOfBuildingWebScrapingApplication.html +++ b/posts/python/ExampleOfBuildingWebScrapingApplication.html @@ -2,6 +2,15 @@ + + + + + +