In this lesson, you will learn about extracting information from structured data sets. This includes parsing data from markup formats such as XML and HTML, the language in which web pages are written and stored. To do this, you will learn about the BeautifulSoup parsing library and the libxml parsing engine. You will also review the basics of regular expressions, which can speed up the extraction of specific data from XML-formatted files.
### Objectives ###

By the end of this lesson, you will be able to:
- Understand how to use a data parsing library like BeautifulSoup.
- Understand how to find and extract information from an XML-formatted file.
- Understand how to extract data from a webpage.
- Understand the Document Object Model (DOM).
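The objectives above can be illustrated with a small sketch. It uses Python's standard-library `xml.etree.ElementTree` and `re` modules as a stand-in for BeautifulSoup, so it runs without extra installs. The sample page, its tag layout, and the variable names are invented for illustration and do not come from the course materials.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (invented for illustration).
PAGE = """<html>
  <head><title>Course Schedule</title></head>
  <body>
    <h1>Week 14</h1>
    <ul>
      <li><a href="/lessons/1">Lesson 1: Data Formats</a></li>
      <li><a href="/lessons/2">Lesson 2: Data Parsing</a></li>
    </ul>
  </body>
</html>"""

# Parsing turns the text into a tree of elements, similar in spirit
# to the document object model a browser builds for a web page.
root = ET.fromstring(PAGE)

# Navigate the tree by path to reach a single element.
title = root.find("./head/title").text

# Collect every <a> element anywhere below the root, in document order.
links = [(a.get("href"), a.text) for a in root.iter("a")]

# A regular expression pulls the lesson number out of each link text.
lesson_numbers = [
    int(number)
    for _, text in links
    for number in re.findall(r"Lesson (\d+):", text)
]

print(title)
print(links)
print(lesson_numbers)
```

Real web pages are often not well-formed XML, which is one reason a more tolerant parser such as BeautifulSoup is typically preferred for them; the tree-navigation idea stays the same.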
Estimated time: approximately 2 hours.
- Course IPython Notebook on Data Parsing
- BeautifulSoup documentation
- Scrapy, a new web scraping framework in Python
- A course primer notebook on Pandas
- A web scraping in Python tutorial
- Another web scraping in Python tutorial
When you have worked through the readings above, please take the Week 14 Lesson 2 Assessment.