# Week 14 Lesson 2

## Data Parsing

In this lesson, you will learn how to extract information from structured data. This includes parsing markup formats such as XML and HTML, the language in which web pages are written and stored. To do this, you will learn about the BeautifulSoup parsing library and the lxml parser, which is built on the libxml2 engine. You will also review the basics of regular expressions, which can speed up pulling specific pieces of text, such as dates or email addresses, out of XML and HTML files.
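The lesson's readings use BeautifulSoup and lxml. To keep this example runnable without extra installs, the sketch below swaps in Python's standard-library `xml.etree.ElementTree` for the tree-parsing step and the `re` module for the regular-expression step. The catalog document, the helper names `extract_books` and `extract_emails`, and the simplified email pattern are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data: a tiny XML catalog made up for this example.
XML_DATA = """
<catalog>
  <book id="b1">
    <title>Python Basics</title>
    <price currency="USD">29.99</price>
    <contact>orders@example.com</contact>
  </book>
  <book id="b2">
    <title>Data Parsing</title>
    <price currency="USD">35.50</price>
    <contact>sales@example.org</contact>
  </book>
</catalog>
"""


def extract_books(xml_text):
    """Parse the XML into a tree and return (id, title, price) for each <book>."""
    root = ET.fromstring(xml_text.strip())
    books = []
    for book in root.findall("book"):  # direct <book> children of <catalog>
        books.append((
            book.get("id"),                 # attribute value
            book.findtext("title"),         # text of the <title> child
            float(book.findtext("price")),  # text converted to a number
        ))
    return books


# Simplified email pattern for illustration; real-world address syntax is broader.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def extract_emails(text):
    """Pull every email-like substring out of raw text with a regular expression."""
    return EMAIL_RE.findall(text)


if __name__ == "__main__":
    for book_id, title, price in extract_books(XML_DATA):
        print(book_id, title, price)
    print(extract_emails(XML_DATA))
```

The two tools play different roles here: the tree parser is the reliable way to read fields whose position in the document matters, while the regular expression scans the raw text for a pattern regardless of where it appears. Regular expressions alone are fragile for nested markup, which is why a parser such as BeautifulSoup is usually the first step.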

### Objectives

By the end of this lesson, you will be able to:

- Understand how to use a data parsing library such as BeautifulSoup.
- Understand how to find and extract information from an XML file.
- Understand how to extract data from a web page.
- Understand the Document Object Model (DOM).
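BeautifulSoup presents a web page as a tree of nested elements, which is the core idea behind the Document Object Model. As a rough illustration only, the sketch below builds such a tree with Python's standard-library `html.parser.HTMLParser`. The `Node` class, its `find_all` and `text` methods, `TreeBuilder`, `parse_html`, and the sample page are hypothetical helpers that loosely mimic the style of BeautifulSoup's interface; they are not BeautifulSoup itself.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they cannot contain children.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    """One element in the tree: a tag name, its attributes, and its children."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # child Nodes and text strings, in document order

    def text(self):
        """Concatenate all text inside this element, including descendants."""
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)

    def find_all(self, tag):
        """Depth-first search for descendant elements with the given tag name."""
        found = []
        for child in self.children:
            if isinstance(child, Node):
                if child.tag == tag:
                    found.append(child)
                found.extend(child.find_all(tag))
        return found


class TreeBuilder(HTMLParser):
    """Turn the parser's stream of start/end/data events into a Node tree."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, parent=self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Walk up to the matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


# Hypothetical sample page made up for this example.
PAGE = """
<html>
  <head><title>Week 14 Demo</title></head>
  <body>
    <ul>
      <li><a href="https://www.python.org">Python</a></li>
      <li><a href="/docs/parsing.html">Parsing notes</a></li>
    </ul>
  </body>
</html>
"""

if __name__ == "__main__":
    root = parse_html(PAGE)
    print(root.find_all("title")[0].text())
    for a in root.find_all("a"):
        print(a.attrs.get("href"), a.text().strip())
```

Once the page is a tree, extracting data becomes a search over nodes rather than a search over raw characters. Unlike BeautifulSoup, this sketch does not repair malformed markup (for example, unclosed `<p>` tags), which is one of the main reasons to use a dedicated library on real web pages.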

### Time Estimate

Approximately 2 hours.

### Readings

- Course IPython Notebook on Data Parsing
- BeautifulSoup documentation
- Scrapy, a web scraping framework for Python

#### Optional Additional Readings

- A course primer notebook on pandas
- A tutorial on web scraping in Python
- Another tutorial on web scraping in Python

### Assessment

When you have worked through the readings above, please take the Week 14 Lesson 2 Assessment.
