[QCL Workshop] Web Scraping with Python (Level2-Data/Coding)
By Jeho Park
Date and time
Friday, November 8, 2019 · 1 - 3pm PST
Location
Claremont McKenna College/Roberts North 12 (RN12)
320 East 9th Street, Claremont, CA 91711
Description
# Web Scraping with Python (Level2-Data/Coding)
## Summary
In this 2-hour workshop, you will learn how to collect data from web pages such as Wikipedia using Python web scraping tools and data manipulation packages such as pandas.
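A polite scraper usually checks a site's robots.txt before it sends any HTTP requests. The following is a minimal sketch that uses only Python's standard library. It parses a hypothetical set of robots.txt rules (not Wikipedia's actual file) from a string so that it runs offline. It then builds, but does not send, a GET request for an allowed page. `USER_AGENT`, `make_parser`, and `build_request` are illustrative names, not part of the workshop materials.

```python
from urllib import robotparser
from urllib.request import Request

# Hypothetical robots.txt rules, parsed from a string so the sketch runs offline.
# Against a live site you would instead call rp.set_url(".../robots.txt")
# followed by rp.read() to download the real rules.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
Allow: /wiki/
"""

# Hypothetical identifier for our scraper; real sites ask for a descriptive one.
USER_AGENT = "qcl-workshop-demo/0.1"


def make_parser(robots_txt: str) -> robotparser.RobotFileParser:
    """Build a RobotFileParser from the text of a robots.txt file."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def build_request(url: str, rp: robotparser.RobotFileParser) -> Request:
    """Return a GET request for url, or raise if robots.txt disallows it."""
    if not rp.can_fetch(USER_AGENT, url):
        raise PermissionError(f"robots.txt disallows fetching {url}")
    # Passing this object to urllib.request.urlopen() would send it;
    # the sketch stops before touching the network.
    return Request(url, headers={"User-Agent": USER_AGENT})


if __name__ == "__main__":
    rules = make_parser(SAMPLE_ROBOTS_TXT)
    req = build_request("https://en.wikipedia.org/wiki/Web_scraping", rules)
    print(req.get_method(), req.full_url)
```

Keeping the robots.txt check separate from the request lets you reuse one parsed rule set for every URL on the same site instead of re-reading the file each time.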
Learning objectives of the workshop:
- Understanding robots.txt and HTTP requests.
- Understanding the basic components of a web page and HTML.
- Getting familiar with the pandas module.
- Parsing HTML strings into pandas.
- Parsing tables from a URL into pandas.
- Parsing tables from Wikipedia into pandas.
- Parsing non-Wikipedia tables into pandas.
- Parsing Wikipedia infoboxes.
- Writing parsed HTML tables to flat CSV files.
- Gaining an advanced understanding of HTML parsing using tags and CSS selectors.
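The following is a rough, dependency-free sketch of two of these steps: parsing an HTML table from a string and writing it to a flat CSV. In place of pandas it uses the standard library's `html.parser` and `csv` modules. The sample table and the `TableParser` and `html_table_to_csv` names are hypothetical. The parser assumes well-formed `<tr>`, `<th>`, and `<td>` tags and ignores `colspan`, `rowspan`, and nested tables.

```python
import csv
import io
from html.parser import HTMLParser
from typing import List, Optional

# A small hypothetical table standing in for one scraped from a web page.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>City</th><th>County</th></tr>
  <tr><td>Claremont</td><td>Los Angeles</td></tr>
  <tr><td>Pomona</td><td>Los Angeles</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None   # cells of the row being read
        self._cell: Optional[List[str]] = None  # text chunks of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html: str) -> str:
    """Parse the table rows in html and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    csv.writer(buf).writerows(parser.rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML))
```

The pandas route works differently. `pandas.read_html` requires lxml, or BeautifulSoup with html5lib. It returns a list of DataFrames, one per `<table>`, and `DataFrame.to_csv` writes a DataFrame to a file. This route also handles cases the sketch skips, such as unclosed cells and spanning columns.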
## Date and Time
November 8, 2019 from 1 pm to 3 pm (2 hours)
## Location
Roberts North 12 (RN12)
## Prerequisites
- Internet use: introductory level (searching, logging in, navigating websites, etc.)
- Programming: basic Python skills (functions, packages, etc.)
## Participants
CMC students, faculty, and staff