[QCL Workshop] Web Scraping with Python (Level2-Data/Coding)

By Jeho Park

Date and time

Friday, November 8, 2019 · 1 - 3pm PST

Location

Claremont McKenna College/Roberts North 12 (RN12)

320 East 9th Street Claremont, CA 91711

Description

# Web Scraping with Python (Level2-Data/Coding)

## Summary

In this 2-hour workshop, you will learn how to collect data from web pages, such as Wikipedia articles, using web scraping functions and data manipulation packages in Python.

Learning objectives of the workshop:

  • Understanding robots.txt and HTTP requests.
  • Understanding the basic components of a web page and HTML.
  • Getting familiar with the pandas module.
  • Parsing an HTML string into pandas.
  • Parsing the contents of a URL into pandas.
  • Parsing tables from Wikipedia into pandas.
  • Parsing non-Wikipedia tables into pandas.
  • Parsing Wikipedia infoboxes.
  • Writing parsed HTML tables to flat CSV files.
  • Advanced HTML parsing using tags and CSS selectors.
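The objectives above center on pandas, which provides `pandas.read_html` and `DataFrame.to_csv` for most of these tasks. As a rough, dependency-free sketch of three of them (checking robots.txt, parsing an HTML table string, and writing the result to a flat CSV), the standard-library code below may help. It swaps pandas for `html.parser` and `csv`, and the robots.txt rules, the example.org URLs, the sample table, and the helper names `allowed` and `TableParser` are all hypothetical illustrations, not workshop material.

```python
import csv
import io
from html.parser import HTMLParser
from urllib import robotparser

# 1. Check robots.txt rules before scraping.
#    Hypothetical rules; against a real site you would call
#    rp.set_url("https://<site>/robots.txt") and rp.read() instead.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="*"):
    """Return True if the parsed robots.txt lets `agent` fetch `url`."""
    return rp.can_fetch(agent, url)


# 2. Parse an HTML table string into a list of rows
#    (a stand-in for what pandas.read_html does with a DataFrame).
class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)


# Hypothetical sample table in Wikipedia's "wikitable" style.
HTML = """
<table class="wikitable">
  <tr><th>City</th><th>State</th></tr>
  <tr><td>Claremont</td><td>CA</td></tr>
  <tr><td>Pomona</td><td>CA</td></tr>
</table>
"""

parser = TableParser()
parser.feed(HTML)
rows = parser.rows

# 3. Write the parsed rows to a flat CSV.
#    An in-memory buffer is used here; pass open("out.csv", "w", newline="") for a file.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
csv_text = buf.getvalue()

if __name__ == "__main__":
    print(allowed("https://example.org/wiki/Claremont"))  # True
    print(allowed("https://example.org/private/page"))    # False
    print(rows)
    print(csv_text)
```

With pandas and an HTML parser backend such as lxml installed, `pandas.read_html(io.StringIO(HTML))` would return a list of DataFrames for the same table, and `DataFrame.to_csv` would write it out, which is the more concise route the objectives point toward.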

## Date and Time

November 8, 2019 from 1 pm to 3 pm (2 hours)

## Location

Roberts North 12 (RN12)

## Pre-requisites

Internet Use: Introductory level (search, log-in, navigation of websites, etc.)

Programming: Basic Python programming skills (functions, packages, etc.)

## Participants

CMC Students, Faculty and Staff

Organized by

Director, Murty Sunak Quantitative and Computing Lab
