From Web Pages to Insights: Integrating Web Scraping and Text Analysis with GPT

Use GPT to summarize content and extract structured data from web pages

Yu Dong
10 min read · Jun 7, 2024
GPT Generated Image Based on the Article Content

In my previous article, Topic Summarization and Categorization with GPT, I explored how to summarize and categorize text using the OpenAI API. In this post, I take it one step further and show how to integrate web scraping into that workflow to extract structured data from a series of web pages.

This was originally posted on my blog here.

Context

I have been making one visualization a week since I started my full-time data science job in 2018. Initially, I followed the datasets posted every week by makeovermonday.co.uk. Since October 2021, however, I have been finding my own datasets to visualize each week. The topics are mostly inspired by my interests or experiences at the time. I have made 139 visualizations since then, so I sometimes wonder what my favorite data topics and visualization types are.

Screenshot of my weekly visualizations

While I have visualization catalog pages with a running list of visualization titles and data sources, they don’t…
