Article Scraping in Python

Some useful libraries for scraping article content:

  • Newspaper is the full-featured option: it handles categories, article metadata, downloading and NLP summarization.
  • Python-Goose is a somewhat simpler option that scrapes the main content and basic metadata from HTML articles in several languages.
  • And then there's the classic Beautiful Soup (which the other libs use) for lower-level, more specific scraping jobs.
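
For the low-level route, here is a minimal sketch of a targeted scrape with Beautiful Soup. The sample HTML, the "article" class name and the extract_article helper are all made up for illustration; a real page will need its own selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, inlined so the sketch runs without network access.
SAMPLE_HTML = """
<html>
  <head><title>Example article</title></head>
  <body>
    <div class="sidebar"><p>Tag cloud</p></div>
    <div class="article">
      <h1>Scraping with Python</h1>
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </div>
  </body>
</html>
"""


def extract_article(html):
    """Return the title and paragraphs found inside the (assumed) article div."""
    # "html.parser" is the parser bundled with Python, so lxml is not required.
    soup = BeautifulSoup(html, "html.parser")

    # Assumption: the article body lives in <div class="article">.
    body = soup.find("div", class_="article")
    if body is None:
        return {"title": None, "paragraphs": []}

    heading = body.find("h1")
    title = heading.get_text(strip=True) if heading else None

    # Only paragraphs inside the article div are collected, so sidebar text is skipped.
    paragraphs = [p.get_text(strip=True) for p in body.find_all("p")]
    return {"title": title, "paragraphs": paragraphs}


article = extract_article(SAMPLE_HTML)
print(article["title"])
print(article["paragraphs"])
```

This is where Beautiful Soup earns its keep: you decide exactly which elements count as content, which is more work than letting Newspaper or Python-Goose guess, but more predictable for a single known site.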
Brendan | Thursday 13 February 2014 at 11:19 am | Notes