Portfolio
Business Process Automation (BPA)

Pandemic Insights Analyzer

Industry: Public Health / Data Analytics
A tool that helps researchers and public health officials automate the categorization and analysis of massive COVID-19 text datasets.

Problem

The sheer volume of pandemic-related literature and news made manual tracking impossible for researchers.
  • Information Overload: Public health officials couldn't process thousands of sources in real time.
  • Data Fragmentation: Essential details like dates, locations, and gathering sizes were buried in unstructured text.
  • Geographical Mapping Lag: Identifying localized trends required manual cross-referencing of city and state data.

Solution

A multi-stage NLP pipeline designed to transform raw article data into structured, actionable insights.
  • Multi-Stage Filtering: Implements a rigorous cleaning and sampling process to ensure only high-quality, relevant data reaches the final analysis.
  • Advanced Entity Extraction: Uses spaCy and Flair to extract specific entities, including dates, numerical data, and US-specific locations.
  • Geospatial Logic Engine: Automatically maps city mentions to counties and states using a custom integration of the uscities.csv dataset.
  • Semantic Scoring: Employs GloVe embeddings and cosine similarity to identify and quantify specific event metrics, such as gathering sizes.
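
The semantic scoring step can be illustrated with a minimal sketch. It assumes a sentence is represented by the average of its word vectors and compared with a query vector by cosine similarity. The toy 3-dimensional vectors stand in for real GloVe embeddings, and the names `cosine_similarity`, `average_vector`, `score_sentence`, and `TOY_EMBEDDINGS` are hypothetical, not taken from the project.

```python
import math

# Toy 3-dimensional vectors standing in for real GloVe embeddings (assumption).
TOY_EMBEDDINGS = {
    "gathering": [0.9, 0.1, 0.0],
    "crowd": [0.8, 0.2, 0.1],
    "attendees": [0.85, 0.15, 0.05],
    "vaccine": [0.0, 0.9, 0.3],
    "doses": [0.1, 0.8, 0.4],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_vector(tokens, embeddings):
    """Average the vectors of the tokens found in the vocabulary; None if none are found."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def score_sentence(sentence, query_terms, embeddings):
    """Score how close a sentence is to the query terms; 0.0 when nothing is in vocabulary."""
    sentence_vec = average_vector(sentence.lower().split(), embeddings)
    query_vec = average_vector(query_terms, embeddings)
    if sentence_vec is None or query_vec is None:
        return 0.0
    return cosine_similarity(sentence_vec, query_vec)


on_topic = score_sentence("crowd attendees", ["gathering"], TOY_EMBEDDINGS)
off_topic = score_sentence("vaccine doses", ["gathering"], TOY_EMBEDDINGS)
print(on_topic > off_topic)
```

In a setup like this, sentences scoring above a chosen threshold would be passed on for number extraction (for example, a gathering size), while low-scoring sentences would be skipped.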

Tech Stack

  • Core Language: Python
  • Data Manipulation: Pandas
  • NLP Frameworks: spaCy, Flair, GloVe
  • Formats: CSV and Excel input and output, so results can be opened in common spreadsheet tools
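
Because the city-to-county mapping is driven by a CSV file, a lookup of that kind can be sketched with Python's standard `csv` module. The inline sample rows, the column names (`city`, `state_name`, `county_name`), and the helpers `build_city_index` and `resolve_city` are assumptions for illustration. The project lists Pandas for data manipulation, and its actual mapping logic is not shown here.

```python
import csv
import io

# Tiny inline stand-in for uscities.csv; the column names are assumed.
SAMPLE_CSV = """city,state_id,state_name,county_name
Austin,TX,Texas,Travis
Portland,OR,Oregon,Multnomah
Portland,ME,Maine,Cumberland
"""


def build_city_index(csv_file):
    """Map each lowercase city name to a list of (county, state) candidates."""
    index = {}
    for row in csv.DictReader(csv_file):
        key = row["city"].strip().lower()
        index.setdefault(key, []).append((row["county_name"], row["state_name"]))
    return index


def resolve_city(index, city, state=None):
    """Return (county, state) matches for a city mention, optionally narrowed by state name."""
    candidates = index.get(city.strip().lower(), [])
    if state is not None:
        wanted = state.strip().lower()
        candidates = [c for c in candidates if c[1].lower() == wanted]
    return candidates


city_index = build_city_index(io.StringIO(SAMPLE_CSV))
print(resolve_city(city_index, "Portland"))
print(resolve_city(city_index, "Portland", state="Maine"))
```

Returning every candidate instead of a single match keeps ambiguous names such as Portland visible, so a later step can disambiguate using a state mentioned elsewhere in the article.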

Results

  • Operational Efficiency: Automated the filtering and extraction process, reducing analysis time from weeks to hours.
  • High-Fidelity Tracking: Provided precise geographical insights into the spread of the pandemic and public response patterns.
  • Data Accuracy: Removed noise and addressed missing values systematically, ensuring decision-makers worked with clean datasets.
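
The noise removal and missing-value handling described above might look roughly like the following sketch. It assumes each article is a dict with a `text` field and uses only the standard library. The keyword list, the `min_words` threshold, and the function names are hypothetical choices, not the project's actual rules.

```python
import re

# Hypothetical relevance terms; the project's real filter criteria are not documented here.
KEYWORDS = ("covid", "coronavirus", "pandemic")


def collapse_whitespace(text):
    """Replace runs of whitespace with single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def clean_articles(articles, min_words=5):
    """Drop articles with missing or very short text, duplicates, and off-topic items."""
    seen = set()
    kept = []
    for article in articles:
        text = collapse_whitespace(article.get("text") or "")
        key = text.lower()
        if len(key.split()) < min_words:
            continue  # missing or too short to analyze
        if key in seen:
            continue  # duplicate after case and whitespace normalization
        if not any(keyword in key for keyword in KEYWORDS):
            continue  # no relevance term present
        seen.add(key)
        kept.append({**article, "text": text})
    return kept


sample = [
    {"id": 1, "text": "COVID-19 cases rose  sharply in Austin this week."},
    {"id": 2, "text": "covid-19 cases rose sharply in austin this week."},
    {"id": 3, "text": None},
    {"id": 4, "text": "Local bakery wins award for best sourdough in town."},
]
print([a["id"] for a in clean_articles(sample)])
```

With Pandas, the same steps would typically be expressed with `DataFrame.dropna` and `DataFrame.drop_duplicates` followed by a keyword mask.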