Portfolio
Business Process Automation (BPA)
Pandemic Insights Analyzer
Industry:
Public Health / Data Analytics
A tool that lets researchers and public health officials automate the categorization and analysis of massive COVID-19 text datasets.
Problem
The sheer volume of pandemic-related literature and news made manual tracking impossible for researchers.
Information Overload:
Public health officials couldn't process thousands of sources in real time.
Data Fragmentation:
Essential details like dates, locations, and gathering sizes were buried in unstructured text.
Geographical Mapping Lag:
Identifying localized trends required manual cross-referencing of city and state data.
Solution
A multi-stage NLP pipeline designed to transform raw article data into structured, actionable insights.
Multi-Stage Filtering:
Implements a rigorous cleaning and sampling process to ensure only high-quality, relevant data reaches the final analysis.
Advanced Entity Extraction:
Utilizes spaCy and Flair to pinpoint specific entities, including dates, numerical data, and US-specific locations.
Geospatial Logic Engine:
Automatically maps city mentions to counties and states using a custom integration of the uscities.csv database.
Semantic Scoring:
Employs GloVe embeddings and cosine similarity to identify and quantify specific event metrics, such as gathering sizes.
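The geospatial lookup and the similarity scoring described above can be illustrated with a minimal sketch. It is not the project's actual code. The column names (city, state_id, county_name), the sample rows, the toy three-dimensional vectors standing in for GloVe embeddings, and every function name are assumptions made for illustration. The sketch uses only the Python standard library to stay self-contained, whereas the project itself lists Pandas.

```python
import csv
import io
import math

# Hypothetical excerpt in the style of a US cities table. The column names
# are assumptions; the real uscities.csv layout is not shown in the source.
SAMPLE_CITIES_CSV = """city,state_id,county_name
Austin,TX,Travis
Portland,OR,Multnomah
Portland,ME,Cumberland
"""


def build_city_index(csv_text):
    """Map a lowercase city name to a list of (county, state) candidates."""
    index = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row["city"].strip().lower()
        index.setdefault(key, []).append((row["county_name"], row["state_id"]))
    return index


def lookup_city(index, city, state=None):
    """Return candidate (county, state) pairs, narrowed by state if given."""
    candidates = index.get(city.strip().lower(), [])
    if state is not None:
        candidates = [c for c in candidates if c[1] == state.upper()]
    return candidates


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors standing in for real GloVe embeddings (assumed values).
TOY_VECTORS = {
    "gathering": [0.9, 0.1, 0.0],
    "crowd": [0.8, 0.2, 0.1],
    "vaccine": [0.0, 0.1, 0.9],
}


def rank_by_similarity(query, vectors):
    """Rank every other word by cosine similarity to the query word."""
    q = vectors[query]
    scores = [(w, cosine_similarity(q, v)) for w, v in vectors.items() if w != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

A city name shared by several states, such as Portland in the sample rows, returns more than one candidate, so a state mentioned in the same article can be passed to lookup_city to narrow the match. With the toy vectors, "crowd" ranks closer to "gathering" than "vaccine" does, which is the kind of signal a similarity threshold could use to flag sentences about gathering sizes.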
Tech Stack
Core Language:
Python
Data Manipulation:
Pandas
NLP Frameworks:
spaCy, Flair, GloVe embeddings
Formats:
CSV and Excel integration for cross-platform compatibility
Results
Operational Efficiency:
Automated the filtering and extraction process, reducing analysis time from weeks to hours.
High-Fidelity Tracking:
Provided precise geographical insights into the spread of the pandemic and public response patterns.
Data Accuracy:
Removed noise and addressed missing values systematically, ensuring decision-makers worked with clean datasets.
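As a rough illustration of the cleaning, filtering, and sampling stages, the sketch below drops records with missing or very short text, removes duplicates, keeps only articles that mention pandemic-related keywords, and draws a reproducible sample. The record layout (dicts with a "text" field), the keyword list, the length threshold, and all function names are assumptions. The source does not describe the actual rules the pipeline applied.

```python
import random

# Assumed relevance keywords; the real filter criteria are not given.
KEYWORDS = ("covid", "coronavirus", "pandemic")


def clean_articles(records, min_chars=20):
    """Drop records with missing or very short text, and exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        text = (rec.get("text") or "").strip()
        if len(text) < min_chars:
            continue  # missing value or too short to be useful
        normalized = " ".join(text.lower().split())
        if normalized in seen:
            continue  # same text already kept (ignoring case and spacing)
        seen.add(normalized)
        cleaned.append({**rec, "text": text})
    return cleaned


def filter_relevant(records, keywords=KEYWORDS):
    """Keep records whose text mentions at least one keyword."""
    return [r for r in records if any(k in r["text"].lower() for k in keywords)]


def sample_articles(records, k, seed=0):
    """Return up to k records, chosen reproducibly for a given seed."""
    if k >= len(records):
        return list(records)
    return random.Random(seed).sample(records, k)
```

Running the stages in this order means the relevance filter and the sampler only ever see records that already have usable text. Fixing the seed keeps the sample stable across runs, which makes later results easier to compare.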