Portfolio
Business Process Automation (BPA)

Crunchbase Data Aggregation System

Industry: Venture Capital / Market Research
Built so analysts can identify and compare high-potential startups through real-time scraping and AI-driven synthesis instead of manual research.

Problem

Manually collecting and comparing data on emerging companies led to inconsistent reporting and missed market opportunities.
  • Data Fragmentation: Gathering startup metrics such as funding, IPO status, and traffic from multiple sources was a slow, manual process.
  • Analysis Bottlenecks: Synthesizing raw data from dozens of companies into a comparative trend report required excessive human effort.
  • Latency Issues: Relying on static reports meant that market intelligence was often outdated by the time it reached decision-makers.

Solution

The pipeline automates the entire lifecycle of market research, from dynamic keyword triggers to structured AI analysis.
  • Dynamic Workflow Orchestration: Built an n8n system that initiates targeted searches based on user-defined keywords.
  • Asynchronous Data Ingestion: Implemented an API polling loop with Bright Data to retrieve startup snapshots reliably once processing is complete.
  • Python ETL Layer: Developed a processing script to clean raw JSON data, standardize financial fields, and isolate the most relevant emerging companies.
  • Single-Pass AI Orchestration: Used Google Gemini to run one batch comparative analysis over the entire cohort, identifying market trends and differentiators.
  • Automated Structured Export: Merged AI-generated insights with standardized company data and delivered the result directly into Google Sheets.
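
The asynchronous ingestion step above can be illustrated with a minimal polling sketch. It is not the project's actual code: the function name poll_snapshot, the status strings "ready" and "failed", and the interval and attempt limits are all assumptions chosen for illustration. The HTTP calls to the scraping provider are passed in as callables so that no specific endpoint is implied.

```python
import time
from typing import Any, Callable


def poll_snapshot(
    get_status: Callable[[], str],
    download: Callable[[], Any],
    interval_s: float = 10.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll a scraping job until it is ready, then return its data.

    get_status and download are hypothetical callables that wrap the
    provider's "check progress" and "download snapshot" requests.
    The status values "ready" and "failed" are assumed, not documented.
    """
    for _ in range(max_attempts):
        status = get_status()
        if status == "ready":
            return download()
        if status == "failed":
            raise RuntimeError("snapshot job reported failure")
        # Any other status is treated as "still processing": wait, then retry.
        sleep(interval_s)
    raise TimeoutError(f"snapshot not ready after {max_attempts} attempts")
```

Bounding the number of attempts keeps a stalled job from blocking the workflow indefinitely, and injecting the sleep function makes the loop easy to test without real delays.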

Tech Stack

  • Automation: n8n.
  • Scraping: Bright Data API.
  • AI Engine: Google Gemini.
  • Data Processing: Python (ETL) and Google Sheets API.
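
To give a sense of what the Python ETL step might involve, here is a short sketch that standardizes financial fields and shapes records into rows ready for a spreadsheet. The raw field names (name, funding_total, ipo_status, monthly_visits), the output columns, and the helper names are hypothetical; the real snapshot schema and cleaning rules may differ.

```python
import re
from typing import Any, Optional

# Suffix multipliers for shorthand amounts such as "1.5M" (assumed input style).
_MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_amount(value: Any) -> Optional[float]:
    """Convert values like '$1.5M', '2,000,000' or 750000 to a float."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip().upper().replace(",", "").replace("$", "")
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMB]?)", text)
    if not match:
        return None  # Unparseable values become None instead of guesses.
    number, suffix = match.groups()
    return float(number) * _MULTIPLIERS.get(suffix, 1.0)


def clean_record(raw: dict) -> Optional[dict]:
    """Standardize one raw company record; drop records without a name."""
    name = (raw.get("name") or "").strip()
    if not name:
        return None
    return {
        "name": name,
        "funding_usd": parse_amount(raw.get("funding_total")),
        "ipo_status": (raw.get("ipo_status") or "unknown").strip().lower(),
        "monthly_visits": parse_amount(raw.get("monthly_visits")),
    }


def to_sheet_rows(records: list, insights: dict) -> list:
    """Merge per-company AI insights into rows (header first) for export."""
    columns = ["name", "funding_usd", "ipo_status", "monthly_visits"]
    rows = [columns + ["insight"]]
    for rec in records:
        # Empty cells are written as "" rather than None.
        row = ["" if rec[c] is None else rec[c] for c in columns]
        rows.append(row + [insights.get(rec["name"], "")])
    return rows
```

The resulting list of rows is a plain list of lists, which is the shape most spreadsheet clients accept for an append or update call.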

Results

  • Operational Speed: Reduced the startup research and comparison cycle from hours to minutes.
  • Strategic Insights: Delivered cohort-level comparative trends that are difficult to spot through manual spreadsheet entry.
  • Data Accuracy: Reduced manual-entry errors in financial reporting by standardizing raw JSON data through a dedicated ETL layer.
  • Scalable Intelligence: Enabled the tracking of multiple industry niches simultaneously without increasing researcher workload.