What I am building at ICA
An AI-powered platform for Systematic Literature Reviews, a process that traditionally takes research teams months by hand. Vursa compresses the full pipeline (research question refinement, search strategy, abstract and full-text screening, data extraction, PRISMA reporting) into a single multi-tenant product. I lead the build end-to-end and write the majority of the code across data pipelines, async AI workers, retrieval, the application layer, and cloud infrastructure.
- Multi-tenant SaaS with org-level subscriptions, seats, and cross-org collaboration
- FastAPI + async SQLAlchemy backend, React/Mantine frontend, PostgreSQL, S3, Taskiq workers on Valkey
- Provider-agnostic AI layer (Claude, GPT, Gemini) via LiteLLM with batched async screening and extraction
- Owning the DevOps surface: AWS infra as Terraform, Dockerized services, GitHub Actions CI/CD with Trivy scanning, and dev/prod environment parity
- Enterprise hardening focus — typed API contracts, centralized permissions, audit logging, PRISMA-compliant outputs
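The batched async screening piece can be sketched roughly as below. This is an illustrative assumption, not the production code: `call_model` is a stub standing in for the provider-agnostic LLM call (in the real service that would be `litellm.acompletion`, which takes any provider's model name), and the prompt wording and function names are invented for the example.

```python
import asyncio

# Sketch of batched include/exclude abstract screening (names are illustrative).
# In production the stub below would be swapped for litellm.acompletion, which
# accepts any provider's model string ("claude-...", "gpt-...", "gemini/...").
async def call_model(model: str, prompt: str) -> str:
    # Stub standing in for an LLM call; always "votes" include here.
    await asyncio.sleep(0)
    return "INCLUDE"

async def screen_one(model: str, question: str, abstract: str) -> bool:
    prompt = (
        f"Research question: {question}\n"
        f"Abstract: {abstract}\n"
        "Answer INCLUDE or EXCLUDE."
    )
    verdict = await call_model(model, prompt)
    return verdict.strip().upper().startswith("INCLUDE")

async def screen_batch(model: str, question: str, abstracts: list[str],
                       concurrency: int = 8) -> list[bool]:
    # A semaphore caps in-flight requests so provider rate limits are respected.
    sem = asyncio.Semaphore(concurrency)

    async def bounded(abstract: str) -> bool:
        async with sem:
            return await screen_one(model, question, abstract)

    return await asyncio.gather(*(bounded(a) for a in abstracts))

results = asyncio.run(screen_batch("stub-model", "Does X reduce Y?", ["abs1", "abs2"]))
```

The semaphore-plus-gather shape is what makes the layer batch-friendly: screening a few thousand abstracts is one call, and concurrency is tuned per provider rather than per call site.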
Regulatory Document Processing
Building document-processing pipelines that extract text, images, and tables from heterogeneous regulatory documents, ingesting 10,000+ documents per week of widely varying length and content. The pipeline runs on AWS, so a quiet week and a peak week look the same operationally.
- Multi-modal extraction: text, embedded images, and structured tables
- AWS-native compute via Batch and ECS tasks with auto-scaling worker fleets
- 10,000+ documents/week — formats and lengths that swing from a few pages to several thousand
- Fault-tolerant and idempotent: pages and documents recover independently on retry
- Standardized output schema for downstream search and analysis
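The page-level idempotency idea can be sketched like this, under stated assumptions: `completed` stands in for a durable store (say, S3 keys or a DB table), and `process_page` is a placeholder for the real extraction. None of these names come from the actual codebase.

```python
import hashlib

# Sketch of page-level idempotency: each page's output is keyed by a content
# hash, so a retried document skips pages that already succeeded.
# `completed` is a stand-in for a durable store (e.g. S3 keys or a DB table).
completed: dict[str, str] = {}

def page_key(doc_id: str, page_num: int, page_bytes: bytes) -> str:
    digest = hashlib.sha256(page_bytes).hexdigest()[:16]
    return f"{doc_id}/page-{page_num}-{digest}"

def process_page(page_bytes: bytes) -> str:
    # Placeholder for real text/image/table extraction.
    return page_bytes.decode("utf-8", errors="replace").upper()

def process_document(doc_id: str, pages: list[bytes]) -> int:
    # Returns the number of pages actually processed on this attempt.
    processed = 0
    for i, page in enumerate(pages):
        key = page_key(doc_id, i, page)
        if key in completed:
            continue  # already extracted on a previous attempt
        completed[key] = process_page(page)
        processed += 1
    return processed

first_run = process_document("doc-1", [b"page one", b"page two"])
retry_run = process_document("doc-1", [b"page one", b"page two"])
```

Because a retry re-derives the same keys, a worker that dies mid-document resumes from the failed page instead of reprocessing a thousand-page file from scratch.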
FDA Signal Detection
Identifying emerging safety signals in noisy healthcare data. The challenge is separating real signal from randomness when reporting is delayed, incomplete, and biased — early detection and prioritization are what matter.
- Noisy, delayed reporting data with high variance
- Signal detection vs. false positive control
- Ranking and prioritization across many entities
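One standard disproportionality statistic in this space is the proportional reporting ratio (PRR): is the event over-represented among a drug's reports relative to everything else? The sketch below is a generic illustration of that idea, not the project's actual method, and the threshold heuristic is one commonly cited convention, not a claim about what this system uses.

```python
# Proportional reporting ratio (PRR) over a 2x2 contingency table:
#   a: reports with the drug AND the event
#   b: reports with the drug, other events
#   c: reports with other drugs AND the event
#   d: reports with other drugs, other events
def prr(a: int, b: int, c: int, d: int) -> float:
    # Rate of the event among the drug's reports vs. all other drugs' reports.
    return (a / (a + b)) / (c / (c + d))

def is_signal(a: int, b: int, c: int, d: int) -> bool:
    # A common screening heuristic: PRR > 2 with at least 3 case reports.
    return prr(a, b, c, d) > 2.0 and a >= 3

# Event appears in 10% of this drug's reports vs. 1% elsewhere: PRR of 10.
value = prr(10, 90, 100, 9900)
```

Ranking candidates by a statistic like this (rather than thresholding alone) is what turns noisy counts into a prioritized review queue.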
What I am building on my own time
Statcast Lab
A personal sandbox for baseball analytics on pitch-level data. Lighter-weight than production work but the underlying problems — feature engineering, modeling assumptions, performance — transfer back to the day job.
- Pitch-level modeling and analysis
- Exploring models for expected stats (xStats)
- Built with Polars and ClickHouse for fast data pipelines
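The xStats idea in miniature: bin batted balls by exit velocity and launch angle, then score new contact by the historical average outcome of its bin. A toy stdlib sketch of that approach (the real pipelines use Polars; every number and bin size here is made up):

```python
from collections import defaultdict

# Toy expected-stats lookup: average outcome value per (exit velo, launch angle)
# bin. Training rows are (exit_velocity_mph, launch_angle_deg, outcome_value),
# all invented for illustration.
batted_balls = [
    (103.0, 28.0, 2.0),   # barrel, extra bases
    (101.0, 26.0, 1.4),
    (85.0, 10.0, 0.5),    # soft line drive
    (84.0, 12.0, 0.0),
]

def bin_of(ev: float, la: float) -> tuple[int, int]:
    # 5 mph x 10 degree cells.
    return (int(ev // 5), int(la // 10))

def fit(rows):
    sums: dict[tuple[int, int], list[float]] = defaultdict(lambda: [0.0, 0.0])
    for ev, la, value in rows:
        cell = sums[bin_of(ev, la)]
        cell[0] += value
        cell[1] += 1
    return {k: total / n for k, (total, n) in sums.items()}

model = fit(batted_balls)

def expected_value(ev: float, la: float, league_avg: float = 0.3) -> float:
    # Fall back to a league-average value for bins with no training data.
    return model.get(bin_of(ev, la), league_avg)
```

The fallback for unseen bins is the same smoothing question production models face, just solved crudely here with a constant.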
What I previously built
Founding-team member at OpenBB, the open-source investment research platform that grew out of Gamestonk Terminal during COVID. Helped take it from a viral community project into a venture-backed company, working across the Python codebase as the product and team scaled.
- Founding team — joined as the project transitioned from community OSS to a venture-backed company
- Python codebase for financial data aggregation, analysis, and modeling
- Integrations across many third-party financial data providers and APIs
- Open-source contributions, code review, releases, and documentation