What I am building at ICA
An AI-powered platform for Systematic Literature Reviews, a process that traditionally takes research teams months by hand. Vursa compresses the full pipeline (research question refinement, search strategy, abstract and full-text screening, data extraction, PRISMA reporting) into a single multi-tenant product. I lead the build end-to-end and write the majority of the code across data pipelines, async AI workers, retrieval, the application layer, and cloud infrastructure.
- Multi-tenant SaaS with org-level subscriptions, seats, and cross-org collaboration
- FastAPI + async SQLAlchemy backend, React/Mantine frontend, PostgreSQL, S3, Taskiq workers on Valkey
- Provider-agnostic AI layer (Claude, GPT, Gemini) via LiteLLM with batched async screening and extraction
- Owning the DevOps surface: AWS infra as Terraform, Dockerized services, GitHub Actions CI/CD with Trivy scanning, and dev/prod environment parity
- Enterprise hardening focus — typed API contracts, centralized permissions, audit logging, PRISMA-compliant outputs
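The batched async screening piece can be sketched roughly as below. This is an illustrative assumption, not the production code: `call_model` is a stub standing in for the provider-agnostic LLM call (in the real service that would be `litellm.acompletion`, which takes any provider's model name), and the prompt wording and function names are invented for the example.

```python
import asyncio

# Sketch of batched include/exclude abstract screening (names are illustrative).
# In production the stub below would be swapped for litellm.acompletion, which
# accepts any provider's model string ("claude-...", "gpt-...", "gemini/...").
async def call_model(model: str, prompt: str) -> str:
    # Stub standing in for an LLM call; always "votes" include here.
    await asyncio.sleep(0)
    return "INCLUDE"

async def screen_one(model: str, question: str, abstract: str) -> bool:
    prompt = (
        f"Research question: {question}\n"
        f"Abstract: {abstract}\n"
        "Answer INCLUDE or EXCLUDE."
    )
    verdict = await call_model(model, prompt)
    return verdict.strip().upper().startswith("INCLUDE")

async def screen_batch(model: str, question: str, abstracts: list[str],
                       concurrency: int = 8) -> list[bool]:
    # A semaphore caps in-flight requests so provider rate limits are respected.
    sem = asyncio.Semaphore(concurrency)

    async def bounded(abstract: str) -> bool:
        async with sem:
            return await screen_one(model, question, abstract)

    return await asyncio.gather(*(bounded(a) for a in abstracts))

results = asyncio.run(screen_batch("stub-model", "Does X reduce Y?", ["abs1", "abs2"]))
```

The semaphore-plus-gather shape is what makes the layer batch-friendly: screening a few thousand abstracts is one call, and concurrency is tuned per provider rather than per call site.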
Regulatory Document Processing
Building document-processing pipelines that extract text, images, and tables from heterogeneous regulatory documents, ingesting 10,000+ documents per week of widely varying length and content. The pipeline runs on AWS, so a quiet week and a peak week look the same operationally.
- Multi-modal extraction: text, embedded images, and structured tables
- AWS-native compute via Batch and ECS tasks with auto-scaling worker fleets
- 10,000+ documents/week — formats and lengths that swing from a few pages to several thousand
- Fault-tolerant and idempotent: pages and documents recover independently on retry
- Standardized output schema for downstream search and analysis
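The page-level idempotency idea can be sketched like this, under stated assumptions: `completed` stands in for a durable store (say, S3 keys or a DB table), and `process_page` is a placeholder for the real extraction. None of these names come from the actual codebase.

```python
import hashlib

# Sketch of page-level idempotency: each page's output is keyed by a content
# hash, so a retried document skips pages that already succeeded.
# `completed` is a stand-in for a durable store (e.g. S3 keys or a DB table).
completed: dict[str, str] = {}

def page_key(doc_id: str, page_num: int, page_bytes: bytes) -> str:
    digest = hashlib.sha256(page_bytes).hexdigest()[:16]
    return f"{doc_id}/page-{page_num}-{digest}"

def process_page(page_bytes: bytes) -> str:
    # Placeholder for real text/image/table extraction.
    return page_bytes.decode("utf-8", errors="replace").upper()

def process_document(doc_id: str, pages: list[bytes]) -> int:
    # Returns the number of pages actually processed on this attempt.
    processed = 0
    for i, page in enumerate(pages):
        key = page_key(doc_id, i, page)
        if key in completed:
            continue  # already extracted on a previous attempt
        completed[key] = process_page(page)
        processed += 1
    return processed

first_run = process_document("doc-1", [b"page one", b"page two"])
retry_run = process_document("doc-1", [b"page one", b"page two"])
```

Because a retry re-derives the same keys, a worker that dies mid-document resumes from the failed page instead of reprocessing a thousand-page file from scratch.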
FDA Signal Detection
Identifying emerging safety signals in noisy healthcare data. The challenge is separating real signal from randomness when reporting is delayed, incomplete, and biased — early detection and prioritization are what matter.
- Noisy, delayed reporting data with high variance
- Signal detection vs. false positive control
- Ranking and prioritization across many entities
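One standard disproportionality statistic in this space is the proportional reporting ratio (PRR): is the event over-represented among a drug's reports relative to everything else? The sketch below is a generic illustration of that idea, not the project's actual method, and the threshold heuristic is one commonly cited convention, not a claim about what this system uses.

```python
# Proportional reporting ratio (PRR) over a 2x2 contingency table:
#   a: reports with the drug AND the event
#   b: reports with the drug, other events
#   c: reports with other drugs AND the event
#   d: reports with other drugs, other events
def prr(a: int, b: int, c: int, d: int) -> float:
    # Rate of the event among the drug's reports vs. all other drugs' reports.
    return (a / (a + b)) / (c / (c + d))

def is_signal(a: int, b: int, c: int, d: int) -> bool:
    # A common screening heuristic: PRR > 2 with at least 3 case reports.
    return prr(a, b, c, d) > 2.0 and a >= 3

# Event appears in 10% of this drug's reports vs. 1% elsewhere: PRR of 10.
value = prr(10, 90, 100, 9900)
```

Ranking candidates by a statistic like this (rather than thresholding alone) is what turns noisy counts into a prioritized review queue.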
What I am building on my own time
Statcast Lab
A personal sandbox for baseball analytics on pitch-level data. Lighter-weight than production work but the underlying problems — feature engineering, modeling assumptions, performance — transfer back to the day job.
- Pitch-level modeling and analysis
- Exploring models for expected stats (xStats)
- Built with Polars and ClickHouse for fast data pipelines
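The xStats idea in miniature: bin batted balls by exit velocity and launch angle, then score new contact by the historical average outcome of its bin. A toy stdlib sketch of that approach (the real pipelines use Polars; every number and bin size here is made up):

```python
from collections import defaultdict

# Toy expected-stats lookup: average outcome value per (exit velo, launch angle)
# bin. Training rows are (exit_velocity_mph, launch_angle_deg, outcome_value),
# all invented for illustration.
batted_balls = [
    (103.0, 28.0, 2.0),   # barrel, extra bases
    (101.0, 26.0, 1.4),
    (85.0, 10.0, 0.5),    # soft line drive
    (84.0, 12.0, 0.0),
]

def bin_of(ev: float, la: float) -> tuple[int, int]:
    # 5 mph x 10 degree cells.
    return (int(ev // 5), int(la // 10))

def fit(rows):
    sums: dict[tuple[int, int], list[float]] = defaultdict(lambda: [0.0, 0.0])
    for ev, la, value in rows:
        cell = sums[bin_of(ev, la)]
        cell[0] += value
        cell[1] += 1
    return {k: total / n for k, (total, n) in sums.items()}

model = fit(batted_balls)

def expected_value(ev: float, la: float, league_avg: float = 0.3) -> float:
    # Fall back to a league-average value for bins with no training data.
    return model.get(bin_of(ev, la), league_avg)
```

The fallback for unseen bins is the same smoothing question production models face, just solved crudely here with a constant.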
What I previously built
Founding-team member at OpenBB, the open-source investment research platform that grew out of Gamestonk Terminal during COVID. Helped take it from a viral community project into a venture-backed company, working across the Python codebase as the product and team scaled.
- Founding team — joined as the project transitioned from community OSS to a venture-backed company
- Python codebase for financial data aggregation, analysis, and modeling
- Integrations across many third-party financial data providers and APIs
- Open-source contributions, code review, releases, and documentation