Projects
Engram→
Hybrid memory layer for AI agents combining knowledge graph traversal, vector search, and temporal awareness into a single retrieval system.
TokenRouter→
Routes LLM requests to the cheapest adequate model using real-time difficulty classification and semantic caching, achieving a 68% cost reduction at 94% quality parity.
Lookout→
Distributed multi-camera video RAG with on-device multimodal embeddings over peer-to-peer QUIC. Built in 24 hours — took Grand Prize, Best B2B, and Best Technical Hack at the Cactus × DeepMind × YC hackathon, earning a guaranteed YC interview.
TrajAI→
Open-source testing framework for AI agents. Mock tools and assert on agent behavior rather than raw outputs — filling the gap that standard unit testing frameworks leave for agentic systems.
JudgeCalibrator→
Open-source auditing tool that measures LLM judge reliability. Runs four diagnostic probes to detect bias and miscalibration in AI evaluators.
Event-Guided Video Frame Interpolation→
Neural video frame interpolation guided by synthetic event-camera data. Four architectures compared; the best achieves +4.14 dB over RGB-only baselines. Includes an interactive demo with before/after comparisons.
Talon→
Adversarial red-teaming framework for LLM agents. Simulates multi-turn attacks to detect policy violations, jailbreaks, and unsafe behaviors. Deployable as a standalone library or a reusable GitHub Action for CI/CD.
MCP Debug Server→
MCP server that enables Claude Code to interactively debug Python and Node.js applications — set breakpoints, step through code, inspect variables, and evaluate expressions directly from a Claude conversation.
ResumeOptimizer→
Full-stack AI resume tailoring platform. Paste a job description and your resume; AI suggests targeted improvements and rewrites. Includes database persistence for tracking multiple resume versions.
BlinkMonitor→
macOS menubar app that uses real-time facial landmark detection to track blink rate and automatically dims the display when prolonged eye strain is detected.