Projects

Large Language Models & Natural Language Processing

IntervalTree.rs

Implemented an efficient interval tree in Rust and exposed Python bindings via PyO3. GitHub repo.

SEC Filings Data

An MCP server that provides end-to-end workflows for SEC filings and earnings call transcripts: ticker resolution, document retrieval, OCR, embedding, on-disk resource discovery, and semantic search. OCR and embedding are powered by olmOCR and embedding models served through vLLM backends. GitHub repo.

Sasha Rush Puzzles

Completed the Tensor Puzzles, GPU Puzzles, and Helion and Triton Puzzles.

Teaching Distributed Data Parallelism for LLMs

A video playlist that goes from theory to practice for training large models with distributed data parallelism. Watch the playlist.

MoE-ReFT

Parameter-efficient finetuning with ReFT (representation finetuning) on OLMoE-7B-A1B, applying interventions before and after the MoE layers. GSM8K test loss: base 0.82, pre-MoE 0.91, post-MoE 10.8 (worse). GitHub repo.

Enhancement to Grouped Query Attention

Developed a weight-based aggregation of key-value heads to improve T5-small summarisation performance by 2.75% over the grouped query attention baseline. Explore the GitHub repo and the Weights & Biases report.
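
As a rough illustration of the idea, the sketch below merges key or value heads into groups with a weighted sum instead of plain mean pooling. It is a minimal sketch under stated assumptions, not the code from the repo: the function name `aggregate_kv_heads`, the one-scalar-weight-per-head granularity, and the use of plain Python lists (to stay dependency-free) are all hypothetical choices made here for illustration.

```python
def aggregate_kv_heads(heads, num_groups, weights=None):
    """Merge per-head projection vectors into one vector per group.

    heads:   list of H equal-length lists of floats, one flattened
             key (or value) projection per attention head.
    weights: optional list of H scalars (assumed granularity: one per head).
             None falls back to mean pooling within each group.
    """
    num_heads = len(heads)
    if num_groups <= 0 or num_heads % num_groups != 0:
        raise ValueError("num_heads must be divisible by num_groups")
    group_size = num_heads // num_groups

    if weights is None:
        # Mean pooling: every head in a group contributes equally.
        weights = [1.0 / group_size] * num_heads
    if len(weights) != num_heads:
        raise ValueError("expected one weight per head")

    merged = []
    for g in range(num_groups):
        start = g * group_size
        members = range(start, start + group_size)
        dim = len(heads[start])
        # Weighted sum over the heads that belong to this group.
        merged.append(
            [sum(weights[h] * heads[h][d] for h in members) for d in range(dim)]
        )
    return merged


heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mean_pooled = aggregate_kv_heads(heads, num_groups=2)
weighted = aggregate_kv_heads(heads, num_groups=2, weights=[1.0, 0.0, 0.25, 0.75])
```

In a trained model the weights would be learned parameters rather than fixed numbers; the fixed values above only show how the aggregation changes relative to mean pooling.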

Sequence Length Balancing in Rust

Built a Rust project for sequence length balancing with a scheduler backed by a ZMQ server using a ROUTER-DEALER architecture. See the GitHub repo.

Figure: Sequence length balancing benchmark.
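
To make the balancing idea concrete, here is a minimal Python sketch of one common approach: greedy longest-first assignment, where each sequence goes to the worker with the fewest tokens so far. This is an assumption for illustration only; the Rust project may use a different strategy, and the names `balance_by_length` and `worker_loads` are hypothetical. The ZMQ ROUTER-DEALER transport is not modelled here.

```python
import heapq


def balance_by_length(lengths, num_workers):
    """Assign sequence indices to workers so total tokens are roughly equal.

    Returns one list of sequence indices per worker.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")

    # Min-heap of (total_tokens, worker_id): the lightest worker pops first.
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(num_workers)]

    # Place the longest sequences first; short ones then fill the gaps.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        total, w = heapq.heappop(heap)
        buckets[w].append(i)
        heapq.heappush(heap, (total + lengths[i], w))
    return buckets


def worker_loads(lengths, buckets):
    """Total token count assigned to each worker."""
    return [sum(lengths[i] for i in bucket) for bucket in buckets]


lengths = [512, 128, 384, 256, 64, 448]
buckets = balance_by_length(lengths, num_workers=2)
loads = worker_loads(lengths, buckets)
```

The greedy rule is a heuristic: it does not always reach the optimal split, but it is cheap enough to run per batch.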

SEC Filings Question Answering Agent

Built an end-to-end system that parses 10-Q and 10-K filings to answer investor questions about company health. Explore the original project and the revamped finance data LLM repo.

Figure: SEC filings question answering workflow (dashboard from the SEC QA agent).

Open-source contributions

Added specialised loaders to LlamaHub for SEC filings, IMDB reviews, and earnings call transcripts.

Movie Reviews Question Answering Agent

Built a MongoDB Atlas-powered QA system. Browse the code and watch the demo.

Figure: Movie reviews QA prototype for *Parasite*.

Old days of being a finance bro

Reinforcement Learning

Algorithmic Trading with Google Trends

Used web search data from Google Trends as the state space to improve RL trading performance. GitHub · Medium

Black-Litterman Portfolio Optimisation

Work-in-progress on applying RL to portfolio construction. GitHub

FinRL Optimisation Contributions

Authored explainers and tutorials for hyperparameter optimisation workflows in FinRL. Article series

Machine Learning

Applied Machine Learning Portfolio

Investment Management Specialisation notes

Investment Management with Python projects

Optimisation with Gradient Descent & Newton-Raphson

UMass Advanced NLP coursework

Latent Dirichlet Allocation with Gibbs Sampling

Bayesian Regularised Linear Regression