
My Research

Below I give a technical summary of each project and link to presentations of selected findings, figures, and methodology.


Trading at Georgia Tech: Quantitative Research

Alpha Research: I develop alpha signals by modeling and exploiting microstructure inefficiencies in high-frequency cryptocurrency markets. My work centers on analyzing limit order book data, constructing predictive features, and applying statistical models to forecast mid-to-high-frequency returns on cryptocurrency perpetual futures.
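As a rough illustration of the kind of order book feature this involves (not the production pipeline), the sketch below computes top-of-book spread and imbalance from snapshot data and aligns them with a forward mid-price return target; the column names and the 50-snapshot horizon are assumptions.

```python
# Minimal sketch (not the production pipeline): order-book features of the
# kind used to forecast short-horizon returns. `bid_px`, `bid_sz`, `ask_px`,
# `ask_sz` are assumed top-of-book columns in a DataFrame of LOB snapshots.
import pandas as pd

def book_features(lob: pd.DataFrame) -> pd.DataFrame:
    feats = pd.DataFrame(index=lob.index)
    mid = (lob["bid_px"] + lob["ask_px"]) / 2
    feats["spread"] = lob["ask_px"] - lob["bid_px"]
    # Top-of-book imbalance: positive when resting bid size dominates.
    feats["imbalance"] = (lob["bid_sz"] - lob["ask_sz"]) / (lob["bid_sz"] + lob["ask_sz"])
    # Forward mid-price return over an assumed horizon of 50 snapshots
    # serves as the prediction target.
    feats["fwd_ret"] = mid.shift(-50) / mid - 1
    return feats.dropna()
```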


Arbitrage: I design statistical arbitrage strategies using cointegrated portfolio construction and eigenportfolio decomposition. My research focuses on implementing and backtesting robust strategies that account for latency, slippage, dynamic transaction costs, and execution constraints, and on integrating dynamic risk-management techniques into our existing statistical arbitrage frameworks to handle breakdowns in the statistical relationships the strategies depend on.
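To illustrate the eigenportfolio step, the sketch below builds inverse-volatility-scaled eigenportfolios from the top eigenvectors of the return correlation matrix, in the spirit of the Avellaneda-Lee framework; it is not the club's actual code, and the `returns` input and choice of k are assumptions.

```python
# Minimal sketch of eigenportfolio construction; `returns` is assumed to be
# a (T x N) pandas DataFrame of asset returns.
import numpy as np
import pandas as pd

def eigenportfolios(returns: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    std = returns.std()
    z = (returns - returns.mean()) / std           # standardize each asset
    corr = z.cov()                                 # correlation matrix of returns
    eigvals, eigvecs = np.linalg.eigh(corr.values)
    order = np.argsort(eigvals)[::-1][:k]          # top-k eigenvectors
    # Scale loadings by inverse volatility so portfolio weights are comparable.
    weights = eigvecs[:, order] / std.values[:, None]
    cols = [f"EP{i+1}" for i in range(k)]
    return pd.DataFrame(weights, index=returns.columns, columns=cols)
```

Each asset's returns can then be regressed on the eigenportfolio returns, with the mean-reverting residual forming the basis of the arbitrage signal.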


Once strategies and features are rigorously researched, developed, and tested, we trade them with the Trading Club's endowment. We use colocated AWS servers to minimize execution latency and collaborate with the Quant Dev team to deploy our strategies on custom low-latency software written in Rust.


I've included a link below to reports and demonstrations of my research. 

Financial Services Lab: S-1 Filings NLP

I am working on a team developing a software library that automates the extraction and preprocessing of corporate S-1 filings using the SEC EDGAR API. The system performs structured ingestion, parsing, and persistent storage of raw filing content, enabling scalable collection of IPO disclosures. Parsed documents are processed into a format suitable for downstream NLP tasks, including sentiment analysis, keyword extraction, and transformer-based language modeling. The goal is to identify linguistic and structural signals predictive of IPO overpricing or underpricing by linking textual features to post-IPO return behavior.
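As a rough sketch of the retrieval step (not our library's actual interface), the example below lists a company's S-1 filings from EDGAR's public submissions API; the CIK and contact string are placeholders, and EDGAR requires a descriptive User-Agent header.

```python
# Minimal sketch of locating S-1 filings via SEC EDGAR's submissions API.
import requests

HEADERS = {"User-Agent": "research-contact@example.com"}  # placeholder contact

def list_s1_filings(cik: int) -> list[dict]:
    url = f"https://data.sec.gov/submissions/CIK{cik:010d}.json"
    recent = requests.get(url, headers=HEADERS, timeout=30).json()["filings"]["recent"]
    filings = []
    for form, acc, doc in zip(recent["form"], recent["accessionNumber"], recent["primaryDocument"]):
        if form in ("S-1", "S-1/A"):  # original registrations and amendments
            acc_nodash = acc.replace("-", "")
            filings.append({
                "form": form,
                "url": f"https://www.sec.gov/Archives/edgar/data/{cik}/{acc_nodash}/{doc}",
            })
    return filings
```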


Qualitative metrics we plan to examine in the scraped filings include the specificity of risk factors and the prevalence of buzzwords. The hypothesis is that an IPO whose prospectus leans on buzzword-filled language and broad descriptions of risk factors (as opposed to specific, detailed, quantifiable risks) is more likely to be overvalued.
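The sketch below shows one way these two metrics could be computed; the buzzword list and the use of digit density as a specificity proxy are assumptions, not the team's finalized methodology.

```python
# Illustrative text metrics: buzzword prevalence and a crude proxy for
# risk-factor specificity. Word list and proxy are assumptions.
import re

BUZZWORDS = {"disrupt", "revolutionary", "synergy", "cutting-edge", "paradigm"}  # hypothetical list

def buzzword_prevalence(text: str) -> float:
    tokens = re.findall(r"[a-z\-]+", text.lower())
    return sum(t in BUZZWORDS for t in tokens) / max(len(tokens), 1)

def risk_factor_specificity(risk_section: str) -> float:
    # Proxy: specific, quantifiable risks tend to cite numbers, dollar
    # amounts, and percentages; broad boilerplate rarely does.
    sentences = re.split(r"(?<=[.!?])\s+", risk_section)
    quantified = sum(bool(re.search(r"\d", s)) for s in sentences)
    return quantified / max(len(sentences), 1)
```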


CEAR Hub: Machine Learning for Real-Estate Valuation with Flood Risk

This research project builds an end-to-end machine learning system that integrates IoT-based environmental sensing, time-series modeling, and real-estate economics to quantify the pricing impact of flood risk on housing on Tybee Island, Georgia.


In the first phase, I built a streaming data pipeline from a custom IoT water-level sensor deployed on Tybee Island, capturing real-time hydrological data. This data feeds into a time-series ML model designed to estimate a robust monthly flood-risk index.
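A minimal sketch of how such an index could be derived from the sensor stream is shown below; the exceedance threshold and the hours-above-threshold definition are illustrative assumptions, a simplification of the time-series model actually used.

```python
# Illustrative monthly flood-risk index from raw water-level readings.
import pandas as pd

def monthly_flood_index(levels: pd.Series, threshold_m: float = 1.5) -> pd.Series:
    """`levels` is a water-level series (meters) indexed by timestamp."""
    hourly = levels.resample("1h").mean()
    exceed = (hourly > threshold_m).astype(float)
    # Share of hours above the assumed flood threshold in each month.
    index = exceed.resample("MS").mean()
    return index.rename("flood_risk_index")
```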


In the second phase, we use this index and apply a hedonic regression model to housing prices in flood-prone neighborhoods where home insurance pricing is expected to reflect actual risk. These fitted models are then extended to other regions to detect systemic underpricing or overpricing in the housing market based on environmental exposure. The goal is to produce a geographic distribution of mispricing, enabling climate-aware valuation and more informed investment or policy decisions.
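As an illustration of the hedonic step (placeholder column names, not the project's final specification), the sketch below regresses log sale price on structural attributes and the flood-risk index with robust standard errors.

```python
# Illustrative hedonic regression of housing prices on attributes plus
# the flood-risk index. Column names and controls are placeholders.
import numpy as np
import statsmodels.formula.api as smf

def fit_hedonic(df):
    """`df` needs columns: price, sqft, beds, baths, age, flood_risk_index."""
    df = df.assign(log_price=np.log(df["price"]))
    model = smf.ols(
        "log_price ~ np.log(sqft) + beds + baths + age + flood_risk_index",
        data=df,
    ).fit(cov_type="HC1")  # heteroskedasticity-robust standard errors
    # The coefficient on flood_risk_index estimates the percentage price
    # impact per unit of flood risk, holding other attributes fixed.
    return model
```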
