Built a retrieval-augmented generation (RAG) system using Sentence-BERT embeddings and a FAISS index, achieving 90% precision@k on 170K+ FAQ queries.
Developed ingestion and indexing modules to support scalable, source-linked LLM responses.
Validated system performance on synthetic and real-world FAQ datasets using precision@k, recall@k, and NDCG to measure retrieval relevance and ranking quality.
Transit System
Aggregated real-time transit and bike-share data from public APIs to model the effects of modifying transit network nodes.
Forecast traffic and usage with regression models and validated the forecasts against historical demand patterns.
Extracted environmental context features (e.g., sidewalk presence, signage density) from Mapillary street-level images to assess safety implications for bike-share stations.
Depression Detection (Kaggle Competition)
Cleaned and preprocessed noisy synthetic survey data to improve the signal available to the model.
Engineered features and performed exploratory analysis to uncover limitations of the synthetic dataset.
Tuned the hyperparameters of an XGBoost classifier for depression detection, attaining 94.5% accuracy on Kaggle.