Available for Internships, Full-Time Roles & Research

Data Science & Research Enthusiast

B.Tech Computer Science Engineering student and Senior Lead Data Analyst. Progressed from Data Analyst Intern to Senior Lead Data Analyst within 4 months. Delivered systems with 98%+ AUC on real-world datasets, built low-latency order-execution engines, and led intern teams to 100% on-time delivery. Now architecting production-grade ML systems that combine LLM fine-tuning, multi-agent orchestration, and high-throughput data pipelines, with a focus on measurable business impact.

15+ Projects
50+ Certifications
98.2% Best AUC Score
Payal Mishra
Senior Lead Data Analyst // ML Engineer // B.Tech CSE

Experience &
Professional Path

Dec 2025 to Mar 2026
Senior Lead Data Analyst
UpToSkills
Progressed from Data Analyst Intern to Team Lead to Domain Senior Team Lead within 4 months, managing cross-functional teams across concurrent ML projects. Began by performing EDA, data cleaning, and preprocessing on real production datasets, then advanced to leading and mentoring a team of data analyst interns, fostering technical growth and collaboration. As senior lead, headed department operations, defined team OKRs, ran sprint reviews, and enforced code quality standards, achieving 100% on-time delivery across all project tracks. Standardized Python-based EDA and preprocessing pipelines that were adopted across a cohort of 15+ interns, reducing onboarding time for new analysts.
Jan 2026
Data Analyst Intern
SkillsCraft Technology
Analyzed real-world production datasets using Python, Pandas, and SQL, uncovering revenue patterns and customer behaviour anomalies that directly informed product and strategic decisions. Delivered automated EDA reports, outlier-detection scripts, and visual dashboards consumed by senior management. Collaborated cross-functionally with product and business teams in a fast-paced, outcome-driven environment, translating ambiguous business questions into precise data queries and statistically sound findings within tight deadlines.
Apr 2025 to Sep 2025
Web Developer
Andaman Dream Yatra
Architected and launched a production-grade WordPress website for a commercial travel & tourism client, owning every phase from requirements scoping and UI/UX wireframing to server configuration, SEO optimisation, and final deployment. After launch, stayed on under a long-term remote maintenance retainer covering performance tuning, plugin updates, content management, and uptime monitoring, demonstrating full product ownership and the ability to manage an ongoing client relationship independently. The engagement improved the client's organic search visibility and modernised its booking inquiry pipeline.

Education

🏫
Bachelor of Technology
Computer Science Engineering
Dr. B.R. Ambedkar Institute of Technology
Nov 2024 to Present • Sri Vijaya Puram
Cumulative GPA: 8.0 / 10.0
🎓
Senior Secondary (PCM + Computer Science)
St. Mary's Senior Secondary School
Passed May 2022 • Sri Vijaya Puram
Score: 91%

Selected Work &
Analytical Inquiries

01 // Systems Engineering
HFT Alpha Generator & Order Pipeline
A simulated high-frequency trading system ingesting live order-book tick data via WebSocket, processing order decisions in under 5ms. Demonstrates systems-level performance optimization, concurrency handling, and low-latency execution with real-time P&L tracking across multiple simulated instruments.
Python WebSocket Redis Pandas NumPy Async I/O
Engineering Highlight
Integrated Redis for sub-millisecond state caching, which reduces pipeline overhead and enables replay of historical tick sequences for strategy backtesting. Alpha signals are generated statistically, with dynamic threshold tuning and risk guardrails.
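As a rough illustration of statistical signal generation with a dynamic threshold and a risk guardrail, here is a minimal sketch. It assumes a mean-reversion rule on a rolling z-score; the class name `ZScoreSignal`, the volatility-scaled threshold formula, and the position cap are all hypothetical choices for this example, not the project's actual logic.

```python
from collections import deque
from statistics import mean, pstdev


class ZScoreSignal:
    """Rolling z-score mean-reversion signal (illustrative, not the project's code)."""

    def __init__(self, window=20, base_threshold=2.0, max_position=5):
        self.prices = deque(maxlen=window)
        self.base_threshold = base_threshold
        self.max_position = max_position  # risk guardrail: cap on absolute position
        self.position = 0

    def on_tick(self, price):
        """Return +1 (buy), -1 (sell), or 0 (hold) for one price tick."""
        self.prices.append(price)
        if len(self.prices) < self.prices.maxlen:
            return 0  # not enough history yet
        mu = mean(self.prices)
        sigma = pstdev(self.prices)
        if sigma == 0:
            return 0  # flat window, z-score undefined
        z = (price - mu) / sigma
        # Assumed "dynamic" rule: widen the entry threshold when relative volatility rises.
        threshold = self.base_threshold * (1 + sigma / mu)
        if z > threshold and self.position > -self.max_position:
            self.position -= 1
            return -1
        if z < -threshold and self.position < self.max_position:
            self.position += 1
            return 1
        return 0
```

Calling `on_tick` once per incoming price keeps the per-tick work limited to one pass over a small fixed window, which is the kind of bounded cost a latency budget needs.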
View on GitHub
Order Latency
<5ms
Per Order Decision
State Caching
Redis
Sub-ms Latency
Instruments
Multi
Parallel P&L Tracking
02 // Generative AI
Enterprise Multimodal Multi-Agent RAG Workspace
A corporate AI system with three parallel specialized agents: a vision LLM (LLaVA) for financial chart interpretation, an OCR agent for structured PDF table extraction, and a dense retrieval agent for scanned document Q&A. Achieved ~70% reduction in simulated document analysis time via multi-agent RAG orchestration.
Python LangChain LLaVA FAISS PyMuPDF Tesseract OCR Async
Architecture Highlight
Designed for enterprise-scale document ingestion with modular agent interfaces, so each component can be swapped or upgraded independently. FAISS vector indexing and async agent coordination deliver the ~70% reduction in analysis time relative to sequential processing.
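The async coordination pattern can be sketched with `asyncio.gather`. The three agent functions below are stand-ins that only sleep and return a string; their names and the 50 ms delay are assumptions for illustration and do not call LLaVA, Tesseract, or FAISS.

```python
import asyncio


async def vision_agent(doc):
    """Stand-in for a chart-interpretation agent (simulated with a short sleep)."""
    await asyncio.sleep(0.05)
    return f"chart summary for {doc}"


async def ocr_agent(doc):
    """Stand-in for a PDF table-extraction agent."""
    await asyncio.sleep(0.05)
    return f"tables extracted from {doc}"


async def retrieval_agent(doc):
    """Stand-in for a dense-retrieval Q&A agent."""
    await asyncio.sleep(0.05)
    return f"passages retrieved for {doc}"


async def analyse_parallel(doc):
    """Run all three agents concurrently and collect their outputs by name."""
    chart, tables, passages = await asyncio.gather(
        vision_agent(doc), ocr_agent(doc), retrieval_agent(doc)
    )
    return {"vision": chart, "ocr": tables, "retrieval": passages}


async def analyse_sequential(doc):
    """Same work, one agent at a time, for comparison."""
    return {
        "vision": await vision_agent(doc),
        "ocr": await ocr_agent(doc),
        "retrieval": await retrieval_agent(doc),
    }
```

`asyncio.run(analyse_parallel("report.pdf"))` returns the same dictionary as the sequential version. When the agents are I/O-bound and take similar time, the concurrent run waits for roughly the slowest agent rather than the sum of all three.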
View on GitHub
Analysis Time
~70%
Lower than Sequential
Agents Running
3
In Parallel
03 // LLM Research
Custom LLM Fine-Tuning for Hinglish NLP
Fine-tuned Llama-3-8B on a curated dataset of 15,000+ code-switched Hinglish samples for e-commerce review moderation. Applied QLoRA (4-bit quantization) on consumer-grade hardware, achieving an 18% improvement in intent classification accuracy over the base model. Evaluated with ROUGE-1/2/L and BERTScore metrics.
Llama-3-8B Hugging Face QLoRA LoRA Pandas ROUGE Metrics BERTScore 4-bit Quantization
View on GitHub
Dataset Engineering
Curated and cleaned 15,000+ multilingual Hinglish samples: deduplication, label normalization, and stratified train/val/test splits ensuring class balance across code-switched sentence patterns.
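A minimal sketch of those three steps, assuming samples are `(text, label)` pairs and an 80/10/10 split. The function names, the normalisation rule, and the split ratios are hypothetical; the source does not state the exact procedure.

```python
import random
from collections import defaultdict


def normalise(text):
    """Lowercase and collapse whitespace so near-identical reviews compare equal."""
    return " ".join(text.lower().split())


def deduplicate(samples):
    """Keep the first occurrence of each normalised text and tidy its label."""
    seen, unique = set(), []
    for text, label in samples:
        key = normalise(text)
        if key not in seen:
            seen.add(key)
            unique.append((text, label.strip().lower()))  # label normalisation
    return unique


def stratified_split(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split per label so each class keeps roughly the same share in every split."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test
```

Deduplicating before splitting matters: if the same review lands in both train and test, the evaluation score is inflated.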
Key Result
+18% intent classification accuracy over base Llama-3-8B. Error analysis on low-confidence predictions directly identified dataset gaps, closing the loop between evaluation and data curation.
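For readers unfamiliar with the ROUGE-1 metric used in the evaluation, the sketch below computes it from scratch as clipped unigram overlap. It is a simplified version that tokenises on whitespace only; established ROUGE implementations add their own tokenisation and optional stemming, so their scores can differ.

```python
from collections import Counter


def rouge_1(reference, candidate):
    """Unigram-overlap precision, recall, and F1 between two strings."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For the reference "delivery bahut late tha" and the candidate "delivery late tha", three of the candidate's three tokens match, giving precision 1.0 and recall 0.75.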
15K+
Hinglish Samples
+18%
Accuracy Gain
4-bit
QLoRA Quant.
04 // Machine Learning
Credit Card Fraud Detection System
A production-ready fraud detection pipeline achieving 98.21% AUC on Kaggle's 284K transaction dataset with a 0.17% fraud rate. Engineered to handle extreme class imbalance in real-world financial data.
Python Scikit-Learn Imbalanced-learn SMOTE Pandas Matplotlib
Research Impact
The core challenge was the 584:1 class imbalance ratio. SMOTE oversampling, combined with optimizing for precision and recall rather than raw accuracy, lets the model reach high sensitivity without sacrificing specificity, a critical trade-off in financial fraud detection.
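Accuracy is misleading here because, at a 0.17% fraud rate, a model that labels every transaction as legitimate still scores about 99.83% accuracy. The sketch below shows the core idea behind SMOTE in plain Python: synthetic minority points are interpolated between a minority sample and one of its nearest minority neighbours. It is a simplified illustration with a hypothetical function name, not the imbalanced-learn implementation the project lists.

```python
import math
import random


def smote_like(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority points by interpolating towards a near neighbour."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic
```

Oversampling of this kind is normally applied to the training split only, so that validation and test sets keep the real class distribution.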
View on GitHub
AUC Score
98.21%
on 284K Transactions
Class Imbalance Ratio
584:1
Solved via SMOTE
05 // Healthcare Tech
Child Vaccination Management System
A production-ready system for healthcare providers and parents, compliant with India's National Immunization Schedule (NIS) 2025. It streamlines schedule tracking and notification workflows.
Python SQL Healthcare Data NIS 2025
Analytical Insight
Modeled real-world immunization schedules as a state machine, with each child as a node and each vaccine as a timed edge. This graph-theoretic approach enables bulk schedule generation and missed-dose detection at scale.
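Schedule generation and missed-dose detection can be sketched as date arithmetic over per-dose offsets. The dose names and day offsets below are placeholders invented for this example and are not the NIS schedule; the grace period is likewise an assumption, and the dictionary is a simplification of the graph model described above.

```python
from datetime import date, timedelta

# Placeholder offsets in days from birth. These are NOT the actual NIS values.
EXAMPLE_SCHEDULE = {"dose_a": 0, "dose_b": 42, "dose_c": 70, "dose_d": 98}


def build_schedule(birth_date, schedule=EXAMPLE_SCHEDULE):
    """Map each dose name to its due date for one child."""
    return {
        dose: birth_date + timedelta(days=offset)
        for dose, offset in schedule.items()
    }


def missed_doses(birth_date, given, today, grace_days=7, schedule=EXAMPLE_SCHEDULE):
    """Return doses whose due date plus a grace period passed without a record."""
    due = build_schedule(birth_date, schedule)
    return sorted(
        dose
        for dose, due_date in due.items()
        if dose not in given and today > due_date + timedelta(days=grace_days)
    )
```

Because each child's schedule depends only on a birth date, bulk generation is a loop over children calling `build_schedule`, and a daily job can call `missed_doses` to decide who needs a reminder.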
View on GitHub
06 // Time Series Analytics
Time Series Sales Forecasting
Transforms raw transactional data into daily and monthly sales series, visualizing trends and creating a baseline 6-month forecast to support inventory and planning decisions. Fully implemented in Python.
Python Pandas Matplotlib Time Series Kaggle
Analytical Insight
Applied seasonal decomposition to isolate trend, seasonality, and residual noise components. The 6-month forecast baseline serves as a reproducible benchmark for evaluating more complex ARIMA and Prophet models.
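One common choice for such a baseline is a seasonal naive forecast, where each future month repeats the value from the same month one year earlier. The source does not say which baseline the project uses, so the sketch below is an assumption, with hypothetical function names.

```python
def seasonal_naive_forecast(monthly_sales, horizon=6, season_length=12):
    """Forecast each future month as the value observed one season earlier."""
    if len(monthly_sales) < season_length:
        raise ValueError("need at least one full season of history")
    history = list(monthly_sales)
    forecast = []
    for _ in range(horizon):
        value = history[-season_length]
        forecast.append(value)
        history.append(value)  # lets horizons longer than one season keep cycling
    return forecast


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Scoring this baseline with `mean_absolute_error` on a held-out period gives the number that an ARIMA or Prophet model would need to beat to justify its extra complexity.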
View on GitHub
07 // Computer Vision
Facial Recognition Attendance System
Real-time attendance tracking via webcam using TensorFlow Lite and OpenCV, logging directly to CSV.
Python TensorFlow Lite OpenCV
Research Impact
Edge ML deployment: runs inference locally without cloud dependency, enabling offline use in low-connectivity educational settings.
View on GitHub
08 // Big Data
US Accidents Big Data Analysis
Analyzed 7.7 million traffic records to identify accident hotspots and peak risk times via geospatial modeling.
Python Geospatial Sampling
Scale
7.7M records processed with strategic sampling to balance computational cost and statistical representativeness.
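The sampling method is not specified above. One approach that suits a file too large to hold in memory is reservoir sampling, which draws a uniform random sample of fixed size in a single pass. The sketch below is an illustration under that assumption, not the project's code.

```python
import random


def reservoir_sample(records, k, seed=0):
    """Uniform random sample of k items from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record  # replace with probability k / (i + 1)
    return reservoir
```

Passing a lazy iterator, such as rows streamed from a CSV reader, keeps memory use proportional to `k` rather than to the full dataset.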
View on GitHub
09 // Longitudinal Study
World Population Analysis
Analyzed World Bank data from 1960 to 2023, tracking global growth trends across 63 years of demographic data.
Python Pandas World Bank API
Research Impact
Automated cleaning workflows ensure reproducible pipelines, essential for academic-grade longitudinal research.
View on GitHub

Skills &
Technologies

Languages
Python Java JavaScript SQL Bash
AI, ML & LLM
Llama-3 Mistral LangChain RAG Pipelines Multi-Agent Systems TensorFlow Lite OpenCV Hugging Face LoRA / QLoRA FAISS Scikit-Learn
Data Engineering
Pandas NumPy EDA Statistical Modeling Regression Time Series Forecasting Tableau
Systems & Infrastructure
Redis WebSocket Linux Git / GitHub REST APIs
Web & Design
Full-Stack Development HTML / CSS WordPress Figma

The Binary
Profile

+
Core Strengths
Current Competencies
End-to-End ML Pipelines
From raw data ingestion through EDA, feature engineering, model training, and evaluation using Python, Pandas, and Scikit-Learn.
Team Leadership under Pressure
Progressed from intern to Senior Team Lead within 4 months, coordinating multiple data science teams simultaneously.
Statistical Thinking
Deep comfort with regression, classification, imbalanced datasets, and time-series forecasting with an emphasis on valid inference over raw accuracy.
Rapid Certification & Self-Learning
50+ certifications from Google, Deloitte, Meta, Yale, and Cisco, demonstrating disciplined, continuous skill acquisition.
Computer Vision Applications
Deployed edge ML models with TensorFlow Lite and OpenCV for real-world attendance and recognition systems.
Research Methodology
Structured problem framing, reproducible workflows, and academic-grade documentation across all projects.
-
Future Areas of Mastery
PhD-Level Growth Targets
Deep Learning Theory
Bridging applied ML experience with rigorous mathematical foundations in neural architectures, backpropagation theory, and optimization landscapes.
Academic Research Writing
Developing the discipline of peer-reviewed publication: hypothesis formulation, literature synthesis, and structured scientific argumentation.
Large-Scale Distributed Computing
Moving beyond single-machine pandas workflows toward Spark, Dask, and cloud-native data processing for truly large datasets.
Advanced NLP and LLM Architecture
Building theoretical depth in transformer architectures, attention mechanisms, and fine-tuning strategies beyond surface-level API usage.
Causal Inference
Mastering the distinction between correlation and causation through Bayesian networks, instrumental variables, and do-calculus, critical for PhD-level research.
Domain Specialization
Converging broad interdisciplinary skills (healthcare, finance, social data) into a focused research niche for dissertation-level contribution.

Certifications &
Recognition

Data Science League: 2nd Place, All India
Google Advanced Data Analytics
Deloitte Data Analytics Simulation
Google Data Science Foundations
Forage Data Science Simulation
Google Cybersecurity Professional
Meta JavaScript Programming
Google UX Design Professional
Google Generative AI
Yale Introduction to Psychology
Nuts and Bolts of Machine Learning
Regression Analysis (Google)
LLM Fine-Tuning (Llama-3 / QLoRA)
Multi-Agent RAG Systems
HFT Alpha Generator
IoT Professional Training (BR Ambedkar Institute)
Google ML Crash Course
Microsoft Office Specialist: Excel (MOS)
Data Science Job Simulation (Forage)
NPTEL Design Thinking
Cisco Introduction to Cybersecurity
Google AI Essentials
Accenture Web Analytics
Google Fundamentals of Digital Marketing
Microsoft Excel Specialist
Full Stack Development (Udemy)
Figma High-Fidelity Prototypes
Tally Prime and GST
Google Linux and SQL
Investment Risk Management
Building Dynamic UI (Google)
2nd Place Data Science League (Uptoskills)
Fraud Detection: 98.21% AUC
Statistical Modeling & EDA
Time Series Forecasting
Computer Vision (OpenCV)
Google Professional Cybersecurity Certificate
Web Development (WordPress)
Python for Data Science (IBM)
SQL for Data Science (Coursera)
Generative AI with LLMs (DeepLearning.AI)
Crash Course on Python (Google)
Data Visualization with Tableau
Agile Project Management (Google)
Foundations of Project Management
Neural Networks and Deep Learning
Version Control with Git
Introduction to Cloud Computing
Prompt Engineering for ChatGPT

Let's build something
meaningful

Open to internships, full-time roles, and research collaborations. Drop a message below. I read every one.

GitHub LinkedIn View CV Download CV
Send a Message