
about
Hi, I'm Sinjoy, a second-year MS student in CSE at Penn State, where I focus on NLP, ML, and large-scale document analysis. My thesis aims to improve how LLMs assist in research and education. As a research assistant on the 'AI in Education' project under Dr. Suman Saha and Dr. Mahfuza Farooque, I am working on a multimodal recommendation system for microlearning based on students' performance data.
During my summer internship at Amazon Web Services on the Bedrock Agents team, I built an agentic customer ticket triaging system that reduced initial turnaround time from ~3 days to 5 minutes. As a student researcher at the Human Language Technologies Lab, led by Dr. Shomir Wilson, I improved PrivaSeer, a privacy policy search engine covering 3.1 million privacy policies, and worked on finding contradictions and inconsistencies in legal documents using LLMs.
Previously, I was a Sr. Research Engineer at Siemens Healthineers, where I worked on image and language processing applications. I developed an automated grading system for gamma camera images and led the development of an automated Failure Mode, Effects, and Criticality Analysis (FMECA) for improving product quality; this work resulted in two patent filings.
Check out my Paperstack and my GitHub profile!
intern experience

Software Development Engineer Intern - AI/ML
Amazon Web Services, Bellevue, WA
May 2025 - Aug 2025
- Developed an agentic AI system with AWS Bedrock Agents, Kendra and custom action Lambdas to automate customer ticket triaging actions such as classification, routing, status updates and context retrieval from CloudWatch logs and a Knowledge Base of service wikis.
- Deployed modular Infrastructure-as-Code (IaC) with support for multiple foundation models (Claude 3.7 Sonnet, Nova Pro).
- Leveraged AWS EventBridge, SQS, DynamoDB, S3, REST APIs and Slack SDK for real-time invocation, customer follow-ups for missing info and Slack notifications for SLA failures, reducing initial response time from 3 days to 5 minutes.
- Designed benchmark experiments and metrics for evaluation against ground truth and LLM-as-a-Judge, achieving 93.6% accuracy.
work experience

Senior Engineer - Research and Technology
Siemens Healthineers, Bangalore, KA
Mar 2023 - Aug 2024
- Fine-tuned a Siamese BERT network on query-result pairs for re-ranking search results, obtaining an NDCG@10 of 0.71.
- Led the deployment of a search API using FastAPI on AWS cloud, leveraging an on-premise LLM and Milvus VectorDB to serve RAG pipelines and low-code platforms like Power Apps and a Teams chatbot, saving $20,000 per annum in license costs.
- Designed a semi-supervised system for identifying failure modes from service tickets using Named Entity Recognition (NER), HDBSCAN clustering and generative labelling by Orca-2, leading to an invention disclosure.
- Created a Power BI dashboard for visualizing failure modes across the installed base and monitoring trends over time in KPIs such as service costs, parts replaced and downtime, leading to the redesign of a SPECT subcomponent.
- Developed a histopathology WSI segmentation model using DeepLabV3+ and ResNet50 (Dice 0.85, IoU 0.63). Optimized it using ONNX, pruning, quantization, normalization and background separation, achieving sub-15-minute inference.

Engineer - Research and Technology
Siemens Healthineers, Bangalore, KA
Dec 2021 - Feb 2023
- Implemented end-to-end Cause and Action phrase extraction from service tickets using RoBERTa and NER, increasing average ticket coverage in search results from 14% to 52% compared to the earlier POS-tagging model.
- Developed an unsupervised method leveraging Word2Vec embeddings and DBSCAN to cluster domain-specific words into a thesaurus, increasing clusters from 15 to 346 and enhancing full-text SQL search for PET/SPECT service tickets.
- Engineered a novel image feature set for a Random Forest classifier and an ROI-agnostic artifact segmentation model for automated grading of SPECT QC images, improving F1-score from 0.46 to 0.75 and resulting in a patent filing.
research experience

Research Assistant
AI in Education Project, Penn State
Aug 2025 - Present
- Developing multimodal recommendation systems for personalized microlearning based on student performance data.
- Developing interactive teaching agents to assist undergraduate students in understanding course materials and solving exercise problems.

Student Researcher
Human Language Technologies Lab, Penn State
Sep 2024 - Apr 2025
- Optimized visualization rendering latency for PrivaSeer, a privacy policy search engine containing ~3.1 million policies.
- Leveraged an on-premise Llama 3.1-8B-it to extract contradictory and inconsistent statements from privacy policies as JSON-formatted output.
- Designed output parsing and self-verification systems to filter incorrectly formatted outputs and reduce hallucinations.

Research Intern
Plant Vision Lab, University of Nebraska-Lincoln
Jul 2021 - Dec 2021
- Developed two methods to predict the onset of plant drought stress: the Dynamic Time Warping (DTW) algorithm on computed phenotypes, and a 1D-CNN for temporal stress propagation on hyperspectral images (F1: 0.98, mean soil water content (SWC) corr: -0.85), leading to a journal publication.

Research Intern
Sensordrops Networks Pvt. Ltd., IIT Kharagpur
Jul 2021 - Dec 2021
- Introduced a novel algorithm for Federated Learning on non-IID data that clusters on client data statistics, surpassing FedAvg accuracy by 2% and reducing aggregation time by 67% compared to clustering on local weights.
education

Master of Science in Computer Science and Engineering
Pennsylvania State University - University Park, PA
Aug 2024 - May 2026
- Courses: Machine Learning - Tools and Algorithms, Design and Analysis of Algorithms, Fundamentals of Computer Architecture

Bachelor of Technology in Electronics and Communication Engineering
Pennsylvania State University - University Park, PA
Oct 2016 - Sep 2020
- Courses: Engineering Math – I & II, Data Structures & Algorithms, Computer Organization & Architecture, Microprocessors, Analog & Digital Circuits, Control Theory, Digital Signal Processing and Communication Theory.
- Thesis: Environmental Monitoring using SFCW Radar for Minor Crack Detection
publications
patents filed
Method and System for improving Product Quality with modified and automated FMECA.
Daga S., Hurley R., Khan K., Morris B., Philip S., Saha S., TK Sowmya.
USPTO, 2025
Artifact Segmentation and/or Uniformity Assessment of a Gamma Camera
Daga S., Saha S., Crawford T. E., Khan K., Morris B.
USPTO, 2023
projects

A Medical Research Methodology Querying System based on Llama-3.2 trained with LoRA
This project explores a medical AI system based on Llama-3.2-1B-it, trained with LoRA, to assist researchers with planning methods, data subjects and experiments. We use the expert-labeled subset of PubMedQA, which includes 1,000 high-quality samples containing research questions and corresponding long answers with methodological insights.

ArXiv Paper Classification using CNN and GloVe
We address the challenge of accurately classifying research papers by using a CNN to efficiently extract hierarchical and local features from textual data. GloVe embeddings enhance the semantic representation of the input data, transforming words into dense vector representations enriched with pre-trained semantic knowledge.

Machine-Generated Text Attribution
Trained a BERT sequence classifier on a novel dataset created using 5 generative models (GPT-2-small, GPT-2-XL, Phi-2, Falcon-7B, Mistral-7B-it) prompted with text sourced from Wikipedia and the GSM8K math dataset, achieving an F1-score of 0.65, and studied the impact of input length, prompt domain, and LLM parameter size on attribution accuracy.
