Sinjoy Saha

Building ideas into reality!

MS CSE @ Penn State

State College, PA, USA

Email

Twitter

GitHub

Scholar

about

Sinjoy is a first-year Master's student in Computer Science and Engineering at the Pennsylvania State University. His coursework focuses on the design and analysis of algorithms, computer architecture, machine learning, computer vision and NLP.

He joined the Human Language Technologies Lab, led by Dr. Shomir Wilson, as a student researcher. He is working on improving PrivaSeer, a search engine covering ~3.1 million privacy policies. He also uses LLMs to analyze legal documents for contradictions and inconsistencies. His research interests broadly span Natural Language Processing, Machine Learning and AI for large-scale document analysis.

Prior to joining Penn State, Sinjoy worked as a Senior Research Engineer at Siemens Healthineers. As part of the Data Analytics team, he contributed to multiple projects in image and language processing. He designed and developed an automated grading system for gamma camera images, which led to a patent filing.

He has prior experience in fine-tuning existing LMs for downstream tasks such as entity recognition, clustering, retrieval and re-ranking for semantic search. He led the deployment of an on-prem search and summarization engine that recommends corrective actions to service engineers based on historical service tickets, which helped reduce downtime and save license costs. He contributed to automated analysis of failure modes across the installed base to monitor trends over time in KPIs such as service costs, parts replaced and downtime, leading to the redesign of a SPECT subcomponent.

He was previously a research intern at Sensordrops Networks (incubated at IIT Kharagpur), where he worked on improving federated learning for non-IID datasets, and at the Plant Vision Lab, University of Nebraska-Lincoln, where he worked on hyperspectral image analysis, which led to a journal publication.

Check out his GitHub profile!

work experience

Senior Engineer - Research and Technology

Siemens Healthineers, Bangalore, KA

Mar 2023 - Aug 2024

Fine-tuned a Siamese BERT network on query-result pairs for re-ranking search results, achieving an NDCG@10 of 0.71.
Led the deployment of a search API using FastAPI on AWS, leveraging an on-premise LLM and the Milvus vector database to serve RAG pipelines and low-code platforms such as Power Apps and a Teams chatbot, saving $20,000 per year in license costs.
Designed a semi-supervised system for identifying failure modes from service tickets using Named Entity Recognition (NER), HDBSCAN clustering and generative labelling by Orca-2, leading to an invention disclosure.
Created a Power BI dashboard for visualizing failure modes across the installed base and monitoring trends over time in KPIs such as service costs, parts replaced and downtime, leading to the redesign of a SPECT subcomponent.
Developed a histopathology WSI segmentation model using DeepLabV3+ and ResNet50 (Dice 0.85, IoU 0.63). Optimized it using ONNX, pruning, quantization, normalization and background separation, bringing inference time under 15 minutes.
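
For readers unfamiliar with the ranking metric quoted above, NDCG@10 can be computed roughly as in the minimal sketch below. The function names, the linear-gain formulation and the choice to build the ideal ranking from the retrieved items only are assumptions for illustration; this is not the evaluation code used in the project.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k graded relevance scores."""
    # Rank 0 is the top result, so the discount is log2(rank + 2).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of its ideal reordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` is the list of graded relevance labels in the order the re-ranker returned the results; a score of 1.0 means the results are already in ideal order.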

Engineer - Research and Technology

Siemens Healthineers, Bangalore, KA

Dec 2021 - Feb 2023

Implemented end-to-end Cause and Action phrase extraction from service tickets using RoBERTa and NER, increasing average ticket coverage in search results from 14% to 52% compared with the earlier POS-tagging model.
Developed an unsupervised method leveraging Word2Vec embeddings and DBSCAN to cluster domain-specific words into a thesaurus, increasing clusters from 15 to 346 and enhancing full-text SQL search for PET/SPECT service tickets.
Engineered a novel image feature set for a Random Forest classifier and an ROI-agnostic artifact segmentation model for automated grading of SPECT QC images, improving the F1-score from 0.46 to 0.75 and resulting in a patent filing.

Programmer Analyst Trainee

Cognizant Technology Solutions, Kolkata, WB

Nov 2020 - Sep 2021

Developed and executed automated testing scripts using Java Selenium, TestNG and JUnit to ensure the quality of APIs and web apps.
Performed requirement analysis and test design for web/mobile apps and worked with developers to resolve defects.

research experience

Graduate Researcher

Human Language Technologies Lab, Penn State

Sep 2024 - Present

Optimized visualization rendering latency for PrivaSeer, a privacy policy search engine containing ~3.1 million policies.
Used an on-premise Llama 3.1-8B-it model to extract contradictory and inconsistent statements from privacy policies, returned as JSON.
Designed output parsing and self-verification systems to filter incorrectly formatted outputs and reduce hallucinations.
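
A minimal sketch of what such output filtering might look like is shown below. The JSON schema (`statement_a`, `statement_b`, `explanation`), the function names and the verbatim-quote check are all hypothetical; the actual self-verification used in the project is not described here and may well differ (for example, by re-prompting the model).

```python
import json

# Hypothetical schema: each record pairs two conflicting statements.
REQUIRED_KEYS = {"statement_a", "statement_b", "explanation"}


def parse_model_output(raw):
    """Return the well-formed records in a model response, or None if unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not valid JSON at all
    if not isinstance(data, list):
        return None  # expected a JSON array of records
    return [
        item
        for item in data
        if isinstance(item, dict)
        and REQUIRED_KEYS.issubset(item)
        and all(isinstance(item[k], str) and item[k].strip() for k in REQUIRED_KEYS)
    ]


def is_grounded(record, policy_text):
    """Cheap hallucination check: both statements must appear verbatim in the policy."""
    return record["statement_a"] in policy_text and record["statement_b"] in policy_text
```

Records that fail `parse_model_output` are dropped for formatting reasons, while records that fail `is_grounded` quote text the policy never contained.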

Research Intern

Plant Vision Lab, University of Nebraska-Lincoln

Jul 2021 - Dec 2021

Developed two methods to predict the onset of plant drought stress: the Dynamic Time Warping (DTW) algorithm on computed phenotypes, and a 1D-CNN for temporal stress propagation on hyperspectral images (F1: 0.98; correlation with mean soil water content (SWC): -0.85), leading to a journal publication.
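
As background, the textbook form of DTW between two one-dimensional series can be sketched as follows. The absolute-difference cost and the function name are assumptions for illustration; the published method operates on computed plant phenotypes and is not reproduced here.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW with |x - y| as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b holds
                cost[i][j - 1],      # b advances, a holds
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because the alignment may stretch either series, two sequences that follow the same trend at different speeds get a small distance even when their lengths differ.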

Research Intern

Sensordrops Networks Pvt. Ltd., IIT Kharagpur

Jul 2021 - Dec 2021

Introduced a novel algorithm for Federated Learning on non-IID data that clusters on client data statistics, surpassing FedAvg accuracy by 2% and reducing aggregation time by 67% compared with clustering on local weights.

education

Master of Science in Computer Science and Engineering

Pennsylvania State University - University Park, PA

Aug 2024 - May 2026

Courses: Machine Learning - Tools and Algorithms, Design and Analysis of Algorithms, Fundamentals of Computer Architecture

Bachelor of Technology in Electronics and Communication Engineering

Pennsylvania State University - University Park, PA

Oct 2016 - Sep 2020

Courses: Engineering Math – I & II, Data Structures & Algorithms, Computer Organization & Architecture, Microprocessors, Analog & Digital Circuits, Control Theory, Digital Signal Processing and Communication Theory.
Thesis: Environmental Monitoring using SFCW Radar for Minor Crack Detection

publications

(and patent under review)

Artifact Segmentation and/or Uniformity Assessment of a Gamma Camera

Daga, S., Saha, S., Crawford, T. E., Khan, K. and Morris, B.

USPTO, 2023

Drought stress prediction and propagation using time series modeling on multimodal plant image sequences

Das Choudhury, S., Saha, S., Samal, A., Mazis, A. and Awada, T.

Frontiers in Plant Science, 14, p.1003150, 2023

The paper introduces two novel algorithms for predicting and propagating drought stress in plants using image sequences captured by cameras in two modalities, i.e., visible light and hyperspectral.

Smart Health Monitoring System for Temperature, Blood Oxygen Saturation, and Heart Rate Sensing with Embedded Processing and Transmission Using IoT Platform

Basu, S., Saha, S., Pandit, S. and Barman, S.

Computational Intelligence in Pattern Recognition: Proceedings of CIPR 2019 (pp. 81-91). Springer Singapore. Part of book series: Advances in Intelligent Systems and Computing (AISC, vol. 999), 2019

projects

Machine-Generated Text Attribution

Trained a BERT sequence classifier on a novel dataset created with 5 generative models (GPT-2-small, GPT-2XL, Phi-2, Falcon-7B, Mistral-7B-it) prompted with text sourced from Wikipedia and the GSM8K math dataset, achieving an F1-score of 0.65, and studied the impact of input length, prompt domain and LLM parameter size on attribution accuracy.

Tiktokenizer.js

Developed a static site in pure JavaScript for visualizing the GPT-2 Byte-Pair Encoding (BPE) tokenization process, replicating the official OpenAI API.
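
The kind of merge sequence such a visualizer steps through can be illustrated with the toy sketch below. It is an assumption-laden simplification: it learns merges greedily from the input text itself, whereas GPT-2 applies a fixed, pre-learned merge table after regex pre-tokenization. The function names are hypothetical and the code is in Python rather than the project's JavaScript.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common adjacent token pair, or None if there are no pairs."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def bpe_steps(text, num_merges):
    """Yield the token list after each merge, starting from single UTF-8 bytes."""
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    yield tokens
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
        yield tokens
```

Iterating over `bpe_steps(text, n)` gives one token list per merge step, which is the sequence a step-by-step visualization would render.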

Federated Learning - MNIST

Implemented the Federated Averaging (FedAvg) algorithm on the MNIST dataset using TensorFlow and Keras.
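
The aggregation step at the core of FedAvg is a sample-count-weighted average of client weights. The sketch below shows that step on flat weight vectors in plain Python; the function name and the flat-list representation are assumptions for illustration, and the project itself uses TensorFlow and Keras models.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client weight vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    if total == 0:
        raise ValueError("at least one client must hold data")
    dim = len(client_weights[0])
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]
```

For example, two clients with weights `[1.0, 2.0]` and `[3.0, 4.0]` holding 1 and 3 samples produce a global vector pulled toward the second client.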

Brain Tumor Segmentation from MRI using PSPNet

Demonstrated the efficacy of PSPNet for segmenting brain tumors from MRI data, achieving a Dice score of 0.66 and an IoU of 0.552.
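
For reference, the two overlap metrics quoted above can be computed for a single pair of binary masks as in the minimal sketch below. The flat 0/1 list representation, the function name and the empty-mask convention are assumptions for illustration; reported scores are typically averaged over many images.

```python
def dice_and_iou(pred, target):
    """Compute (Dice, IoU) for two equal-length binary masks given as 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    pred_sum, target_sum = sum(pred), sum(target)
    union = pred_sum + target_sum - intersection
    if union == 0:
        # Both masks are empty: treated here as a perfect match (a convention).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou
```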
