Sujal Charak · Applied AI / ML Engineer

I build production ML systems.

From model behavior to on-device deployment.

I hold an MS in Computer Science from Boston University and work at the intersection of reinforcement learning, LLM safety, and intelligence that runs where the cloud can’t. I’m the sole inventor of a patent-pending offline RL engine written in C and a published author. Before grad school, I spent two years shipping large-scale data platforms on Azure.

See what I’ve built · Download résumé

pri-sim · tabular Q-learning · live

EP 0 ε 1.00 BEST — exploring

A tiny cousin of my patent-pending RL engine, learning live in your browser: the dot starts with zero knowledge and discovers the shortest route to the square by trial and error. Green marks cells it has learned to value; watch BEST fall as it improves. Tap the grid to drop walls and make it re-learn.
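The demo described above is textbook tabular Q-learning. Here is a minimal Python sketch of the same loop under stated assumptions: a 5x5 grid, fixed start and goal cells, optional walls, a reward of -1 per move, and ε-greedy exploration that decays each episode. The grid size, rewards, hyperparameters, and function names are illustrative choices, not pri-sim's actual settings, and this is not PRI's engine.

```python
import random
from collections import defaultdict

# Illustrative gridworld; size, rewards, and hyperparameters are assumptions.
ROWS, COLS = 5, 5
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action, walls):
    """Move one cell; bumping into a wall or the grid edge leaves the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in walls:
        r, c = state
    # Every move costs -1, so maximizing return means minimizing path length.
    return (r, c), -1.0, (r, c) == GOAL


def greedy(q, state):
    """Best-valued action, breaking ties randomly so untried actions still get picked."""
    best = max(q[state])
    return random.choice([a for a, v in enumerate(q[state]) if v == best])


def train(episodes=500, alpha=0.5, gamma=0.95, eps=1.0, eps_min=0.05, decay=0.99,
          walls=frozenset(), max_steps=200, seed=0):
    random.seed(seed)
    q = defaultdict(lambda: [0.0] * len(ACTIONS))  # all zeros = "no knowledge"
    best = None  # fewest steps to the goal in any episode so far
    for _ in range(episodes):
        s, steps, done = START, 0, False
        while not done and steps < max_steps:
            # Epsilon-greedy: explore with probability eps, otherwise exploit.
            if random.random() < eps:
                a = random.randrange(len(ACTIONS))
            else:
                a = greedy(q, s)
            s2, r, done = step(s, ACTIONS[a], walls)
            # Q-learning target: reward plus discounted best next value (none at the goal).
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s, steps = s2, steps + 1
        if done and (best is None or steps < best):
            best = steps
        eps = max(eps_min, eps * decay)  # explore less as the table fills in
    return q, best


def greedy_path_length(q, walls=frozenset(), max_steps=200):
    """Steps the learned greedy policy takes from START to GOAL (capped at max_steps)."""
    s, steps = START, 0
    while s != GOAL and steps < max_steps:
        a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
        s, _, _ = step(s, ACTIONS[a], walls)
        steps += 1
    return steps
```

Passing the same `walls` set to `train` and `greedy_path_length` mimics dropping walls on the grid. Because every true value here is negative, the zero-initialized table is optimistic, which nudges the agent toward actions it has not tried yet.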

Patent‑pending

Sole inventor, PRI offline-RL system · India 2025

~7,400 LoC

RL engine in C · 22 modules, fully offline

45 TB

Automated duplicate detection across datasets

Published

Author · IJCER Journal

Red‑Team

LLM & RAG security research · AIXT

01 / SELECTED WORK

Systems I’ve designed and shipped.

A mix of independent research, open-source security work, and team engineering, weighted toward reinforcement learning, LLM safety, and resource-constrained intelligence.

PROJECT 01 Patent Pending · Sole Inventor

PRI: Persistent Reinforcement Intelligence

Offline RL engine for OS resource optimization · Apr 2025 – Present

A fully offline reinforcement-learning engine written in C on Linux that ingests live procfs telemetry to autonomously optimize OS resource usage on aging hardware.

Tabular Q-learning over a discretized state space covering CPU, RAM pressure, memory trend, and system load.
Crash-safe persistence across reboots via an atomic A/B slot mechanism for policy storage.
Safety-aware action layer with behavioral process classification, reversible interventions, and automatic rollback.

Provisional patent filed (India, 2025). The architecture and methodology are confidential, but I’m happy to walk through the details over email. By requesting access, you agree to treat anything shared as confidential under NDA terms.
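PRI's storage format is confidential, so the following is only a generic illustration of an A/B-slot scheme like the one described above, written in Python for brevity. Each save goes to the slot that does not hold the newest valid copy, tagged with a sequence number and a checksum, and is fsync'd before an atomic rename. On load, the newest slot whose checksum verifies wins, so a crash mid-write leaves the previous policy intact. The file names, JSON layout, and function names are hypothetical.

```python
import hashlib
import json
import os

# Hypothetical slot files; not PRI's real layout.
SLOTS = ("policy_a.json", "policy_b.json")


def _read_slot(path):
    """Return (seq, policy) if the slot exists and its checksum verifies, else None."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            record = json.load(f)
        body = json.dumps(record["policy"], sort_keys=True).encode()
        if hashlib.sha256(body).hexdigest() != record["sha256"]:
            return None  # corrupted slot: ignore it
        return record["seq"], record["policy"]
    except (OSError, ValueError, KeyError):
        return None  # missing file or unparseable (e.g. torn) write


def load_policy(directory):
    """Pick the valid slot with the highest sequence number; (0, None) if none."""
    valid = [r for r in (_read_slot(os.path.join(directory, s)) for s in SLOTS) if r]
    if not valid:
        return 0, None
    return max(valid, key=lambda r: r[0])


def save_policy(directory, policy):
    """Write to the slot that does not hold the newest valid copy; return the new seq."""
    seq, _ = load_policy(directory)
    # Alternating slots by sequence number means the last good copy is never overwritten.
    target = os.path.join(directory, SLOTS[(seq + 1) % 2])
    body = json.dumps(policy, sort_keys=True).encode()
    record = {"seq": seq + 1, "sha256": hashlib.sha256(body).hexdigest(), "policy": policy}
    tmp = target + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())  # data is on disk before the rename
    os.replace(tmp, target)  # atomic rename on POSIX filesystems
    # A fully durable version would also fsync the containing directory here.
    return seq + 1
```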

C · Linux · Reinforcement Learning Systems · Q-Learning

Work in progress: patent architecture figures on a MacBook (left) beside the live PRI agent log optimizing a 2013 Dell Latitude E6230 (right), running fully offline.

Privacy Radar dashboard in light mode: the Privacy AI panel summarizing a live capture session.

PROJECT 02 Team Project

Privacy Radar

Privacy AI subsystem for a desktop network monitor · 2025

The Privacy AI subsystem for an Electron desktop network monitor that turns live packet telemetry into plain-language privacy insight, without ever exposing raw payloads to the model.

Three-layer design: a local data and context layer, a PrivacyAgent engine that builds a structured prompt, and a UI layer that renders the parsed result.
Single Gemini egress returning a constrained report (summary, insights, risk, recommended actions) over a secure IPC boundary.
Guardrails by construction: the model sees aggregated metadata only, with enforced output fields and a deterministic local fallback.
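A minimal sketch of the "guardrails by construction" pattern described above, written in Python rather than the project's Electron stack: only aggregated metadata reaches the prompt, the model's reply must parse into exactly the four report fields, and anything else falls back to a deterministic local report. The packet record keys (`domain`, `tls`, `payload`), the `call_model` callable, the field names, and the fallback heuristics are all hypothetical; this is not the project's PrivacyAgent code.

```python
import json
from collections import Counter

# Hypothetical report schema mirroring summary / insights / risk / recommended actions.
REQUIRED_FIELDS = {"summary": str, "insights": list, "risk": str, "recommended_actions": list}
RISK_LEVELS = {"low", "medium", "high"}


def aggregate(packets):
    """Reduce packet records to metadata counts; payloads never leave this function."""
    return {
        "packets": len(packets),
        "top_domains": Counter(p["domain"] for p in packets).most_common(5),
        "unencrypted": sum(1 for p in packets if not p["tls"]),
    }


def build_prompt(stats):
    return ("Summarize the privacy posture of this network capture. Reply with JSON "
            "containing exactly: summary, insights, risk (low|medium|high), "
            "recommended_actions.\n" + json.dumps(stats))


def parse_report(text):
    """Accept the model output only if it has exactly the expected fields and types."""
    try:
        report = json.loads(text)
    except ValueError:
        return None
    if not isinstance(report, dict) or set(report) != set(REQUIRED_FIELDS):
        return None
    if any(not isinstance(report[k], t) for k, t in REQUIRED_FIELDS.items()):
        return None
    return report if report["risk"] in RISK_LEVELS else None


def local_fallback(stats):
    """Deterministic report used when the model is unavailable or off-spec."""
    return {
        "summary": f"{stats['packets']} packets observed, {stats['unencrypted']} unencrypted.",
        "insights": [f"Most contacted: {d} ({n})" for d, n in stats["top_domains"]],
        "risk": "high" if stats["unencrypted"] else "low",
        "recommended_actions": ["Review unencrypted connections"] if stats["unencrypted"] else [],
    }


def privacy_report(packets, call_model):
    stats = aggregate(packets)
    try:
        report = parse_report(call_model(build_prompt(stats)))
    except Exception:
        report = None  # network or API failure is treated like an invalid reply
    return report or local_fallback(stats)
```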

Gemini · Electron · AI Safety · Privacy

View pull request · AI architecture

PROJECT 03 Security Research

AIXT — AI eXposure & Trust

RAG exfiltration probe & trust-boundary research · 2025

A security-research toolkit that probes where large language models over-trust the data they retrieve. The opening probe asked one sharp question: can attacker-controlled input pull restricted data out of a RAG pipeline without ever breaking IAM?

Canary-grounded harness: a high-entropy synthetic token is planted in a restricted document, then attacker-style queries run through the RAG pipeline while every response is auto-scanned for exact or partial leakage.
Clean negative baseline: no canary surfaced and access controls held (direct fetches returned HTTP 403), with request metadata and responses logged for full reproducibility.
Strategic pivot: that hardened result redirected AIXT toward the real soft spots (embedding and context poisoning, retriever mis-ranking, and over-trust between retrieval and LLM reasoning).

All testing used researcher-owned accounts and synthetic data only, following responsible-disclosure practices. Sensitive victim artifacts are excluded from the public repo.
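The canary harness described above reduces to two pieces: a high-entropy token that cannot appear by chance, and a scanner that flags exact or partial reproduction of it in model output. A minimal sketch, assuming a 32-hex-character random part and treating any 8-character slice of it as a partial leak; the prefix, window size, and function names are illustrative assumptions, not AIXT's actual harness.

```python
import re
import secrets


def make_canary(prefix="AIXT-CANARY"):
    """High-entropy synthetic token to plant in a restricted document."""
    return f"{prefix}-{secrets.token_hex(16)}"  # 128 random bits as 32 hex chars


def scan_for_leak(response, canary, window=8):
    """Classify a model response as 'exact', 'partial', or 'none' leakage.

    Partial leakage means some `window`-length slice of the random part shows up,
    e.g. after the model truncated or reformatted the token.
    """
    if canary in response:
        return "exact"
    secret = canary.rsplit("-", 1)[1].lower()
    # Keep only hex characters so spaces or dashes inserted by the model
    # cannot hide a leaked fragment.
    normalized = re.sub(r"[^0-9a-f]", "", response.lower())
    for i in range(len(secret) - window + 1):
        if secret[i:i + window] in normalized:
            return "partial"
    return "none"
```

A larger `window` lowers the chance of a false positive from ordinary hex-like text, at the cost of missing shorter fragments.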

RAG Security · Red-Teaming · Trust Boundaries · Embeddings · Gemini

View repo

Concept architecture: an adversarial prompt enters the guardrail enclosure, where the raw LLM core is gated by an evidence validator and a RAG testing module; leaked or hallucinated content is stopped, and only verified output passes.

PROJECT 04 Open Source

Portfolio Voice Agent

Voice-first quantitative assistant · 2025

Two-process design splitting the OpenAI reasoning core from a continuous voice loop (wake-word detection, Vosk speech recognition, human-sounding TTS).
Reasoning you can hear: a 1,000-scenario Monte Carlo simulation plus NewsAPI sentiment becomes a spoken Buy / Hold / Sell call, and ffmpeg renders the simulations as videos.
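One way a Monte Carlo result and a sentiment score can be folded into a single spoken call: simulate many price paths, take the mean simulated return, nudge it by sentiment, and threshold. This sketch assumes geometric Brownian motion with annualized drift and volatility, a sentiment score already scaled to [-1, 1], and made-up thresholds and weights; none of these details are taken from the agent's code.

```python
import math
import random
import statistics


def simulate_returns(mu, sigma, days, n_paths=1000, seed=None):
    """Total returns over `days` trading days for n_paths geometric Brownian motion paths.

    mu and sigma are assumed annualized drift and volatility (e.g. fitted from history).
    """
    rng = random.Random(seed)
    dt = 1 / 252  # one trading day as a fraction of a year
    returns = []
    for _ in range(n_paths):
        log_growth = 0.0
        for _ in range(days):
            log_growth += (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * rng.gauss(0, 1)
        returns.append(math.exp(log_growth) - 1)
    return returns


def recommend(returns, sentiment, threshold=0.02, weight=0.01):
    """Map the mean simulated return plus a small sentiment nudge to Buy / Hold / Sell."""
    score = statistics.mean(returns) + weight * sentiment
    if score > threshold:
        return "Buy"
    if score < -threshold:
        return "Sell"
    return "Hold"
```

Keeping the sentiment weight small lets headlines tip a borderline call without overriding what the simulation says.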

Voice · Vosk · Monte Carlo · OpenAI

View repo · Watch the sim · Hear it rap

PROJECT 05 Academic

Flight Optimization

Cost-optimal routing with graph theory

12 international airports modeled as a NetworkX graph, with Dijkstra’s algorithm efficiently finding the cheapest-fare path.
Bellman-Ford cross-checks the costs and detects any negative-weight cycles, and the optimal route is plotted on a map with Basemap.
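Both checks described above can be sketched with only the standard library instead of NetworkX: Dijkstra with a binary heap for the cheapest fare, and Bellman-Ford as an independent cross-check that also reports negative-weight cycles. The airport codes and fares below are invented for illustration; the project itself used NetworkX over 12 international airports.

```python
import heapq


def dijkstra(graph, src, dst):
    """Cheapest path from src to dst; graph maps node -> {neighbor: fare}, fares >= 0."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break  # first pop of dst is final
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


def bellman_ford(graph, src):
    """Distances from src; raises ValueError if a reachable negative-weight cycle exists."""
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    dist = {n: float("inf") for n in nodes}
    dist[src] = 0
    edges = [(u, v, w) for u, nbrs in graph.items() for v, w in nbrs.items()]
    for _ in range(len(nodes) - 1):  # |V| - 1 rounds of edge relaxation
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:  # any further improvement implies a negative cycle
        if dist[u] + w < dist[v]:
            raise ValueError("negative-weight cycle detected")
    return dist


# Hypothetical fares (USD) between a few airports, for illustration only.
FARES = {
    "BOS": {"JFK": 120, "LHR": 540, "ORD": 150},
    "JFK": {"LHR": 400, "CDG": 450},
    "ORD": {"LHR": 480},
    "LHR": {"BOM": 600, "CDG": 90},
    "CDG": {"BOM": 520},
}
print(dijkstra(FARES, "BOS", "BOM"))  # (1090, ['BOS', 'JFK', 'CDG', 'BOM'])
```

Real fares are non-negative, so Dijkstra's answer is already correct; the Bellman-Ford pass mainly guards against bad input data.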

Dijkstra · Bellman-Ford · NetworkX

Read the deck · View notebook

PROJECT 06 Academic

Precision Farming

ML crop recommendation for Indian agriculture · CS 667

Seven soil and climate inputs (N, P, K, temperature, humidity, pH, rainfall) mapped to the best of 22 candidate crops.
Six classifiers benchmarked (Decision Tree, Naive Bayes, SVM, Logistic Regression, Random Forest, XGBoost), reaching about 99% accuracy.

scikit-learn · Random Forest · XGBoost

Read the deck · View notebook

02 / EXPERIENCE

Where I’ve operated at scale.

Oct 2021 – Oct 2023

Data Engineer

Tata Consultancy Services · Client: ABN AMRO · Mumbai, India

Two years owning production data infrastructure for a major European bank, building and operating the Azure pipelines behind its analytics and regulatory reporting under strict SLA, reliability, and data-quality requirements.

Owned and operated large-scale Azure data platforms (Azure Data Factory, Databricks), building PySpark and Delta Lake ETL pipelines that processed tens of terabytes across raw and curated layers.
Designed ingestion and transformation workflows that turned raw source feeds into structured, query-ready curated datasets, enabling scalable, reliable access for downstream analytics and reporting teams.
Tuned PySpark workloads through partitioning, caching, and query optimization, reducing data-processing time by up to 80% and improving Azure compute efficiency by ~20% through resource-aware pipeline design.
Automated duplicate detection and cleanup across ~45 TB of datasets, raising data accuracy and consistency for downstream analytics, and shipped production-grade pipelines with monitoring, logging, and failure recovery for high availability.
Worked in Agile sprints owning pipeline-development user stories, and supported production through an on-call rotation: validation, schema-change backfills, and stakeholder issue resolution.
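The bullets above don't say how duplicates were detected. One common approach, sketched here in plain Python rather than PySpark, is to normalize each record's key columns, hash the normalized form, and keep the first record per hash. The normalization rules, column names, and function names are assumptions for illustration. For exact duplicates in PySpark, `DataFrame.dropDuplicates(subset)` covers the same ground.

```python
import hashlib
import json


def record_fingerprint(record, key_columns):
    """Stable hash of the key columns after case and whitespace normalization."""
    normalized = {
        col: " ".join(str(record.get(col, "")).split()).lower() for col in key_columns
    }
    payload = json.dumps(normalized, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def deduplicate(records, key_columns):
    """Return (unique_records, duplicate_records), keeping the first occurrence."""
    seen = set()
    unique, duplicates = [], []
    for rec in records:
        fp = record_fingerprint(rec, key_columns)
        if fp in seen:
            duplicates.append(rec)
        else:
            seen.add(fp)
            unique.append(rec)
    return unique, duplicates
```

Hashing keeps the memory cost per record to one fixed-size digest, which is what makes the same idea workable at large scale when the `seen` set is replaced by a distributed join or group-by.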

Azure Data Factory · Databricks · Delta Lake · PySpark · SQL · CI/CD · On-call

Jan 2025 – May 2025

Teaching Assistant

Boston University · Information Structures with Python · Boston, MA

Graded labs and reviewed Python code quality for graduate students.
Ran weekly support sessions on debugging, data structures, and algorithmic fundamentals.

Feb 2021

Publication

Int’l Journal of Computational Engineering Research (IJCER) · Vol. 11, Issue 2, pp. 06–10

“Polarity Testing and Analysis of Tweets in Twitter using Tweepy” — a Twitter sentiment-analysis pipeline that ingests live tweets via the Tweepy API and classifies them into positive, negative, and neutral polarity with Python NLP.

Built the end-to-end pipeline: tweet collection through the Tweepy API, text preprocessing, and polarity classification over real-time Twitter data.
Co-authored with R. Pandya, S. Moolya, R. Dahivalkar, and H. Gadhadara.

Python · Tweepy · NLP · Sentiment Analysis

Read paper →

03 / ABOUT

Sujal Charak presenting his work — Boston University, MA

I’m most interested in intelligence that has to be reliable, safe, and cheap to run. Those are the systems where elegance shows up as fewer failures.

My work sits at the intersection of applied ML and systems engineering. With PRI, I designed an offline reinforcement-learning engine in C that makes autonomous, reversible decisions on live hardware, with no cloud or network dependency and with safety guarantees built into the action layer.

Before graduate school I spent two years as a data engineer at TCS, operating large-scale Azure pipelines for a major bank under real SLA and on-call pressure. That production discipline of monitoring, recovery, and cost-awareness carries directly into how I build and evaluate ML systems.

Lately I’ve focused on LLM safety and red-teaming, model evaluation, and on-device intelligence. I care about model behavior under adversarial conditions and about systems that degrade gracefully when things break.

04 / CAPABILITIES

The toolkit.

/ AI & Machine Learning

Generative AI · RAG Systems · LLM Evaluation & Guardrails · Conversational & Agentic AI · AI Security & Red-Teaming · Model Safety · Reinforcement Learning · Q-Learning · Prompt Engineering · Offline AI · XGBoost · Feature Engineering · Monte Carlo · Robustness Analysis · scikit-learn · Supervised Classification · Random Forest · Model Evaluation · LLM APIs

/ Programming & Data

Python · C · SQL · TypeScript · PySpark · Databricks · Delta Lake · Azure Data Factory · ETL Pipelines · TensorFlow · LangChain.js · Node.js · React · pandas · NumPy

/ Systems & Tooling

Linux · Git · Electron · OS Optimization · DevOps · CI/CD · Monitoring & Recovery · Agile · IntelliJ IDEA · Google Cloud · REST APIs

EDUCATION & RECOGNITION

Sep 2024 – Jan 2026

Boston University

Master of Science, Computer Science (STEM)

Jul 2017 – Jul 2021

Mumbai University

B.E., Electronics & Telecommunications

05 / CONTACT

Let’s build something meaningful.

I’m always open to thoughtful conversations, new opportunities, and ideas worth exploring. Feel free to reach out by email or connect with me through the links below.

sujalcharak20@gmail.com · github.com/SujalCharak · LinkedIn · @CharakSujal · @PRI_OSLayer · Download résumé