Sujal Charak · Applied AI / ML Engineer

I build production ML systems.

From model behavior to on-device deployment.

MS in Computer Science from Boston University, working at the intersection of reinforcement learning, LLM safety, and intelligence that runs where the cloud can’t. Sole inventor of a patent-pending offline RL engine written in C, and a published author. Before grad school, I spent two years shipping large-scale data platforms on Azure.

pri-sim · tabular Q-learning · live
EP 0 ε 1.00 BEST exploring
A tiny cousin of my patent-pending RL engine, learning live in your browser: the dot starts with zero knowledge and discovers the shortest route to the square by trial and error. Green marks cells it has learned to value; watch BEST fall as it improves. Tap the grid to drop walls and make it re-learn.
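For readers who want the mechanics behind the demo, the following is a minimal sketch of tabular Q-learning on a small grid. It is a generic textbook version, not the demo's source and not the PRI engine; the grid size, rewards, decay schedule, and function names are illustrative assumptions.

```python
import random

# Hypothetical 5x5 grid; start, goal, rewards, and hyperparameters are
# illustrative assumptions, not values taken from the demo.
SIZE = 5
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def step(state, action, walls):
    """Apply an action; hitting a wall or the edge leaves the state unchanged."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in walls:
        r, c = state
    done = (r, c) == GOAL
    return (r, c), (10.0 if done else -1.0), done


def train(walls=frozenset(), episodes=500, alpha=0.5, gamma=0.95, seed=0):
    """Learn a Q-table with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}  # (state, action_index) -> value; missing entries count as 0.0
    n = len(ACTIONS)
    for ep in range(episodes):
        eps = max(0.05, 1.0 - ep / (episodes * 0.6))  # decays from 1.00
        state = START
        for _ in range(200):  # cap on episode length
            if rng.random() < eps:
                a = rng.randrange(n)  # explore
            else:
                a = max(range(n), key=lambda i: q.get((state, i), 0.0))  # exploit
            nxt, reward, done = step(state, ACTIONS[a], walls)
            best_next = 0.0 if done else max(q.get((nxt, i), 0.0) for i in range(n))
            old = q.get((state, a), 0.0)
            # Standard Q-learning update toward reward + discounted best next value
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path_length(q, walls=frozenset(), limit=100):
    """Follow the learned greedy policy; return the step count, or None if lost."""
    state, steps = START, 0
    while state != GOAL and steps < limit:
        a = max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))
        state, _, _ = step(state, ACTIONS[a], walls)
        steps += 1
    return steps if state == GOAL else None
```

Once learning has converged on an empty grid, `greedy_path_length(train())` should equal the Manhattan distance between the two corners. Passing a set of `(row, col)` cells as `walls` mimics dropping walls and forcing a re-learn.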
Patent‑pending
Sole inventor, PRI offline-RL system · India 2025
~7,400 LoC
RL engine in C · 22 modules, fully offline
45 TB
Automated duplicate detection across datasets
Published
Author · IJCER Journal
Red‑Team
LLM & RAG security research · AIXT
01 / SELECTED WORK

Systems I’ve designed and shipped.

A mix of independent research, open-source security work, and team engineering, weighted toward reinforcement learning, LLM safety, and resource-constrained intelligence.

PROJECT 01 Patent Pending · Sole Inventor

PRI: Persistent Reinforcement Intelligence

Offline RL engine for OS resource optimization · Apr 2025 – Present

A fully offline reinforcement-learning engine written in C on Linux that ingests live procfs telemetry to autonomously optimize OS resource usage on aging hardware.

  • Tabular Q-learning over a discretized state space covering CPU, RAM pressure, memory trend, and system load.
  • Crash-safe persistence across reboots via an atomic A/B slot mechanism for policy storage.
  • Safety-aware action layer with behavioral process classification, reversible interventions, and automatic rollback.
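As background on the persistence bullet, here is a generic A/B slot pattern sketched in Python. It is a textbook scheme written under stated assumptions and does not describe PRI's confidential mechanism; the file names, header layout, and checksum choice are all hypothetical.

```python
import json
import os
import zlib

SLOTS = ("policy_a.bin", "policy_b.bin")  # hypothetical file names


def _read_slot(path):
    """Return (sequence, payload) if the slot exists and its checksum matches."""
    try:
        with open(path, "rb") as f:
            header, _, payload = f.read().partition(b"\n")
        meta = json.loads(header)
        if zlib.crc32(payload) != meta["crc32"]:
            return None
        return meta["seq"], payload
    except (OSError, ValueError, KeyError, TypeError):
        return None  # missing, truncated, or corrupted slot


def load(directory):
    """Pick the valid slot with the highest sequence number."""
    best = None
    for name in SLOTS:
        slot = _read_slot(os.path.join(directory, name))
        if slot and (best is None or slot[0] > best[0]):
            best = slot
    return best  # None means no usable policy yet


def save(directory, payload):
    """Overwrite the older (or invalid) slot so the newest good copy survives a crash."""
    seqs = []
    for name in SLOTS:
        slot = _read_slot(os.path.join(directory, name))
        seqs.append(slot[0] if slot else -1)
    target = 0 if seqs[0] <= seqs[1] else 1
    header = json.dumps({"seq": max(seqs) + 1, "crc32": zlib.crc32(payload)}).encode()
    final = os.path.join(directory, SLOTS[target])
    tmp = final + ".tmp"
    with open(tmp, "wb") as f:
        f.write(header + b"\n" + payload)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the rename
    os.replace(tmp, final)  # atomic rename on POSIX filesystems
    return SLOTS[target]
```

The point of the design is that a crash during `save` can damage at most the slot being written, so `load` still finds the previous policy in the other slot. A hardened version would likely also fsync the containing directory after the rename.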

Provisional patent filed (India, 2025). Architecture and methodology are confidential; I’m happy to walk through the details over email. By requesting access you agree to treat anything shared as confidential under NDA terms.

C · Linux · Reinforcement Learning · Systems · Q-Learning
Work in progress: patent architecture figures on a MacBook (left) beside the live PRI agent log on a 2013 Dell Latitude E6230 (right), running fully offline.
PrivacyRadar: Network Monitor (live product)
01 Light mode · live session + Privacy AI
02 Dark mode · global traffic map
03 AI overview · per-app summary, risk, actions
PROJECT 02 Team Project

Privacy Radar

Privacy AI subsystem for a desktop network monitor · 2025

The Privacy AI subsystem for an Electron desktop network monitor that turns live packet telemetry into plain-language privacy insight, without ever exposing raw payloads to the model.

  • Three-layer design: a local data and context layer, a PrivacyAgent engine that builds a structured prompt, and a UI layer that renders the parsed result.
  • Single Gemini egress returning a constrained report (summary, insights, risk, recommended actions) over a secure IPC boundary.
  • Guardrails by construction: the model sees aggregated metadata only, with enforced output fields and a deterministic local fallback.
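The "enforced output fields with a deterministic local fallback" idea can be sketched as follows. This is an illustrative Python version, not the project's code; the field names, the risk scale, and the `unencrypted_flows` metric are assumptions made for the example.

```python
import json

# Hypothetical schema mirroring the four report parts named above.
REQUIRED_FIELDS = {"summary": str, "insights": list, "risk": str, "actions": list}
RISK_LEVELS = {"low", "medium", "high"}  # hypothetical scale


def local_fallback(stats):
    """Deterministic report built only from aggregated per-app metadata."""
    flagged = sorted(app for app, s in stats.items() if s["unencrypted_flows"] > 0)
    risk = "high" if len(flagged) > 2 else "medium" if flagged else "low"
    return {
        "summary": f"{len(stats)} apps observed, {len(flagged)} with unencrypted flows.",
        "insights": [f"{app} sent unencrypted traffic." for app in flagged],
        "risk": risk,
        "actions": ["Review flagged apps."] if flagged else [],
    }


def parse_report(raw_model_output, stats):
    """Accept the model's reply only if it matches the schema; otherwise fall back."""
    try:
        report = json.loads(raw_model_output)
    except (TypeError, ValueError):
        return local_fallback(stats)
    if not isinstance(report, dict) or set(report) != set(REQUIRED_FIELDS):
        return local_fallback(stats)
    for field, kind in REQUIRED_FIELDS.items():
        if not isinstance(report[field], kind):
            return local_fallback(stats)
    if report["risk"] not in RISK_LEVELS:
        return local_fallback(stats)
    return report
```

Because the fallback is a pure function of the aggregated stats, the UI always has something well-formed to render, even when the model reply is malformed or the network call fails.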
Gemini · Electron · AI Safety · Privacy
PROJECT 03 Security Research

AIXT — AI eXposure & Trust

RAG exfiltration probe & trust-boundary research · 2025

A security-research toolkit that probes where large language models over-trust the data they retrieve. The opening probe asked one sharp question: can attacker-controlled input pull restricted data out of a RAG pipeline without ever breaking IAM?

  • Canary-grounded harness: a high-entropy synthetic token is planted in a restricted document, then attacker-style queries run through the RAG pipeline while every response is auto-scanned for exact or partial leakage.
  • Clean negative baseline: no canary surfaced and access controls held. Direct fetches returned HTTP 403, and request metadata and responses were logged for full reproducibility.
  • Strategic pivot: that hardened result redirected AIXT toward the real soft spots: embedding and context poisoning, retriever mis-ranking, and over-trust between retrieval and LLM reasoning.
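A canary check of the kind described in the first bullet might look like the sketch below. It is a simplified illustration, not the AIXT harness; the `CANARY-` prefix, the token length, and the 12-character fragment threshold are assumptions.

```python
import secrets


def make_canary(prefix="CANARY", n_bytes=16):
    """High-entropy synthetic token; the prefix is a hypothetical convention."""
    return f"{prefix}-{secrets.token_hex(n_bytes)}"


def scan(response, canary, min_fragment=12):
    """Classify a response as 'exact', 'partial', or 'none' leakage."""
    text = response.lower()
    token = canary.lower()
    if token in text:
        return "exact"
    # Partial leak: any contiguous canary fragment of at least min_fragment chars
    for i in range(len(token) - min_fragment + 1):
        if token[i:i + min_fragment] in text:
            return "partial"
    return "none"
```

In use, the token from `make_canary()` would be planted in the restricted document and `scan` run over every pipeline response. A shorter `min_fragment` catches smaller leaks at the cost of more false positives.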

All testing used researcher-owned accounts and synthetic data only, following responsible-disclosure practices. Sensitive victim artifacts are excluded from the public repo.

RAG Security · Red-Teaming · Trust Boundaries · Embeddings · Gemini
Concept architecture: an adversarial prompt enters the guardrail enclosure; the raw LLM core is gated by an evidence validator and RAG testing module, so leaked or hallucinated content is stopped and only verified output passes.
PROJECT 04 Open Source

Portfolio Voice Agent

Voice-first quantitative assistant · 2025
  • Two-process design splitting the OpenAI reasoning core from a continuous voice loop (wake-word, Vosk speech recognition, human-sounding TTS).
  • Reasoning you can hear: 1,000-scenario Monte Carlo plus NewsAPI sentiment become a spoken Buy / Hold / Sell call, with ffmpeg simulation videos.
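To make the Monte Carlo step concrete, here is a rough sketch of how 1,000 simulated scenarios and a sentiment score could be blended into a Buy / Hold / Sell call. The return model, the 80/20 weighting, and the cutoffs are hypothetical choices for illustration, not the agent's actual logic.

```python
import math
import random


def simulate_terminal_returns(mu, sigma, days=30, scenarios=1000, seed=0):
    """Terminal return of each scenario under i.i.d. normal daily log-returns."""
    rng = random.Random(seed)
    out = []
    for _ in range(scenarios):
        log_ret = sum(rng.gauss(mu, sigma) for _ in range(days))
        out.append(math.exp(log_ret) - 1.0)
    return out


def recommend(mu, sigma, sentiment, buy_at=0.6, sell_at=0.4):
    """Blend P(gain) with a sentiment score in [-1, 1]; weights and cutoffs are hypothetical."""
    returns = simulate_terminal_returns(mu, sigma)
    p_gain = sum(r > 0 for r in returns) / len(returns)
    score = 0.8 * p_gain + 0.2 * (sentiment + 1) / 2  # rescale sentiment to [0, 1]
    if score >= buy_at:
        return "Buy"
    if score <= sell_at:
        return "Sell"
    return "Hold"
```

Here `mu` and `sigma` stand for the mean and standard deviation of daily log-returns, which a real system would presumably estimate from price history.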
PROJECT 05 Academic

Flight Optimization

Cost-optimal routing with graph theory
  • 12 international airports modeled as a NetworkX graph, with Dijkstra’s algorithm finding the cheapest-fare path.
  • Bellman-Ford validates costs and flags negative-weight cycles, with the optimal route drawn on a Basemap map.
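The two algorithms can be illustrated with a standard-library sketch. The project itself uses NetworkX; this version avoids the dependency so it runs anywhere, and the airport codes and fares are made-up sample data.

```python
import heapq

# Hypothetical one-way fares in USD between a few airport codes.
FARES = {
    "BOS": {"LHR": 420, "CDG": 480},
    "LHR": {"DXB": 350, "BOM": 520},
    "CDG": {"BOM": 430},
    "DXB": {"BOM": 120},
    "BOM": {},
}


def dijkstra(graph, source, target):
    """Cheapest path assuming non-negative weights; returns (cost, path)."""
    heap = [(0, source, [source])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, fare in graph[node].items():
            if nxt not in settled:
                heapq.heappush(heap, (cost + fare, nxt, path + [nxt]))
    return float("inf"), []


def bellman_ford(graph, source):
    """Distances from source; raises ValueError on a reachable negative-weight cycle."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    edges = [(u, v, w) for u, nbrs in graph.items() for v, w in nbrs.items()]
    for _ in range(len(graph) - 1):  # relax every edge |V| - 1 times
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:  # any further improvement implies a negative cycle
        if dist[u] + w < dist[v]:
            raise ValueError("negative-weight cycle detected")
    return dist
```

On the sample data, `dijkstra(FARES, "BOS", "BOM")` returns `(890, ["BOS", "LHR", "DXB", "BOM"])`, and `bellman_ford(FARES, "BOS")["BOM"]` agrees on the cost of 890, which is the cross-check the second bullet describes.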
Dijkstra · Bellman-Ford · NetworkX
PROJECT 06 Academic

Precision Farming

ML crop recommendation for Indian agriculture · CS 667
  • Seven soil and climate inputs (N, P, K, temperature, humidity, pH, rainfall) mapped to the best of 22 candidate crops.
  • Six classifiers benchmarked (Decision Tree, Naive Bayes, SVM, Logistic Regression, Random Forest, XGBoost), reaching about 99% accuracy.
scikit-learn · Random Forest · XGBoost
02 / EXPERIENCE

Where I’ve operated at scale.

Oct 2021 – Oct 2023
Data Engineer
Tata Consultancy Services · Client: ABN AMRO · Mumbai, India

Two years owning production data infrastructure for a major European bank, building and operating the Azure pipelines behind its analytics and regulatory reporting under strict SLA, reliability, and data-quality requirements.

  • Owned and operated large-scale Azure data platforms (Azure Data Factory, Databricks), building PySpark and Delta Lake ETL pipelines that processed tens of terabytes across raw and curated layers.
  • Designed ingestion and transformation workflows that turned raw source feeds into structured, query-ready curated datasets, enabling scalable, reliable access for downstream analytics and reporting teams.
  • Tuned PySpark workloads through partitioning, caching, and query optimization, reducing data-processing time by up to 80% and improving Azure compute efficiency by ~20% through resource-aware pipeline design.
  • Automated duplicate detection and cleanup across ~45 TB of datasets, raising data accuracy and consistency for downstream analytics. Shipped production-grade pipelines with monitoring, logging, and failure recovery for high availability.
  • Worked in Agile sprints, owning pipeline-development user stories, and supported production on a rotating on-call schedule: validation, schema-change backfills, and stakeholder issue resolution.
Azure Data Factory · Databricks · Delta Lake · PySpark · SQL · CI/CD · On-call
Jan 2025 – May 2025
Teaching Assistant
Boston University · Information Structures with Python · Boston, MA
  • Graded labs and reviewed Python code quality for graduate students.
  • Ran weekly support sessions on debugging, data structures, and algorithmic fundamentals.
Feb 2021
Publication
Int’l Journal of Computational Engineering Research (IJCER) · Vol. 11, Issue 2, pp. 06–10

“Polarity Testing and Analysis of Tweets in Twitter using Tweepy” — a Twitter sentiment-analysis pipeline that ingests live tweets via the Tweepy API and classifies them into positive, negative, and neutral polarity with Python NLP.

  • Built the end-to-end pipeline: tweet collection through the Tweepy API, text preprocessing, and polarity classification over real-time Twitter data.
  • Co-authored with R. Pandya, S. Moolya, R. Dahivalkar, and H. Gadhadara.
Python · Tweepy · NLP · Sentiment Analysis
03 / ABOUT
Sujal Charak presenting his work
Boston University, MA

I’m most interested in intelligence that has to be reliable, safe, and cheap to run. Those are the systems where elegance shows up as fewer failures.

My work sits at the intersection of applied ML and systems engineering. With PRI, I designed an offline reinforcement-learning engine in C that makes autonomous, reversible decisions on live hardware, with no cloud, no network, and safety guarantees baked into the action layer.

Before graduate school I spent two years as a data engineer at TCS, operating large-scale Azure pipelines for a major bank under real SLA and on-call pressure. That production discipline of monitoring, recovery, and cost-awareness carries directly into how I build and evaluate ML systems.

Lately I’ve focused on LLM safety and red-teaming, model evaluation, and on-device intelligence. I care about model behavior under adversarial conditions and about systems that degrade gracefully when things break.

04 / CAPABILITIES

The toolkit.

/ AI & Machine Learning

Generative AI · RAG Systems · LLM Evaluation & Guardrails · Conversational & Agentic AI · AI Security & Red-Teaming · Model Safety · Reinforcement Learning · Q-Learning · Prompt Engineering · Offline AI · XGBoost · Feature Engineering · Monte Carlo · Robustness Analysis · scikit-learn · Supervised Classification · Random Forest · Model Evaluation · LLM APIs

/ Programming & Data

Python · C · SQL · TypeScript · PySpark · Databricks · Delta Lake · Azure Data Factory · ETL Pipelines · TensorFlow · LangChain.js · Node.js · React · pandas · NumPy

/ Systems & Tooling

Linux · Git · Electron · OS Optimization · DevOps · CI/CD · Monitoring & Recovery · Agile · IntelliJ IDEA · Google Cloud · REST APIs
EDUCATION & RECOGNITION
Sep 2024 – Jan 2026

Boston University

Master of Science, Computer Science · STEM
Jul 2017 – Jul 2021

Mumbai University

B.E., Electronics & Telecommunications
05 / CONTACT

Let’s build something meaningful.

I’m always open to thoughtful conversations, new opportunities, and ideas worth exploring. Feel free to reach out by email or connect with me through the links below.