Deepak Soni

Deepak Soni

AI Architect - AI Centre of Excellence

Oracle EMEA | Campanillas, Spain

Senior Architect with more than 20 years of experience designing and deploying high-performance AI/ML, HPC, CAE, and GPU-accelerated cloud infrastructure across global industries

Get In Touch

About Me

Senior Architect with more than 20 years of experience designing and deploying high-performance AI/ML, HPC, CAE, and GPU-accelerated cloud infrastructure across global industries. Proven success leading complex architecture engagements, cloud migrations, and GenAI platform enablement for research, enterprise, and hybrid environments.

Known for driving scalable innovation through deep technical expertise, cross-functional collaboration, and a customer-first mindset. Demonstrated ability to deliver infrastructure solutions aligned with business goals, technical compliance, and operational excellence across OCI, AWS, GCP and Azure ecosystems.

Currently serving as AI Architect at Oracle's Centre of Excellence in Campanillas, Spain, specializing in enterprise reference architectures, performance baselines, and guardrails for GPU-accelerated GenAI/HPC on OCI.

Core Expertise

AI/ML Infrastructure HPC Platforms GPU Computing GenAI Platforms Automotive CAE NVIDIA Ecosystem

🚀 Recent Open Source Work

Production-ready frameworks and benchmarks for distributed AI/ML training, inference, and computer vision on Oracle Cloud Infrastructure

⭐ LATEST

ARBM: Agentic AI Benchmarking

Comprehensive 15-track benchmark suite for evaluating LLM agentic workflow capabilities including planning, tool orchestration, self-healing, and context persistence.

Key Results:

Agent Persistence: 95% loop completion
Self-Healing: 100% error recovery
Overall Score: 85% agentic capability

15 Benchmark Tracks:

  • Planning, Tool Use, Execution, Validation
  • Context Retention, Error Recovery, JSON Output

Tech Stack:

Llama-3 Nemotron Mixtral vLLM OCI
Agentic Workflows 15 Tracks
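The self-healing track above (100% error recovery) boils down to a retry-and-repair loop. A minimal sketch follows; the function names and the JSON-fence repair are illustrative examples, not the actual ARBM harness:

```python
import json

def self_healing_call(tool, args, repair, max_retries=3):
    """Call a tool and repair the arguments on failure (illustrative only).

    `tool` raises on bad input; `repair` maps (args, error) -> fixed args.
    Returns (result, attempts) so a benchmark can score recovery rate.
    """
    last_err = None
    for attempt in range(1, max_retries + 1):
        try:
            return tool(args), attempt
        except Exception as err:  # a real harness would narrow this
            last_err = err
            args = repair(args, err)
    raise RuntimeError(f"unrecovered after {max_retries} attempts: {last_err}")

# Toy tool: expects valid JSON, a common failure mode in agentic pipelines.
def parse_json_tool(payload):
    return json.loads(payload)

def strip_fences(payload, _err):
    # Typical repair: the model wrapped its JSON in markdown fences.
    return payload.strip().removeprefix("```json").removesuffix("```").strip()

result, attempts = self_healing_call(
    parse_json_tool, '```json\n{"ok": true}\n```', strip_fences
)
```

A benchmark can then report recovery rate as the fraction of calls that return within the retry budget.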
🆕 NEW

LLM Quantization Benchmark

Comprehensive benchmarking framework for evaluating LLM inference across quantization methods (AWQ, GPTQ, GGUF, FP8) with throughput, latency, memory, and quality metrics.

Key Results:

AWQ 4-bit: 73% faster vs FP16
Memory: 75% reduction with INT4
Quality: 98% retained accuracy

Methods Compared:

  • AWQ, GPTQ, GGUF, FP8, AQLM
  • 7B to 70B models tested

Tech Stack:

vLLM AWQ GPTQ GGUF K8s
Quantization 5+ Methods
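A back-of-envelope check of the INT4 memory figure above; real deployments carry extra overhead for quantization scales, zero-points, and the KV cache, so treat this as an approximation:

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory (GB), ignoring KV cache and activations."""
    return n_params * bits_per_weight / 8 / 1e9

params_7b = 7e9
fp16 = weight_memory_gb(params_7b, 16)  # ~14 GB
int4 = weight_memory_gb(params_7b, 4)   # ~3.5 GB
reduction = 1 - int4 / fp16             # 0.75, matching the ~75% figure
```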
🕐 RECENT

LLM Observability Stack v2.0

Production-ready observability framework for monitoring LLM inference workloads on Kubernetes with NVIDIA GPUs, combining centralized logging with infrastructure validation testing.

Key Capabilities:

Documents: 248,805+ indexed logs
Validation: 11 infrastructure tests
Dashboards: 4 pre-built Kibana views

Stack Components:

  • ELK Stack (Elasticsearch, Logstash, Kibana, Filebeat)
  • Pytest validation suite, vLLM inference

Tech Stack:

ELK 8.12 vLLM Pytest OKE A10 GPUs
LLM Monitoring 96GB VRAM

Reasoning Model Benchmarking

Comprehensive benchmarking framework for reasoning-first LLMs like NVIDIA Nemotron-3-Nano, analyzing hybrid Mamba-Transformer MoE architectures on OCI GPU infrastructure.

Key Results:

Throughput: 1,390 tokens/sec peak
GSM8K: 60% math reasoning accuracy
HumanEval: 40% code generation

12 Benchmarks:

  • 7 Custom Tracks (MoE, TP, Context, Concurrency)
  • 5 Industry Standards (MMLU, GSM8K, HumanEval, MT-Bench)

Tech Stack:

Nemotron Mamba vLLM OKE OCI
Reasoning Analysis 12 Benchmarks

RAG Evaluation Framework

Comprehensive framework for evaluating Retrieval-Augmented Generation (RAG) systems, measuring retrieval quality, generation accuracy, and end-to-end performance on OCI.

Evaluation Metrics:

Retrieval: Precision, Recall, MRR
Generation: Faithfulness, Relevance
Latency: E2E response time

Tech Stack:

RAG Vector DB LLM OCI
RAG Pipeline Quality Metrics
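The retrieval metrics listed above (Precision, Recall, MRR) reduce to a few lines of Python. This is a generic sketch of the standard definitions, not the framework's own implementation:

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Precision@k and Recall@k for one query's ranked results."""
    topk = retrieved[:k]
    hits = sum(1 for doc in topk if doc in relevant)
    return hits / k, hits / len(relevant)

def mrr(ranked_lists, relevant_sets):
    """Mean Reciprocal Rank over a batch of queries."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1 / rank
                break
    return total / len(ranked_lists)

p, r = precision_recall_at_k(["d3", "d1", "d9"], {"d1", "d2"}, k=3)  # p=1/3, r=1/2
score = mrr([["d3", "d1"], ["d2", "d5"]], [{"d1"}, {"d2"}])          # (1/2 + 1)/2
```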
🆕 NEW

Speculative Decoding Framework

Comprehensive performance evaluation of Speculative Decoding techniques for LLM inference acceleration, comparing draft-target model configurations on OCI GPU infrastructure.

Key Benchmarks:

Speedup: Token generation acceleration
Acceptance Rate: Draft token verification
Trade-offs: Quality vs speed analysis

Tech Stack:

Speculative Decoding vLLM Draft Models OCI
Inference Speedup Deep Analysis
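As a rough model of the speedup/acceptance-rate trade-off above: if each draft token is accepted independently with probability alpha, a draft length of k yields the following expected tokens per target-model pass. This is the standard i.i.d. approximation; real acceptance is correlated and draft-model cost eats into the gain:

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass,
    with draft length k and per-token acceptance probability alpha."""
    if alpha == 1.0:
        return k + 1
    return (1 - alpha ** (k + 1)) / (1 - alpha)

# With alpha=0.8 and 4 draft tokens, each target pass yields ~3.36 tokens,
# an upper bound on speedup before subtracting draft-model overhead.
gain = expected_tokens_per_step(0.8, 4)
```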
🆕 NEW

MoE Inferencing Benchmark

Comprehensive benchmark framework for Mixture of Experts (MoE) model inference on OCI, analyzing expert routing efficiency, throughput scaling, and memory utilization patterns.

Key Focus Areas:

Expert Routing: Load balancing analysis
Throughput: Token generation benchmarks
Memory: Sparse activation patterns

Tech Stack:

MoE Models vLLM Kubernetes OCI
Expert Analysis Performance Metrics

Distributed LLM Training Benchmark

Comprehensive benchmark framework comparing PyTorch DDP, FSDP, and DeepSpeed ZeRO-2/ZeRO-3 for distributed LLM training on Oracle Kubernetes Engine (OKE) with NVIDIA GPUs.

Key Results (4 NVIDIA A10 GPUs):

Best Throughput: ZeRO-2 at 18,147 tokens/sec
Best Memory: ZeRO-3 at 9.67 GB VRAM
Best Scaling: 41.2% efficiency

Challenges Solved:

  • ✅ DeepSpeed configuration bugs (string vs int)
  • ✅ Kubernetes pod results collection
  • ✅ NCCL networking for A10/A100/H100/H200
  • ✅ Worker RANK computation from K8s index

Tech Stack:

PyTorch DeepSpeed Kubernetes OCI NVIDIA GPUs
1,000+ lines Python 4,055 lines docs
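Scaling efficiency as reported above is measured throughput relative to ideal linear scaling. The single-GPU baseline below is a hypothetical number chosen only to illustrate how a 41.2% figure arises:

```python
def scaling_efficiency(single_gpu_tps: float, multi_gpu_tps: float, n_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved across n_gpus."""
    return multi_gpu_tps / (single_gpu_tps * n_gpus)

# Illustrative: if 1 GPU sustains 11,000 tok/s and 4 GPUs sustain 18,147 tok/s,
# efficiency = 18,147 / 44,000, i.e. about 41.2%.
eff = scaling_efficiency(11_000, 18_147, 4)
```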

Mistral-7B QLoRA Fine-tuning

Production-ready implementation of Mistral-7B-Instruct fine-tuning using QLoRA with 4-bit quantization for efficient training on consumer/cloud GPUs.

Key Features:

4-bit quantization - Reduced memory footprint
QLoRA - Efficient parameter updates
Single GPU - Consumer hardware compatible

Use Cases:

  • Custom domain adaptation (legal, medical, financial)
  • Instruction following for specific tasks
  • Cost-effective LLM customization
  • Research and experimentation

Tech Stack:

Mistral-7B QLoRA bitsandbytes PEFT HuggingFace
Memory efficient Production ready
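To see why QLoRA is memory-efficient on the update side, count the trainable parameters LoRA adds. The dimensions below are illustrative round numbers (Mistral's GQA k/v projections are in fact smaller than d_model x d_model):

```python
def lora_trainable_params(d_model: int, r: int, n_target_matrices: int) -> int:
    """Parameters added by LoRA: each adapted d_model x d_model projection
    gains two low-rank factors, A (d_model x r) and B (r x d_model)."""
    return n_target_matrices * 2 * d_model * r

# Illustrative 7B-class setup: d_model=4096, rank 16, q/k/v/o over 32 layers.
added = lora_trainable_params(4096, 16, 4 * 32)   # ~16.8M parameters
fraction = added / 7_000_000_000                  # well under 1% of weights
```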

YOLO + Triton Inference

High-performance YOLOv8 object detection deployment using NVIDIA Triton Inference Server on Oracle Kubernetes Engine with TensorRT optimization.

Performance Highlights:

TensorRT - GPU-accelerated inference
Triton - Production-grade serving
Kubernetes - Scalable deployment

Applications:

  • Real-time object detection and tracking
  • Computer vision pipelines
  • Edge-to-cloud deployment patterns
  • Benchmarking inference performance

Tech Stack:

YOLOv8 Triton TensorRT OKE NVIDIA GPU
Low latency Auto-scaling

NVIDIA Nsight Systems Profiling

Deep-dive GPU profiling framework using NVIDIA Nsight Systems to analyze CUDA kernels, NVTX markers, and NCCL communication patterns across distributed training strategies.

Key Insights (2 NVIDIA A10 GPUs):

JIT Compilation: DeepSpeed fused_adam overhead
NCCL Analysis: AllReduce vs AllGather patterns
Timeline: GPU utilization visualization

Profiling Coverage:

  • DDP, FSDP, ZeRO-2, ZeRO-3 strategies
  • CUDA memory allocation tracking
  • CPU-GPU synchronization analysis
  • Communication bottleneck detection

Tech Stack:

Nsight Systems CUDA NVTX NCCL DeepSpeed
GPU Profiling Timeline Analysis

LLM Inference Benchmarking

Comprehensive framework for benchmarking vLLM vs NVIDIA Triton vs HuggingFace TGI inference servers on Kubernetes with NVIDIA Nsight Systems GPU profiling.

Key Results (NVIDIA A10 - Mistral-7B):

Peak Throughput: TGI at 8.07 req/s
Token Rate: vLLM at 412 tok/s
GPU Utilization: vLLM at 99% SM

Framework Features:

  • Side-by-side inference server comparison
  • Nsight Systems CUDA kernel profiling
  • Kubernetes-native deployment manifests
  • A10, A100, H100, H200, B200 GPU support

Tech Stack:

vLLM Triton TGI Kubernetes Nsight
3 Inference Servers Performance Analysis

IBM Fusion HCI LLM Benchmarking

Reusable benchmarking framework for evaluating LLM inference server performance on OpenShift clusters with GPU acceleration using IBM Fusion HCI and NVIDIA A100 MIG GPUs.

Key Results (A100 MIG 20GB):

vLLM: 560.71 tok/s (3.4x faster)
Latency: 2543ms P50 (vLLM)
Triton: 404.72 tok/s

Framework Features:

  • ACM ManifestWork templates for managed clusters
  • Universal benchmark client for all backends
  • Visualization tools for performance analysis
  • Complete GPU troubleshooting docs

Tech Stack:

OpenShift IBM Fusion HCI vLLM Red Hat ACM A100 GPU
OpenShift Native 3 Inference Engines

NVIDIA cuOpt EV Fleet Optimization

Complete framework for deploying NVIDIA cuOpt on OCI for electric vehicle (EV) fleet optimization, with GPU-accelerated route optimization running 10-100x faster than CPU solvers.

Key Results (4x A10 GPUs):

Success Rate: 100% across 17 scenarios
Throughput: 150+ vehicles/min at scale
Cost Savings: 15-25% delivery reduction

Use Cases:

  • Last-mile delivery optimization (24-77s)
  • EV charging station routing
  • Real-time fleet dispatch (<15s)
  • Enterprise 500+ vehicle operations

Tech Stack:

NVIDIA cuOpt OCI OKE NVIDIA NIM A10 GPU
10-100x Faster Route Optimization
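For intuition on what the route optimizer is accelerating, here is a greedy nearest-neighbor baseline for a single vehicle. This is only an illustrative CPU heuristic; cuOpt solves the full constrained vehicle-routing problem on the GPU:

```python
import math

def nearest_neighbor_route(depot, stops):
    """Greedy tour from the depot: always drive to the closest unvisited stop.
    A weak baseline that GPU VRP solvers like cuOpt substantially beat."""
    route, current, remaining = [], depot, list(stops)
    while remaining:
        nxt = min(remaining, key=lambda s: math.dist(current, s))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    return route

route = nearest_neighbor_route((0, 0), [(5, 5), (1, 0), (2, 2)])
# Visits the closest stop first: (1, 0), then (2, 2), then (5, 5)
```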

LLM Observability Stack

Complete Prometheus + Grafana observability stack for monitoring GPU clusters, vLLM inference, and LLM training workloads on Oracle Kubernetes Engine (OKE).

Key Features:

GPU Metrics: DCGM Exporter for NVIDIA GPUs
vLLM Metrics: Latency, throughput, KV cache
Training: Loss curves, gradient norms

Dashboards Included:

  • Cluster Management Home (unified view)
  • GPU Cluster Overview & GPU Health Alerts
  • vLLM Inference & Training Cluster
  • OKE Cluster Overview

Tech Stack:

Prometheus Grafana DCGM OKE AlertManager
7 Dashboards 30+ Alert Rules

LLM Serving Benchmark

Comprehensive benchmarking framework for evaluating LLM serving performance comparing vLLM, TGI, and NVIDIA NIM on Kubernetes with detailed latency and throughput analysis.

Key Metrics:

Latency: P50, P95, P99, TTFT, TPOT
Throughput: Tokens/sec, Requests/sec
Resources: GPU memory, utilization

Framework Features:

  • Side-by-side inference server comparison
  • Configurable concurrency levels
  • Kubernetes deployment manifests
  • A10, A100, H100 GPU support

Tech Stack:

vLLM TGI NIM Kubernetes NVIDIA GPU
3 Inference Servers Performance Analysis
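The latency metrics above (P50/P95/P99, TTFT, TPOT) can be computed from raw request logs. Below is a generic sketch using a simple nearest-rank percentile, not the benchmark's own code:

```python
def percentile(samples, p):
    """Nearest-rank percentile (a simple variant used for quick reports)."""
    ordered = sorted(samples)
    idx = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[idx]

def tpot(e2e_latency_s: float, ttft_s: float, output_tokens: int) -> float:
    """Time Per Output Token: decode time spread over tokens after the first."""
    return (e2e_latency_s - ttft_s) / max(output_tokens - 1, 1)

# Illustrative end-to-end latencies (seconds) for ten requests.
latencies = [0.9, 1.1, 1.0, 2.5, 1.2, 1.3, 1.1, 1.0, 3.0, 1.4]
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
per_token = tpot(e2e_latency_s=2.0, ttft_s=0.25, output_tokens=128)
```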

MoE Training Parallelism Framework

Comprehensive framework for benchmarking distributed Mixture-of-Experts (MoE) training using Expert Parallelism (EP) and hybrid EP+Data Parallelism strategies on Oracle Kubernetes Engine with NVIDIA GPUs.

Key Results (4 NVIDIA A10 GPUs):

Hybrid EP=2,DP=2: 8.77x speedup (96,592 tok/s)
Memory Reduction: 56% with Expert Parallelism
Compute/Comm: 34.75x efficient ratio

Benchmark Tracks:

  • Expert routing & load balancing analysis
  • NCCL AlltoAll communication profiling
  • EP vs DP vs Hybrid scaling comparison
  • Auxiliary loss tuning (CV=0.04 optimal)

Tech Stack:

PyTorch NCCL Expert Parallelism Kubernetes OCI
5 Benchmark Tracks Expert Parallelism
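The load-balancing analysis above scores expert routing with the coefficient of variation (CV) of tokens per expert; lower is better, and the auxiliary loss is tuned to drive it down. A minimal version, with illustrative token counts:

```python
from statistics import mean, pstdev

def load_balance_cv(tokens_per_expert):
    """Coefficient of variation of tokens routed to each expert:
    0 means perfectly balanced routing, higher means hot experts."""
    return pstdev(tokens_per_expert) / mean(tokens_per_expert)

balanced = load_balance_cv([1000, 980, 1020, 1000])  # nearly uniform routing
skewed = load_balance_cv([2500, 500, 500, 500])      # one hot expert
```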

LLM Training Parallelism Guide

Practical strategy guide for selecting LLM training parallelism approaches, comparing DDP, Pipeline, Tensor Parallelism, and hybrid strategies with detailed NCCL communication pattern analysis.

Key Results (4 NVIDIA A10 GPUs):

DDP (4 GPU): 52,847 tok/s (54% scaling)
NCCL Analysis: AllReduce, Send/Recv, AllGather
Hybrid PP=2xTP=2: 10,069 tok/s

Strategy Coverage:

  • Data Parallelism (DDP) - single & multi-node
  • Pipeline Parallelism (PP=2, PP=4)
  • Tensor Parallelism (TP) strategies
  • NCCL communication pattern visualization

Tech Stack:

PyTorch DDP NCCL Nsight Systems OKE
NCCL Patterns Strategy Guide
16
Open Source Projects
Production-ready frameworks
5K+
Lines of Code
Comprehensive documentation
100%
Cloud Native
OCI + Kubernetes ready

All projects include comprehensive documentation, performance benchmarks, and production deployment guides

View All Projects on GitHub

📰 LinkedIn Newsletter

Weekly insights on AI infrastructure, LLM optimization, and practical engineering

Beyond the Model

Where AI infrastructure meets practical engineering

Weekly deep-dives into AI/ML infrastructure, LLM benchmarking, inference optimization, and real-world engineering challenges. From RAG pipelines to GPU clusters.

400+ Subscribers
Weekly Publication

Recent Articles:

  • Beyond MMLU: Why Traditional AI Benchmarks Are Failing Us
  • Understanding Reasoning-First LLMs: NVIDIA Nemotron Study
  • LLM Observability: Why Monitoring Your AI Infrastructure is Critical
  • RAG Quality vs Speed: A Framework for Measuring What Matters
  • Speculative Decoding: Get 14-17% Faster LLM Inference
Subscribe on LinkedIn

Topics Covered:

AI Benchmarking LLM Inference RAG Systems MoE Architecture GPU Infrastructure Agentic AI Training Parallelism OCI/Cloud

Core Expertise

20+ years of specialized expertise in AI/ML infrastructure, HPC systems, and cloud architecture delivering enterprise-scale solutions

AI/ML & HPC Infrastructure Architecture

Enterprise-Scale Solutions

Leading the design and implementation of large-scale, high-performance computing environments for AI/ML workloads.

  • Large-Scale GPU Cluster Design (NVIDIA A100/H100)
  • Multi-node GPU clusters for GenAI and LLM training
  • Performance optimization & tuning for HPC/AI workloads

Cloud Solutions Architecture

Oracle Cloud Infrastructure

Designing and implementing robust, scalable, and cost-effective cloud solutions on Oracle Cloud Infrastructure (OCI).

  • Hybrid & Multi-Cloud Architecture
  • Kubernetes & Docker Containerization
  • Infrastructure as Code (Terraform, Ansible)

Domain Knowledge

Industry Specialization

  • Automotive HPC: Autonomous Driving (AD/ADAS), Computer-Aided Engineering (CAE), simulation workloads
  • Financial & Defence: Monte Carlo simulation, financial app orchestration
  • Generative AI & NVIDIA Ecosystem: GenAI platforms, LLM training, full NVIDIA AI stack

HPC & Technical Skills

System Architecture

Workload Management

  • Slurm
  • IBM LSF

Storage & Networking

  • Lustre, GPFS
  • RDMA (RoCE v2)

DevOps & Monitoring

  • CI/CD, Git
  • Prometheus, Grafana

Programming

  • Python, Shell
  • Linux Systems

Professional Skills

C-Level & Executive Advisory

Customer consultation

Pre-Sales Engineering

Technical consulting

System Design

Documentation

Technical Mentoring

Leadership

Professional Experience


AI Architect - AI Centre of Excellence

Oracle Iberia

Feb 2021 - Present | Campanillas, Spain

• Define enterprise reference architectures, performance baselines, and guardrails for GPU-accelerated GenAI/HPC on OCI

• Architect and operate GPU-accelerated GenAI and HPC/AI platforms on OCI (Kubernetes/OKE plus Slurm & PBS Pro)

• Lead performance engineering & benchmarking, CUDA/NCCL micro-benchmarks, optimize GPU utilization and throughput

• Enable distributed training & inference for LLMs and CV/NLP (DeepSpeed/FSDP/Horovod on Slurm/PBS Pro and OKE)

• Build reusable IaC blueprints (Terraform/Resource Manager, Helm, OCI DevOps/OCIR) for rapid GPU cluster deployment

• Partner with automotive CAE/simulation teams to map CFD/FEA/crash workloads to optimal shapes/schedulers


Senior Professional, Emerging Technologies

DXC Technology

Nov 2018 – Jan 2021 | Europe & UK

• HPC and emerging technologies consultant supporting scientific computing workloads across financial services, aerospace, and automotive

• Delivered HPC infrastructure engineering using NVIDIA Bright Cluster Manager, xCAT, LSF, and PBS Pro

• Led automation initiatives with Ansible, Docker, and Python for cluster provisioning and application deployment

• Enabled hybrid cloud integration with AWS and GCP for scalable compute environments

• Conducted extensive application and hardware benchmarking with performance optimization


HPC Analyst

Citi (Citicorp Services India)

Aug 2016 – Oct 2018 | Financial Engineering

• HPC Engineer supporting Financial Engineering Research Group for real-time financial trading and risk modeling

• Point-of-Contact for emerging HPC technologies, driving innovation in simulation grid architecture

• Conducted hardware and application benchmarking, validating performance for production trading environments

• Designed and tested hybrid HPC architecture PoCs ensuring scalability and reliability

• Customized ELK stack for infrastructure observability, log correlation, and anomaly detection


Lead HPC Solutions Developer

Tata Motors / Tata Technologies

Jun 2008 – Aug 2016 | Automotive R&D

• HPC operations and infrastructure lead for Computer-Aided Engineering (CAE) Research Group

• Directed daily operations of multi-node, heterogeneous HPC cluster for automotive simulations

• Led integration and performance tuning of LS-DYNA, Abaqus, Ansys Fluent, MSC Nastran, StarCCM+, OptiStruct

• Developed custom CAE job submission portal integrated with PBS Pro

• Enabled centralized CAE access via Altair e-Compute portal across engineering teams


Senior Linux System Administrator

Sankalp Venture

Mar 2007 – May 2008

Led enterprise Linux infrastructure administration, web/mail servers, and team management for Indian Express news sites.


Programmer & Academic Mentor

Vindhaya Institute

Mar 2004 – Feb 2007

Taught computer science subjects, conducted lab sessions, mentored B.E. students on software development projects.

Professional Portfolio

Strategic partnerships with leading organizations across global markets, delivering transformational AI infrastructure solutions with proven results and measurable impact

Fortune 500

Global Enterprises

• Energy & Oil Companies

• Manufacturing Giants

• $2.4T+ combined market cap

AI Unicorns

Innovation Leaders

• LLM Builders

• Biotech AI

• Research Pioneers

Government

Public Sector

• Smart Cities

• National Initiatives

• Vision 2030 Projects

FinTech

Financial Innovation

• Payment Leaders

• Cross-Border Platforms

• Digital Banking

Global Reach

EMEA
Europe, Middle East, Africa
AMERICAS
North & South America
APAC
Asia Pacific
NORDIC
Scandinavian Region
50+
Global Organizations
Across 4 continents
100k+
GPU Hours
AI/ML workloads
$50M+
Infrastructure Value
Deployed solutions

Professional Impact

Delivering mission-critical AI/ML infrastructure solutions that drive digital transformation across industries

Energy Sector

DataRobot AI platform deployment for digital transformation initiatives

✓ Production deployment success

FinTech

Cross-border payments platform with data sovereignty compliance

✓ 25% cost reduction achieved

AI Innovation

LLM training infrastructure for next-generation AI companies

✓ 50+ GPU cluster deployed

Telecom

5G network optimization with AI-driven analytics

✓ Full compliance achieved

Public Sector

Smart city initiatives and digital governance platforms

✓ Multi-region deployment

Manufacturing

Predictive maintenance and quality optimization systems

✓ 30% efficiency gain

Service Categories

Infrastructure Architecture

AI/ML platform design, HPC cluster deployment, cloud migration strategy

Performance Optimization

GPU utilization, RDMA networking, workload scheduling, cost reduction

Compliance & Security

Data sovereignty, regulatory compliance, security best practices

LATEST SUCCESS STORY

Austrian AI Pioneer Breakthrough

50-Node GPU Cluster • xLSTM Technology • Research to Production

Challenge: Austrian LLM builder entering productization phase, seeking European AI leadership position

Solution: Deployed a 50+3-node BM.GPU.H100.8 cluster with RDMA networking for xLSTM technology research

Impact: Enabled transition from university research to commercial AI products, competing with leading European AI companies

Technical Architecture:

GPU Infrastructure:

  • 50+3-node GPU cluster
  • BM.GPU.H100.8 configuration
  • RDMA cluster networking

HPC Components:

  • 14+2-node CPU cluster
  • BM.HPC.E5.144 nodes
  • File System Storage (FSS)
xLSTM Technology H100 GPUs RDMA Networking Production Ready

Recognition Received

Q1FY25 EMEA Technology Engineering Excellence Award for outstanding collaborative work and customer success

Professional Network

Building strategic partnerships with industry leaders across global markets to deliver transformational AI infrastructure solutions

Professional Certifications

Oracle Cloud Infrastructure

Architect Associate

2021, 2023

🔒

Security Professional

2023

☁️

Cloud Foundation

2021

⚙️

Operations Associate

2021

NVIDIA AI & Data Centers

🤖

Introduction to AI

Data Centers 2023

💻

GPU Computing

Certified

🚀

AI Infrastructure

Expert Level

Multi-Cloud Expertise

☁️

AWS Architect

Associate Level

🔵

Google Cloud

Professional Series

Bright Cluster

8.0 Administration

4 Oracle Cloud • 3 NVIDIA AI • 6 Multi-Cloud • 2 Leadership

Total of 15 Professional Certifications spanning cloud architecture, AI/ML, data platforms, security, and quality management across Oracle, AWS, GCP, NVIDIA, and industry standards

Client Success Stories

Delivering transformational AI/ML and HPC infrastructure solutions across diverse industry verticals

Global Energy Corporation

Digital Transformation - AI/ML Platform

Fortune 500 Oil & Gas Company

Challenge: Deploy DataRobot AI platform for digital transformation initiatives

Solution: Enhanced performance with specialized OCI features and HPC expertise

Impact: ✅ Delivered on timeline, customer adopted OCI for production workloads

FinTech Payment Leader

Cross-Border Payments Platform

FTSE 250 Listed Company

Challenge: Expand into new market with data sovereignty compliance

Solution: Oracle Cloud deployment with enhanced GlusterFS integration

Impact: ✅ 25% cost reduction vs on-premises, enabling rapid market entry

Telecommunications Giant

5G Network Optimization

Leading Middle East Telecom Provider

Challenge: 5G network optimization with AI-driven analytics platform

Solution: GPU-accelerated HPC cluster with Oracle Linux optimization

Impact: ✅ Full compliance achieved, enhanced performance delivered

Biotech AI Pioneer

Therapeutic AI Research Platform

Healthcare AI Innovator

Challenge: Generative AI for therapeutic antibody design and protein discovery

Solution: High-performance GPU cluster with specialized AI frameworks

Impact: ✅ Accelerated drug discovery, research breakthrough achieved

Government Smart City Initiative

AI-Powered Urban Management

National Vision 2030 Project

Challenge: AI-powered visual pollution detection processing 100K-200K images daily

Solution: Scalable cloud infrastructure with automated AI pipeline

Impact: ✅ Revolutionized environmental monitoring and urban planning

LATEST

European AI Research Unicorn

xLSTM Technology • 50-Node GPU Cluster

Austrian LLM Pioneer

Challenge: European LLM builder entering productization phase, seeking AI leadership position

Solution: Deployed a 50+3-node BM.GPU.H100.8 cluster with RDMA networking

Impact: ✅ Enabled transition from university research to commercial AI products

Technical Publications & Insights

Sharing knowledge and insights through technical blogs, reference architectures, and open-source contributions

OCI Reference Architectures

Deploy Scalable OwnGPT Model on Oracle Cloud

Reference architecture for deploying enterprise-scale generative AI solutions on OCI with comprehensive ERP integration capabilities.

Oracle Cloud GenAI ERP Integration
View Architecture

Accelerate VM Image Storage in KVM

Accelerate and scale the storage of virtual machine images in a KVM environment with enterprise-grade reliability.

KVM Storage Performance
View Architecture

Remote Synchronous Block Replication

Use remote synchronous block replication on Oracle Cloud Infrastructure for enterprise-grade data replication.

Block Storage Replication OCI
View Architecture

Video Surveillance Analytics Performance

Video surveillance and analytics software performance optimization on OCI for enhanced security operations.

Video Analytics Performance OCI Blog
Read Blog

Protein Large Language Models

Powering protein large language models in antibody discovery on OCI for pharmaceutical innovation.

Protein LLM Antibody OCI Blog
Read Blog

Telco Innovation with GPUs

Accelerating telco innovation by leveraging power of GPUs on OCI for enhanced customer experiences.

Telco GPU OCI Blog
Read Blog

One Lexiicon ownGPT AI Model

Pioneering collaboration for AI innovation and excellence with One Lexiicon ownGPT AI model on OCI.

ownGPT AI Model OCI Blog
Read Blog

De Novo Antibody Design

Pioneering de novo antibody design with OCI, supporting Silica Corpora's AI mission for precision and efficacy.

AI Design Antibody OCI Blog
Read Blog

Primary GitHub Repository

Personal collection of AI/ML infrastructure projects, HPC configurations, automation scripts, and technical implementations for enterprise-scale deployments.

AI Infrastructure HPC Automation
View Repository

Academic & Research Projects

Research-focused repository containing academic projects, mathematical computing implementations, and early-stage experimental work in system architecture.

Research Academic Mathematics
View Repository

Oracle DevRel Contributions

Enterprise-grade implementations for Oracle Developer Relations, featuring DeepSpeed training, GPU clustering, and production-ready AI infrastructure patterns.

Oracle DeepSpeed Enterprise
View Contributions

Medical RAG Chatbot

Advanced Retrieval-Augmented Generation (RAG) chatbot for medical information and healthcare applications, featuring vector search, semantic understanding, and context-aware responses for medical queries.

RAG Healthcare AI/ML Vector Search
View Repository

Technical Articles from Tata Technologies Experience

Published insights and technical achievements from my tenure as HPC & CAE Systems Engineer at Tata Technologies, focusing on performance optimization and enterprise-scale solutions.

NUMA Benchmarking Results

Performance Comparison Chart

17.7s → 5.59s
68% faster

Unlocking Performance: How NUMA Tuning Can Triple Your CAE Simulation Speed on HPC

Demonstrated 68% reduction in runtime through NUMA optimization techniques for compute-intensive CAE applications like LS-DYNA in automotive simulations. Achieved 45-50% performance improvement in CPU time through strategic process and memory placement.

NUMA HPC Optimization CAE Performance LS-DYNA
Read Article

HPC Ecosystem Architecture

MSC Patran - Stress Analysis

12 weeks → 9 weeks
25% faster

From Bottleneck to Breakthrough: Revolutionizing CAE Workflows with a Tailored HPC Ecosystem

Engineered custom HPC environment that reduced CAE loop time by 25% (12 weeks to 9 weeks) with 8x increase in license utilization. Implemented GlusterFS distributed file system and Torque/Maui job scheduling for enterprise-scale CAE workflows.

HPC Architecture CAE Workflow MSC Nastran Performance
Read Article

Technical Impact

5+

OCI Reference Architectures

10+

Technical Articles

3+

Open Source Projects

1000+

Developer Engagements

Education & Academic Background

Master of Computer Applications (M.C.A)

1999 - 2002

Rajiv Gandhi Proudyogiki Vishwavidyalaya (RGPV)

Madhya Pradesh, India • State Technical University

Comprehensive graduate program in computer applications covering advanced software engineering, distributed systems architecture, database management, network programming, and enterprise computing solutions. Specialized coursework in system design, performance optimization, and scalable application development.

Software Engineering System Architecture Database Systems Network Programming

Bachelor of Science in Mathematics

1995 - 1998

Post Graduate College, Satna

Madhya Pradesh, India • Affiliated College

Rigorous undergraduate program in pure and applied mathematics covering advanced calculus, linear algebra, differential equations, probability theory, statistics, and numerical analysis. Built strong analytical and problem-solving foundation essential for understanding AI/ML algorithms, performance optimization, and computational complexity in large-scale infrastructure systems.

Mathematical Analysis Statistics Numerical Methods Linear Algebra

Academic Foundation

Software Engineering

Systems design, architecture patterns, enterprise software development methodologies, and scalable application frameworks

Mathematical Computing

Statistical analysis, computational mathematics, algorithmic optimization, and mathematical foundations for AI/ML

System Architecture

Distributed systems design, network architecture, high-performance computing principles, and infrastructure scalability

Educational Philosophy

"The combination of computer science expertise and mathematical foundations provides the perfect foundation for understanding both the theoretical principles and practical applications of modern AI/ML infrastructure architecture."

Professional Communities

Active participant in global professional communities spanning AI, HPC, cloud computing, and technology domains

29

Professional Groups

Active member in specialized communities and technical forums worldwide

Key Professional Communities

Big Data & AI

388,963 members

Data Science | Machine Learning | Deep Learning | AI

Project Manager Community

698,184 members

Best Group for Project Management

Cloud Computing

545,631 members

AWS, Azure, GCP, IBM, Alibaba, OCI

High Performance Computing

27,763 members

HPC Infrastructure & Supercomputing

Auto OEM & Dealer Network

409,895 members

World's Largest Automotive Group

Linux Expert

195,565 members

Linux Systems & Open Source

Additional Specialized Communities

CAE & Engineering
CAD, CAE, FEM, MBD & Optimization

AI & ML
OpenAI, ChatGPT, NLP, AI Agents

Scientific Computing
HPC-AI Advisory Council, CSSC

Beyond Technology

When not architecting AI infrastructures, I find balance and inspiration through sports and music

Cricket Enthusiast

The Gentleman's Game

Cricket has been a lifelong passion - from following international matches to understanding the strategic complexities that mirror the analytical thinking required in AI architecture.

Cricket Interests:

  • International cricket analysis and statistics
  • Following World Cup and tournament strategies
  • Player performance analytics and data trends
  • Team dynamics and leadership insights

"Cricket teaches patience, strategy, and the importance of both individual excellence and team collaboration - principles that directly apply to leading complex infrastructure projects."

Music & Singing

Creative Expression

Music provides the perfect counterbalance to technical work - offering creative expression and emotional release that keeps me energized and inspired.

Musical Journey:

  • Vocal performance and singing practice
  • Exploring diverse musical genres and styles
  • Bollywood classics and contemporary hits
  • International music appreciation

"Music teaches rhythm, timing, and the art of harmonious collaboration - qualities essential for orchestrating complex AI infrastructure deployments."

Work-Life Harmony

"The analytical precision required for AI architecture finds perfect balance in the strategic thinking of cricket and the creative flow of music. These passions keep me grounded, inspired, and bring fresh perspectives to solving complex technical challenges."

Strategic Thinking

Cricket strategy enhances architectural planning

Creative Problem-Solving

Musical creativity drives innovative solutions

Team Leadership

Sports and music build collaborative skills

Let's Build the Future Together

Ready to transform your AI/ML infrastructure? Let's discuss how we can accelerate your journey to production-scale AI solutions.

Professional Network

Connect for strategic AI infrastructure discussions and industry insights

LinkedIn Profile

Technical Collaboration

Explore open-source contributions and technical implementations

GitHub Profile
Available on Demand

Open to strategic AI infrastructure consulting and enterprise architecture engagements upon request