Resume - Systems Architect/Engineer

From iwantchips, 1 Hour ago, written in Plain Text.

URL https://paste.centos.org/view/1edc49af

AI Platform Engineer | LLM Systems Architect

OVERVIEW

AI Platform Engineer building production-grade LLM systems with strong
reliability guarantees.
10+ years in distributed systems and high-availability infrastructure, now
focused on AI control planes, RAG pipelines, routing, evaluation frameworks,
and observability.
Designed LLM orchestration systems with explicit validation, retry, rollback,
and commit semantics to reduce non-determinism and enforce correctness. Brings
deep experience in scaling, failure modeling, and SLO-driven production system
design.

SKILLS

AI Platform & LLM Systems: RAG architectures, LLM orchestration, control
planes, routing & fallback logic, evaluation frameworks (Langfuse, deepeval),
trace-level observability, prompt/version management, context engineering,
agentic workflows

Distributed Systems & Infrastructure: High-availability design, autoscaling
concepts, failure modeling, SLO/SLA enforcement, load balancing
(HAProxy/Nginx), Kubernetes, Docker, cloud-native architectures
(AWS/GCP/Azure), Rackspace, Terraform, Ansible, Chef, Linux, Solaris, ZFS

Systems Design & Leadership: Architecture Reviews, Design RFCs, Failure
Modeling, Incident Postmortems, Cross-team platform enablement

Languages: Python, SQL, Bash

Databases: PostgreSQL (expert: performance tuning, query optimization,
replication, partitioning, migrations, security), MySQL (strong: performance
tuning, optimization), CockroachDB, Cassandra, MS SQL Server, Oracle

EXPERIENCE

NetApp (Acquired), Remote — Senior Systems Architect
May 2022 - PRESENT

● Designed and deployed PostgreSQL-backed LLM platform architectures using
  RAG and Model Context Protocol (MCP), treating the database as an AI
  control plane rather than a passive datastore. Improved internal
  operational workflows and reduced failure rates.
● Built and productionized multi-agent LLM pipelines using LangGraph,
  modeling workflows as explicit state machines with validation, retry,
  correction, and commit phases. This increased reliability and contributed
  to ~2x YoY revenue growth.
● Integrated end-to-end LLM observability and evaluation pipelines
  (Langfuse, deepeval), enabling trace-level introspection, automated
  scoring, regression detection, and feedback-driven iteration.
● Defined system invariants, routing gates, and validation probes to
  enforce correctness, reproducibility, and safe deployment of AI-assisted
  database tooling in enterprise SaaS environments.
● Applied distributed systems principles (idempotency, transactional
  boundaries, rollback semantics, auditability) to AI workflows, reducing
  silent failures and improving deployment safety.
● Designed API-layer abstractions and LLM gateway components to support
  routing, validation, and model versioning strategies.
● Deployed and iterated on LLM systems in cloud-native environments
  (AWS/Azure), integrating storage, compute, and monitoring layers to
  support production workloads.

Instaclustr (Acquired), Remote — Senior Database Reliability Engineer
March 2021 - May 2022

● PostgreSQL scalability expert for DoorDash, helping them scale their
  Postgres RDS & Aurora systems to support rapid traffic growth during the
  COVID-19 pandemic surge.
● Led the team of database experts working on all aspects of DoorDash
  database scalability, performance, and infrastructure overhaul.
● Provided ongoing schema design, query optimization, live production
  schema change processes with minimal disruption, and disaster recovery
  consulting, playing a pivotal role in the company’s growth and operations
  leading up to a successful IPO.

Credativ (Acquired), Remote — Senior Database Reliability Engineer
January 2019 - March 2021

● PostgreSQL & MySQL specialist for enterprise clients including DoorDash,
  Etsy, National Geographic, Follow Up Boss, Gilt, Leafly, and Paperless
  Post, among others, maintaining up to 99.999% uptime for mission-critical
  operations.
● Led large-scale PostgreSQL schema changes, migrations, major upgrades,
  security patching, and performance optimization in high-transaction,
  high-availability environments.
● 24/7 on-call: prevented critical outages during peak traffic through
  proactive tuning, capacity planning, and failure modeling, contributing
  to long-term contract renewals.
● Conducted comprehensive database performance & security audits for
  numerous clients, providing precise, concrete solutions to complex issues
  and pain points.

OmniTI Computer Consulting, Remote — Data Engineer
June 2013 - January 2019

● Redesigned the OLAP ETL pipeline and data sanitization project for Gilt
  Japan, saving them $2+ million per year in infrastructure and resource
  costs.
● Designed and implemented HA/DR strategies that reduced processing time by
  65% and recovery time objectives (RTO) by 70%.
● 24/7 PagerDuty on-call for multiple clients: debugged and resolved
  production-critical performance and availability issues at all hours.
● Built and contributed to several open source projects for database
  monitoring, security, replication, observability, reporting,
  partitioning, sharding, and migration, and employed these tools to better
  serve clients.
● Performed in-depth configuration tuning for Postgres and MySQL, as well
  as Linux & Solaris, to improve database performance and efficiency for
  multiple clients. As a direct result, clients like Locally and Click2Ship
  handled 4x traffic with no issues during holidays and unexpected spikes.

SELECTED PROJECTS

● Enterprise AI platform for automated DBMS migration and code conversion
  using multi-agent orchestration with full auditability, evaluation
  pipelines, and feedback loops. Designed for safe migration of
  heterogeneous workloads under strict correctness and traceability
  guarantees.
● Designed and implemented RAG + MCP financial analytics platform with
  routing logic, persistence layer, and evaluation-driven pipelines for
  reproducible AI-assisted analysis.
● Built real-time streaming RAG + MCP pipeline as proof-of-concept for
  reliable LLM deployment under latency constraints, integrating
  observability and controlled rollout mechanisms.
● In-memory enterprise data sanitization pipeline for PCI-compliant
  PostgreSQL environments

CONFERENCE TALKS

Spoke at SCaLE 16x, 18x, 23x (most recent), pgCon.dev (Ottawa), Percona Live,
NYCPUG, CPOSC on various topics:

● Postgres as an AI Control Plane: Building RAG + MCP Workflows Inside the
  Database (2026)
● Provisioning & Automating High Availability Postgres on AWS
● Securing Your Data on Postgres
● Data Mining for Beginners with Pandas
● Think Your Postgres Backups & Disaster Recovery Are Safe? Let’s Talk.

EDUCATION

University of Maryland Baltimore County, Baltimore County MD — Master's in Computer Science
September 2011 - December 2014

● Research Assistant for an NSF-funded massive data processing &
  visualization project for geologists

Panjab University, Chandigarh (India) — Bachelor of Engineering in Computer Science
July 2007 - June 2011

CERTIFICATIONS

AWS Certified Solutions Architect
AWS Certified Cloud Practitioner

TECHNICAL BLOGS & LINKS

https://medium.com/@reliable-by-design
https://penningpence.blogspot.com
https://github.com/payals
