From iwantchips, 1 Hour ago, written in Plain Text.
  1. AI Platform Engineer | LLM Systems Architect
  2. OVERVIEW
  3. AI Platform Engineer building production-grade LLM systems with strong
  4. reliability guarantees.
  5. 10+ years in distributed systems and high-availability infrastructure, now
  6. focused on AI control planes, RAG pipelines, routing, evaluation frameworks,
  7. and observability.
  8. Designed LLM orchestration systems with explicit validation, retry, rollback,
  9. and commit semantics to reduce non-determinism and enforce correctness. Brings
  10. deep experience in scaling, failure modeling, and SLO-driven production system
  11. design.
  12. SKILLS
  13. AI Platform & LLM Systems: RAG architectures, LLM orchestration, control
  14. planes, routing & fallback logic, evaluation frameworks (Langfuse, deepeval),
  15. trace-level observability, prompt/version management, context engineering,
  16. agentic workflows
  17. Distributed Systems & Infrastructure: High-availability design, autoscaling
  18. concepts, failure modeling, SLO/SLA enforcement, load balancing
  19. (HAProxy/Nginx), Kubernetes, Docker, cloud-native architectures
  20. (AWS/GCP/Azure), Rackspace, Terraform, Ansible, Chef, Linux, Solaris, ZFS
  21. Systems Design & Leadership: Architecture Reviews, Design RFCs, Failure
  22. Modeling, Incident Postmortems, Cross-team platform enablement
  23. Languages: Python, SQL, Bash
  24. Databases: PostgreSQL (expert: performance tuning, query optimization,
  25. replication, partitioning, migrations, security), MySQL (strong: performance
  26. tuning, optimization), CockroachDB, Cassandra, MS SQL Server, Oracle
  27. EXPERIENCE
  28. NetApp (Acquired), Remote — Senior Systems Architect
  29. May 2022 - PRESENT
  30. Designed and deployed PostgreSQL-backed LLM platform architectures using
  31. RAG and Model Context Protocol (MCP), treating the database as an AI
  32. control plane rather than a passive datastore. Improved internal
  33. operational workflows and reduced failure rates.
  34. Built and productionized multi-agent LLM pipelines using LangGraph,
  35. modeling workflows as explicit state machines with validation, retry,
  36. correction, and commit phases — increased reliability and contributed to
  37. ~2x YoY revenue growth.
  38. Integrated end-to-end LLM observability and evaluation pipelines
  39. (Langfuse, deepeval), enabling trace-level introspection, automated
  40. scoring, regression detection, and feedback-driven iteration.
  41. Defined system invariants, routing gates, and validation probes to
  42. enforce correctness, reproducibility, and safe deployment of AI-assisted
  43. database tooling in enterprise SaaS environments.
  44. Applied distributed systems principles (idempotency, transactional
  45. boundaries, rollback semantics, auditability) to AI workflows, reducing
  46. silent failures and improving deployment safety.
  47. Designed API-layer abstractions and LLM gateway components to support
  48. routing, validation, and model versioning strategies.
  49. Deployed and iterated on LLM systems in cloud-native environments (AWS/Azure),
  50. integrating storage, compute, and monitoring layers to support production
  51. workloads.
  52. Instaclustr (Acquired), Remote — Senior Database Reliability
  53. Engineer
  54. March 2021 - May 2022
  55. PostgreSQL scalability expert for DoorDash, helping them scale their
  56. Postgres RDS & Aurora systems to support rapid traffic growth during the
  57. COVID-19 pandemic surge.
  58. Led the team of database experts working on all aspects of DoorDash
  59. database scalability, performance, and infrastructure overhaul
  60. Provided ongoing schema design, query optimization, live production
  61. schema change processes with minimal disruption, and disaster recovery
  62. consulting, playing a pivotal role in the company’s growth & operations
  63. leading to a successful IPO
  64. Credativ (Acquired), Remote — Senior Database Reliability
  65. Engineer
  66. January 2019 - March 2021
  67. PostgreSQL & MySQL specialist for enterprise clients including DoorDash,
  68. Etsy, National Geographic, Follow Up Boss, Gilt, Leafly, and Paperless
  69. Post, among others, maintaining up to 99.999% uptime for mission-critical
  70. operations
  71. Led large-scale PostgreSQL schema changes, migrations, major upgrades,
  72. security patching, and performance optimization in high-transaction,
  73. high-availability environments
  74. 24/7 on-call: prevented critical outages during peak traffic through
  75. proactive tuning, capacity planning, and failure modeling, contributing
  76. to long-term contract renewals
  77. Conducted comprehensive database performance & security audits for
  78. numerous clients, providing surgical, concrete solutions to complex issues
  79. & pain points
  80. OmniTI Computer Consulting, Remote — Data Engineer
  81. June 2013 - January 2019
  82. Re-designed OLAP ETL pipeline and data sanitization project for Gilt
  83. Japan, saving them $2+ million per year in infrastructure and resource
  84. costs
  85. Designed and implemented HA/DR strategies that reduced processing time by
  86. 65% and recovery time objectives (RTO) by 70%
  87. 24/7 PagerDuty on-call for multiple clients: debugging and solving
  88. production-critical performance and availability issues at all hours of
  89. the night
  90. Built and contributed to several open source projects for database
  91. monitoring, security, replication, observability, reporting,
  92. partitioning, sharding, and migration, and employed these tools to better
  93. serve clients
  94. Performed in-depth configuration tuning for Postgres and MySQL
  95. as well as Linux & Solaris OS to improve database performance and
  96. efficiency for multiple clients. Clients like Locally and Click2Ship
  97. experienced drastic scaling improvements (4x traffic with no issues)
  98. for holidays & random spikes as a direct result of this work
  99. SELECTED PROJECTS
  100. Enterprise AI platform for automated DBMS migration and code conversion
  101. using multi-agent orchestration with full auditability, evaluation
  102. pipelines, and feedback loops. Designed for safe migration of
  103. heterogeneous workloads under strict correctness and traceability
  104. guarantees.
  105. Designed and implemented RAG + MCP financial analytics platform with
  106. routing logic, persistence layer, and evaluation-driven pipelines for
  107. reproducible AI-assisted analysis.
  108. Built real-time streaming RAG + MCP pipeline as proof-of-concept for
  109. reliable LLM deployment under latency constraints, integrating
  110. observability and controlled rollout mechanisms.
  111. In-memory enterprise data sanitization pipeline for PCI-compliant
  112. PostgreSQL environments
  113. CONFERENCE TALKS
  114. Spoke at SCaLE 16x, 18x, 23x (most recent), pgCon.dev (Ottawa), Percona Live,
  115. NYCPUG, and CPOSC on various topics:
  116. Postgres as an AI Control Plane: Building RAG + MCP Workflows Inside the
  117. Database (2026)
  118. Provisioning & Automating High Availability Postgres on AWS
  119. Securing Your Data on Postgres
  120. Data Mining for Beginners with Pandas
  121. Think Your Postgres Backups & Disaster Recovery Are Safe? Let’s Talk.
  122. EDUCATION
  123. University of Maryland Baltimore County, Baltimore County MD —
  124. Master’s in Computer Science
  125. September 2011 - December 2014
  126. Research Assistant for NSF-funded massive data processing & visualization
  127. project for geologists
  128. Panjab University, Chandigarh (India) — Bachelor of Engineering
  129. in Computer Science
  130. July 2007 - June 2011
  131. CERTIFICATIONS
  132. AWS Certified Solutions Architect
  133. AWS Certified Cloud Practitioner
  134. TECHNICAL BLOGS & LINKS
  135. https://medium.com/@reliable-by-design
  136. https://penningpence.blogspot.com
  137. https://github.com/payals