Cloud Engineer

Job ID: 112574
Location: Plano, Texas [Hybrid]
Category: Infrastructure
Employment Type: Contract
Date Added: 05/11/2026

Role Summary

The senior Cloud Engineer brings expertise in building and managing scalable observability and infrastructure platforms for enterprise-level cloud microservices environments. This hybrid role demands hands-on experience with container orchestration, cloud infrastructure automation, and high-volume monitoring systems. The engineer will own end-to-end components, support production operations, and use AI tools for system troubleshooting and code generation.

Responsibilities

  • Design, develop, and operate observability platforms enabling logging, metrics collection, and tracing for cloud-based microservices applications.
  • Manage and optimize large-scale Kubernetes clusters across multiple regions, including Helm chart management, pod scheduling, and resource tuning.
  • Own and maintain CI/CD pipelines using tools such as Argo CD, Helm, and GitOps methodologies to ensure reliable deployment workflows.
  • Implement Infrastructure as Code (IaC) solutions utilizing Terraform on AWS to provision and manage cloud infrastructure at scale.
  • Operate and maintain monitoring ecosystems including OpenSearch/Elasticsearch, Prometheus, Grafana, Splunk, and Kafka, ensuring high availability and performance.
  • Develop automation solutions to proactively detect, respond to, and remediate production issues.
  • Ensure security and compliance by managing vulnerability patching and automating security best practices in container environments.
  • Collaborate with cross-functional teams to improve system reliability, scalability, and performance, contributing to distributed system design.
  • Participate in on-call rotations, incident response, and post-incident analysis to uphold SLA commitments.
  • Utilize AI-assisted coding and troubleshooting tools to accelerate system development, automation, and incident resolution.

Qualifications

  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • Minimum of 8 years of experience in DevOps, SRE, or platform engineering roles supporting production cloud environments.
  • Proven incident response experience, including alert triage, root cause analysis, and SLA management in 24/7 operations.
  • Expertise in Infrastructure as Code principles with proficiency in Terraform, Ansible, or similar automation tools for cloud provisioning.
  • Strong scripting skills in Python, Golang, or Bash for automation, tooling, and CI/CD pipeline integration.
  • Extensive experience operating and troubleshooting large-scale Kubernetes workloads, including Helm chart management and multi-cluster orchestration.
  • Hands-on knowledge of observability stacks such as OpenSearch, Prometheus, Grafana, Loki, and Splunk, including query optimization and capacity planning.
  • Familiarity with Kafka and AWS MSK, including cluster operation, topic configuration, and schema management.
  • Experience deploying, managing, and migrating Splunk Enterprise environments with Kubernetes-based log shipping architectures.
  • Working knowledge of OpenTelemetry, distributed tracing, and application performance monitoring in cloud environments.
  • Understanding of security frameworks, container hardening practices, and vulnerability remediation at scale, including standards such as FedRAMP, STIG, IL5, ISO 27001, and SOC 2.
  • Experience using AI tools like LLMs, GitHub Copilot, or custom AI agents to enhance operational workflows and incident management.
  • Effective communication skills and the ability to work independently in a hybrid work setting.

Publishing Pay Range: $65.00 – $67.00 hourly

This position offers a hybrid schedule, with time split between the office and remote work.