DevOps Engineer – Onsite 3 days

Job ID: 112526
Location: San Jose, California [Hybrid]
Category: App/Dev
Employment Type: Contract
Date Added: 05/01/2026

Role Summary
This is a senior-level DevOps Engineer position responsible for supporting and optimizing cloud-based collaboration platforms. The role involves operating, scaling, and maintaining observability platforms, Kubernetes environments, and automated deployment pipelines to keep large-scale distributed systems reliable and efficient. The ideal candidate has extensive production experience, strong operational discipline, and a focus on automation and reliability.

Responsibilities

  • Design, develop, and maintain observability platforms, including logging, metrics, and tracing solutions for web services.
  • Manage, operate, and optimize multi-region Kubernetes clusters to support high availability and scalability.
  • Own and enhance continuous integration and continuous delivery (CI/CD) pipelines built on Argo CD and Helm.
  • Implement infrastructure as code using Terraform on Amazon Web Services (AWS).
  • Operate monitoring and logging ecosystems such as OpenSearch or ELK, Prometheus, Grafana, Splunk, and Kafka.
  • Develop automation tools to proactively detect, troubleshoot, and resolve production issues.
  • Enforce security standards through vulnerability management, platform hardening, and compliance checks.
  • Collaborate with application, platform, and security teams to improve system reliability and performance.
  • Participate in on-call rotations and lead incident response activities to ensure rapid resolution of issues.
  • Contribute to system architecture design, operational best practices, and review processes for distributed systems.

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field.
  • Minimum of eight years of experience in DevOps, Site Reliability Engineering, or platform engineering roles.
  • Extensive experience operating large-scale Kubernetes environments, with proficiency in container orchestration and resource tuning.
  • Hands-on expertise with Helm chart management, multi-cluster operations, and pod scheduling.
  • Strong knowledge of observability stacks such as OpenSearch/Elasticsearch, Prometheus/Mimir, Grafana, Loki, Splunk, or Logstash.
  • Proven experience designing ingestion pipelines, optimizing queries, and planning capacity for telemetry systems.
  • Proficiency with infrastructure as code tools like Terraform or Ansible on AWS.
  • Working knowledge of scripting and automation languages such as Python, Golang, or Bash.
  • Experience supporting 24/7 production environments, including incident management, alert triage, and post-incident review processes.
  • Ability to work in a fast-paced environment with strong problem-solving skills.

Publishing Pay Range: $41.16 – $43.68 hourly

This is a hybrid role requiring three days onsite in San Jose, California.