DevOps Engineer – Onsite 3 days

Job ID: 112527
Location: San Jose, California [Hybrid]
Category: App/Dev
Employment Type: Contract
Date Added: 05/01/2026

Role Summary
This role involves managing the availability, performance, and scalability of backend services within a large-scale SaaS collaboration platform. As a Site Reliability Engineer, you will support cloud and hybrid environments through automation, operational best practices, and incident management. The position requires a proactive approach to service reliability and continuous improvement, with onsite presence in San Jose three days per week.

Responsibilities

  • Own deployment, operation, and reliability of core collaboration services across cloud and hybrid environments
  • Design, improve, and automate CI/CD pipelines and frameworks, including AI-driven deployment, monitoring, and incident response tools
  • Lead complex production incident response activities, perform root cause analysis, and implement long-term reliability enhancements
  • Utilize observability and operational data to assist with capacity planning, system scaling, and resource optimization
  • Establish operational best practices, documentation standards, and promote a culture of reliability and accountability
  • Collaborate with development teams to integrate reliability practices into software deployment processes
  • Maintain and improve monitoring and alerting systems to identify and resolve issues proactively
  • Drive automation initiatives to streamline operations and reduce manual intervention
  • Support on-call duties, ensuring rapid response to production issues and minimizing downtime
  • Continuously evaluate new technologies and methodologies to enhance system reliability and operational efficiency

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent work experience
  • Three to five years of experience in Site Reliability Engineering, Cloud Operations, or Systems Engineering roles
  • Practical experience operating production services with Docker and Kubernetes in cloud or hybrid environments
  • Proficiency in scripting or programming languages such as Python, Go, or Bash for automation tasks
  • Experience with monitoring and observability tools, incident management, and post-incident reviews
  • Strong understanding of Linux systems, networking, distributed systems, CI/CD pipelines, infrastructure as code, and version control with Git
  • Excellent problem-solving skills with the ability to handle high-pressure situations effectively
  • Effective communication skills to collaborate with cross-functional teams and document processes
  • Availability to work in a hybrid environment with onsite presence required in San Jose three days per week

Publishing Pay Range: $36.09 – $39.82 hourly

This position offers a hybrid schedule: three days per week onsite in San Jose, with the remaining time remote.