DevOps Engineer – Onsite 3 days
Job ID: 112527
Location: San Jose, California [Hybrid]
Category: App/Dev
Employment Type: Contract
Date Added: 05/01/2026
Role Summary
This role is responsible for the availability, performance, and scalability of backend services within a large-scale SaaS collaboration platform. The Site Reliability Engineer in this position supports cloud and hybrid environments through automation, operational best practices, and incident management. The position calls for a proactive approach to service reliability and continuous improvement, and requires onsite presence in San Jose three days per week.
Responsibilities
- Own deployment, operation, and reliability of core collaboration services across cloud and hybrid environments
- Design, improve, and automate CI/CD pipelines and frameworks, including AI-driven deployment, monitoring, and incident response tools
- Lead complex production incident response activities, perform root cause analysis, and implement long-term reliability enhancements
- Utilize observability and operational data to assist with capacity planning, system scaling, and resource optimization
- Establish operational best practices, documentation standards, and promote a culture of reliability and accountability
- Collaborate with development teams to integrate reliability practices into software deployment processes
- Maintain and improve monitoring and alerting systems to identify and resolve issues proactively
- Drive automation initiatives to streamline operations and reduce manual intervention
- Support on-call duties, ensuring rapid response to production issues and minimizing downtime
- Continuously evaluate new technologies and methodologies to enhance system reliability and operational efficiency
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent work experience
- Three to five years of experience in Site Reliability Engineering, Cloud Operations, or Systems Engineering roles
- Practical experience operating production services with Docker and Kubernetes in cloud or hybrid environments
- Proficiency in scripting or programming languages such as Python, Go, or Bash for automation tasks
- Experience with monitoring and observability tools, incident management, and post-incident reviews
- Strong understanding of Linux systems, networking, distributed systems, CI/CD pipelines, infrastructure as code, and version control with Git
- Excellent problem-solving skills with the ability to handle high-pressure situations effectively
- Effective communication skills to collaborate with cross-functional teams and document processes
- Availability to work in a hybrid environment with onsite presence required in San Jose three days per week
Pay Range: $36.09 – $39.82 hourly
This position offers a hybrid schedule, with three days per week onsite in San Jose and the remaining time remote.
