Site Reliability Engineer

Remote
Full Time
GSA
Experienced
Location: Remote

About the company:  
At VivSoft, we aim to solve complex federal problems using emerging and open technologies in a collaborative and rewarding environment. VivSoft is a diverse team of strategists, engineers, designers, and creators experienced in building high-performance, effective software and in applying impactful organizational design and organizational dynamics to software delivery. We build secure Software Factories based on DoD reference designs and NIST frameworks for Cloud and DevSecOps. These factories deliver AI/ML applications, data science platforms, blockchain, and microservices for DoD, healthcare, and civilian agencies.

Job Summary:

We are seeking a highly skilled Site Reliability Engineer (SRE) to oversee the production environment, ensuring system health, availability, and performance. In this role, you will develop and maintain software and systems for platform infrastructure, automate operations, and improve the speed and reliability of software delivery. You will collaborate with development teams to implement rigorous testing, improve service quality, and support a scalable system architecture. This position requires strong expertise in cloud architecture, DevSecOps, and microservices, along with experience leading high-performing teams.

Responsibilities
  • Oversee the production environment by monitoring system health and availability and proactively addressing issues.
  • Develop and maintain software and systems to operate and manage platform infrastructure and applications.
  • Enhance the reliability, quality, and delivery speed of software solutions through performance optimization and continuous improvement.
  • Conduct performance testing and usability evaluations to ensure production readiness for new releases.
  • Collect and analyze metrics from both operating systems and applications to support performance tuning and troubleshooting.
  • Collaborate with development teams to improve service quality by implementing rigorous testing and release processes.
  • Engage in system design consulting, capacity planning, and platform management to support scalable system architecture.
  • Build sustainable, automated systems and services, balancing feature development speed with reliability according to defined service level objectives.
  • Conduct root cause analyses (RCAs) and post-incident reviews for production incidents to improve future resilience.
  • Streamline and optimize on-call rotations and processes to enhance operational efficiency.
  • Maintain comprehensive documentation and runbooks to support reliable system operation.

Qualifications
  • Bachelor’s or Master’s degree in Computer Science or a related technical or scientific field.
  • Candidate must be willing to obtain a Public Trust clearance.
  • Proficiency in programming with one or more high-level languages such as Python, Java, C/C++, Ruby, or JavaScript.
  • Experience with cloud storage technologies and dynamic resource management frameworks (e.g., Mesos, Kubernetes, YARN).
  • 5+ years of experience in Cloud Architecture, preferably with AWS.
  • 10+ years of experience operating enterprise systems with over a million users.
  • 10+ years of experience in application development.
  • 5+ years of experience in DevSecOps, with a proactive approach to identifying improvements and bottlenecks.
  • 3+ years of experience in microservices architecture.
  • 5+ years of experience leading teams, fostering collaborative and high-performing environments.
  • 3+ years of experience with Agile methodologies and practices.

Benefits:
  • Comprehensive Medical, Dental, and Vision Plans (healthcare benefits are 100% employer-paid for employees only)
  • Life Insurance
  • Paid Time Off (Flexible/Combined PTO, Bereavement Leave, 11 Company Paid Holidays)
  • 401K Retirement Plan with employer match
  • Professional Development Training Reimbursement
  • Flexible/remote work schedules
Target salary range: $140,001 - $170,000.