Advanced Software Talent - South San Francisco, CA
posted 2 months ago
As the Delivery Lead, you will build and maintain the site reliability engineering (SRE) function for our growing data organization. The role is pivotal to the smooth operation and high availability of our data infrastructure and services. You will be the primary point of contact for all SRE-related activities, which calls for a blend of technical expertise, project management skills, and a passion for data-driven solutions. You will lead the development of support and SRE processes from the ground up, so that our systems are not only functional but also optimized for performance and reliability.

In this position, you will collaborate closely with data engineers, data scientists, and IT teams to ensure seamless integration and optimal performance of data systems. You will own the incident management process: responding to incidents, troubleshooting issues, and resolving them efficiently to minimize downtime and impact on data operations. Establishing proactive monitoring and alerting will be a key responsibility, allowing you to identify and address potential issues before they escalate into significant problems.

Your role will also involve continuous performance tuning of our systems to meet the evolving needs of the data organization. You will drive automation initiatives that streamline support workflows and improve overall efficiency. You will also maintain thorough documentation of processes, procedures, and system configurations to ensure clarity and consistency across the team.