Roblox - San Mateo, CA

posted 4 days ago

Full-time - Senior
San Mateo, CA
Administrative and Support Services

About the position

At Roblox, we're building the tools and platform that empower our community to bring any experience that they can imagine to life. As a Senior Software Engineer, Application Observability, you will collaborate with data scientists, product managers, and leaders across the company to ensure app quality by working across multiple teams to build and scale a robust anomaly detection system for Roblox. Your work will lay the essential foundation for app quality, driving a great user experience and supporting Roblox's business growth.

Responsibilities

  • Tailor and consolidate real-time detection and root-cause analysis solutions as needed, covering every stage - from code merge, build, and deploy to rollout, full release, and ongoing monitoring.
  • Collaborate with pods and teams to define and standardize key metrics, their definitions, and the necessary dimensions essential to release and app health observability.
  • Operationalize and scale metrics monitoring by enabling quick slicing and dicing of data across different types of releases and roll-outs, as well as root-cause analysis.
  • Work with Data Scientists and ML Modelers to fine-tune severity thresholds for production incidents based on business impact insights from experiments and causal learning.
  • Collaborate with engineers to detect and investigate abnormal trends, while proactively and continuously identifying new opportunities to improve visibility, alerting, tooling, and processes throughout the entire release and app health observability lifecycle.
  • Be a technical leader for the team, mentor junior engineers, and help recruit future talent.

Requirements

  • Expertise in back-end software engineering with 6+ years of experience building scalable, distributed systems.
  • Experience with monitoring and observability for large, consumer-facing applications, ideally with a focus on client-side monitoring.
  • Comfort rolling up your sleeves and diving into client and application code when needed.
  • Experience defining the right charts, alerts, and queries to understand the health of large applications and systems.
  • Proficiency in the incident response process.
  • Excitement to collaborate with multiple teams and build long-term solutions for all of Roblox.
  • An understanding of statistics and familiarity with deploying and running ML models at large scale are a strong plus.

Benefits

  • Industry-leading compensation package
  • Excellent medical, dental, and vision coverage
  • A rewarding 401k program
  • Flexible vacation policy (varies by exemption status)
  • Roflex - Flexible and supportive work policy
  • Roblox Admin badge for your avatar
  • Free catered lunches five times a week and several fully stocked kitchens with unlimited snacks
  • Onsite fitness center and fitness program credit
  • Annual Caltrain Go Pass