Valiantica - New York, NY

posted 3 months ago

Full-time - Mid Level
New York, NY
Professional, Scientific, and Technical Services

About the position

The Production Engineer position is a hybrid, contract-to-hire role based in New York, NY. The selected candidate will monitor and resolve system errors and disruptions, document resolutions, and manage incidents according to the ITIL lifecycle. The role requires liaising with upstream data providers to address issues and responding to user inquiries and operations requests. The engineer will also prepare and present stability reports, analyze alert and stability trends, and make recommendations based on the findings. A key aspect of the role is investigating the root causes of issues and educating developers on those causes to prevent future occurrences.

In addition to troubleshooting and resolving issues across the entire stack (hardware, software, application, and network), the Production Engineer will mentor other Site Reliability Engineers (SREs) on best practices for monitoring and troubleshooting complex code and database issues. The role involves identifying opportunities to improve automation within the company, then scoping and building automation for the deployment, management, and visibility of services. The engineer will represent the SRE organization in design reviews and operational readiness exercises for both new and existing services, and will participate in an on-call rotation and periodic conference calls with specialists in other time zones.

The ideal candidate will have hands-on experience with UNIX and SQL-based databases, as well as three-tier support experience with databases such as IBM DB2, Sybase, MongoDB, Greenplum, and kdb+. Excellent analytical and communication skills, a problem-solving mindset, and the ability to prioritize tasks are essential for success in this role. Familiarity with financial products, including equities and fixed income, and with the risks associated with investment banking is also important. The candidate should be able to contribute to system design and architecture, drawing on strong database knowledge.

Responsibilities

  • Monitor and resolve system errors and disruptions, documenting resolutions and managing incidents as per ITIL lifecycle.
  • Liaise with upstream data providers to resolve issues and respond to user inquiries and operations requests.
  • Prepare and present stability reports and analyze alert and stability trends, making recommendations based on findings.
  • Investigate root causes of issues and educate developers to mitigate future occurrences.
  • Automate resolution of common problems, routine investigations, and user requests using scripts or programming platforms.
  • Lead reliability or business-driven projects and perform reliability engineering.
  • Work closely with engineering/development teams to design, build, and maintain systems, advising on product selection, schema design, and query tuning.
  • Troubleshoot issues across the entire stack: hardware, software, application, and network.
  • Mentor other SREs on standard methodologies for monitoring and troubleshooting complex code and database issues.
  • Identify and drive opportunities to improve automation for deployment, management, and visibility of services.
  • Represent the SRE organization in design reviews and operational readiness exercises for new and existing services.
  • Participate in on-call rotation and periodic conference calls with specialists from other time zones.

Requirements

  • 3-5 years of hands-on experience with UNIX.
  • Hands-on experience with SQL-based databases.
  • Three-tier support experience with databases such as IBM DB2, Sybase, MongoDB, Greenplum, and kdb+.
  • Excellent analytical and communication skills.
  • Ability to prioritize tasks and willingness to take ownership.
  • Problem-solving mindset and a focus on enabling solutions.
  • Strong troubleshooting and debugging abilities.
  • Familiarity with financial products such as equities and fixed income, and an understanding of the risks involved in investment banking.
  • Ability to contribute to system design and architecture with strong database knowledge.

Nice-to-haves

  • Knowledge of automation-related activities using scripting languages such as Python, Bash, Perl, or Ruby.
  • Experience using enterprise tools such as AppDynamics, Grafana, Splunk, or Dynatrace.
  • Awareness of modern software and systems architectures, including load balancing, queueing, caching, and microservices.
  • Deep understanding of operating system-level concepts such as processes, memory allocation, and the network stack.
  • Practical experience running large-scale online systems.