What you'll do
Do you have a passion for solving complex technical problems and ensuring high availability of critical systems? Are you eager to leverage your expertise to support Regnology's industry-leading software solutions? If so, join our SRE team as an SRE Engineer!
Regnology, a leading provider of software solutions for the financial services industry, is searching for a talented SRE Engineer to join our growing team. In this role, you will play a vital part in diagnosing, resolving, and preventing production incidents that impact our clients' operations.
- Act as the escalation point for Level 1 support, performing in-depth analysis and troubleshooting of complex technical issues related to Regnology's solutions.
- Collaborate with development teams to identify and resolve underlying code defects or design flaws contributing to incidents.
- Implement effective solutions to restore service functionality and minimize downtime during incidents.
- Proactively identify potential issues by monitoring system health, analyzing logs, and implementing preventative measures.
- Participate in incident reviews to identify root causes, document lessons learned, and contribute to continuous improvement of SRE processes.
- Work closely with Level 1 support engineers to ensure knowledge transfer and maintain consistent support quality.
- Stay up to date on Regnology's products, infrastructure, and industry best practices in SRE.
- Maintain a strong focus on automation:
  - Develop and maintain automation scripts and tools to streamline incident response, system monitoring, and operational tasks.
  - Utilize CI/CD pipelines to automate the deployment, monitoring, and rollback of applications and infrastructure.
  - Create and manage automated workflows for system maintenance, configuration management, and application updates.
  - Use configuration management tools (e.g., Puppet, Chef) to automate system configurations and ensure compliance.
  - Leverage automated testing frameworks to validate the stability and performance of infrastructure and applications.
  - Implement automated log analysis and alerting systems to detect anomalies and potential issues proactively.