- Job Ref: 4771
- Location: Dublin, Ireland
- Type: Permanent
System Administration & Site Reliability
- Use common orchestration tools (e.g. Terraform, Ansible, Puppet) to manage and improve infrastructure.
- Participate in the operations on-call rotation, triaging and addressing production issues as they arise.
- Contribute to internal tools that help us improve our operations processes, manage our infrastructure, and scale our systems.
- Whiteboard a fix to a scaling problem — and then make it happen.
- Install new servers or rebuild existing ones, and configure hardware, peripherals, services, settings, directories, and storage in accordance with standards and project or operational requirements.
Operations and Support
- Perform daily system monitoring, verifying the integrity and availability of all hardware, server resources, systems and key processes, reviewing system and application logs, and verifying completion of scheduled jobs such as backups.
- Perform regular security monitoring to identify intrusion patterns.
- Perform daily backup operations, including restore testing.
- Monitor resource utilisation and recommend solutions.
- Manage user provisioning and automated provisioning systems.
- Provide escalation engineering support to other teams.
- Repair and recover from hardware or software failures, coordinating and communicating with the teams and users affected.
- Assist in applying OS patches and upgrades on a regular basis, and upgrade administrative tools and utilities. Configure or add new services as necessary.
- Contribute to system configuration and asset management applications.
What skills do I need?
- Bachelor's (4-year) degree with a technical major, such as engineering or computer science.
- At least three years of production system administration/SRE experience.
- At least two years of experience running a large-scale SaaS web application on AWS or a similar cloud provider.
- Ability to analyse and optimise performance in high-traffic Internet applications.
- Thorough understanding of common Internet protocols (e.g. HTTP, DNS, SMTP).
- Familiarity with APIs used for monitoring, management, user provisioning, and SSO.
- Ability to solve complex, high-impact problems.
- Ability to digest issues and solutions and discuss them with team members who may not be familiar with the terminology or technologies involved.
- Excellent communication skills and a collaborative, team-oriented approach.