Site Reliability Engineer
We’re looking for Site Reliability Engineers (SREs) who can help us design, build, and maintain high-performance, scalable, reliable services. You will partner with our engineering teams to make their services more performant, scalable, observable, and reliable. We believe every engineering team at AfterShip should be responsible for the software it builds, and SREs play a critical part in providing the tools, practices, and expertise to make that happen.
An SRE is not a system administrator or system operator: you get to write code as well. We are looking for a bright site reliability engineer to join AfterShip's team.
Responsibilities
- Keep services up and running with the reliability and response times needed to meet our SLAs
- Evaluate and select the SaaS products and infrastructure stacks we use
- Manage and optimize SaaS and infrastructure usage and cost
- Define and enforce security best practices across the company
- Investigate and respond to infrastructure outages together with the engineering teams
- Design and implement automated infrastructure that abstracts operational details away from product engineering teams
Requirements
- 2+ years of experience with Amazon Web Services or Google Cloud Platform
- Excellent password hygiene and a good understanding of identity management
- Strong understanding of servers, networking, storage, and Linux system administration
- Production experience with DNS, load balancing, failover strategies, and blue-green and canary deployments
- Ability to set up automated monitoring and alerting systems
- Care about uptime and service level objectives (SLOs)
- Production experience with log aggregation, analysis, and troubleshooting
- Ability to communicate effectively with multiple teams and with IaaS/SaaS vendors
What we are using
- Cloud: Amazon Web Services, Google Cloud Platform
- Monitoring: New Relic, Pingdom, Runscope, StatusPage
- Database: MongoDB, DynamoDB, Amazon Aurora, Redis
- Queue: Beanstalk
- Main programming language: JavaScript (Node.js)
- Continuous Integration: Travis CI, Bitrise
- Automation: Ansible, Packer
- Tools: Atlassian Suite
- Other SaaS: Algolia, Mixpanel, SendGrid, Twilio
Benefits
- 5-day work week
- Flexible work hours
- 15 days of annual leave
- Performance bonus
- Medical and dental insurance
- Unlimited work from home policy
- Work visa sponsorship
- Open startup culture
- International teams
- Friday sharing & happy hours
- Internal hackathon