Senior DevOps Engineer
Who we are:
Welcome to BookedBy, an industry-leading business management solution and scheduling software for salons, spas, and barbershops everywhere. Headquartered in Austin, TX, BookedBy has more than 100 employees across three continents and powers thousands of locations worldwide for top brands such as Sport Clips Haircuts, Diesel Barbershop, Perfect Look, Sharkey’s Cuts for Kids, Hairzoo, and more. Founded in 2011, BookedBy handles more than 60 million bookings annually through its scheduling platform and is expanding into other service-based industries.
Job Summary:
We are seeking a Senior DevOps Engineer to help scale and operate the infrastructure behind BookedBy’s platform. You will design, implement, and maintain AWS cloud environments, manage Kubernetes-based services, and build automation frameworks to ensure our systems are highly available, secure, and performant. You will collaborate closely with software engineers to improve developer productivity, streamline CI/CD with GitLab and ArgoCD, and bring operational excellence to production systems. You’ll also leverage AI tools to automate repetitive tasks, accelerate incident response, and continuously optimize system performance.
Key Responsibilities:
- Design, build, and maintain AWS cloud infrastructure supporting BookedBy’s applications.
- Manage and improve Kubernetes clusters (updates, scaling, monitoring, lifecycle management).
- Build and optimize CI/CD pipelines with GitLab and ArgoCD, supporting versioning, rollbacks, and progressive delivery (canary, blue/green).
- Automate infrastructure provisioning and management using Infrastructure-as-Code (Terraform for infrastructure, Helm for Kubernetes applications, ArgoCD for GitOps-driven deployments).
- Enhance observability with metrics, logs, and tracing (e.g., Prometheus, Grafana, Datadog, ELK).
- Implement and maintain security best practices, including continuous audits and compliance readiness (e.g., SOC 2).
- Support application migrations to Kubernetes and broader modernization efforts, including the deprecation of legacy deployment tooling.
- Troubleshoot and resolve complex infrastructure and deployment issues in production.
- Write and maintain runbooks and operational documentation to improve reliability and incident response.
- Use AI-driven monitoring, anomaly detection, and automation to improve reliability and efficiency.
- Participate in an on-call rotation, ensuring fast, effective incident response.
Qualifications & Skills:
- 5+ years of DevOps / SRE / Platform Engineering experience.
- Strong expertise with AWS services (EC2, RDS, S3, Route 53, ELB/ALB, IAM, networking).
- Hands-on experience managing Kubernetes clusters.
- Proficiency with containers (Docker) and orchestration patterns.
- Strong knowledge of Terraform (infrastructure), Helm (Kubernetes packaging), and ArgoCD (GitOps deployments).
- Experience building and maintaining CI/CD pipelines (GitLab preferred, but familiarity with Jenkins, GitHub Actions, or Bamboo is welcome).
- Familiarity with observability stacks (Prometheus, Grafana, Datadog, ELK).
- Strong scripting/automation skills (Python, Bash, Ruby).
- Proficiency with Linux and command-line troubleshooting.
- Solid understanding of distributed systems, scaling, and cloud security practices.
- Strong collaboration and communication skills.
- Excitement about applying AI to automate workflows, predict incidents, and enhance operational efficiency.