Manager III Cloud Reliability

Retail Business Services is the services company of leading grocery retail group Ahold Delhaize USA, providing services to five East Coast grocery brands: Food Lion, Giant Food, The GIANT Company, Hannaford and Stop & Shop. Retail Business Services leverages the size and scale of the local brands and provides industry-leading expertise, insights and analytics to support their strategies. We are committed to diversity, equity and inclusion and we foster a community of belonging where everyone is valued. For more information, visit


Primary Purpose: 

The RBS Cloud Reliability group is looking for an experienced Engineering Manager to help lead, build, and coach the team responsible for platform reliability engineering for the Azure Cloud Platform. As a leader of this group, you will set the vision and lead reliability initiatives and governance functions with internal team members and managed service providers to provide best-in-class support for internal cloud services customers.

If you are an Engineering leader who is passionate about reliability, has a consistent track record of building healthy, high-performing teams, has experience leading large-scale fault-tolerant systems, and cares about metrics and operational excellence, then this role is for you.

The Platform Reliability Engineering Team is responsible for providing Incident Management, Observability and Reliability engineering consultation across the organization, as well as troubleshooting assistance on high-impact incidents that the application teams are unable to resolve.

You will work with the engineering and product teams to ensure we have a long-term technical vision in place, support the team in developing and delivering on their objectives, and will nurture a customer-centric culture that is inclusive both internally and externally.


Duties and Responsibilities:   

Build and run support for cloud solutions that include Core Azure Services, Container platforms, Networking, Security, Cost management, Operating systems, Web applications and data services.
Build and manage a team of engineers across many time zones who work to analyze and maintain service stability by documenting policies in a 24/7/365 operation.
Manage the customer experience and oversee daily operations, including escalations, logistics, operations support, space usage, budget support, future-proofing, and guidelines.
Develop, own, and execute on a roadmap that addresses our immediate challenges and maps an incremental approach to longer-term reliability, automation, and instrumentation goals
Partner with Cloud Platform Engineering to identify and implement automation opportunities and process efficiencies that improve reliability, observability, and operations
Design and implement tools that help product teams focus on shipping features, while making sure we build infrastructure that is cost efficient, secure, and reliable.
Provide consultation to development and product teams to help them build reliable and scalable services, and resolve any production issues as quickly as possible
Lead projects for disaster recovery, automated failure recovery, capacity planning, high availability, and scaling
Help shape a DevOps culture and foster its adoption
Stay abreast of the latest SRE methodologies, and skillfully adopt the appropriate ones for the cloud platform
Foster innovation within the team, and join others in establishing the new SRE discipline for the cloud platform
Take an active role in driving and evolving the roadmap for the SRE Organization: particularly in the areas of infrastructure automation, observability, and AI Ops
Execute across solution areas using the Cloud FinOps operating model, including cloud governance, spend management, migrations, and modernizations.
Provide cloud cost input and tracking for overall financial budgets, forecasts, and actuals
Drive FinOps value by helping customers understand their cloud spend based on their business goals and budget
Conduct risk assessments of security controls as they pertain to enterprise IT assets and related potential business impact
Excellent stakeholder management skills and a proven ability to build strong relationships and trust throughout the organization, including with senior leadership
Plan and manage departmental budget, budget forecasting, chargeback, and performance reviews of associates
Contribute to team culture and recruiting by leading activities to attract and retain top talent and mentoring and developing junior product associates
Collaborate with Solution architecture, Platform engineering, Managed service providers and Product teams for delivering solutions
Act as a highly collaborative leader who is capable of formulating and advocating for a clear, impactful platform vision and strategy and working cross-functionally to deliver on that roadmap.


Qualifications:

Bachelor’s Degree in Computer Science, Information Technology, Engineering, or related field
10+ years’ experience in Infrastructure technology solutions, DevOps, Agile development, architecture, consulting, and/or cloud/infrastructure technologies
5+ years of experience leading, managing, supporting, maintaining, and automating private and public cloud environments
3+ years in management roles, managing resources, projects, budgets, forecasts, and chargeback
3+ years of experience using IaC tools (ARM, Terraform, JSON, YAML, PowerShell, GitHub, etc.)
Experience crafting, implementing, and operating highly scalable and reliable platform solutions on public clouds such as Azure or AWS
Deep understanding of cloud technologies, preferably Azure, including design and standard methodologies for securing cloud environments, and hands-on experience with IaC and SDLC models.
Capable of technical deep dives into code, networking, systems, and storage with very experienced engineers.
Hands-on experience managing Azure Enterprise-scale reference architecture implementations
Deep and extensive experience in building and landing DevOps / SRE practices in a global environment is required.
Exposure to enabling and managing cloud services, usage, and optimization as well as automation and development of tools to support DevOps model and improvements based on trends and data analysis.
Technical depth that allows you to develop and mentor others as well as build credibility with your team
Experience in Full stack Cloud Infrastructure Engineering, Operations, and Application knowledge
Ability to work in an Extreme Programming environment and in a pair programming/engineering model
Able to manage diverse teams, multi-task, and work under pressure to meet aggressive schedule targets
Hands-on experience with IaC tools like ADO, ARM, Terraform, Ansible, PowerShell, Python, Azure CLI, GitHub
Experience working with and automating enterprise scale cloud infrastructure deployments
Experience with security compliance programs such as ISO, PCI, and HIPAA is strongly preferred
Negotiation skills, stakeholder management and strong ability to manage opposing viewpoints
Asks questions to encourage others to think differently and enrich their analyses of complex situations

Preferred Qualifications:  

Certification as Azure Administrator, Azure DevOps, or Azure Solutions Architect
Prior experience working in/with DevOps, Agile and automation and SRE teams.
Prior experience managing infrastructure and software development or DevOps teams with an automation focus.

Address: USA-IL-Chicago-300 South Riverside Plaza
Store Code: IT Executive & Administration (2760797)

  • Company: Retail Business Services
  • Published: 6th January 2023
  • Closing Date: 23rd February 2023
  • Country: United States
  • Type: Full-time
  • Seniority: Manager
  • FinOps Certifications Required: None