Cloud Operations Lead

Company: Eli Lilly and Company
Published: 1st April 2023
Closing Date: 30th April 2023

This “Cloud Operations Lead” role is part of Lilly Hosting and Cloud Services (HCS), which runs the IT infrastructure hosting and IaaS cloud technologies across the enterprise. Its scope spans our Global Enterprise Data Centers, Storage/Backups, Disaster Recovery, Server Virtualization and Server Operating Systems, both on premises and in our Lilly Private Cloud and Public IaaS offerings.

We’re looking for a highly skilled and experienced Cloud Operations Lead to join our team.

As Cloud Operations Lead, you will be responsible for managing and leading the operations of our cloud-based infrastructure. You will work closely with the Cloud Foundations/Engineering team to ensure our cloud infrastructure is reliable, secure and scalable.

In this role, you will be responsible for developing the Cloud Ops strategic vision, covering assessments, remediation, consulting, onboarding, system setup, system administration, incident resolution, problem management, configuration/change management, security management, logging and monitoring, demand/capacity planning, availability management, disaster recovery, cost optimization, cloud migrations and cloud support. You will work closely with Product Management, Product Development and the Information Security Officer to create technical cloud standards and cloud architecture and to deliver strategic objectives. The position is also responsible for developing and implementing best practices, policies and procedures for cloud operations staffing and leadership.

Key Deliverables/Responsibilities

  • Hands-on knowledge of cloud operations in a multi-cloud environment, with advanced skills in one or more cloud platforms (AWS preferred; Azure or GCP).
  • Be able to use professional knowledge and problem determination/source identification skills to resolve problems involving cloud APIs, application services, IaaS, PaaS, microservices, containers, middleware components, network, security and infrastructure issues alike. If unable to resolve an issue, triage and route the incident to the appropriate level of support.
  • Proven technical skills in cloud infrastructure, software architecture and cloud computing.
  • Demonstrated ability to think tactically and strategically about solutions to business, product, and technical challenges.
  • Serve as a senior-level technical contact for enterprise customers and ensure they realize the full benefit of cloud services.
  • Oversee a team of employees and Managed Service Provider (MSP) resources and serve as an escalation point for any service impacting issues.
  • Ensure frictionless access to our cloud services while maintaining security and data protection.
  • Explore and deploy highly available, fault-tolerant infrastructure solutions in the cloud.
  • Audit and report on cloud service consumption and adherence to company policies and procedures.
  • Recommend appropriate cloud services based on compute, data, or security requirements.
  • Identify appropriate use of cloud operational best practices and ensure they are followed.
  • Understand cloud usage costs and FinOps practices and deliver cloud cost optimization solutions.
  • Collaborate with other technical groups (Cloud Engineering, Cyber, Network, etc.) on technology issues and solutions.
  • Evaluate effectiveness of the cloud solutions and make recommendations for improvement.
  • Coordinate and engineer cloud availability zones and regional architecture for data protection.
  • Understand high level cloud application architecture and provide analysis on high-availability and disaster recovery options to clients.
  • Manage incidents and requests, and implement changes, leveraging ITIL tools and processes.
  • Take appropriate actions to resolve issues and communicate the solution or action plan to the clients.
  • Conduct regular training sessions within the team to keep all team members up to date with the latest cloud offerings.
  • Identify use cases for AIOps and for automation (through scripts and tool deployment) to support monitoring and proactive service delivery.
  • Create reports and KPI metrics for servers and storage.
  • Manage after-hours support.
  • The candidate will be experienced with the ITIL processes of Incident (including Critical Incident Management), Problem, Change Management and Integrated Service Level Management.
  • Familiarity with Site Reliability Engineering (SRE) concepts and practices.

Minimum Qualification

  • Bachelor’s degree in Computer Science, Computer Engineering, Information Technology or a relevant field
  • 10+ years of operations experience (management of critical infrastructure systems) in a data center or public/private cloud environment
  • Minimum of 6 years of professional hands-on experience with the AWS, Azure or GCP platform
  • Certifications such as:
    • AWS Certified SysOps Administrator (Associate)
    • Microsoft Certified: Azure Administrator Associate
  • Experience managing/operating production systems on cloud service provider infrastructure
  • Experience with monitoring, backups, patching and security remediation in a cloud environment
  • Hands-on experience managing resources through the console as well as the CLI, and an understanding of cloud API calls.
  • Hands-on with native or third-party cloud optimization tools such as AWS Compute Optimizer, AWS Trusted Advisor, Azure Advisor, Cloudability, Turbonomic, etc.
  • AWS data protection and lifecycle experience using technologies like EBS snapshots, S3/Glacier, data replication between availability zones and regions, etc.
  • Good grasp of fundamental security concepts, with hands-on experience implementing security controls and compliance requirements.
  • Working knowledge of Linux and Windows operating systems, and familiarity with database concepts and operations
  • Willingness to serve in a 24×7 on-call cycle to manage escalations or resolve issues.
  • Experience with automation technologies for CI/CD (Jenkins, Azure DevOps, AWS CodePipeline)
  • Experience with visualization tools (e.g., QuickSight, Power BI)
  • Experience with Python, JSON, YAML, ARM templates and CloudFormation
  • Exceptional customer service orientation.

Additional Skills/Preferences

  • Working knowledge of Pharmaceutical regulatory requirements, qualification and validation of applications.
  • Ability to explain complex technology decisions to both technical and non-technical audiences at all levels in the organization
  • Previous experience in cost optimization advisory projects is an advantage.
  • Knowledge of cloud management and governance tools (e.g., Cloud Custodian, Trusted Advisor)
  • Familiarity with monitoring tools such as CloudWatch, CloudTrail, etc.
  • Detail-oriented, with excellent documentation skills and methodologies, and able to manage multiple priorities successfully.
  • A demonstrably high level of intellectual curiosity, external perspective and interest in innovation.
  • Strong communication skills, including the ability to articulate complex technical topics to a non-technical or IDS leadership audience.
  • Resolves highly complex, multi-dimensional problems that require considering variables affecting multiple aspects of the project/program. Problems range from high-impact operational events to tactical corrections.
  • Ensures that the cloud platforms developed are designed for 24×7 operations, Disaster Recovery and Performance Monitoring, and oversees Network Operations, Event Management, Incident Management, Problem/Escalation Management, Configuration Management and Change Management processes.
  • Defines and reports Key Performance Indicators to monitor process health, and defines and reports customer-facing service metrics. Collaborates with the Security organization to implement and oversee security policy, monitoring and guidelines.
  • Conducts system outage analysis to prevent recurrence of incidents, and implements continuous improvement plans for all services and processes.
  • Service design activities will include building out service capabilities to match SLA requirements, capacity modelling for scale and cost, and security management.
  • Ensures all technical procedures (installation, configuration, run books) are documented and kept up to date, and that they contribute to maintaining operational standards, especially where automation of infrastructure services and system administration tasks is critical. Implementing a monitoring strategy that provides rapid feedback and diagnostics in the event of a service disruption is imperative.
  • Relentlessly introduces new ways of improving, optimizing and scaling the cloud services.

  • Coaches and mentors direct reports, supporting and inspiring their development and career growth objectives.

Details

  • Company: Eli Lilly and Company
  • Type: Full-time
  • Seniority: Mid-level Contributor
  • FinOps Certifications Required: None

To request a modification to this listing, please email jobs@finops.org