Lead Site Reliability Engineer (f/m/x) - API Platforms

Deutsche Bank AG, Berlin
Job ID: R0357361
Full/Part-Time: Full-time
Regular/Temporary: Regular
Listed: 2025-01-29
Location: Berlin

Position Overview

Deutsche Bank Technology in Berlin

DB Technology is a global team of technology specialists, spread across multiple trading hubs and tech centres.
We have a strong focus on promoting technical excellence – our engineers work at the forefront of financial services innovation using cutting-edge technologies. Our Berlin location is the most recent addition to our global network of tech centres and is growing strongly.
We are committed to building a diverse workforce and to creating excellent opportunities for talented engineers and technologists. Our tech teams and business units use agile ways of working to create #GlobalHausbank solutions from our home market.
API Platforms and Integration Services

The Deutsche Bank API Platforms and Integration Services team orchestrates internal and external API platforms, portals, enabling services and embedded finance products at a global level. The team is a highly skilled and innovative group dedicated to developing cutting-edge solutions and services that leverage the power of APIs to drive digital transformation and enhance the banking experience for clients worldwide.

As a Lead Site Reliability Engineer, you will be responsible for the SRE activities across platforms, portals and enabling services, together with other SREs and engineers.

You love this job but feel you cannot tick 100% of the boxes? Send us your CV anyway!

Your key responsibilities

As Lead Site Reliability Engineer you
• Orchestrate and contribute to SRE activities across API Platforms and Integration Services
• Introduce all engineering disciplines that combine software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems
• Implement the core of DevOps with specific principles and practices, focusing on “what” and “how” to improve reliability
• Establish and support capacity planning procedures and keep a close eye on SLIs and SLOs for production readiness and in the live environment
• Coordinate with the rest of the division and the teams working on different layers of the application and infrastructure, with full commitment to collaborative problem solving

For Infrastructure & Service Management you
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
• Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity
• Develop and enforce policies, standards and guidelines for site reliability
• Automate application and infrastructure deployment activities to production environments

For Incident & Problem Management you
• Perform troubleshooting and emergency response
• Investigate root causes and suggest solutions
• Increase productivity by leading blameless post-mortems

For Application Maintenance you
• Work collaboratively with Product Owners and Engineers to run reliable services
• Configure and maintain application & monitoring
• Identify business objects for monitoring
• Track system performance and capacity, and use your experience to create effective strategies for maintaining and improving system performance and availability

For Operational Continuous Improvement you
• Identify issues and optimization potential and introduce related user stories
• Support with automation know-how to reduce the risk of bad changes
• Identify, design, develop, and deploy tools and processes to monitor, maintain, and report site performance and availability

For Service Onboarding you
• Support your Squad and your Chapter population in onboarding & promotions

Your skills and experiences
• Hands-on experience with cloud ecosystems run on Google Cloud
• Hands-on experience with Docker / Kubernetes operations with GKE or similar technology
• Expert experience with automated infrastructure provisioning based on Terraform/Terragrunt, Terraform Enterprise, Ansible
• Advanced hands-on experience with Continuous Integration / Continuous Deployment (GitHub) and patterns for CI/CD pipelines

• Advanced hands-on experience with monitoring tools like Prometheus, Grafana, Kibana and alerting tools like Opsgenie, New Relic, Datadog, Splunk, Google Operations Suite (Stackdriver)
• Very good knowledge of security capabilities (TLS, OAuth2, KMS, Vault, Admission Controllers, Let's Encrypt or similar technologies)

• Very good understanding of microservice architectures and experience with API management with Apigee or WSO2
• Experience in software development in at least one language (Java, JavaScript, Python, Go)
• Good knowledge of Software Development Life Cycle processes based on related tools such as TeamCity, Bitbucket, Artifactory, SonarQube, Veracode, Crucible, Jira, Confluence, ServiceNow

What we offer

We provide you with a comprehensive portfolio of benefits and offerings to support both your private and professional needs.

Emotionally and mentally balanced
A positive mind helps us master the challenges of everyday life – both professionally and privately. We offer consultation in difficult life situations as well as mental health awareness trainings.

Physically thriving
We support you in staying physically fit through an offering to maintain personal health and a professional environment. You can benefit from health check-ups, vaccination drives as well as advice on healthy living and nutrition.

Socially connected
Networking opens up new perspectives, helps us thrive professionally and personally, and strengthens our self-confidence and well-being. You can benefit from PME family service, FitnessCenter Job, flexible working (e.g. part-time, hybrid working, job tandem) as well as an extensive culture of diversity, equity and inclusion.

Financially secure
We provide you with financial security not only during your active career but also for the future. You can benefit from offerings such as pension plans, banking services, company bicycle or “Deutschlandticket”.

Since our offerings vary slightly across locations, please contact your recruiter with specific questions.

This job is available full-time and part-time.
In case of any recruitment-related questions, please get in touch with Kilian Weber.

Contact Kilian Weber: +49 30 34073087

We strive for a corporate culture in which we give our best together every day. This includes acting responsibly, thinking commercially, taking initiative and collaborating purposefully. Together we share and celebrate the successes of our employees.

Together we are Deutsche Bank Group. We welcome applications from all people and promote a positive, fair and inclusive work environment.

ING Deutschland, Berlin
and practices of Site Reliability Engineering (SRE)  •  Experience with container platforms such as Docker and OpenShift / Kubernetes, as well as with monitoring tools and techniques for system observability and proactive problem solving  •  Familiar with microservices...
New job offer

Engineering Manager (f/m/d) - Berlin

Jobs for Humanity, Berlin
About the Opportunity We are looking for an Engineering Manager to lead a high-performing, globally-distributed web development team responsible for shaping and executing the technical strategy for semantics and search. This role will focus on...
Urgently wanted

Systems Engineer, Amazon Foundational Security Services

Amazon, Berlin
highly motivated Software Development Manager (SDM) to lead a team of Software Developers and Systems Development Engineers in Amazon Security who will raise the security bar by delivering industry-best security services. You should be somebody who enjoys...