Senior Site Reliability Engineer (all genders)

gridX, München

How you can contribute to gridX

Do stuff that matters: become part of gridX and help us digitalise the energy industry, making renewable energy accessible and affordable everywhere. #getshitdone

As the cloud infrastructure team, we lay the foundation for success and ensure scalability, from production to our own deployment system, which lets us develop new applications in the shortest possible time and roll them out to our customers.

Our tools, most of which were developed in Go, make everyday life easier for other internal teams working with gridBox.

  • You are responsible for our critical cloud infrastructure and manage it as code
  • You want to take responsibility and continuously improve the cloud infrastructure of gridX
  • You identify potential problems or bottlenecks in advance and contribute your own ideas to make our service platform more resilient and reliable
  • You understand the value of a "service ownership" culture
  • You will support or mentor our developers in reliably running their services in production by helping them deploy their applications and interact with various cloud services
  • You take care of our internal monitoring for our applications and infrastructure and guide our developers in setting up their own dashboards and alerts
  • You manage core tools to support and accelerate our development, e.g. CI/CD, Docker Base Images
  • You enforce Cloud Native best practices across the company
  • You document everything and maintain our runbooks
This is how you and your application stand out
  • You have a strong awareness and experience of working with the principles of site reliability engineering
  • You know how to achieve high availability, scalability and fault tolerance for distributed software in production
  • You have experience with Kubernetes and know how to reliably run even larger clusters
  • You are experienced at a senior level in using IaC tools (e.g. Terraform, CloudFormation, CDKs; we mainly use Terraform)
  • You have a deep understanding of cloud offerings like AWS, specifically AWS services such as EC2, EKS, Lambda, Kinesis, DynamoDB, SNS, IAM, RDS
  • You have proactive security in mind and follow best practices and official benchmarks
  • You know what Scratch or Distroless containers are about and why they should be used in production
  • You have 5+ years of experience in at least one modern programming language (e.g. Go, Python, Java; we mainly use Go)
  • You know how to increase transparency, optimize dashboards and define alerts by using monitoring solutions such as Prometheus and Grafana
Why gridX
  • Flexible & mobile working: Work remotely for up to 70 days from anywhere in the EU
  • Vacation: 30 days for your relaxation, plus half a day of special leave on each of 24 and 31 December
  • Health & Sports: 30 Euro allowance for Urban Sports Club or E-Gym Wellpass as well as offers for company health management & (Mental) Health Care
  • Personal development: cross-functional coaching, access to e-learning platforms & an annual development budget of 1,500 euros per employee
  • Employee discounts: Access to gridX Corporate Benefits
  • Stay fit and save the planet with our JobRad offer
  • Receive a fair monthly contribution to your company pension plan
  • City travel subsidy: 30 Euros monthly allowance for your monthly/annual ticket
  • Modern workplaces in the heart of Aachen and Munich with IT equipment of your choice
  • Annual Teamweek: Enjoy an unforgettable teamweek, face extraordinary challenges together with all gridX teams and create unforgettable memories!
  • Experience the gridX culture at regular team events and receive 100 Euros on top per employee for your department event
  • We will donate 20 Euros to a charity of your choice on your birthday
  • Sabbatical option: take a break from the daily work routine and realize personal projects, travel or further education
  • Our benefits differ for 100% remote employment!