Site Reliability Engineer at Syndica

At Syndica, big things happen. Every day, we're translating vision into reality by tackling new and exciting challenges head-on. This is a breakthrough stage in our company, and you'll experience firsthand the infectious enthusiasm of our employees and leadership team. You'll have the opportunity to learn new skills, grow your career, and work with the smartest, most passionate people in crypto.

This role will have primary accountability for maintaining and operating Syndica's blockchain infrastructure platform. Golang knowledge is a necessity! The team operates with a "run what you write" philosophy, and each engineer is responsible for deploying and operating the code they write.

A successful candidate must have demonstrable experience in at least one programming language (preferably Go, Rust, or C++) and previous work in SaaS application development and operations. You will work closely with the Support and Development teams on the architecture and configuration of our AWS- and GCP-hosted infrastructure, as well as the management of our bare-metal RPC nodes. You will be responsible for ensuring the environment is configured, managed, and monitored correctly to support the business. You will drive decisions on right-sizing servers and storage, troubleshooting performance issues, ensuring the highest level of reliability for the platform, and tuning the environment for maximum scalability, cost efficiency, and security. The ideal candidate will also have prior experience developing applications on any of the three major cloud platforms (AWS, Azure, or GCP) via Kubernetes.

Responsibilities

Design, creation, and provisioning of infrastructure
Administer overall site availability, security, latency and system health
Responsible for effective provisioning, installation/configuration, operation, and maintenance of services and system software and related infrastructure
Administer the state of all components in our cloud and bare metal environments
Deploy, manage, and operate the cloud environments
Design, build, manage and operate the infrastructure and configuration of SaaS applications with a focus on automation and infrastructure as code
Design, manage and operate the infrastructure as a service layer (hosted and cloud-based platforms) that supports the different platform services
Develop comprehensive monitoring solutions to provide full visibility to the different platform components using tools and services like Kubernetes, Prometheus, Grafana, ELK, Datadog, New Relic, and other similar tools
Create the environments and tooling that enables the development team to release code quickly and reliably
Identify and troubleshoot any availability and performance issues at multiple layers of deployment, from hardware, to operating environment, network, and application
Evaluate performance trends and expected changes in demand and capacity, and establish the appropriate scalability plans
Troubleshoot and solve customer RPC issues
Ensure that SLAs are met in executing operational tasks
Work with development teams to ensure best practices for scalability, reliability, and security are designed and implemented from the start
Conduct periodic on-call duties

Qualifications

Great collaborator with 5+ years of experience in a DevOps or SRE role
Deep understanding of infrastructure-as-code (Terraform, etc.) and deploying large-scale systems reliably
Strong experience with Infrastructure as Code and Configuration Management tools
Experience with Prometheus/Grafana for metrics aggregation/visualization
Configuration of CI/CD pipelines
Experience using Kubernetes
Experience with automation tools/platforms
Experience with alerting and monitoring tools
Strong knowledge of monitoring and performance analytics tools (DataDog, New Relic, etc.)
Commitment to implementing reliability and security best practices
Capacity planning experience, including resource optimization and load testing
Experience working in a highly distributed company is a plus
Ability to align a portion of your day with Central Time business hours (UTC-6)
Working knowledge of information security issues
Experience building and managing virtualized systems (KVM, OVM, containers/Docker) and the ability to read and understand source code
Systematic problem-solving approach, combined with a strong sense of ownership and drive
Firm grasp of at least one modern programming language, beyond advanced scripting (Shell or Python)
Working knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.)
Experience writing automation tools & eagerness to "automate all the things"

What does success in this role look like?

In three months, you have become our infrastructure administrator with respect to overall site availability, security, latency, system health, customer accounts, and billing. You'll have taken on independent code review responsibilities and are collaborating on the design of new features
In six months, you have earned the trust of the team and are delivering tasks through the entire SDLC, from design through development with minimal guidance, and are helping to effectively mentor new engineers joining the team
In twelve months, you have established a cadence of predictable, on-time delivery without cutting corners
