
Site Reliability Engineer

Our Purpose
We work to connect and power an inclusive, digital economy that benefits everyone, everywhere by making transactions safe, simple, smart and accessible. Using secure data and networks, partnerships and passion, our innovations and solutions help individuals, financial institutions, governments and businesses realize their greatest potential. Our decency quotient, or DQ, drives our culture and everything we do inside and outside of our company. We cultivate a culture of inclusion for all employees that respects their individual strengths, views, and experiences. We believe that our differences enable us to be a better team – one that makes better decisions, drives innovation and delivers better business results.

About the Role
The Merchant Loyalty team is seeking a Senior Site Reliability Engineer (SRE).
As an SRE, you are responsible for ensuring that our platform is stable and healthy. We break down barriers to running our products by fostering developer ownership of operations and empowering developers to build resilient products. During the application build phase, we support our developers in software run principles, including operational design, automation, capacity planning, and monitoring, that lead to fault-tolerant, scalable products. We see the big picture and help create and enforce operations standards while facilitating an agile, learning culture.

About the Program:
Within the Data & Services organization, SessionM, a Mastercard Company (now our Merchant Loyalty Program), is a customer engagement platform empowering global brands to forge stronger and more profitable customer relationships. We partner with brands to drive marketing transformation through innovative technology and services.

What you’ll do:

  • Plan, manage, and oversee all aspects of the production environment for all merchant loyalty use cases
  • Define strategies for all facets of observability
  • Identify areas of improvement in production
  • Understand MTTR, SLO, and SLI definitions and apply them to services.
  • Respond to incidents, improve the platform based on feedback, and measure the reduction of incidents over time.
  • Ensure reliable, fault-tolerant, efficiently scalable and cost-effective services and infrastructure.
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
  • Practice sustainable incident response and blameless postmortems.
  • Ensure that batch production scheduling and processes are accurate and timely.
  • Create and run queries against big data platforms and relational data tables to identify process issues or to perform mass updates (preferred).
  • Isolate problems between hardware and software.
  • Analyze the platform's ITSM activities and provide a feedback loop to development teams on operational gaps or resiliency concerns
  • Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
  • Work with a global team spread across tech hubs in multiple geographies and time zones

What experience you need:

  • Bachelor’s degree in computer science, software engineering, or a similar field.
  • Experience in Splunk and SignalFx
  • Experience with Amazon Web Services including RDS
  • Relevant data DevOps, SRE, or general systems engineering experience.
  • Experience in managing large production platforms.
  • Experience architecting and implementing data governance processes and tooling (data catalogs, lineage tools, role-based access control, PII handling)
  • Strong coding ability in languages such as .NET, Go, C, C++, or Ruby.
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • Ability to help debug and optimize code and automate routine tasks.
  • Ability to support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
  • Interest in designing, analyzing and troubleshooting large-scale distributed systems.
  • Appetite for change and pushing the boundaries of what can be done with automation.
  • Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
  • Good grasp of the change management and release management aspects of software.

Corporate Security Responsibility
All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization. Every person working for, or on behalf of, Mastercard is therefore responsible for information security and must:
  • Abide by Mastercard’s security policies and practices;
  • Ensure the confidentiality and integrity of the information being accessed;
  • Report any suspected information security violation or breach; and
  • Complete all periodic mandatory security trainings in accordance with Mastercard’s guidelines.
