Senior Applications Support Specialist
Remote - United Kingdom | JR013381
Key Responsibilities
Incident & Problem Management
- Lead major incident (MI) bridges and restore service with minimal business impact.
- Handle all L3 escalations and perform deep diagnostics across Java/JVM, middleware, OS, and infrastructure.
- Own technical root cause analyses (RCAs) and drive long-term, systemic remediation.
- Identify recurring failure patterns and risks.
Reliability Engineering
- Apply SRE principles: SLIs/SLOs, error budgets, resilience patterns.
- Tune JVM parameters, analyze thread/heap dumps, and improve performance.
- Influence application architecture for fault tolerance, scalability, and recoverability.
- Validate DR readiness, failover behavior, and resilience testing outcomes.
Change, Release & Risk
- Provide technical approval and risk assessment for high-risk changes.
- Enforce operational readiness for new apps and major releases.
- Ensure changes meet audit, compliance, and regulatory expectations.
Automation, Monitoring & Observability
- Build advanced automation using Shell/Python/PowerShell.
- Develop frameworks for health validation, automated recovery, and compliance checks.
- Define observability standards; optimize alerting and reduce MTTR.
Leadership & Mentorship
- Mentor L1/L2 teams; review and approve runbooks, SOPs, and KB articles.
- Act as a trusted technical advisor to stakeholders and leadership.
Skills & Qualifications
Technical (Mandatory)
- Strong knowledge of application architecture, distributed systems, and middleware.
- Java expertise: JVM internals, GC, memory management, thread/heap dump analysis, performance tuning.
- .NET expertise: CLR internals, garbage collection, memory management, thread and memory dump analysis, and application performance tuning.
- Strong Unix/Linux skills, networking fundamentals, and advanced scripting (Shell/Python/PowerShell/VBScript).
- Advanced SQL and a solid understanding of databases; experience with AutoSys (or an equivalent job scheduler).
- Hands-on experience with observability tools: Splunk, AppDynamics/Dynatrace, ELK, Grafana, Prometheus.
Reliability & Operations
- Major incident leadership, deep RCA, change/release readiness, DR & resilience engineering.
- Experience in regulated production environments.
Soft Skills
- Strong technical leadership and decision‑making.
- Clear communication during high‑pressure incidents.
- Ownership mindset and business awareness.
Experience & Education
- 7–12+ years of experience in application reliability, production support, SRE, or platform operations.
- Bachelor’s degree in Computer Science/Engineering or equivalent.
- ITIL, cloud, or industry certifications (preferred).
- Banking/financial domain experience (preferred).
Working Conditions
- On‑call and after‑hours support as required.
- Fast‑paced environment with multiple priorities.
- Hybrid working model.
More career opportunities at Ensono
Explore additional openings with our team, and apply today.