200+ alerts per month on average in BHOM relating to the Jira connector heartbeat
30+ alerts per month on average in BHOM relating to the queue not being drained
240 events per month
180 hours per month
Integration Service alerts 2023 Runbook
https://bmcsre-itom.onbmc.com/aiops/#/situations/situation/14d66638-b03c-48bc-a2ba-e2e2c2539d8b
17 Sep 2024 ETA of Monday 23rd September for more details regarding the runbook.
24 Sep 2024 Cesar to see what we can do with this, update next week.
26 Sep 2024 - Cesar had a meeting with Benson/Elcina. Both agreed to review both runbooks to identify which situations/scenarios from the runbooks can be automated, and they will provide the steps to Cesar via email.
03 Oct 2024 Approx 10 different alerts in the runbook, only 2 scenarios are suitable for automation at the moment:
Queue not drained alert, part #1: Obtain the logs and restart connector pod only
Heartbeat missing alert, part #1: Obtain the logs and restart connector pod only
Cesar to organise a meeting with Gautam and Manoj.
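The two scenarios above share the same shape (obtain the logs, then restart the connector pod), which could be sketched roughly as below. This is only an illustration, not the runbook's actual steps: it assumes the connector pod is reachable with kubectl and is managed by a controller that recreates it after deletion. The function name, namespace and pod name are hypothetical placeholders.

```python
from typing import List


def remediation_commands(namespace: str, pod: str, tail: int = 500) -> List[List[str]]:
    """Build (but do not run) the commands for: collect logs, then restart the pod.

    Assumption: the pod is owned by a Deployment/StatefulSet, so deleting it
    causes a fresh pod to be created. Names passed in are placeholders.
    """
    return [
        # Step 1: capture recent logs before the pod is replaced.
        ["kubectl", "logs", pod, "-n", namespace, f"--tail={tail}"],
        # Step 2: delete the pod so its controller recreates it.
        ["kubectl", "delete", "pod", pod, "-n", namespace],
    ]


if __name__ == "__main__":
    # Hypothetical namespace and pod name, for illustration only.
    for cmd in remediation_commands("example-namespace", "example-connector-pod"):
        print(" ".join(cmd))
```

Returning the commands instead of executing them keeps the sketch safe to review before anything is run against a real cluster.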
07 Oct 2024 - Met with Elcina, Manoj and Gautam to validate steps on the runbook and clear questions. All questions were answered. Cesar provided a list of the most alerted data centers for both alerts (heartbeat and queue not drained): heartbeat.xlsx. Elcina to validate if there are GovCloud customers. Also created SREMON-3561: found an inconsistency in the hostname of the Panama alert.
08 Oct 2024 Elcina has advised two use cases. Cesar will update further.
15 Oct 2024 Gonzalez, Cesar to provide examples of the alerts to Goyal, Gautam. ETA to be provided next week, 22nd Oct.
22 Oct 2024 Gautam is working on this. Tentative ETA 15th Nov.
05 Nov 2024 Cesar will check with Elcina for the approval of queries raised.
12 Nov 2024 Gautam will discuss internally and will reply to Elcina. ETA 29th Nov.
20 Nov 2024 ETA 29th for submission to SolQA.
28 Nov 2024 Submitted to SolQA: [DRSQA-7539] Validate- NOC_Automation_Integration_Service_Heartbeat_Missing - BMC JIRA Production
03 Dec 2024 This will be 2 automation jobs. Still with SolQA. We will check for approval.
10 Dec 2024 Still waiting on SolQA.
17 Dec 2024 Ferens, Jason to follow up with Kishore, Akshar.
31 Dec 2024 Goyal, Gautam: this week's CAB on Friday 1/2 (DRSQA-7539). Prod release 1/2. There is a second IA for Panama Integration (SA-7536). We don't have correct details in the BHOM alert. Bone, Richard has followed up with an email (Accessing the Panama Project namespace).
08 Jan 2025 Meeting scheduled for 9th Jan with Rich and Cesar to investigate further.
21 Jan 2025 Bone, Richard to follow up with the Tools Team about Prometheus event enrichment (SREMON-4415).
04 Feb 2025 Gonzalez, Cesar has set up a meeting for 5th Feb.
11 Feb 2025 One automation is live: "Heartbeat missing". One more is pending. Bone, Richard will follow up on this.
I want it... - Drop everything to work on this now!
@Sumeet Gill This is already developed and in the canary prod rollout phase. Please move this ask to 'Already Started'.
Visible Customer Benefit - All
Internal Benefit - Medium
Financial Benefit - No
Efforts - Medium
Impact Urgency - High
Disruption Risk by Not Doing This - High
Aligned to Strategic Goals - Moderate
Hard Need By Date? - NA
Executive Escalation? - NA
Force Highest Priority - NA
@Jason Ferens Heartbeat missing is live.
ActiveMQ queue not drained alert for Panama service --> Plan to present at CAB on 27th Feb.