BMC Helix SaaSOps Ideas Portal
Status Completed
Created by Divya Mohanasundaram
Created on Feb 13, 2025

SA-8348 | SRE Auto-Remediation - Unlock 'srserviceuser' OR 'siadmin' account to avoid impact on reporting application



Problem statement

Issue

  • A partial production outage occurred: the brhelix customer reported (SSR-14872) that Cross Launch was not working for SmartReporting from Midtier and SmartIT. During the outage call, the account was unlocked by executing the query below on the DB:

  • update IpClass set ACCESSATTEMPTS='0' WHERE EmailLeft ='siadmin'

  • The siadmin user ID and password were then updated on the Midtier config page and in CCS.

Action items to propose automation

To reduce related outages, enable monitoring and build auto-remediation:

  • SREMON-3974 - raised to enable a monitoring alert for 'srserviceuser' or 'siadmin' account lock.

  • SA-8348 - raised for building the automation.


Automation Consumer

DBA, SRE, NOC/MIM

Automation Trigger type


Based on a BHOM event, Intelligent Automation (IA) triggers a Jenkins pipeline to remediate the issue.


Intelligent Automation tasks

1. Create IA Policy and Publish

2. If the Incident ID is not available in the BHOM event, do not execute the policy from IA.

3. If the Incident ID is available in the BHOM event, IA executes the policy and invokes the Jenkins pipeline.
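
The gating rule above can be sketched as a small predicate. This is only an illustration: the field name "incident_id" is a hypothetical placeholder, and the real BHOM event slot that carries the incident ID may be named differently.

```python
from typing import Mapping, Optional


def incident_id_from_event(event: Mapping[str, object]) -> Optional[str]:
    """Return the incident ID carried by the event, or None if missing or blank."""
    # "incident_id" is a hypothetical field name, not a confirmed BHOM slot.
    value = event.get("incident_id")
    if value is None:
        return None
    text = str(value).strip()
    return text or None


def should_invoke_pipeline(event: Mapping[str, object]) -> bool:
    """IA executes the policy only when the event carries an incident ID."""
    return incident_id_from_event(event) is not None
```

Treating a blank value the same as a missing one is an assumption made here so that an empty field does not trigger the pipeline.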

Remediate issue via Jenkins pipeline

Jenkins Pipeline inputs

  1. SR_DB hostname

  2. Incident number

  3. event_creation_timestamp
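
A minimal sketch of validating these three inputs before the pipeline does any work. The parameter keys and the ISO 8601 timestamp format are assumptions for illustration; the actual Jenkins parameter names and the BHOM timestamp format may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PipelineInputs:
    sr_db_hostname: str
    incident_number: str
    event_creation_timestamp: datetime


def parse_inputs(params: dict) -> PipelineInputs:
    """Validate the three pipeline inputs and fail early if any is missing."""
    required = ("sr_db_hostname", "incident_number", "event_creation_timestamp")
    missing = [name for name in required if not str(params.get(name) or "").strip()]
    if missing:
        raise ValueError("missing pipeline inputs: " + ", ".join(missing))
    # Assumption: the timestamp arrives as an ISO 8601 string.
    timestamp = datetime.fromisoformat(params["event_creation_timestamp"].strip())
    return PipelineInputs(
        sr_db_hostname=params["sr_db_hostname"].strip(),
        incident_number=params["incident_number"].strip(),
        event_creation_timestamp=timestamp,
    )
```

Failing early on a missing input keeps the pipeline from connecting to a database when it could not update the incident afterwards.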


Automation Solution design

  1. Fetch Smart_Reporting_DB, event_creation_timestamp and Incident as input from the BHOM alert.

  2. Perform Precheck

    1. Fetch project details based on the SR DB hostname via the COPS API getprojects method

    2. Fetch the fields below from COPS via the API getprojectparams method

      1. DB_TYPE (Aurora/Postgres/Oracle/MSSQL)

      2. SMART_REPORTING_DB_USER

      3. SMART_REPORTING_DB_USER_PASSWORD

      4. DB_TYPE/RW_PORT

      5. AR_SERVER_SR_DB_NAME_CALCULATED

      6. SMART_REPORTING_DB

      7. SMART_REPORTING_DB_INSTANCE_NAME

  3. Enable auto-remediation by automating the steps below. If successful, go to step 4

    1. Check DB type and Connect to customer SR database (using AR DB Database Server, AR Database User Name for SmartReporting, AR Database User Password for SmartReporting)

    2. Execute the query below to check the access attempts. If successful, go to step 3c; else abort the automation

      1. If the user is siadmin, execute the query below

        1. SELECT ACCESSATTEMPTS FROM IpClass WHERE EmailLeft = 'siadmin';

      2. If the user is srserviceuser, execute the query below

        1. SELECT ACCESSATTEMPTS FROM IpClass WHERE EmailLeft = 'srserviceuser';

    3. If the user is locked out (ACCESSATTEMPTS >= 3), execute the query below to unlock the account via auto-remediation. This resets ACCESSATTEMPTS to 0, after which the system allows login with the existing password. If successful, go to step 4; else abort the automation

      1. If the 'srserviceuser' account is locked

        • UPDATE IpClass SET AccessAttempts='0' WHERE EmailLeft = 'srserviceuser';

      2. If the 'siadmin' account is locked

        • UPDATE IpClass SET AccessAttempts='0' WHERE EmailLeft = 'siadmin';

  4. Account unlock status

    1. If the account unlock succeeds, proceed to step 5

    2. If the account unlock fails, the automation updates the incident work info with the error

      1. CRITICAL alert will remain open in NOC queue

      2. NOC will act using regular process

  5. Perform Post-check

    1. Validate if access attempts == 0 for the user account

      1. If the user is siadmin, execute: SELECT ACCESSATTEMPTS FROM IpClass WHERE EmailLeft = 'siadmin';

      2. If the user is srserviceuser, execute: SELECT ACCESSATTEMPTS FROM IpClass WHERE EmailLeft = 'srserviceuser';

    2. Escalate if post-validation fails or events recur

      1. Send an email to the NOC team to initiate a bridge with the Monitoring/SRE/Application teams in the scenarios below

        1. The account was unlocked, but post-validation shows unsuccessful password attempts != 0

        2. Multiple BHOM events are created for the same customer/user within 2 hours

      2. If the issue persists, a new BHOM event will be created

  6. Update the work info of the incident with the automation pipeline status (Success/Failure).

  7. Jenkins pipeline status would be available in IA
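
The check, unlock, and post-check steps (3 and 5 above) can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it uses Python's built-in sqlite3 as a stand-in for the customer SR database, the status strings are hypothetical, and skipping when the account is not locked is an assumption, since the design does not say what happens in that case. Real drivers for Postgres, Oracle, or MSSQL use different placeholder syntax than `?`.

```python
import sqlite3

ALLOWED_USERS = ("siadmin", "srserviceuser")
LOCK_THRESHOLD = 3  # the account counts as locked when ACCESSATTEMPTS >= 3


def access_attempts(conn, user):
    """Return ACCESSATTEMPTS for the user, or None if the row is absent."""
    row = conn.execute(
        "SELECT ACCESSATTEMPTS FROM IpClass WHERE EmailLeft = ?", (user,)
    ).fetchone()
    return None if row is None else int(row[0])


def remediate(conn, user):
    """Run check, unlock, and post-check; return a status for the work info."""
    if user not in ALLOWED_USERS:
        return "ABORTED: unsupported user"
    attempts = access_attempts(conn, user)
    if attempts is None:
        return "ABORTED: user not found"
    if attempts < LOCK_THRESHOLD:
        # Assumption: nothing to do when the account is not locked.
        return "SKIPPED: account not locked"
    conn.execute(
        "UPDATE IpClass SET ACCESSATTEMPTS = '0' WHERE EmailLeft = ?", (user,)
    )
    conn.commit()
    # Post-check: the counter must read 0 after the update.
    if access_attempts(conn, user) == 0:
        return "SUCCESS: account unlocked"
    return "FAILURE: post-check found non-zero access attempts"
```

Restricting the update to the two named accounts and passing the user as a bound parameter keeps the automation from resetting any other row in IpClass.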

Automation Benefits


Reduces the overall turnaround time and eliminates dependencies on multiple teams (NOC/Database).

Event-based automation would be enabled.


