Enhance rotating equipment reliability: Implementing a bad actor program in the oil and gas industry

H. Saleh, Saudi Aramco

By identifying assets with frequent failures, high repair costs and significant downtime, the program prioritizes corrective actions through root cause analysis (RCA). Tools like fault tree analysis (FTA) and the “5 Whys” method are used to uncover underlying issues such as improper installation, operational overloads and material defects. The effectiveness of the program is measured by metrics such as increased mean time between failures (MTBF), reduced maintenance costs and improved equipment performance. Success stories demonstrate how addressing bad actors has led to substantial cost savings and enhanced operational efficiency. Key challenges include data collection and resource alignment, while lessons learned highlight the importance of cross-functional teamwork and thorough RCA. The program’s future will involve continuous improvements and integration with advanced maintenance strategies like predictive maintenance and condition monitoring.

Rotating equipment. In the oil and gas industry, the reliability of rotating equipment, such as pumps, compressors, turbines, generators, motors and fin fans, is critical to maintaining operational efficiency and minimizing downtime. All types of rotating equipment are considered under a bad actor program. This includes any equipment whose failure negatively impacts plant reliability.

The importance of identifying and addressing bad actors. Identifying and addressing bad actors within this equipment is essential to achieve “best-in-class” results in plant operation. By extending the lifetime of assets, minimizing repair costs, and optimizing both maintenance efforts and production processes, plants can significantly improve reliability and reduce operational inefficiencies associated with these critical assets. 

Challenges in managing bad actors. One key challenge is the high frequency of failures, which not only drives up repair costs but also causes significant downtime, affecting overall plant efficiency. These recurring failures often signal underlying issues that require thorough analysis and resolution to ensure long-term reliability. 

History of the bad actor program. Over the past decade, the author’s company’s plants have implemented a bad actor program to systematically identify and rank the worst-performing assets that disproportionately impact plant reliability and maintenance expenditures. This allows an appropriate level of attention to be placed on the assets that have the greatest impact on reliability, ensuring that corrective actions are taken promptly. 

Alignment with a reliability strategy. The company’s bad actor program aligns with its overall reliability strategy, serving as a tool to address chronic or recurring issues in critical equipment and ensuring that each asset performs its intended functions within the specified period under operational conditions. 

Bad actor identification. Bad actors are identified using four criteria defined in the author’s company’s best practices:

  1. Asset failure consequences: Scored on a scale of 0 (very high) to 3 (very low), where Class 0 represents severe production loss and Class 3 indicates no repercussions. 
  2. Total cost of asset failure: Class 0 is > 125% of the expected maintenance cost, Class 1 is between 100% and 125%, Class 2 is between 75% and 100%, and Class 3 is < 75%.
  3. Asset failure frequency: Based on a 2-yr operational history for different asset types. 
  4. Asset repetitive failures: Evaluated based on the frequency of failures due to the same failure mode. 

Each operating facility within Saudi Aramco uses its own data to populate its asset failure consequences, maintenance costs and production losses.
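The cost criterion above lends itself to a small worked example. The following is a minimal sketch, not the company's implementation: the function name `cost_class` is hypothetical, and the handling of the exact boundary values (75%, 100% and 125%) is an assumption, since the article gives only the ranges.

```python
def cost_class(actual_cost: float, expected_cost: float) -> int:
    """Map the ratio of actual to expected maintenance cost to Class 0-3.

    Thresholds follow the ranges quoted in the article. Which class an
    exact boundary value falls into is an ASSUMPTION made here: a ratio
    of exactly 125% or 100% is treated as Class 1, exactly 75% as Class 2.
    """
    if expected_cost <= 0:
        raise ValueError("expected_cost must be positive")
    ratio = actual_cost / expected_cost
    if ratio > 1.25:
        return 0  # > 125% of the expected maintenance cost
    if ratio >= 1.00:
        return 1  # between 100% and 125%
    if ratio >= 0.75:
        return 2  # between 75% and 100%
    return 3      # < 75%


# Hypothetical figures: an asset that cost 130 against an expected 100
print(cost_class(130.0, 100.0))  # prints 0
```

The other three criteria would need their own lookup rules, which depend on facility data that the article does not reproduce.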

Data collection and tracking. Data for these criteria are collected using a bad actor index (BAI), which combines the severity and frequency of failures with maintenance costs (TABLE 1). These data are extracted from a centralized system for businesses and are validated against pre-determined scoring criteria.

The four parameter classes are then put into the following BAI formula:

BAI = Asset failure consequences × Total cost of asset failure × Asset failure frequency × Asset repetitive failure mode
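As a rough illustration of the formula, the product can be computed as follows. This is a sketch under stated assumptions: the class name `BadActorScores` and its field names are hypothetical, and the numeric score attached to each class comes from TABLE 1, which is not reproduced here, so the inputs are simply whatever scores a facility's table assigns.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class BadActorScores:
    """Scores for the four BAI parameters (hypothetical container).

    The article does not state how a class (0-3) maps to a numeric
    score, so the values passed in are ASSUMED to be already scored.
    """
    consequence: int   # asset failure consequences
    cost: int          # total cost of asset failure
    frequency: int     # asset failure frequency
    repetitive: int    # asset repetitive failure mode

    def bai(self) -> int:
        # BAI is the product of the four parameter scores
        return prod((self.consequence, self.cost, self.frequency, self.repetitive))


# Hypothetical scores for one asset
print(BadActorScores(consequence=2, cost=3, frequency=1, repetitive=2).bai())  # prints 12
```

Because the index is a product, a score of zero in any one parameter drives the whole index to zero, so the score assigned to each class matters as much as the formula itself.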

Frequency of data review. Once identified, the performance data of bad actors are reviewed quarterly. Assets are monitored for recommendation implementation status, and once the root causes of failure are addressed, the equipment is placed under observation for a year before being graduated from the program. 

Prioritization of bad actors. Bad actors are prioritized based on their impact on production, maintenance costs and failure modes. A weighted scoring system is used to rank assets using the BAI, which ensures the most critical bad actors are addressed first. 
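A ranking step of this kind might be sketched as below. The function name `rank_bad_actors`, the asset tags and the BAI values are all hypothetical. The article does not say whether a higher or lower BAI marks a worse asset, nor does it give the weights, so the direction is left as a parameter and no weighting is applied.

```python
def rank_bad_actors(bai_by_asset: dict, higher_is_worse: bool = True) -> list:
    """Return asset tags ordered from most to least critical.

    ASSUMPTION: the ranking direction is not given in the article, so
    it is exposed as `higher_is_worse`. Ties are broken by asset tag
    so that the ordering is deterministic.
    """
    sign = -1 if higher_is_worse else 1
    return sorted(bai_by_asset, key=lambda tag: (sign * bai_by_asset[tag], tag))


# Hypothetical asset tags and BAI values
scores = {"P-101": 12, "K-201": 36, "G-301": 4}
print(rank_bad_actors(scores))  # prints ['K-201', 'P-101', 'G-301']
```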

Steps for conducting an RCA. After identifying a bad actor, the primary goal is to graduate it from the program by addressing root causes. An RCA is performed to uncover the underlying reasons for failure. If failures are due to a single mode, one RCA is sufficient. Multiple failures with different modes require individual investigations.
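The rule of one RCA per failure mode can be illustrated by grouping an asset's failure history by mode. This is a minimal sketch: the function name `group_failures_by_mode`, the record layout and the sample failure modes are hypothetical and are not taken from the company's FRACAS.

```python
from collections import defaultdict


def group_failures_by_mode(failures):
    """Group (date, failure_mode) records by failure mode.

    Each key in the result corresponds to one RCA investigation:
    repeated failures with the same mode share a single RCA, while
    each distinct mode gets its own.
    """
    by_mode = defaultdict(list)
    for failure_date, mode in failures:
        by_mode[mode].append(failure_date)
    return dict(by_mode)


# Hypothetical failure history for one asset
history = [
    ("2019-02-10", "internal rubbing"),
    ("2019-06-03", "internal rubbing"),
    ("2019-09-21", "seal leak"),
]
rcas = group_failures_by_mode(history)
print(len(rcas))  # prints 2: two distinct modes, so two RCAs
```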

RCA tools and techniques. In the author’s company’s plant, RCA employs tools such as FTA and the “5 Whys” methodology, facilitated through the failure reporting, analysis and corrective actions system (FRACAS). This system is integral to managing defect reports, investigations and corrective actions.

Cross-functional collaboration. RCA investigations are led by rotating equipment engineers and involve the maintenance, operations, instrumentation, static equipment engineering and inspection teams. This ensures that all aspects of the asset’s failure are considered when determining corrective actions.

Common root causes. Some common root causes identified during an RCA include improper installation, aging materials, operational overloads and design flaws. 

The implementation of corrective actions. Corrective actions depend on the root cause of the failure and may involve replacing aged spare parts, enhancing operational protocols, applying protective coatings to prevent corrosion, or implementing improved maintenance strategies (FIG. 1).

 

FIG. 1. Implementation tracking and graduation stages. 

Case study. A notable case study involves a proprietary horizontal, radially split 10-stage C5+ Pentane injection pump [a] within the Yanbu NGL Fractionation Department. Since its commissioning in November 2018, this pump experienced several failures due to extensive internal rubbing, leading to high vibration and significant maintenance costs. Given that the maintenance cost exceeded 125% of the expected average maintenance cost, the pump was enrolled in the bad actor program in 4Q 2019.

The RCA revealed that the pump design was oversized for the actual system requirements, leading to extensive internal rubbing. The corrective action involved retrofitting the pump with non-metallic stationary wear components to improve performance. After the corrective actions had been implemented in 3Q 2023, the pump did not experience any failures, showed significant improvement in reliability and increased its MTBF. The pump was subsequently graduated from the bad actor program in 3Q 2024. 

Monitoring corrective actions. The effectiveness of corrective actions is monitored through metrics such as increased MTBF and reduced maintenance costs, ensuring that reliability improvements are sustained. 

Challenges in implementation. One of the main challenges in managing bad actors is the frequent and unexpected failures that impact plant reliability and production. The repeated nature of these failures often makes it difficult to address the root cause, leading to inefficient troubleshooting efforts. High repair costs and limited availability of spare parts further complicate the management of bad actors, especially when procurement times are extended or parts are obsolete. In cases where parts are obsolete or delayed, short-term mitigation strategies, such as fabricating custom parts or upgrading to new types of material, are employed.

Post-implementation monitoring and verification. The performance of bad actors after corrective action is monitored by tracking failures during an under-graduation period (12 mos after corrective action implementation). Any further failures during this phase indicate a need for further action. Otherwise, the equipment is graduated from the program.
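The graduation rule described above can be sketched as a simple date check. The function names and status strings below are hypothetical, and the treatment of edge cases (a failure logged on the implementation date itself, or an implementation date of 29 February) is an assumption rather than documented practice.

```python
from datetime import date


def observation_end(implemented: date) -> date:
    """Return the date 12 months after corrective action implementation."""
    try:
        return implemented.replace(year=implemented.year + 1)
    except ValueError:
        # 29 February has no counterpart next year; ASSUME 28 February
        return implemented.replace(year=implemented.year + 1, day=28)


def graduation_status(implemented: date, failure_dates, today: date) -> str:
    """Classify an asset during or after its under-graduation period."""
    end = observation_end(implemented)
    # ASSUMPTION: only failures strictly after implementation count
    if any(implemented < f <= end for f in failure_dates):
        return "further action"
    if today >= end:
        return "graduate"
    return "under observation"


# Hypothetical dates: action implemented on 1 Sep 2023, no failures since
print(graduation_status(date(2023, 9, 1), [], date(2024, 9, 1)))  # prints graduate
```

Failures recorded before the corrective action are ignored by this check, since they belong to the history that led to enrollment rather than to the observation period.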

Success metrics. The success of any bad actor program is measured through key metrics such as reduced downtime, improved reliability and significant cost savings. A notable example is the author’s company’s analysis of average maintenance costs for 16 graduated bad actors in 2Q 2024, which demonstrated a reduction of > 50%, thus validating the program’s effectiveness.

Ongoing review. Once equipment graduates from the bad actor program, its performance is reviewed periodically as part of the plant’s standard maintenance strategy and is treated similarly to other reliable and healthy assets. 

Impacts of the bad actor program: Reliability improvements and quantitative results. The author’s company’s implementation of the bad actor program has led to a marked improvement in overall equipment reliability. The average maintenance cost reduction and increased MTBF are clear indicators of its success. Additionally, it has streamlined maintenance efficiency and planning, enabling more effective allocation of resources. 

For instance, the maintenance costs for the 16 key assets showed 55% savings when compared to maintenance costs 2 yr before the implementation of corrective action. Any equipment that has graduated from the bad actor program should not experience failure for 12 mos after the implementation of corrective actions; this means the MTBF is > 1 yr.
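The two figures quoted here, percentage savings and MTBF, follow from simple arithmetic. The sketch below uses hypothetical function names and hypothetical cost values chosen only to reproduce a 55% reduction; it is not the company's reporting calculation.

```python
from typing import Optional


def percent_reduction(cost_before: float, cost_after: float) -> float:
    """Percentage drop in maintenance cost relative to the earlier period."""
    if cost_before <= 0:
        raise ValueError("cost_before must be positive")
    return 100.0 * (cost_before - cost_after) / cost_before


def mtbf_months(observed_months: float, failure_count: int) -> Optional[float]:
    """Mean time between failures over an observation window, in months.

    With zero failures the MTBF cannot be computed; it is only known to
    exceed the window, so None is returned (an ASSUMED convention).
    """
    if failure_count == 0:
        return None
    return observed_months / failure_count


# Hypothetical costs: 200 before corrective action, 90 after
print(percent_reduction(200.0, 90.0))  # prints 55.0
# No failures in the 12-mo observation period: MTBF exceeds 12 months
print(mtbf_months(12, 0))              # prints None
```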

Challenges and lessons learned. Throughout the program's implementation, challenges included data collection inconsistencies (particularly when work orders lacked sufficient detail) and team alignment across various disciplines. Bad actors included not only rotating equipment, but also all types of equipment in plant operations.

Key lessons learned involve the importance of addressing root causes and incorporating solutions to all similar equipment as a proactive measure to avoid future bad actors. Strategies such as nominating representatives from each discipline for the bad actor program have proven effective in fostering collaboration and overcoming resistance. 

Future considerations and recommendations. The author’s company is expanding the scope of the program by integrating it with other reliability tools, such as predictive maintenance and condition monitoring. A bad actor management platform has been deployed to facilitate the tracking of admitted bad actors: each bad actor is identified and validated automatically, and an overall score is generated. Additionally, the company revises its bad actor best practices every 5 yr to keep the program updated.

For other plants that are considering the implementation of a bad actor program, it is essential to establish strong criteria that identify bad actors using data such as failure frequency, maintenance costs and production impact. Additionally, involving cross-functional teams in RCA and corrective actions ensures a complete approach to resolving chronic issues. The program should also be integrated with other reliability initiatives to maximize its impact on plant performance.

The long-term success of any bad actor program will be sustained by regularly revising best practices and ensuring continuous improvement based on plant needs. Awareness sessions on the bad actor program help demonstrate the importance of equipment reliability. Investing in training and team alignment will help maintain the program’s long-term success.

Supporting documentation and data. Historical data from the company’s Yanbu NGL bad actor program report illustrates the significant improvements in reliability and cost savings achieved through the program. TABLE 2 demonstrates the cost percentage variances before and after implementing recommendations for some graduated rotating equipment for corrective maintenance, upgrades and general maintenance work orders. 

NOTE

[a] Hyosung horizontal, radially split 10-stage C5+ Pentane injection pump

ABOUT THE AUTHOR 

Hatim Saleh is an Associate Engineer specializing in rotating equipment engineering within the oil and gas industry. He holds a degree in mechanical engineering from King Fahd University of Petroleum & Minerals and works for Saudi Aramco. Saleh has experience in maintaining and ensuring the reliability of critical machinery, focusing on enhancing operational efficiency and minimizing downtime. Over the course of his career, he has collaborated with teams for turnaround and inspection (T&I), technical support for stationary equipment, and reliability and asset management units. Saleh’s work contributes to advancing reliability strategies and supporting sustainable operations in complex industrial environments.

 
