Shamim Ahmed, Author at SD Times

Optimize continuous delivery with continuous reliability

The 2021 State of DevOps report indicates that more than 74% of the organizations surveyed have a Change Failure Rate (CFR) greater than 16% (the report gives a range of 16% to 30%). Of these, a significant proportion (more than 35%) likely have CFRs exceeding 23%. 

This means that while organizations seek to increase software change velocity (as measured by the other DORA metrics in the report), a significant number of deployments result in degraded service (or a service outage) in production and subsequently require remediation (including hotfixes, rollbacks, fix-forwards, and patches). These frequent failures potentially impair revenue and customer experience, and incur significant remediation costs. 

Most customers we speak with are unable to proactively predict the risk of a change going into production. In fact, the 2021 State of Testing in DevOps report also indicates that more than 70% of organizations surveyed are not confident about the quality of their releases. A smaller, but still significant, proportion (15%) “Release and Pray” that their changes won't degrade production. 

Reliability is a key product/service/system quality metric. CFR is one of many reliability metrics; others include availability, latency, throughput, performance, scalability, and mean time between failures. While reliability engineering in software has been an established discipline, we clearly have a problem ensuring reliability.  

In order to ensure reliability for software systems, we need to establish practices that plan for, specify, engineer, measure and analyze reliability continuously along the DevOps life cycle. We call this “Continuous Reliability” (CR).  

Key Practices for Continuous Reliability 

Continuous Reliability derives from the principle of “Continuous Everything” in DevOps. The emergence (and adoption) of Site Reliability Engineering (SRE) principles has led to CR evolving to be a key practice in DevOps and Continuous Delivery. In CR, the focus is to take a continuous proactive approach at every step of the DevOps lifecycle to ensure that reliability goals will be met in production. 

This implies that we are able to understand and control the risks of changes (and deployments) before they make it to production. 

The key pillars of CR are shown in the figure below:

CR is not, however, the purview of site reliability engineers (SREs) alone. Like other DevOps practices, CR requires active collaboration among multiple personas such as SREs, product managers/owners, architects, developers, testers, release/deployment engineers and operations engineers. 

Some of the key practices for supporting CR (that are overlaid on top of the core SRE principles) are described below.

1)    Continuous Testing for Reliability

Continuous Testing (CT) is an established practice in Continuous Delivery. However, the use of CT for continuous reliability validation is less common. Specifically, for validation of the key reliability metrics (such as availability, latency, throughput, performance, and scalability), many organizations still use waterfall-style performance testing, where most of the testing is done in long-duration tests before release. This not only slows down deployment, but also does an incomplete job of validation. 

Our recommended approach is to validate these reliability metrics progressively at every step of the CI/CD lifecycle, as described in detail in my prior blog on Continuous Performance Testing.
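To make that concrete, here is a minimal sketch of a reliability check that could run at the CI stage of the pipeline: it exercises a single component and fails the build if an assumed p95 latency budget is exceeded. The handler function, sample count, and 50 ms budget are illustrative assumptions, not part of the report or any specific tool.

```python
import time
import statistics

# Hypothetical stand-in for the component under test; in a real CI job this
# would call the component's API against a test deployment.
def handle_request() -> None:
    sum(range(10_000))  # simulate a small amount of work

def test_component_latency_slo():
    """Fail the build if the component's p95 latency exceeds the assumed SLO."""
    samples_ms = []
    for _ in range(200):
        start = time.perf_counter()
        handle_request()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    p95 = statistics.quantiles(samples_ms, n=100)[94]  # 95th percentile
    assert p95 < 50.0, f"p95 latency {p95:.1f} ms exceeds the 50 ms budget"

if __name__ == "__main__":
    test_component_latency_slo()
    print("latency SLO check passed")
```

The same pattern extends to throughput and error-rate checks at later pipeline stages, with progressively more realistic load.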

2)    Continuous Observability 

Observability is also an established practice in DevOps. However, most observability solutions (such as Business Services Reliability) focus on production data and events. 

What is needed for CR is to “shift-left” observability into all stages of the CI/CD lifecycle, so that reliability insights can be gleaned from pre-production data (in conjunction with production data). For example, it is possible to glean reliability insights from patterns of code changes (in source code management systems), test results and coverage, as well as performance monitoring by correlating such data with past failure/reliability history in production.   

Pre-production environments are richer in data variety than production environments; however, most of that data is not correlated and mined for insights. Such observability requires us to set up “systems of intelligence” (SOI, see figure below) in which we continuously collect and analyze pre-production data along the CI/CD lifecycle to generate a variety of reliability predictions as applications change (see next section). 

3)    Continuous Failure-Risk Insights and Prediction 

An observability system in pre-production allows us to continuously assess and monitor failure risk along the CI/CD lifecycle, and even to predict the failure risk associated with specific changes before they reach production. 

For example, we set up a simple SOI for an application (using Google Analytics) in which we collected code change data (from the source code management system) as well as the history of escaped defects (from past deployments to production). By correlating this data using a gradient-boosted tree algorithm, we were able to establish an understanding of which code change patterns resulted in higher levels of escaped defects. In this case, we found a significant correlation between code churn and escaped defects (see figure below).

We were then able to use the same analytics to predict how escaped defects would change based on the code churn in the current deployment (see inset in the figure above). 
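The sketch below shows the general shape of such an analysis, assuming the SOI can export per-deployment code churn and escaped-defect counts. The toy history and scikit-learn's gradient boosting regressor are illustrative stand-ins for the real data set and model.

```python
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative history exported from the system of intelligence: one row per
# past deployment, pairing lines-of-code churn with defects that escaped to
# production. Real data would come from the SCM and defect/incident trackers.
history = [
    (120, 1), (480, 3), (950, 7), (200, 1), (1500, 12),
    (300, 2), (700, 5), (1100, 9), (60, 0), (850, 6),
]
X = [[churn] for churn, _ in history]
y = [defects for _, defects in history]

model = GradientBoostingRegressor(n_estimators=100, max_depth=2, random_state=0)
model.fit(X, y)

# Predict escaped defects for the churn measured in the current deployment.
current_churn = 640
predicted = model.predict([[current_churn]])[0]
print(f"Predicted escaped defects for {current_churn} changed lines: {predicted:.1f}")
```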

While this is a very simple example of reliability prediction using a limited data set, we can do continuous failure risk prediction by exploiting a broader set of data from pre-production, including testing and deployment data. 

For example, in my previous article on Continuous Performance Testing, I discussed various approaches for performance testing of component-based applications. Such testing generates a huge amount of data that is extremely difficult to process manually. An observability system can collect that data to establish baselines of component reliability and performance, which in turn can be used to generate insights into how system reliability may be impacted by changes in individual application components (or other system components). 

4)    Continuous Feedback  

One of the key benefits of an observability system is the ability to provide quick, continuous feedback to the development, test, release, and SRE teams on the risk associated with changes, along with helpful insights on how to address them. This allows development teams to proactively address these risks before the changes are deployed to production. For example, developers can be alerted, as soon as they commit (or open a pull request), to the failure risks associated with the changes they have made. Testers can get feedback on the tests that are most important to run. Similarly, SREs can get early planning insights into the error budgets they need to plan for the next release cycle. 
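As a sketch of how such feedback might be wired into a pipeline, the snippet below turns a handful of risk signals into a pass/warn/fail result that a CI job can report back on the commit or pull request. The signals, weights, and thresholds are all assumptions for illustration, not part of any particular product.

```python
import sys

# Illustrative risk signals a pipeline could gather for a commit or pull request.
change = {
    "lines_changed": 820,
    "files_touched": 14,
    "impacted_tests_failed": 2,
    "recent_escaped_defects": 1,
}

def risk_score(c: dict) -> float:
    """Combine a few signals into a 0-100 score; the weights are illustrative."""
    score = 0.0
    score += min(c["lines_changed"] / 20.0, 40.0)   # churn contributes up to 40
    score += min(c["files_touched"] * 2.0, 20.0)    # breadth contributes up to 20
    score += c["impacted_tests_failed"] * 15.0      # failing impacted tests weigh heavily
    score += c["recent_escaped_defects"] * 10.0
    return min(score, 100.0)

score = risk_score(change)
print(f"change failure risk score: {score:.0f}/100")

if score >= 75:
    print("risk too high: block the merge and notify the team")
    sys.exit(1)
elif score >= 50:
    print("elevated risk: request additional review and run the impacted tests")
else:
    print("risk acceptable: proceed")
```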

Next up: Continuous Quality 

Reliability, however, is just one dimension of application/system quality. It does not, for example, fully address how we maximize customer experience that is influenced by other factors such as value to users, ease of use, and more. In order to get true value from DevOps and Continuous Delivery initiatives, we need to establish practices for predictively attaining quality – we call this “Continuous Quality.” I will discuss this in my next blog. 

Why performance testing is so vital and so difficult

The ability to ensure applications deliver consistent, responsive performance at all times is critical for pretty much every organization, and is especially vital for retailers and other e-commerce providers. 

Even if an app delivers the best, most innovative functionality, it won’t matter if loading or transactions take too long. Further, as users continue to grow increasingly impatient, the definition of “too long” continues to shrink.

As a result, it is critical to validate application performance, both before new services are launched and before periods of peak activity. For example, for a retailer, it is vital to do extensive performance testing in advance of Black Friday. 

Traditionally, teams have been doing what we'd refer to as “classic” performance testing: end-to-end testing based on high volumes of virtual users or synthetic transactions. Quite often, this type of performance testing delays new releases by several weeks. Further, it is extremely costly, invariably consuming a lot of staff time and resources. These challenges are particularly problematic in modern continuous delivery lifecycles, where keeping cycle times to a minimum is critical. 

To combat these challenges, it is imperative that teams gain the ability to speed software releases, while consistently ensuring they’re not introducing performance issues into production environments. 

Introducing Continuous Performance Testing

While a lot has been written about continuous testing, much of the focus has tended to center on continuous functional testing. A subset of continuous testing, continuous performance testing (CPT) is based on the principle of “continuous everything” in DevOps. As opposed to having a single performance testing phase, CPT is employed in situations in which performance testing needs to happen across different phases of the continuous integration/continuous delivery (CI/CD) lifecycle. 

CPT is a key enabler of continuous delivery. With CPT, teams can ensure apps are peak-performance ready at all times, so they can release new code without lengthy performance testing delays. There are three keys to enabling the successful implementation of CPT: 

  •     Teams must be able to specify performance requirements at the component level (a minimal sketch of such requirements-as-code follows this list). These requirements must either be tied to functional requirements or product features, or they need to be tied to an application's specific system components.
  •     Teams must be able to test each component in isolation.
  •     Teams have to be able to test frequently as application changes occur. 
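The sketch below illustrates the first key: expressing component-level performance requirements as code so they can be checked every time a component is tested in isolation. The component names, thresholds, and measured values are assumptions made up for the example.

```python
# Illustrative component-level performance requirements, kept "as code" so they
# can live alongside each component and be evaluated on every change.
COMPONENT_SLOS = {
    "checkout-service": {"p95_latency_ms": 120, "error_rate": 0.001, "min_rps": 300},
    "catalog-service":  {"p95_latency_ms": 80,  "error_rate": 0.005, "min_rps": 500},
}

def check_component(name: str, measured: dict) -> list:
    """Return a list of SLO violations for one component tested in isolation."""
    slo = COMPONENT_SLOS[name]
    violations = []
    if measured["p95_latency_ms"] > slo["p95_latency_ms"]:
        violations.append(f"{name}: p95 latency {measured['p95_latency_ms']} ms "
                          f"exceeds {slo['p95_latency_ms']} ms")
    if measured["error_rate"] > slo["error_rate"]:
        violations.append(f"{name}: error rate {measured['error_rate']:.3%} "
                          f"exceeds {slo['error_rate']:.3%}")
    if measured["rps"] < slo["min_rps"]:
        violations.append(f"{name}: throughput {measured['rps']} rps "
                          f"below {slo['min_rps']} rps")
    return violations

# Example: results from an isolated performance run of one component.
problems = check_component("checkout-service",
                           {"p95_latency_ms": 140, "error_rate": 0.0005, "rps": 350})
print("\n".join(problems) or "all component SLOs met")
```

Because each component carries its own requirements, the same check can run frequently, as changes occur, without waiting for an end-to-end test window.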
Continuous Performance Testing: Three Best Practices

As organizations seek to employ CPT in their environments, there are a few key practices that will help ensure implementation success. Each of these practices is detailed in the following sections. 

Test at the Lowest Possible Level of Granularity

With CPT, most testing can be done at the unit, component, or API levels. By establishing testing at the component level, teams can test early and often. This component-level approach offers advantages in speed and operational efficiency.  

Another key advantage is that this approach reduces the number of tests that have to be completed: if component-level tests don't pass, teams don't need to run higher-level tests. This means teams can reduce the number of resource-intensive, end-to-end tests that have to be executed. It also means more testing happens at the CI level, rather than at the CD level, where minimizing lead time for changes is most critical. 

Establish Frequent, Change-Driven Testing 

Once teams begin to do more granular, component-level testing, they can then employ another approach that helps reduce elapsed testing time: change-impact testing. Through this approach, teams can focus testing on specific parts of applications that have changed. At a high level, there are two ways to make this happen:

  •     “Inside-out” approach. In this scenario, teams analyze the impact of changes made in the code of application components (a minimal sketch of this approach follows the list). 
  •     “Outside-in” approach. In this case, teams focus on analyzing the impact of changes made to application requirements or behavior. Every time a requirement is changed, teams flag the set of tests that are affected. In many organizations, this approach has reduced the amount of ongoing testing required by approximately 70%.
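Here is a minimal sketch of the inside-out approach: it uses plain git to list the files changed against the main branch and maps them to the performance tests that cover the affected components. The directory-to-test mapping and test paths are hypothetical; dedicated test impact analysis tools derive this mapping automatically from coverage data.

```python
import subprocess

# Hypothetical mapping from component source trees to the tests that cover
# them; in practice this would come from coverage or impact-analysis tooling.
COMPONENT_TESTS = {
    "src/checkout/": ["tests/perf/test_checkout_api.py"],
    "src/catalog/":  ["tests/perf/test_catalog_api.py"],
    "src/payments/": ["tests/perf/test_payments_api.py"],
}

def changed_files(base: str = "origin/main") -> list:
    """List files changed relative to the base branch using plain git."""
    out = subprocess.run(["git", "diff", "--name-only", base],
                         capture_output=True, text=True, check=True)
    return [line for line in out.stdout.splitlines() if line]

def impacted_tests(files: list) -> set:
    """Select only the tests whose components actually changed."""
    selected = set()
    for path in files:
        for prefix, tests in COMPONENT_TESTS.items():
            if path.startswith(prefix):
                selected.update(tests)
    return selected

if __name__ == "__main__":
    tests = impacted_tests(changed_files())
    print("tests to run:", sorted(tests) if tests else "none (no impacted components)")
```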
Scale Testing of Individual Components 

As mentioned above, doing end-to-end performance testing is expensive and time-consuming. Through CPT, teams can effectively scale testing of specific components and reduce their reliance on these resource-intensive tests. To further scale component-level testing, teams can integrate CPT activities with CI/CD orchestration engines. In this way, teams can automate a range of efforts, including provisioning of environments, deployment of app components and test assets, execution of tests, capture and dissemination of test results, and post-testing cleanup. Teams can also leverage continuous service virtualization and continuous test data management, which can further boost scalability and test coverage. 

To learn more, view my earlier article, “Optimize Continuous Delivery of Microservices Applications with Continuous Performance Testing,” which features step-by-step guidance for implementing CPT across all the stages of the CI/CD pipeline.

Continuous test data management for microservices, Part 2: Key steps

This is part 2 in a series on applying test data management (TDM) to microservices. Part 1 can be found here.


The continuous TDM process for microservices applications is similar to that for general continuous TDM, but tailored to the nuances of the architecture. The key differences are as follows: 

Step 1(b): Agile Design

Rigorous change impact analysis during this step is key to reducing the testing (and TDM) burden for microservices applications, especially in the upper layers of the test pyramid and the CD stages of the lifecycle. There are various ways to do this; the following are a few highlights: 

(a)   Code-change-based impact analysis (also known as a white-box, inside-out approach). Through this approach, we identify which services and transactions are affected by specific code changes in implementing backlog requirements. We then focus testing and TDM efforts on those services and transactions affected. This approach is supported by tools such as Broadcom TestAdvisor and Microsoft Test Impact Analysis. This approach is more useful for white and gray box testing, specifically unit and component testing.  

(b)  Model flow-based impact analysis (also known as a black-box, outside-in approach). Here we do change impact analysis using flows in model-based testing. This analysis helps to highlight key end-to-end or system integration scenarios that need to be tested, and can also be traced down to individual components and source code. This approach is supported by such tools as Broadcom Agile Requirements Designer, and is more beneficial for testing in the upper layers of the test pyramid. 

I recommend a combination of both approaches to ensure sufficient test coverage, while minimizing the number of tests in a microservices context. Based on the change impact set, we prepare test data for the tests discussed in the previous section. 

Step 2(a): Agile Parallel Development 

As discussed in the previous section, as part of development, a component developer must also define and implement the following APIs (a minimal sketch follows the list):

  •  APIs that allow us to set test data values in the component data store. These are sometimes referred to as mutator APIs. 
  •  APIs that allow us to extract test data values, for example, from instances of components in production. These are also known as accessor APIs.
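Here is a hypothetical sketch of what such APIs might look like for a small service, using Flask and an in-memory store purely for illustration; the endpoint paths, payload shapes, and the idea of exposing them only in test environments are all assumptions.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory stand-in for the component's encapsulated data store; a real
# microservice would write through its normal data layer instead.
_store = {}

@app.post("/test-data/customers")
def seed_customers():
    """Mutator API: lets tests set up known customer records before a run."""
    for record in request.get_json():
        _store[record["id"]] = record
    return jsonify({"seeded": len(_store)}), 201

@app.get("/test-data/customers/<customer_id>")
def read_customer(customer_id):
    """Accessor API: lets tests (or extraction jobs) read data values back."""
    record = _store.get(customer_id)
    if record is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(record), 200

if __name__ == "__main__":
    app.run(port=5001)
```

Endpoints like these would normally be enabled only in pre-production builds, since they bypass the service's business logic.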

Developers should use the white-box change impact testing technique discussed above to focus their unit and component testing efforts. 

Step 2(b): Agile Parallel Testing

This is an important stage in which testers and test data engineers design, or potentially generate or refresh, the test data for test scenarios that have been impacted by changes and that will be run in subsequent stages of the CI/CD lifecycle. This assessment is based on the backlog items under development. Testers use the TDM approaches described above for cross-service system testing and end-to-end testing.  

In addition, the test data will need to be packaged, for example, in containers or using virtual data copies. This approach can ease and speed provisioning into the appropriate test environment, along with test scripts and other artifacts.  

Step 3: Build

In this step, we typically run automated build verification tests and component regression tests using the test data generated in the previous step. 

Step 4: Testing in the CD Lifecycle Stages 

The focus in these stages is to run tests in the upper layers of the test pyramid using the test data created during step 2(b). The key is to minimize the elapsed time that TDM activities require: the time needed to create, provision, or deploy test data must not exceed the time it takes to deploy the application in each stage.  

How do you get started with continuous TDM for microservices?

Continuous TDM is meant to be practiced in conjunction with continuous testing. Various resources offer insights into evolving to continuous testing. If you are already practicing continuous testing with microservices, and want to move to continuous TDM, proceed as follows:   

  • For new functionality, follow the TDM approach I have described. 
  • For existing software, you may choose to focus continuous TDM efforts on the most problematic or change-prone application components, since those are the ones you need to test most often. It would help to model the tests related to those components, since you can derive the benefits of combining TDM with model-based testing. While focusing on TDM for these components, aggressively virtualize dependencies on other legacy components, which can lighten your overall TDM burden. In addition, developers must provide APIs to update and access the test data for their components. 
  • For other components that do not change as often, you need to test less often. As described above, virtualize these components while testing others that need testing. In this way, teams can address TDM needs as part of technical debt remediation for these components. 

Continuous test data management for microservices, Part 1: Key approaches

Applying TDM to microservices is quite challenging, because an application may have many services, each with its own diverse underlying data store. There can also be intricate dependencies between these services, resulting in a type of ‘spaghetti architecture.’

For these systems, TDM for end-to-end system tests can be quite complex. However, it lends itself very well to the continuous TDM approach. As part of this approach, it is key to align TDM with the test pyramid concept.

Let’s look at the TDM approaches for tests in the various layers of the pyramid. 

TDM Approach for Supporting Microservices Unit Tests

Unit tests exercise the code within the microservice at the lowest level of granularity, typically at the function or method level within a class or object. This is no different from how we do unit testing for other types of applications. Most test data for such tests should be synthetic. Such data is typically created by the developer or software development engineer in test (SDET), who uses “as-code” algorithmic techniques, such as combinatorial data generation, to establish a high level of test data coverage. While running unit tests, we recommend that all dependencies outside the component (or even the function being tested) are stubbed out using mocks or virtual services.
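The sketch below shows what this can look like in practice: synthetic unit-test data is generated as code from the full combination of a few representative values per parameter, and the one external dependency is stubbed with a mock. The pricing function, tiers, and tax rate are hypothetical.

```python
import itertools
from unittest.mock import Mock

# Hypothetical function under test inside the microservice: it prices an order
# line and calls an external tax service, which we stub out at unit level.
def price_line(quantity, unit_price, customer_tier, tax_client):
    discount = {"standard": 0.0, "silver": 0.05, "gold": 0.10}[customer_tier]
    subtotal = quantity * unit_price * (1 - discount)
    return round(subtotal * (1 + tax_client.rate_for("US")), 2)

def test_price_line_combinatorial():
    # Synthetic test data "as code": every combination of a few representative
    # values per parameter gives broad input coverage cheaply.
    quantities = [1, 5, 100]
    unit_prices = [0.99, 10.0]
    tiers = ["standard", "silver", "gold"]

    tax_client = Mock()                    # virtual stand-in for the tax service
    tax_client.rate_for.return_value = 0.07

    for qty, price, tier in itertools.product(quantities, unit_prices, tiers):
        total = price_line(qty, price, tier, tax_client)
        assert total >= 0, f"negative total for {(qty, price, tier)}"

if __name__ == "__main__":
    test_price_line_combinatorial()
    print("combinatorial unit tests passed")
```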

TDM Approach for Supporting Microservices Component or API Tests

This step is key for TDM of microservices, since the other tests in the stack depend on it.  In these tests, we prepare the test data for testing the microservice or component as a whole via its API.

There are various ways of doing this depending on the context: 

  1. Generate simple synthetic test data based on the API specs (a sketch of this approach follows the list). This is typically used for property-based testing or unit testing of the API.
  2. Generate more robust synthetic test data from API models, for example, by using a test modeling tool like Broadcom Agile Requirements Designer. This enables us to do more rigorous API testing, for example for regression tests.
  3. Generate test data by traffic sniffing a production instance of the service, for example, by using a tool like Wireshark. This helps us create more production-like data. This approach is very useful if for some reason it isn’t possible to take a subset of data from production instances. 
  4. Generate test data by sub-setting and masking test data from a production instance of the service, or by using data virtualization. Note that many microservice architectures do not allow direct access to the data store, so we may need special data access APIs to create such test data.  
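As a minimal sketch of the first option, the snippet below derives synthetic, reproducible API payloads from a small field specification of the kind one might extract from an OpenAPI schema. The fields, constraints, and record count are assumptions for illustration.

```python
import random
import string

# Illustrative field specification, as might be derived from the service's API
# schema; the field names and constraints are assumptions.
ORDER_SPEC = {
    "order_id": {"type": "string", "length": 8},
    "quantity": {"type": "integer", "minimum": 1, "maximum": 100},
    "amount":   {"type": "number", "minimum": 0.01, "maximum": 5000.0},
    "status":   {"type": "enum", "values": ["NEW", "PAID", "SHIPPED", "CANCELLED"]},
}

def synthesize(spec, seed=42, count=5):
    """Generate simple synthetic API payloads that conform to the field spec."""
    rng = random.Random(seed)  # seeded so the generated test data is reproducible
    records = []
    for _ in range(count):
        record = {}
        for field, rules in spec.items():
            if rules["type"] == "string":
                record[field] = "".join(rng.choices(string.ascii_uppercase, k=rules["length"]))
            elif rules["type"] == "integer":
                record[field] = rng.randint(rules["minimum"], rules["maximum"])
            elif rules["type"] == "number":
                record[field] = round(rng.uniform(rules["minimum"], rules["maximum"]), 2)
            elif rules["type"] == "enum":
                record[field] = rng.choice(rules["values"])
        records.append(record)
    return records

for payload in synthesize(ORDER_SPEC):
    print(payload)
```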

Regardless of the approach, in most cases the test data for a microservice must be prepared by the developer or producer of the microservice and made available as part of the service definition. Specifically, additional APIs should be provided to set up the test data for that component. This is necessary to preserve data encapsulation within a microservice, and also because different microservices may use different types of data stores, often with no direct access to the data. 

This also allows the TDM of microservices applications to re-use test data, which enables teams to scale tests at higher layers of the pyramid. For example, a system or end-to-end test may span hundreds of microservices, with each having its own unique encapsulated data storage. It would be very difficult to build test data for tests that span different microservices using traditional approaches.   

Again, for a single component API test, it is recommended that all dependencies from the component be virtualized to reduce the TDM burden placed on dependent systems. 

TDM Approach for Supporting Microservices Integration and Contract Tests

These tests validate the interaction between microservices based on behaviors defined in their API specifications.

The TDM principles used for such testing are generally the same as those for the API testing process described previously. The process goes as follows: 

For contract definition, we recommend using synthetic test data, for example, based on the API specs, to define the tests for the provider component. 

The validated contract should be a recorded virtual service based on the provider service. This virtual service can then be used for consumer tests. Note that in this case, a virtual service recording forms the basis of the test data for the consumer test. 
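The sketch below illustrates the consumer side of this idea: a recorded provider response stands in for the virtual service, the test first checks that the recording still satisfies the agreed contract shape, and then exercises the consumer's own mapping logic against it. The field names and consumer function are illustrative assumptions.

```python
# A "recording" standing in for a virtual service captured from the validated
# provider; in practice this would come from a service virtualization tool.
RECORDED_PROVIDER_RESPONSE = {
    "customer_id": "C-1001",
    "status": "ACTIVE",
    "credit_limit": 2500.0,
}

# The agreed contract: field names and types the consumer relies on.
CONTRACT_FIELDS = {"customer_id": str, "status": str, "credit_limit": float}

def consumer_builds_view(provider_payload):
    """The consumer's own mapping logic, exercised against the recording."""
    return {
        "id": provider_payload["customer_id"],
        "can_order": provider_payload["status"] == "ACTIVE"
                     and provider_payload["credit_limit"] > 0,
    }

def test_consumer_against_recorded_contract():
    # 1. The recording must still satisfy the agreed contract shape.
    for field, expected_type in CONTRACT_FIELDS.items():
        assert isinstance(RECORDED_PROVIDER_RESPONSE.get(field), expected_type), field
    # 2. The consumer must be able to work from exactly that shape.
    view = consumer_builds_view(RECORDED_PROVIDER_RESPONSE)
    assert view == {"id": "C-1001", "can_order": True}

if __name__ == "__main__":
    test_consumer_against_recorded_contract()
    print("consumer contract test passed against the recorded virtual service")
```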

TDM Approach for Supporting a Cross-Service System Test or Transaction Test at the API Level 

In this type of test, we have to support a chain of API calls across multiple services. For example, this type of test may involve invoking services A, B, and C in succession.

The TDM approach for supporting this type of test is essentially the same as that for a single API test described above—except that we need to set up the test data for each of the services involved in the transaction. 

However, an additional complexity is that you also need to ensure that the test data setup for each of these services (and the underlying services they depend on) is aligned, so the test can be successfully executed. Data synchronization across microservices is largely a data management issue rather than a TDM issue per se, so you need to ensure that your microservices architecture sufficiently addresses this requirement. 

Assuming data synchronization between microservices is in place, the following approaches are recommended to make test management easier: 

  1. As mentioned before, use model-based testing to describe the cross-service system tests. This allows you to specify test data constraints for the test uniformly across the affected services, so that the initial setup of test data is correct. This is done using the test data setup APIs we discussed above (a sketch of such aligned setup follows this list).
  2. Since setting up test data definition across services is more time consuming, I recommend minimizing the number of cross-service tests, based on change impact testing. Run transaction tests only if the transaction, or any of the underlying components of the transaction, have changed. Again, this is a key principle of continuous testing that’s aligned with the test pyramid. 
  3. If there have been no changes to a participating component or underlying sub-component, we recommend using a virtual service representation of that component. This will further help to reduce the TDM burden for that component. 
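As a sketch of what aligned, cross-service test data setup might look like, the snippet below seeds three hypothetical services through their test-data (mutator) APIs using a shared order and customer key, then drives the transaction through the entry-point API. The URLs, endpoints, payloads, and expected status are all illustrative assumptions.

```python
import requests

# Hypothetical test-data setup endpoints exposed by each service taking part
# in the transaction (the mutator APIs discussed earlier).
ORDER_ID = "ORD-42"

SETUP_CALLS = [
    ("http://orders.test.local/test-data/orders",
     {"id": ORDER_ID, "status": "NEW", "customer_id": "C-7"}),
    ("http://inventory.test.local/test-data/reservations",
     {"order_id": ORDER_ID, "sku": "SKU-1", "quantity": 2}),
    ("http://payments.test.local/test-data/accounts",
     {"customer_id": "C-7", "balance": 500.0}),
]

def seed_aligned_test_data() -> None:
    """Seed each service with data keyed to the same order and customer so the
    cross-service transaction can execute end to end."""
    for url, payload in SETUP_CALLS:
        requests.post(url, json=payload, timeout=10).raise_for_status()

def run_transaction_test() -> None:
    seed_aligned_test_data()
    # Drive the transaction through the entry-point API and check the outcome.
    resp = requests.post(f"http://orders.test.local/orders/{ORDER_ID}/checkout", timeout=30)
    resp.raise_for_status()
    assert resp.json()["status"] == "PAID"

if __name__ == "__main__":
    run_transaction_test()
    print("cross-service transaction test passed")
```

Unchanged participants in the transaction would be replaced by virtual services, as noted in the third item above, so only the changed components need real test data behind them.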
TDM Approach for Supporting End-to-End Business Process or User Acceptance Tests 

The TDM approach for these tests is similar to that for system tests described above, since user actions map to underlying API calls. Such tests are likely to span more components. 

Many customers prefer to use real components, rather than virtual services, for user acceptance testing, which means that the TDM burden can be significant. As before, the key to reducing TDM complexity for such tests is to reduce the number of tests to the bare minimum, using techniques like change-impact testing, which was discussed above. I also recommend you use the change-impact approach to decide whether to use real components or their virtual services counterparts. If a set of components has changed as part of the release or deployment, it makes sense to use the actual components. However, if any dependent components are unchanged, and their test data has not been refreshed or is not readily available, then virtual services can be considered.

3 ways to align site reliability engineering with SAFe – and why it's a smart thing to do

The Scaled Agile Framework (SAFe) is a great tool for establishing agile and Lean best practices across an enterprise. It provides an overarching architecture for aligning development, quality assurance and other functions to produce a faster workflow and to boost performance across the board.

There is an important missing link, though. To date the SAFe framework hasn’t incorporated site reliability engineering – a function of growing importance in today’s application-driven economy. 

Site reliability specialists focus on the operational infrastructure so vital to keeping sites and services running. They work to improve availability, latency, performance, efficiency, change management, capacity planning and a host of other factors that influence service delivery and the user experience. 


So why isn’t this important function included in SAFe? The framework focuses more on system development and delivery than on the operational end of the spectrum where site reliability resides. But times are changing. Progressive companies are shifting site reliability engineering to the left to support development and delivery.  

Why Shift Left?
Site reliability specialists have valuable software engineering skills. They bring an “as-code” approach to configuration, testing and other tasks – reducing the effort involved in monitoring and improving operational metrics. They use these software skills to manage reliability, but they haven’t been positioned to use what they know to build reliability in from the beginning.

A shift left breaks down this functional barrier. It positions reliability specialists to work in concert with development and release teams to ensure architecture and configuration quality across the entire software lifecycle. It also makes the most of those underlying software engineering skills.

The payoff from a shift left can be significant. Your organization can better manage configuration changes, service levels and error budgets. You can establish a continuous cycle of feedback and governance – from initial design and development through to the launch and operation of new services.  And you can better support and advance your agile and Lean objectives.

Mapping Reliability Engineering to SAFe 
Though SAFe doesn’t address the role of site reliability engineering, you can easily map and integrate the function on your own to support a shift left. Focus on the following three points of synergy to integrate reliability engineering at critical junctures in your DevOps lifecycle. 

1. At the application level
Integrate reliability engineers with your SAFe agile development team – the group tasked with defining, building, testing and delivering apps in sprint. These new team members can set up and track application-level service objectives, error budgets and DevOps pipelines. And they can help you ensure each new component and each new application will support reliability – not erode it.

2. At the system level
As you move further along the SAFe continuum, integrate reliability engineers with your SAFe system team to support release train activities for multiple components and applications. These specialists will be positioned to focus on launch coordination, governance of your system architecture, error budget tracking, systemwide service level objectives – and more.

3. At the enterprise level
Finally, integrate reliability engineers into the SAFe enterprise solution delivery function to oversee your enterprise system architecture and service delivery. Task them with establishing and running Centers of Enablement for reliability engineering, developing enterprise-level best practices and governance controls, improving business agility and promoting the reliability of complex architectures.

Selecting the right tools 
This significant broadening of the site reliability engineering function can clearly deliver important new benefits. For optimal outcomes, though, you will also need to broaden your supporting toolset.

At the application level, team members will need to track the resolution of issues they uncover. At the system level, they will need to evaluate readiness and performance against specific service-level objectives. At the enterprise level, they will need a big-picture view of reliability that spans all your systems and services.

Fortunately, a new generation of solutions is emerging to support site reliability engineers as they make the shift left. These new platforms are tailor-built for the task at hand and powered by artificial intelligence, machine learning and intelligent automation. 

One example: The Broadcom BizOps platform includes a Release Health and Risk Dashboard that delivers proactive insights into each new release before go-live. Reliability engineers can quickly pinpoint problems and track remediation. Once a service is in production, an Operations Dashboard helps engineers track availability, response times, error rates, and more. Importantly, the two dashboards interoperate so your reliability team can correlate release health data with production data and evaluate the quality of their release health predictions.

It’s time to get started
If you want to bake reliability into your systems and services from the start, consider broadening the role of your site reliability engineering team. Use the SAFe framework as your guide and align skilled talent at the application, systems and enterprise levels. Select the right tools to support your newly distributed team – arming them with analytics that can turn data into actionable insights. You will be poised to make significant strides in your continuous improvement journey. 
