
Testing times: Ensuring BBC App testing efficiency

Michael Blades

Senior Test Engineer, BBC TV & Radio


Michael Blades is a Senior Test Engineer for BBC TV & Radio and is currently embedded as the Test Lead for iPlayer Mobile. In this blog, he details improvements made to quality measurements and how testing methods have changed over the past 18 months.

The test team is responsible for the native application offering on Android and iOS and is located wholly in MediaCityUK beside a host of other teams; within it we have several squads responsible for separate (but not isolated) features that are developed with the ability to release stable software at any given time.

Testers are embedded in these squads with myself overseeing the strategy for utilising our test resources, identifying process improvements and creating plans to assess the quality of our release candidates. Our testers are not just a part of the furniture but are encouraged to evolve with the industry along with our developers; to do this we constantly experiment and iterate on ideas to improve our ways of working within an agile team. We focus on up-skilling, pairing and training to make sure our squads are sharing knowledge and introducing new ideas.

In this blog I hope to provide some insight into our unique development environment and the problems we face day-to-day. The first is how our defect management has evolved over time to better serve the team, followed by details of how we changed our approach to scripted testing and how this has improved our pre-release regression tests.

What we had

When I joined the team, defects were not monitored or quantified; they were buried in a ticket management system backlog that was thousands upon thousands of tickets deep. This made providing a known issues list or giving a statement of quality to stakeholders difficult; even deciding which issues were the highest priority for fixing proved confusing. Regression testing was brought up as a point of concern at every meeting - mainly due to the time it took to complete - but feedback from our testers was also extremely negative because of the amount of time they had to spend reading the same tests and flagging them as pass or fail.

Defect management - our first iteration

First of all, we consciously embraced the concept of defects as a part of the codebase and conveyed that managing them well gives clear visibility of the state of the product. We also wanted the team to understand how changes could negatively affect the various features within each App. Each discovery provided us with a better understanding of our quality status and enabled us to spot defect clusters that revealed serious underlying problems needing attention.

Trying to dig through a digital backlog to pull out all relevant tickets was proving too time consuming and we needed something immediate that the entire team could use to manage defects. We decided to go with the simplest solution: a whiteboard. iPlayer Mobile already used a few of these to visualise its workload and allow collaboration, but defects were often lost or forgotten. We cut our losses on existing defects, and any new ones that were discovered were created as physical tickets and sorted by App platform and feature type.

Anything we discovered and didn’t fix as part of our current work was added to this board. These defects were prioritised according to severity and impact on the user; this is how we decided where to focus our efforts in maintaining iPlayer. Any of these issues that reached an age threshold were re-evaluated in a session where one of two decisions was made. We might close the defect if it could still be reproduced but we had not had complaints from users regarding it, or if it had been inadvertently fixed by other development work. Alternatively we might decide that the issue was still serious and make a concerted effort to get it fixed. These sessions were time consuming, however: as more and more defects aged, going over old ground wasn’t seen as the best use of time.

We discovered from this that the team was more aware of the quality status within the Apps, as anyone on the team (or any other team) could see all of our current defects at any time. We even saw developers attempting to clear tasks from the board when it was getting full, a type of engagement we’d never had before. We were able to use this as our known issues list, referencing the tickets when dealing with audience complaints or when another team was having similar issues.

However, as time went on the team grew in size and iPlayer spawned iPlayer Kids. Now with three squads and two Apps, the number of tickets we were creating and the number of changes that caused defects became too much. The overhead of keeping the board up to date had become a burden, and due to the increased workload we had little time to fit this in; a new solution was needed.

Defect management - what we have now

After reviewing what we needed for our day-to-day, we came up with the following requirements:

  • Accessible to anyone who may need it.
  • Easy to sort and pinpoint information.
  • Needs minimal interaction to ‘just work’.
  • Enables us to track metrics over time.

To do the above we switched to a digital dashboard, using filters and macros that read data from JIRA and output it as numbers. These can be selected to display the tickets behind the data or even turned into graphs, allowing us to quickly make sense of large amounts of information. We house this on our wiki page system, which is used mainly for documentation.
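
For a flavour of what sits behind a dashboard like this, here is a minimal sketch that counts open defects per App via JIRA’s standard REST search endpoint. The instance URL, credentials, project key and component names are all hypothetical; our real dashboard is built from saved filters and wiki macros rather than a script.

import requests

JIRA = "https://jira.example.com"      # hypothetical JIRA instance
AUTH = ("dashboard-bot", "api-token")  # hypothetical credentials

# Hypothetical JQL filters, one per App; the real dashboard uses
# equivalent saved filters rendered by wiki macros.
FILTERS = {
    "iPlayer Android": "project = IPLAYER AND issuetype = Bug AND component = Android AND resolution = Unresolved",
    "iPlayer iOS": "project = IPLAYER AND issuetype = Bug AND component = iOS AND resolution = Unresolved",
    "iPlayer Kids": "project = IPLAYER AND issuetype = Bug AND component = Kids AND resolution = Unresolved",
}

def count_defects(jql):
    # maxResults=0 asks JIRA for just the total count, which is all we need here
    response = requests.get(
        f"{JIRA}/rest/api/2/search",
        params={"jql": jql, "maxResults": 0},
        auth=AUTH,
    )
    response.raise_for_status()
    return response.json()["total"]

for label, jql in FILTERS.items():
    print(f"{label}: {count_defects(jql)} open defects")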

As new defects are created they are automatically sorted. Although it may not be as technically advanced as some of our other tracking mechanisms, it is widely available - anyone in the company from any site can view it, allowing us to keep all of our stakeholders in the loop. As time goes on we can see that we raise more defects than we fix and end up closing more due to age. This tells us that we worry far too much about many small defects that we realistically will never get around to fixing. We lost the visibility that a physical board supplied, but as the team has matured it is not as necessary, as we are more aware that changes can cause unseen permutations throughout the Apps.

This may influence our next iteration: many agile teams implement a zero-defect policy where errors are either fixed as they are found or forgotten about. I believe this will be difficult to introduce within our environment, as we have large-scale accountability both to our audiences and to the business, although we could take another step in that direction and hope to get there at some point.

Changing our test case and regression test methods

The Mobile iPlayer Apps are in product maturity and therefore suffer from problems commonly associated with this lifecycle stage; for testing, the classic example is bloated, over-documented tests, some of which are ancient, having been created as we went through sprints and release cycles. This caused testing to drag on for longer than the scale of the changes required. Planning, recording results and then reviewing them became overly time-consuming, and bottlenecks in testing quickly became obvious. That’s without factoring in the communal groan amongst the team whenever ‘regression testing’ was mentioned; it had become a piece of work everyone dreaded and no-one wanted to do - it needed to change.

After looking at test techniques both internally and externally, we experimented with the idea of a fully exploratory method where testers are given no rails to follow (usually in the form of instructions or Given-When-Then style tests). Results were mixed, as we received feedback concerning the lack of information supplied and a willingness to go back to a more traditional linear style. This came mainly from non-test disciplines that used the step-based tests as a rigid set of instructions; a habit we wanted to break, however unpopular that was. Product was also concerned that the tests formed a repository of intended behaviour for the application that could be accessed quickly, which would be impossible without test cases that contained detail.

After this experiment ended, we were left torn between serving the needs of iPlayer in terms of not losing information and keeping the tests accessible to all of the team, while also reducing the upkeep and planning overheads that were causing bottlenecks at this point in the release cycle. We also had to maintain the bar Test had set within Mobile iPlayer by providing thorough quality assessments and high device coverage. We decided to perform an overhaul of our test cases over the course of the following month, taking in all the considerations we had gained throughout our experimentation phase.

Currently our manual tests take the form of a wiki-style entry for each feature, containing:

  • High level UI information.
  • Example images.
  • Detailed descriptions for complex behaviours.
  • No completion goals; these are set by the tester.

This method enables experienced testers to work with less structure while providing a reference tool for their observations; not dictating the paths and interactions a tester has with the application should provide more coverage. Conversely, inexperienced testers or other disciplines can use the information supplied to get a better grasp of the mechanisms and interacting components within the software.

We also set a goal of reducing the number of tests to a tenth of their original amount and allowing more people access to them, with the aims being:

  • Reducing the overheads involved in maintaining a massive amount of tests.
  • Allowing us to add tests for new features quickly.
  • Easing the creation of a test plan.
  • Opening up the test cases to team ownership.

When faced with the prospect of completing hundreds of tests to cover an application, a large amount of time is spent filling in results or doing similar administrative work; this is now secondary to the actual testing. We remain fairly brutal about getting rid of tests we no longer need or slimming down ones that become too large, and even more so about unnecessarily testing more than we need to. Although in the future we may scale this down to a truly non-documented exploratory system, like the defect management above this is perhaps a halfway iteration that the team can handle until it’s ready for further change.

Looking Back and Looking Forward

We discovered that sweeping changes are difficult while supporting a team that has an unrelenting schedule of new features and improvements. Getting the wider team to embrace change was perhaps even more difficult; for future changes it will be important to involve the team earlier and at all stages. Process changes and improvements proved to be jarring for some; if we were to attempt this again it would be more beneficial to ease changes in over time and explain our reasoning to the entire team before commencing change.

Empowered testers now work with developers to ensure we have no oversights while planning is undertaken. Since these changes were implemented we’ve noticed fewer bottlenecks in the release process - at stages where we would normally be on red alert to create a test plan, we have one ready to start. The team now understands the resources a phase of testing will need and allocates an adequate amount of time to plan, extra bodies to swarm on testing, and a communal review period where an assessment of quality is made.

There aren’t many metrics we could use to track the mood of the team, but compared with the state we were in a year ago - when regression testing was panicked, loathed and painstakingly slow - the team’s mood now hardly alters, as regression is just another part of our cycle. It’s over a lot quicker on the whole, although comparing times isn’t entirely fair as the size of releases can vary, as can the resources. I also haven’t heard any complaining in months, so that’s a good sign.

Potentially this is just a step towards a way of working where we don’t catalogue defects at all and testing is handled purely in very lightweight notations or mindmaps, ideally with the entire team stopping to swarm on a test task before any more progress is made. However, this will take time and patience; iterating towards that goal and evaluating along the way will be much more effective given the nature of our dependencies and the sheer size of the team.

iPlayer has evolved from an App that serves the TV shows the user requests into a platform where you can create your own catalogue of programming and download it to watch later, as well as play purchased content from a back catalogue. Coupled with the then-new iPlayer Kids, which has personalisation at its core, this brought tough challenges concerning stability, data and security. This will become more of a priority as iPlayer continues to become less static; instead it will look different for each user and surface vastly different content based on what they like, dislike and interact with on a regular basis.

For us this poses one large question: how do you test that? If all users get a different experience, how do you ensure you represent them all - or, if not all, then at least enough to give confidence in an extremely complex product’s ability to perform well across varied configurations, including fragmented devices and network types? It’s a question we need to be prepared to answer, and then act upon, sooner rather than later.

I hope you enjoyed this insight into Mobile iPlayer Test improvements and if you have any questions, comments or similar stories please add them to the comments section below.

