Testing software at Foundries.io (part 1)


Posted on Jan 5, 2022 by Milosz Wasilewski

5 min read

Testing a customizable operating system is a tricky business. It is even harder when the features are determined by the hardware the OS is running on. Foundries.io delivers a product that needs to work on every platform we support.

This limits the testing surface. In this first part of the blog series, I will describe how we test the “manufacturing scenario”. The main assumption is that every platform build produced by our CI should work on the hardware platform it is meant for. In this scenario, fresh software is delivered to the board before the start of the testing round. This process mimics the use case of initial factory provisioning before shipping the hardware to customers. Since we’re working with development boards, the best tool for the job is LAVA. It allows us to offload the hard part of interacting with the device, and it’s a mature project that has been around for many years.
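
To give a sense of how little glue code this needs on our side, a job definition can be handed to LAVA through its XML-RPC API. The snippet below is only a sketch; the server URL, token and job file name are placeholders:

    import xmlrpc.client

    # Placeholder URL and token -- a real setup points at the lab's LAVA server
    # and authenticates with an API token created for the submitting user.
    LAVA_URL = "https://user:api-token@lava.example.com/RPC2"

    def submit_job(definition_path):
        """Submit a LAVA job definition (YAML) and return the job id."""
        server = xmlrpc.client.ServerProxy(LAVA_URL, allow_none=True)
        with open(definition_path) as handle:
            return server.scheduler.submit_job(handle.read())

    job_id = submit_job("imx8mm-evk-manufacturing.yaml")
    print(f"Submitted LAVA job {job_id}")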

There are a few ways to prepare a LAVA setup. We’re using small PCs (aka NUCs) as “LAVA dispatchers”. These are responsible for interacting with the dev boards. Since it is possible to run the LAVA dispatcher inside a container, we decided to use our own product to set up the lab. Dispatchers run a FoundriesFactory-created image. The application running inside Docker containers consists of all the parts required to run a dispatcher. These are:

  • Lava-dispatcher itself
  • Ser2net for proxying serial connections
  • PDUagent for controlling power and boot settings outside of LAVA
  • Webserver for delivering LAVA overlays to the DUT

The docker-compose.yaml file for the whole setup is not public at this point. It will be shared in a separate post.
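
Until then, here is a rough sketch of what such a compose file could look like; the image names, build contexts and device paths are illustrative assumptions rather than the actual configuration:

    # Illustrative sketch only -- not the real docker-compose.yaml
    version: "3"
    services:
      lava-dispatcher:
        image: lavasoftware/lava-dispatcher   # pin to the same version as the server
        privileged: true                      # needs access to USB and serial devices
        volumes:
          - /dev:/dev
        # LAVA server URL/token configuration for the worker omitted here
      ser2net:
        build: ./ser2net                      # proxies serial consoles over TCP
        devices:
          - /dev/ttyUSB0
      pduagent:
        build: ./pduagent                     # power and boot control outside of LAVA
      webserver:
        image: nginx:stable                   # serves LAVA overlays to the DUT
        ports:
          - "8080:80"
        volumes:
          - ./overlays:/usr/share/nginx/html:ro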

A FoundriesFactory-based setup allows us to perform upgrades easily. LAVA is usually released every month, and the server and dispatchers have to run the same version of the software. Since the server is installed as a cloud service, there is only one piece to update after every release. The software on the dispatchers needs to be updated separately. LAVA doesn’t have an automatic update system for containers, so FoundriesFactory comes in very handy for this task.

LAVA is, however, only part of the story. FoundriesFactory consists of several parts, and devices are just one of them. Other things that need to be considered are the CI for building Factory targets, the device registry in the Factory, and the OSTree/OTA setup. LAVA isn’t very good at rebooting devices as part of a test, so testing OTA updates is done in a slightly different way. LAVA performs the initial device flashing with the “previous” target. Another, more recent target already exists, and the device should update itself after registration. Checking whether the OTA really happened is tricky: it needs to be verified independently of the device, on the Factory server side. The whole process of scheduling the test and verifying the outcome is performed by another piece of software, Conductor. It is responsible for listening to jobserv for notifications about successful LmP builds, merging the latest lmp-manifest into the testing Factories, scheduling the LAVA and OTA tests, and verifying the OTA testing results. Luckily, all of this can be done easily with the APIs available from FoundriesFactory and LAVA.
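
For instance, verifying that the update really happened boils down to polling the device record in the Factory API until it reports the expected target. Below is a minimal sketch; the helper name, the timeouts and the assumption that the device endpoint exposes a "target-name" field are ours, not an official example:

    import time

    import requests

    def wait_for_ota(device_name, expected_target, token, timeout=1800, interval=60):
        """Poll the Factory API until the device reports the expected target."""
        headers = {"OSF-TOKEN": token}
        url = f"https://api.foundries.io/ota/devices/{device_name}/"
        deadline = time.time() + timeout
        while time.time() < deadline:
            response = requests.get(url, headers=headers)
            if response.status_code == 200:
                device = response.json()
                # "target-name" is assumed to hold the target the device runs now
                if device.get("target-name") == expected_target:
                    return True
            time.sleep(interval)
        return False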

As another example, saving test results to the FoundriesFactory backend can be done in the following way:

    import logging

    import requests

    logger = logging.getLogger(__name__)

    # token and registered_device_name come from the surrounding Conductor context
    authentication = {
        "OSF-TOKEN": token,
    }

    url = f"https://api.foundries.io/ota/devices/{registered_device_name}/tests/"
    result = {
        "name": "smoke-test-1",
        "status": "PASSED",
        "target-name": "lmp-testfactory-123"
    }
    # Create the test entry first; its status is filled in with a follow-up update
    test_dict = result.copy()
    test_dict.pop("status")
    new_test_request = requests.post(url, json=test_dict, headers=authentication)
    logger.info(f"Reporting test {result['name']} for {registered_device_name}")
    if new_test_request.status_code == 201:
        test_details = new_test_request.json()
        result.update(test_details)
        details_url = f"{url}{test_details['test-id']}"
        update_details_request = requests.put(
            details_url,
            json=result,
            headers=authentication)
        if update_details_request.status_code == 200:
            logger.debug(f"Successfully reported details for {test_details['test-id']}")
        else:
            logger.warning(f"Failed to report details for {test_details['test-id']}")
    else:
        logger.warning(f"Failed to create test result for {registered_device_name}")
        logger.warning(new_test_request.text)

The last part of running regular tests is storing their results. Currently it’s not a very complicated setup. We’re using the existing server-side infrastructure of FoundriesFactory, where we can store the test results. They can come either from fiotest (which will be described in the next part) or from outside, using the API. Since LAVA doesn’t know how to interact with the device API in the Factory, all the hard work is again performed by Conductor. This setup has a few drawbacks. For example, if a LAVA testing job fails for any reason (power outage, serial output corruption, etc.), there is currently no way to restart the job and re-record its results. The problem is rare, but when it happens it requires a manual check of the test results. At this point the team is able to see the results using fioctl:

$ fioctl --factory lmp-ci-testing targets tests 52
NAME                       STATUS  ID                                    CREATED AT                     DEVICE
----                       ------  --                                    ----------                     ------
disable-aklite-reboot      PASSED  094ee642-21de-40d1-b7c1-730125e74c30  2021-09-21 18:35:44 +0000 UTC  lmp-ci-testing-imx8mm-evk-01
fs-resize                  PASSED  987504e6-dd4e-4c89-a956-6dd54adaca4d  2021-09-21 18:35:44 +0000 UTC  lmp-ci-testing-imx8mm-evk-01
network-basic              PASSED  3d72ba2f-e4d8-4f73-b9b3-d674f2c3ca90  2021-09-21 18:35:44 +0000 UTC  lmp-ci-testing-imx8mm-evk-01
docker                     PASSED  c2288ce5-b410-4255-8ca9-a36d43ad6226  2021-09-21 18:35:44 +0000 UTC  lmp-ci-testing-imx8mm-evk-01
docker-networking          PASSED  b17e9c55-b941-4fc5-9e1b-ac696c906ec4  2021-09-21 18:35:44 +0000 UTC  lmp-ci-testing-imx8mm-evk-01
kernel-config-checker      PASSED  82862fd2-f05c-4689-812d-ab98d8685ace  2021-09-21 18:35:44 +0000 UTC  lmp-ci-testing-imx8mm-evk-01
ostree                     PASSED  abb3b6b1-fbbb-4022-b2a5-78c206bc0b5e  2021-09-21 18:35:44 +0000 UTC  lmp-ci-testing-imx8mm-evk-01
aklite                     PASSED  9ac15263-ebc8-4073-a690-748fdba34b42  2021-09-21 18:35:44 +0000 UTC  lmp-ci-testing-imx8mm-evk-01
wlan-smoke                 PASSED  34fa94e0-2da9-4a20-a3c7-ddb324af77f8  2021-09-21 18:35:44 +0000 UTC  lmp-ci-testing-imx8mm-evk-01

The details of each test are also available:

$ fioctl --factory lmp-ci-testing targets tests 52 34fa94e0-2da9-4a20-a3c7-ddb324af77f8
Name:      wlan-smoke
Status:    PASSED
Created:   2021-09-21 18:35:44 +0000 UTC
Completed: 2021-09-21 18:35:44 +0000 UTC
Device:    lmp-ci-testing-imx8mm-evk-01

TEST RESULT  STATUS
-----------  ------
ip-link      PASSED
wlan-up      PASSED
wlan-boot    PASSED
wlan-down    PASSED

This setup is sufficient for now, but we’re already planning to improve it and make the test results available through a web-browsable interface. We also need some form of regression tracking: when a test suddenly goes missing or returns a failed result, it should trigger further investigation.

The second part of this series will discuss how tests are performed in the "rolling update" scenario, which tries to simulate OTA updates of devices in the field.
