Teaming ground and air vehicles for an autonomous multi-sensor survey

By Simon Batzdorfer, Markus Bobbe, Martin Becker and Ulf Bestmann, Technische Universitaet Braunschweig
All images courtesy of the authors.

Autonomous vehicles equipped with different environmental sensors, such as optical or thermal cameras or lidar, performed a team survey controlled by a central ground station. The ground station serves as a user interface for defining missions and tasks and for visualizing exploration results online. Stitched 2D orthophotos and lidar point clouds are transmitted for display, and images are processed into 3D reconstructions by photogrammetry. Georeferencing data are provided by an integrated GNSS/IMU positioning system.

In disaster scenarios such as fires, floods or search-and-rescue operations, good situational awareness is indispensable for responders coping with a complex and often chaotic environment. In most cases, previously available map data are outdated, so efficient planning, such as path planning or creating a search pattern, cannot be performed. Up-to-date information can often be gathered only by manned exploration using ground or airborne systems, whose availability is limited.

The research project Automated Navigation and Communication for Exploration (ANKommEn) seeks to create an automated unmanned system to close this gap by providing up-to-date scenario information while increasing the safety of human resources, using unmanned aerial (UAV) and ground-based (UGV) vehicles.

To provide up-to-date information about the target area, all vehicles are equipped with identical positioning and communication hardware, complemented by diverse sensors (RGB camera, infrared [IR] camera, lidar) for visual exploration. The visual sensor information is transmitted to a central ground station for visualization and/or analysis. To get the most out of the system, the unmanned vehicles should have a high degree of automation, reducing the operator's workload so that only basic inputs are required. For example, the operator just marks a destination area and chooses a predefined task; the mission is then planned automatically, and once the corresponding waypoint list has been transmitted to the vehicles, the mission starts.
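For a rectangular destination area, the automatic planning step could produce a simple back-and-forth ("lawnmower") survey pattern. The sketch below assumes a local east/north frame in metres; the helper `lawnmower` is hypothetical and is not the project's planner, whose algorithm is not described here.

```python
from typing import List, Tuple

Waypoint = Tuple[float, float, float]  # (east_m, north_m, altitude_m) in a local frame


def lawnmower(width_m: float, length_m: float, spacing_m: float,
              altitude_m: float) -> List[Waypoint]:
    """Boustrophedon survey over a rectangle anchored at the local origin.

    Lines run north along `length_m`; successive lines are offset east by
    `spacing_m` and flown in alternating directions. If `width_m` is not a
    multiple of `spacing_m`, the last line falls short of the east edge.
    """
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    n_lines = int(width_m // spacing_m) + 1
    waypoints: List[Waypoint] = []
    for i in range(n_lines):
        east = i * spacing_m
        # Even lines fly north, odd lines fly south.
        start, end = (0.0, length_m) if i % 2 == 0 else (length_m, 0.0)
        waypoints.append((east, start, altitude_m))
        waypoints.append((east, end, altitude_m))
    return waypoints


mission = lawnmower(width_m=125, length_m=100, spacing_m=25, altitude_m=60)
```

With these inputs the sketch yields six 100 m lines spaced 25 m apart at 60 m altitude, the same geometry as the cropland flight described later in this article.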

Automated procedures, particularly for a UAV, require position information that meets requirements for accuracy, availability and continuity. In exploration areas where a UAV operates at low altitude, or where a UGV is used, GNSS signal reception can be degraded by the surroundings (buildings and similar obstacles). Using more than one GNSS can increase the availability of position information. Vehicle control, georeferencing of environmental sensor data and exploration results all require high-frequency absolute position, attitude and heading information. These data are obtained by fusing GNSS and inertial measurement unit (IMU) data.


The overall system consists of three UAVs, two UGVs (Opening photo) and a central ground control station. The latter serves as the central human-machine interface through which an operator monitors and manages the cooperative operation of the UAVs/UGVs. Based on previously available map data, exploration areas and tasks are defined and assigned to the UAVs/UGVs, and the map is updated with current information from the visual sensors while a mission is performed.

Figure 1 shows the interaction and information exchange between the different vehicles and sensors.

Figure 1. Diagram of interaction and information exchange.

All UAVs/UGVs are equipped with a navigation and communication unit (NAV/COM) and an environmental sensor payload (ENV) unit, which includes an RGB camera, a thermal camera or a lidar, respectively.

UAV/UGV and Sensor Hardware. The UAVs carry a payload of 2.7 kg (the NAV/COM unit, mounted in the upper compartment, and the ENV unit, mounted under the UAV) and have a flight time of up to 30 minutes (Figure 2, left). The payload sensors are carried and stabilized by a two-axis gimbal. The environmental sensor payload is based on three types of sensors, which are interchangeable between the UAVs: RGB camera, lidar and IR camera.

For ground-based exploration, two four-wheel-drive UGVs carry a pan-tilt-zoom (PTZ) camera on top of the front chassis (Figure 2, right) and are equipped with a lidar and a thermal camera, or a stereo RGB camera, respectively.

Figure 2. UAV carrying a lidar (left) and UGV carrying lidar and IR camera (right).

The navigation and communication unit, mounted as a stack, includes a network processor board for communication and data exchange between the UAV/UGV and the central ground control station. An embedded processing board performs the position calculation and provides a GNSS-based NTP time server. Data for the position calculation are provided by a custom-designed break-out board (Figure 3).

Figure 3. Navigation and communication unit.

Data captured by the environmental sensors cannot be sent directly to the ground station because of the large data volume and the limited bandwidth of the communication link. Therefore, the sensor data are preprocessed or compressed on a small form-factor personal computer before being transmitted to the ground station.

Ground Station. The ground station is the central device for command, control and visualization of the overall system. It provides several options for displaying data from the sensors and vehicles, individually or combined, and also provides automated path planning, the calculation of 3D reconstructions (photogrammetry) and an online stitched 2D orthophoto.

Software Frameworks. The basic software for determining the vehicle’s 3D position, velocity, attitude and heading is built on a modular navigation software framework that can process data from different sensors in real time as well as in post-processing for data evaluation and development. Several algorithms for sensor data fusion are implemented. The algorithm for IMU/GNSS fusion is based on an extended Kalman filter and also provides an IMU-based state vector, stabilized by GNSS information, for the visual sensors. This state vector is published using the Robot Operating System (ROS), a framework for inter-process communication based on a TCP or UDP publisher/subscriber concept. The visual sensors and embedded PCs subscribe to different ROS messages, for example, the state vector message or information from other sensors.
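The publisher/subscriber idea can be illustrated without ROS. The following is a minimal in-process sketch, not the ROS API: the `Bus` class, the `StateVector` fields and the topic name `/nav/state` are hypothetical, and real ROS nodes would exchange messages over TCP or UDP through rospy or roscpp instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, List, Tuple


@dataclass
class StateVector:
    t: float                               # timestamp [s]
    position: Tuple[float, float, float]   # [m]
    velocity: Tuple[float, float, float]   # [m/s]
    attitude: Tuple[float, float, float]   # roll, pitch, yaw [rad]


class Bus:
    """Toy publish/subscribe bus standing in for ROS topics (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, msg: Any) -> None:
        # Every subscriber of the topic receives the same message.
        for callback in self._subscribers[topic]:
            callback(msg)


bus = Bus()
camera_inbox: List[StateVector] = []
bus.subscribe("/nav/state", camera_inbox.append)   # e.g. a camera node
bus.publish("/nav/state", StateVector(0.01, (0.0, 0.0, 60.0), (5.0, 0.0, 0.0), (0.0, 0.0, 0.0)))
```

The navigation filter only needs to know the topic name, not which sensor nodes consume the state vector, which is what makes adding or swapping payloads cheap.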

Figure 4 shows examples of the ground station display: the live camera view from the UGV, a point cloud from the UAV’s lidar, and the mission map. The software layout can be customized by the user.

Figure 4. From left to right: the actual view by the PTZ camera onboard the UGV, the point cloud gathered by the UAV’s lidar, and the mission parameters and map of an aerial view.


Automated operation of UGVs and UAVs requires valid position, attitude and heading information. When only one GNSS is used, signal quality and availability can be degraded by the environment (buildings), resulting in less precise position information or none at all.

GNSS Multi-Constellation. To reduce the risk of poor availability of GNSS-based position information, different GNSS can be used in parallel to raise the number of received satellite signals: GPS, GLONASS, and the evolving Galileo and BeiDou. A multi-constellation approach must account for several differences between the systems: each GNSS uses a different geodetic reference frame and time basis, so measurements from another GNSS must be transformed into the reference frame and time basis of the chosen system. Using more than one constellation improves the geometric distribution of the satellites, as indicated by a lower dilution-of-precision (DOP) value.
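The DOP effect can be checked with the standard least-squares geometry matrix. This sketch assumes one receiver-clock unknown per constellation (a common way of absorbing the inter-system time offset); the satellite azimuths and elevations are invented for illustration and are not from the field test.

```python
import math
from typing import List, Tuple


def invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular geometry matrix")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def pdop(sats: List[Tuple[str, float, float]]) -> float:
    """PDOP for satellites given as (system, azimuth_deg, elevation_deg).

    Each row holds the negative line-of-sight unit vector (east, north, up)
    plus one clock column per GNSS, so every constellation gets its own
    receiver clock offset.
    """
    systems = sorted({s for s, _, _ in sats})
    rows = []
    for system, az_deg, el_deg in sats:
        az, el = math.radians(az_deg), math.radians(el_deg)
        los = [-math.cos(el) * math.sin(az), -math.cos(el) * math.cos(az), -math.sin(el)]
        rows.append(los + [1.0 if system == s else 0.0 for s in systems])
    n = len(rows[0])
    gtg = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    q = invert(gtg)                         # cofactor matrix (G^T G)^-1
    return math.sqrt(q[0][0] + q[1][1] + q[2][2])


gps = [("G", 0, 90), ("G", 0, 20), ("G", 120, 20), ("G", 240, 20)]
multi = gps + [("R", 60, 30), ("R", 180, 45), ("R", 300, 30)]
print(pdop(gps), pdop(multi))
```

Adding satellites from a second system costs one extra clock unknown, but as long as that system contributes at least one satellite, the position part of the solution can only gain information, so PDOP does not increase.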

The navigation software framework is designed for real-time computation and also for post-processing. In post-processing, the recorded sensor data are streamed to the framework, and several calculation parameters and settings can be changed. One option is to exclude low-elevation satellites from the position calculation by raising the cut-off elevation. This parameter was varied to simulate environmental conditions that block GNSS signals, such as buildings in urban scenarios, and to compare the availability of received GNSS signals for single- and multi-constellation position calculation.

Recorded data from a real-world test served as the database for post-processing with different cut-off elevations. At the beginning of the field test, there was a short initialization period to boot the operating system and start the basic positioning processes. After that, a predefined mission was flown, and the GNSS measurements were recorded for the described post-processing.

Post-processing was performed with cut-off elevations from 5° up to 35°. At 35°, the number of GPS satellites drops to four, the minimum required for a position solution, in contrast to 5–7 satellites available for a multi-constellation solution.
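The cut-off sweep can be mimicked by filtering satellites by elevation and checking whether enough remain. The satellite list is hypothetical, not the recorded test data, and the availability test is only the necessary count condition: three position unknowns plus one clock unknown per GNSS in use.

```python
from collections import Counter
from typing import List, Tuple

Sat = Tuple[str, float]  # (system, elevation_deg)


def visible(sats: List[Sat], cutoff_deg: float) -> List[Sat]:
    """Keep satellites at or above the cut-off elevation."""
    return [s for s in sats if s[1] >= cutoff_deg]


def can_fix(sats: List[Sat]) -> bool:
    """Necessary count for a snapshot fix: 3 position + 1 clock per GNSS."""
    systems = Counter(system for system, _ in sats)
    return len(sats) >= 3 + len(systems)


# Hypothetical sky at one epoch (NOT the recorded field-test data).
sky: List[Sat] = [("GPS", 12), ("GPS", 38), ("GPS", 55), ("GPS", 71), ("GPS", 40), ("GPS", 8),
                  ("GLONASS", 15), ("GLONASS", 36), ("GLONASS", 62), ("Galileo", 45)]

for cutoff in range(5, 40, 10):          # 5°, 15°, 25°, 35°
    multi = visible(sky, cutoff)
    gps_only = [s for s in multi if s[0] == "GPS"]
    print(f"cut-off {cutoff:2d}°: GPS {len(gps_only)} (fix={can_fix(gps_only)}), "
          f"multi {len(multi)} (fix={can_fix(multi)})")
```

In a real evaluation the same loop would run over every recorded epoch, and the fraction of epochs with a valid fix would be compared between the single- and multi-constellation cases.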

GNSS/IMU Fusion. A GNSS multi-constellation approach can increase the availability of position information. For attitude and heading determination, however, an IMU is still indispensable. Additionally, with the described hardware setup, pure GNSS-based position information is usually available at only 1 Hz to 5 Hz, whereas meaningful georeferencing of the environmental sensors requires position and attitude information at a much higher rate.

The IMU provides high-frequency 3D measurements of acceleration and angular rate. Using common strapdown processing, high-frequency position, velocity, attitude and heading information is provided in real time. Because pure inertial navigation is stable only over short periods, the GNSS position results are used as aiding information in the Kalman filter’s update step. To bridge the absence of GNSS aiding, which can occur even with multiple constellations, there are two main options. First, a short coasting period is possible once the data fusion has reached a steady state.
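The predict/update cycle and coasting can be shown in one dimension. This is a simplified linear Kalman filter sketch, not the framework's extended Kalman filter: the state is only [position, velocity], the prediction integrates measured acceleration as a stand-in for the strapdown algorithm, and all rates and noise values, including the 5 s GNSS outage, are hypothetical.

```python
import random
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def kf_predict(x: Vec, P: Mat, accel: float, dt: float, q: float) -> Tuple[Vec, Mat]:
    """Propagate [position, velocity] with the measured acceleration."""
    x = [x[0] + x[1] * dt + 0.5 * accel * dt * dt, x[1] + accel * dt]
    F = [[1.0, dt], [0.0, 1.0]]
    FP = [[sum(F[i][k] * P[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    P = [[sum(FP[i][k] * F[j][k] for k in range(2)) for j in range(2)] for i in range(2)]  # F P F^T
    # Process noise for white acceleration noise with power spectral density q
    P[0][0] += q * dt**3 / 3
    P[0][1] += q * dt**2 / 2
    P[1][0] += q * dt**2 / 2
    P[1][1] += q * dt
    return x, P


def kf_update(x: Vec, P: Mat, z: float, r: float) -> Tuple[Vec, Mat]:
    """GNSS position update with H = [1, 0] and measurement variance r."""
    s = P[0][0] + r                          # innovation variance
    k = [P[0][0] / s, P[1][0] / s]           # Kalman gain
    innovation = z - x[0]
    x = [x[0] + k[0] * innovation, x[1] + k[1] * innovation]
    P = [[(1 - k[0]) * P[0][0], (1 - k[0]) * P[0][1]],
         [P[1][0] - k[1] * P[0][0], P[1][1] - k[1] * P[0][1]]]   # (I - K H) P
    return x, P


random.seed(1)
dt, accel_true, accel_sigma = 0.01, 0.2, 0.05    # 100 Hz IMU (hypothetical values)
x, P = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
pos = vel = 0.0
sigma_pos = []
for step in range(1, 3001):                       # 30 s of motion
    pos += vel * dt + 0.5 * accel_true * dt * dt  # ground truth
    vel += accel_true * dt
    x, P = kf_predict(x, P, accel_true + random.gauss(0, accel_sigma), dt, q=accel_sigma**2 * dt)
    gnss_ok = not (1500 < step <= 2000)           # hypothetical 5 s GNSS outage (coasting)
    if step % 100 == 0 and gnss_ok:               # 1 Hz GNSS fixes with 1 m noise
        x, P = kf_update(x, P, pos + random.gauss(0, 1.0), r=1.0)
    sigma_pos.append(P[0][0] ** 0.5)
```

During the outage only the prediction runs, so the position standard deviation in `sigma_pos` grows until GNSS aiding resumes; how long such coasting is acceptable depends on IMU quality.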

Second, thanks to the highly modular design of the navigation software framework, position or attitude increments derived from environmental sensor data processing can be used to aid the IMU.

The vehicle’s state vector is then distributed at high frequency within the system for georeferencing the measurements of the environmental sensors, especially the RGB camera and the lidar, for photogrammetry and simultaneous localization and mapping (SLAM) applications.
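Georeferencing a single sensor measurement then amounts to a rigid transformation using the state vector at the measurement time. This is a minimal sketch assuming a local north-east-down (NED) frame, Z-Y-X (yaw-pitch-roll) Euler angles, sensor axes aligned with the body frame (boresight misalignment ignored) and a state already interpolated to the sensor timestamp; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def rotation_body_to_ned(roll: float, pitch: float, yaw: float) -> List[List[float]]:
    """Direction cosine matrix C_b^n = Rz(yaw) Ry(pitch) Rx(roll), angles in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cp * cy, sr * sp * cy - cr * sy, cr * sp * cy + sr * sy],
        [cp * sy, sr * sp * sy + cr * cy, cr * sp * sy - sr * cy],
        [-sp, sr * cp, cr * cp],
    ]


def georeference(point_sensor: Vec3, lever_arm: Vec3, position_ned: Vec3,
                 roll: float, pitch: float, yaw: float) -> Vec3:
    """p_n = r_n + C_b^n (lever_arm + p_sensor), all vectors in metres."""
    C = rotation_body_to_ned(roll, pitch, yaw)
    b = [la + p for la, p in zip(lever_arm, point_sensor)]   # point in body frame
    return tuple(r + sum(C[i][k] * b[k] for k in range(3))
                 for i, r in enumerate(position_ned))


# A lidar return 60 m straight below a level UAV hovering 60 m above the origin.
ground = georeference((0.0, 0.0, 60.0), (0.0, 0.0, 0.0), (5.0, 3.0, -60.0), 0.0, 0.0, 0.0)
```

Because attitude errors are scaled by range, a 1 mrad heading error already shifts a point 60 m away by about 6 cm, which is one reason the article stresses high-rate, accurate attitude.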


In major fire scenarios, maps can be out of date. Therefore, techniques have been developed to generate a 2D overview from a series of single RGB images taken and processed on board a UAV and transmitted to the ground station via data link. A 3D reconstruction of the scenario can additionally be processed within the ground station. Both approaches were implemented to obtain an automated rapid aerial mapping solution.

For the 2D overview, SLAM algorithms, often used in robotics research, are adapted to this specific use case. These algorithms are well suited to rapid aerial mapping for an overview of the scenario, because the map is updated incrementally with every new image; however, they are less precise, which can be compensated for by the photogrammetric 3D reconstruction. The live mapping (SLAM) approach is based on the ORB-SLAM algorithm, and the photogrammetry-based approach uses commercially available photogrammetry software.

Both systems, the 2D pipeline on the UAV and the 3D pipeline on the ground station, use the ROS framework for processing the visual sensor data together with the described techniques for positioning, georeferencing and attitude determination. Several software interfaces have been implemented for data exchange between these frameworks. Figure 5 shows a flowchart of the implemented workflow.

The sensor/input data is received by corresponding nodes on the aerial vehicle. After adding the camera pose information to the image in the geo-image flight node, the image is sent to the geo-image ground node on the ground station. The SLAM process is separated into two parts. The SLAM tracker node calculates the transformation between images, and the SLAM stitcher node applies the transformations. The transformed images are displayed by the visualization node. The photogrammetry node receives the georeferenced images, stores the data, and initiates the photogrammetric processing once the survey is finished. The results can also be displayed by the visualization node and exported in a desired format.

Visual SLAM. Computer vision-based algorithms have developed rapidly in recent years. One method, known as parallel tracking and mapping (PTAM), estimates the camera pose from monocular images. It integrates bundle adjustment and separates tracking and mapping into different threads, resulting in a real-time capable framework. ORB-SLAM, named after the Oriented FAST and Rotated BRIEF (ORB) features it uses, builds on these PTAM principles and adds robust loop closing and relocalization (Figure 6). Here, tracking, local mapping and loop closing run in separate threads (gray boxes), with the main map and place recognition in the middle.

Figure 6. ORB SLAM system overview [Mur-Artal, 2015].

The tracking thread predicts the current pose from the last known pose and motion using a constant velocity model and performs a guided search for map points. If these points are found near the predicted position, the velocity model is valid and tracking continues. Otherwise, tracking is lost and relocalization in the global map starts: a subset of features is matched against candidate keyframes, the set of correspondences is extended as corresponding features are detected in other keyframes, the camera pose is optimized, and finally tracking continues. The last step is to decide whether the current frame contains enough information to be inserted as a new keyframe for further calculations.
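The constant velocity model simply reapplies the last inter-frame motion. This sketch represents poses as 4×4 world-from-camera homogeneous transforms and expresses the motion in the camera frame; the helper names are hypothetical, and ORB-SLAM's own code stores poses in the inverse (camera-from-world) convention, which yields the same prediction.

```python
from typing import List

Mat4 = List[List[float]]


def matmul(a: Mat4, b: Mat4) -> Mat4:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rigid_inverse(t: Mat4) -> Mat4:
    """Inverse of [R | p; 0 0 0 1] is [R^T | -R^T p; 0 0 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    rows = [r_t[i] + [-sum(r_t[i][k] * p[k] for k in range(3))] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


def predict_pose(t_prev: Mat4, t_last: Mat4) -> Mat4:
    """Constant-velocity prediction for world-from-camera poses.

    delta is the motion from the previous to the last frame, expressed in the
    camera frame; applying it once more to the last pose gives the prediction.
    """
    delta = matmul(rigid_inverse(t_prev), t_last)
    return matmul(t_last, delta)
```

The predicted pose is only a starting point: map points are projected with it to restrict the feature search, and the pose is then refined against the matches that are found.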

To be marked as a new keyframe, a frame must fulfill all of the following conditions:

  • More than a minimum number of frames have passed.
  • Local mapping is idle, or condition 1 is fulfilled.
  • At least 50 points are observed.
  • At most 90% of the features are already observed by other keyframes.
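The four conditions can be written as a single predicate. The 50-point and 90% thresholds come from the list above, while `MIN_FRAMES` is a hypothetical placeholder for the unspecified minimum. As we read the original ORB-SLAM paper, the first frame count is measured from the last relocalization and the second from the last keyframe insertion, so the sketch keeps two separate counters.

```python
from dataclasses import dataclass

MIN_FRAMES = 20              # hypothetical placeholder for "minimum number of frames"
MIN_TRACKED_POINTS = 50      # condition 3
MAX_SHARED_FRACTION = 0.90   # condition 4


@dataclass
class FrameStats:
    frames_since_relocalization: int
    frames_since_last_keyframe: int
    local_mapping_idle: bool
    tracked_points: int
    shared_fraction: float   # share of tracked points already seen by other keyframes


def is_new_keyframe(s: FrameStats) -> bool:
    c1 = s.frames_since_relocalization > MIN_FRAMES
    c2 = s.local_mapping_idle or s.frames_since_last_keyframe > MIN_FRAMES
    c3 = s.tracked_points >= MIN_TRACKED_POINTS
    c4 = s.shared_fraction <= MAX_SHARED_FRACTION
    return c1 and c2 and c3 and c4
```

Conditions 3 and 4 pull in opposite directions: the frame must still be well anchored in the map, yet contribute enough new content to be worth a keyframe.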

When a new keyframe is passed to local mapping and inserted as a node into a co-visibility graph, new correspondences are searched for in the connected keyframes to triangulate new points. Based on the information accumulated during tracking, point culling keeps only high-quality points in the map, and redundant keyframes are culled as well.

Then loop closing is performed, one of the main improvements over PTAM. If a loop is detected, the drift accumulated along the loop is computed, both sides of the loop are aligned, and duplicated points are fused. Finally, a pose graph optimization is performed to achieve global consistency.

The estimated 3D camera poses are used to generate a 2D orthophoto in real time while the vehicle is flying. To create the orthophoto, a common reference plane is approximated, and all camera images are projected onto it. The projection uses a pinhole camera model.
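Projecting an image onto the reference plane means intersecting each pixel's viewing ray with that plane. This sketch assumes a flat ground plane z = 0 in a local east-north-up frame, an undistorted image and a camera frame with x right, y down and z forward; the intrinsics and the function name are hypothetical.

```python
from typing import List, Tuple


def pixel_to_ground(u: float, v: float, fx: float, fy: float, cx: float, cy: float,
                    cam_pos: Tuple[float, float, float],
                    R_wc: List[List[float]]) -> Tuple[float, float]:
    """Intersect the pinhole viewing ray of pixel (u, v) with the plane z = 0.

    cam_pos: camera centre in a local world frame (x east, y north, z up) [m].
    R_wc: rotation from camera axes (x right, y down, z forward) to world axes.
    """
    ray_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]          # pinhole back-projection
    ray = [sum(R_wc[i][k] * ray_cam[k] for k in range(3)) for i in range(3)]
    if ray[2] >= 0:
        raise ValueError("ray does not hit the ground plane")
    s = -cam_pos[2] / ray[2]                                # scale to reach z = 0
    return cam_pos[0] + s * ray[0], cam_pos[1] + s * ray[1]


# Nadir-looking camera: camera z -> down, camera x -> east, image "down" -> south.
R_NADIR = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
centre = pixel_to_ground(2000, 1500, 1000, 1000, 2000, 1500, (10.0, 20.0, 60.0), R_NADIR)
```

Mapping the four image corners this way gives the footprint used to warp the image into the common map; for a nadir view at 60 m with a 1000-pixel focal length, one pixel covers 6 cm on the ground.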

After distortion compensation, the whole image can be stitched into the current global map.

Photogrammetry. This approach uses off-the-shelf photogrammetric processing software. The processing is triggered automatically when the survey is completed and all images have been transferred to the ground station via data link. For georeferencing, the camera location and the interior camera geometry were written to the EXIF metadata of each image by the geo-image ground node (Figure 5). To achieve an acceptable compromise between orthophoto quality and processing time, the impact of the most relevant processing parameters was analyzed.
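In EXIF, a GPS latitude or longitude is stored as three rationals (degrees, minutes, seconds) plus a hemisphere reference letter. This sketch shows only that coordinate encoding, with a round-trip check. Writing the tuples into a file would be done with an EXIF library of choice, and the camera-geometry tags are not shown. The coordinates and helper names are hypothetical.

```python
from fractions import Fraction
from typing import Tuple

Rational = Tuple[int, int]                 # EXIF RATIONAL as (numerator, denominator)
DMS = Tuple[Rational, Rational, Rational]


def to_exif_dms(decimal_deg: float, pos_ref: str, neg_ref: str,
                sec_den: int = 10000) -> Tuple[DMS, str]:
    """Signed decimal degrees -> EXIF (deg, min, sec) rationals plus N/S or E/W reference."""
    ref = pos_ref if decimal_deg >= 0 else neg_ref
    value = abs(decimal_deg)
    deg = int(value)
    minutes_f = (value - deg) * 60
    minutes = int(minutes_f)
    sec_num = round((minutes_f - minutes) * 60 * sec_den)
    if sec_num >= 60 * sec_den:            # carry caused by rounding
        sec_num -= 60 * sec_den
        minutes += 1
    if minutes >= 60:
        minutes -= 60
        deg += 1
    return ((deg, 1), (minutes, 1), (sec_num, sec_den)), ref


def from_exif_dms(dms: DMS, ref: str) -> float:
    """Inverse conversion, useful for checking what was written."""
    d, m, s = (Fraction(num, den) for num, den in dms)
    value = float(d + m / 60 + s / 3600)
    return -value if ref in ("S", "W") else value


lat, lat_ref = to_exif_dms(52.27, "N", "S")      # hypothetical camera position
lon, lon_ref = to_exif_dms(-10.5, "E", "W")
```

A seconds denominator of 10000 keeps the rounding error in the sub-millimetre range, well below the accuracy of the GNSS/IMU positions being stored.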

Figure 5. ROS node layout with SLAM (green) and photogrammetry workflow (red).

The photogrammetry process consists of four steps:

  • camera alignment (optimizing the homographic equation)
  • mesh creation from the generated tie points
  • orthophoto creation (from a dense cloud or digital elevation model)
  • export.

Analyses and Evaluation. To evaluate the workflow of both approaches, 2D live stitching and 3D photogrammetry, a real-world flight test above agricultural cropland was performed. The results of both approaches are shown in Figure 7 and Figure 8. Agricultural cropland, with its weakly textured surface, generally poses a challenge for mapping processes because of the limited number of trackable features.

Figure 7. Orthophotos created with the high and lowest profiles (including ground reference points).

Figure 8. Orthophoto of cropland created with the 2D live-stitching approach.

Four predefined profiles were used to cover different trade-offs between processing time and orthophoto quality. Each profile level (lowest, low, medium and high) corresponds to a level of alignment accuracy and mesh face count.

To estimate the accuracy of the maps created with the different profiles, five ground reference points (GRPs) were distributed over the mission area. The locations of the GRPs were determined using an RTK-GNSS system, with a horizontal RMSE below 2 cm. To enable robust processing for this scenario, forward overlap and sidelap were both set to 70%. A ground sampling distance (GSD) of 2 cm was needed to identify the GRPs. This resulted in a mission of six 100-meter (m) lines spaced 25 m apart at an altitude of 60 m above ground. During the 4.5-minute flight, 271 images were taken.
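The relation between altitude, GSD, overlap and line spacing is simple nadir-camera geometry over flat ground. The camera values below (5 µm pixel pitch, 15 mm focal length, 4000 × 3000 pixels) are hypothetical and are not the specifications of the camera used in the project.

```python
def gsd_m(altitude_m: float, focal_length_mm: float, pixel_pitch_um: float) -> float:
    """Ground sampling distance of a nadir pinhole camera over flat ground."""
    return altitude_m * (pixel_pitch_um * 1e-6) / (focal_length_mm * 1e-3)


def footprint_m(gsd: float, pixels: int) -> float:
    """Ground extent covered by `pixels` image pixels."""
    return gsd * pixels


def line_spacing_m(footprint_across_m: float, sidelap: float) -> float:
    """Distance between adjacent flight lines for a side overlap in [0, 1)."""
    return footprint_across_m * (1.0 - sidelap)


def trigger_distance_m(footprint_along_m: float, overlap: float) -> float:
    """Distance between exposures along a line for a forward overlap in [0, 1)."""
    return footprint_along_m * (1.0 - overlap)


gsd = gsd_m(altitude_m=60, focal_length_mm=15, pixel_pitch_um=5)   # hypothetical camera
spacing = line_spacing_m(footprint_m(gsd, 4000), sidelap=0.70)
trigger = trigger_distance_m(footprint_m(gsd, 3000), overlap=0.70)
```

With these assumed values, GSD is 2 cm, lines are about 24 m apart and an exposure is needed every 18 m. For reference, the reported 271 images in 4.5 minutes (270 s) correspond to roughly one exposure per second.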

To compare the profiles, they were run one after another on the same set of images. The results are shown in Figure 7. All profiles produced consistent solutions and were successfully georeferenced. The map based on the lowest profile could not reconstruct the complete area (Figure 7, right). The remaining profiles led to similar results with no notable differences on visual inspection. The processing time varied between 1.2 and 3.6 minutes. A comparison of these and other criteria is given in Figure 9.

Figure 9. Evaluation and comparison of defined software profiles and visual SLAM.

The final mosaic of the SLAM pipeline is shown in Figure 8. It was updated with every new image and was therefore finished before the UAV landed. The mean location error, measured using the reference points, was about 8 m, significantly larger than the errors of the photogrammetry results. Figure 9 contrasts these results with those of the photogrammetry approach.
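The mean location error can be computed from the GRPs by comparing each point's position in the map with its RTK-surveyed position. This sketch adds the RMSE as a common companion metric; the coordinates in the usage example are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (east_m, north_m)


def horizontal_errors(measured: List[Point], reference: List[Point]) -> List[float]:
    """Horizontal distance between each GRP as located in the map and its surveyed position."""
    return [math.hypot(mx - rx, my - ry) for (mx, my), (rx, ry) in zip(measured, reference)]


def mean_and_rmse(errors: List[float]) -> Tuple[float, float]:
    mean = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean, rmse


errs = horizontal_errors([(3.0, 4.0), (10.0, 0.0)], [(0.0, 0.0), (10.0, 0.0)])  # hypothetical
mean_err, rmse_err = mean_and_rmse(errs)
```

The RMSE weights large deviations more heavily than the mean, so reporting both helps reveal whether a map has a uniform offset or a few badly placed regions.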

While the mean error of the low profile is half that of the lowest profile, the medium and high profiles do not reduce the error significantly further. The number of tie points created by the lowest profile is an order of magnitude lower than for the other three profiles.

We conducted flight tests on Langeoog island in the North Sea to gather information supporting efforts to protect the island’s coastline from erosion. For this purpose, dredgers selectively deposited sand along the coastline at the beginning of October 2017. Between Oct. 26 and 31, a severe storm flood caused massive erosion of the deposited sand; the result is shown in Figure 10. The extent of erosion was determined by comparing orthophotos of the same area. The displacement averaged 9.9 m, with peaks of up to 17.6 m.

Figure 10. Evaluation of erosion.

The 3D photogrammetry provides a more detailed image than the 2D live-stitching approach (Figure 11), but both approaches can provide the desired information about the area.

Figure 11. Result of the SLAM approach with camera poses and tracked features.

Both implemented approaches were successfully integrated to obtain the desired fully automated rapid aerial mapping solution. This includes the basic tasks of automated mission planning, camera control, image transfer to the ground station, automated processing and visualization of the results.


The benefits of multi-constellation GNSS positioning have been demonstrated for UAVs and UGVs operating in disaster scenarios, especially where GNSS signal reception might be blocked. This position information is also used for georeferencing images and for the visual reconstruction of the area. The overall system has demonstrated automated orthophoto generation. Both implemented mapping methods, 2D live stitching and 3D photogrammetry, provided results that meet the requirements for an instantaneous 2D overview and an up-to-date 3D reconstruction of the area.


This work was done within the joint research project ANKommEn, funded by the German Federal Ministry of Economic Affairs and Energy and administered by the Space Administration of the DLR (funding code: 50NA1518). Project partners are the Institute of Flight Guidance (IFF) and the Institute of Mobile Machines and Commercial Vehicles (IMN), both part of Technische Universität Braunschweig, and AirRobot GmbH & Co. KG, a German manufacturer of multirotor UAVs. The professional fire brigade of Braunschweig and the Lower Saxony Water Management, Coastal Defense and Nature Conservation Agency also participate as associated project partners.


The UAVs are modified AR200 hexacopters manufactured by AirRobot GmbH & Co. KG and equipped with sensors and processing units by TU Braunschweig. The UGVs are Robotnik Summit XL vehicles. The network processor board is a Ventana GW5520, with an embedded Cortex A9 processing board, the Phytec phyBOARD-Mira i.MX6. A custom break-out board by the Institute of Flight Guidance combines an Analog Devices ADIS16488 IMU and a u-blox LEA-M8T GNSS receiver. The UAVs carry an Allied Vision Manta G-917 RGB camera, a Velodyne VLP-16 lidar, a FLIR A65sc IR camera and an Intel NUC. The navigation software framework is by the Institute of Flight Guidance. The photogrammetry software is Agisoft PhotoScan.

SIMON BATZDORFER holds a Dipl.-Ing. in mechanical engineering and is a research engineer at the Technische Universitaet Braunschweig, Institute of Flight Guidance (IFF).

MARKUS BOBBE holds an M.Sc. in aerospace engineering and is a research engineer at the Braunschweig IFF.

MARTIN BECKER holds a Dipl.-Ing. in aerospace engineering and is a research engineer at the Braunschweig IFF.

ULF BESTMANN received his Dr.-Ing. in mechanical engineering from TU Braunschweig. He is head of the navigation department of the IFF. He co-founded the company messWERK GmbH, a service provider in flight testing and certification.
