Creation of Bathymetric Surfaces
Using High-Density Point Surveys
Prepared by:
Mark Schmelter
CEE 6440
GIS in Water Resource Engineering
Fall 2007
David Maidment
Christopher Neale
List of Figures
Figure 1. Spatial extent of LiDAR data and study reach
Figure 2. Simulated residuals illustrating mean residual and RMSE
Figure 3. RMSE as a function of data subset
Figure 4. Time-to-compute as a function of data subset
Figure 5. Mean residuals of interpolation methods as a function of data subset
Figure 6. Three-dimensional view of LiDAR dataset
Figure 7. (a) Typical interpolated surface, (b) standard error map
Figure 8. (a) Interpolated surface, (b) standard error map, with aerial photography
Figure 9. Simulated survey configurations and associated standard error maps
List of Tables
Table 1. Data density of full LiDAR survey
Table 2. Subset feature classes
Table 3. Interpolation method and subset results
Table 4. Simulated survey results
The practice of water resource engineering is constantly evolving due to improved methods of data collection and analysis. The literature in this discipline cites novel techniques, sophisticated algorithms, and specialized equipment as the basis of analysis. For example, multi-dimensional hydraulic models are increasingly used to describe complex flow patterns in engineered and natural environments. Because flow patterns are the result of water-land interactions, the internal hydraulics of a hydrodynamic model are a function of bathymetry and planform forcing data. These models are, therefore, constrained by how accurately the real bathymetry and planform are represented in the model. Another technology being increasingly employed is airborne bathymetric LiDAR (Light Detection And Ranging). The past two decades have seen substantial improvements to airborne LiDAR bathymetry (ALB) technology (Hilldale and Raff 2007), making it an efficient method for collecting high-density surveys over large geographic extents. For many applications (e.g., tracking geomorphic change of river reaches over time, or obtaining surveys dense enough to capture small-scale bathymetric features for a hydrodynamic model), traditional river transect surveys are insufficient due to their intense labor and cost, not to mention the time and potential dangers associated with obtaining a full bathymetric survey of a river with field crews. The high-density survey is both a strength and a weakness, however: a strength because it better captures small-scale features, and a weakness because the high number of survey points quickly becomes difficult and time-consuming to analyze.
With this in mind, the present paper explores the relationship between the time requirements and accuracy of multi-resolution data collected with ALB on the Trinity River.
The process of transforming point survey data into a continuous surface involves spatial interpolation. For this paper, three different interpolation techniques are explored: inverse distance weighted (IDW), ordinary kriging (OK), and local polynomial interpolation (LPI). These methods were selected in part for their availability in the Geostatistical Analyst, an extension to the ESRI ArcMap environment, but mostly because they are commonly employed methods for spatial interpolation (especially IDW and OK). The Geostatistical Analyst provided a uniform environment in which the interpolation methods could be implemented and, furthermore, facilitated the evaluation of each method through its cross-validation tool; thus, it was possible to compare quantitative metrics of the resulting bathymetries across methods and point densities.
This paper addresses three main issues in the spatial interpolation of point data: (1) evaluate the differences between interpolation methods given identical input point feature classes; (2) evaluate each interpolation method using varying point-density forcing data with respect to time and accuracy, and determine whether there is a threshold of diminishing returns after which additional computation time yields only slight gains in accuracy; and (3) simulate different field survey patterns, create interpolated surfaces from each pattern, and determine whether an augmented survey methodology can improve interpolated surfaces in situations where high-density surveys (such as LiDAR or photogrammetry) may not be feasible.
The LiDAR data on the Trinity were collected in December 2005 with a helicopter-mounted SHOALS-1000T. This unit emits both green and infra-red radiation; the green wavelengths are able to penetrate the water up to 50 m under ideal conditions (at the time of the Trinity LiDAR survey, the deepest section of river did not exceed 6 m), and the infra-red wavelengths are used to detect the water surface (Hilldale and Raff 2007). With this equipment, both bathymetric and topographic readings can be taken (i.e., it will survey underwater surfaces as well as the banks) with horizontal and vertical accuracies of 2.5 m and 0.25 m, respectively. In contrast, a study on the
By their nature, the first objective (compare different interpolation methods using an identical input point file for each method) and the second objective (compare the resultant surfaces of differing input point densities) were implemented in parallel. Instead of using all 7.5 million points of the LiDAR dataset, a short reach of river below Lewiston Dam was selected as the study reach. Figure 1 shows the overall extent of the full survey data with the actual points omitted (the LiDAR data span the entire photographic background shown in Figure 1), along with the study reach and its survey points. The study site contained nearly 250,000 points over an area of 3,000,000 ft2. Table 1 summarizes the point densities for the study reach.
Table 1. Data density of full LiDAR survey

Full LiDAR Dataset
Area | 3,000,000 | ft2
Points | 242,752 | points
Density | 0.081 | points per ft2
Density | 12 | ft2 per point
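The densities reported in Table 1 follow directly from the point count and surveyed area; a quick arithmetic check:

```python
# Verify the Table 1 densities from the raw counts.
area_ft2 = 3_000_000   # study reach area
points = 242_752       # LiDAR returns in the reach

density = points / area_ft2   # points per ft2
spacing = area_ft2 / points   # ft2 per point
print(round(density, 3), round(spacing))  # 0.081 12
```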
Figure 1. Spatial extent of LiDAR data and study reach
(full LiDAR survey corresponds with the aerial photography)
Subsets of 85%, 70%, 55%, 40%, 25%, and 10% of the original point data in the study reach were created using the Create Subsets tool in the Geostatistical Analyst. This tool randomly selects the defined percentage of points from the original point survey feature class and creates a new feature class at the corresponding lower density. Table 2 summarizes the subset feature classes and their respective point densities.
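The subsetting step can be sketched in a few lines of Python; `create_subsets` is a hypothetical stand-in for the Geostatistical Analyst tool, applied here to toy points rather than the actual LiDAR feature class:

```python
import random

def create_subsets(points, fractions, seed=42):
    """Randomly subsample a point feature class at several densities,
    mimicking the Geostatistical Analyst "Create Subsets" tool."""
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    subsets = {}
    for frac in fractions:
        n = round(len(points) * frac)
        subsets[frac] = rng.sample(points, n)
    return subsets

# Each survey point is (x, y, z); a toy stand-in for the LiDAR reach.
survey = [(i % 100, i // 100, 100.0) for i in range(10000)]
subsets = create_subsets(survey, [0.85, 0.70, 0.55, 0.40, 0.25, 0.10])
for frac, pts in sorted(subsets.items(), reverse=True):
    print(f"{frac:>4.0%}: {len(pts)} points")
```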
Table 2. Subset feature classes

Subset, % | Number of Points | points/ft2 | ft2/point
100% | 241,657 | 0.081 | 12
85% | 205,412 | 0.069 | 15
70% | 169,165 | 0.057 | 18
55% | 132,916 | 0.045 | 22
40% | 96,669 | 0.032 | 31
25% | 60,419 | 0.020 | 49
10% | 24,168 | 0.008 | 124
Each subset feature class was interpolated to a surface using IDW, OK, and LPI with consistent parameterization between subsets (e.g., IDW used the same search radius and minimum number of points independent of subset feature class, etc.). Each interpolation run was timed to obtain the approximate time-to-calculate for each method with each subset feature class.
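The timing metric amounts to wall-clocking each run. A minimal sketch, assuming the interpolation is exposed as a callable (the averaging `dummy` below is a placeholder, not a real interpolator; the actual runs were timed within ArcMap):

```python
import time

def time_interpolation(interpolate, points):
    """Run one interpolation and report elapsed wall-clock time (mm:ss),
    matching the time-to-compute metric used in Table 3."""
    t0 = time.perf_counter()
    surface = interpolate(points)
    elapsed = time.perf_counter() - t0
    minutes, seconds = divmod(int(round(elapsed)), 60)
    return surface, f"{minutes:02d}:{seconds:02d}"

# Stand-in "interpolator": averages all z values (not a real method).
dummy = lambda pts: sum(z for _, _, z in pts) / len(pts)
surface, mmss = time_interpolation(dummy, [(0, 0, 1.0), (1, 1, 3.0)])
print(surface, mmss)
```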
All interpolation methods rely upon what has been widely labeled the first law of geography: "Everything is related to everything else, but near things are more related than distant things" (Tobler 1970). This commonality notwithstanding, not all methods are alike; some require substantial user interaction while others require very little user input. A brief introduction to each method evaluated in the study follows.
Inverse distance weighted is a deterministic technique that utilizes local measurements to make predictions. As the name suggests, the weight or influence given to a particular point is determined by the inverse of its distance: as distance increases, the weight decreases. The search radius (or ellipse in anisotropic scenarios) can be customized to accommodate various distributions of data. The output from IDW, however, includes only a prediction map.

Ordinary kriging is a robust statistical method that can provide not only a prediction map of the variable of interest (e.g., elevation), but also the standard deviation associated with each prediction. Thus, kriging makes it possible to create, for example, a confidence interval of the prediction map. This feature is particularly useful in risk analysis and other scenarios where informed decisions must be made. Kriging, however, requires a working knowledge of the statistical parameters that comprise the method, namely covariance structures, variography, and linear regression. A more complete description of both IDW and ordinary kriging can be found in Merwade et al. (2006).

Local polynomial interpolation is an inexact interpolator that, much like IDW, makes use of data immediately surrounding the location of interest (the local neighborhood) and fits a smoothed polynomial to the observed points. This method does not assess the associated prediction error and, therefore, only produces a prediction map.
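As a concrete illustration of the simplest of these methods, a bare-bones IDW predictor can be sketched as follows. This is a minimal sketch under simplifying assumptions (isotropic distances, no search ellipse); `idw_predict` and its parameters are illustrative names, not the Geostatistical Analyst implementation.

```python
import math

def idw_predict(points, x0, y0, n_neighbors=15, power=2):
    """Inverse distance weighted prediction at (x0, y0) from (x, y, z)
    survey points, using the nearest n_neighbors (cf. n=15 in Table 3)."""
    # Sort by distance and keep only the local neighborhood.
    nearest = sorted(points,
                     key=lambda p: math.hypot(p[0] - x0, p[1] - y0))[:n_neighbors]
    num = den = 0.0
    for x, y, z in nearest:
        d = math.hypot(x - x0, y - y0)
        if d == 0:
            return z              # exact interpolator: honor a coincident point
        w = 1.0 / d ** power      # weight decays with distance
        num += w * z
        den += w
    return num / den

pts = [(0, 0, 10.0), (1, 0, 12.0), (0, 1, 14.0), (1, 1, 16.0)]
# Four equidistant points, so the prediction is their mean.
print(round(idw_predict(pts, 0.5, 0.5), 6))  # 13.0
```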
One qualitative and three quantitative metrics were used to evaluate the interpolation methods and optimum survey point density. Quantitative measures included the root mean squared error (RMSE) and mean residual obtained from cross-validation, along with the time required to create the prediction surface, which is a measure of the computational requirements. Qualitatively, each method was compared on the perceived ease with which it was implemented, how intuitive it was, and how much parameterization was involved.

The values for RMSE and mean residual were obtained by way of cross-validation. This method removes an observed (i.e., surveyed) data point, creates an interpolated surface using the remaining surveyed data, calculates the predicted elevation at the exact location of the removed data point, and subtracts the observed value from the predicted value. This difference between predicted and observed values is called the residual. The cross-validation algorithm in the Geostatistical Analyst does this for every observed point in the data file, resulting in a distribution of residuals from which inferences about the predicted surface can be drawn. For example, Figure 2 shows a set of simulated residuals that help illustrate the interpretation of the RMSE and mean residual. The mean residual indicates whether the predicted values are generally under- or over-estimated. The red data, for example, have a mean residual of 0.1 while the black data have a mean residual of -4. Generally speaking, the method producing the black data under-predicts by 4 units while the method producing the red data over-predicts by 0.1. The variability, or spread, of the residuals indicates the consistency of the prediction. In Figure 2, the red data over- and under-predict by large amounts (beyond 5 units). The black data, on the other hand, have a small spread, indicating that although the predictions are, on average, 4 units below observed, the method returns fairly consistent results. Thus, the best models return residuals that have a mean of zero (i.e., they predict right on the observed values) and a low RMSE (i.e., predictions are consistent).
Figure 2. Simulated residuals (red: mean residual = 0.1, RMSE = 2; black: mean residual = -4, RMSE = 0.5)
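The cross-validation loop described above can be sketched directly; `nearest_z` below is a hypothetical stand-in predictor used only to exercise the loop, not one of the three evaluated methods.

```python
import math

def cross_validate(points, predict):
    """Leave-one-out cross-validation: predict each surveyed point from the
    others and summarize the residuals (predicted minus observed)."""
    residuals = []
    for i, (x, y, z) in enumerate(points):
        rest = points[:i] + points[i + 1:]   # withhold the i-th point
        residuals.append(predict(rest, x, y) - z)
    mean_residual = sum(residuals) / len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return mean_residual, rmse

def nearest_z(points, x0, y0):
    """Trivial stand-in predictor: elevation of the nearest remaining point."""
    return min(points, key=lambda p: (p[0] - x0) ** 2 + (p[1] - y0) ** 2)[2]

pts = [(0, 0, 10.0), (1, 0, 12.0), (2, 0, 14.0)]
mr, rmse = cross_validate(pts, nearest_z)
print(mr, rmse)
```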
The final objective of this study involves simulating different survey point patterns achievable by field crews. Instances may arise when high-density survey methods such as LiDAR or photogrammetry may not be feasible, thus necessitating the collection of bathymetric data through traditional survey techniques. Using the full resolution LiDAR survey, individual points were selected in different arrangements in an effort to find a survey pattern that decreases the RMSE and/or standard error of the interpolated surface without requiring extra survey points or knowledge of the bathymetry a priori. The three aforementioned interpolation methods were utilized and the RMSE and mean residuals used as metrics.
In brief, the purpose of this study is to evaluate and compare the results of three different interpolation techniques using variable-density point feature classes. Generally speaking, the more survey points available for spatial interpolation the better. In reality, however, computer resources are not infinite, and time spent on intense computations for large datasets may provide only marginal improvements in accuracy. While accuracy is paramount in the context of modeling, achieving the best possible resolution may not be a worthwhile or computationally feasible endeavor. The current study is readily applicable to the practice of water resource engineering as it provides a benchmark for those tasked with the creation of continuous surfaces from high-density point data. In this particular case, bathymetry data are the main focus, though the concepts presented in this paper extend beyond river channels. Many analyses in water resource engineering are heavily reliant upon interpolated surfaces, either topography or bathymetry, in the form of digital elevation models (DEMs) or vector surfaces. For example, watershed delineation and characterization, incipient channel models, and multi-dimensional hydraulic models all use continuous surfaces. Furthermore, numerous models are indirectly dependent upon accurate spatial interpolation of surfaces (e.g., habitat models, geomorphologic models, and ecological models at large incorporate hydraulic models as forcing data).
Table 3 presents the results of the three interpolation methods of interest using the variable-density point surveys described previously. The top-most section of Table 3 organizes the data by subset, and presents the corresponding point densities and total number of points for each subset.
Table 3. Interpolation method and subset results

Subset, % | 100% | 85% | 70% | 55% | 40% | 25% | 10%
points/ft2 | 0.081 | 0.069 | 0.057 | 0.045 | 0.032 | 0.020 | 0.008
ft2/point | 12 | 15 | 18 | 22 | 31 | 49 | 124
Number of Points | 241,657 | 205,412 | 169,165 | 132,916 | 96,669 | 60,419 | 24,168

Inverse Distance Weighted [n=15, min n=10]
RMSE, ft | 1.4 | 1.422 | 1.447 | 1.465 | 1.524 | 1.573 | 1.836
Mean Residual, ft | -0.006351 | -0.00892 | -0.00841 | -0.01266 | -0.01596 | -0.02827 | -0.03692
Time, mm:ss | 01:16 | 01:01 | 00:50 | 00:35 | 00:23 | 00:12 | 00:04

Ordinary Kriging [Exponential model; first-order trend, 100% local; lag=10, no. lags=20, nugget=1]
RMSE, ft | 1.357 | 1.365 | 1.385 | 1.401 | 1.443 | 1.483 | 1.673
Mean Residual, ft | 0.0005898 | 0.0014 | 0.00242 | 0.001919 | 0.002408 | 0.000391 | 0.00499
Time, mm:ss | 12:12 | 10:14 | 06:17 | 05:16 | 03:48 | 02:35 | 00:27

Local Polynomial [default parameters]
RMSE, ft | 1.354 | 1.369 | 1.392 | 1.409 | 1.461 | 1.514 | 1.672
Mean Residual, ft | 0.007514 | 0.00822 | 0.00877 | 0.01007 | 0.01221 | 0.01755 | 0.02328
Time, mm:ss | 00:46 | 00:51 | 00:45 | 00:31 | 00:24 | 00:12 | 00:10
Perhaps a more effective way of conveying the interpolation results of Table 3 is embodied in Figures 3 through 5. Figure 3 clearly shows that RMSE decreases as point density increases. This is a rather intuitive result: more points beget more accurate surfaces. Interpolation by ordinary kriging, parameterized as shown in Table 3, consistently returned more accurate surfaces than both IDW and LPI. Although it is indistinguishable in Figure 3, LPI actually produced a more accurate surface at full data-point density: the LPI RMSE was 1.354 compared to ordinary kriging's 1.357. One notable feature in Figure 3 is the exponential decay of RMSE with respect to point density. For ordinary kriging, the decrease in RMSE slows rapidly after 25% of full data density (this corresponds to approximately one data point for every 50 ft2).
Figure 3. RMSE as a function of data subset
Figure 4 shows that ordinary kriging experienced rapid increases in time-to-compute relative to IDW and LPI. LPI was actually the fastest interpolator, and it returned an RMSE similar to ordinary kriging at each point density. Figure 5 shows the mean residual of the methods at each point density. Ordinary kriging performed best in that it provided small mean residual values that were consistent across point densities. IDW and LPI were biased low and high, indicating that they generally under-predict and over-predict, respectively. The magnitude of the mean residual for IDW and LPI grew as point density decreased.
Figure 4. Time-to-compute as a function of data subset
Figure 5. Mean residuals of interpolation methods as a function of data subset
When utilizing high-density point surveys from LiDAR it is necessary to quality-control the data. For example, the present dataset was examined with the Geostatistical Analyst's Explore Data tool. Using the three-dimensional data viewer, points that likely should have been removed were identified. Figure 6 shows a screenshot of the three-dimensional viewer for the current dataset. The large spikes in the data (elevations) are unlikely to be topographic. It is possible that these values are actually tree tops or other physical features that were in the field of view of the LiDAR. If these data points are indeed tree tops or other structures, they should be removed prior to interpolation. The present study did not remove these data prior to analysis, and it is recognized that by not doing so, the interpolated surface may not be entirely accurate relative to the real surface. The analysis at hand, comparing different interpolation methods and point densities, is not biased, however, since all methods utilize the same dataset and are subject to the same cross-validation requirements.
Figure 6. Three-dimensional view of LiDAR dataset
Figures 7 and 8 show an interpolated surface of the study site with its associated standard error map. These surfaces were prepared using ordinary kriging, which is the only method evaluated here that allows for a map of the prediction standard errors. The prediction standard error map is a function of point location; in areas of dense data coverage the prediction standard errors are low, while they grow in sparsely sampled areas.
Figure 7. (a) Typical interpolated surface, (b) standard error map
Figure 8. (a) Interpolated surface, (b) standard error map, with aerial photography
Shown in Figure 9(a) are the three different survey configurations that were simulated from the full-density LiDAR dataset. The red circles represent the classic transect survey that is used extensively by researchers. The cyan squares represent a modified survey in a V pattern and, likewise, the magenta triangles represent the inverted V, or IV, survey. Figures 9(b) through (d) show the standard error prediction maps associated with each method. Table 4 shows the tabular results of the three interpolation methods with each survey method. These data indicate that the methods have similar results, with moderate differences in the evaluation metrics. The survey-interpolation combination that resulted in the lowest RMSE was ordinary kriging with the classic transect survey. Ordinary kriging is the best interpolator in this instance as it provides the best predictor and an estimate of the standard errors, and at this point density presents no significant time-to-calculate. While the lowest RMSE occurred with kriging on the classic transect configuration, the V and IV methods do provide a seemingly larger coverage of lower standard errors. By spreading out the survey points, the standard error cloud is stretched, which would result in a tighter confidence interval of the predicted surface. This phenomenon is particularly noticeable when comparing Figure 9(c) to (d).
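Candidate survey configurations of this kind can be generated programmatically. The sketch below is illustrative only: the exact geometry of the V offset (the `skew` term) is an assumption for demonstration, not the pattern used in the study.

```python
def transect_points(n_transects, n_per, length, width):
    """Classic straight transects: evenly spaced rows across the channel.
    The channel runs along x (0..length); each transect spans y (0..width)."""
    pts = []
    for i in range(n_transects):
        x = length * i / (n_transects - 1)
        for j in range(n_per):
            y = width * j / (n_per - 1)
            pts.append((x, y))
    return pts

def v_points(n_transects, n_per, length, width):
    """V-pattern survey: each transect is skewed downstream toward the
    center-line, spreading points out along the channel (assumed geometry)."""
    skew = 0.5 * length / (n_transects - 1)  # assumed downstream offset at center
    pts = []
    for i in range(n_transects):
        x = length * i / (n_transects - 1)
        for j in range(n_per):
            y = width * j / (n_per - 1)
            # Offset peaks at mid-channel and vanishes at the banks.
            dx = skew * (1 - abs(2 * y / width - 1))
            pts.append((x + dx, y))
    return pts

# Both patterns use the same number of survey points, so any difference in
# standard error coverage comes from geometry alone.
tp = transect_points(5, 7, 400, 60)
vp = v_points(5, 7, 400, 60)
print(len(tp), len(vp))
```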
Table 4. Simulated survey results

Method | Metric | Classical Transect | V-Survey | IV-Survey
IDW | RMSE, ft | 2.335 | 2.976 | 2.707
IDW | Mean Residual, ft | -0.2753 | 0.1436 | -0.07645
Kriging | RMSE, ft | 2.063 | 2.704 | 2.817
Kriging | Mean Residual, ft | -0.2323 | -0.0176 | -0.1142
Local Polynomial | RMSE, ft | 2.907 | 2.929 | 3.21
Local Polynomial | Mean Residual, ft | 0.08582 | -0.0834 | -0.12

Parameterization: IDW used a smooth search (200 - 100 ellipse, 12 degrees); kriging used a Gaussian model with a first-order trend, 100% local, lag = 20, no. lags = 12, and nugget = 0.5; local polynomial used a smooth search (200 - 100 ellipse, 12 degrees).
Because continuous surfaces are often used as forcing data in various types of water resource-related models, it is beneficial to evaluate the efficacy of various interpolation methods and point-elevation input data. Three interpolation methods were evaluated using the Geostatistical Analyst extension to ArcMap: inverse distance weighted, ordinary kriging, and local polynomial interpolation. Both IDW and LPI are extremely easy to implement in the Geostatistical Analyst environment, and they provide similar results. Ordinary kriging is the preferred interpolation method not only because it provides the best RMSE and mean residual results, but because it also provides a means of analyzing the prediction error of the interpolated surface. These strengths notwithstanding, kriging requires a strong background in spatial statistics for meaningful use and inference, and it is computationally expensive relative to IDW and LPI. Each interpolation method was applied to varying point-density input files to determine the effect of increasing or decreasing point density. It was found that the decreases in RMSE diminished non-linearly as a function of increasing point density. Using the results from Figure 3 and Table 3, it can be inferred that the optimum density for interpolation was approximately 40-55% of the full data density (corresponding to approximately one survey point for every 25 ft2).
Alternate methods for field surveys were explored and evaluated with the three interpolation techniques described in this paper. The methods included the traditional straight transect, a V pattern, and an inverted V, or IV, pattern. The best combination of survey method and interpolation technique was the classic transect with ordinary kriging. This combination resulted in the lowest RMSE and, due to the extremely low point density, did not pose any distinguishable increase in time-to-compute. Although the classic transect resulted in the lowest RMSE, the alternate methods, particularly the V method, effectively stretched the prediction error cloud, possibly making the confidence interval of the overall surface tighter.
A number of modifications to the current study would have been interesting or would have improved the results. For example, it would have been ideal to remove the spurious elevation data that were left in along the banks, as shown in Figure 6. The overall analysis of this paper was not affected by these values since each method worked from the same datasets. It would have also been interesting to evaluate the effect of removing, averaging, or otherwise mitigating overlapping point measurements. Furthermore, research has been done on interpolating surfaces in linearized systems (Merwade et al. 2005), which appears to have a significant positive impact on surface interpolation. Due to the sinuous nature of rivers, it is sometimes advantageous to remove the sinuosity, interpolate, and then return to the original coordinate system in lieu of interpolating in a Cartesian coordinate system.
The Geostatistical Analyst ArcMap extension provided a uniform environment in which the methods could be implemented and evaluated (particularly through the cross-validation tools), and it was simple to use for the IDW and LPI models that required very little parameterization. For ordinary kriging, however, the Geostatistical Analyst seemed to obfuscate some statistical principles and, despite efforts by the author to reconcile its statistical methodology with the computer package R, complicated the parameterization of the kriging model. As with any complex method, graphical user interfaces sometimes simplify too much, thereby limiting the flexibility of the technique.
Hilldale, R. C. "Using bathymetric LiDAR and a 2-D hydraulic model to identify aquatic river habitat." Proceedings of the 2007 World Environmental and Water Resources Congress, Tampa, Florida.
Hilldale, R. C., and Raff, D. (2007). "Assessing the ability of airborne LiDAR bathymetry to map shallow river channel geometry." In press, Earth Surface Processes and Landforms.
Merwade, V. M., Maidment, D. R., and Goff, J. A. (2006). "Anisotropic considerations while interpolating river channel bathymetry." Journal of Hydrology, 331(3-4), 731-741.
Merwade, V. M., Maidment, D. R., and Hodges, B. R. (2005). "Geospatial representation of river channels." Journal of Hydrologic Engineering, 10(3), 243-251.
Miles, S. B., and Ho, C. L. (1999). "Applications and issues of GIS as tool for civil engineering modeling." Journal of Computing in Civil Engineering, 13(3), 144-152.
Tobler, W. R. (1970). "A computer model simulation of urban growth in the Detroit region." Economic Geography, 46(2), 234-240.