## Algorithms and Experimental Analysis

In this section, we discuss several algorithms for user location and tracking, and present an analysis of how well these perform using our experimental data.

A premise of our work is that signal strength (SS) information provides a means of inferring user location. To demonstrate that this is a reasonable premise, we show in Figure 2 shows how the SS measured at each of the 3 base stations varies as the user walks along the outer hallway of the floor in a counter-clockwise direction (Figure 1). The walk begins and terminates at the north-west corner (close to BS_{1}). There is a definite trend in the SS recorded at the three base stations as the user walks around the loop. Not surprisingly, the signal received at a base station is the strongest when the user is close to it and weakest when the user is far away. This strong trend is an indication that using SS to infer location is a promising approach.

**Figure 2 Signal strength recorded at the three base stations as the user walks around the floor.**

Our basic approach is triangulation5. Given a set of signal strength measurements at each of the base stations, we determine the location that best matches the observed signal strength data. We then “guess” that to be the location of the user. There are multiple variations of this basic idea that arise because of several choices for each of the following:

- Ways of summarizing the signal strength samples measured at the base stations
- Basis for determining the best match.
- Metric for determining the best match.

We discuss each of these in turn.

It is just coincidental that we have three base station in our testbed. Triangulation can also be done with fewer or more base stations.

First, we summarize multiple signal strength samples from a base station using the sample mean. In the case of a static user whose location and orientation are fixed (the user location problem), it is clear which signal strength measurements should be included in the sample set. In the case of a mobile user (the user tracking problem), it is less clear what the sample set should be. In the latter case, we define the sample set to be all samples that lie within a sliding time window.

Second, to determine the location and orientation that best match a given (summarized) set of SS measurements, we first need to determine what the SS (at each base station) should be used for a particular combination of user location and orientation. We consider a couple of alternative approaches. The first is the empirical method where we use the location and SS data gathered during the off-line phase (Section 3.2). The second approach is signal propagation modeling. As discussed in Section 4.2, we have developed a model that accounts for both free-space loss and loss due to obstructions in computing the SS at each base station corresponding to given a user location.

Third, we need a metric and a search methodology to compare multiple locations and pick the one that best matches the observed signal strength. We term our general technique nearest neighbor(s) in signal space (NNSS). The idea is to compute the distance (in signal space) between the observed set of SS measurements, (ss_{1},ss_{2},ss_{3}), and the recorded SS, (ss’_{1},ss’_{2,}ss’_{3}), at a fixed set of locations, and then pick the location that minimizes the distance. In our analysis, we use the Euclidean distance measure, i.e., sqrt((ss1-ss’_{1})^{2} (ss_{2}-ss’_{2})^{2} (ss_{3}-ss’_{3})^{2}). It is possible to use other distance metrics, for example, the sum of the absolute differences for each base station (the “Manhattan” distance [Cor90]) or a metric weighted by the signal strength level at each base station. We experimented briefly with these alternatives, but do not present the results here due to space limitations.

In all of our analyses, we characterize the goodness of our estimate of the user’s location using the error distance, which is the Euclidean distance between the actual (physical) location of the user and the estimated location.

**4.1 Empirical Method**

In this case, we use the empirical data obtained in the off-line phase (Section 3.2) to construct the search space for the NNSS algorithm. We present results of the various analyses that we performed. Unless otherwise indicated, we assume the user to be stationary.

**4.1.1 Basic Analysis**

For the basic analysis, we use all of the (more than 20) signal strength samples collected for each of the 70*4 = 280 combinations of user location and orientation. In the analysis, we pick one of the locations and orientations at random, and then conduct an NNSS search for the corresponding signal strength tuple in the space defined by the remaining 69 points times 4 orientations. This emulates the process of locating a (stationary) user during the realtime phase. Note that the exclusion of one of the 70 physical points from the search space would result in a somewhat worse measured accuracy than would be obtained with the RADAR system in real use. In this sense, the accuracy results presented here are somewhat conservative.

We compare the empirical method with two other methods: random selection and strongest base station selection [Hod97]. With random selection, we estimate the user’s location by picking one of the 70 points at random, regardless of the SS information. With strongest base station selection, we guess the user’s location to be the same as the location of the base station that records the strongest signal. A comparison with these simple methods enables us to evaluate how worthwhile the increased sophistication of our techniques is.

Figure 3 shows the cumulative distribution function (CDF) of the error distance for the empirical, strongest base station, and random methods. The empirical method performs significantly better than both of the other methods. Table 1 summarizes the information in the figure in terms of the 25^{th}, 50^{th} (median), and 75^{th} percentile values of the error distance for each method.

**Figure 3 CDF of the error in location estimation.**

Considering the median (50^{th} percentile), for instance, the empirical method has a resolution of under 3 meters, which is about the size of an office room in our building. In terms of linear resolution, it is 2.8 times better than the strongest base station method and 5.5 times better than the random method. In terms of spatial resolution, the improvement is even greater: 7.7 and 30.6 times, respectively. We use the percentile values for the empirical method in Table 1 as a basis for comparison in the rest of the analysis.

**Table 1 The 25th, 50th, and 75th percentile values of the error distance. The numbers in parentheses indicate the degradation of the strongest BS and random methods compared to the empirical method. **

In summary, the empirical method performs quite well. We now discuss ways of making it perform even better.

**4.1.2 Multiple Nearest Neighbors**

Unlike the basic analysis where we only considered the single nearest neighbor in signal space, we now consider k nearest neighbors, for various values of k. The intuition is that often there are multiple neighbors that are at roughly the same distance from the point of interest (in signal space). Given the inherent variability in the measured signal strength at a point, there is fundamentally no reason to pick only the closest neighbor (in signal space) and reject others that are almost as close.

A second, and equally important, reason for considering additional neighbors is that it is likely that the error vector (in physical space) corresponding to each neighbor is oriented in a different direction. So averaging the coordinates of the neighbors may yield an estimate that is closer to the user’s true location than any individual neighbor is. Figure 4 illustrates this for k=3 nearest neighbors.

Our experimental analysis of averaging over k nearest neighbors shows that for small k, averaging has some benefit though not very significant. For instance, for k=5, the 25^{th} percentile of error distance is 1.5 m (22% better than the 1.92 m in Table 1) and the 50th percentile is 2.75 m (9% better). For large k, accuracy degrades rapidly because points far removed from the true location also are included in the averaging procedure, thereby corrupting the estimate.

**Figure 4 An illustration of how averaging multiple nearest neighbors (N**_{1}, N_{2}, N_{3}) can lead to a guess (G) that is closer to the user’s true location (T) than any of the neighbors is individually.

The reason why the benefits of averaging are not very significant even for small k is that often the k nearest neighbors in signal space are not k physically distinct points. In many instances, multiple nearest neighbors in signal space correspond to different orientations at the same point in physical space. So averaging in physical space does not improve the location estimate by very much.

**4.1.3 Max Signal Strength Across Orientations**

Since the dependence of signal strength on orientation creates a challenge for location estimation, we analyze how well the empirical method would perform if orientationwere not an issue. For each user location in the off-line data set, we compute the maximum SS at each base station across the four possible orientations at that location6. Note that the maximum for each base station may correspond to a different orientation of the user. The goal is to emulate the case where for each base station and user location, we first compute the mean SS for each of the four orientations at that location, and then pick the maximum among the four means. the signal generated by the mobile host is not obstructed by the user’s body. While this may not be realistic given the antenna design and positioning for existing wireless LANs, it may be possible to approximate this “ideal case” with new antenna designs (e.g., omnidirectional wearable antenna) We repeat the analysis of the previous sections with the smaller “maximum signal strength” data set of 70 data points (instead of 70*4=280 data points in the original data set). In Figure 5, we plot the 25th and the 50th percentile values of the error distance with averaging over neighbor sets of various sizes. the signal generated by the mobile host is not obstructed by the user’s body. While this may not be realistic given the antenna design and positioning for existing wireless LANs, it may be possible to approximate this “ideal case” with new antenna designs (e.g., omnidirectional wearable antenna)

We repeat the analysis of the previous sections with the smaller “maximum signal strength” data set of 70 data points (instead of 70*4=280 data points in the original data set). In Figure 5, we plot the 25^{th} and the 50^{th} percentile values of the error distance with averaging over neighbor sets of various sizes.

**Figure 5 The error distance for the empirical method with averaging on the data set containing the max signal strength measurement for each location.**

We make a couple of observations. First, just as expected, the use of the maximum SS data set improves the accuracy of location estimation slightly even in the absence of averaging (k=1). The 25^{th} percentile value of the error distance is 1.8 m and the 50^{th} percentile 2.67 m, 6% and 9% better, respectively, compared to Table 1. Second, averaging over 2-4 nearest neighbors improves accuracy significantly; the 25^{th} percentile is about 1 m (48% better) and the 50th percentile is 2.13 m (28% better). Averaging is more effective here than in Section 4.1.2 because the set of k nearest neighbors in signal space necessarily correspond to k physically distinct locations

**4.1.4 Impact of the Number of Data Points**

We now investigate how the accuracy of location estimation would be impacted if we had data from fewer than the 70 distinct physical locations considered thus far.

For each value of n, the number of physical locations (ranging between 2 and 70), we conducted 20 runs of our analysis program. In each run, we picked n points at random from the entire data set collected during the off-line phase and used this subset to construct the search space for the NNSS algorithm. We collated the error distance data from all the runs corresponding to the same value of n (Figure 6).

For small n (5 or less), the error distance is a factor of 2 to 4 worse than when the entire empirical set containing 70 physical points is used. But the error distance diminishes rapidly as n increases. For n=20, the median error distance is less than 33% worse and for n=40, it is less than 10% worse. The diminishing returns as n becomes large is due to the inherent variability in the measured SS. This translates into inaccuracy in the estimation of physical location. So there is little benefit in obtaining empirical data at physical points spaced closer than a threshold.

**Figure 6 The error distance versus the size of the empirical data set (on a log scale).**

In summary, for our floor, the empirical method would perform almost as well with a data set of 40 physical points as with a set of 70 points. In practice, we could make do with even fewer points by picking physical locations that are distributed uniformly over the area of the floor rather than at random.

**4.1.5 Impact of the Number of Samples**

In the analysis presented so far, we have worked with the mean of all of the samples recorded during the off-line phase for each combination of location and orientation. While it may be reasonable to construct the empirical data set with a large number of samples (since it is a one-time task), there may be constraints on the number of samples that can be obtained in real-time to determine a user’s location. So we investigate the impact of a limited number of realtime samples (while retaining the entire off-line data set for the NNSS search space) on the accuracy of location estimation. Our analysis shows that only a small number of real-time samples are needed to approach the accuracy obtained using all of the samples (Table 1). With just 1 realtime sample, the median error distance is about 30% worse than when all samples were considered. With 2 samples, it is about 11% worse and with 3 samples it is under 4% worse.

**4.1.6 Impact of User Orientation**

As we have already discussed, the user’s orientation has a significant impact on the SS measured at the base stations. In Section 4.1.3, we did a best-case analysis using the maximum SS across all four orientations. We now consider, in some sense, the worst case where the off-line data set only has points corresponding to a particular orientation (say north) while the real-time samples correspond to the opposite orientation (i.e., south). We compute the error distance for all four combinations of opposing directions: north-south, south-north, east-west, and west-east.

We observe a fairly significant degradation in the accuracy of location estimation. For instance, for the northsouth case, the 25^{th} percentile of the error distance is 2.95 meters (54% worse than in Table 1) while the 50^{th} percentile (median) is 4.90 meters (67% worse). This degradation underscores the importance of obtaining empirical data for multiple orientations to construct the NNSS search space. However, even in this worst case, the empirical method outperforms the strongest base station and random methods by a factor of 2 to 4

**4.1.7 Tracking a Mobile User**

In this sub-section, we analyze the problem of tracking a mobile user rather than locating a stationary user, as we have done so far. For this analysis, we collected a new SS data set corresponding to random walks by the user along the hallways of our floor. We collected 4 signal strength samples per second at each of the base stations. Assuming that the user walked at a uniform pace, we are able to determine the true location of the user at each time instant.

We reduce the problem of tracking the mobile user to a sequence of location determination problems for a (nearly stationary) user. We use a sliding window of 10 samples to compute the mean signal strength on a continuous basis. This information is then used with the basic method (Section 4.1.1) to estimate the user’s location on a continuous basis.

The error distance for tracking the mobile user is only slightly worse than that for locating a stationary user. The median error distance is 3.5 meters, about 19% worse than that for a stationary user.

**4.1.8 Summary of Empirical Method**

The empirical method is able to estimate user location with a high degree of accuracy. The median error distance is 2 to 3 meters, about the size of a typical office room. For our experimental environment, much of the accuracy can be achieved with an empirical data set of about 40 physical points and about 3 real-time signal strength samples (at each base station). It is important, however, that the empirical data set contain data corresponding to multiple user orientations at each location.

The main limitation of the empirical method is that significant effort is needed to construct the SS data set for each physical environment of interest (each floor, each building, etc.). Furthermore, the data collection process may need to be repeated in certain circumstances, e.g., when a base station is relocated.

We now discuss a method based on signal propagation modeling, which avoids these limitations.

**4.2 Radio Propagation Model**

Radio propagation modeling provides an alternative to the empirical method for constructing the search space for the NNSS algorithm.

**4.2.1 Motivation **

The primary motivation for the radio propagation model is to reduce RADAR’s dependence on empirical data. Using a mathematical model of indoor signal propagation, we generate a set of theoretically-computed signal strength data akin to the empirical data set discussed in Section 3.2. The data points correspond to locations spaced uniformly on the floor. The NNSS algorithm can then estimate the location of the mobile user by matching the signal strength measured in real-time to the theoretically-computed signal strengths at these locations. It is clear that the performance of this approach is directly impacted by the "goodness” of the propagation model. In the following subsections, we develop the model and discuss the performance of location determination based on the model.

**4.2.2 Determination of the model**

For a radio channel, signal propagation in an indoor environment is dominated by reflections, diffraction, and scattering of radio waves caused by structures within the building. The transmitted signal generally reaches the receiver via multiple paths (termed the multipath phenomenon). Multipath causes fluctuations in the received signal envelope and phase, and the signal components arriving from indirect and direct paths combine to produce a distorted version of the transmitted signal. Since multipath within buildings is strongly influenced by the layout of the building, the construction material used, and the number and type of objects in the building, characterizing the radio channel in such an environment is challenging.

We considered three different models before settling on one. The first model was the well-accepted Rayleigh fading model [Has93], which describes small-scale rapid amplitude fluctuation in the absence of a strong received component. The Rayleigh distribution is widely used to describe multipath fading because of its elegant theoretical explanation and the occasional empirical justification. However, in deriving this distribution, a critical assumption made is that all signals reaching the receivers have equal strength. In general, this is unrealistic. Our empirical data shows that for a number of sample points (along the hallways), there exists a dominant line-of-sight (LoS) component that is not accounted for by this distribution. Hence, we did not use this distribution.

The second model we considered was the Rician distribution model [Ric44]. The Rician distribution occurs when a strong path exists in addition to the low level scattered path. This strong component may be the LoS path or a path that encounters much less attenuation than others. The Rayleigh distribution is a special case of the Rician distribution; when the strong path is eliminated, the amplitude distribution becomes Rayleigh. While the model is intuitively appealing, it is very difficult to determine the model parameters (i.e., the local mean of the scattered power and the power of the dominant component) precisely as this requires physically isolating the direct wave from the scattered components. To keep the system simple and easy to deploy, we decided against using this distribution to model the radio channel.

We found a good compromise between simplicity and accuracy in the Floor Attenuation Factor propagation model (FAF) suggested by [Sei92]. We like this model because it provides flexibility in accommodating different building layouts while taking into account large-scale path loss. We adapted the original model proposed by Seidel and Rappaport, which included an attenuation factor for building floors, to disregard the effects of the floors and instead consider the effects of obstacles (walls) between the transmitter and the receiver. The Wall Attenuation Factor (WAF) model is described by

where n indicates the rate at which the path loss increases with distance, P(_{do}) is the signal power at some reference distance do and d is the transmitter-receiver (T-R) separation distance. C is the maximum number of obstructions (walls) up to which the attenuation factor makes a difference, nW is the number of obstructions (walls) between the transmitter and the receiver, and WAF is the wall attenuation factor. In general the values of n and WAF depend on the building layout and construction material, and are derived empirically. The value of P(_{do}) can either be derived empirically or obtained from the wireless network hardware specifications.

**Figure 7 SS as a function of T-R separation derived from the empirical data collected in Section 3.2**

Figure 7 illustrates how the signal strength varies with distance between the transmitter and the receiver. The wide difference in signal strengths between points at similar distances is explained as follows: the layout of the rooms in the building, the placement of base stations, and the location of the mobile user all have an effect on the received signal. Signals transmitted from two locations at the same distance from the receiver (base station) may be attenuated by different amounts due to the differences in the number and types of obstructions they encounter. For instance, in Figure 7, we observe that the strength of the signal from two locations approximately 36 meters from the receiver (base station) were approximately 10 dBm apart. This is because there were several walls between one of the locations and the base station, while the other location had line-of-sight to the base station.

Previous work in indoor radio propagation modeling has included extensive characterization of signal loss for various materials and at different frequencies [Rap96]. However, using this information in a practical setting is difficult because the obstructing materials vary considerably in their physical and electrical characteristics. For example, water causes signal attenuation and the human body is made up of water, so the size of a human body and its orientation can result in different amounts of signal loss. It is virtually impossible to characterize such loss precisely since the number and sizes of humans in the building at any time is generally a finite but random number. Thus, despite their complexity, these extensive models fall short in terms of accuracy. So we decided to go with the much simpler WAF propagation model.

We determined the attenuation caused by a wall empirically (described next) and used this in conjunction with the number of intervening walls (determined using the Cohen-Sutherland algorithm mentioned in Section 3.3.2) to compensate for the attenuation caused by the walls.

We conducted the following experiment to determine the Wall Attenuation Factor (WAF): we measured the signal strength at the receiver when the receiver and the transmitter had line-of-sight. We then measured the signal strength with varying but known number of walls between the receiver and the transmitter. We computed the average of the difference between these signal strength values to determine the WAF. We observed that the amount of additional attenuation dropped off as the number of walls separating the transmitter and the receiver increased. This observation is consistent with [Sei92] where the attenuation between different floors was considered and shown to flatten out as the number of floors between the transmitter and the receiver increased. In general, with a large T-R separation and a large number of intervening walls, free-space path loss dominates the loss due to obstructions. Based on our measurements, we chose WAF to be 3.1 dBm and C to be 4 (where C represents number of walls that are factored into the model). Figure 8 shows the result after the measured signal strength has been compensated for signal loss due to the intervening walls between the transmitter and the receiver. We observe that the resulting plot shows a trend similar to the free-space loss trend. This demonstrates that the WAF propagation model compensates effectively for attenuation due to obstructions.

**Figure 8 Effect of applying correction for intervening walls between the base station and the mobile user.**

After compensating for the effect of walls and creating the “corrected” data for all three base stations, we proceeded to determine the two other parameters, n and Pdo, of our model. We reduced the propagation model to a form where it exhibits a linear relationship between the theoretically predicted signal strength and logarithm of the distance between the transmitter and the receiver, and then applied simple linear regression to determine the parameters of the model [Jai91].

Table 2 contains the numerical values of the model parameters for the three base stations considered separately and when taken together. We note that the values for the path loss exponent (n) and the reference signal strength (P_{do}) for all three base stations are similar despite their different physical locations and surroundings. This result is encouraging since it indicates that the parameter values are not tied to the specific location of the base stations. The values of P_{do} are higher than those published by the manufacturer [Roa96] (for d_{0} = 1 meter) because our WAF model does not account for multipath propagation. The values of the path loss exponent are smaller than those reported in previous work on indoor radio propagation modeling [Rap96]. However, they are consistent with our expectations since we compensate the measured signal strength for attenuation due to obstructions, and since we do not consider multipath (which can boost the signal strength at a given location). R^{2} represents the coefficient of determination, which is a useful measure for indicating the goodness of regression [Jai91]. The high values of R^{2} (on a scale of 0 to 1) suggest that there is a good match between the estimated and the measured values of the signal strength. Another value of interest shown in the table is the mean squared error (MSE). These numbers reinforce the observation that the WAF propagation model fits the measured data well.

**Table 2 Parameter estimates using linear regression**

The final column in Table 2 shows the values for P_{do} and n when the data from all the transmitter-receiver pairs (i.e., all three base stations) was combined. The motivation for this was to determine a value of P_{do} and n that could be used for all base stations without overly affecting the result. The advantage of using a common value is that it avoids the need for individual measurements of each base station as they are installed in the network, thus greatly reducing the cost of system setup. We can then use these values to estimate the signal strength at various points within the building.

Figure 9 illustrates how the predicted values of the signal strength generated with the propagation model (after compensating for wall attenuation) compares with the actual measurements. We observe a good match between the two. While this plot is for one of the three base stations, plots for the other two base stations exhibit a similar match.

**4.2.3 Results using the Propagation Model**

To determine the performance of location estimation with the signal propagation modeling method, we used the model to compute the signal strength at a grid of locations on the floor. We then used this data set as the search space for the NNSS algorithm.

Considering the median (50^{th} percentile), the propagation method provides a resolution of about 4.3 m, compared to a resolution of 2.94 m for the empirical method and 8.16 m for the strongest base station method (Table 1). For the 25^{th} percentile the propagation method provides a resolution of 1.86 m compared to 1.92 m for the empirical method and 4.94 m for the strongest base station method.

**Figure 9 Predicted versus measured signal strength.**

While the propagation method is not as accurate as the empirical method, it is significantly better than the strongest BS and random methods. Thus, even without extensive empirical measurements, RADAR based on the propagation model alone would significantly outperform the strongest base station method proposed in [Hod97].

**4.2.4 Summary of Radio Propagation Method**

The WAF propagation model provides a cost effective means for user location and tracking in an indoor RF wireless network. The model is cost effective in the sense that it does not require detailed empirical measurements to generate a signal strength map and consequently has a low set up cost. A significant result from Section 4.2.2 is that the parameters for the wall attenuation propagation model are similar across base stations despite the latter being in different locations. This suggests that the entire system can be relocated to a different part of the building, but the same parameter values can be used to model propagation and thereby determine a user’s location.