## Smart Phone Computing

Relevant Course: Smart Phone Computing

Relevant Department: Computer Science and Engineering

Relevant Semester: Nil

Pre- requisite : Nil

Course Description and Outline

• Basics of Android programming
• Location Tracking with GPS
• Location Tracking without GPS
• Cell Tower Localization
• Project Idea Discussion

Schedule for Lecture Delivery

Session 1 : 8-Oct-2015 (10-12 pm)

Session 2 : 12-Oct-2015 (10-12 pm)

Session 3 : 13-Oct-2015 (10-12 pm)

## Basics of the GPS Technique - Observation Equations

1. INTRODUCTION

The purpose of this paper is to introduce the principles of GPS theory, and to provide a background for more advanced material. With that in mind, some of the theoretical treatment has been simplified to provide a starting point for a mathematically literate user of GPS who wishes to understand how GPS works, and to get a basic grasp of GPS theory and terminology. It is therefore not intended to serve as a reference for experienced researchers; however, my hope is that it might also prove interesting to the more advanced reader, who might appreciate some “easy reading” of a familiar story in a relatively short text (and no doubt, from a slightly different angle).

## GPS DESCRIPTION

In this section we introduce the basic idea behind GPS, and provide some facts and statistics to describe various aspects of the Global Positionining System.

2.1 THE BASIC IDEA

GPS positioning is based on trilateration, which is the method of determining position by measuring distances to points at known coordinates. At a minimum, trilateration requires 3 ranges to 3 known points. GPS point positioning, on the other hand, requires 4 “pseudoranges” to 4 satellites.

This raises two questions: (a) “What are pseudoranges?”, and (b) “How do we know the position of the satellites?” Without getting into too much detail at this point, we address the second question first.

2.1.1 How do we know position of satellites?

A signal is transmitted from each satellite in the direction of the Earth. This signal is encoded with the “Navigation Message,” which can be read by the user’s GPS receivers. The Navigation Message includes orbit parameters (often called the “broadcast ephemeris”), from which the receiver can compute satellite coordinates (X,Y,Z). These are Cartesian coordinates in a geocentric system, known as WGS-84, which has its origin at the Earth centre of mass, Z axis pointing towards the North Pole, X pointing towards the Prime Meridian (which crosses Greenwich), and Y at right angles to X and Z to form a right-handed orthogonal coordinate system. The algorithm which transforms the orbit parameters into WGS-84 satellite coordinates at any specified time is called the “Ephemeris Algorithm,” which is defined in GPS textbooks [e.g., Leick, 1991]. We discuss the Navigation Message in more detail later on. For now, we move on to “pseudoranges.”

2.1.2 What are pseudoranges?

Time that the signal is transmitted from the satellite is encoded on the signal, using the time according to an atomic clock onboard the satellite. Time of signal reception is recorded by receiver using an atomic clock. A receiver measures difference in these times:

pseudorange = (time difference) ´ (speed of light)

Note that pseudorange is almost like range, except that it includes clock errors because the receiver clocks are far from perfect. How do we correct for clock errors?

2.1.3 How do we correct for clock errors?

Satellite clock error is given in Navigation Message, in the form of a polynomial. The unknown receiver clock error can be estimated by the user along with unknown station coordinates. There are 4 unknowns; hence we need a minimum of 4 pseudorange measurements.

2.2 THE GPS SEGMENTS

There are four GPS segments:

• the Space Segment, which includes the constellation of GPS satellites, which transmit the signals to the user;·
• the Control Segment, which is responsible for the monitoring and operation of the Space Segment,
• the User Segment, which includes user hardware and processing software for positioning, navigation, and timing applications;
• the Ground Segment, which includes civilian tracking networks that provide the User Segment with reference control, precise ephemerides, and real time services (DGPS) which mitigate the effects of “selective availability” (a topic to be discussed later).

Before getting into the details of the GPS signal, observation models, and position computations, we first provide more information on the Space Segment and the Control Segment.

2.2.1 Orbit Design

The satellite constellation is designed to have at least 4 satellites in view anywhere, anytime,to a user on the ground. For this purpose, there are nominally 24 GPS satellites distributed in 6 orbital planes. So that we may discuss the orbit design and the implications of that design, we must digress for a short while to explain the geometry of the GPS constellation.

According to Kepler’s laws of orbital motion, each orbit takes the approximate shape of an ellipse, with the Earth’s centre of mass at the focus of the ellipse. For a GPS orbit, the eccentricity of the ellipse is so small (0.02) that it is almost circular. The semi-major axis (largest radius) of the ellipse is approximately 26,600 km, or approximately 4 Earth radii.

The 6 orbital planes rise over the equator at an inclination angle of 55o to the equator. Thepoint at which they rise from the Southern to Northern Hemisphere across the equator is called the “Right Ascension of the ascending node”. Since the orbital planes are evenly distributed, the angle between the six ascending nodes is 60o

Each orbital plane nominally contains 4 satellites, which are generally not spaced evenly around the ellipse. Therefore, the angle of the satellite within its own orbital plane, the “true anomaly”, is only approximately spaced by 90o. The true anomaly is measured from the point of closest approach to the Earth (the perigee). (We note here that there are other types of “anomaly” in GPS terminology, which are angles that are useful for calculating the satellite coordinates within its orbital plane). Note that instead of specifying the satellite’s anomaly at every relevant time, we could equivalently specify the time that the satellite had passed perigee, and then compute the satellites future position based on the known laws of motion of the satellite around an ellipse.

Finally, the argument of perigee is the angle between the equator and perigee. Since the orbit is nearly circular, this orbital parameter is not well defined, and alternative parameterisation schemes are often used.

Taken together (the eccentricity, semi-major axis, inclination, Right Ascension of the ascending node, the time of perigee passing, and the argument of perigee), these six parameters define the satellite orbit. These parameters are known as Keplerian elements. Given the Keplerian elements and the current time, it is possible to calculate the coordinates of the satellite.

GPS satellites do not move in perfect ellipses, so additional parameters are necessary. Nevertheless, GPS does use Kepler’s laws to its advantage, and the orbits are described in parameters which are Keplerian in appearance. Additional parameters must be added to account for non-Keplerian behaviour. Even this set of parameters has to be updated by the Control Segment every hour for them to remain sufficiently valid.

2.2.2 Orbit design consequences

Several consequences of the orbit design can be deduced from the above orbital parameters, and Kepler’s laws of motion. First of all, the satellite speed can be easily calculated to be approximately 4 km/s relative to Earth’s centre. All the GPS satellites orbits are prograde, which means the satellites move in the direction of Earth’s rotation. Therefore, the relative motion between the satellite and a user on the ground must be less than 4 km/s. Typical values around 1 km/s can be expected for the relative speed along the line of sight (range rate).

The second consequence is the phenomena of “repeating ground tracks” every day. It is straightforward to calculate the time it takes for the satellite to complete one orbital revolution. The orbital period is approximately T = 11 hr 58 min. Therefore a GPS satellite completes 2 revolutions in 23 hr 56 min. This is intentional, since it equals the sidereal day, which is the time it takes for the Earth to rotate 360o. (Note that the solar day of 24 hr is not 360o, because during the day, the position of the Sun in the sky has changed by 1/365.25 of a day, or 4 min, due to the Earth’s orbit around the Sun).

Therefore, every day (minus 4 minutes), the satellite appears over the same geographical location on the Earth’s surface. The “ground track” is the locus of points on the Earth’s surface that is traced out by a line connecting the satellite to the centre of the Earth. The  ground track is said to repeat. From the user’s point of view, the same satellite appears in the same direction in the sky every day minus 4 minutes. Likewise, the “sky tracks” repeat. In general, we can say that the entire satellite geometry repeats every sidereal day (from the point of view of a ground user).

As a corollary, any errors correlated with satellite geometry will repeat from one day to the next. An example of an error tied to satellite geometry is “multipath,” which is due to the antenna also sensing signals from the satellite which reflect and refract from nearby objects. In fact, it can be verified that, because of multipath, observation residuals do have a pattern that repeats every sidereal day. As a consequence, such errors will not significantly affect the precision, or repeatability, of coordinates estimated each day. However, the accuracy can be significantly worse than the apparent precision for this reason.

Another consequence of this is that the same subset of the 24 satellites will be observed every day by someone at a fixed geographical location. Generally, not all 24 satellites will be seen by a user at a fixed location. This is one reason why there needs to be a global distribution of receivers around the globe to be sure that every satellite is tracked sufficiently well.

We now turn our attention to the consequences of the inclination angle of 55o. Note that a satellite with an inclination angle of 90o would orbit directly over the poles. Any other inclination angle would result in the satellite never passing over the poles. From the user’s point of view, the satellite’s sky track would never cross over the position of the celestial pole in the sky. In fact, there would be a “hole” in the sky around the celestial pole where the satellite could never pass. For a satellite constellation with an inclination angle of 55o, there would therefore be a circle of radius at least 35o around the celestial pole, through which the sky tracks would never cross. Another way of looking at this, is that a satellite can never rise more than 55o elevation above the celestial equator.

This has a big effect on the satellite geometry as viewed from different latitudes. An observer at the pole would never see a GPS satellite rise above 55o elevation. Most of the satellites would hover close to the horizon. Therefore vertical positioning is slightly degraded near the poles. An observer at the equator would see some of the satellites passing overhead, but would tend to deviate from away from points on the horizon directly to the north and south. Due to a combination of Earth rotation, and the fact that the GPS satellites are moving faster than the Earth rotates, the satellites actually appear to move approximately north-south or south-north to an oberver at the equator, with very little east-west motion. The north component of relative positions are therefore better determined than the east component the closer the observer is to the equator. An observer at mid-latitudes in the Northern Hemisphere  would see satellites anywhere in the sky to the south, but there would be a large void towards the north. This has consequences for site selection, where a good view is desirable to the south, and the view to the north is less critical. For example, one might want to select a site in the Northern Hemisphere which is on a south-facing slope (and visa versa for an observer in the Southern Hemisphere).

2.2.3 Satellite Hardware

There are nominally 24 GPS satellites, but this number can vary within a few satellites at any given time, due to old satellites being decommissioned, and new satellites being launched to replace them. All the prototype satellites, known as Block I, have been decommissioned. Between 1989 and 1994, 24 Block II (1989-1994) were placed in orbit. From 1995 onwards, these have started to be replaced by a new design known as Block IIR. The nominal specifications of the GPS satellites are as follows:

•  Life goal: 7.5 years
•  Mass: ~1 tonne (Block IIR: ~2 tonnes)
•  Size: 5 metres
•  Power: solar panels 7.5 m2 Ni-Cd batteries
•  Atomic clocks: 2 rubidium and 2 cesium

The orientation of the satellites is always changing, such that the solar panels face the sun, and the antennas face the centre of the Earth. Signals are transmitted and received by the satellite using microwaves. Signals are transmitted to the User Segment at frequencies L1 = 1575.42 MHz, and L2 = 1227.60 MHz. We discuss the signals in further detail later on. Signals are received from the Control Segment at frequency 1783.74 Mhz. The flow of information is a follows: the satellites transmit L1 and L2 signals to the user, which are encoded with information on their clock times and their positions. The Control Segment then tracks these signals using receivers at special monitoring stations. This information is used to improve the satellite positions and predict where the satellites will be in the near future. This orbit information is then uplinked at 1783.74 Mhz to the GPS satellites, which in turn transmit this new information down to the users, and so on. The orbit information on board the satellite is updated every hour.

2.2.4 The Control Segment

The Control Segment, run by the US Air Force, is responsible for operating GPS. The main Control Centre is at Falcon Air Force Base, Colorado Springs, USA. Several ground stations monitor the satellites L1 and L2 signals, and assess the “health” of the satellites. As outlined previously, the Control Segment then uses these signals to estimate and predict the satellite orbits and clock errors, and this information is uploaded to the satellites. In addition, the Control Segment can control the satellites; for example, the satellites can be maneuvered into a different orbit when necessary. This might be done to optimise satellite geometry when a new satellite is launched, or when an old satellite fails. It is also done to keep the satellites to within a certain tolerance of their nominal orbital parameters (e.g., the semi-major axis may need adjustment from time to time). As another example, the Control Segment might switch between the several on-board clocks available, should the current clock appear to be malfunctioning.

2.3 THE GPS SIGNALS

We now briefly summarise the characteristics of the GPS signals, the types of information that is digitally encoded on the signals, and how the U.S. Department of Defense implements denial of accuracy to civilian users. Further details on how the codes are constructed will be presented in Section 3.

2.3.1 Signal Description

The signals from a GPS satellite are fundamentally driven by an atomic clocks (usually cesium, which has the best long-term stability). The fundamental frequency is 10.23 Mhz. Two carrier signals, which can be thought of as sine waves, are created from this signal by multiplying the frequency by 154 for the L1 channel (frequency = 1575.42 Mhz; wavelength = 19.0 cm), and 120 for the L2 channel (frequency = 1227.60 Mhz; wavelength = 24.4 cm). The reason for the second signal is for self-calibration of the delay of the signal in the Earth’s ionosphere.

Information is encoded in the form of binary bits on the carrier signals by a process known as phase modulation. (This is to be compared with signals from radio stations, which are typically encoded using either frequency modulation, FM, or amplitude modulation, AM). The binary digits 0 and 1 are actually represented by multiplying the electrical signals by either 1 or -1, which is equivalent to leaving the signal unchanged, or flipping the phase of the signal by 180 o. We come back later to the meaning of phase and the generation of the binary code.
There are three types of code on the carrier signals:

•  The C/A code
•  The P code
•  The Navigation Message

The C/A (“course acquisition”) code can be found on the L1 channel. As will be described later, this is a code sequence which repeats every 1 ms. It is a pseudo-random code, which appears to be random, but is in fact generated by a known algorithm. The carrier can transmit the C/A code at 1.023 Mbps (million bits per second). The “chip length”, or physical distance between binary transitions (between digits 1 and -1), is 293 metres. The basic information that the C/A code contains is the time according to the satellite clock when the signal was transmitted (with an ambiguity of 1 ms, which is easily resolved, since this corresponds to 293 km). Each satellite has a different C/A code, so that they can be uniquely identified.

The P (“precise”) code is identical on both the L1 and L2 channel. Whereas C/A is a courser code appropriate for initially locking onto the signal, the P code is better for more precise positioning. The P code repeats every 267 days. In practice, this code is divided into 7 day segments; each weekly segment is designated a “PRN” number, and is designated to one of the GPS satellites. The carrier can transmit the P code at 10.23 Mbps, with a chip length of 29.3 metres. Again, the basic information is the satellite clock time or transmission, which is identical to the C/A information, except that it has ten times the resolution. Unlike the C/A code, the P code can be encrypted by a process known as “anti-spoofing” , or “A/S” (see below).

The Navigation Message can be found on the L1 channel, being transmitted at a very slow rate of 50 bps. It is a 1500 bit sequence, and therefore takes 30 seconds to transmit. The Navigation Message includes information on the Broadcast Ephemeris (satellite orbital parameters), satellite clock corrections, almanac data (a crude ephemeris for all satellites), ionosphere information, and satellite health status.

2.3.2 Denial of Accuracy

The U.S. Department of Defense implements two types of denial of accuracy to civilian users: Selective Availability (S/A), and Anti-Spoofing (A/S). S/A can be thought of as intentional errors imposed on the GPS signal. A/S can be thought of as encryption of the P code.

There are two types of S/A: epsilon, and dither. Under conditions of S/A, the user should be able to count on the position error not being any worse than 100 metres. Most of the time, the induced position errors do not exceed 50 metres.

Epsilon is implemented by including errors in the satellite orbit encoded in the Navigation Message. Apparently, this is an option not used, according to daily comparisons made between the real-time broadcast orbits, and precise orbits generated after the fact, by the International GPS Service for Geodynamics (IGS). For precise geodetic work, precise orbits are recommended in any case, and therefore epsilon would have minimal impact on precise users. It would, however, directly impact single receiver, low-precision users. Even then, the effects can be mitigated to some extent by using technology known as “differential GPS”, where errors in the GPS signal are computed at a reference station at known coordinates, and are transmitted to the user who has appropriate radio receiving equipment.

Dither is intentional rapid variation in the satellite clock frequency (10.23 MHz). Dither, therefore, looks exactly like a satellite clock error, and therefore maps directly into pseudorange errors. Dither is switched on at the moment (1997), but recent U.S. policy statements indicate that it may be phased out within the next decade. As is the case for epsilon, dither can be mitigated using differential GPS. The high precision user is minimally effected by S/A, since relative positioning techniques effectively eliminate satellite clock error (as we shall see later).

Anti-Spoofing (A/S) is encryption of the P-code. The main purpose of A/S is prevent “the enemy” from imitating a GPS signal, and therefore it is unlikely to be switched off in the foreseeable future. A/S does not pose a signficant problem to the precise user,since precise GPS techniques rely on measuring the phase of the carrier signal itself, rather than the pseudoranges derived from the P code. However, the pseudoranges are very useful for various algorithms, particularly in the rapid position fixes required by moving vehicles and kinematic surveys. Modern geodetic receivers can, in any case, form 2 precise pseudorange observables on the L1 and L2 channels, even if A/S is switched on. (We briefly touch on how this is done in the next section). As a consequence of not having full access to the P code, the phase noise on measuring the L2 carrier phase can be increased from the level of 1 mm to the level of 1 cm for some types of receivers. This has negligible impact on long sessions for static positioning, but can have noticeable effect on short sessions, or on kinematic positioning. Larger degradation in the signal can be expected at low elevations (up to 2 cm) where signal strength is at a minimum

## THE PSEUDORANGE OBSERVABLE

In this section, we go deeper into the description of the pseudorange observable, an give some details on how the codes are generated. We develop a model of the pseudorange observation, and then use this model to derive a least-squares estimator for positioning. We discuss formal errors in position, and the notion of “Dilution of Precision”, which can be used to assess the effect of satellite geometry on positioning precision.

3.1 CODE GENERATION

It helps to understand the pseudorange measurement if we first take a look at the actual generation of the codes. The carrier signal is multiplied by a series of either 1 or -1, which are seperated by the chip length (293 m for C/A code, and 29.3 m for P code). This series of  1 and -1 multipliers can be interpreted as a stream of binary digits (0 and 1).

How is this stream of binary digits decided? They are determined by an algorithm, known as a linear feedback register. To understand a linear feedback register, we must first introduce the XOR binary function.

3.1.1 XOR: The “Exclusive OR” Binary Function

A binary function takes two input binary digits, and outputs one binary digit (0 or 1). More familiar binary functions might be the “AND” and “OR” functions. For example, the AND function gives a value of 1 if the two input digits are identical, that is (0,0), or (1,1). If the input digits are different, the AND function gives a value of 0. The OR function gives a value of 1 if either of the two input digits equals 1, that is (0,1), (1,0), or (1,1).

The XOR function gives a value of 1 if the two inputs are different, that is (1,0) or (0,1). If the two inputs are the same, (0,0) or (0,1), then the value is 0.

3.1.2 Linear Feedback Registers

Linear feedback registers are used to generate a pseudorandom number sequence. The sequence is pseudorandom, since the sequence repeats after a certain number of digits (which, as we shall see, depends on the size of the register). However, the statistical properties of the sequence are very good, in that the sequence appears to be white noise. We return to these properties later, since they are important for understanding the measurement process. For now, we look at how the register works

Table 2 illustrates a simple example: the “3 stage linear feedback register.” The “state” of the register is defined by three binary numbers (A, B, C). The state changes after a specific time interval. To start the whole process, the intial state of a feedback register is always filled with 1; that is, for the 3 stage register, the initial state is (1, 1, 1). The digits in this state are now shifted to the right, giving (blank, 1, 1). The digit (1) that is pushed off the right side is the output from the register. The blank is replaced by taking the XOR of the other two digits (1,1). The value, in this case, equals 0. The new state is therefore (0, 1, 1). This process is then repeated, so that the new output is (1), and the next state is (1, 0, 1). The next output is (1) and the next state is (1, 1, 0). The next output is (0), and the next state is (0, 1, 1), and so  on.

In the above example, the outputs can be written (1, 1, 1, 0, ....). This stream of digits is known as the “linear feedback register sequence.” This sequence will start to repeat after a while. It turns out that during a complete cycle, the feedback register will produce every possible combination of binary numbers, except for (0, 0, 0). We can therefore easily calculate the length of the sequence before it starts to repeat again. For a 3 stage register, there are 8 possible combinations of binary digits. This means that the sequence will repeat after 7 cycles. The sequence length is therefore 7 bits. More generally, the sequence length is:

where N is the size of the register (number of digits in the state). For example, a 4 state linear feedback register will have a sequence length of 15 bits.

3.1.3 C/A Code

The C/A code is based on the 10 stage linear feedback register sequence, for which the sequence length is L(10) = 210-1 = 1023 bits. The C/A code really has a repeating sequence of 1023 bits; however the design is slightly more complicated than presented above. The C/A code is actually a “Gold code”, which is derived by taking the XOR of the output from 2 linear feedback registers. Unique C/A codes can be generated for each satellite by selecting different pairs of cells from each register to define the output.

In summary, the C/A code is a unique Gold code on each satellite, which is a pseudorandom sequence of bits with a repeating sequence length of 1023. C/A bit transitions occur at 1.023 Mhz. Note that the fundamental frequency in the satellite is 10.23 Mhz, so this represents one transition every 10 cycles. At this rate of bit transitions, the full sequence of 1023 bits is transmitted in 1 ms. Therefore, the sequence repeats 1000 times per second. The chip length (distance between bit transitions) is 293 m. Therefore, the sequence repeats every 300 km.

3.1.4 P Code

The P code is also generated from a combination of two different registers, in such a way that it repeats every 266.4 days. Each 7 day section is assigned a “PRN code.” Satellites are often identified by their PRN number; however, the user should beware that any given satellite can have its PRN code changed. Therefore, PRN codes should not be used in place of Satellite Vehicle Numbers (SVN) when talking about particular satellites. (For example, if someone writes software which identifies satellites using PRN numbers, there might be a problem in orbit modelling, for example, PRN 2 is assigned to a Block II satellite now, but to a Block IIR satellite next year). There are 38 possible PRN codes; given that there are 24 nominal satellites, some PRN codes are left unused. The PRN sequence is reset at Saturday midnight, defining the start of “GPS week.”

3.1.5 GPS Signal transmission and reception

Let us now summarise how the GPS signal is transmitted from space, and then received on the ground. The GPS signal starts in the satellite as a voltage which oscillates at the fundamental clock frequency of 10.23 Mhz. (If selective availability is on, this signal is then “dithered” so that the frequency varies unpredictably). This signal is then separately multiplied in frequency by the integers 154 and 120, to create the L1 and L2 carrier signals. The signals are then multiplied by 1 and -1 according the algorithms described above to generate the C/A code (on L1) and the P code (on both L1 and L2). These codes are unique to each satellite. Finally, the Navigation Message is encoded onto the signal. The signals are boosted by an amplifier, and then sent to transmitting antennas, which point towards the Earth. These antennas are little more than exposed electrical conductors which radiate the signal into space in the form of electromagnetic waves.

These electromagnetic waves pass through space and the Earth’s atmosphere, at very close to the speed of light in a vacuum, until they reach the receiver’s antenna. The waves create a minute signal in the antenna, in the form of an oscillating voltage. The signal is now preamplified at the antenna, to boost the signal strength, so that it is not overcome by noise by the time it gets to the other end of the antenna cable. The signal then enters the receiver, which  then measures it using a process known as “autocorrelation.” It is beyond the scope of this paper to go into the details of receiver design, so our description will be kept at the level required to understand how the observable model can be developed.

## AUTOCORRELATION TECHNIQUE

The time difference is computed by autocorrelation. The first bit from signal one is multiplied by the first bit of signal two. For example, if the first bits from the two signals both have values -1, then the result is (-1) ´ (-1) = 1. Similarly, if both bits have values  1, then the result is 1. On the other hand, if the two bits disagree, the result is ( 1) ´ (-1) = -1. This process is repeated for the second pair of bits, and so on. The result can be written as a sequence of 1 (where the bits agree) and -1 (where the bits disagree). This sequence is then summed, and divided by the total number of bits in each signal. For example, if signal A can be written ( 1, -1, -1, 1, -1), and signal B can be written ( 1, 1, -1, -1, 1), then multiplication gives ( 1, -1, 1, -1, -1); the sum of which gives -1; then dividing by the number of bits (5) gives -0.2. Note that if the two signals matched perfectly, the result would be 1. If the two signals were completely random, we should expect a result close to zero.

This is why the GPS signals are designed to look random. When the two signals are not properly matched in time, the result of auto correlation gives an answer close to zero; if the signals are matched in time, the result is close to 1 (but not exactly, since a real signal also has noise, so some bits are incorrect). One can see that the larger the number of bits that are compared, the better the resolution. This is because the random bits will average to zero better, the more bits we compare.

The Gold codes have the property that the autocorrelation is constant until we get to within one chip of the correct answer. Within that window of ±1 chip, the autocorrelation function looks like an equilateral triangle, with a value of 1 at its peak (assuming no noise). We can therefore use the known triangular shape as a model to help us find the time displacement that maximises the autocorrelation. (More sophisticated receivers account for the fact that multipath distorts the shape of this triangle, and can thus reduce the effect of multipath)

Now that we have found the peak autocorrelation, the inferred time displacement between the two signals is multiplied by the speed of light. This observation is called the pseudorange. The pseudorange measurement is described schematically in Figure 1.

3.3.1 Simplified Pseudorange Model

Receivers record data at regular, specified intervals (say, every 30 seconds, as instructed by the receiver user). It is the reading of the receiver clock time T, which is used to say exactly when the measurement is sampled. Therefore, the value of T at a measurement epoch is known exactly, and is written to the data file along with the observation. (What is not known, is the true time of measurement). The actual observation to satellite s can be writted:

The modelled observation can be developed by setting the clock time T equal to the true receive time t plus a clock bias t, for both the receiver and satellite clocks:

Substitution gives the pseudorange as a function of the true time the signal was received:

Note that the above algorithm starts with the true receive time, which requires the receiver clock bias. We usually don’t know in advance what the bias is, but for most receivers it never gets larger than a few milliseconds (beyond which, the receiver will reset its clock). If we assume it is zero in the above computation, the error produced is a few metres, which is much smaller than typical point positioning precision of approximately 50 metres with S/A switched on. We can therefore safely ignore this effect for now, and return to it later when we discuss the more precise carrier phase observable.

We now look at our system of simplified observation equations from 4 satellites in view of the receiver. Using the above notation, we can write the pseudoranges to each satellite as:

## POINT POSITIONING USING PSEUDORANGE

4.1 LEAST SQUARES ESTIMATION

4.1.1 Linearised Model

We solve the point positioning problem by first linearising the pseudorange observation equations, and then using the familiar methods of least-squares analysis. For completeness, we summarise the linearisation procedure and the development of the least squares method specifically for the GPS point positioning problem. First, we assume we can write the actual observation to be the sum of a modelled observation, plus an error term:

which expresses a linear relationship between the residual observations b (i.e., observed minus computed observations) and the unknown correction to the parameters x. The column matrix v contains all the noise terms, which are also unknown at this point. We call the above matrix equation the “linearised observation equations”.

4.1.2 The Design Matrix

The linear coefficients, contained in the “design matrix” A, are actually the partial derivatives of each observation with respect to each parameter, computed using the provisional parameter values. Note that A has the same number of columns as there are parameters, n = 4 , and has the same number of rows as there are data, m > 4 . We can derive the coefficients of A by partial differentiation of the observation equations, producing the following expression:

Note that A is shown to be purely a function of the direction to each of the satellites as observed from the receiver.

4.1.3 The Least Squares Solution

Let us consider a solution for the linearised observation equations, denoted . The “estimated residuals” are defined as the difference between the actual observations and the new, estimated model for the observations. Using the linearised form of the observation equations, we can write the estimated residuals as:

Under these assumptions, the expected covariance in the parameters for the least squares solution takes on a simple form:

Note that the “cofactor matrix” (ATA)-1 also appears in the formula for the least squares estimate, x . The “cofactor matrix” is also sometimes called the “covariance matrix,” where it is implicitly understood that it should be scaled by the variance of the input observation errors. Since GPS observation errors are a strong function of the particular situation (e.g., due to environmental factors), it is common to focus on the cofactor matrix, which, like A, is purely a function of the satellite-receiver geometry at the times of the observations. The cofactor matrix can therefore be used to assess the relative strength of the observing geometry, and to quantify how the level of errors in the measurements can be related to the expected level of errors in the position estimates.

It should therefore be clear why A is called the “design matrix”; we can in fact compute the cofactor matrix in advance of a surveying session if we know where the satellites will be (which we do, from the almanac in the Navigation Message). We can therefore “design” our survey (in this specific case, select the time of day) to ensure that the position precision will not be limited by poor satellite geometry.

4.2.2 Interpreting the Covariance Matrix

The covariance matrix for the estimated parameters can be written in terms of its components:

4.2.3 Local Coordinate Errors

For future reference, the general form of the resulting equation  is applicable to any problem involving an affine transformation (i.e., multiplication of a column vector by any  rectangular matrix, G). Note that for this particular problem, CX is really the 3x3 submatrix taken from the original 4x4 matrix (which also included coefficients for the clock parameter  The covariance matrix in the local system CL can be written in terms of its components:

We could then use this covariance, for example, to plot error ellipses in the horizontal plane.

4.2.4 Dilution of Precision

We can now define the various types of “dilution of precision” (DOP) as a function of diagonal elements of the covariance matrix in the local system:

where, for example, VDOP stands for “vertical dilution of precision,” H stands for horizontal, P for position, T for time, and G for geometric. As an example of how to interpret DOP, a standard deviation of 1 metre in observations would give a standard deviation in horizontal position of HDOP metres, and a standard deviation in the receiver clock bias of TDOP seconds. If VDOP had a value of 5, we could expect pseudorange errors of 1 metre to map into vertical position errors of 5 metres, and so on. As we have seen, the cofactor matrix and therefore the DOP values are purely a function of satellite geometry as observed by the receiver. A “good geometry” therefore gives low DOP values. A “bad geometry” can give very high DOP values. As a general rule, PDOP values larger than 5 are considered poor. If there are fewer than a sufficient number of satellites to produce a solution, or if 2 out of 4 satellites lie in approximately the same direction in the sky, then the cofactor matrix becomes singular, and the DOP values go to infinity. The above formulas assume that all 4 parameters (x, y, z, t) are being estimated. Of course, if fewer than these are estimated, for example if height is not estimated, then the modified DOP values would get smaller, and they would no longer be generally infinity for only 3 satellites in view.

4.2.5 Mission Planning

Mission planning is the term used to describe the pre-analysis of the satellite geometry in advance of a survey. Essentially, it typically involves using commercial software to plot the DOP values as a function of time at a given geographical location. Since most applications involve local to regional distances, it is not too important which station’s location is used for this analysis, since the satellites will appear approximately in the same position in the sky for all stations. One thing that can vary a lot from station to station is the “elevation mask”. Most software allow the user to specify which parts of the sky obstruct the view of the satellites (e.g., due to trees, buildings, or mountains). The elevation mask can substantially change the DOP values, so careful attention should be paid to this. Even if the elevation mask went down to the horizon, the user may wish to set it to 15 degrees all around, as research shows that data below 15 degrees is usually plagued by multipathing errors and other problems, such as cycle slips, and a low signal to noise ratio. As mentioned previously, the user might only be interested in horizontal position, where the height is known adequately in advance (e.g., for a boat at sea). Most software allow for DOP values to be computed under the assumption that height is fixed.

## THE CARRIER PHASE OBSERVABLE

5.1 CONCEPTS

We now introduce the carrier phase observable, which is used for high precision applications. We start with the basic concepts, starting with the meaning of “phase”, the principles of interferometry, and the Doppler effect. We then go on to describe the process of observing the carrier phase, and develop an observation model. Fortunately, most of the model can be reduced to what we have learned so far for the pseudorange. Unlike most textbooks, we take the approach of presenting the model in the “range formulism”, where the carrier phase is expressed in units of metres, rather than cycles. However, there are some fundamental differences between the carrier phase and the pseudorange observables, as we shall see when we discuss “phase ambiguity” and the infamous problem of “cycle slips”.

5.1.1 The Meaning of “Phase,” “Frequency” and “Clock Time”

“Phase” is simply “angle of rotation,” which is conventionally in units of “cycles” for GPS
analysis. Consider a point moving anti-clockwise around the edge of a circle, and draw a line from the centre of the circle to the point. As illustrated in Figure 2, the “phase”   at any given time t can be defined as the angle through which this line has rotated.

Phase is intimately connected with our concept of time, which is always based on some form of periodic motion, such as the rotation of the Earth, the orbit of the Earth around the Sun (“dynamic time”), or the oscillation of a quartz crystal in a wristwatch (“atomic time”). Even our reprentation of time is often based on rotation, such as the angle of the hands on the face of a clock. Angles of rotation give us our measure of “time.” In this way, phase can be thought of as a measure of time (after conversion into appropriate units). We can write this formally as:

of clock time. Whether of not this clock time is useful depends on the constancy of rate of change of phase. This brings us to the concept of frequency.

The “frequency,” expressed in units of “cycles per second,” is the number of times the line completes a full 360o rotation in one second (which of course, is generally a fractional number). This definition is somewhat lacking, since it seems to assume that the rotation is steady over the course of one second. One can better define frequency instantaneously as the first derivative of phase with respect to time; that is, the angular speed.

We chose to treat phase as a fundamental quantity, and frequency as a derived quantity. For example, we can say that frequency is a constant, if we observe the phase as changing linearly in time. Constant frequency is the basis of an ideal clock. If the frequency can be written as a constant, f0, then we can write the phase of an ideal clock as:

5.1.2 How phase is related to a periodic signal

At time t, the height of point A(t) above the centre of the circle in figure 2 is given by

We note that the nominal GPS signal takes on the above form, except that the signal is modulated by “chips”, formed by multiplying the amplitudes A0 S (for C/A code) and A0 C (for P code) by a pseudorandom sequence of 1 or -1. The underlying sinusoidal signal is called the “carrier signal.” It is the phase of the carrier signal that gives us precise access to the satellite clock time; therefore we can use this phase for precise positioning.

5.1.3 Carrier Beat Signal

The GPS carrier signal G(t) from the satellite is “mixed” (multiplied) with the receiver’s own replica carrier signal R(t). The result of this mixing is shown in Figure 3.

Mathematically, one can show that one would expect the result to be the difference between a low frequency signal and a high frequency signal:

We note that the above formulas apply even when the carrier phase is modulated with codes, provided the replica signal is also modulated (because the values of -1 will cancel when multiplying the two signals). If the codes are not known, it is possible to square both the incoming signal and the replica signal prior to mixing. The problem with this is that squaring amplifies the noise, thus introducing larger random measurement errors.

5.1.4 Origin of the Phase Ambiguity
Our model of carrier beat phase not a complete picture, since we can arbitrarily add an integer number of cycles to the carrier beat phase, and produce exactly the same observed beat signal. Suppose we only record the fractional phase of the first measurement. We would have no way of telling which integer N to add to this recorded phase so that it really did equal the difference in phase between the replica signal and the GPS signal. This is fundamentally because we have no direct measure of the total phase of the incoming GPS signal. We can express this as follows:

where we use a capital Greek symbol  to emphasise that it represents the phase value actually recorded by the receiver. Provided the receiver does keep track of how many complete signal oscillations there have been since the first measurement, it can attach this number of cycles to the integer portion of the recorded beat phase. However, there will still be an overall ambiguity N that applies to all measurements. That is, we can model N as being the same (unknown) constant for all measurements. If the receiver looses count of the oscillations (e.g., because the signal is obstructed, or because of excessive noise), then a new integer parameter must be introduced to the model, starting at that time. This integer discontinuity in phase data is called a “cycle slip.”

5.1.5 Interpretation of the Phase Ambiguity

The reader might also be wondering if there is some kind of geometrical interpretation for N. It turns out that there is, but it does require some oversimplified assumptions. By definition, the unknown value of N can be written as:

The second term is completely arbitrary, and depends on the receiver firmware. For example, some receivers set this value to zero for the first measurement. Let us assume this is true, and drop this term. For the sake of interpretation, let us now assume that the receiver and satellite clocks keep perfect time. Under these circumstances, the first term would equal the integer portion of the number of signal oscillations that occur in the receiver from the time the signal was transmitted to the time the signal was received. We can therefore interpret N as equal to the number of carrier wavelengths between the receiver (at the time it makes the first observation), and the satellite (at the time it transmitted the signal). Of course, we made assumptions about perfect clocks and the particular nature of the firmware; so we must be ware not to take this interpretation too literally.

5.1.6 Intuitive Model: The Doppler Effect

How can phase be used to measure distance? One way hinted at above is that the phase essentially tells you the clock time. As we shall see in the next section, we can develop phase in almost the same way as the pseudorange model. Another intuitive way of looking at it is to consider the Doppler effect. We are all familiar with the acoustic version of the Doppler effect, as we hear a vehicle’s at a higher pitch when it is approaching, and a lower pitch when receding. Can we use the Doppler effect to design a distance measuring device?

Imagine two perfect clocks; one is at a fixed point, the other is approaching in a vehicle. Let both clocks be generating a sinusoidal signal. The frequency difference between the reference signal, and the approaching signal, increases with the vehicle’s speed of approach. Let us build a receiver to mix the two signals and measure the beat signal. The beat frequency would be a measure of the speed.

Let us count the cycles of the beat signal; or better yet, let us measure the phase (cycles plus fractional cycles) of the beat signal. Clearly, the beat phase would be measures the change in distance to vehicle. We can therefore (after appropriate unit conversion) write the intuitive equation:

Beat phase = distance to vehicle constant

This demonstrates that, although beat phase can be used to precisely measure change in distance from one time to another, there is an unknown constant which prevents us from knowing the full distance. This can be seen by considering moving the reference observer 10 metres away from the original position, and then repeating the experiment. The Doppler effect is clearly exactly the same, and the number of cycles passing by would not change. The very first value of the measured beat phase will indeed be different, but this single measurement cannot be used to infer distance. For example, we have already discussed that don’t know what integer number of cycles to attribute to the first beat phase measurement.

5.2 CARRIER PHASE OBSERVATIONMODEL

5.2.1 Carrier Beat Phase Model

We now move towards a more rigorous treatment of the carrier beat phase observable, building on our concepts of phase and signal mixing. Our notation will change slightly in preparation for further development.

To summarise what we know already, the satellite carrier signal (from antenna) is mixed with reference signal generated by receiver’s clock. The result, after high pass filtering, is a “beating” signal. The phase of this beating signal equals the reference phase minus the incoming GPS carrier phase from a satellite; however, it is ambiguous by an integer number of cycles. From this point on, “carrier beat phase” will be simply called “carrier phase” (but it should not be confused with the phase of the incoming signal!).

Of course, if we adopt this point of view, then we shall eventually have to consider the model of how long it takes a wave front of constant phase to propagate from the satellite to the receiver, so that we may model the appropriate satellite clock time at the time of signal transmission, TS. We return to that later.

As discussed previously, we can write clock time as a function of phase and nominal
frequency:

where we implicitly understand that the clock times refer to different events (reception and transmission, respectively).

We note that any term containing the superscript S are different for each satellites, but all other terms are identical. Receivers are designed and calibrated so that the phase constant j0 is identical for all satellites; that is, there should be no interchannel biases. Receivers should also sample the carrier phase measurements from all satellites at exactly the same time. (If the receivers have multiplexing electronics to save on cost, then the output should have been interpolated to the same epoch for all satellites). The time TS will vary slightly from satellite to satellite, since the satellite transmission time must have been different for all signals to arrive at the same time. We also note that the last three terms are constant, and cannot be separated from each other. We can collectively call these terms the “carrier phase bias,” which is clearly not an integer.

In preparation for multi-receiver and multi-satellite analysis, we now introduce the subscripts A, B, C, etc. to indicate quantities specific to receivers,, and we introduce superscripts j, k, l, etc. to identify satellite-specific quantities. We write the carrier phase observed by receiver A from satellite j:

Note that data should be sampled at exactly the same values of clock time (called “epochs”) for all receivers, so all values of TA are identical at a given epoch. However receivers clocks do not all run at exactly the same rate, therefore the true time of measurement will differ slightly from receiver to receiver. Also, note that each receiver-satellite pair has a different carrier phase ambiguity.

5.2.2 Range Formulation

It is convenient to convert the carrier phase model into units of range. This simplifies concepts, models, and software. In the range formulation, we multiply the carrier phase equation by the nominal wavelength.

Note that the carrier phase bias for (undifferenced) data is not an integer number of wavelengths, but also includes unknown instrumental phase offsets in the satellite and receiver.

We have not mentioned yet about any differences between carrier phase on the L1 and L2 channel. Although they have different frequencies, in units of range the above equations take on the same form. Actually, the clock bias parameters would be identical for both L1 and L2 phases, but the carrier phase bias would be different. The main difference comes when we develop the model in terms of the propagation delay, which is a function of frequency in the Earth’s ionosphere.

5.2.3 Observation Model

We note that the first term in the carrier phase model is simply the pseudorange, and the second term is a constant. We have already developed a simplified model for pseudorange, so we can therefore write a model for carrier phase as follows:

This is because, from physics theory, any information, such as the 1 and -1 “chips” which are modulated onto the carrier wave, must travel with the “group velocity” rather than “phase velocity”. According to the theory of relativity, information can not be transmitted faster than c. From the physics of wave propagation in the ionosphere, it can be shown that the group delay is (to a very good first order approximation) precisely the same magnitude, but opposite sign of the phase delay (which is really a phase “advance”).

5.2.4 Accounting for Time-Tag Bias

Before proceeding, we return to the problem posed in our discussion of the pseudorange model, that we typically do not know the true time of signal reception tA which we need to calculate the satellite-receiver range term   precisely. From section 3.3.1, the true time of reception can be written:

There are various approaches to dealing with this in GPS geodetic software, which typically use some combination of the following methods:

where the satellite clock bias  is obtained from the Navigation Message. One can then directly compute the range term and true receive time with sufficient precision, provided the approximate station coordinates are known to within 300 m (corresponding to the  timing requirement). Interestingly, this is the basis for “time transfer,” since it allows one to compute the receiver clock bias using pseudorange data from only one GPS satellite. (For precise time transfer, two GPS satellites are always in operation with no S/A switched
on.) As a method for computing range for precise positioning, this is not often used, perhaps for the reason that it is not a pure model, as it depends on pseudorange data and approximate positions. ·

• one can take a modelling “short cut” to avoid iteration by expanding the range model as a first order Taylor series. Since this method often appears in the development of the observation equation in textbooks, we discuss it in more detail here.

5.2.5 A Note on the Range-Rate Term

The observation equation can be approximated as follows:

where we see that the effect can be accounted for by introducing the modelled range rate (i.e., the relative speed of the satellite in the direction of view). The “prime” for the satellite transmit time t ' j (which is used to compute the satellite coordinates) is to indicate that it is not the true transmit time, but the time computed using the nominal receive time TA. A first order Taylor expansion has been used. The higher order terms will only become significant error sources if the receiver clock bias is greater than about 10 ms, which does not usually happen with modern receivers. In any case, clock biases greater than this amount would result
in a worse error in relative position due to the effect of S/A (see section 5.3.1).

Textbooks sometimes include a “range rate” term in the development of the phase observation model, even though, strictly speaking, it is unnecessary. After all, the first line line of the above equation is correct, and the lack of a priori knowledge of the receiver clock bias can easily be dealt with by least-squares iteration, or prior point positioning using the pseudorange. On the other hand, it is nevertheless instructional to show the above set of equations, since it does illustrate that it is more correct to use   as the partial derivatives with respect to the receiver clock in the design matrix, rather than simply using c (section 4.1.2). This is  rucial if one is not initialising clocks using point position solutions or iteration (as is typical, for example, with the GIPSY OASIS II software). It is not important if initialisation of   is achieved with accuracy.

In the expressions to follow, we shall not explicitly include the range rate term on the assumption that time-tag bias has been handled one way or another.

5.3 DIFFERENCING TECHNIQUES

5.3.1 Single Differencing

where we use the double-subscript to denote quantities identified with two receivers, and the triangular symbol as a mnemonic device, to emphasise that the difference is made between two points on the ground. The geometry of single differencing is illustrated in Figure 4.

The atmospheric delay terms are now considerably reduced, and vanish in the limit that the receivers are standing side by side. The differential troposphere can usually be ignored for horizontal separations less than approximately 30 km, however differences in height should be modelled. The differential ionosphere can usually be ignored for separations of 1 to 30 km, depending on ionospheric conditions. Due to ionospheric uncertainty, it is wise to calibrate for the ionosphere using dual-frequency receivers for distances greater than a few km.

Although the single difference has the advantage that many error sources are eliminated or reduced, the disadvantage is that only relative position can be estimated (unless the network is global-scale). Moreover, the receiver clock bias is still unknown, and very unpredictable. This takes us to “double differencing”.

5.3.2 Double Differencing

where we use the double-superscript to denote quantities identified with two satellites, and the upside-down triangular symbol as a mnemonic device, to emphasise that the difference is made between two points in the sky. Figure 5 illustrates the geometry of double differencing.

A point worth mentioning, is that although the receiver clock error has been eliminated to first order, the residual effect due “time tag bias” on the computation of the range term (section 5.2.4) does not completely cancel, and still needs to be dealt with if the receiver separation is large.

Any systematic effects due to unmodelled atmospheric errors are generally increased slightly by approximately 40% by double differencing as compared to single differencing. Similarly, random errors due to measurement noise and multipath are increased. Overall, random errors are effectively doubled as compared with the undifferenced observation equation. On the other hand, the motivation for double differencing is to remove clock bias, which would create much larger errors.

One could process undifferenced or single differenced data, and estimate clock biases. In the  limit that clock biases are estimated at every epoch (the “white noise clock model”), these methods become almost identical, provided a proper treatment is made of the data covariance (to be described later). It is almost, but not quite identical, because differencing schemes almost always involve pre-selection of baselines in a network to form single differences, and data can be lost by lack of complete overlap of the observations to each satellite. (This problem can be minimised by selecting the shortest baselines in the network to process, and by assuring that no more than one baseline be drawn to a receiver with a significant loss of data).

5.3.3 Double Differenced Ambiguity

The double difference combination has an additional advantage, in that the ambiguity is an integer:

5.3.4 Triple Differencing

where we use the delta symbol to indicate the operator that differences data between epochs. Figure 6 illustrates triple differencing geometry.

The triple difference only removes the ambiguity if it has not changed during the time interval between epochs. Any cycle slips will appear as outliers, and can easily be removed by conventional techniques. This is unlike the situation with double differencing, where cycle slips appear as step functions in the time series of data.

The disadvantage of the triple difference is that it introduces correlations between observations in time. Generally, increasing correlations in data has the property of decreasing the data weights. With triple differencing, the degradation in precision is substantial; so triple differenced data are inappropriate for precise surveys. On the other hand, it is a very useful method for obtaining better nominal parameters for double differencing (to ensure linearity), and it is a robust method, due to the ease with which cycle slips can be identified and removed.

It can be shown that triple difference solution is identical to the double differenced solution, provided just one epoch double differenced equation is included for the first point in a data arc, along with the triple differences, and provided the full data covariance matrix is used to compute the weight matrix. This special approach can provide tremendous savings in computation time over straightforward double differencing, while retaining robustness.

## RELATIVE POSITIONING USING CARRIER PHASE

6.1 SELECTION OFOBSERVATIONS

6.1.1 Linear Dependence of Observations

We can usually form many more possible combinations of double differenced observations than there are original data. This poses a paradox, since we cannot create more information than we started with. The paradox is resolved if we realise that some double differences can be formed by differencing pairs of other double differences. It then becomes obvious that we should not process such observations, otherwise we would be processing the same data more than once. This would clearly be incorrect.

Figure 7: Double difference geometry with 3 satellites.

6.1.2 The Reference Satellite Concept

The “reference satellite concept” involves using either set Lj ,Lk or Ll throughout the data set. For example, double differences in set Ll all involve the satellite l. Any set is equally as valid, and will produce identical solutions provided the data covariance is properly constructed (see the next section). Obviously, the reference satellite itself has to have data at every epoch, otherwise data will be lost. This can cause problems for less sophisticated software. Typically, a reference satellite should be picked which has the longest period in view. A better algorithm is to select a reference satellite epoch by epoch.

Our simple example can be easily extended to more than 3 satellites. For example consider satellites 1, 2, 3, 4 and 5 in view. We can pick satellite 4 as the reference satellite; therefore our linearly independent set is:

Note that for a single baseline (i.e. 2 receivers), the number of linearly independent double differenced observations is s-1, where s is the number of satellites being tracked.

6.1.3 The Reference Station Concept

However, if we have a network of more than 2 receivers, we must account for the fact that double differenced data from the set of all baselines are linearly dependent. We therefore introduce the “reference station” concept, where our set of double differences all include a common reference station. This guarantees linear independence. For example, consider satellites 1, 2, 3 and 4 being tracked by stations A, B, and C. If we pick our reference satellite to be 3, and reference station to be B, then our chosen set is:

Note that the number of linearly independent double differenced observations is    (s-1)(r-1), where s is the number of satellites being tracked, and r is the number of receivers. So, in our previous example, 3 receivers and 4 satellites gives 6 observations. This assumes that satellites are observed by all stations. This may not be the case, either due to obstructions, receiver problems, or because the receivers are separated by such a large distance that the satellite is not above the horizon for some receivers.

If using the reference station concept, it is therefore best to choose a receiver close to the middle of a large network, with few obstructions, and no hardware problems, otherwise the set of double differences may not be as complete as it could be. The reference station concept is obviously not optimal, and is seriously problematic for large networks. A better strategy for large networks is to select short baselines that connect together throughout the entire network, being careful not to introduce linear dependent observations, by not including any closed polygons (such as triangles) in the network. In principle, there must be only one possible path between pairs of stations. An even better strategy would be to optimise this choice for every epoch.

6.1.4 Solution Uniqueness

It should be stressed that, if all stations were tracking the same set of satellites at all epochs, then the selection of reference station and reference satellite will not matter, since an identical solution will be produced whatever the selection. This assumes that the data weight matrix is properly constructed (as described below) and that no data outliers are removed.

The problem of linear dependence usually introduces a level of arbitrariness into the solutions due to violation of the above assumptions. This problem is also true even if the previously suggested improvements are made to the reference station concept, since the user typically has to make decisions on which baselines to process (even for more sophisticated software). This is somewhat unsatisfactory, since it is there generally no unique solution, However, experience shows that any reasonable selection will only produce small differences in the final solutions.

There is a way to produce a unique solution, and that is to process undifferenced observations, estimating clock parameters at each epoch. As stated previously, this will produce a solution identical to double differencing under ideal conditions. This class of software is not typically available commercially; however, it is should be stressed that double differencing software does not produce significantly inferior results for most situations. What is far more important is the quality of the observable models, the range of appropriate estimation options, and the ability to detect and deal with cycle slips and outliers.

6.2 BASELINE SOLUTION USING DOUBLE DIFFERENCES

6.2.1 Simplified Observation Equations

The observation equations therefore require linearisation in terms of all these parameters (according to the process explained in section 4.1). Typically, one station is held fixed at good nominal coordinates, which quite often come from an initial pseudorange point position solution. We should mention, however, that due to S/A, point position solutions can have substantial errors (100 m) which may create significant errors in the double differenced observation model, and in the design matrix.

If we call the fixed station A, then estimating the baseline vector is equivalent to estimating the coordinates of station B. It is convenient to formulate the problem to estimate parameters (xB yB  zB )   , , . For example, consider a GPS survey between stations A and B, which observe satellites 1, 2, 3 and 4 for every epoch in the session, where we arbitrarily pick satellite 2 as the reference satellite. For every epoch i, we have the following linearly independent set of 3 double differenced observations:

The covariance matrix can be used to judge whether the theoretically expected precision from the observation scenario is sufficient to allow ambiguities to be fixed to integer values. If ambiguity parameters can be fixed in the model, a theoretically more precise solution can be generated from the same data, but without estimating the ambiguities. This process will necessarily reduce the covariance matrix, lowering the expected errors in the station coordinates. This does not necessarily mean that the solution is better, but that it statistically ought to be better, assuming the integers were correctly fixed. The assessment of solution accuracy goes beyond the scope of this discussion, but basically one can compare results with previous results (using GPS, or even some other technique). In addition, how well the data are fit by the model is reflected in the standard deviation of the post-fit residuals.

6.2.3 The Design Matrix

6.2.4 Minimum Data Requirements for Least Squares

6.3 STOCHASTICMODEL

6.3.1 Statistical Dependence of Double Differences

6.3.2 Data Weight Matrix for Double Differences

In a situation where data are correlated, weighted least squares is appropriate. To complete our description of how to compute a relative position estimate, we therefore need to explain how to compute the appropriate data weight matrix, W. The construction of W can be generally called the “stochastic model,” which describes the statistical nature of our data (as opposed to the “functional model” described so far, from which the observables can be computed deterministically.)

(As an aside for more advanced readers, some software process undifferenced observations, estimating clock biases as “stochastic parameters” at every epoch. It should be emphasised that there is a equivalence between explicit estimation of “stochastic parameters,” and the use of an appropriate “stochastic model” which, in effect, accounts for the missing parameters through the introduction of correlations in the data. In principle, any parameter can either be estimated as part of the functional model, or equivalently removed using an appropriate stochastic model. To go more into this would be beyond the scope of this text.)

The weight matrix is the inverse of the covariance matrix for the double differenced data:

We start by assuming a covariance matrix for undifferenced data (i.e., the actually recorded data), which has dimensions qrs x qrs . Typically, this is assumed to be diagonal, since the receiver independently measures the signals from each satellite separately. We shall,however, keep the form general. So the problem is, given a covariance matrix for undifferenced data, how do we compute the covariance matrix for double-differenced data? This is achieved using the rule of propagation of errors, which we have already seen in section 4.2.3, where geocentric coordinates were mapped into topocentric coordinates using an affine transformation. By analogy, we can deduce that the covariance of double-differenced data can be written:

We can now write down the full expression for the computed covariance matrix, by substituting for the double differenced data weight matrix W:

As mentioned above, for the (undifferenced) data covariance C we often use a diagonal matrix, assuming a value for the standard deviation of an observation. Typical realistic values for this are several mm. Although the receiver can usually measure the phase with better precision than a mm, the post-fit residuals typically show several mm standard deviations, due to unmodelled errors such as multipath.

Even using such an inflated value for measurement precision might not produce a realistic covariance matrix for station coordinates. This is partly due to two effects: (i) unmodelled errors can be correlated with the parameters being estimated (an “aliasing effect”), and (ii) post-fit almost always show some degree of time-correlation (e.g., due to multipath). A simple, and often surprisingly effective way to deal with this problem, is to multiply the final coordinate covariance matrix by an empirical scaling factor, inferred “by experience,” according to the brand of software being used, the observation scenario, and the estimation strategy used.

## INTRODUCING HIGH PRECISION GPS GEODESY

7.1 HIGH PRECISION SOFTWARE

The observable model discussed so far has been very basic, as it glosses over advanced features that are important for high precision software. Several software packages have been developed since the 1980’s that are capable of delivering high precision geodetic estimates over long baselines. This software is a result of intensive geodetic research, mainly by universities and government research laboratories.

Typical features of such software include:

•  orbit integration with appropriate force models;
• accurate observation model (Earth model, media delay...) with rigorous treatment of celestial and terrestrial reference systems;
• reliable data editing (cycle-slips, outliers);
• estimation of all coordinates, orbits, tropospheric bias, receiver clock bias, polar motion, and Earth spin rate;
• ambiguity resolution algorithms applicable to long baselines;
• estimation of reference frame transformation parameters and kinematic modelling of station positions to account for plate tectonics and co-seismic displacements.

We can summarise the typical quality of geodetic results from 24 hours of data:

•  relative positioning at the level of few parts per billion of baseline length;· absolute (global) positioning at the level of 1 cm in the IERS Terrestrial Reference Frame (ITRF);
•  tropospheric delay estimated to a few mm;
• GPS orbits determined to 10 cm;
• Earth pole position determined to 1 cm;
• clock synchronisation (relative bias estimation) to 100 ps.

Two features of commercial software are sometimes conspicuously absent from more advanced packages: (i) sometimes double differencing is not implemented, but instead, undifferenced data are processed, and clock biases are estimated; (ii) network adjustment using baseline solutions is unnecessary, since advanced packages do a rigorous, one-step, simultaneous adjustment of station coordinates directly from all available GPS observations.

Some precise software packages incorporate a Kalman filter (or an equivalent formulism). This allows for certain selected parameters to vary in time, according to a statistical (“stochastic”) model. Typically this is used for the tropospheric bias, which can vary as a random walk in time. A filter can also be used to estimate clock biases, where “white noise” estimation of clock bias approaches the theoretical equivalent of double differencing.

Although many more packages have been developed, there are 3 ultra high-precision software packages which are widely used around the world by researchers and are commonly referenced in the scientific literature:

• BERNESE software, developed by the Astronomical Institute, University of Berne, Switzerland;
• GAMIT software, developed by the Massachussets Institute of Technology, USA;
• GIPSY software, developed by the Jet Propulsion Laboratory, California Institute of Technology, USA

There are several other packages, but they tend to be limited to the institutions that wrote them. It should be noted that, unlike commercial software packages, use of the above software can require a considerable investment in time to understand the software and how best to use it under various circumstances. Expert training is often recommended by the distributors.

7.2 SOURCES OFDATA AND INFORMATION

For high precision work, it is important to abide by international reference system standards and use the best available sources of data and ancilliary information. We therefore summarise two especially important international sources of data information for the covenience of the interested reader:

•  IERS: International Earth Rotation Service
• Central Bureau located at the Paris Observatory, France
• Documented IERS Conventions for observation models and reference systems
• IERS Annual Reports
• IERS Terrestrial Reference Frame for reference station coordinates
• Routine publication of Earth rotation parameters
• IGS: International GPS Service for Geodynamics
• Central Bureau located at the Jet Propulsion Laboratory, USA
• Documented IGS Standards for permanent GPS stations
• Oversees operation of global GPS network (~100 stations)
• Distributes tracking data and precise ephemerides
•  Maintains on-line database with Internet access

8. CONCLUSIONS

Having read and understood this text, you should now understand the basics of GPS positioning observation models and parameter estimation. You should also have an appreciation of the difference between basic positioning, and the more advanced positioning using high precision software packages. If all has gone well, and you think the above statements are true, then you should now have a good background knowledge and an appropriate context to prepare you for more advanced material.

To View:

Video -1

Video -2

Video -3

Video -4

## Forum for Basics of the GPS Technique:

Forum for Basics of the GPS Technique:

Forum not available

## RADAR- An In-Building RF-based User Location and Tracking System

Abstract

The proliferation of mobile computing devices and local-area wireless networks has fostered a growing interest in location-aware systems and services. In this paper we present RADAR, a radio-frequency (RF) based system for locating and tracking users inside buildings. RADAR operates by recording and processing signal strength information at multiple base stations positioned to provide overlapping coverage in the area of interest. It combines empirical measurements with signal propagation modeling to determine user location and thereby enable location aware services and applications. We present experimental results that demonstrate the ability of RADAR to estimate user location with a high degree of accuracy.

Keywords: location-aware services, user location and tracking, wireless LAN, radio-frequency wireless network

1 Introduction

The proliferation of mobile computing devices and local-area wireless networks has fostered a growing interest in location-aware systems and services. A key distinguishing feature of such systems is that the application information and/or interface presented to the user is, in general, a function of his or her physical location. The granularity of location information needed could vary from one application to another. For example, locating a nearby printer requires  fairly coarse-grained location information whereas locating a book in a library would require fine-grained information.

While much research has focussed on developing services architectures for location-aware systems (e.g., [Maa97,Nel98]), less attention has been paid to the fundamental and challenging problem of locating and tracking mobile users, especially in in-building environments. The few efforts that have addressed this problem have typically done so in the context of infrared (IR) wireless networks. The limited range of an IR network, which facilitates user location, is a handicap in providing ubiquitous coverage. Also, the IR network is often deployed for the sole purpose of locating people and does not provide traditional data networking services. To avoid these limitations, we focus on RF wireless networks in our research. Our goal is to complement the data networking capabilities of RF wireless LANs with accurate user location and tracking capabilities, thereby enhancing the value of such networks.

In this paper, we present RADAR, an RF-based system for locating and tracking users inside buildings. RADAR uses signal strength information gathered at multiple receiver locations to triangulate the user’s coordinates. Triangulation is done using both empirically-determined and theoretically computed signal strength information.

Our experimental results are quite encouraging. With high probability, RADAR is able to estimate a user’s location to within a few meters of his/her actual location. This suggests that a large class of location-aware services can be built over an RF local-area wireless data network.

The remainder of this paper is organized as follows. In Section 2, we survey related work in location determination technologies. In Section 3, we discuss our research methodology. Section 4 contains the core of the paper where we present and analyze the empirical and the signal propagation modeling methods. A discussion of extensions to the base RADAR system appears in Section 5. Finally, we present our conclusions in Section 6.

2 Related Work

Related work in the area of user location and tracking falls into the following broad categories: (1) in-building IR networks, (2) wide-area cellular networks (based on RF), and (3) Global Positioning System (GPS).

The Active Badge system [Wan92,Har94] was an early and significant contribution to the field of location-aware systems. In this system, a badge worn by a person emits a unique IR signal every 10 seconds. Sensors placed at known positions within a building pick up the unique identifiers and relay these to the location manager software. While this system provides accurate location information, it suffers from several drawbacks: (a) it scales poorly due to the limited range of IR, (b) it incurs significant installation and maintenance costs, and (c) it performs poorly in the presence of direct sunlight, which is likely to be a problem in rooms with windows.

Another system based on IR technology is described in [Azu93]. IR transmitters are attached to the ceiling at known positions in the building. An optical sensor on a headmounted unit senses the IR beacons, which enables the system software to determine the user's location. This system suffers from similar drawbacks as the Active Badge system.

The system described in [ATC97] is based on pulsed DC magnetic fields. Multiple sensors are placed on body mounted peripherals, such as data gloves, and their output is processed to determine a person's location and orientation with a high degree of precision. This technology is used  extensively in the computer animation industry. It is, however, quite expensive and, like IR, severely range limited, hence unsuitable for large-scale deployment.

Recently, several location systems have been proposed for wide-area cellular systems [Tek98]. The technological alternatives for locating cellular telephones involve measuring the signal attenuation, the angle of arrival (AOA), and/or the time difference of arrival (TDOA). While these systems have been found to be promising in outdoor environments, their effectiveness in indoor environments is limited by the multiple reflections suffered by the RF signal, and the inability of off-the-shelf and inexpensive hardware to provide fine-grain time synchronization.

Systems based on the Global Positioning System [GPS99], while very useful outdoors, are ineffective indoors because buildings block GPS transmissions.

The Daedalus project [Hod97] developed a system for coarse-grained user location. Base stations transmit beacons augmented with their physical coordinates. A mobile host estimates its location to be the same as that of the base station to which it is attached. Consequently, the accuracy of the system is limited by the (possibly large) cell size.

Our work differs from previous work in that we tackle the problem of user location and tracking on a widely available radio frequency based wireless network in an inbuilding environment. RF networks offer a significant advantage over IR networks in terms of range, scalability, deployment, and maintenance. With speeds of up to 11 Mbps, these systems have gained rapid acceptance and are being widely deployed in offices, schools, homes, etc.

We recently became aware of the Duress Alarm Location System (DALS) described in [CG93]. While their work and ours are similar in some ways, they also differ in significant ways. Briefly, their system (1) is dependent on specialized hardware, (2) does not use propagation modeling  to build a radio map of the building, (3) does not factor in user body orientation, and (4) requires infrastructural deployment over and above a wireless data network. These points are clarified in the following sections.

## Research Methodology

We begin with a description of our experimental test bed. We then discuss the data collection process, including tools we developed for this purpose. Finally, we describe the processing we performed on the data as a precursor to the analysis described in Section 4.

3.1 Experimental Testbed

Our experimental test bed is located on the second floor of a 3-storey building. The layout of the floor is shown in Figure 1. The floor has dimension of 43.5 m by 22.5 m, an area of 980 sq. m (10500 sq. ft.), and includes more than 50 rooms.

We placed three base stations, BS1, BS2 and BS3, at the locations indicated in Figure 1. Each base station was a Pentium-based PC running FreeBSD 3.0 equipped with awireless adapter. Our mobile host, carried by the user being tracked, was a Pentium-based laptop computer running Microsoft Windows 95.

Each base station and the mobile host was equipped with a Digital RoamAboutTM network interface card (NIC), based on Lucent’s popular WaveLANTM RF LAN technology. The network operates in the 2.4 GHz licensefree ISM (Industrial, Scientific and Medical) band. It has a raw data rate of 2 Mbps and a one-way delay of 1-2 ms. The range of the network, as specified in [Roa96], is 200 m, 50 m, and 25 m, respectively, for open, semi-open, and closed office environments. This classification is based on the type and density of obstructions between the transmitter and the receiver. With this nomenclature, our testbed environment would be classified as being open along the hallways where the base stations are located and closed elsewhere. The base stations provide overlapping coverage in portions of the floor, and together cover the entire floor.

3.2 Data Collection

A key step in our research methodology is the data collection phase. We record information about the radio signal as a function of the user’s location. As discussed in Section 4, we use the signal information to construct and validate models for signal propagation during off-line  analysis as well as to infer the location of a user in real time.  We refer to the former as the off-line phase and the latter as the real-time phase.

Among other information, the WaveLAN NIC makes available the signal strength (SS) and the signal-to-noise ratio (SNR). SS is reported in units of dBm and SNR is expressed in dB. A signal strength of s Watts is equivalent to 10*log10(s/0.001) dBm. A signal strength of s Watts and a noise power of n Watts yields an SNR of 10*log10(s/n) dB. For example, a signal strength of 1 Watt is equivalent to 30 dBm. Furthermore, if the noise power is 0.1 Watt, the SNR would be 10 dB.

The FreeBSD 3.0 WaveLAN driver extracts the SS and the SNR information from the WaveLAN firmware each time a broadcast packet is received1. It then makes the information available to user-level applications via ioctl system calls. We used the wlconfig utility, which provides a wrapper around the ioctl calls, to extract the signal information.

In our testbed, we chose to have the Windows-based mobile host broadcast packets (beacons) periodically and have the FreeBSD base stations record signal strength information. However, in a production system with many more mobiles than base stations, it may be desirable to have the latter transmit the beacons and the former measure the signal strength. Nevertheless, the accuracy of the user location and tracking is not impacted by this choice2.

We wrote a simple application using Tcl/Tk [Ous94] and Perl [Wal96] to control the entire data collection process.

1. It is quite easy to modify the driver to record information for other packets as well, but we found no reason to do so.

2 While our analysis does not assume symmetry of signal strength, the few instances where we measured signal strength at both ends indicate little asymmetry. from the mobile host. The process operates as follows. First, the clocks on the mobile host and the base stations are synchronized (to within the round-trip latency of the wireless link, essentially less than 5 ms). The mobile host then starts broadcasting UDP packets, each with a 6-byte payload and spaced apart uniformly, at a default rate of 4 per second. Each base station (bs) records the signal strength (ss) measurement3 together with a synchronized timestamp t, i.e., it records tuples of the form (t, bs, ss). This information is collected both during the off-line phase and the real-time phase.

Figure 1 Map of the floor where the experiments were conducted. The black dots denote locations were empirical signal strength information was collected. The large stars show the locations of the 3 base stations. The orientation is North (up) and East (right).

3. During the course of our experiments, we discovered that  the signal strength is a stronger function of location than the signal-to-noise ratio. The latter is impacted by random fluctuations in the noise process. So we only use signal strength information in our analysis.

In addition, during the off-line phase (but not the realtime phase), the user indicates his/her current location by clicking on a map of the floor. The user’s coordinates (x,y) and timestamp t are recorded.

During our experiments, we discovered that signal strength at a given location varies quite significantly (by up to 5 dBm) depending on the user’s orientation, i.e., the direction he/she is facing. In one orientation, the mobile host’s antenna may have line-of-sight (LoS) connectivity to a base station’s antenna while in the opposite orientation, the user’s body may form an obstruction. So, in addition to user's location (x,y), we also recorded the direction (d) (one of north, south, east, or west) that he/she is facing at the time the measurement is made4. Thus, the mobile host records tuples of the form (t,x,y,d) during the off-line phase. We discuss the implications of the user’s orientation in more detail in Section 4.

In all, during the off-line phase, we collected signal strength information in each of the 4 directions at 70 distinct physical locations on our floor. For each combination of location and orientation (i.e., (x,y,d) tuple), we collected at least 20 signal strength samples.

3.3 Data Processing

We outline the data processing that we performed as a precursor to the analyses discussed in Section 4.

3.3.1 Signal Strength Information

Using the synchronized timestamps, we merged all of the traces collected during the off-line phase into a single, unified table containing tuples of the form (x,y,d,ssi,snri), where corresponding to the three base stations. For each (x,y,d) tuple, we computed the mean, the standard deviation, and the median of the corresponding signal strength values for each of the base stations. For much of our analysis, we use this processed data set (primarily the mean) rather than the original, raw data set.

We wrote routines to search through the processed data set to determine exact as well as closest matches. There is a fair amount of database research literature that describes efficient data structures and algorithms for such multidimensional searches (e.g., R-Tree [Gut84], X-Tree [Ber96], optimal k-nearest neighbor search [Sei98], etc.) However, we chose a simple linear-time search algorithm because our relatively small data set and dimensionality (at most 3, as explained in Section 4) did not warrant the complexity of the aforementioned algorithms. Moreover, the focus of our research is on the analysis rather than on developing an optimal closest match implementation.

3.3.2 Building Floor Layout Information

We obtained the layout information for our floor, which specified the coordinates of each room. We also obtained the coordinates of the three base stations. Using these and 4 While there are other sources of fluctuation, such as the movement of other people and objects, these tend to be random. In contrast, the body of the person carrying the mobile host introduces a systematic source of error. the Cohen-Sutherland line-clipping algorithm [Fol90], we computed the number of walls that obstructed the direct line between the base stations and the locations where we hadcollected the empirical signal strength data. We use this to build an accurate signal propagation model (Section 4.2).

## Algorithms and Experimental Analysis

In this section, we discuss several algorithms for user location and tracking, and present an analysis of how well  these perform using our experimental data.

A premise of our work is that signal strength (SS) information provides a means of inferring user location. To demonstrate that this is a reasonable premise, we show in Figure 2 shows how the SS measured at each of the 3 base stations varies as the user walks along the outer hallway of the floor in a counter-clockwise direction (Figure 1). The walk begins and terminates at the north-west corner (close to BS1). There is a definite trend in the SS recorded at the three base stations as the user walks around the loop. Not surprisingly, the signal received at a base station is the strongest when the user is close to it and weakest when the user is far away. This strong trend is an indication that using SS to infer location is a promising approach.

Figure 2 Signal strength recorded at the three base stations as the user walks around the floor.

Our basic approach is triangulation5. Given a set of signal strength measurements at each of the base stations, we determine the location that best matches the observed signal strength data. We then “guess” that to be the location of the user. There are multiple variations of this basic idea that arise because of several choices for each of the following:

• Ways of summarizing the signal strength samples measured at the base stations
• Basis for determining the best match.
• Metric for determining the best match.

We discuss each of these in turn.

It is just coincidental that we have three base station in our testbed. Triangulation can also be done with fewer or more base stations.

First, we summarize multiple signal strength samples from a base station using the sample mean. In the case of a static user whose location and orientation are fixed (the user location problem), it is clear which signal strength measurements should be included in the sample set. In the case of a mobile user (the user tracking problem), it is less clear what the sample set should be. In the latter case, we define the sample set to be all samples that lie within a sliding time window.

Second, to determine the location and orientation that best match a given (summarized) set of SS measurements, we first need to determine what the SS (at each base station) should be used for a particular combination of user location and orientation. We consider a couple of alternative approaches. The first is the empirical method where we use the location and SS data gathered during the off-line phase (Section 3.2). The second approach is signal propagation modeling. As discussed in Section 4.2, we have developed a model that accounts for both free-space loss and loss due to obstructions in computing the SS at each base station corresponding to given a user location.

Third, we need a metric and a search methodology to compare multiple locations and pick the one that best matches the observed signal strength. We term our general technique nearest neighbor(s) in signal space (NNSS). The idea is to compute the distance (in signal space) between the observed set of SS measurements, (ss1,ss2,ss3), and the recorded SS, (ss’1,ss’2,ss’3), at a fixed set of locations, and then pick the location that minimizes the distance. In our analysis, we use the Euclidean distance measure, i.e., sqrt((ss1-ss’1)2 (ss2-ss’2)2 (ss3-ss’3)2). It is possible to use other distance metrics, for example, the sum of the absolute differences for each base station (the “Manhattan” distance [Cor90]) or a metric weighted by the signal strength level at each base station. We experimented briefly with these alternatives, but do not present the results here due to space limitations.

In all of our analyses, we characterize the goodness of our estimate of the user’s location using the error distance, which is the Euclidean distance between the actual (physical) location of the user and the estimated location.

4.1 Empirical Method

In this case, we use the empirical data obtained in the off-line phase (Section 3.2) to construct the search space for the NNSS algorithm. We present results of the various analyses that we performed. Unless otherwise indicated, we assume the user to be stationary.

4.1.1 Basic Analysis

For the basic analysis, we use all of the (more than 20) signal strength samples collected for each of the    70*4 = 280 combinations of user location and orientation. In the analysis, we pick one of the locations and orientations at random, and then conduct an NNSS search for the corresponding signal strength tuple in the space defined by the remaining 69 points times 4 orientations. This emulates the process of locating a (stationary) user during the realtime phase. Note that the exclusion of one of the 70 physical points from the search space would result in a somewhat worse measured accuracy than would be obtained with the RADAR system in real use. In this sense, the accuracy results presented here are somewhat conservative.

We compare the empirical method with two other methods: random selection and strongest base station selection [Hod97]. With random selection, we estimate the user’s location by picking one of the 70 points at random, regardless of the SS information. With strongest base station selection, we guess the user’s location to be the same as the location of the base station that records the strongest signal. A comparison with these simple methods enables us to evaluate how worthwhile the increased sophistication of our techniques is.

Figure 3 shows the cumulative distribution function (CDF) of the error distance for the empirical, strongest base station, and random methods. The empirical method performs significantly better than both of the other methods. Table 1 summarizes the information in the figure in terms of the 25th, 50th (median), and 75th percentile values of the error distance for each method.

Figure 3 CDF of the error in location estimation.

Considering the median (50th percentile), for instance, the empirical method has a resolution of under 3 meters, which is about the size of an office room in our building. In terms of linear resolution, it is 2.8 times better than the strongest base station method and 5.5 times better than the random method. In terms of spatial resolution, the improvement is even greater: 7.7 and 30.6 times, respectively. We use the percentile values for the empirical method in Table 1 as a basis for comparison in the rest of the analysis.

Table 1 The 25th, 50th, and 75th percentile values of the error distance. The numbers in parentheses indicate the degradation of the strongest BS and random methods compared to the empirical method.

In summary, the empirical method performs quite well. We now discuss ways of making it perform even better.

4.1.2 Multiple Nearest Neighbors

Unlike the basic analysis where we only considered the single nearest neighbor in signal space, we now consider k nearest neighbors, for various values of k. The intuition is that often there are multiple neighbors that are at roughly the same distance from the point of interest (in signal space). Given the inherent variability in the measured signal strength at a point, there is fundamentally no reason to pick only the closest neighbor (in signal space) and reject others that are almost as close.

A second, and equally important, reason for considering additional neighbors is that it is likely that the error vector (in physical space) corresponding to each neighbor is oriented in a different direction. So averaging the coordinates of the neighbors may yield an estimate that is closer to the user’s true location than any individual neighbor is. Figure 4 illustrates this for k=3 nearest neighbors.

Our experimental analysis of averaging over k nearest  neighbors shows that for small k, averaging has some benefit though not very significant. For instance, for k=5, the 25th percentile of error distance is 1.5 m (22% better than the 1.92 m in Table 1) and the 50th percentile is 2.75 m (9% better). For large k, accuracy degrades rapidly because points far removed from the true location also are included in the averaging procedure, thereby corrupting the estimate.

Figure 4 An illustration of how averaging multiple nearest neighbors (N1, N2, N3) can lead to a guess (G) that is closer to the user’s true location (T) than any of the neighbors is individually.

The reason why the benefits of averaging are not very significant even for small k is that often the k nearest neighbors in signal space are not k physically distinct points. In many instances, multiple nearest neighbors in signal space correspond to different orientations at the same point in physical space. So averaging in physical space does not improve the location estimate by very much.

4.1.3 Max Signal Strength Across Orientations

Since the dependence of signal strength on orientation creates a challenge for location estimation, we analyze how well the empirical method would perform if orientationwere not an issue. For each user location in the off-line data set, we compute the maximum SS at each base station across the four possible orientations at that location6. Note that the maximum for each base station may correspond to a different orientation of the user. The goal is to emulate the case where for each base station and user location, we first compute the mean SS for each of the four orientations at that location, and then pick the maximum among the four means. the signal generated by the mobile host is not obstructed by the user’s body. While this may not be realistic given the antenna design and positioning for existing wireless LANs, it may be possible to approximate this “ideal case” with new antenna designs (e.g., omnidirectional wearable antenna) We repeat the analysis of the previous sections with the smaller “maximum signal strength” data set of 70 data points (instead of 70*4=280 data points in the original data set). In Figure 5, we plot the 25th and the 50th percentile values of the error distance with averaging over neighbor sets of various sizes. the signal generated by the mobile host is not obstructed by the user’s body. While this may not be realistic given the antenna design and positioning for existing wireless LANs, it may be possible to approximate this “ideal case” with new antenna designs (e.g., omnidirectional wearable antenna)

We repeat the analysis of the previous sections with the smaller “maximum signal strength” data set of 70 data points (instead of 70*4=280 data points in the original data set). In Figure 5, we plot the 25th and the 50th percentile values of the error distance with averaging over neighbor sets of various sizes.

Figure 5 The error distance for the empirical method with averaging on the data set containing the max signal strength measurement for each location.

We make a couple of observations. First, just as expected, the use of the maximum SS data set improves the accuracy of location estimation slightly even in the absence of averaging (k=1). The 25th percentile value of the error distance is 1.8 m and the 50th percentile 2.67 m, 6% and 9% better, respectively, compared to Table 1. Second, averaging over 2-4 nearest neighbors improves accuracy significantly; the 25th percentile is about 1 m (48% better) and the 50th percentile is 2.13 m (28% better). Averaging is more effective here than in Section 4.1.2 because the set of k nearest neighbors in signal space necessarily correspond to k physically distinct locations

4.1.4 Impact of the Number of Data Points

We now investigate how the accuracy of location estimation would be impacted if we had data from fewer than the 70 distinct physical locations considered thus far.

For each value of n, the number of physical locations (ranging between 2 and 70), we conducted 20 runs of our analysis program. In each run, we picked n points at random from the entire data set collected during the off-line phase and used this subset to construct the search space for the NNSS algorithm. We collated the error distance data from all the runs corresponding to the same value of n (Figure 6).

For small n (5 or less), the error distance is a factor of 2 to 4 worse than when the entire empirical set containing 70 physical points is used. But the error distance diminishes rapidly as n increases. For n=20, the median error distance is less than 33% worse and for n=40, it is less than 10% worse. The diminishing returns as n becomes large is due to the inherent variability in the measured SS. This translates into inaccuracy in the estimation of physical location. So there is little benefit in obtaining empirical data at physical points spaced closer than a threshold.

Figure 6 The error distance versus the size of the empirical data set (on a log scale).

In summary, for our floor, the empirical method would perform almost as well with a data set of 40 physical points as with a set of 70 points. In practice, we could make do with even fewer points by picking physical locations that are  distributed uniformly over the area of the floor rather than at random.

4.1.5 Impact of the Number of Samples

In the analysis presented so far, we have worked with the mean of all of the samples recorded during the off-line phase for each combination of location and orientation. While it may be reasonable to construct the empirical data set with a large number of samples (since it is a one-time task), there may be constraints on the number of samples that can be obtained in real-time to determine a user’s location. So we investigate the impact of a limited number of realtime samples (while retaining the entire off-line data set for the NNSS search space) on the accuracy of location estimation. Our analysis shows that only a small number of real-time samples are needed to approach the accuracy obtained using all of the samples (Table 1). With just 1 realtime sample, the median error distance is about 30% worse than when all samples were considered. With 2 samples, it is about 11% worse and with 3 samples it is under 4% worse.

4.1.6 Impact of User Orientation

As we have already discussed, the user’s orientation has a significant impact on the SS measured at the base stations. In Section 4.1.3, we did a best-case analysis using the maximum SS across all four orientations. We now consider, in some sense, the worst case where the off-line data set only has points corresponding to a particular orientation (say north) while the real-time samples correspond to the opposite orientation (i.e., south). We compute the error distance for all four combinations of opposing directions: north-south, south-north, east-west, and west-east.

We observe a fairly significant degradation in the accuracy of location estimation. For instance, for the northsouth case, the 25th percentile of the error distance is 2.95 meters (54% worse than in Table 1) while the 50th percentile (median) is 4.90 meters (67% worse). This degradation underscores the importance of obtaining empirical data for multiple orientations to construct the NNSS search space. However, even in this worst case, the empirical method outperforms the strongest base station and random methods by a factor of 2 to 4

4.1.7 Tracking a Mobile User

In this sub-section, we analyze the problem of tracking a mobile user rather than locating a stationary user, as we have done so far. For this analysis, we collected a new SS data set corresponding to random walks by the user along the hallways of our floor. We collected 4 signal strength samples per second at each of the base stations. Assuming that the user walked at a uniform pace, we are able to determine the true location of the user at each time instant.

We reduce the problem of tracking the mobile user to a sequence of location determination problems for a (nearly stationary) user. We use a sliding window of 10 samples to compute the mean signal strength on a continuous basis. This information is then used with the basic method (Section 4.1.1) to estimate the user’s location on a continuous basis.

The error distance for tracking the mobile user is only slightly worse than that for locating a stationary user. The median error distance is 3.5 meters, about 19% worse than that for a stationary user.

4.1.8 Summary of Empirical Method

The empirical method is able to estimate user location with a high degree of accuracy. The median error distance is 2 to 3 meters, about the size of a typical office room. For our experimental environment, much of the accuracy can be achieved with an empirical data set of about 40 physical points and about 3 real-time signal strength samples (at each base station). It is important, however, that the empirical data set contain data corresponding to multiple user orientations at each location.

The main limitation of the empirical method is that significant effort is needed to construct the SS data set for each physical environment of interest (each floor, each building, etc.). Furthermore, the data collection process may need to be repeated in certain circumstances, e.g., when a base station is relocated.

We now discuss a method based on signal propagation modeling, which avoids these limitations.

4.2 Radio Propagation Model

Radio propagation modeling provides an alternative to the empirical method for constructing the search space for the NNSS algorithm.

4.2.1 Motivation

The primary motivation for the radio propagation model is to reduce RADAR’s dependence on empirical data. Using a mathematical model of indoor signal propagation, we  generate a set of theoretically-computed signal strength data akin to the empirical data set discussed in Section 3.2. The data points correspond to locations spaced uniformly on the floor. The NNSS algorithm can then estimate the location of the mobile user by matching the signal strength measured in real-time to the theoretically-computed signal strengths at these locations. It is clear that the performance of this approach is directly impacted by the "goodness” of the propagation model. In the following subsections, we develop the model and discuss the performance of location determination based on the model.

4.2.2 Determination of the model

For a radio channel, signal propagation in an indoor environment is dominated by reflections, diffraction, and scattering of radio waves caused by structures within the building. The transmitted signal generally reaches the receiver via multiple paths (termed the multipath phenomenon). Multipath causes fluctuations in the received signal envelope and phase, and the signal components arriving from indirect and direct paths combine to produce a distorted version of the transmitted signal. Since multipath within buildings is strongly influenced by the layout of the building, the construction material used, and the number and type of objects in the building, characterizing the radio channel in such an environment is challenging.

We considered three different models before settling on one. The first model was the well-accepted Rayleigh fading model [Has93], which describes small-scale rapid amplitude fluctuation in the absence of a strong received component. The Rayleigh distribution is widely used to describe multipath fading because of its elegant theoretical explanation and the occasional empirical justification. However, in deriving this distribution, a critical assumption made is that all signals reaching the receivers have equal strength. In general, this is unrealistic. Our empirical data shows that for a number of sample points (along the hallways), there exists a dominant line-of-sight (LoS) component that is not accounted for by this distribution. Hence, we did not use this distribution.

The second model we considered was the Rician distribution model [Ric44]. The Rician distribution occurs when a strong path exists in addition to the low level scattered path. This strong component may be the LoS path or a path that encounters much less attenuation than others. The Rayleigh distribution is a special case of the Rician distribution; when the strong path is eliminated, the amplitude distribution becomes Rayleigh. While the model is intuitively appealing, it is very difficult to determine the model parameters (i.e., the local mean of the scattered power and the power of the dominant component) precisely as this requires physically isolating the direct wave from the scattered components. To keep the system simple and easy to deploy, we decided against using this distribution to model the radio channel.

We found a good compromise between simplicity and accuracy in the Floor Attenuation Factor propagation model (FAF) suggested by [Sei92]. We like this model because it provides flexibility in accommodating different building layouts while taking into account large-scale path loss. We adapted the original model proposed by Seidel and Rappaport, which included an attenuation factor for building floors, to disregard the effects of the floors and instead consider the effects of obstacles (walls) between the transmitter and the receiver. The Wall Attenuation Factor (WAF) model is described by

where n indicates the rate at which the path loss increases with distance, P(do) is the signal power at some reference distance do and d is the transmitter-receiver (T-R) separation distance. C is the maximum number of obstructions (walls) up to which the attenuation factor makes a difference, nW is the number of obstructions (walls) between the transmitter and the receiver, and WAF is the wall attenuation factor. In general the values of n and WAF depend on the building layout and construction material, and are derived empirically. The value of P(do) can either be derived empirically or obtained from the wireless network hardware specifications.

Figure 7 SS as a function of T-R separation derived from the empirical data collected in Section 3.2

Figure 7 illustrates how the signal strength varies with distance between the transmitter and the receiver. The wide difference in signal strengths between points at similar distances is explained as follows: the layout of the rooms in the building, the placement of base stations, and the location of the mobile user all have an effect on the received signal. Signals transmitted from two locations at the same distance from the receiver (base station) may be attenuated by different amounts due to the differences in the number and types of obstructions they encounter. For instance, in Figure 7, we observe that the strength of the signal from two locations approximately 36 meters from the receiver (base station) were approximately 10 dBm apart. This is because there were several walls between one of the locations and the base station, while the other location had line-of-sight to the base station.

Previous work in indoor radio propagation modeling has included extensive characterization of signal loss for various materials and at different frequencies [Rap96]. However, using this information in a practical setting is difficult because the obstructing materials vary considerably in their physical and electrical characteristics. For example, water causes signal attenuation and the human body is made up of water, so the size of a human body and its orientation can result in different amounts of signal loss. It is virtually impossible to characterize such loss precisely since the number and sizes of humans in the building at any time is generally a finite but random number. Thus, despite their complexity, these extensive models fall short in terms of accuracy. So we decided to go with the much simpler WAF propagation model.

We determined the attenuation caused by a wall empirically (described next) and used this in conjunction with the number of intervening walls (determined using the Cohen-Sutherland algorithm mentioned in Section 3.3.2) to compensate for the attenuation caused by the walls.

We conducted the following experiment to determine the Wall Attenuation Factor (WAF): we measured the signal strength at the receiver when the receiver and the transmitter had line-of-sight. We then measured the signal strength with varying but known number of walls between the receiver and the transmitter. We computed the average of the difference between these signal strength values to determine the WAF. We observed that the amount of additional attenuation dropped off as the number of walls separating the transmitter and the receiver increased. This observation is consistent with [Sei92] where the attenuation between different floors was considered and shown to flatten out as the number of floors between the transmitter and the receiver increased. In general, with a large T-R separation and a large number of intervening walls, free-space path loss dominates the loss due to obstructions. Based on our measurements, we chose WAF to be 3.1 dBm and C to be 4 (where C represents number of walls that are factored into the model). Figure 8 shows the result after the measured signal strength has been compensated for signal loss due to the intervening walls between the transmitter and the receiver. We observe that the resulting plot shows a trend similar to the free-space loss trend. This demonstrates that the WAF propagation model compensates effectively for attenuation due to obstructions.

Figure 8 Effect of applying correction for intervening walls between the base station and the mobile user.

After compensating for the effect of walls and creating the “corrected” data for all three base stations, we proceeded to determine the two other parameters, n and Pdo, of our model. We reduced the propagation model to a form where it exhibits a linear relationship between the theoretically predicted signal strength and logarithm of the distance between the transmitter and the receiver, and then applied simple linear regression to determine the parameters of the model [Jai91].

Table 2 contains the numerical values of the model parameters for the three base stations considered separately and when taken together. We note that the values for the path loss exponent (n) and the reference signal strength (Pdo) for all three base stations are similar despite their different physical locations and surroundings. This result is encouraging since it indicates that the parameter values are not tied to the specific location of the base stations. The values of Pdo are higher than those published by the manufacturer [Roa96] (for d0 = 1 meter) because our WAF model does not account for multipath propagation. The values of the path loss exponent are smaller than those reported in previous work on indoor radio propagation modeling [Rap96]. However, they are consistent with our expectations since we compensate the measured signal strength for attenuation due to obstructions, and since we do not consider multipath (which can boost the signal strength at a given location). R2 represents the coefficient of determination, which is a useful measure for indicating the goodness of regression [Jai91]. The high values of R2 (on a scale of 0 to 1) suggest that there is a good match between the estimated and the measured values of the signal strength. Another value of interest shown in the table is the mean squared error (MSE). These numbers reinforce the observation that the WAF propagation model fits the measured data well.

Table 2 Parameter estimates using linear regression

The final column in Table 2 shows the values for Pdo and n when the data from all the transmitter-receiver pairs (i.e., all three base stations) was combined. The motivation for this was to determine a value of Pdo and n that could be used for all base stations without overly affecting the result. The advantage of using a common value is that it avoids the need for individual measurements of each base station as they are installed in the network, thus greatly reducing the cost of system setup. We can then use these values to estimate the signal strength at various points within the building.

Figure 9 illustrates how the predicted values of the signal strength generated with the propagation model (after compensating for wall attenuation) compares with the actual measurements. We observe a good match between the two. While this plot is for one of the three base stations, plots for the other two base stations exhibit a similar match.

4.2.3 Results using the Propagation Model

To determine the performance of location estimation with the signal propagation modeling method, we used the model to compute the signal strength at a grid of locations on the floor. We then used this data set as the search space for the NNSS algorithm.

Considering the median (50th percentile), the propagation method provides a resolution of about 4.3 m, compared to a resolution of 2.94 m for the empirical method and 8.16 m for the strongest base station method (Table 1). For the 25th percentile the propagation method provides a resolution of 1.86 m compared to 1.92 m for the empirical method and 4.94 m for the strongest base station method.

Figure 9 Predicted versus measured signal strength.

While the propagation method is not as accurate as the empirical method, it is significantly better than the strongest BS and random methods. Thus, even without extensive empirical measurements, RADAR based on the propagation model alone would significantly outperform the strongest base station method proposed in [Hod97].

4.2.4 Summary of Radio Propagation Method

The WAF propagation model provides a cost effective means for user location and tracking in an indoor RF wireless network. The model is cost effective in the sense that it does not require detailed empirical measurements to generate a signal strength map and consequently has a low set up cost. A significant result from Section 4.2.2 is that the parameters for the wall attenuation propagation model are similar across base stations despite the latter being in different locations. This suggests that the entire system can be relocated to a different part of the building, but the same parameter values can be used to model propagation and thereby determine a user’s location.

## Discussion and Future Work

We discuss extensions to the RADAR system that would help improve its robustness and accuracy. Due to space constraints, we keep our discussion brief.

We are investigating how user-mobility profiles can supplement signal strength information in locating and tracking users. A profile specifies a priori likelihood of user location and/or movement patterns, which can be derived from history [Liu98], calendar information, building layout, etc.

We are also investigating base station-based environmental profiling to make RADAR robust in the face of large-scale variations in the RF signal propagation environment (caused, for instance, by the varying number of people in a building during the course of a day). Instead of recording just one set of signal strength measurements, we record multiple sets at different times of the day. The base stations probe the channel periodically to determine the current conditions, and accordingly pick the data set that is most appropriate for these conditions.

6  Conclusions

In this paper, we have presented RADAR, a system for locating and tracking users inside a building. RADAR is based on empirical signal strength measurements as well as a simple yet effective signal propagation model. While the empirical method is superior in terms of accuracy, the signal propagation method makes deployment easier.

We have shown the despite the hostile nature of the radio channels, we are able to locate and track users with a high degree of accuracy. The median resolution of the RADAR system is in the range of 2 to 3 meters, about the size of a typical office room.

Our results indicate that it is possible to build an interesting class of location-aware services, such as printing to the nearest printer, navigating through a building, etc., on an RF wireless LAN, thereby adding value to such a network. This, we believe, is a significant contribution of our research.

Our eventual plan is to combine location information services with the RADAR system and deploy this within our organization.

## Acknowledgements

We would like to thank Stephen Dahl for his help in setting up our experimental testbed, and the anonymous reviewers for their perspicacious comments.

## References

[ATC98] http://www.ascension-tech.com

[Azu93] R. Azuma, “Tracking Requirements for Augmented Reality,” Communications of the ACM, Vol. 36, No. 7, pp: 50- 51, July 1993

[Ber96] S. Berchtold, D. A. Keim, H. P. Kriegel, “The X-tree: An Index Structure for High-Dimensional Data,”, Proc. VLDB, 1996

[CG93] T. W. Christ, P. A. Godwin, "A Prison Guard Duress Alarm Location System", Proc. IEEE International Carnahan Conference on Security Technology, October 1993

[Cor90] T. H. Cormen, C. E. Leiserson, R. L. Rivest, “Introduction to Algorithms,” The MIT Press, 1990

[Fol90] Foley, van Dam, Feiner, and Hughes, Computer Graphics Principles and Practice, 2nd Edition, Addison-Wesley, 1990

[GPS99] P. Enge, and P. Misra, “Special Issue on GPS: The Global positioning System,” Proc. of the IEEE, pp. 3-172, January 1999

[Gut84] A. Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching,” ACM, 1984

[Har94] A. Harter and A. Hopper, “A Distributed Location System for the Active Office,” IEEE Network, January 1994

[Has93] H. Hashemi, “The Indoor Radio Propagation Channel,” Proceedings of the IEEE, Vol. 81, No. 7, pages 943-968 July 1993

[Hod97] T. D. Hodes, R. H. Katz, E. S. Schreiber, and L. Rowe, “Composable Ad Hoc Mobile Services for Universal Interaction,” MobiCom ’97 Proceedings, pp: 1-12, September 1997

[Jai91] R. Jain, The Art of Computer Systems Performance Analysis, John Wiley, 1991

[Liu98] T. Liu, P. Bahl, "Mobility Modeling, Location Tracking, and Trajectory Prediction in Wireless ATM Networks", IEEEJSAC, Vol. 16, No. 6, pp. 922-936, August 1998

[Maa97] H. Maass, “Location-Aware Mobile Applications based on Directory Services,” MobiCom ’97, pp. 23-33, September 1997

[Nel98] G. J. Nelson, “Context-Aware and Location Systems,” Ph.D. Theses, University of Cambridge, Computer Lab., UK., Jan. 1998

[Ous94] J. K. Ousterhout, “Tcl and the Tk Toolkit,” Addison- Wesley Professional Computing Series, 1994

[Ric44] S. O. Rice, “Mathematical analysis of Random Noise,” Bell Systems Technical Journal, Vol. 23 (1944), & Vol. 24 (1945)

[Rap96] T. S. Rapport, Wireless Communications – Principles and Practice, IEEE Press, 1996

[Roa96] “Digital RoamAbout 915/2400 DS/PC Card and ISA Network Adapter: Installation and Configuration,” DEC, April 1996

[Sei98] T. Seidl, H. P. Kriegel, “Optimal Multi-Step k-NearestNeighbor Search,” Proc. ACM SIGMOD, 1998

[Sei92] S. Y. Seidel and T. S. Rapport, “914 MHz path loss prediction Model for Indoor Wireless Communications in Multi-floored buildings,” IEEE Trans. on Antennas & Propagation, Feb. 1992

[Tek98] S. Tekinay, “Wireless Geolocation Systems and Services,” Special Issue of the IEEE Communications Magazine, April 1998

[Wal96] L. Wall, T. Christiansen, R. L. Schwartz, “Programming Perl,” O’Reilly & Associates, Inc., 1996

[Wan92] R. Want, A. Hopper, V. Falcao, J. Gibbons., “The Active Badge Location System,” ACM Transactions on Information Systems, Vol. 40, No. 1, pp. 91-102, January 1992

To View:

Video - 1

Video - 2

Video - 3

Video - 4

## Forum for RADAR- An In-Building RF-based User Location and Tracking System

Forum for RADAR-  An In-Building RF-based User Location and Tracking System

Forum not available

## Accuracy Characterization of Cell Tower Localization

ABSTRACT

Cell tower triangulation is a popular technique for determining the location of a mobile device. However, cell tower triangulation methods require the knowledge of the actual locations of cell towers. Because the locations of cell towers are not publicly available, these methods often need to use estimated tower locations obtained through wardriving. This paper provides the first large scale study of the accuracy of two existing methods for cell tower localization using wardriving data. The results show that naively applying these methods results in very large localization errors. We analyze the causes for these errors and conclude that one can localize a cell accurately only if it falls within the area covered by the wardriving trace. We further propose a bounding technique to select the cells that fall within the area covered by the wardriving trace and identify a cell combining optimization that can further reduce the localization error by half.

Author Keywords

Cell localization, cell tower, received signal strength

ACM Classification Keywords

K.8.m Personal Computing: Miscellaneous

General Terms

Experimentation,Measurement, Performance

INTRODUCTION

Many mobile applications rely on cell tower triangulation to determine their position [4]. Even when Global Positioning System (GPS) is available, cell tower triangulation offers a significantly faster time to first fix and lower energy consumption. Cell tower triangulation methods [9, 5], however, require the knowledge of the actual locations of cell towers. This is also the case for many more general localization techniques such as Particle filters [7], lateration [10] and Bayesian networks [11]

Although there are several public sources of cell tower locations, we found these sources to be both incomplete and inaccurate [1, 2, 3]. For instance, in the area we studied, these websites had data about less than 10% of cell towers from the service provider that we studied. Because the locations of cell towers are not readily available, a common way to estimate cell tower positions is through wardriving [5]. In wardriving, a vehicle drives within the target area, recording signals emanating from nearby cell towers (or WiFi access points) and the locations these signals were received at [8]. Using this dataset, one can estimate the locations of cell towers with algorithms such as weighted centroid [5, 6] or strongest received signal strength (strongest RSS) [12]. It has been shown for WiFi localization algorithms that the accuracy of such estimated access point locations directly affects the accuracy of mobile device localization algorithms [8]. However, weare not aware of any prior work that studied cell tower localization algorithm performance at a large scale.

Validating these algorithms requires the knowledge of the actual locations of cell towers. Chen et al. [5] used the weighted centroid method based on the GSM wardriving data. They reported a cell localization accuracy by physically visiting 6 cell towers, which is not representative in a metropolitan area. Varshavsky et al. [12] used the strongest RSS method to localize cell towers for their study, but did not validate the cell localization accuracy. The closest work to ours is that of Kim et al. [8], in which they studied the accuracy of estimated  WiFi access point (AP) locations from wardriving data and compared them with the actual AP locations.

They showed that the estimated AP locations have a median error of 40 meters and that this error has a significant effect on the accuracy of estimating locations of mobile users. In contrast, we study the accuracy of cell tower localization algorithms and show that the different cellular radio characteristics such as frequency, antenna height and propagation environment, make the problem of cell tower localization different fromWiFi AP localization

To perform the study, we obtained access to the actual locations of cell towers in the greater Los Angeles area and a wardriving trace that covers a total area of 1396.2 km2 in the downtown, residential and rural areas of Los Angeles covered by 54 cell towers, each with multiple cell sectors. Our study of cell tower localization algorithms on this dataset lead to the following contributions: (a) We characterize the

performance of the weighted centroid and the strongest RSS algorithms in the greater Los Angeles area and show that if used naively these algorithms have very large localization errors of more than 40km. (b)We show that one can hope to localize a cell tower accurately only if it falls within the area covered by the wardriving trace. (c) We propose a bounding technique to select the towers that fall within the area covered by the wardriving trace and study their performance.  (d) Finally, we identify a cell combining optimization that can further reduce localization errors by half

DATA DESCRIPTION

We obtained access to a wardriving trace that covers three areas in the greater Los Angeles area. The Downtown trace covers an area of 3.5km×4.2km in the downtown Los Angeles. The Residential trace covers an area of 6.3km×17km in the southern part of the Los Angeles County. The Rural trace covers an area of 35.4km×36km in the Victor Valley of San Bernardino County.

The wardriving trace was collected over a period of 2 months in February and March of 2009. The GSM signal strength measurements and their locations were recorded every 2 seconds and the speed of the car averaged about 32kmph. In total, we have 2,613,465 received signal strength (RSS) readings from 105,271 unique locations, resulting, on average, in 24.8 RSS readings from different cells per location. Each cell tower has 2, 3 or 6 cells attached to it, depending on the characteristics of the area and the coverage requirements. We know which cells belong to which cell tower and the actual location of each cell tower.

EVALUATION OF CELL LOCALIZATION ALGORITHMS

In this section, we describe two cell tower localization algorithms that have been used in the recent literature [12, 5, 6] and evaluate their performance on our wardriving trace.

Strongest RSS: Strongest RSS estimates a cell’s location as the location of the measurement with the strongest observed RSS from that cell. This approach works well when a cell is located close to the road where wardriving measurements were collected, but often fails otherwise.

Weighted Centroid: The Centroid algorithm estimates the cell’s location as the geometric center of the positions of the measurements where the cell was observed at. We used the Weighted Centroid method, in which the coordinates of each position are weighted by the signal strength observed at that

position. The accuracy of the Weighted Centroid algorithm is sensitive to the density and homogeneity of measurements around the cell.

Evaluation: We tested the performance of both the Strongest RSS and Weighted Centroid algorithms on our wardriving traces in the three areas. Figure 1 plots the cumulative distribution function (CDF) of the difference between the estimated and the actual cell tower locations. Note that the error showed on the X-axis is in kilometers.

The figure shows that both Strongest RSS and Weighted Centroid perform very poorly. The median error in the Downtown area is 2.75kmfor Strongest RSS and 2.83km for Weighted Centroid. Interestingly, the Strongest RSS out performs Weighted Centroid significantly in the rural area, achieving 0.7km median error vs. 7km, respectively. We believe this is because many of the cells in the rural area are located near roads and Strongest RSS seems to work well in this case.

Our conclusion is that blindly applying these algorithms to estimate cell tower positions results in very large errors. In the next section, we investigate the causes for these large errors.

BOUNDING TECHNIQUE

To investigate what causes the large errors shown in Figure 1, we looked at the relationship between the strongest signal strength at which a cell could be heard and the localization error of the Strongest RSS algorithm for that cell. The results for the residential area are plotted in Figure 2. The figure shows that once the strongest observed RSS drops below roughly -60dBm, the localization error of the Strongest RSS algorithm increases significantly. Moreover, once the RSS drops below roughly -60dBm, the correlation between the strongest received signal strength and the distance to the cell also becomes weak, meaning that it is hard to predict the distance to the cell by relying just on the received signal strength.

Our conclusion is that both the Strongest RSS and the Weighted Centroid cannot estimate accurate locations for all the cells they observe because the estimated locations produced by both these algorithms are, by design, located within the wardriving area. Therefore, cells that are located outside the wardriving area cause large localization errors

Figure 2 shows the cells that fall inside the Residential area

as crosses (denoted as inside cells) and cells that fall outside the area as triangles (denoted as outside cells). The key observation is that the outside cells are the cause for large localization errors for the Strongest RSS algorithm and most of them have the strongest RSSs lower than -60dBm. Unfortunately, filtering cells using a simple threshold on the strongest RSS does not work well. First, taking a closer look at Figure 2 (b) reveals that some outside cells have the strongest RSS values that are higher than -60dBm, but still cause large errors for the Strongest RSS algorithm. Second, applying the threshold of -60dBmcuts out some of the inside cells as well.

The challenge is then to determine which cells are inside cells by just looking at the wardriving trace without having access to the actual locations of the cells. To address this challenge, we propose the bounding technique, which has three steps: RSS Thresholding, Boundary Filtering, and Tower-based Regrouping.

RSS Thresholding: Based on our findings presented in Figure 2, we filter out all cells whose strongest RSS is lower than a certain cutoff threshold. The purpose of applying this filtering is to eliminate as many outside cells as possible, while still keeping most of the inside cells. We use the threshold of -60dBm in this paper.

Boundary Filtering: The boundary filtering technique is based on the observation that the outside cells will have their strongest RSS values on the boundary or the perimeter of the wardriving area. This is because the nearest point from the outside cell to the wardriving area is the wardriving boundary. We tested this hypothesis by plotting the locations of the outside cells relative to the wardriving area and validated that this is indeed true.

Unfortunately, applying both the RSS Thresholding and the Boundary Filtering may still leave some of the outside cells in and filter some of the inside cells out due to RSS fluctuations. For instance, certain outside cells may have measurements of strongest RSS readings reside within the wardriving area and pass both the RSS Thresholding and the Boundary Filtering steps.

Tower-based Regrouping: The Tower-based Regrouping technique takes advantage of the relationship between a cell tower and its cells. We discovered two ways to identify cells belonging to the same cell tower. First, we found that clustering cells geographically based on their estimated positions can identify which cells belong to the same cell tower when the distance between cell towers is large. This technique works well in Residential and Rural areas, but often fails in the Downtown area due to the closer distance between cell towers. Second, we discovered that the prefix of a cell ID up to the last digit is the same for all cells belonging to the same cell tower, at least in the dataset that we studied. We used the latter technique in this paper.

The basic idea of Tower-based Regrouping is that although there may be some outside cells passing both RSS Thresholding and the Boundary Filtering steps, the other cells belonging to the same cell tower will most likely fail these two tests. Thus, if most of the cells belonging to the same cell tower are filtered out, the Tower-based Regrouping step filters out the outlier outside cell as well. Similarly, if most inside cells passed the RSS Thresholding and the Boundary Filtering steps, the outlier that was eliminated is added back. We found this technique to be very successful at determining inside cells.

Evaluation: We tested our bounding technique on the wardriving traces in the three areas and found that it eliminated all outside cells and kept all inside cells when the RSS threshold was between -67 and -58 dBm. In comparison, the basic RSS threshold technique using -60dBm, cuts out 20% of inside cells and leaves in 12% of outside cells in the downtown area. We also tested the performance of the Strongest RSS and the Weighted Centroid algorithms on the inside cells identified by the bounding technique. Figure 3 shows the CDF of the localization error for the Strongest RSS and Weighted Centroid algorithms in the Downtown, Residential and Rural areas. The results show that the performance of both algorithms has dramatically improved, with Strongest RSS significantly outperforming Weighted Centroid. The median error of the Strongest RSS algorithm in the Residential area is 139m vs. 1357m for the Weighted Centroid. In the next section, we show how the cell to cell tower relationship can help improve the localization accuracy even further.

CELL COMBINING OPTIMIZATION

So far, we estimated the positions of cells belonging to the same cell tower independently from each other, even though they share the same physical location. In this Section, we show that merging wardriving traces of cells that share a common cell tower into a single trace and estimating the position of the cell tower itself can improve localization results significantly.

During merging of traces, if a measurement contained readings from several cells belonging to the same cell tower we chose the readingwith the strongest RSS. The effect ofmerging traces of cells belonging to the same cell tower into a single trace is illustrated in Figure 4. The figure shows that merging traces removes the measurement bias due to the directionality of a cell and results in a homogeneous coverage of the area around the cell tower. However, the improvements in the localization accuracy for the Strongest RSS and Weighted Centroid algorithms come from different sources. Strongest RSS performs better because in a merged trace the locations of all cells belonging to the same cell tower are estimated at the position where any one of the cells was heard the strongest. Weighted Centroid, on the other hand, performs better because the homogenous coverage of the area around the cell tower allows it to calculate the cell location by utilizing positions of measurements in all directions around the cell tower.

Figure 5 shows the median error of Strongest RSS and Weighted Centroid in the Residential area before and after applying the cell combining optimization on top of the bounding technique while varying the number of strongest K RSS readings used by the algorithms. We make three observations. First, before the cell combining, Strongest RSS always performs better than Weighted Centroid. This is caused by the non-homogeneity of measurements around the cell, which has a big adverse effect on Weighted Centroid. Second, the localization results after the cell combining significantly outperform those before the cell combining. Weighted Centroid benefits more from the integrated signal measurement map and achieves a very low medium error around 55m when K = 60. Third, there is a tradeoff between RSS quality and RSS quantity. Figure 5 (b) shows that the localization error of Weighted Centroid presents a decreasing trend first and then increases as K increases. The lowest medium error of 55m is achieved when K = 60.

Finally, we present the localization accuracy improvement after applying the cell combining optimization in Table 1. In all three areas, we observed over 30% accuracy improvement in the median error and above 33% improvement in the 90th percentile error, indicating that the cell combining optimization can improve the cell localization accuracy significantly.

CONCLUSION

Accurately estimating locations of cell towers is important for many existing mobile phone localization algorithms. In this work, we conducted the first large scale study of the accuracy of the popular Strongest RSS and Weighted Centroid algorithms based on a large wardriving trace that covers downtown, residential and rural areas around greater Los Angeles. We showed that naively applying these algorithms results in very large localization errors. We analyzed the causes for these errors and concluded that one can hope to localize a cell accurately only if it falls within the area covered by the wardriving trace. We further proposed a bounding technique to select the cells that fall within the area covered by the wardriving trace and studied its performance. Finally, we identified a cell combining optimization that can further reduce localization errors by half.

## Acknowledgements

This research was supported in part by NSF grant CNS-0954020.

## Reference

1. http://www.antennasearch.com/.

2. Mobiledia. http://www.cellreception.com/.

3. Opencellid. www.opencellid.org.

4. Skyhook Wireless. http://www.skyhookwireless.com/.

5.M.Y. Chen et al. Practical metropolitan-scale positioning for gsm phones. In Ubicomp,     pages 225–242, 2006.

6.Y.-C. Cheng, Y. Chawathe, A. LaMarca, and J. Krumm. Accuracy characterization for      metropolitan-scale Wi-Fi localization. In ACM Mobisys, pages 233–245, 2005.

7. J. Hightower and G. Borriello. Particle filters for location estimation in ubiquitous            computing computing: A case study”. In Ubicomp, pages 88–106, 2004.

8. M. Kimetal. Risks of using AP locations discovered through war driving. In Pervasive,        pages 67–82, 2006.

9. A. LaMarca et al. Place lab: Device positioning using radio beacons in the wild. In             Pervasive, pages 116–133, 2005.

10. K. Langendoen and N. Reijers. Distributed localization in wireless sensor networks:          a quantitative comparison. Comput. Networks, 43(4):499–518, 2003.

11. D. Madigan, E. Einahrawy, R. Martin, W.-H. Ju, P. Krishnan, and A. Krishnakumar.            Bayesian indoor positioning systems. In IEEE INFOCOM, pages 324–331, 2005.

12. A. Varshavsky, D. Pankratov, J. Krumm, and E. Lara. Calibree: Calibration-free                  localization using relative distance estimations. In Pervasive, pages 146–161, 2008.

To View:

## Forum For Accuracy Characterization of Cell Tower Localization

Forum For Accuracy Characterization of Cell Tower Localization

Forum not available

## Spot Localization using PHY Layer Information

ABSTRACT

This paper explores the viability of precise indoor localization using physical layer information in WiFi systems. We find evidence that channel responses from multiple OFDM subcarriers can be a promising location signature. While these signatures certainly vary over time and environmental mobility, we notice that their core structure preserves certain properties that are amenable to localization. We attempt to harness these opportunities through a functional system called PinLoc, implemented on off-the-shelf Intel 5300 cards. We evaluate the system in a busy engineering building, a crowded student center, a cafeteria, and at the Duke University museum, and demonstrate localization accuracies in the granularity of 1m x 1m boxes, called “spots”. Results from 100 spots show that PinLoc is able to localize users to the correct spot with 89% mean accuracy, while incurring less than 6% false positives. We believe this is an important step forward, compared to the best indoor localization schemes of today, such as Horus.

Categories and Subject Descriptors

C.2.1 [Network Architecture and Design]: Wireless communication

General Terms

Design, Experimentation, Performance

Keywords

Wireless, Localization, Cross-Layer, Application

1. INTRODUCTION

Precise indoor localization has been a long standing problem. While the frontier of localization technology has advanced  over time, new kinds of location based applications are raising the bar. For instance, the advertising industry is beginning to expect location accuracies at the granularity of an aisle in a grocery shop [1]. Museums are expecting user locations at the granularity of paintings [2], so tourists can automatically receive information about the paintings they stop at. In addition to such high accuracy demands, these applications are inherently intolerant to small errors. If a localization scheme incorrectly places a user in the adjacent aisle in the grocery store, or displays information about the adjacent painting, the purpose of localization is entirely defeated. This is unlike traditional applications – say GPS based driving directions where small errors are tolerable. As a consequence, new localization schemes will need to meet strict quality standards, without incurring a heavy cost of installation and maintenance. We refer to this problem as spot localization, where a device in a specific 1m x 1m box needs to be accurately identified. Localizing the device outside the box will be useless, irrespective of whether the estimated location is close or far away from the box.

The state of the art in indoor localization is quite sophisticated. Different schemes optimize distinct objectives, including accuracy [3–5], computation [4, 6], ease of calibration [7, 8], energy [9], etc. While the literature is rich, we sample few of the representative schemes to outline the frontier of today’s location technology. Cricket [10] achieves high accuracy using special (ultrasound-based) infrastructure installed on ceilings. Noting the difficulties of installing special hardware, RADAR, Place Labs and Horus [4, 6, 8] explored the feasibility of using signal strengths from existing WiFi APs. While RADAR and Horus both rely on signal calibration, EZ [7] recently demonstrated the ability to eliminate calibration at the expense of accuracy degradation. Summarizing all these schemes, we find that the state of the art achieves median location error of 4m and 7m, with and without calibration, respectively [7]. While this accuracy can enable a variety of applications, there are others that need precision at the granularity of “1m x 1m”. This paper targets such high accuracies while ensuring that the calibration complexity is no worse than RADAR or Horus. We call our proposal PinLoc, as an acronym for Precise indoor Localization.

PinLoc’s main idea is simple. While most WiFi based localization schemes operate with signal strength based information  at the MAC layer, we recognize the possibility of leveraging detailed physical (PHY) layer information. Briefly, the intuition is that the multipath signal components arrive at a given location with distinct values of phase and magnitude. When aggregated over multiple OFDM sub-carriers in 802.11 a/g/n, these rich data poses as a fingerprint of that location. Since we define each spot as a cluster of locations, war-driving each spot produces an array of location fingerprints. A training algorithm runs on each array of fingerprints to learn the statistical attributes of that spot. Later, when a mobile device arrives at a spot, it computes a fingerprint (from a sequence of overheard beacons), and classifies it to one of the spots by matching against the learnt attributes. We find that devices are reliably classified to the correct spot, despite movements of people and other objects in the environment. Our war-driving effort is comparable to RADAR or Horus we mounted a laptop on a Room ba robot and programmed it to move randomly within each spot for around 4 minutes. Finally, where several other schemes are strongly reliant on multiple APs, PinLoc offers reasonable performance even in WiFi-sparse environments. In some cases, PinLoc is able to localize even with signals from a single AP.

At first glance, our findings seemed too good to be true. We expected the signal phases to be sensitive to the orientation of the laptop, human movements, and/or structural changes in the environment (such as repositioning of chairs, boxes, shelves). We suspected that frequent war-driving would be necessary to adapt to such environmental perturbations. While these concerns were natural, we were surprised to find that the fingerprints actually preserved statistical properties even under perturbations. For instance, although the channel response at a specific location varied with time and environmental dynamism, they could be consistently organized around a set of few tight clusters. When combined across 30 subcarriers and different APs (i.e., high-dimensional data), we found that even the sets of clusters could be reasonably unique. Further, since spots are composed of many “distinct locations”, the fingerprint of a spot is a string of channel responses from multiple distinct locations inside that spot. Thus, even if the channel response from one distinct location is not unique, the probability that the string of channel responses appears in more than one spot is far lower. These and other factors (discussed later) together contribute to PinLoc’s robustness. RSSI, on the other hand, is an average of the magnitudes on each sub-carrier, which hides fine-grained information about that location, ultimately limiting the accuracy of localization.

Harnessing the above opportunities into a working system (using off-the-shelf wireless cards) forms the core of PinLoc. The detailed PHY layer information is first extracted from the driver and sanitized using a phase correction operation. The sanitized parameters are then fed to a machine learning algorithm that models the channel response distribution. Later, during system tests, the channel parameters are extracted from received WiFi beacons, and classified to one of the wardriven spots. To address energy issues, PinLoc disables active scanning, and only uses beacons from APs in the same channel. Finally, the individual modules are combined into a full system, and tested over a variety of scenarios. The results are promising – with less than 4 minutes of wardriving perspot, we observe 89% mean accuracy and false positives consistently below 6%. From the application’s perspective, Pin Loc was tested in the modern art wing of Duke University’s museum. Spots in front of each of 10 paintings were localized with high accuracy.

To the best of our knowledge, no prior work has demonstrated PHY layer-based WiFi localization on off-the-shelf platforms. Zhang et. al. [11] used signal amplitudes and phases on USRP platforms to demonstrate location distinction. We note that location distinction detects when a node’s location has changed (e.g., for security purposes), but does not need to establish uniqueness for each location. Localization is naturally a far stricter problem, especially when the target is sub-meter accuracies. PinLoc makes an early effort towards this goal – the main contributions may be summarized as follows.

• We target the problem of spot localization where success is defined as the ability to place a device within a 1m x 1m area, called spots. We break away from RSSI based schemes and explore the feasibility of using detailed PHY layer information.
• We utilize the per-subcarrier frequency response as features of a location, and rely on machine learning algorithms to classify a device to one of the trained spots. We use off-the-shelf Intel 5300 cards; the entire system relies on existing WiFi deployments, and requires no special installation.
• We evaluate PinLoc at varying accuracy standards, namely, discriminating between seats in a lab, chairs in a cafeteria, and adjacent paintings in a museum. We observe consistent accuracies under mobile/dynamic environments, outperforming Horus [4], the most accurate RSSI based localization

The subsequent sections expand on each of these contributions, beginning with definition and applications, followed by measurement, design, and evaluation.

2. LOCATIONS, SPOTS, AND APPLICATIONS

The above section loosely used the terms “locations” and “spots” – we clearly define them here. Locations are like pixels that define the resolution of our localization system. Each location is a small area that has a unique fingerprint. As we show later, the “size” of a location is approximately 2cm 2cm. Spots are larger boxes, say 1mx1m, and composed of multiple locations. We will see later how the fingerprint of a spot is essentially a combination of fingerprints from all locations in that spot.

Apps for Spot Localization.

The ability to localize users to the granularity of a spot is obviously useful — for instance, precisely tracking a user’s indoor location can empower numerous applications. What might be less obvious is that a reasonable number of applications may be enabled even if only a few spots are reliably identified. For instance: (1) Advertising agencies may post discounts on to the user’s phone when she pauses in front of select products in the store. (2) Spot localization may be applicable to geofencing – students at different desks may see different sets of exam questions. (3) Logical locations [12] refer to places (Starbucks, Airport, public library) as opposed to geographic coordinates (latitude/longitude from GPS). Since such places are often adjacent to each other, separated only by a wall, it has been difficult to tell the exact place in which a person is located. Spot localization may be able to detect when a user enters/exits through a door, thereby identifying the place of the user. In a related application, a blind person could come to an approximate location and be prompted with “the door in front of you is Dr. Brown’s eye clinic”. This may also aid security applications, the content from a server may be downloadable only when a person is inside a restricted area [13]. (4) Even identifying which aisle a person is in could be achieved if the entry and exit points of an aisle are spot-localized. We argue that PinLoc is particularly amenable to these applications since the war-driving effort gets proportionally reduced with fewer spots. While the same reduction may apply to all war-driving based schemes in general, their error margins ma y still be an issue. Horus [4], for example, incurs 4m mean error, which may not discriminate between adjacent grocery aisles or between two adjacent wide-screen TVs at Best Buy. PinLoc is tasked to achieve this discrimination.

## HYPOTHESES AND MEASUREMENTS

This section aims to show through experiments that PHY layer channel information from existing WiFi deployments can be an indicator of location. We identify the essential hypotheses that must hold, and verify them through measurements. The next section draws on the findings to design and implement the machine learning components in PinLoc. We begin our discussion with a brief background on channel frequency response.

3.1 Background

Most modern digital radios use OFDM communication, and transmit signals across orthogonal subcarriers at different frequencies. Each transmitted symbol X ( ) is modulated on a different subcarrier f, and the quality of the received symbol Y () will depend on the channel H( ):

Y ( ) = H( f) X ( )                               (1)

Vector H = (H( )) =1; ;F is called channel frequency response (CFR), and it is a complex vector that describes the channel performance at each subcarrier. A 802.11 a/g/n receiver implements F = 48 data sub-carriers, and includes a channel estimation circuit as a part of the hardware. The Intel 5300 [14] card, released recently with a publicly downloadable driver, exposes the per-subcarrier CFR to the user. Figure 2(a,c) shows examples of some CFR vectors.

Two important properties of the CFR are of interest in PinLoc. (1) The CFR changes entirely once a transmitter or a receiver moves more than a fraction of a wavelength [15]. Since the WiFi wavelength is about 12 cm, the CFR offers a possibility to discriminate between two nearby locations. (2) Even when the device is static at a specific location, the CFR experiences channel fading due to changes in the environment at different time-scales. This introduces randomness in the CFRs, injecting ambiguity in signatures. However, it is unclear whether this randomness is completely unpredictable or whether it exhibits some statistical structure that lends itself to localization. The following hypotheses and measurements are designed to answer these and related questions.

3.2 Hypotheses

We present 3 main hypotheses that need to hold if CFRs are to be used for PinLoc.

1. The CFRs at each location appear random, but actually  exhibit a statistical structure.      This structure is preserved over time and environmental changes.

2. The “size” of the location (over which the CFR structure is defined and preserved) is          small.

3.The CFR structure of a given location is different from the structures of all other               locations, with high probability. The probability increases when aggregating the               CFRs from multiple APs.

Towards verifying these hypotheses, we first describe our testbed environment and experiment methodology, followed by the measurements and findings.

3.3 Experiment Methodology

Our initial experiments were performed in a relatively busy engineering building (see floorplan in Figure 1). We consider a set of 15 different spots in our lab and the adjacent classroom. The center of these spots are approximately 2m apart from each other. We place a laptop equipped with the Intel 5300 WiFi card [14] at each of these spots, and associate them to existing WiFi APs (the same set of WiFi APs are audible from each of these spots). The laptops are made to download packets through each of the nearby APs (using iPerf) – the corresponding channel frequency responses (CFR) are recorded for each packet. The Intel 5300 card firmware exports a subset of the CFRs (30 out of 48 data subcarriers), and we only use these for our scheme.

Figure 1: Engineering building floorplan. Different sets of spots shown in different colors – our initial measurements in this section uses only one set of 15 spots (shown in green).
.
For each location we perform 4-6 measurements at different times during busy office hours. During the measurements in the lab (a 10m x 10m area), there were between 3 to 5 people who frequently walked in and out. Classroom measurements were performed during and between classes (classroom capacity of 24 seats) – one measurement coincided with all students exiting the classroom at the end of a class.

Figure 2: (a) The amplitudes and phases of the channel responses H of 50 (out of 20000) packets sent over the same link (we see two unique clusters, U1 and U2); (b) PDF of the complex value of the same 20000 channel responses H(f) for a single subcarrier f = 20; (c) The amplitudes and phases of the channel responses H of 50 packets at a different client location.

## Measurement and Verification

(Hypo. 1) The CFRs at each location appear random but actually exhibit a statistical structure over time.

Testing on a Single Location: Figure 2(a) shows the channel frequency responses (CFR) recorded on a laptop at a fixed location (the laptop received 20; 000 packets from a specific AP over a period of 100s, but for visual clarity, we only show 50 CFRs from 50 randomly selected packets). We observe two emerging clusters, denoted with two vectors U1 and U2 – CFRs belonging to the same cluster are not identical but appear as noisy realizations of the cluster mean. This is an outcome of fading, caused by different electro-magnetic propagation effects and/or environmental changes.

We now take subcarrier f = 20, gather all its CFRs from all 20; 000 packets, and plot the empirical probability density function (PDF) in Figure 2(b). The CFRs are complex numbers, and hence we plot the Real (Re) and Imaginary (Im) values on X and Y axes – darker colors represent higher values of the PDF. We again see that two dominant clusters emerge,each cluster appearing as a complex Gaussian random variable, with means U1(f) and U2(f) and variances V1(f) and V2(f), respectively. Of course, this is only a visual indication – we will carefully model this later in section 4.2.

Figure 2(c) shows the outcome of the same experiment, but with the laptop placed at a different location. We find only a single cluster of CFRs, and the shapes of the CFRs differ distinctly from those in Figure 2(a). These few representative clusters hint at the possible existence of complex but invariant structures in per-location CFR, motivating further investigation. Figure 3 pictorially explains the cluster computation process.

Temporal stability of CFR Clusters: We now test whether the observations from these two locations generalize to a larger number of locations, under various environmental changes. Figure 4 shows the distribution of the number of representative CFR clusters from 30 distinct links in total. For each of the 30 links we aggregate all the available data, collected over 2 days – this adequately captures the links’ temporal fluc-

Figure 3: The process of creating clusters (which together form the location fingerprint), and how test CFRs are cross-correlated with the fingerprint.

tuations. We use the clustering algorithm, described later in section 4.3, to identify representative CFR clusters. Evident from Figure 4 (a), more than 80% of links experience 4 CFR clusters or less. However, we still see a substantial number of links with a large number of clusters, even up to 19. This could well suggest that the CFR structure is quite random in dynamic scenarios (e.g., in the classroom), and thus, PinLoc may only be applicable in very static environments.

To verify this, we next look at frequency of occurrence of different clusters. Figure 4(b) shows that the distribution is highly non-uniform, with a strong predominance of the more frequent clusters (i.e., the frequent clusters occur very frequently, and the vice versa). Evidently, the fourth-most frequent cluster occurs no more than 10% of the cases in any

Figure 4: (a) CDF of the number of CFR clusters observed at 30 different client locations (i.e., 30 distinct links); (b) CDF of the probability of seeing the n-th most frequent CFR cluster.

spot, and the 5th, 6th, ... 19th clusters are almost rare. This suggests that even if a few spots experience large number of clusters, we are not very likely to see most of them, both during the training and localization.

Figure 5: CFR cross correlation in presence of human beings at a location for 2 different APs at 2.4GHz

Impact of environmental changes: To understand the impact of environment changes on CFR clusters, we perform two controlled experiments. First, we study the effect of human mobility on CFR stability. We place a laptop at a fixed location and start gathering CFRs from two different APs. We run the experiment during night, and observe a single CFR cluster for both links. Then, we position a human at an increasing distance d from the laptop. We plot the CDF of cross-correlation of each received CFR with the CFR cluster observed without the human (Figure 3). We define cross-correlation1 of two CFR cluster means a and b as

Figure 5(a) shows high correlation, suggesting that human obstructions may not create a significant change to the CFRs from AP6230. This is probably because the human does not alter any of the strong signal paths between the laptop and the AP. The link to AP 5A10, however, changes with human movement; nevertheless, the change is substantial only when the human is very close to the laptop (1 foot away). Even then the median cross-correlation is larger than 0:75. In all other cases, the cross-correlation remains high, suggesting that human movements around the device may not distort the CFRs too much from its core structure.

Figure 6: (a) CFR cross correlation for (a) door open vs. closed and (b) original metal shelf vs. moved; for various distances from the laptop

## Measurement and Verification - Part II

Next, we study the effects of moving objects in the environment – doors, chairs, metal shelves, etc. We place a laptop at a fixed location and gather CFRs from different APs. Figure 6(a) shows that wooden door obstructions do not induce a significant change to the CFRs. Our measurements also 1We use cross-correlation as a metric here for illustrational purposes, and we design a more accurate metric in section 4.

show similar results for other non-metallic objects. However, a repositioned metallic cupboard (approximately 3:5 feet high and 3 feet wide) altered the CFRs significantly – Figure 6(b) shows the impact. Importantly, however, these alterations are localized only around the shelf’s original and final locations;spots more than 4 meters away from the shelf are much less perturbed, and need not be re-calibrated. While the above results are from controlled experiments, section 5 reports results from uncontrolled settings (student center, cafe, museum), with hundreds of mobile humans and shifting objects.

(Hypo. 2) The “size” of the location (over which the CFR structure is defined and stable) is small.

Precise localization will require the CFR structure to vary over space. If the structure varies in large granularities (say, multiple meters), PinLoc’s accuracy will naturally be bounded by that granularity.

Thus to understand the “size” of each location, we move a test location increasingly further away from a reference location, and compute their respective CFR cross-correlations. Various existing channel measurements [11,15] show that the channel changes entirely once a receiver is moved more than a wavelength, which is 12cm in the case of WiFi. Figure 8 shows that the cross-correlation drifts apart with increasing distance, and is quite low even above 2cm. While this may suggest that localization is feasible at 2cm resolution, we will see later that multiple locations in a room may exhibit matching fingerprints. This is why we will require PinLoc to collect multiple fingerprints from very nearby locations (within a 1m 1m spot). The combined fingerprint from a specific “spot” is much less likely to occur in other spots, and hence, we will attain reliable localization at the granularity of spots.

(Hypo. 3) The CFR structure of a given location is different from the structures of all other locations.

To evaluate the (dis)similarities of CFRs among different locations, we divide the measured data into a training and a test set. Each location has a set of CFR clusters pertaining to an AP, represented by their mean and variance and learnt from the training set. For a test CFR from a location L, we use correlation to find the best matching CFR cluster from L’s training data. The correlation value, denoted as Sown, indicates similarity of the test CFR with the trained fingerprint at

Figure 8: Cross-correlation drifts away for CFRs that are apart by 2cms or more.

the same location. Now, for all fingerprints of all other locations, we find the one that exhibits maximum similarity of this test CFR – we denote this similarity as Sothers. If a device’s measured CFR is more similar to a different location than its own, we will naturally misclassify the device’s location.

Note that Sown and Sothers are computed per-AP.  One may ask: Figure 7 shows that a packet may be classified to one out of 8 different spots. In reality, the system will need to discriminate between many more spots – will the system scale to such scenarios? We note that PinLoc does not need to discriminate between all spots in a large area. Prior work has used WiFi SSIDs alone to localize devices to around 10m x 10m regions in indoor environments [8]. PinLoc will leverage such schemes to first compute a coarse-grained macrolocation, and then discriminate only between the spots inside that macro-location. Having verified these hypotheses, we now design the full localization scheme, including CFR clustering and matching over multiple sub-carriers. Thereafter, we evaluate PinLoc’s performance in section 5.

## DESIGN AND IMPLEMENTATION

Figure 9 shows the architectural overview of PinLoc. During war-driving, a Roomba-mounted laptop moves randomly through a spot for around 4 minutes. Recall from sections 1 and 2, that a spot is composed of many “locations” (Figure 10), hence, the laptop measures the CFR of every location it hits. Of course, due to the Roomba’s random mobility, the laptop may not be able to collect CFRs from all locations in a spot, making our war-driving process far less painstaking. The wardriven data is then sanitized through a phase correction operation and fed to a clustering algorithm, which outputs a few dominating CFR clusters per-location. As mentioned earlier, these clusters are expressed as cluster means and variances, and together form the training set. The training set from different locations in the same spot are stored under the corresponding spot database.

During the real-time localization phase, each mobile node passively records strings of CFRs it receives from AP beacons (string length of 4 denoted with shaded squares in Figure 10). The mobile either sends these CFRs to a PinLoc server, or requests the spot databases for candidate spots in its (known) macro-location. The next step is matching. A single CFR reading will not match exactly the ones from the spot-databases due to random fluctuations, but on average, they are more likely to fall in the correct cluster than a wrong one. To improve the accuracy, instead of matching a single reading, we match the string of consecutive CFR readings. Each of these 4 CFRs may match well with a location from a random incorrect spot, but it is unlikely that all the CFRs from a string will match better with locations from the same wrong spot.

Results from the next section will confirm this robustness of spot localization. However, before presenting the results, we discuss the sanitizing, clustering, and matching modules in details.

4.1 Data Sanitization (Phase and Time Lag)

This is not a problem for a conventional OFDM receiver that only needs to remove, but not learn the channel response.

4.2 Modeling the channel response

We see from a sample measurement, presented in section 3, that the channel responses look like noisy replicas of a few representative clusters. However, it may not be obvious how to identify clusters and the main challenge of the classification algorithm is how to deal with the measurement noise. We make a reasonable assumption to model the noise (also called fast-fading) as a complex Gaussian noise, which corresponds to Rayleigh fading [15]. We first verify this assumption visually by looking at the samples across subcarriers, such as the one illustrated in Figure 2 (b). We then take a few samples from the measurements and verify using QQ plots that the distribution fits well to a complex Gaussian and that the real and the imaginary parts are i.i.d. We also assume that the noise is independent across subcarriers.

which we shall use as a distance metric in the classification algorithm. Note that d is indeed a distance metric, but it also has a probabilistic interpretation from (3), which we will use later while combining multiple readings to improve classification accuracy.

4.3 Clustering algorithm

Recall that we model the data at each location as a Gaussian mixture distribution, with K clusters with means and variances (Uk; Vk). We denote with wk the probability that an observed packet belongs to a particular cluster k, which  corresponds to how often we “see” cluster k in our training data. According to the Gaussian mixture model, this probability is independent across packets. Thus, the parameters of our model are (wk;Uk; Vk); k = 1; :::;K. The classical approach to estimate these parameters is the expectation-maximization algorithm [16]. Instead, we estimate the parameters using variational Bayesian inference [16]. Variational inference is provided by the Infer.NET [17] framework that we use to implement the clustering algorithm. It is particularly convenient here because it tends to prune unneeded clusters. Instead of estimating the number of clusters K by running the algorithm multiple times with different values of K, we perform one run with K = 10 and drop the clusters with small weights wk. Some locations may actually have more than 10 clusters but this is rare and discarding these extra clusters has little impact on performance.

We note that another potential clustering algorithm that could be used here is k-means clustering algorithm. However, our approach takes into account that the noise has a Gaussian distribution and hence can perform a more accurate clustering. For further discussion on the drawbacks of the k-means clustering, please see [16, Chapter 9].

4.4 Classification algorithm

Our classification algorithm is composed of two parts. First, PinLoc computes a macro-location based on WiFi SSIDs alone [8], and shortlists the spots within this macro-location; we call these spots the candidate set, C. The second task is to pick one spot from C, or to declare that the device is not in any of these spots. To this end, the WiFi device overhears beacons from the APs as it roams around (we discuss the energy implications in section 5.5). Let A be the set of all APs and let AP(P) denote the AP which transmitted packet P. We define the distance between a given packet P and a spot Si as

where Zi is a set of representative CFR clusters learned from spot Si. Then, for all values of i, we compute the minimum of d(P; Si) – this outputs the most likely spot that the user is located at, based on the CFRs from packet P. The operation repeats for every packet received within a short time window (typically 30 packets from 3 APs), and the spot that is picked most often (highest vote) is identified. PinLoc does not immediately declare the highest voted spot as the user’s location. If the highest vote count is small, it suggests low confidence and the possibility that the user is not located at any of the trained spots4. Thus, PinLoc ensures that the number of votes is above a rejection threshold before announcing that spot as the user’s location. If the number of votes is below the threshold, then PinLoc announces the location as “not-a-spot”. The rejection threshold can be selected from the training data and a application-specified false positive rate. We use 15% of the number of possible votes as the threshold in our evaluation. E.g., if the maximum number of votes are 30, PinLoc announces a location as “not-a-spot”, if the highest matching spot obtain less than 5 votes.

4We note that, in order to deal with the outliers, we use the majority voting scheme with the distance function (5) instead of the direct probabilistic interpretation (4).

## EVALUATION

We evaluate PinLoc across 100 different spots using war driven training data and several test samples. Our wardriving approach is explained next.

5.1 War-Driving

The channel model and clustering algorithm from section 4.2 can be applied only to data from the same location. As we have seen in section 3, the size of a location that has the same representative CFR clusters is about 2cm x 2cm. It is important to get enough data from the same location to be able to accurately learn the channel responses. Also, it is important to collect responses from many (2cm x 2cm) locations within the 1m x1m spot.

Our current war-driving procedure is in 2D. We transmit a packet from an AP every 1ms. A Roomba robot, mounted with a laptop, moves at a programmed speed of 30cm/s. Figure 11 shows an example of war-driving at the museum. We receive about 60 packets during a 2cm stretch. We divide all packets received in one spot into batches of 60 packets and run clustering algorithm on each batch.

Figure 11: PinLoc war-driving at different spots in the museum. The Roomba robot mounted with a laptop, and 4 virtual wall devices at the corners of the spot.

There are three important questions that arise about wardriving. The first question is do we need to record every single representative CFR cluster during war-driving? As we show later in section 5, PinLoc’s performance is not too sensitive to war-driving accuracy – the accuracy degrades gracefully as the Roomba is made to war-drive for shorter durations.

The second question is do we need to war-drive in a particular fashion? During the localization we match each sample against all learned representative CFR clusters to find the best fit, thus it is not important in which order we store the clusters in the learning database. One can swipe through a spot in any direction, multiple times and in multiple rounds. Moreover, war-driving does not have to be done in any particular channel conditions (e.g. during busy or off-peak hours). In fact, the accuracy increases if we accumulate channel samples from diverse channel conditions.

The third question is why not define a spot as a single location (2cm x  2cm)? Recall from Figure 10 that a string of CFRs improves the probability of spot localization. This is because a single test CFR may match with a different random spot; but it is unlikely that a string of test CFRs will all match better with the same wrong spot. In leveraging this diversity gain, we find that strings of length 5 are reasonable. Given that AP beacons are spaced 100ms apart in WiFi b/g/n, and human indoor walking speed is 1m=sec, getting 10 beacons implies that the person moves almost 1m. Hence, we fix our spotsize to 1m 1m. Of course, even if a person is not walking, we assume that there will be inherent swaying motions of the body in the granularity of few cms – such motions will be sufficient for PinLoc.

5.2 Experiment Design

We evaluate PinLoc in 4 environments: (1) Hudson Engineering building with faculty offices and classrooms, (2) busier Bryan Student Center, (3) Twinnies cafe, and (4) Duke University Nasher Museum. Both training and test samples are taken from the laptop, mounted on the Roomba robot. In all the experiments, the laptop associates to all APs on the same frequency channel, and receives beacon packets from them for a duration of 1s. This 1s duration ensures that a mobile user (walking at 1m/s) will remain inside the same spot while receiving all the beacons. For the engineering building scenario, we take test samples during daytime with more than 100 people around. The cafeteria experiments are performed during busy lunch hours, with more than 50 people present at any time, along with a high churn. For the museum measurements PinLoc was trained for 10 large paintings in one wing of the museum, and tested for these spots. The measurements were obtained on a slightly less busy day to minimize interference with real visitors. Both training and test samples are taken at the same height (we discuss the ramifications of height in section 6)

Metrics: Our goal is to show that we can accurately localize test samples to the corresponding spots, but also to detect when a sample does not belong to any spot. We use the following two metrics to evaluate PinLoc. (1) Accuracy – the fraction of cases in which the user was localized to the correct spot. (2) False positives (FP) – the fraction of cases in which the users were localized to an incorrect spot/non-spot. In other words, false positives also take into account the cases where PinLoc localizes the user to a trained spot, even though the user was not located at any of these spots.

Comparison with RSSI: We also evaluate whether we can use RSSI to achieve a similar accuracy with the same number of APs. For this, we compare PinLoc measurements to simplify war-driving. In order to provide a fair comparison, we modify Horus to use the same war-driven training set we use in PinLoc. We define the similarity analogously to (5), replacing correlation with a difference between two RSSI. This is consistent with equation (3) in Horus [4], where the localization metric is a joint probability of seeing different access points. We compare the modified Horus algorithm with PinLoc by using the same test data across the same  number of APs.

5.3 PinLoc accuracy and false positives

Figure 12 reports results from the engineering building experiments. PinLoc achieves nearly 90% mean accuracy across 50 spots (Figure 12(a)), consistently outperforming Horus. The false positives (FP) are also maintained to less than 7%, compared to more than 25% in Horus (Figure 12(b)). RSSI based algorithm is significantly worse than PinLoc, since it is represented with a single real number. CFR is represented with 30 complex numbers and contains much richer information. In this comparison, we were tuned to a single channel, and observed 1 to 4 APs on that channel. Of course, PinLoc’s, as well as Horus’s, performance will improve if more APs are available. But this may require scanning across channels which may increase energy consumption.

Figure 12(c) zooms into the performance of PinLoc and shows the accuracy/FP on a per-WiFi macro location5. The number of spots per WiFi region is shown on top of the bars. Student center has a more dynamic environment with cafes and shops – we evaluate PinLoc at 34 spots across 3 WiFi macro locations. Figure 13(a) shows that even in such an active environment PinLoc can maintain low false positive(7:3%) and high accuracy(86%). An obvious question is whether such performance will degrade if adjacent spots need to be identified (i.e., center of spots are 1m apart). Figure 13(b) shows that PinLoc is able to discriminate in such settings with a reasonable average accuracy of 82%.

Similar accuracy/FP graphs are plotted for the cafeteria and museum in Figure 14. The mean accuracy for the cafeteria case is 90:07% and the mean FP is 4:5%. For the museum, the mean accuracy is 90:28%, and the mean FP, 4:1%. In all four scenarios, PinLoc achieves high accuracy/low FP for most of the spots, except around 20% where the performance drops. Careful investigation showed that these spots received packets at low SNR from many APs. To probe this further and understand their ramifications, we next perform an analysis across various system parameters.

## Impact of Parameters

Recall that a macro-location is derived from WiFi SSIDs; Pin-Loc discriminates only between candidate spots inside the macro-location.

Figure 13: Pinloc performance in student center (a) Accuracy, false pos., (b) Performance of adjacent spots.

Impact of number of test packets

A user extracts CFRs from beacon packets that are transmitted by APs every 100ms. Thus, assuming up to 1m=s walking speeds, the user dwells for at least 1s inside a spot, thereby receiving 10 beacons per-AP. Figure 15 (a) shows the variation of accuracy and FP with the number of received beacons per- AP. With the typical size of 10 packets per AP, PinLoc achieves mean accuracy of 89% and false positives of 7% across 50 spots in the engineering building; 15 packets raise them to 91% and 5%, respectively. Figure 15 (a) also shows low accuracy (68%) and high FP (14%) if only 1 packet per AP is used. This is because one single reading may randomly match with an incorrect spot. However, even with 5 packets, we find the accuracy above 82%, and FP less than 10%. This indicates

Figure 14: PinLoc performance in cafeteria and museum (a) Accuracy and FP per spot in cafeteria. (b) Accuracy and FP per-spot in the museum.

that even at higher user mobility, or with failure in beacon reception, PinLoc can sustain a reasonably good performance.

Impact of the number of APs

PinLoc’s performance in sparse WiFi environments is of interest. To this end, we analyze the performance for varying number of APs. While collecting the test data, we were tuned to a single channel, and observed 1 to 4 APs on that channel (we did not scan across channels to limit energy consumption). We divide the spots into different categories depending on how many APs are visible within each spot – Figure 15 (b) shows the results. An encouraging observation is that even when only a single AP is visible, PinLoc can perform spot localization with accuracy of over 85% and false positives below 7%. This is in contrast to other WiFi-based localization methods methods that need at least 3 APs to attain reasonable precision. Furthermore, as the number of visible APs increases, the performance improves quickly.

Impact of war-driving

It is important to understand how long we need to war-drive to achieve high localization accuracy. The tradeoff is that short war-driving will record fewer CFRs, incurring the possibility of overlooking an important CFR cluster. To understand the impact, we run PinLoc on different training sets, drawn from different war-driving durations. Figure 15 (c) plots the accuracy and the FP (per-spot) as a function of this duration. Evidently, a few minutes of war-driving per-spot suffices; we observe reasonable performance when using only 1 minute of war-driving data.

Impact of mobility

We turn to the cafeteria scenario to analyze the effect of the mobility on the accuracy of localization. We take one hour of test data for three spots in the cafeteria. We perform localization on each batch of 10 beacon packets and plot its success or failure in Figure 16. For all three spots, we see that the time instants when localization failed are short and uniformly spread over the measurement interval. The mean accuracy was 85% with 7% false positives. Thus, even in a very busy environment such as the cafeteria, we are able to provide localization without prolonged disruption.

Figure 16: Success of PinLoc localization over time for three spots and over an interval of 1 hour.

Impact of old training data

We concede that PinLoc will need a fresh round of war-driving for spots that are affected by significant environmental changes (e.g., metallic shelves). But with “small” structural changes will PinLoc’s war-driven training data remain valid over days and months? To evaluate this we tested 5 spots in our engineering building 7 months after wardriving them. Figure 17 shows a moderate median accuracy of 73% per spot in this scenario. Depending on application requirements, war-driving can be periodically scheduled to improve accuracy. This may not be hard since war-driving can be automatically be done using a Roomba robot.

Figure 17: Accuracy of 5 spots tested 7 months after training.

5.5 Energy consumption

PinLoc is designed with energy efficiency in mind. Contrary to existing schemes that rely on power-hungry active scanning, PinLoc uses only beacons from APs in a single channel. For this, it synchronizes with the beacon-schedules of these APs and periodically wakes up to collect the CFRs. It sends this information to a central server for computing location. The amount of such data is low, approximately 1200 bytes per second for 2 APs and 1m=s speed, and can be easily batched with existing traffic that communicates to the location based service. Consequently, the data upload energy is marginal. Now, for the energy footprint of beacon reception, we performed measurements on Google Nexus One phones, using the Monsoon Power Meter. We found that receiving beacons from 2 APs every 100ms incurs an additional 5:28mW power on average. This may be negligible compared to 1326:72mW on average to stream YouTube video. We omit the details in the interest of space but argue that PinLoc’s low energy overhead makes it a practical proposition.

## LIMITATIONS AND DISCUSSION

• Antenna’s orientation: PinLoc’s war-driving and testing were all performed with a laptop on a Roomba robot, in a 2D plane. The antenna is placed in the laptop’s lid and the lid was closed, hence the antenna was parallel to the testing plane. During wardriving and testing, Roomba robot took random turns, constantly pointing the antenna in different directions. Hence we can conclude our results are robust to the antenna’s orientation on the 2D plane.

Figure 18: Beacon energy consumption from the Monsoon Power Meter tool.

• Height, 3D war-driving, and phone mobility: In reality, users will carry their phones at different heights and 3D war-driving would be necessary. We discuss the effects of 3D wardriving with help of two simple micro-benchmarks

(1) We manually war-drive a 1m1m50cm box for 15 minutes (a person holds a laptop in hand and moves it in random directions across the box), and we collect CFR samples from 4 APs. We then collect test data within the same box by moving the laptop around it and rotating the laptop. The person that holds the laptop also moves around it. We compare the test data with the manually war-driven 3D data and with the previously war-driven 2D data for 9 different locations. We plot the localiz ation accuracy in time in Figure 19(a). The mean accuracy is 84%. We see that the accuracy is similar to the one reported in section 5 hence we can speculate that with suitable extensions, PinLoc might scale real-world scenarios.

(2) To understand the complexity of 3D wardriving, we measure the vertical “size” of a location (the z axis analogue of Figure 8). We see that cross-correlation between CFRs drifts apart at 10cm or more (Figure 19(b)). This suggests that 3D war-driving may be feasible, perhaps with a height-varying tripod on top of a Roomba. We were unable to construct a robot for 3D wardriving. We leave 3D localization to future work, as well as issues that may emerge from phone orientation, or users inserting their phones in pockets and bags.

• Dependency on particular hardware cards: We run the experiments reported in section 5 alternating among 4 laptops (each with an Intel 5300 card). Since Intel 5300 was the only available card that exposes CFR information, we were not able to test PinLoc with cards from different vendors. We speculate that cross-platform calibration would be necessary, similar to existing RSSI based schemes [7, 18].
• Localization and MIMO: We did not explore MIMO capabilities in our current test-bed. A MIMO receiver would provide as many CFR samples as the number of receive antennas, and could be highly valuable for localization. We leave this for future investigation.

Figure 19: (a) Accuracy of a 3D spot over time. (b) Correlation drifts away with height differences of 10cm or more.

7. RELATEDWORK

The topic of indoor localization has seen a variety of approaches that may be broadly classified as active and passive, and subclassified into RF and sensing based techniques. We sample some of the key ideas, and discriminate PinLoc from them.

RF signal strength-based localization has been the dominant theme of localization for more than a decade. RADAR [8] performs detailed site surveys a priori and then utilizes the SSIDs and received signal strength reported by wireless devices to generate location fingerprints. Horus [4] and LEASE [19] propose enhancements to RADAR by exploiting the structure of RSSI. Place Lab [6] and the Active Campus project [20] attempt to reduce the overhead of calibration – they show that collecting signal information from WiFi and GSM base stations through war-driving is adequate for reasonably accurate localization. Finally, Patwari and Kasera [11] have also recently explored the use of signal characteristics of a wireless channel to achieve location distinction – that is, they reliably identify when a device has moved from one location to another.

Time-based techniques utilize time delays in signal propagation to estimate distances between wireless transmit-receiver pairs. Examples include GPS [21], PinPoint [22], and work by Werb et. al. [23]. The TPS system uses difference in time of arrival of multiple RF signals from transmitters at known location [24]. Similarly, the PAL system [25,26] uses time difference of arrival between UWB signals at multiple receivers to determine location. The Cricket system [10, 27] and AHLoS [28] utilize propagation delays between ultrasound and RF signals to estimate location of wireless devices. Such a solution requires installation of ultrasound detectors on wireless devices, limiting their applicability.

Angle-of-arrival based techniques utilize multiple antennas to estimate the angle at which signals are received, and then employ geometric or signal phase relationships to calculate bearings to other devices with respect to the device’s own axis [29–31]. Besides positioning, such methods can also provide orientation capabilities. However, these techniques require extremely sophisticated antenna systems (4 to 8 antennas) and non-trivial signal processing capabilities, unforeseeable on mobile devices in the near future. PinLoc’s reliance onWiFi alone, along with the ability to utilize PHY layer information from off-the-shelf interfaces [32], makes it a potential candidate for immediate deployment.

8. CONCLUSIONS

This paper shows that PHY layer information, exported by off-the-shelf Intel 5300 cards, offer new opportunities to localize WiFi devices in indoor environments. We leverage the observation that multipath signals exhibit stable patterns in the manner in which they combine at a given location, and these patterns can lend themselves to meter-scale localization. Evaluation results from the engineering building, cafeteria, student center, and the university museum, demonstrate a mean accuracy of 89% for 100 spots. From the application’s perspective, PinLoc could enable product advertisements on a shopping aisle, or offer information on each exhibit at a museum. We believe this is a step forward in the area of indoor localization, even though some more work is necessary before it is ready for real-world deployment.

## ACKNOWLEDGMENT

We sincerely thank Dr. Anthony Lamarca for shepherding our paper, as well as the anonymous reviewers for their valuable feedback. This work is supported in part by the NSF CNS- 0747206 grant.

## References

[2] E. Bruns et al. Enabling mobile phones to support large-scale museum guidance. Multimedia, IEEE, 2007.

[3] N. Priyantha, A. Chakraborty, and H. Balakrishnan. The cricket location-support system. In MOBICOM, 2000.

[4] M. Youssef and A. Agrawala. The horus WLAN location determination system. In MobiSys, 2005.

[5] Chen Y. et al. Fm-based indoor localization. In Mobisys. ACM, 2012.

[6] Yu-Chung Cheng, Yatin Chawathe, Anthony LaMarca, and John Krumm. Accuracy characterization for metropolitan-scale wi-fi localization. In MobiSys, 2005.

[7] K. Chintalapudi, A. Iyer, and V. Padmanabhan. Indoor localization without the pain. In MOBICOM, 2010.

[8] V. Bahl et al. RADAR: An in-building rf-based user location and tracking system. In INFOCOM, 2000.

[9] I. Constandache, S. Gaonkar, M. Sayler, R. Roy Choudhury, and L. Cox. Enloc: Energy-efficient localization for mobile phones. In IEEE Infocom Mini Conference, 2009.

[10] N. B. Priyantha. The cricket indoor location system. PhD thesis, MIT, 2005.

[11] Junxing Zhang et al. Advancing wireless link signatures for location distinction. In MobiCom. ACM, 2008.

[12] M. Azizyan, I. Constandache, and R. R. Choudhury. SurroundSense: Mobile phone localization via ambience fingerprinting. In MOBICOM, 2009.

[13] A. Sheth, S. Seshan, and D. Wetherall. Geo-fencing: Confining Wi-Fi coverage to physical boundaries. Pervasive Computing, 2009.

[14] Intel Research. Intel 5300 mimo channel measurement tool.

http://ils.intel-research.net/ 80211n-channel-measurement-tool.

[15] D. Tse and P. Viswanath. Fundamentals of Wireless Communication. Cambridge University Press, 2005

[16] C. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[17] T. Minka, J.M. Winn, J.P. Guiver, and D.A. Knowles. Infer.NET 2.4, 2010. Microsoft Research Cambridge.
http://research.microsoft.com/infernet.

[18] A. Haeberlen et al. Practical robust localization over large-scale 802.11 wireless networks. In MOBICOM. ACM, 2004.

[19] P. Krishnan and others. A system for LEASE: Location estimation assisted by stationary emitters for indoor rf wireless networks. In IEEE Infocom, March 2004.

[20] William G. Griswold et al. Activecampus: Experiments in community-oriented ubiquitous computing Computer, 2004.

[21] The global positioning systems (wikipedia entry).
http://en.wikipedia.org/wiki/Global_Positioning_System.

[22] M. Youssef et al. Pinpoint: An asynchronous time-based location determination system. In ACM Mobisys, June 2006.

[23] J. Werb and C. Lanzl. Designing a positioning system for finding things and people indoors. IEEE Spectrum, 35(9), September 1998.

[24] X. Cheng et al. TPS: A time-based positioning scheme for outdoor sensor networks. In IEEE Infocom, March 2004.

[25] S.J. Fontana, R.J.and Gunderson. Ultra-wideband precision asset location system. In IEEE Conference on Ultra Wideband Systems and Technologies, May 2002.

[26] R.J. Fontana, E. Richley, and J. Barney. Commercialization of an ultra wideband precision asset location system. In IEEE Conference on Ultra Wideband Systems and Technologies, November 2003.

[27] A. Smith et al. Tracking moving devices with the cricket location system. In ACM Mobisys, June 2004.

[28] A. Savvides and M. Han, C.C. Srivastava. Dynamic fine-grained localization in ad-hoc networks of sensors. In ACM MobiCom, 2001.

[29] D. Niculescu and B. Nath. Ad hoc positioning system  (APS) using AoA. In IEEE Infocom, 2003.

[30] D. Niculescu and B. Nath. VOR base stations for indoor 802.11 positioning. In ACM MobiCom, September 2004.

[31] J. Xiong and K. Jamieson. SecureAngle: improving wireless security using angle-of-arrival information. In ACM HotNets, 2010.

[32] S. Sen et al. Precise Indoor Localization using PHY Layer Information. In ACM HotNets, 2011.

To View:

## Forum for Spot Localization using PHY Layer Information

Forum for Spot Localization using PHY Layer Information

Forum not available

## Quiz

Quiz

Quiz not avilable

## Assignment

1.Let us consider that a ship is trying to find its location via GPS. For its purpose knowing only the x and y coordinates would suffice. It has three satellites S1, S2 and S3 to help it. Let the positions of the three satellites S1, S2 and S3 be (0,0),(9,15) and (21,0) respectively. The distance of the device from S1, S2 and S3 are respectively 13 units, 5 units and 20 units respectively. Find the position of the ship. You can ignore the clock bias of the satellite and the GPS devise.

2. Consider an OFDM carrier consisting of 5 subcarriers (f1,...,f5). The phases (in degree) of the channel responses of the subcarriers are respectively 60, 80, 85, 90 and 100. The CFRs received cannot be directly used for calibration as any unknown phase  and lag  can distort the CFR. The phase of the channel response of subcarrier

where z is the noise and is the genuine channel response. Assuming zf is very small, how can the responses be sanitized and what should be the value of the post-processed phase.

Assignment not available

## Solutions

Page view resticted for students