PLoS One. 2016 Feb 22;11(2):e0149105.
doi: 10.1371/journal.pone.0149105. eCollection 2016.

Inferring Stop-Locations from WiFi

David Kofoed Wind et al. PLoS One. 2016.

Abstract

Human mobility patterns are inherently complex. In understanding these patterns, converting raw data into a series of stop-locations and transitions is an important first step: it greatly reduces the volume of data and thereby simplifies subsequent analyses. Previous research into the mobility of individuals has focused either on inferring stop-locations (places of stationarity) from GPS or CDR data, or on detecting mobility state (static/active). In this paper we bridge the gap between the two approaches: we introduce methods for detecting both mobility state and stop-locations. In addition, our methods are based exclusively on WiFi data. We study two months of WiFi data collected every two minutes by a smartphone, and infer stop-locations in the form of labelled time-intervals. For this purpose, we investigate two algorithms, both of which scale to large datasets: one that greedily selects the most important routers, and one that uses a density-based clustering algorithm to detect router fingerprints. We validate our results using participants' GPS data as well as ground truth data collected during a two-month period.
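The abstract does not spell out the greedy procedure, so the following is only a minimal sketch of one plausible reading: repeatedly pick the router seen in the most still-unlabelled scans and label those scans with it. The function name `greedy_top_routers`, the input format (one set of router ids per scan) and the tie-breaking rule are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def greedy_top_routers(scans, max_routers=None):
    """Greedily pick routers that cover the most unlabelled scans.

    scans: list of sets of router ids, one set per WiFi scan (assumed format).
    Returns (selected_routers, labels), where labels[i] is the router
    assigned to scan i, or None if no selected router was seen in it.
    """
    labels = [None] * len(scans)
    selected = []
    remaining = set(range(len(scans)))
    while remaining and (max_routers is None or len(selected) < max_routers):
        # Count how many unlabelled scans each router appears in.
        counts = Counter(r for i in remaining for r in scans[i])
        if not counts:
            break  # only empty scans are left
        # Most frequent router first; ties broken by id for determinism.
        best = min(counts, key=lambda r: (-counts[r], r))
        selected.append(best)
        covered = {i for i in remaining if best in scans[i]}
        for i in covered:
            labels[i] = best
        remaining -= covered
    return selected, labels
```

Consecutive scans that share a label can then be collapsed into labelled time-intervals; scans left as `None` would correspond to periods with no selected router in range.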

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. A visualization of a single day of WiFi scans as a matrix.
Each row in the matrix corresponds to an access point and each column to a point in time. A cell in the matrix is filled if the access point was observed at that specific time. Columns which correspond to transitions between stop-locations (labelled according to ground truth) are colored in gray. The rows are ordered by the first time an access point is observed.
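A matrix like the one in Fig 1 can be built roughly as follows. This is a sketch that assumes the scans arrive as a time-ordered list of sets of access point ids; `scan_matrix` is a hypothetical helper name, and ties in first-observation time are broken by id.

```python
import numpy as np


def scan_matrix(scans):
    """Build a boolean access-point-by-time matrix from WiFi scans.

    scans: time-ordered list of sets of access point ids (assumed format).
    Returns (matrix, row_ids); matrix[i, t] is True when access point
    row_ids[i] was observed in scan t. Rows are ordered by the first
    time each access point was observed.
    """
    first_seen = {}
    for t, aps in enumerate(scans):
        for ap in aps:
            first_seen.setdefault(ap, t)
    # Order rows by first observation time, then by id to break ties.
    row_ids = sorted(first_seen, key=lambda ap: (first_seen[ap], ap))
    row_index = {ap: i for i, ap in enumerate(row_ids)}
    matrix = np.zeros((len(row_ids), len(scans)), dtype=bool)
    for t, aps in enumerate(scans):
        for ap in aps:
            matrix[row_index[ap], t] = True
    return matrix, row_ids
```

With this row ordering, each stop-location tends to show up as a block of filled cells, because the access points of a newly visited place are appended below those already seen.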
Fig 2
Fig 2. Visualization of the merge step for density-based clustering.
We merge two routers when the observations of one are a complete subset of the observations of the other, which reduces the number of routers in the data set. Here, merging is illustrated for a single day of data, where it reduces the number of routers from 357 to 29. Note that the first stop-location has been reduced to a single router.
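One way to read this merge step is sketched below, assuming each router is represented by the set of scan indices in which it was observed. The function name `merge_subset_routers` and the choice to keep the router with the larger observation set are illustrative assumptions.

```python
def merge_subset_routers(observations):
    """Merge routers whose observation set is contained in another's.

    observations: dict mapping router id -> set of scan indices in which
    the router was observed (assumed format).
    Returns a dict mapping each kept router to the list of routers merged
    into it (including itself).
    """
    # Visit routers from most to least observed, so that a router is
    # always compared against candidates at least as large as itself.
    order = sorted(observations, key=lambda r: (-len(observations[r]), r))
    merged = {}
    for router in order:
        times = observations[router]
        # First kept router whose observations contain this router's.
        parent = next((k for k in merged if times <= observations[k]), None)
        if parent is None:
            merged[router] = [router]
        else:
            merged[parent].append(router)
    return merged
```

Because the subset relation is transitive, comparing only against already-kept routers is enough: anything contained in a merged router is also contained in the router it was merged into.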
Fig 3
Fig 3. Six examples of the distribution of routers in a cluster.
Each plot corresponds to a single cluster obtained from DBSCAN. In a plot, each bar (a maximum of 100 bars is shown) corresponds to an access point, and its height corresponds to the proportion (0 to 1) of the samples in the cluster in which that access point was present. In most of the clusters, 1–10 routers are present 100% of the time.
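The bar heights described here are simple presence proportions. A minimal sketch, assuming a cluster is given as a list of scans (each a set of access point ids) and using the hypothetical helper name `router_presence`:

```python
from collections import Counter


def router_presence(cluster_scans, top_n=100):
    """Proportion of a cluster's scans in which each access point appears.

    cluster_scans: list of collections of access point ids, one per
    sample in the cluster (assumed format).
    Returns up to top_n (access_point, proportion) pairs, sorted from
    most to least frequently present.
    """
    n = len(cluster_scans)
    if n == 0:
        return []
    # set(scan) guards against an id being listed twice in one scan.
    counts = Counter(ap for scan in cluster_scans for ap in set(scan))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(ap, count / n) for ap, count in ranked[:top_n]]
```

Access points with a proportion of 1.0 are the ones present in every sample of the cluster.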
Fig 4
Fig 4. An example of how the stop-locations inferred by the different methods compare to the ground truth stop-locations.
The bottom timeline (red) shows the stop-locations as reported by the ground truth. The first timeline (blue) is obtained using DBSCAN on WiFi data, the second (yellow) using the greedy router selection, and the third (orange) using GPS data.
Fig 5
Fig 5. During the ground truth stop between times g_start and g_end (labeled S1), the GPS method reports cluster G1, the Top-router method reports cluster T1 and the DBSCAN method reports cluster D2.
We then compare the geographical median of S1 to clusters G1, T1 and D2. For each method, we compute the distance between the geographical median of the GPS samples collected during S1 and the geographical median of the GPS samples collected during the reported cluster (for example G1), excluding the samples collected during S1 itself to avoid overfitting. In the figure, this is depicted by comparing samples from S1 to samples from the non-grayed-out part of G1.
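The comparison can be sketched as below. Two assumptions are made here and are not taken from the paper: the "geographical median" is approximated by the coordinate-wise median of latitude and longitude, and the distance is the haversine great-circle distance. The names `median_position`, `haversine_m` and `stop_error_m`, and the `(timestamp, lat, lon)` sample format, are illustrative.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6371000.0  # mean Earth radius


def median_position(samples):
    """Coordinate-wise median of (timestamp, lat, lon) samples (assumption)."""
    return (median(s[1] for s in samples), median(s[2] for s in samples))


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stop_error_m(samples, g_start, g_end, cluster_intervals):
    """Distance between the median position during the ground truth stop
    and the median position of the reported cluster outside that stop.

    cluster_intervals: list of (start, end) times at which the method
    reported the same cluster. Returns None if either side has no samples.
    """
    def in_stop(t):
        return g_start <= t <= g_end

    inside = [s for s in samples if in_stop(s[0])]
    # Cluster samples, excluding those collected during the stop itself.
    outside = [s for s in samples
               if not in_stop(s[0])
               and any(a <= s[0] <= b for a, b in cluster_intervals)]
    if not inside or not outside:
        return None
    return haversine_m(median_position(inside), median_position(outside))
```

Returning `None` when a cluster has no samples outside the stop makes explicit that such stops cannot be scored this way.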
Fig 6
Fig 6. We only make the comparison of medians for the ground truth stops where all methods report stops with at least 70% overlap.
In this figure, the first example (left) is used for comparison, whereas the second (right) is not, since the GPS method does not report a sufficiently overlapping stop. g_start and g_end refer to the start and end times of the ground truth stop-location.
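The 70% criterion can be expressed as a small filter. The caption does not say what the overlap is measured against; this sketch assumes it is the fraction of the ground truth stop's duration covered by the reported stop. The names `overlap_fraction` and `comparable` are hypothetical.

```python
def overlap_fraction(g_start, g_end, start, end):
    """Fraction of the ground truth interval [g_start, g_end] covered by
    the reported interval [start, end] (assumed definition of overlap)."""
    duration = g_end - g_start
    if duration <= 0:
        raise ValueError("ground truth stop must have positive duration")
    intersection = min(g_end, end) - max(g_start, start)
    return max(0.0, intersection) / duration


def comparable(g_start, g_end, reported, threshold=0.7):
    """True when every method reports a stop overlapping the ground truth
    stop by at least `threshold`.

    reported: dict mapping method name -> (start, end), or None when the
    method reported no stop.
    """
    return all(
        interval is not None
        and overlap_fraction(g_start, g_end, *interval) >= threshold
        for interval in reported.values()
    )
```

For example, a ground truth stop from 0 to 10 with reported stops `(0, 10)`, `(1, 9)` and `(2, 12)` passes, while replacing any one of them with `(6, 10)` (40% overlap) excludes the stop from the comparison.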
Fig 7
Fig 7. The distribution of distances between the true stop median position and the median position reported by the three methods.
The histograms in the right column are log-log versions of the figures in the left column. As seen, most error-distances are less than 100 meters, but a few large errors of around 2000 meters are reported by all methods.
Fig 8
Fig 8. The three approaches produce a different number of points of interest.
Density-based clustering of GPS data (left) produces the lowest number of stop-locations, followed by greedy selection of routers (middle) and DBSCAN (right). All the stops found from GPS data are also found using WiFi data, but the WiFi-based methods identify locations with a higher spatial resolution.
Fig 9
Fig 9. Two examples of stop-locations found using WiFi data which are not geographically stationary.
Each plot shows one stop-location inferred from WiFi data, and each circle shows a single GPS estimate associated with that location. The two stop-locations are most likely based on access points that are present in a train or a bus.

Grants and funding

This work was supported by Villum Foundation, http://villumfoundation.dk/C12576AB0041F11B/0/4F7615B6F43A8EA5C1257AEF003D9930?OpenDocument, Young Investigator programme 2012, High Resolution Networks (SL) and University of Copenhagen, http://dsin.ku.dk/news/ucph_funds/, through the UCPH2016 Social Fabric grant (SL). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.