Abstract:
Location-Based Service (LBS) data, collected from personal mobile devices, have enabled significant advances in understanding human mobility patterns over the past decade. Extracting insights from these datasets typically involves using complex data-mining algorithms to detect, filter, and cluster stay locations. However, LBS datasets are often massive, ranging from tens to hundreds of gigabytes per day, which poses serious computational challenges for traditional data processing tools. Libraries such as Pandas operate in a single-machine environment and require the entire dataset to fit into memory, making them unsuitable for processing LBS data at scale [3]. sparkmobility, a Spark-based Python library, allows students and researchers to process large LBS datasets with improved memory management.
Publication date:
December 12, 2025
Publication type:
Conference Paper
Citation:
Cao, S., & Gonzalez, M. C. (2025). sparkmobility: A Spark-based Python Library for Processing, Modeling, and Analyzing Large Mobility Datasets. Proceedings of the 33rd ACM International Conference on Advances in Geographic Information Systems, 1296–1297. https://doi.org/10.1145/3748636.3766538