cellularXdataset.github.io

CellularX: A Spatial-Temporal Dataset for User-Centric Sim2Real Learning in Telco Network

We introduce cellularX, the first spatial-temporal dataset focusing on user-level network experience within telco networks. Whereas most existing datasets offer only cell-level Key Performance Indicators (KPIs), cellularX fills the gap by offering user-level, multi-dimensional KPI data. In particular, cellularX provides a synthetic dataset for simulation-to-reality (sim2real) research, addressing the scarcity of real-world data for specific scenarios. Additionally, a real-world dataset collected from almost one thousand users is open-sourced. Both datasets can support user-level network experience modeling and monitoring, e.g., anomaly detection, anomaly prediction, and root cause analysis.

The potential applications of cellularX include:

1) Sim2real study: cellularX provides a controlled, flexible simulation platform and a set of real-world and simulated data. These can be used to generate low-cost training data, to understand the reality gap in sim2real learning, and to compare sim2real algorithms fairly.
2) AIOps for telco networks: cellularX provides real-world KPI data, viewed as a snapshot of user access and network experience. Its multi-dimensional indicators make it suitable for various AIOps tasks, including root cause analysis and anomaly prediction.

Data Composition

We propose CellularX, a large dataset focusing on user-level network eXperience in cellular networks. It provides two sub-datasets, cellularXsim and cellularXreal, enabling a comprehensive view of the user-level network experience.

map_sim2real
map_real

By integrating these two aspects, cellularX provides a richer understanding of the user-level network experience in wireless cellular networks. To the best of our knowledge, cellularX is the first dataset that focuses on user-level network experience.

License

This dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License. This means that you are free to use, copy, modify, and distribute the dataset for non-commercial purposes as long as you provide attribution to the original author. Commercial use requires prior permission from the author. The license keeps the dataset available for academic and non-profit research while prohibiting its use for commercial gain.

Author statement: We bear all responsibility in case of violation of rights, etc., and confirm the data license.

Citation Guidelines

BibTeX:

TODO

Authorship

Dataset Owners

Team(s)

[anonymous]

Author(s)

[anonymous]

Dataset Overview

Data Subject(s)

Dataset Snapshot

| Category | Data |
| --- | --- |
| Dataset Name | CellularXreal, CellularXsim |
| Size of Dataset | x MB, 2.0 MB |
| Number of Instances | x, ~18,000 |

Above: Summary of CellularX dataset

Content Description

Sensitivity of Data

Sensitivity Type(s)

Field(s) with Sensitive Data

Intentionally Collected Sensitive Data

No sensitive data was intentionally collected.

Unintentionally Collected Sensitive Data

All user devices in the dataset have been anonymized, and the dataset only contains data from cells in part of a city block, so no user's behavior or identity can be inferred, even with additional methods.

Dataset Version and Maintenance

Maintenance Status

**Limited Maintenance:** The data will not be updated, but any technical issues will be addressed.

Version Details

Current Version: 1.0

Last Updated: 06/2023

Release Date: N/A

Maintenance Plan

CellularX was collected once, in the real world, for academic purposes, so maintenance will be limited.

Feedback: For feedback, reach out to shaoyu@tongji.edu.cn.

Example of Data Points

Primary Data Modality

Typical Data Point

Below are examples of the kinds of data in the cellularXreal dataset.

Time series data in cellularXreal dataset

image-20230531162027544

Below are examples of the kinds of data in the cellularXsim dataset.

image-20230531162310523

16681685521414_.pic

image-20230531191915972

Data Fields

CellularXreal dataset

Time Series Data

| KPI Name | Description | Unit | Type |
| --- | --- | --- | --- |
| RSRP | RSRP of the serving cell | dBm | Integer |
| ULThrp | Uplink experience rate | Mbps | Float |
| DLThrp | Downlink experience rate | Mbps | Float |
| DLPrbNum | The number of downlink resource blocks | / | Integer |
| ULPrbNum | The number of uplink resource blocks | / | Integer |
| DLBLER | Downlink block error rate | % | Float |
| ULBLER | Uplink block error rate | % | Float |
| ULSINR | Uplink SINR | dB | Float |
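
The field list above maps naturally onto a typed record. The following is a minimal sketch only, assuming the time series is delivered as delimited text whose column names match the KPI names and whose uncollected values appear as empty cells; the class name `RealKpiSample` and the helper `parse_row` are hypothetical and not part of the dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RealKpiSample:
    """One cellularXreal time-series row; None marks a value that was not collected."""
    RSRP: Optional[int]        # dBm
    ULThrp: Optional[float]    # Mbps
    DLThrp: Optional[float]    # Mbps
    DLPrbNum: Optional[int]    # count of downlink resource blocks
    ULPrbNum: Optional[int]    # count of uplink resource blocks
    DLBLER: Optional[float]    # %
    ULBLER: Optional[float]    # %
    ULSINR: Optional[float]    # dB


# Fields documented as Integer; all others are documented as Float.
_INT_FIELDS = {"RSRP", "DLPrbNum", "ULPrbNum"}


def parse_row(row: dict) -> RealKpiSample:
    """Cast string cells to the documented types; empty or missing cells become None."""
    values = {}
    for name in RealKpiSample.__dataclass_fields__:
        raw = (row.get(name) or "").strip()
        if raw == "":
            values[name] = None
        elif name in _INT_FIELDS:
            # Go through float first so that "-95.0" is accepted as well as "-95".
            values[name] = int(float(raw))
        else:
            values[name] = float(raw)
    return RealKpiSample(**values)
```

Rows read with `csv.DictReader` can be passed to `parse_row` directly, since it expects a mapping from column name to string.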

Geospatial Data

| KPI Name | Description | Unit | Type |
| --- | --- | --- | --- |
| CellID | ID of the cell | / | String |
| Average RSRP | Average RSRP of the serving cell | dBm | Float |
| lon | Longitude of the cell | / | Float |
| lat | Latitude of the cell | / | Float |
| azimuth | Azimuth of outdoor cell, or indoor cell | / | String |

CellularXsim dataset

Simulated Time Series Data

| KPI Name | Description | Unit | Type |
| --- | --- | --- | --- |
| # Receiver Point (#) | Index of sample point in the path | / | Integer |
| X(m) | Coordinates of sampling points | m | Float |
| Y(m) | Coordinates of sampling points | m | Float |
| Z(m) | Coordinates of sampling points | m | Float |
| Distance (m) | The distance from the sampling point to the start of the path | m | Float |
| Strongest Power (dBm) | Maximum signal power at sampling point | dBm | Float |
| Total Power With Phase (dBm) | The total power at the sampling point | dBm | Float |
| Best SINR (dB) | Best SINR at sampling point | dB | Float |
| RSSI (dBm) | Received Signal Strength Indicator | dBm | Float |
| RSRP (dBm) | Reference Signal Received Power | dBm | Float |
| RSRQ (dB) | Reference Signal Received Quality | dB | Float |
| Strongest power transmitter (Tx #) | Base station index with maximum power | / | Integer |

Collected Time Series Data

| KPI Name | Description | Unit | Type |
| --- | --- | --- | --- |
| LATITUDE | Latitude of the sample point | / | Float |
| LONGITUDE | Longitude of the sample point | / | Float |
| TYPE | Type of cell | / | String |
| TAC | Tracking Area Code | / | Integer |
| PCI | Physical Cell Identifier | / | Integer |
| ECI | E-UTRAN Cell Identifier | / | Integer |
| EARFCN | E-UTRA Absolute Radio Frequency Channel Number | / | Integer |
| RSSI | Received Signal Strength Indicator | dBm | Float |
| RSRP | Reference Signal Received Power | dBm | Float |
| RSRQ | Reference Signal Received Quality | dB | Float |
| SINR | Signal to Interference plus Noise Ratio | / | Float |
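
Because the collected data reports RSSI, RSRP, and RSRQ together, the standard LTE relation RSRQ = N × RSRP / RSSI (N being the number of resource blocks in the RSSI measurement bandwidth) can serve as a rough consistency check. The sketch below expresses that relation in dB. The measurement bandwidth is not stated in this document, so `n_rb` is an assumption the user must supply, and real measurements may deviate from the ideal relation.

```python
import math


def expected_rsrq_db(rsrp_dbm: float, rssi_dbm: float, n_rb: int) -> float:
    """Ideal RSRQ in dB from RSRP and RSSI (both in dBm).

    n_rb is the number of resource blocks in the RSSI measurement
    bandwidth; it is an assumed input, not a value given by the dataset.
    """
    if n_rb <= 0:
        raise ValueError("n_rb must be positive")
    return 10.0 * math.log10(n_rb) + rsrp_dbm - rssi_dbm


def rsrq_residual_db(row: dict, n_rb: int) -> float:
    """Reported RSRQ minus the ideal value; large residuals flag rows worth inspecting."""
    return row["RSRQ"] - expected_rsrq_db(row["RSRP"], row["RSSI"], n_rb)
```

For example, with `n_rb=100`, an RSRP of -95 dBm and an RSSI of -65 dBm give an ideal RSRQ of -10 dB.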

Geospatial Data

| KPI Name | Description | Unit | Type |
| --- | --- | --- | --- |
| cell(ECI) | E-UTRAN Cell Identifier | / | Integer |
| lat | Latitude of the cell | / | Float |
| lon | Longitude of the cell | / | Float |
| radius | Positioning error radius | m | Integer |
| BSIndexInSimulator | Index of base station in Wireless InSite simulator | / | Integer |
| BSNameInSimulator | Name of base station in Wireless InSite simulator | / | Integer |
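
The collected time series and the geospatial table share the E-UTRAN Cell Identifier, so each sample can plausibly be linked to its cell. The sketch below is illustrative only: it assumes that `ECI` in the time series matches `cell(ECI)` in the geospatial data, that `LATITUDE`/`lat` and `LONGITUDE`/`lon` hold decimal degrees, and that rows are available as dictionaries. The function names and the output key `distance_to_cell_m` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def attach_cell_distance(samples: list, cells: list) -> list:
    """Return copies of samples with the distance to their cell added.

    Samples whose ECI has no entry in the geospatial table get None.
    """
    by_eci = {cell["cell(ECI)"]: cell for cell in cells}
    out = []
    for sample in samples:
        cell = by_eci.get(sample["ECI"])
        if cell is None:
            dist = None
        else:
            dist = haversine_m(sample["LATITUDE"], sample["LONGITUDE"], cell["lat"], cell["lon"])
        out.append({**sample, "distance_to_cell_m": dist})
    return out
```

Note that the `radius` field gives a positioning error for each cell, so distances computed this way carry at least that much uncertainty.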

Motivations

Purpose(s)

Domain(s) of Application

AIOps, Prediction, Anomaly Detection, Root Cause Analysis

Motivating Factor(s)

Provenance

Collection

Method(s) Used

Methodology Detail(s)

Artificially Generated:

Collection Cadence

Static: Data was collected once from single or multiple sources.

Data Processing

The raw data of cellularXreal contains a large number of data points filled with default values, indicating that they were not collected. To avoid ambiguity, we replaced these default values with null values.
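
The replacement step described above can be sketched as follows. This is an illustration rather than the processing code actually used: the default values in the raw data are not stated here, so the `defaults` mapping and the sentinel numbers in the usage note are hypothetical.

```python
def replace_defaults(rows: list, defaults: dict) -> list:
    """Return copies of rows where a field equal to its default value is set to None.

    defaults maps a field name to the placeholder value that marks
    "not collected" for that field; fields not listed are left untouched.
    """
    cleaned = []
    for row in rows:
        cleaned.append({
            key: (None if key in defaults and value == defaults[key] else value)
            for key, value in row.items()
        })
    return cleaned
```

For instance, `replace_defaults(rows, {"RSRP": -32768})` would null out every RSRP equal to the made-up sentinel -32768 while leaving genuine readings unchanged.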

Extended Use

Use in ML or AI Systems

Dataset Use(s)

Usage Guideline(s)

When studying sim2real with the cellularXsim dataset, we recommend constructing the training, validation, and testing sets as follows. Recall that we manually designed four paths in the study area when constructing the dataset. CellularXsim provides real-world data generated by user equipment moving along these paths, as well as corresponding simulated data generated along the same routes. We recommend using these data as the training and validation sets. CellularXsim also includes a set of real-world data generated by randomly roaming within the study area. We encourage using this randomly sampled data as the testing set to validate the effectiveness of machine learning models, which avoids unfair comparisons caused by optimizations tailored to the given paths.
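
One way to follow this recommendation is sketched below. The guideline does not say how to divide the path data between training and validation; holding out one whole path for validation is an assumption made here to avoid leakage between neighbouring samples on the same path. The function name, the path identifiers, and the `"real"`/`"sim"` keys are hypothetical.

```python
def build_splits(path_data: dict, roaming_real: list, val_path: str):
    """Split cellularXsim-style data into train, validation, and test sets.

    path_data maps a path id to {"real": [...], "sim": [...]} for the
    manually designed paths; roaming_real holds the randomly roamed
    real-world samples, which are reserved entirely for testing.
    """
    if val_path not in path_data:
        raise KeyError(f"unknown validation path: {val_path!r}")
    train = {pid: data for pid, data in path_data.items() if pid != val_path}
    val = {val_path: path_data[val_path]}
    test = list(roaming_real)  # never used for training or model selection
    return train, val, test
```

Keeping the roaming data out of both training and model selection is what makes the final comparison independent of the four designed paths.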

Limitations and Societal Impact

Societal Impact

Positive Impact

Negative Impact

Limitations