Precision-Guaranteed GNN-Based Predictive Compression for Large-Scale Infrastructure Data

Overview

Developed a precision-controlled predictive compression framework leveraging Graph Neural Networks (GCNs) to reduce storage requirements of large-scale spatio-temporal infrastructure data while guaranteeing user-defined decompression accuracy.

The method replaces traditional transform-based compression with a learned spatio-temporal prediction engine that selectively stores only high-error data points.


Technical Architecture

  • Global one-step-ahead predictive model trained across all monitoring nodes
  • Graph Convolutional Network (3 layers × 64 hidden dimensions) :contentReference
  • Sequence length: 4 timesteps for spatio-temporal context :contentReference
  • Delaunay triangulation–based graph construction (spatial datasets) :contentReference
  • k-Nearest Neighbor graph (pattern-similarity–based traffic dataset) :contentReference
  • Iterative threshold-controlled predictive discard algorithm (Algorithm 1) :contentReference
  • Explicit decompression error bound enforcement

Loss function: Mean Squared Error
Evaluation metrics: MAE, MSE, compression ratio, parameter count

GCN achieved highest accuracy with only ~4.7k parameters compared to millions for CNN/LSTM models


Key Capabilities

  • Direct control over maximum decompression error
  • Real-time or post-collection compression applicability
  • Low-parameter model footprint for storage-efficient deployment
  • Reduced error accumulation via threshold-based reset logic
  • Superior compression ratio compared to WT, DCT, PCA, and other DL models

Datasets Evaluated

1. NREL Solar PV Generation (U.S., 5,000+ plants, 5-min resolution) 2. Caltrans Traffic Speeds (700 sensors, 5-min resolution) 3. ComEd Energy Consumption (≈22,000 zip-code nodes, 30-min resolution)

Each dataset spans 1–2 years and includes thousands of monitoring locations.


Empirical Impact

  • Discarded up to 78% of datapoints at selected thresholds (traffic dataset example)
  • Achieved compression ratios as low as 0.22 while maintaining bounded error
  • Consistently outperformed wavelet transform (WT), discrete cosine transform (DCT), PCA, and deep learning baselines
  • Demonstrated robustness across heterogeneous infrastructure domains

Results confirm that spatio-temporal graph modeling materially improves predictive compression efficiency compared to both classical and sequence-only deep learning approaches.


Engineering Deliverables

  • PyTorch implementation of GCN-based compression framework
  • Modular graph-construction utilities
  • Multi-architecture benchmarking pipeline (GCN, GT, CNN, LSTM, MLP)
  • Threshold-controlled compression engine
  • Reproducibility scripts and dataset preprocessing modules

Relevance

Positions graph learning as a storage-efficiency technology for smart infrastructure systems, where sensor density and sampling frequency increasingly challenge data management capabilities.

Framework bridges machine learning, infrastructure analytics, and systems-level storage optimization.


Khayambashi, K., & Alemazkoor, N. (2024).
Graph neural networks for precision-guaranteed compression of large-scale data.
IEEE International Conference on Big Data.

View Publication