NN4SysBench is a benchmark suite for neural network verification, incorporating applications from the domain of neural networks for computer systems (NN4Sys). The suite includes verification benchmarks for six tasks that apply neural networks to traditional systems problems: learned indexes, learned Bloom filters, learned cardinality estimation, learned Internet congestion control, learned adaptive bitrate, and learned distributed system schedulers.

Results of different verifiers:

[Figure: Verification Results]

Applications

Learned Index

A database index is a data structure that improves data retrieval by linking database keys to their storage positions. Learned indexes replace traditional structures like B-Trees with ML models that predict storage locations based on database keys. For more information, check out the paper.

Neural Network
Original Paper

The original paper introduces Recursive Model Index (RMI), a framework that leverages a set of machine learning models to construct a hierarchical graph. The final implementation opts for linear models instead of neural networks to enhance execution performance. The initial naive design employs a two-layer fully connected neural network with 32 neurons. However, training a neural network to accurately model the database proves to be challenging.

Our Implementation

We use neural networks to learn database indexes. We have two sizes of neural networks (a sketch of LINN follows the list):
1. LINN (Lightweight Neural Network):
- Width: 128 neurons per hidden layer.
- Depth: 4 layers (3 fully connected layers followed by ReLU activations and 1 output layer).
2. DeepLINN (Deep Lightweight Neural Network):
- Width: 128 neurons per hidden layer.
- Depth: 6 layers (5 fully connected layers with ReLU activations and 1 output layer).
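
A minimal PyTorch sketch of the LINN architecture above; the 1-dimensional key input and scalar position output are assumptions for illustration, and the benchmark models live in our repository:

    import torch.nn as nn

    # Sketch of LINN: 3 fully connected layers with ReLU, then 1 output layer.
    # DeepLINN follows the same pattern with two more hidden layers.
    class LINN(nn.Module):
        def __init__(self, in_dim=1, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),  # predicted storage position
            )

        def forward(self, key):
            return self.net(key)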

We face the same challenge as the original paper and train the models via a training-verification loop.

Specification

All predicted data locations are error-bounded.
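
One way to write this property, where f is the learned index, pos(k) is the true storage position of key k, and ε is a hypothetical error bound:

    \forall k \in \mathcal{K}: \quad |f(k) - \mathrm{pos}(k)| \le \epsilon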

Performance of the Verifier
Verifier Type Safe Unsafe Time (s) Timeout
abcrown lindex_0 10 0 3.12916079 0
marabou lindex_0 10 0 0.621874303 0
abcrown lindex_1 10 0 6.74826283 0
abcrown lindex_2 10 0 72.5362575 0
abcrown lindex_deep_0 10 0 3.16443444 0
marabou lindex_deep_0 10 0 1.201228401 0
abcrown lindex_deep_1 10 0 7.58243066 0
abcrown lindex_deep_2 7 0 92.5729578 3
Learned Bloom Filter

A Bloom filter is a probabilistic data structure that is widely used in many computer systems. Bloom filters test whether an element (for example, a string) is in a pre-defined set. Bloom filters allow false positives: they may return true for an element that is not in the set. The inputs are the elements being tested, and the outputs are booleans indicating whether the elements are in the set. For more information, check out the paper.

Neural Network
Original Paper

The original paper introduces Learned Bloom Filter, a compact replacement for traditional Bloom filters that leverages a binary classifier to distinguish between keys and non-keys. The initial implementation uses a recurrent neural network (GRU) with 16 hidden units and a 32-dimensional character embedding to process string inputs. While the model achieves high classification accuracy, it cannot guarantee zero false negatives. To address this limitation, the final design incorporates an auxiliary Bloom filter to catch any missed keys, thereby ensuring correctness while reducing overall memory consumption.

Our Implementation

We use neural networks to learn Bloom filters. Our architecture is as follows:
- Width: 256 neurons per hidden layer.
- Depth: 4 layers (3 fully connected layers followed by ReLU activations, and 1 final output layer).
- Output: A sigmoid activation is applied to produce a probability indicating key membership.
This model takes a 2-dimensional input and predicts whether a given key belongs to the target set.
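
A minimal PyTorch sketch of this classifier, assuming the 2-dimensional input described above; activation placement beyond what is listed is our assumption:

    import torch.nn as nn

    # Sketch of the learned Bloom filter classifier: 3 fully connected layers with
    # ReLU, a final output layer, and a sigmoid yielding a membership probability.
    class LearnedBloomFilter(nn.Module):
        def __init__(self, in_dim=2, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
                nn.Sigmoid(),  # probability that the key is in the set
            )

        def forward(self, key):
            return self.net(key)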

Specification

The false positive and false negative rates are bounded.

Performance of the Verifier
Verifier Type Safe Unsafe Time (s) Timeout
abcrown bloom_filter 10 0 3.38892075 0
marabou bloom_filter 10 0 2.23303003 0
Learned Cardinality

Database cardinality estimation predicts the number of rows returned by a database query (i.e., a SQL statement), which in turn influences query optimization plans. Traditional cardinality estimation relies on heuristics and domain knowledge, whereas learned cardinality estimators learn from trace data. The inputs of learned cardinality estimators are SQL queries, and the outputs are the estimated numbers of returned rows. For more information, check out the paper.

Neural Network
Original Paper

The paper proposes MSCN (Multi-Set Convolutional Network) for cardinality estimation. It uses three two-layer neural networks to process different parts of a SQL query: tables, joins, and predicates. Each part goes through a shared MLP:

    - Linear → ReLU → Linear → ReLU

The outputs from each set are averaged, then merged and passed into a final output network:

    - Linear → ReLU → Linear → Sigmoid

This design helps the model handle different query structures and capture complex data patterns. It works better than traditional sampling methods, especially when no qualifying samples exist. The model is small (about 2–3MB) and trained with q-error as the loss.

Our Implementation

We use neural networks to learn database query patterns. We implement four variations of our SetConv architecture:
1. MSCN-128d:
- Width: 128 neurons per hidden layer
- Depth: 2 hidden layers in each feature path + 2 layers for combined features
- Training: 10,000 queries, 10 epochs, batch size of 1024, learning rate of 0.01
2. MSCN-128d-dual:
- Width: 128 neurons per hidden layer
- Depth: 2 hidden layers in each feature path + 2 layers for combined features
- Processing: Dual parallel paths with subtraction operation for final output
- Training: 10,000 queries, 10 epochs, batch size of 1024, learning rate of 0.01
3. MSCN-2048d:
- Width: 2048 neurons per hidden layer
- Depth: 2 hidden layers in each feature path + 2 layers for combined features
- Training: 10,000 queries, 10 epochs, batch size of 1024, learning rate of 0.01
4. MSCN-2048d-dual:
- Width: 2048 neurons per hidden layer
- Depth: 2 hidden layers in each feature path + 2 layers for combined features
- Processing: Dual parallel paths with subtraction operation for final output
- Training: 10,000 queries, 10 epochs, batch size of 1024, learning rate of 0.01
Our model processes input features of varying dimensions: sample features (based on table vectors), predicate features (combining column vectors, operation vectors, and values), and join features. Each feature type is processed through its own two-layer MLP with ReLU activations before being combined. We face the same challenges as the original paper and train our models via a training-verification loop to ensure model accuracy and generalization.
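
A minimal PyTorch sketch of the MSCN-128d variant, using placeholder per-set input dimensions and omitting the padding masks; the dual variants additionally run two such paths and subtract their outputs:

    import torch
    import torch.nn as nn

    # Sketch of MSCN-128d: one two-layer MLP per feature set (samples, predicates,
    # joins), mean pooling over each set, then a combined output network.
    def set_mlp(in_dim, hidden=128):
        return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                             nn.Linear(hidden, hidden), nn.ReLU())

    class MSCN(nn.Module):
        def __init__(self, sample_dim, predicate_dim, join_dim, hidden=128):
            super().__init__()
            self.sample_mlp = set_mlp(sample_dim, hidden)
            self.predicate_mlp = set_mlp(predicate_dim, hidden)
            self.join_mlp = set_mlp(join_dim, hidden)
            self.out = nn.Sequential(nn.Linear(3 * hidden, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 1), nn.Sigmoid())

        def forward(self, samples, predicates, joins):
            # each input: [batch, set_size, feature_dim]
            s = self.sample_mlp(samples).mean(dim=1)
            p = self.predicate_mlp(predicates).mean(dim=1)
            j = self.join_mlp(joins).mean(dim=1)
            return self.out(torch.cat([s, p, j], dim=1))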

Specification

The predicted cardinalities are close to the ground-truth cardinalities from the database, and a query covering a larger range returns a larger cardinality.
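
One way to sketch these two properties, where f is the learned estimator, card(q) is the true cardinality, ε is a hypothetical bound, and q ⊆ q' means q's range is contained in q''s:

    |f(q) - \mathrm{card}(q)| \le \epsilon, \qquad q \subseteq q' \;\Rightarrow\; f(q) \le f(q')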

Performance of the Verifier
Verifier Type Safe Unsafe Time (s) Timeout
abcrown mscn_128d 10 0 12.3426136 0
abcrown mscn_128d_dual 10 0 35.0842626 0
abcrown mscn_2048d 3 0 150.021686 7
abcrown mscn_2048d_dual 6 0 125.487949 4
Learned Internet Congestion Control

Congestion control protocols ensure efficient data transmission and prevent packet congestion in a network. Learned congestion control studies the patterns of congestion and non-congestion conditions, takes network conditions as input, and decides sending rates for the near future. For more information, check out the paper.

Neural Network
Original Paper

The neural network (NN) described in the Aurora paper is a small, fully connected neural network with the following architecture:
- Two hidden layers consisting of 32 and 16 neurons, respectively.
- The activation function used is the hyperbolic tangent (tanh).

Input: The input is a tensor of K history steps; each time step contains the following features:
1. Latency gradient: latency/time
2. Latency ratio: Current MI's mean latency / observed mean latency of any MI
3. Sending ratio: packets sent / packets acknowledged by the receiver
All the input tensors are flattened into a 1D tensor.

Output:
1. Change of sending rate: a positive value means increase; a negative value means decrease

Our Implementation

In our benchmark, considering the limitations of current verifiers, we implement a compatible version of this neural network architecture. The implementation can be found in our repository.

To create verification problems of different difficulty, we provide 3 different model sizes (a sketch of the mid-size model follows the list):
- Small model: 2 hidden layers of 16 → 8 neurons, tanh nonlinearity.
- Mid model (same as original paper): 2 hidden layers of 32 → 16 neurons, tanh nonlinearity.
- Big model: 2 hidden layers of 32 → 16 neurons, tanh nonlinearity.
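
A minimal PyTorch sketch of the mid-size model, assuming a placeholder history length K = 10 and the three per-step features listed above:

    import torch.nn as nn

    # Sketch of the mid-size Aurora model: two tanh hidden layers of 32 and 16
    # neurons over the flattened 3*K input, outputting the change of sending rate.
    def make_aurora_mid(history_len=10):
        return nn.Sequential(
            nn.Linear(3 * history_len, 32), nn.Tanh(),
            nn.Linear(32, 16), nn.Tanh(),
            nn.Linear(16, 1),
        )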

Specification

CongestCtrl_spec101/102: When observing good (bad) networking conditions, the sender does not decrease (increase) packet sending rates.
CongestCtrl_spec2: When observing better networking conditions, the sender increases packet sending rates by either the same or a larger amount.
CongestCtrl_spec3: When the networking condition changes from bad to good, the sender eventually increases packet sending rates.

Performance of the Verifier
Verifier Type Safe Unsafe Time (s) Timeout
abcrown aurora_big_101 0 10 2.19588738 0
marabou aurora_big_101 0 10 0.102064294 0
abcrown aurora_big_102 6 4 2.838494590 0
marabou aurora_big_102 6 4 0.09996135 0
abcrown aurora_big_2 0 10 2.32072963 0
marabou aurora_big_2 0 10 0.102983946 0
abcrown aurora_big_3 0 10 2.22639322 0
marabou aurora_big_3 0 10 0.23421416 0
abcrown aurora_big_4 0 10 2.25798789 0
abcrown aurora_mid_101 10 0 2.96464823 0
marabou aurora_mid_101 10 0 0.042553444 0
abcrown aurora_mid_102 0 10 2.37557136 0
marabou aurora_mid_102 0 10 0.053586597 0
abcrown aurora_mid_2 0 10 2.24732022 0
marabou aurora_mid_2 0 10 0.049091978 0
abcrown aurora_mid_3 0 10 2.24376578 0
marabou aurora_mid_3 0 10 0.113317223 0
abcrown aurora_mid_4 10 0 3.97856824 0
Learned Adaptive Bitrate

In video streaming, adaptive bitrate algorithms are used on client-side video players to decide the resolution (e.g., 720P) to download for the next video chunk. Learned adaptive bitrate uses neural networks to select future video chunks based on observations collected by client video players. For more information, check out the paper.

Neural Network
Original Paper

The neural network (NN) in the Pensieve paper is a deep feedforward neural network used for adaptive bitrate control in video streaming. It takes multiple input features related to video playback, network conditions, and device performance. These inputs are processed through several hidden layers to predict the optimal video bitrate that maximizes quality while minimizing interruptions like buffering.

Input:

The input is a tensor of N history time steps; each time step contains the following features:
  • Last chunk bitrate: bitrate/max bitrate, e.g., 360P/1080P
  • Past chunk download time
  • Next chunk sizes
  • Current buffer size
  • Number of chunks left, e.g., 43/48
  • Past chunk throughput
All the input tensors are flattened into a 1D tensor.

Output:

  • optimal video bitrate

Our Implementation

In our benchmark, considering the limitations of current verifiers, we implement a compatible version of this neural network architecture. We implement two versions: a benchmark version and a version for the Marabou verifier. The implementation can be found in our repository.

Benchmark Version

The model is almost the same as in the original paper, except that we remove the softmax and argmax layers and flatten all the input tensors into a 1D tensor. The input and output are described in the Original Paper section.
To create verification problems of different difficulty, we provide 3 different model sizes (a sketch of the small model follows the descriptions below):

Small model:

  • First Layer: 4 parallel fully connected layers, each containing 128 neurons
  • Second Layer: A fully connected (linear) layer with 128 neurons
  • Output Layer: Fully connected (linear) layer with 6 neurons

Mid model (same as original paper):

  • First Layer: 3 parallel fully connected layers, each containing 128 neurons, and a 1D convolution layer with 128 filters and kernel size 4. These 4 layers take the input features in parallel.
  • Second Layer: A fully connected (linear) layer with 128 neurons
  • Output Layer: A fully connected (linear) layer with 6 neurons

Big model:

  • First Layer: 3 parallel fully connected layers, each containing 128 neurons, and a 1D convolution layer with 128 filters and kernel size 4. These 4 layers are parallel.
  • Second Layer: A fully connected (linear) layer with 256 neurons
  • Output Layer: Fully connected (linear) layer with 6 neurons
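
A minimal PyTorch sketch of the small model; how the flattened input is partitioned among the four parallel layers, and the ReLU placement, are our assumptions for illustration:

    import torch
    import torch.nn as nn

    # Sketch of the small Pensieve model: 4 parallel fully connected layers of 128
    # neurons over segments of the flattened input, a 128-neuron hidden layer, and
    # a 6-neuron output (one score per bitrate level).
    class PensieveSmall(nn.Module):
        def __init__(self, segment_dims, hidden=128, num_bitrates=6):
            super().__init__()
            self.segment_dims = list(segment_dims)
            self.branches = nn.ModuleList(nn.Linear(d, hidden) for d in self.segment_dims)
            self.fc = nn.Linear(hidden * len(self.segment_dims), hidden)
            self.out = nn.Linear(hidden, num_bitrates)

        def forward(self, x):
            parts = torch.split(x, self.segment_dims, dim=1)
            h = torch.cat([torch.relu(b(p)) for b, p in zip(self.branches, parts)], dim=1)
            return self.out(torch.relu(self.fc(h)))

    # Example with hypothetical segment sizes:
    # model = PensieveSmall([8, 8, 6, 1])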

Marabou Version

Main difference:

  • Use torch.split() to split the input tensor into 2 parts.
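
For illustration, a split into two parts with placeholder sizes:

    import torch

    x = torch.randn(1, 48)                           # flattened input; size is a placeholder
    part_a, part_b = torch.split(x, [40, 8], dim=1)  # two segments of assumed sizes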

Specification

AdaptiveBitrate_spec1/2: When facing good (bad) downloading conditions, the video streaming system should not pick the worst (best) video resolution.
AdaptiveBitrate_spec3: Better downloading conditions imply better resolutions.

Performance of the Verifier
Verifier Type Safe Unsafe Time (s) Timeout
abcrown pensieve_big_1 10 0 5.99909349 0
abcrown pensieve_big_2 10 0 6.06507417 0
abcrown pensieve_big_3 10 0 8.87928039 0
abcrown pensieve_mid_1 10 0 6.26007705 0
abcrown pensieve_mid_2 10 0 6.02888779 0
abcrown pensieve_mid_3 10 0 8.6088345 0
abcrown pensieve_small_1 10 0 5.0981046 0
marabou pensieve_small_1 10 0 1.90850646 0
abcrown pensieve_small_2 10 0 5.00802932 0
marabou pensieve_small_2 10 0 1.91794796 0
abcrown pensieve_small_3 10 0 6.80163713 0
Learned Distributed System Scheduler

The System Scheduler is designed for distributed computing systems like Spark. It manages a cluster of machines running multiple jobs, each with tasks that have dependencies. Using neural networks, a learned scheduler optimizes task scheduling to achieve high-level objectives, such as minimizing job completion time. It takes inputs like the status of jobs, tasks, and the cluster, and outputs the next tasks to run. For more information, check out the paper.

Neural Network
Original Paper

The original neural network architecture consists of three main components:

1. GCN (Graph Convolutional Network)
- Models dependencies between tasks and jobs
- Captures graph structure of task interdependencies

2. GSN (Graph Scheduling Network)
- Builds on the GCN output
- Focuses on scheduling optimization
- Predicts optimal task schedules using graph-structured data

3. FC (Fully Connected) Layers
- Refines representations from GCN and GSN
- Outputs final scheduling decisions
- Determines next tasks to execute

These components work together to enable efficient task scheduling in distributed systems through dynamic, data-driven decisions.

Our Implementation

In our benchmark, considering the limitations of current verifiers, we implement a compatible version of this neural network architecture. We implement two versions: a benchmark version and a version for the Marabou verifier. The implementation can be found in our repository.

Benchmark Version

We integrate the 3 components into a single neural network. The input is the status of jobs, tasks, and the cluster, and the output is the next tasks to execute.

Input:

The input to the model is a tensor that is split into several segments:
  • node_inputs: Features of the nodes
    Shape: [batch_size, Max_Node, 5]
  • node_valid_mask: A mask indicating the validity of nodes
    Shape: [batch_size, 1, Max_Node]
  • gcn_mats: Graph convolution matrices for message passing
    Shape: [batch_size, 8, Max_Node, Max_Node]
  • gcn_masks: Masks for each graph convolution operation
    Shape: [batch_size, 8, Max_Node, 1]
  • summ_mats: Summarization matrices for aggregating graph information
    Shape: [batch_size, Max_Node, Max_Node]
  • running_dags_mat: Running DAG state matrix
    Shape: [batch_size, 1, Max_Node]
  • dag_summ_backward_map: A backward map for DAG summarization
    Shape: [batch_size, Max_Node, Max_Node]
All the input tensors are flattened into a 1D tensor.

Output:

  • Node Outputs: Prediction scores for each node. Invalid nodes are masked with value 10000.0
    Shape: [1, node_num]
  • Job Outputs (Ignored): Represents executor scores for each job (not used)
    Shape: [1, job_num, total_executor_num]
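
As an illustration, a hypothetical helper that recovers the segments from the flattened 1D input, following the order and shapes listed above (each part would then be reshaped to its listed shape):

    import torch

    # Hypothetical helper: split the flattened input into the named segments,
    # assuming batch_size = 1 and the segment order given above.
    def split_decima_input(x, max_node):
        sizes = {
            "node_inputs":           max_node * 5,
            "node_valid_mask":       max_node,
            "gcn_mats":              8 * max_node * max_node,
            "gcn_masks":             8 * max_node,
            "summ_mats":             max_node * max_node,
            "running_dags_mat":      max_node,
            "dag_summ_backward_map": max_node * max_node,
        }
        parts = torch.split(x, list(sizes.values()), dim=0)
        return dict(zip(sizes.keys(), parts))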

Marabou Version

Most of the architecture is the same as the benchmark version, with a few key differences such as the use of a different graph convolution layer (GraphLayer_marabou) and learnable parameters for matrices like gcn_mats and gcn_masks.

Specification

LearnedSched_spec1: If job A's input depends on job B's output and job B is not finished, then job A should not be scheduled.
LearnedSched_spec2: A user cannot get their jobs scheduled earlier by requesting more resources for them.

Performance of the Verifier
Verifier Type Safe Unsafe Time (s) Timeout
abcrown decima_mid_1 10 0 9.7232395 0
marabou decima_mid_1 10 0 10.77956162 0
abcrown decima_mid_2 8 1 26.29458285 1
marabou decima_mid_2 0 0 180.0 10