Benchmarking
We periodically publish benchmarks comparing the Synapse Python Client with working directly against AWS S3. These benchmarks let us make data-driven decisions about where to spend time optimizing the client, and they give us a way to measure the impact of changes to the client.
Results
07/02/2024: Downloading files from Synapse
These benchmarking results were collected due to the following changes:
- The download algorithm for the client was re-written to focus on error handling and handling of multi-threaded downloads orchestrated by AsyncIO.
- The synapseutils.syncFromSynapse() function was refactored to use this new logic.
The results were created on a t3a.micro EC2 instance with a 200GB disk size running in us-east-1. The script that was run can be found in docs/scripts/downloadBenchmark.py.
Average transfer time result:
- <= 100 MiB per file: 8% Decrease
- >= 1 GiB per file: 31% Decrease
Test | Total Transfer Size | v4.4.0 .syncFromSynapse() | v4.3.0 .syncFromSynapse() | v4.4.0 .Get() | v4.3.0 .Get() |
---|---|---|---|---|---|
10 File/10GiB ea | 100GiB | 1910s (39% Decrease) | 3155s | 2186s (26% Decrease) | 2958s |
1 File/10GiB ea | 10GiB | 229s (21% Decrease) | 289s | 214s (31% Decrease) | 308s |
10 File/1GiB ea | 10GiB | 174s (47% Decrease) | 330s | 224s (24% Decrease) | 295s |
100 File/100 MiB ea | 10GiB | 161s (3% Increase) | 156s | 228s (13% Decrease) | 262s |
10 File/100 MiB ea | 1GiB | 15s (12% Decrease) | 17s | 24s (11% Decrease) | 27s |
100 File/10 MiB ea | 1GiB | 24s (14% Decrease) | 28s | 69s (9% Decrease) | 76s |
1000 File/1 MiB ea | 1GiB | 98s (8% Decrease) | 106s | 309s (1% Decrease) | 312s |
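The percentages in the table and the two averages listed before it can be reproduced from the raw timings. The sketch below assumes the averages are simple means of the per-test percent changes across both methods; the source does not state how they were computed, and the names `RESULTS`, `percent_decrease`, and `average_decrease` are illustrative only.

```python
# (test name, per-file size in MiB, v4.4.0 sync, v4.3.0 sync, v4.4.0 get, v4.3.0 get)
# All timings are in seconds, copied from the table above.
RESULTS = [
    ("10 File/10GiB ea", 10 * 1024, 1910, 3155, 2186, 2958),
    ("1 File/10GiB ea", 10 * 1024, 229, 289, 214, 308),
    ("10 File/1GiB ea", 1024, 174, 330, 224, 295),
    ("100 File/100 MiB ea", 100, 161, 156, 228, 262),
    ("10 File/100 MiB ea", 100, 15, 17, 24, 27),
    ("100 File/10 MiB ea", 10, 24, 28, 69, 76),
    ("1000 File/1 MiB ea", 1, 98, 106, 309, 312),
]


def percent_decrease(new_seconds, old_seconds):
    """Percent reduction in transfer time; negative means the new run was slower."""
    return (old_seconds - new_seconds) / old_seconds * 100


def average_decrease(rows):
    """Mean percent decrease over both methods for the given rows (assumed method)."""
    changes = []
    for _name, _size_mib, new_sync, old_sync, new_get, old_get in rows:
        changes.append(percent_decrease(new_sync, old_sync))
        changes.append(percent_decrease(new_get, old_get))
    return sum(changes) / len(changes)


small = [row for row in RESULTS if row[1] <= 100]    # <= 100 MiB per file
large = [row for row in RESULTS if row[1] >= 1024]   # >= 1 GiB per file

print(f"<= 100 MiB per file: {average_decrease(small):.0f}% decrease")  # 8% decrease
print(f">= 1 GiB per file: {average_decrease(large):.0f}% decrease")    # 31% decrease
```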
05/10/2024: Uploading files to Synapse
These benchmarking results were collected due to the following changes:
- The upload algorithm for the Synapseutils syncToSynapse function was rewritten to take advantage of the new AsyncIO upload algorithm for individual files.
- The limit on concurrent file transfers was updated to match max_threads * 2.
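One common way to cap concurrent transfers under AsyncIO is a semaphore sized to the limit. The following is a minimal sketch of that general technique, not the client's actual implementation; `run_transfers` and the sleep that stands in for an upload are hypothetical.

```python
import asyncio


async def run_transfers(file_names, max_threads):
    """Run one simulated transfer per file, never more than max_threads * 2 at once."""
    limit = max_threads * 2
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def transfer(name):
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `limit` transfers are active
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the real upload work
            in_flight -= 1
        return name

    done = await asyncio.gather(*(transfer(name) for name in file_names))
    return done, peak


# 50 simulated files with max_threads=6 gives a concurrency limit of 12.
done, peak = asyncio.run(run_transfers([f"file_{i}" for i in range(50)], max_threads=6))
print(f"completed={len(done)} peak_concurrency={peak}")
```

Tracking the peak makes it easy to confirm that the limit holds regardless of how many files are queued.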
Test | Total Transfer Size | Synapseutils | OOP Models Interface | syn.Store() | S3 Sync CLI |
---|---|---|---|---|---|
10 File/10GiB ea | 100GiB | 1656.64s | 1656.77s | 1674.63s | 1519.75s |
1 File/10GiB ea | 10GiB | 166.83s | 166.41s | 167.21s | 149.55s |
10 File/1GiB ea | 10GiB | 168.74s | 167.15s | 184.78s | 166.39s |
100 File/100 MiB ea | 10GiB | 158.98s | 125.98s | 293.07s | 162.57s |
10 File/100 MiB ea | 1GiB | 16.55s | 14.37s | 29.23s | 19.18s |
100 File/10 MiB ea | 1GiB | 15.92s | 15.49s | 129.90s | 18.66s |
1000 File/1 MiB ea | 1GiB | 135.77s | 137.15s | 1021.32s | 26.03s |
A high-level overview of the differences between each of the upload methods:
- OOP Models Interface: Uploads all files and 8MB chunks of each file in parallel using a new upload algorithm
- Synapseutils: Uploads all files and 8MB chunks of each file in parallel using a new upload algorithm
- syn.Store(): Uploads files sequentially, but 8MB chunks in parallel using a new upload algorithm
- S3 Sync CLI: Executing the aws s3 sync command through Python subprocess.run()
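Timing a CLI command through subprocess.run() might look roughly like the sketch below. The helper names, the local directory, and the bucket URI are hypothetical, and the benchmark script's actual invocation may differ; the demonstration times a no-op Python process so that it runs without the AWS CLI installed.

```python
import subprocess
import sys
import time


def build_s3_sync_command(local_dir, s3_uri):
    """Build the argument list for `aws s3 sync <local_dir> <s3_uri>`."""
    return ["aws", "s3", "sync", local_dir, s3_uri]


def time_command(command):
    """Run a command to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)  # raises CalledProcessError on a non-zero exit
    return time.perf_counter() - start


# Hypothetical paths; running this command requires the AWS CLI and credentials.
sync_command = build_s3_sync_command("./benchmark_files", "s3://example-bucket/benchmark")

# Demonstration with a harmless command instead of the real sync.
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"elapsed: {elapsed:.2f}s")
```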
04/01/2024: Uploading files to Synapse
These benchmarking results bring together some important updates to the upload logic. It has been rewritten to focus on concurrent file uploads and more efficient use of available threads. As a result of this change, it is not recommended to increase max_threads manually. Based on the available CPU cores, this Python package will use multiprocessing.cpu_count() + 4 threads. For this testing, the default thread count on the machine where testing took place was 6.
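The default thread count described above can be checked on any machine with a few lines. This sketch only mirrors the formula quoted in the text; `default_thread_count` is an illustrative helper, not a client function.

```python
import multiprocessing


def default_thread_count(cpu_count=None):
    """Return cpu_count + 4, using the local machine's CPU count when none is given."""
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    return cpu_count + 4


# A machine reporting 2 CPUs yields the default of 6 mentioned in the text.
print(default_thread_count(2))
print(default_thread_count())  # value for the machine running this snippet
```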
The results were created on a t3a.micro EC2 instance with a 200GB disk size running in us-east-1. The script that was run can be found in docs/scripts/uploadBenchmark.py.
Some insights:
- The Object-Oriented (OOP) Models Interface gives the best performance out of the box and will scale based on the hardware it's run on.
- Increasing the number of threads did increase performance for the previous upload logic and the Synapseutils functionality. It may also increase performance for the new upload logic, but this was not tested again. As found in previous benchmarks, increasing the number of threads also leads to inconsistent stability.
- The new upload algorithm follows a similar pattern to the S3 client upload times.
Test | Total Transfer Size | OOP Models Interface | syn.Store(), New Upload | S3 Sync CLI | syn.Store(), Old Upload | Synapseutils | Synapseutils (25 Threads) | syn.Store(), Old Upload (25 Threads) |
---|---|---|---|---|---|---|---|---|
10 File/10GiB ea | 100GiB | 1652.61s | 1680.27s | 1515.31s | 2174.65s | 2909.62s | 1658.34s | 1687.17s |
1 File/10GiB ea | 10GiB | 168.39s | 167s | 152.35s | 223s | 255.99s | 169s | 166s |
10 File/1GiB ea | 10GiB | 168.78s | 172.48s | 155.52s | 224.59s | 291.72s | 167.9s | 175.99s |
100 File/100 MiB ea | 10GiB | 124.14s | 248.57s | 150.46s | 320.50s | 227.75s | 170.82s | 294.69s |
10 File/100 MiB ea | 1GiB | 15.13s | 28.38s | 18.14s | 33.74s | 26.64s | 17.69s | 32.80s |
100 File/10 MiB ea | 1GiB | 19.24s | 141.23s | 19.59s | 139.31s | 48.50s | 18.34s | 138.14s |
1000 File/1 MiB ea | 1GiB | 152.65s | 1044.42s | 25.03s | 1101.90s | 340.07s | 100.94s | 1106.40s |
A high-level overview of the differences between each of the upload methods:
- OOP Models Interface: Uploads all files and 8MB chunks of each file in parallel using a new upload algorithm
- Synapseutils: Uploads all files in parallel and 8MB chunks of each file in parallel using the old upload algorithm
- syn.Store(), New Upload: Uploads files sequentially, but 8MB chunks in parallel using a new upload algorithm
- syn.Store(), Old Upload: Uploads files sequentially, but 8MB chunks in parallel using the old upload algorithm
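To get a feel for how many chunks each test case produces, the chunk count per file can be estimated with ceiling division. This sketch assumes "8MB" means 8 MiB (8 * 1024 * 1024 bytes), which the text does not confirm; `chunk_count` is an illustrative helper.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

# Assumption: the "8MB" chunk size in the text is 8 MiB.
CHUNK_SIZE_BYTES = 8 * MIB


def chunk_count(file_size_bytes, chunk_size_bytes=CHUNK_SIZE_BYTES):
    """Number of chunks needed to cover a file, rounding the last partial chunk up."""
    return (file_size_bytes + chunk_size_bytes - 1) // chunk_size_bytes


for label, size in [("1 MiB", MIB), ("100 MiB", 100 * MIB), ("10 GiB", 10 * GIB)]:
    print(f"{label} file -> {chunk_count(size)} chunk(s)")
```

Under this assumption a 1 MiB file fits in a single chunk, so chunk-level parallelism cannot help it. That is consistent with the sequential syn.Store() runs being slowest on the 1000 File/1 MiB test.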
12/12/2023: Downloading files from Synapse
The results were created on a t3a.micro EC2 instance with a 200GB disk size running in us-east-1. The scripts that were run can be found in docs/scripts/downloadBenchmark.py and docs/scripts/uploadTestFiles.py.
During this download test I tried various thread counts to see what performance looked like at different levels. What I found was that going over the default thread count when downloading large files (10GB and over) led to significantly unstable performance. The client would often crash or hang during execution. As a result, the general recommendation is as follows:
- For files over 1GB, use the default number of threads: multiprocessing.cpu_count() + 4
- For a large number of files of 1GB and under, 40-50 threads worked best
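The recommendation above can be written as a small helper. This is only a sketch of the rule of thumb: `recommended_threads` is hypothetical, 1GB is taken as 10^9 bytes, and 40 is picked as the lower end of the 40-50 range (the value used in the table that follows).

```python
import multiprocessing

GB = 1000 ** 3  # assumption: "1GB" in the recommendation means 10^9 bytes


def recommended_threads(largest_file_bytes, cpu_count=None):
    """Pick a download thread count from the size of the largest file."""
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    if largest_file_bytes > GB:
        # Large files: stay on the default to avoid the instability described above.
        return cpu_count + 4
    # Many files of 1GB and under: lower end of the 40-50 range that worked best.
    return 40


print(recommended_threads(10 * GB, cpu_count=2))         # 10GB files on a 2-CPU machine
print(recommended_threads(100 * 1000 ** 2, cpu_count=2))  # 100MB files
```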
Test | Thread Count | Synapseutils Sync | syn.getChildren + syn.get | S3 Sync | Per file size |
---|---|---|---|---|---|
25 Files 1MB total size | 40 | 1.30s | 5.48s | 1.49s | 40KB |
775 Files 10MB total size | 40 | 19.17s | 161.46s | 12.02s | 12.9KB |
10 Files 1GB total size | 40 | 14.74s | 21.91s | 11.72s | 100MB |
10 Files 100GB total size | 6 | 3859.66s | 2006.53s | 1023.57s | 10GB |
10 Files 100GB total size | 40 | Wouldn't complete | Wouldn't complete | N/A | 10GB |
12/06/2023: Uploading files to Synapse, Varying thread count, 5 annotations per file
The results were created on a t3a.micro EC2 instance with a 200GB disk size running in us-east-1. The script that was run can be found in docs/scripts. The time to create the files on disk is not included.
This test includes adding 5 annotations to each file: a Text, Integer, Floating Point, Boolean, and Date.
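As an illustration of what one annotation of each type could look like, the sketch below builds a plain dictionary. The key names and values are hypothetical, and how the benchmark script actually attaches annotations to each file is not shown here.

```python
from datetime import datetime, timezone

# Hypothetical keys and values, one per annotation type named in the text.
annotations = {
    "sample_text": "benchmark",                                 # Text
    "sample_integer": 1,                                        # Integer
    "sample_float": 1.5,                                        # Floating Point
    "sample_boolean": True,                                     # Boolean
    "sample_date": datetime(2023, 12, 6, tzinfo=timezone.utc),  # Date
}

for key, value in annotations.items():
    print(f"{key}: {type(value).__name__}")
```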
S3 was not benchmarked again.
As a result of these tests, the sweet spot for thread count is around 50 threads. It is not recommended to go over 50 threads, as doing so resulted in significant instability in the client.
Test | Thread Count | Synapseutils Sync | os.walk + syn.store | Per file size |
---|---|---|---|---|
25 Files 1MB total size | 6 | 10.75s | 10.96s | 40KB |
25 Files 1MB total size | 25 | 6.79s | 11.31s | 40KB |
25 Files 1MB total size | 50 | 6.05s | 10.90s | 40KB |
25 Files 1MB total size | 100 | 6.14s | 10.89s | 40KB |
775 Files 10MB total size | 6 | 268.33s | 298.12s | 12.9KB |
775 Files 10MB total size | 25 | 162.63s | 305.93s | 12.9KB |
775 Files 10MB total size | 50 | 86.46s | 304.40s | 12.9KB |
775 Files 10MB total size | 100 | 85.55s | 304.71s | 12.9KB |
10 Files 1GB total size | 6 | 27.17s | 36.25s | 100MB |
10 Files 1GB total size | 25 | 22.26s | 12.77s | 100MB |
10 Files 1GB total size | 50 | 22.24s | 12.26s | 100MB |
10 Files 1GB total size | 100 | Wouldn't complete | Wouldn't complete | 100MB |
11/14/2023: Uploading files to Synapse, Default thread count
The results were created on a t3a.micro EC2 instance with a 200GB disk size running in us-east-1. The script that was run can be found in docs/scripts. The time to create the files on disk is not included.
This test uses the default number of threads in the client: multiprocessing.cpu_count() + 4.
Test | Synapseutils Sync | os.walk + syn.store | S3 Sync | Per file size |
---|---|---|---|---|
25 Files 1MB total size | 10.43s | 8.99s | 1.83s | 40KB |
775 Files 10MB total size | 243.57s | 257.27s | 7.64s | 12.9KB |
10 Files 1GB total size | 27.18s | 33.73s | 16.31s | 100MB |
10 Files 100GB total size | 3211s | 3047s | 3245s | 10GB |
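The "os.walk + syn.store" column refers to walking a directory tree and storing each file one at a time. The sketch below shows that pattern in general terms and is not the benchmark script itself. The `store_file` callback is a hypothetical stand-in for a call such as syn.store, so the example runs without a Synapse login.

```python
import os
import tempfile


def store_directory(root, store_file):
    """Walk `root` and call `store_file(path)` once per file, sequentially."""
    stored = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            store_file(path)  # stand-in for the real upload call
            stored.append(path)
    return stored


# Demonstration against a throwaway directory containing three small files.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "nested"))
    for relative in ("a.txt", "b.txt", os.path.join("nested", "c.txt")):
        with open(os.path.join(root, relative), "w") as handle:
            handle.write("benchmark data")

    uploaded = []
    stored = store_directory(root, uploaded.append)

print(f"stored {len(stored)} files")
```

Because each file is handled in turn, total time grows with the number of files. That is consistent with the 775-file test taking far longer than the 10-file, 1GB test in the table above.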