Benchmarking¶
Periodically we will be publishing results of benchmarking the Synapse Python Client compared to directly working with AWS S3. The purpose of these benchmarks is to make data driven decisions on where to spend time optimizing the client. Additionally, it will give us a way to measure the impact of changes to the client.
Results¶
04/01/2024: Uploading files to Synapse¶
These benchmarking results bring together some important updates to the Upload logic. It
has been re-written to bring a focus to concurrent file uploads and more effecient use
of available threads. As a result of this change it is not reccommended to
increase max_threads
manually. Based on the available CPU cores this python package
will use multiprocessing.cpu_count() + 4
. For this testing the default thread size
for the machine testing took place on was 6
.
The results were created on a t3a.micro
EC2 instance with a 200GB disk size running
in us-east-1. The script that was run can be found in docs/scripts/uploadBenchmark.py
.
Some insights:¶
- The use of the Object-Orientated interfaces/Models Interface results in the best performance out of the box and will scale based on the hardware it's run on.
- Increasing the number of threads did increase performance for the previous upload logic and Synapseutils functionality. It may also increase performance for new uploads, however, it was not tested again. Increasing the number of threads in use also has inconsistent stability as found in previous benchmarks.
- The new upload algorithm follows a similar pattern to the S3 client upload times.
Test | Total Transfer Size | OOP Models Interface | syn.Store(), New Upload | S3 Sync CLI | syn.Store(), Old Upload | Synapseutils | Synapseutils (25 Threads) | syn.Store(), Old Upload (25 Threads) |
---|---|---|---|---|---|---|---|---|
10 File/10GiB ea | 100GiB | 1652.61s | 1680.27s | 1515.31s | 2174.65s | 2909.62s | 1658.34s | 1687.17s |
1 File/10GiB ea | 10GiB | 168.39s | 167s | 152.35s | 223s | 255.99s | 169s | 166s |
10 File/1GiB ea | 10GiB | 168.78s | 172.48s | 155.52s | 224.59s | 291.72s | 167.9s | 175.99s |
100 File/100 MiB ea | 10GiB | 124.14s | 248.57s | 150.46s | 320.50s | 227.75s | 170.82s | 294.69s |
10 File/100 MiB ea | 1GiB | 15.13s | 28.38s | 18.14s | 33.74s | 26.64s | 17.69s | 32.80s |
100 File/10 MiB ea | 1GiB | 19.24s | 141.23s | 19.59s | 139.31s | 48.50s | 18.34s | 138.14s |
1000 File/1 MiB ea | 1GiB | 152.65s | 1044.42s | 25.03s | 1101.90s | 340.07s | 100.94s | 1106.40s |
A high level overview of the differences between each of the upload methods:¶
- OOP Models Interface: Uploads all files and 8MB chunks of each file in parallel using a new upload algorithm
- Synapseutils: Uploads all files in parallel and 8MB chunks of each file in parallel using the old upload algorithm
- syn.Store(), New Upload: Uploads files sequentally, but 8MB chunks in parallel using a new upload algorithm
- syn.Store(), Old Upload: Uploads files sequentally, but 8MB chunks in parallel using the old upload algorithm
12/12/2023: Downloading files from Synapse¶
The results were created on a t3a.micro
EC2 instance with a 200GB disk size running in us-east-1. The script that was run can be found in docs/scripts/downloadBenchmark.py
and docs/scripts/uploadTestFiles.py
.
During this download test I tried various thread counts to see what performance looked like at different levels. What I found was that going over the default count of threads during download of large files (10GB and over) led to signficantly unstable performance. The client would often crash or hang during execution. As a result the general reccomendation is as follows:
- For files over 1GB use the default number of threads:
multiprocessing.cpu_count() + 4
- For a large number of files 1GB and under 40-50 threads worked best
Test | Thread Count | Synapseutils Sync | syn.getChildren + syn.get | S3 Sync | Per file size |
---|---|---|---|---|---|
25 Files 1MB total size | 40 | 1.30s | 5.48s | 1.49s | 40KB |
775 Files 10MB total size | 40 | 19.17s | 161.46s | 12.02s | 12.9KB |
10 Files 1GB total size | 40 | 14.74s | 21.91s | 11.72s | 100MB |
10 Files 100GB total size | 6 | 3859.66s | 2006.53s | 1023.57s | 10GB |
10 Files 100GB total size | 40 | Wouldn't complete | Wouldn't complete | N/A | 10GB |
12/06/2023: Uploading files to Synapse, Varying thread count, 5 annotations per file¶
The results were created on a t3a.micro
EC2 instance with a 200GB disk size running in us-east-1. The script that was run can be found in docs/scripts
. The time to create the files on disk is not included.
This test includes adding 5 annotations to each file, a Text, Integer, Floating Point, Boolean, and Date.
S3 was not benchmarked again.
As a result of these tests the sweet spot for thread count is around 50 threads. It is not reccomended to go over 50 threads as it resulted in signficant instability in the client.
Test | Thread Count | Synapseutils Sync | os.walk + syn.store | Per file size |
---|---|---|---|---|
25 Files 1MB total size | 6 | 10.75s | 10.96s | 40KB |
25 Files 1MB total size | 25 | 6.79s | 11.31s | 40KB |
25 Files 1MB total size | 50 | 6.05s | 10.90s | 40KB |
25 Files 1MB total size | 100 | 6.14s | 10.89s | 40KB |
775 Files 10MB total size | 6 | 268.33s | 298.12s | 12.9KB |
775 Files 10MB total size | 25 | 162.63s | 305.93s | 12.9KB |
775 Files 10MB total size | 50 | 86.46s | 304.40s | 12.9KB |
775 Files 10MB total size | 100 | 85.55s | 304.71s | 12.9KB |
10 Files 1GB total size | 6 | 27.17s | 36.25s | 100MB |
10 Files 1GB total size | 25 | 22.26s | 12.77s | 100MB |
10 Files 1GB total size | 50 | 22.24s | 12.26s | 100MB |
10 Files 1GB total size | 100 | Wouldn't complete | Wouldn't complete | 100MB |
11/14/2023: Uploading files to Synapse, Default thread count¶
The results were created on a t3a.micro
EC2 instance with a 200GB disk size running in us-east-1.
The script that was run can be found in docs/scripts
. The time to create the files on disk is not included.
This test uses the default number of threads in the client: multiprocessing.cpu_count() + 4
Test | Synapseutils Sync | os.walk + syn.store | S3 Sync | Per file size |
---|---|---|---|---|
25 Files 1MB total size | 10.43s | 8.99s | 1.83s | 40KB |
775 Files 10MB total size | 243.57s | 257.27s | 7.64s | 12.9KB |
10 Files 1GB total size | 27.18s | 33.73s | 16.31s | 100MB |
10 Files 100GB total size | 3211s | 3047s | 3245s | 10GB |