Skip to content

Benchmarking

Periodically we will be publishing results of benchmarking the Synapse Python Client compared to directly working with AWS S3. The purpose of these benchmarks is to make data driven decisions on where to spend time optimizing the client. Additionally, it will give us a way to measure the impact of changes to the client.

Results

07/02/2024: Downloading files from Synapse

These benchmarking results were collected due to the following changes:

  • The download algorithm for the client was re-written to focus on error handling and handling of multi-threaded downloads orchestrated by AsyncIO.
  • The synapseutils.syncFromSynapse() function was refactored to use this new logic

The results were created on a t3a.micro EC2 instance with a 200GB disk size running in us-east-1. The script that was run can be found in docs/scripts/downloadBenchmark.py.

Average transfer time result:

  • <= 100 MiB per file:

    8% Decrease

  • >= 1 GiB per file:

    31% Decrease

Test Total Transfer Size v4.4.0 .syncFromSynapse() v4.3.0 .syncFromSynapse() v4.4.0 .Get() v4.3.0 .Get()
10 File/10GiB ea 100GiB 1910s

39% Decrease

3155s 2186s

26% Decrease

2958s
1 File/10GiB ea 10GiB 229s

21% Decrease

289s 214s

31% Decrease

308s
10 File/1GiB ea 10GiB 174s

47% Decrease

330s 224s

24% Decrease

295s
100 File/100 MiB ea 10GiB 161s

3% Increase

156s 228s

13% Decrease

262s
10 File/100 MiB ea 1GiB 15s

12% Decrease

17s 24s

11% Decrease

27s
100 File/10 MiB ea 1GiB 24s

14% Decrease

28s 69s

9% Decrease

76s
1000 File/1 MiB ea 1GiB 98s

8% Decrease

106s 309s

1% Decrease

312s

05/10/2024: Uploading files to Synapse

These benchmarking results were collected due to the following changes:

  • The upload algorithm for the Synapseutils syncToSynapse being re-written to take advantage of the new AsyncIO upload algorithm for individual files.
  • An updated limit on concurrent file transfers to match max_threads * 2
Test Total Transfer Size Synapseutils OOP Models Interface syn.Store() S3 Sync CLI
10 File/10GiB ea 100GiB 1656.64s 1656.77s 1674.63s 1519.75s
1 File/10GiB ea 10GiB 166.83s 166.41s 167.21 149.55s
10 File/1GiB ea 10GiB 168.74s 167.15s 184.78s 166.39s
100 File/100 MiB ea 10GiB 158.98 125.98s 293.07s 162.57s
10 File/100 MiB ea 1GiB 16.55s 14.37s 29.23s 19.18s
100 File/10 MiB ea 1GiB 15.92s 15.49s 129.90s 18.66s
1000 File/1 MiB ea 1GiB 135.77s 137.15s 1021.32s 26.03s

A high level overview of the differences between each of the upload methods:

  • OOP Models Interface: Uploads all files and 8MB chunks of each file in parallel using a new upload algorithm
  • Synapseutils: Uploads all files and 8MB chunks of each file in parallel using a new upload algorithm
  • syn.Store(): Uploads files sequentally, but 8MB chunks in parallel using a new upload algorithm
  • S3 Sync CLI: Executing the aws s3 sync command through Python subprocess.run()

04/01/2024: Uploading files to Synapse

These benchmarking results bring together some important updates to the Upload logic. It has been re-written to bring a focus to concurrent file uploads and more effecient use of available threads. As a result of this change it is not reccommended to increase max_threads manually. Based on the available CPU cores this python package will use multiprocessing.cpu_count() + 4. For this testing the default thread size for the machine testing took place on was 6.

The results were created on a t3a.micro EC2 instance with a 200GB disk size running in us-east-1. The script that was run can be found in docs/scripts/uploadBenchmark.py.

Some insights:

  • The use of the Object-Orientated interfaces/Models Interface results in the best performance out of the box and will scale based on the hardware it's run on.
  • Increasing the number of threads did increase performance for the previous upload logic and Synapseutils functionality. It may also increase performance for new uploads, however, it was not tested again. Increasing the number of threads in use also has inconsistent stability as found in previous benchmarks.
  • The new upload algorithm follows a similar pattern to the S3 client upload times.
Test Total Transfer Size OOP Models Interface syn.Store(), New Upload S3 Sync CLI syn.Store(), Old Upload Synapseutils Synapseutils (25 Threads) syn.Store(), Old Upload (25 Threads)
10 File/10GiB ea 100GiB 1652.61s 1680.27s 1515.31s 2174.65s 2909.62s 1658.34s 1687.17s
1 File/10GiB ea 10GiB 168.39s 167s 152.35s 223s 255.99s 169s 166s
10 File/1GiB ea 10GiB 168.78s 172.48s 155.52s 224.59s 291.72s 167.9s 175.99s
100 File/100 MiB ea 10GiB 124.14s 248.57s 150.46s 320.50s 227.75s 170.82s 294.69s
10 File/100 MiB ea 1GiB 15.13s 28.38s 18.14s 33.74s 26.64s 17.69s 32.80s
100 File/10 MiB ea 1GiB 19.24s 141.23s 19.59s 139.31s 48.50s 18.34s 138.14s
1000 File/1 MiB ea 1GiB 152.65s 1044.42s 25.03s 1101.90s 340.07s 100.94s 1106.40s

A high level overview of the differences between each of the upload methods:

  • OOP Models Interface: Uploads all files and 8MB chunks of each file in parallel using a new upload algorithm
  • Synapseutils: Uploads all files in parallel and 8MB chunks of each file in parallel using the old upload algorithm
  • syn.Store(), New Upload: Uploads files sequentally, but 8MB chunks in parallel using a new upload algorithm
  • syn.Store(), Old Upload: Uploads files sequentally, but 8MB chunks in parallel using the old upload algorithm

12/12/2023: Downloading files from Synapse

The results were created on a t3a.micro EC2 instance with a 200GB disk size running in us-east-1. The script that was run can be found in docs/scripts/downloadBenchmark.py and docs/scripts/uploadTestFiles.py.

During this download test I tried various thread counts to see what performance looked like at different levels. What I found was that going over the default count of threads during download of large files (10GB and over) led to signficantly unstable performance. The client would often crash or hang during execution. As a result the general reccomendation is as follows:

  • For files over 1GB use the default number of threads: multiprocessing.cpu_count() + 4
  • For a large number of files 1GB and under 40-50 threads worked best
Test Thread Count Synapseutils Sync syn.getChildren + syn.get S3 Sync Per file size
25 Files 1MB total size 40 1.30s 5.48s 1.49s 40KB
775 Files 10MB total size 40 19.17s 161.46s 12.02s 12.9KB
10 Files 1GB total size 40 14.74s 21.91s 11.72s 100MB
10 Files 100GB total size 6 3859.66s 2006.53s 1023.57s 10GB
10 Files 100GB total size 40 Wouldn't complete Wouldn't complete N/A 10GB

12/06/2023: Uploading files to Synapse, Varying thread count, 5 annotations per file

The results were created on a t3a.micro EC2 instance with a 200GB disk size running in us-east-1. The script that was run can be found in docs/scripts. The time to create the files on disk is not included.

This test includes adding 5 annotations to each file, a Text, Integer, Floating Point, Boolean, and Date.

S3 was not benchmarked again.

As a result of these tests the sweet spot for thread count is around 50 threads. It is not recommended to go over 50 threads as it resulted in signficant instability in the client.

Test Thread Count Synapseutils Sync os.walk + syn.store Per file size
25 Files 1MB total size 6 10.75s 10.96s 40KB
25 Files 1MB total size 25 6.79s 11.31s 40KB
25 Files 1MB total size 50 6.05s 10.90s 40KB
25 Files 1MB total size 100 6.14s 10.89s 40KB
775 Files 10MB total size 6 268.33s 298.12s 12.9KB
775 Files 10MB total size 25 162.63s 305.93s 12.9KB
775 Files 10MB total size 50 86.46s 304.40s 12.9KB
775 Files 10MB total size 100 85.55s 304.71s 12.9KB
10 Files 1GB total size 6 27.17s 36.25s 100MB
10 Files 1GB total size 25 22.26s 12.77s 100MB
10 Files 1GB total size 50 22.24s 12.26s 100MB
10 Files 1GB total size 100 Wouldn't complete Wouldn't complete 100MB

11/14/2023: Uploading files to Synapse, Default thread count

The results were created on a t3a.micro EC2 instance with a 200GB disk size running in us-east-1. The script that was run can be found in docs/scripts. The time to create the files on disk is not included.

This test uses the default number of threads in the client: multiprocessing.cpu_count() + 4

Test Synapseutils Sync os.walk + syn.store S3 Sync Per file size
25 Files 1MB total size 10.43s 8.99s 1.83s 40KB
775 Files 10MB total size 243.57s 257.27s 7.64s 12.9KB
10 Files 1GB total size 27.18s 33.73s 16.31s 100MB
10 Files 100GB total size 3211s 3047s 3245s 10GB