Skip to content

Benchmarking

Periodically we will be publishing results of benchmarking the Synapse Python Client compared to directly working with AWS S3. The purpose of these benchmarks is to make data driven decisions on where to spend time optimizing the client. Additionally, it will give us a way to measure the impact of changes to the client.

Results

12/12/2023: Downloading files from Synapse

The results were created on a t3a.micro EC2 instance with a 200GB disk size running in us-east-1. The script that was run can be found in docs/scripts/downloadBenchmark.py and docs/scripts/uploadTestFiles.py.

During this download test I tried various thread counts to see what performance looked like at different levels. What I found was that going over the default count of threads during download of large files (10GB and over) led to signficantly unstable performance. The client would often crash or hang during execution. As a result the general reccomendation is as follows:

  • For files over 1GB use the default number of threads: multiprocessing.cpu_count() + 4
  • For a large number of files 1GB and under 40-50 threads worked best
Test Thread Count Synapseutils Sync syn.getChildren + syn.get S3 Sync Per file size
25 Files 1MB total size 40 1.30s 5.48s 1.49s 40KB
775 Files 10MB total size 40 19.17s 161.46s 12.02s 12.9KB
10 Files 1GB total size 40 14.74s 21.91s 11.72s 100MB
10 Files 100GB total size 6 3859.66s 2006.53s 1023.57s 10GB
10 Files 100GB total size 40 Wouldn't complete Wouldn't complete N/A 10GB

12/06/2023: Uploading files to Synapse, Varying thread count, 5 annotations per file

The results were created on a t3a.micro EC2 instance with a 200GB disk size running in us-east-1. The script that was run can be found in docs/scripts. The time to create the files on disk is not included.

This test includes adding 5 annotations to each file, a Text, Integer, Floating Point, Boolean, and Date.

S3 was not benchmarked again.

As a result of these tests the sweet spot for thread count is around 50 threads. It is not reccomended to go over 50 threads as it resulted in signficant instability in the client.

Test Thread Count Synapseutils Sync os.walk + syn.store Per file size
25 Files 1MB total size 6 10.75s 10.96s 40KB
25 Files 1MB total size 25 6.79s 11.31s 40KB
25 Files 1MB total size 50 6.05s 10.90s 40KB
25 Files 1MB total size 100 6.14s 10.89s 40KB
775 Files 10MB total size 6 268.33s 298.12s 12.9KB
775 Files 10MB total size 25 162.63s 305.93s 12.9KB
775 Files 10MB total size 50 86.46s 304.40s 12.9KB
775 Files 10MB total size 100 85.55s 304.71s 12.9KB
10 Files 1GB total size 6 27.17s 36.25s 100MB
10 Files 1GB total size 25 22.26s 12.77s 100MB
10 Files 1GB total size 50 22.24s 12.26s 100MB
10 Files 1GB total size 100 Wouldn't complete Wouldn't complete 100MB

11/14/2023: Uploading files to Synapse, Default thread count

The results were created on a t3a.micro EC2 instance with a 200GB disk size running in us-east-1. The script that was run can be found in docs/scripts. The time to create the files on disk is not included.

This test uses the default number of threads in the client: multiprocessing.cpu_count() + 4

Test Synapseutils Sync os.walk + syn.store S3 Sync Per file size
25 Files 1MB total size 10.43s 8.99s 1.83s 40KB
775 Files 10MB total size 243.57s 257.27s 7.64s 12.9KB
10 Files 1GB total size 27.18s 33.73s 16.31s 100MB
10 Files 100GB total size 3211s 3047s 3245s 10GB