Files in Synapse¶
Synapse files can be created by uploading content from your local computer or linking to digital files on the web.
Files in Synapse always have a “parent”, which could be a project or a folder. You can organize collections of files into folders and sub-folders, just as you would on your local computer.
Note: You may optionally follow the Uploading data in bulk tutorial instead. The bulk tutorial may fit your needs better as it limits the amount of code that you are required to write and maintain.
This tutorial follows a Flattened Data Layout, using the following example directory structure:
.
├── biospecimen_experiment_1
│   ├── fileA.txt
│   └── fileB.txt
├── biospecimen_experiment_2
│   ├── fileC.txt
│   └── fileD.txt
├── single_cell_RNAseq_batch_1
│   ├── SRR12345678_R1.fastq.gz
│   └── SRR12345678_R2.fastq.gz
└── single_cell_RNAseq_batch_2
    ├── SRR12345678_R1.fastq.gz
    └── SRR12345678_R2.fastq.gz
Tutorial Purpose¶
In this tutorial you will:
- Upload several files to Synapse
- Print stored attributes about your files
- List all Folders and Files within your project
Prerequisites¶
- Make sure that you have completed the Folder tutorial.
- The tutorial assumes you have a number of files ready to upload. If you do not, create test or dummy files. You may also use the dummy files that were used when these tutorials were written. They are text files with example file extensions that a researcher might use.
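If you need to create the dummy files yourself, the following is a minimal sketch that builds the example layout with placeholder text files. The helper name `create_dummy_files` and the `LAYOUT` mapping are illustrative names introduced here, not part of the Synapse client. The tutorial code expects the files under `~/my_ad_project`.

```python
from pathlib import Path

# Folder and file names copied from the example layout in this tutorial.
LAYOUT = {
    "biospecimen_experiment_1": ["fileA.txt", "fileB.txt"],
    "biospecimen_experiment_2": ["fileC.txt", "fileD.txt"],
    "single_cell_RNAseq_batch_1": [
        "SRR12345678_R1.fastq.gz",
        "SRR12345678_R2.fastq.gz",
    ],
    "single_cell_RNAseq_batch_2": [
        "SRR12345678_R1.fastq.gz",
        "SRR12345678_R2.fastq.gz",
    ],
}


def create_dummy_files(root):
    """Create the example folders and placeholder files under `root`.

    Hypothetical helper for this tutorial; returns the created paths.
    """
    created = []
    for folder_name, file_names in LAYOUT.items():
        folder = Path(root) / folder_name
        folder.mkdir(parents=True, exist_ok=True)
        for file_name in file_names:
            file_path = folder / file_name
            # Placeholder text only: these are not real FASTQ or gzip files.
            file_path.write_text(f"dummy content for {folder_name}/{file_name}\n")
            created.append(file_path)
    return created
```

Calling `create_dummy_files(Path.home() / "my_ad_project")` would place the files where the rest of this tutorial looks for them.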
1. Upload several files to Synapse¶
First let's retrieve all of the Synapse IDs we are going to use¶
# Step 1: Upload several files to Synapse
import os
import synapseclient
import synapseutils
from synapseclient import File
syn = synapseclient.login()
# Retrieve the project ID
my_project_id = syn.findEntityId(
    name="My uniquely named project about Alzheimer's Disease"
)
# Retrieve the IDs of the folders I want to upload to
batch_1_folder = syn.findEntityId(
    parent=my_project_id, name="single_cell_RNAseq_batch_1"
)
batch_2_folder = syn.findEntityId(
    parent=my_project_id, name="single_cell_RNAseq_batch_2"
)
biospecimen_experiment_1_folder = syn.findEntityId(
    parent=my_project_id, name="biospecimen_experiment_1"
)
biospecimen_experiment_2_folder = syn.findEntityId(
    parent=my_project_id, name="biospecimen_experiment_2"
)
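A misspelled project or folder name is easy to miss at this step. The sketch below assumes that `findEntityId` returns `None` when nothing matches, which is worth confirming for your client version. The helper `require_entity_ids` is a hypothetical name introduced here so that the script stops early instead of failing later during upload.

```python
def require_entity_ids(ids_by_name):
    """Raise a clear error if any looked-up Synapse ID is missing.

    `ids_by_name` maps a human-readable name to the value returned by a
    lookup such as syn.findEntityId (assumed to be None when not found).
    """
    missing = sorted(name for name, entity_id in ids_by_name.items() if entity_id is None)
    if missing:
        raise ValueError(f"Could not find these entities in Synapse: {', '.join(missing)}")
    return ids_by_name
```

For example, `require_entity_ids({"single_cell_RNAseq_batch_1": batch_1_folder})` would pass the mapping through unchanged when the folder was found.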
Next let's create all of the File objects to upload content¶
# Create a File object for each file I want to upload
biospecimen_experiment_1_a_2022 = File(
    path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_1/fileA.txt"),
    parent=biospecimen_experiment_1_folder,
)
biospecimen_experiment_1_b_2022 = File(
    path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_1/fileB.txt"),
    parent=biospecimen_experiment_1_folder,
)
biospecimen_experiment_2_c_2023 = File(
    path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_2/fileC.txt"),
    parent=biospecimen_experiment_2_folder,
)
biospecimen_experiment_2_d_2023 = File(
    path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_2/fileD.txt"),
    parent=biospecimen_experiment_2_folder,
)
batch_1_scrnaseq_file_1 = File(
    path=os.path.expanduser(
        "~/my_ad_project/single_cell_RNAseq_batch_1/SRR12345678_R1.fastq.gz"
    ),
    parent=batch_1_folder,
)
batch_1_scrnaseq_file_2 = File(
    path=os.path.expanduser(
        "~/my_ad_project/single_cell_RNAseq_batch_1/SRR12345678_R2.fastq.gz"
    ),
    parent=batch_1_folder,
)
batch_2_scrnaseq_file_1 = File(
    path=os.path.expanduser(
        "~/my_ad_project/single_cell_RNAseq_batch_2/SRR12345678_R1.fastq.gz"
    ),
    parent=batch_2_folder,
)
batch_2_scrnaseq_file_2 = File(
    path=os.path.expanduser(
        "~/my_ad_project/single_cell_RNAseq_batch_2/SRR12345678_R2.fastq.gz"
    ),
    parent=batch_2_folder,
)
Finally we'll store the files in Synapse¶
# Upload each file to Synapse
biospecimen_experiment_1_a_2022 = syn.store(obj=biospecimen_experiment_1_a_2022)
biospecimen_experiment_1_b_2022 = syn.store(obj=biospecimen_experiment_1_b_2022)
biospecimen_experiment_2_c_2023 = syn.store(obj=biospecimen_experiment_2_c_2023)
biospecimen_experiment_2_d_2023 = syn.store(obj=biospecimen_experiment_2_d_2023)
batch_1_scrnaseq_file_1 = syn.store(obj=batch_1_scrnaseq_file_1)
batch_1_scrnaseq_file_2 = syn.store(obj=batch_1_scrnaseq_file_2)
batch_2_scrnaseq_file_1 = syn.store(obj=batch_2_scrnaseq_file_1)
batch_2_scrnaseq_file_2 = syn.store(obj=batch_2_scrnaseq_file_2)
Each file being uploaded has an upload progress bar:
##################################################
Uploading file to Synapse storage
##################################################
Uploading [####################]100.00% 2.0bytes/2.0bytes (1.8bytes/s) SRR12345678_R1.fastq.gz Done...
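Writing one variable per file becomes tedious as the number of files grows. As an alternative, the sketch below derives the upload list from a mapping of folder names to file names and stores each file in a loop. The names `plan_uploads`, `upload_all`, and `LAYOUT` are illustrative and not part of the Synapse client; only `File`, `syn.findEntityId`, and `syn.store` are used as they appear in this tutorial.

```python
import os

# Folder and file names copied from the example layout in this tutorial.
LAYOUT = {
    "biospecimen_experiment_1": ["fileA.txt", "fileB.txt"],
    "biospecimen_experiment_2": ["fileC.txt", "fileD.txt"],
    "single_cell_RNAseq_batch_1": ["SRR12345678_R1.fastq.gz", "SRR12345678_R2.fastq.gz"],
    "single_cell_RNAseq_batch_2": ["SRR12345678_R1.fastq.gz", "SRR12345678_R2.fastq.gz"],
}


def plan_uploads(local_root, layout):
    """Return (local_path, folder_name) pairs for every file in the layout."""
    return [
        (os.path.join(local_root, folder_name, file_name), folder_name)
        for folder_name, file_names in layout.items()
        for file_name in file_names
    ]


def upload_all(syn, project_id, plan):
    """Store every planned file; requires a logged-in Synapse client."""
    # Imported here so that plan_uploads can be used without the client installed.
    from synapseclient import File

    folder_ids = {}
    stored = []
    for local_path, folder_name in plan:
        if folder_name not in folder_ids:
            # Look up each destination folder only once.
            folder_ids[folder_name] = syn.findEntityId(
                name=folder_name, parent=project_id
            )
        stored.append(
            syn.store(obj=File(path=local_path, parent=folder_ids[folder_name]))
        )
    return stored
```

With the objects from this tutorial, a call might look like `upload_all(syn, my_project_id, plan_uploads(os.path.expanduser("~/my_ad_project"), LAYOUT))`.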
2. Print stored attributes about your files¶
# Step 2: Print stored attributes about your file
batch_1_scrnaseq_file_1_id = batch_1_scrnaseq_file_1.id
print(f"My file ID is: {batch_1_scrnaseq_file_1_id}")
print(f"The parent ID of my file is: {batch_1_scrnaseq_file_1.parentId}")
print(f"I created my file on: {batch_1_scrnaseq_file_1.createdOn}")
print(
    f"The ID of the user that created my file is: {batch_1_scrnaseq_file_1.createdBy}"
)
print(f"My file was last modified on: {batch_1_scrnaseq_file_1.modifiedOn}")
You'll notice the output looks like:
My file ID is: syn53205687
The parent ID of my file is: syn53205629
I created my file on: 2023-12-28T21:55:17.971Z
The ID of the user that created my file is: 3481671
My file was last modified on: 2023-12-28T21:55:17.971Z
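The `createdOn` and `modifiedOn` values shown above are ISO 8601 strings in UTC. If you want to compare or sort them, one option is to convert them to `datetime` objects. This is a minimal sketch that assumes the timestamp always carries fractional seconds and a trailing `Z`, as in the sample output; `parse_synapse_timestamp` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_synapse_timestamp(value):
    """Parse a string such as '2023-12-28T21:55:17.971Z' into an aware datetime."""
    # Assumes the 'YYYY-MM-DDTHH:MM:SS.fffZ' shape seen in the sample output.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


created_on = parse_synapse_timestamp("2023-12-28T21:55:17.971Z")
print(created_on.isoformat())  # 2023-12-28T21:55:17.971000+00:00
```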
3. List all Folders and Files within your project¶
Now that your project has a number of Folders and Files, let's explore how to traverse the content stored within the Project.
# Step 3: List all Folders and Files within my project
for directory_path, directory_names, file_names in synapseutils.walk(
    syn=syn, synId=my_project_id, includeTypes=["file"]
):
    for directory_name in directory_names:
        print(
            f"Directory ({directory_name[1]}): {directory_path[0]}/{directory_name[0]}"
        )
    for file in file_names:
        print(f"File ({file[1]}): {directory_path[0]}/{file[0]}")
The result of walking your project structure should look something like:
Directory (syn60109540): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_1
Directory (syn60109543): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_2
Directory (syn60109534): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_1
Directory (syn60109537): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_2
File (syn60115444): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_1/fileA.txt
File (syn60115457): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_1/fileB.txt
File (syn60115472): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_2/fileC.txt
File (syn60115485): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_2/fileD.txt
File (syn60115498): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_1/SRR12345678_R1.fastq.gz
File (syn60115513): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_1/SRR12345678_R2.fastq.gz
File (syn60115526): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_2/SRR12345678_R1.fastq.gz
File (syn60115539): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_2/SRR12345678_R2.fastq.gz
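Instead of printing, the same traversal can be collected into a dictionary that maps each path to its Synapse ID. The sketch below relies only on the tuple shapes used in the loop above: each item is a `(path, id)` directory pair followed by lists of `(name, id)` pairs. `index_walk` is a hypothetical helper, and the project name and ID in the sample data are placeholders.

```python
def index_walk(walk_results):
    """Map 'parent/path/name' to Synapse ID for every listed folder and file."""
    index = {}
    for (directory_path, _directory_id), directory_names, file_names in walk_results:
        for name, entity_id in list(directory_names) + list(file_names):
            index[f"{directory_path}/{name}"] = entity_id
    return index


# Hand-built sample in the shape the loop above consumes.
# "my_project" and "syn00000000" are placeholders, not real values.
sample_walk = [
    (("my_project", "syn00000000"), [("biospecimen_experiment_1", "syn60109540")], []),
    (
        ("my_project/biospecimen_experiment_1", "syn60109540"),
        [],
        [("fileA.txt", "syn60115444"), ("fileB.txt", "syn60115457")],
    ),
]

for path, entity_id in index_walk(sample_walk).items():
    print(f"{entity_id}: {path}")
```

With a logged-in client, the argument would be the generator returned by `synapseutils.walk(syn=syn, synId=my_project_id, includeTypes=["file"])`.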
Results¶
Now that you have created your files, you can inspect them on the Files tab of your project in the Synapse web UI.
Source code for this tutorial¶
"""
Here is where you'll find the code for the File tutorial.
"""
# Step 1: Upload several files to Synapse
import os
import synapseclient
import synapseutils
from synapseclient import File
syn = synapseclient.login()
# Retrieve the project ID
my_project_id = syn.findEntityId(
    name="My uniquely named project about Alzheimer's Disease"
)
# Retrieve the IDs of the folders I want to upload to
batch_1_folder = syn.findEntityId(
    parent=my_project_id, name="single_cell_RNAseq_batch_1"
)
batch_2_folder = syn.findEntityId(
    parent=my_project_id, name="single_cell_RNAseq_batch_2"
)
biospecimen_experiment_1_folder = syn.findEntityId(
    parent=my_project_id, name="biospecimen_experiment_1"
)
biospecimen_experiment_2_folder = syn.findEntityId(
    parent=my_project_id, name="biospecimen_experiment_2"
)
# Create a File object for each file I want to upload
biospecimen_experiment_1_a_2022 = File(
    path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_1/fileA.txt"),
    parent=biospecimen_experiment_1_folder,
)
biospecimen_experiment_1_b_2022 = File(
    path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_1/fileB.txt"),
    parent=biospecimen_experiment_1_folder,
)
biospecimen_experiment_2_c_2023 = File(
    path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_2/fileC.txt"),
    parent=biospecimen_experiment_2_folder,
)
biospecimen_experiment_2_d_2023 = File(
    path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_2/fileD.txt"),
    parent=biospecimen_experiment_2_folder,
)
batch_1_scrnaseq_file_1 = File(
    path=os.path.expanduser(
        "~/my_ad_project/single_cell_RNAseq_batch_1/SRR12345678_R1.fastq.gz"
    ),
    parent=batch_1_folder,
)
batch_1_scrnaseq_file_2 = File(
    path=os.path.expanduser(
        "~/my_ad_project/single_cell_RNAseq_batch_1/SRR12345678_R2.fastq.gz"
    ),
    parent=batch_1_folder,
)
batch_2_scrnaseq_file_1 = File(
    path=os.path.expanduser(
        "~/my_ad_project/single_cell_RNAseq_batch_2/SRR12345678_R1.fastq.gz"
    ),
    parent=batch_2_folder,
)
batch_2_scrnaseq_file_2 = File(
    path=os.path.expanduser(
        "~/my_ad_project/single_cell_RNAseq_batch_2/SRR12345678_R2.fastq.gz"
    ),
    parent=batch_2_folder,
)
# Upload each file to Synapse
biospecimen_experiment_1_a_2022 = syn.store(obj=biospecimen_experiment_1_a_2022)
biospecimen_experiment_1_b_2022 = syn.store(obj=biospecimen_experiment_1_b_2022)
biospecimen_experiment_2_c_2023 = syn.store(obj=biospecimen_experiment_2_c_2023)
biospecimen_experiment_2_d_2023 = syn.store(obj=biospecimen_experiment_2_d_2023)
batch_1_scrnaseq_file_1 = syn.store(obj=batch_1_scrnaseq_file_1)
batch_1_scrnaseq_file_2 = syn.store(obj=batch_1_scrnaseq_file_2)
batch_2_scrnaseq_file_1 = syn.store(obj=batch_2_scrnaseq_file_1)
batch_2_scrnaseq_file_2 = syn.store(obj=batch_2_scrnaseq_file_2)
# Step 2: Print stored attributes about your file
batch_1_scrnaseq_file_1_id = batch_1_scrnaseq_file_1.id
print(f"My file ID is: {batch_1_scrnaseq_file_1_id}")
print(f"The parent ID of my file is: {batch_1_scrnaseq_file_1.parentId}")
print(f"I created my file on: {batch_1_scrnaseq_file_1.createdOn}")
print(
    f"The ID of the user that created my file is: {batch_1_scrnaseq_file_1.createdBy}"
)
print(f"My file was last modified on: {batch_1_scrnaseq_file_1.modifiedOn}")
# Step 3: List all Folders and Files within my project
for directory_path, directory_names, file_names in synapseutils.walk(
    syn=syn, synId=my_project_id, includeTypes=["file"]
):
    for directory_name in directory_names:
        print(
            f"Directory ({directory_name[1]}): {directory_path[0]}/{directory_name[0]}"
        )
    for file in file_names:
        print(f"File ({file[1]}): {directory_path[0]}/{file[0]}")