EntityViews¶
EntityViews in Synapse allow you to create a queryable view that provides a unified selection of entities stored in different locations within your Synapse project. This can be particularly useful for managing and querying metadata across multiple files, folders, or projects that you manage.
Views display rows and columns of information, and they can be shared and queried with SQL. Views are queries of other data already in Synapse. They allow you to see groups of entities including files, tables, folders, or datasets and any associated annotations about those items.
Annotations are an essential component to building a view. Annotations are labels that you apply to your data, stored as key-value pairs in Synapse. They help users search for and find data, and they are a powerful tool used to systematically group and describe things in Synapse.
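To make the link between annotations and view rows concrete, here is a minimal sketch in plain Python that makes no Synapse calls. The `files` list, its IDs and values, and the `to_view_rows` helper are hypothetical stand-ins and not part of the Synapse client. The sketch only illustrates the idea that each entity contributes one row and each annotation key becomes one column.

```python
# Hypothetical stand-in for entities in a project: each file carries
# annotations stored as key-value pairs. IDs and values are illustrative.
files = [
    {"id": "syn1", "name": "SRR12345678_R1.fastq.gz",
     "annotations": {"species": "Homo sapiens", "assay": "scRNA-seq"}},
    {"id": "syn2", "name": "SRR12345678_R2.fastq.gz",
     "annotations": {"species": "Homo sapiens"}},
]


def to_view_rows(entities, annotation_columns):
    """Flatten entities into rows: one row per entity, one column per key."""
    rows = []
    for entity in entities:
        row = {"id": entity["id"], "name": entity["name"]}
        for key in annotation_columns:
            # A missing annotation shows up as an empty cell (None here).
            row[key] = entity["annotations"].get(key)
        rows.append(row)
    return rows


rows = to_view_rows(files, ["species", "assay"])
for row in rows:
    print(row)
```

The second file has no `assay` annotation, so its row holds an empty value in that column rather than being dropped.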
This tutorial follows a Flattened Data Layout, using a project with this example layout:
.
└── single_cell_RNAseq_batch_1
    ├── SRR12345678_R1.fastq.gz
    └── SRR12345678_R2.fastq.gz
Tutorial Purpose¶
In this tutorial you will:
- Create an EntityView with a number of columns
- Query the EntityView
- Update rows in the EntityView
- Update the scope of your EntityView
- Update the types of entities in your EntityView
Prerequisites¶
- This tutorial assumes that you have a project in Synapse with one or more files or folders. It does not need to match the structure shown in this tutorial; if you do not have one set up yet, see the Folder and File tutorials.
- Pandas must be installed as shown in the installation documentation.
1. Find the Synapse ID of your project¶
First, let's set up some constants we'll use in this script and find the ID of our project.
import pandas as pd
from synapseclient import Synapse
from synapseclient.models import (
    Column,
    ColumnType,
    EntityView,
    Project,
    ViewTypeMask,
    query,
)
syn = Synapse()
syn.login()
# First let's get the project we want to create the EntityView in
my_project = Project(name="My uniquely named project about Alzheimer's Disease").get()
project_id = my_project.id
2. Create an EntityView with Columns¶
Now, we will create 4 columns to add to our EntityView. Recall that any data added to these columns will be stored as an annotation on the underlying File.
# Next let's add some columns to the EntityView, the data in these columns will end up
# being stored as annotations on the files
columns = [
    Column(name="species", column_type=ColumnType.STRING),
    Column(name="dataType", column_type=ColumnType.STRING),
    Column(name="assay", column_type=ColumnType.STRING),
    Column(name="fileFormat", column_type=ColumnType.STRING),
]
Next, we'll store the EntityView in Synapse and print out the results.
# Then we will create an EntityView that is scoped to the project, and will contain a row
# for each file in the project
view = EntityView(
    name="My Entity View",
    parent_id=project_id,
    scope_ids=[project_id],
    view_type_mask=ViewTypeMask.FILE,
    columns=columns,
).store()
print(f"My EntityView ID is: {view.id}")
# When the columns are printed you'll notice that it contains a number of columns that
# are automatically added by Synapse in addition to the ones we added
print(view.columns.keys())
3. Query the EntityView¶
# Query the EntityView
results_as_dataframe: pd.DataFrame = query(
    query=f"SELECT id, name, species, dataType, assay, fileFormat, path FROM {view.id} WHERE path like '%single_cell_RNAseq_batch_1%'",
    include_row_id_and_row_version=False,
)
print(results_as_dataframe)
The result of querying your EntityView should look like:
    id      name                     species       dataType        ...
0   syn1    SRR12345678_R1.fastq.gz  Homo sapiens  geneExpression
1   syn2    SRR12345678_R2.fastq.gz  Homo sapiens  geneExpression
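If you run similar queries against several views or folders, it can help to assemble the SQL string in one place. The sketch below uses a hypothetical helper, `build_view_query`, which is not part of the Synapse client; it only builds the same kind of string that is passed to `query` above. The view ID `syn1234` is a placeholder.

```python
def build_view_query(view_id, columns, path_fragment):
    """Assemble a SELECT statement that filters view rows by path."""
    column_list = ", ".join(columns)
    return (
        f"SELECT {column_list} FROM {view_id} "
        f"WHERE path like '%{path_fragment}%'"
    )


sql = build_view_query(
    view_id="syn1234",  # placeholder ID; in the tutorial this would be view.id
    columns=["id", "name", "species", "dataType", "assay", "fileFormat", "path"],
    path_fragment="single_cell_RNAseq_batch_1",
)
print(sql)
```

The resulting string could then be passed as the `query` argument in the same way as the literal f-string in the tutorial code.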
4. Update rows in the EntityView¶
Now that we know the data is present in the EntityView, let's go ahead and update the annotations on these Files. The following code sets all returned rows to a single value. Since the results were returned as a Pandas DataFrame you have many options to search through and set values on your data.
# Finally let's update the annotations on the files in the project
results_as_dataframe["species"] = ["Homo sapiens"] * len(results_as_dataframe)
results_as_dataframe["dataType"] = ["geneExpression"] * len(results_as_dataframe)
results_as_dataframe["assay"] = ["SCRNA-seq"] * len(results_as_dataframe)
results_as_dataframe["fileFormat"] = ["fastq"] * len(results_as_dataframe)
view.update_rows(
    values=results_as_dataframe,
    primary_keys=["id"],
    wait_for_eventually_consistent_view=True,
)
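The code above sets every returned row to the same value. As one example of the other options a pandas DataFrame gives you, the sketch below sets a value only on rows that match a condition. The DataFrame here is a small stand-in built by hand, with illustrative IDs and a made-up `notes.txt` file, so it runs without Synapse; in the tutorial you would apply the same `.loc` pattern to the DataFrame returned by `query` before calling `update_rows`.

```python
import pandas as pd

# Stand-in for the DataFrame returned by the query; values are illustrative.
results_as_dataframe = pd.DataFrame(
    {
        "id": ["syn1", "syn2", "syn3"],
        "name": [
            "SRR12345678_R1.fastq.gz",
            "SRR12345678_R2.fastq.gz",
            "notes.txt",
        ],
        "fileFormat": [None, None, None],
    }
)

# Select only the rows whose name ends in ".fastq.gz" ...
is_fastq = results_as_dataframe["name"].str.endswith(".fastq.gz")
# ... and set the annotation column on just those rows.
results_as_dataframe.loc[is_fastq, "fileFormat"] = "fastq"

print(results_as_dataframe)
```

Rows that do not match the condition keep their existing (here empty) value.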
A note on wait_for_eventually_consistent_view: EntityViews in Synapse are eventually consistent, meaning that updates to data may take some time to be reflected in the view. The wait_for_eventually_consistent_view flag allows the code to pause until the changes are fully propagated to your EntityView. When this flag is set to True, a query is automatically executed on the view to determine if the view contains the updated changes. It will allow your next query on your view to reflect any changes that you made. Conversely, if this is set to False, there is no guarantee that your next query will reflect your most recent changes.
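The general idea of waiting on an eventually consistent store can be sketched as a polling loop. This is not the Synapse client's implementation: `wait_until` and `FakeView` are hypothetical names, and the fake view simply returns stale data for its first few reads to mimic propagation delay.

```python
import time


def wait_until(check, timeout_seconds=5.0, poll_interval=0.01):
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(poll_interval)
    return False


class FakeView:
    """Hypothetical stand-in for a view that lags behind the source of truth."""

    def __init__(self, reads_until_fresh):
        self._remaining_stale_reads = reads_until_fresh
        self._visible = {"syn1": None}
        self._pending = {}

    def update(self, row_id, value):
        # The change is recorded but not yet visible to readers.
        self._pending[row_id] = value

    def read(self, row_id):
        # The first few reads return the old value to mimic propagation delay.
        if self._remaining_stale_reads > 0:
            self._remaining_stale_reads -= 1
        else:
            self._visible.update(self._pending)
        return self._visible[row_id]


fake_view = FakeView(reads_until_fresh=3)
fake_view.update("syn1", "Homo sapiens")
first_read = fake_view.read("syn1")  # read immediately after the update
is_consistent = wait_until(lambda: fake_view.read("syn1") == "Homo sapiens")
print(first_read, is_consistent)
```

Reading immediately after the update returns the old value, which corresponds to the behavior you may see when the flag is False; polling until the change is visible corresponds to setting it to True.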
5. Update the scope of your EntityView¶
As your project expands or contracts you will need to adjust the containers you'd like to include in your view. In order to accomplish this you may modify the scope_ids attribute on your view.
# Over time you may have a need to add or remove scopes from the EntityView, you may
# use `add` or `remove` along with the Synapse ID of the scope you wish to add/remove
view.scope_ids.add("syn1234")
# view.scope_ids.remove("syn1234")
view.store()
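The `add` and `remove` calls above suggest that `scope_ids` behaves like a Python set. Assuming that is the case, the sketch below uses a plain built-in set with placeholder IDs to show the semantics worth knowing: adding an existing ID is a no-op, `remove` raises `KeyError` when the ID is absent, and `discard` does not.

```python
# Assumption: scope_ids behaves like a built-in set of Synapse ID strings.
scope_ids = {"syn1111"}

scope_ids.add("syn1234")
scope_ids.add("syn1234")  # adding the same ID again has no effect
print(sorted(scope_ids))

scope_ids.remove("syn1234")
try:
    scope_ids.remove("syn1234")  # remove() raises if the ID is absent
except KeyError:
    removed_twice = False
else:
    removed_twice = True

scope_ids.discard("syn1234")  # discard() is the non-raising alternative
print(sorted(scope_ids))
```

If you are not sure whether a container is currently in scope, `discard` avoids having to guard the call with a try/except.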
6. Update the types of Entities included in your EntityView¶
You may also want to change what types of Entities may be included in your view. To accomplish this you'll be modifying the view_type_mask attribute on your view.
# You may also need to add or remove the types of Entities that may show up in your view
# You will be able to specify multiple types using the bitwise OR operator, or a single value
view.view_type_mask = ViewTypeMask.FILE | ViewTypeMask.FOLDER
# view.view_type_mask = ViewTypeMask.FILE
view.store()
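To illustrate how a bit mask combines entity types, here is a small sketch using Python's standard `enum.IntFlag`. `ExampleTypeMask` is a hypothetical stand-in, and its bit values are illustrative; they are not taken from `ViewTypeMask`. The sketch shows combining types with bitwise OR, testing membership with bitwise AND, and removing a type again.

```python
from enum import IntFlag


class ExampleTypeMask(IntFlag):
    """Hypothetical stand-in; the bit values here are illustrative only."""

    FILE = 0x01
    FOLDER = 0x08


# Bitwise OR combines types into one mask.
mask = ExampleTypeMask.FILE | ExampleTypeMask.FOLDER
# Bitwise AND tests whether a type is included.
includes_folder = bool(mask & ExampleTypeMask.FOLDER)
# AND with the complement removes a type again.
files_only = mask & ~ExampleTypeMask.FOLDER

print(int(mask), includes_folder, files_only == ExampleTypeMask.FILE)
```

Because each type occupies its own bit, any combination of types fits in a single integer, which is why one attribute can describe every type the view includes.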
Results¶
Now that you have created and updated your EntityView, you can inspect it in the Synapse web UI.
Source code for this tutorial¶
"""
Here is where you'll find the code for the EntityView tutorial.
"""
import pandas as pd
from synapseclient import Synapse
from synapseclient.models import (
    Column,
    ColumnType,
    EntityView,
    Project,
    ViewTypeMask,
    query,
)
syn = Synapse()
syn.login()
# First let's get the project we want to create the EntityView in
my_project = Project(name="My uniquely named project about Alzheimer's Disease").get()
project_id = my_project.id
# Next let's add some columns to the EntityView, the data in these columns will end up
# being stored as annotations on the files
columns = [
    Column(name="species", column_type=ColumnType.STRING),
    Column(name="dataType", column_type=ColumnType.STRING),
    Column(name="assay", column_type=ColumnType.STRING),
    Column(name="fileFormat", column_type=ColumnType.STRING),
]
# Then we will create an EntityView that is scoped to the project, and will contain a row
# for each file in the project
view = EntityView(
    name="My Entity View",
    parent_id=project_id,
    scope_ids=[project_id],
    view_type_mask=ViewTypeMask.FILE,
    columns=columns,
).store()
print(f"My EntityView ID is: {view.id}")
# When the columns are printed you'll notice that it contains a number of columns that
# are automatically added by Synapse in addition to the ones we added
print(view.columns.keys())
# Query the EntityView
results_as_dataframe: pd.DataFrame = query(
    query=f"SELECT id, name, species, dataType, assay, fileFormat, path FROM {view.id} WHERE path like '%single_cell_RNAseq_batch_1%'",
    include_row_id_and_row_version=False,
)
print(results_as_dataframe)
# Finally let's update the annotations on the files in the project
results_as_dataframe["species"] = ["Homo sapiens"] * len(results_as_dataframe)
results_as_dataframe["dataType"] = ["geneExpression"] * len(results_as_dataframe)
results_as_dataframe["assay"] = ["SCRNA-seq"] * len(results_as_dataframe)
results_as_dataframe["fileFormat"] = ["fastq"] * len(results_as_dataframe)
view.update_rows(
    values=results_as_dataframe,
    primary_keys=["id"],
    wait_for_eventually_consistent_view=True,
)
# Over time you may have a need to add or remove scopes from the EntityView, you may
# use `add` or `remove` along with the Synapse ID of the scope you wish to add/remove
view.scope_ids.add("syn1234")
# view.scope_ids.remove("syn1234")
view.store()
# You may also need to add or remove the types of Entities that may show up in your view
# You will be able to specify multiple types using the bitwise OR operator, or a single value
view.view_type_mask = ViewTypeMask.FILE | ViewTypeMask.FOLDER
# view.view_type_mask = ViewTypeMask.FILE
view.store()