Synapse Python Client Documentation¶
The synapseclient package provides an interface to Synapse, a collaborative workspace
for reproducible, data-intensive research projects, with support for:
- integrated presentation of data, code and text
- fine-grained access control
- provenance tracking
The synapseclient package lets you communicate with the cloud-hosted Synapse service to access data and create
shared data analysis projects from within Python scripts or at the interactive Python console. Other Synapse clients
exist for R and the web. The Python client can also be used from the command line.
The synapseclient package is available from PyPI. It can be installed or upgraded with pip:
(sudo) pip install (--upgrade) "synapseclient[pandas,pysftp]"
The dependencies on pandas and pysftp are optional. The synapseclient.table feature integrates with
Pandas, and SFTP support is needed only by users of SFTP file storage. Both require native libraries to be compiled or
installed separately from prebuilt binaries.
Source code and development versions are available on GitHub. Installing from source:
git clone git://github.com/Sage-Bionetworks/synapsePythonClient.git
cd synapsePythonClient
You can stay on the master branch to get the latest stable release or check out the develop branch or a tagged revision:
git checkout <branch or tag>
Next, either install the package in the site-packages directory with python setup.py install, or run
python setup.py develop to make the installation follow the head without having to reinstall:
python setup.py <install or develop>
Python 2 Support¶
The sun is setting on Python 2. Many major open source Python packages are moving to require Python 3.
The Synapse engineering team will step down Python 2.7 support to bug fixes only and will require Python 3 for new feature releases. Starting with version 2.0 (to be released in Q1 2019), the Synapse Python client will require Python 3.
Connecting to Synapse¶
To use Synapse, you’ll need to register for an account. The Synapse website can authenticate using a Google account, but you’ll need to take the extra step of creating a Synapse password to use the programmatic clients.
Once that’s done, you’ll be able to load the library, create a Synapse object and log in:
import synapseclient
syn = synapseclient.Synapse()
syn.login('my_username', 'my_password')
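To keep the password out of scripts and shell history, the credentials can be read from the environment before calling login. The following is a minimal sketch, not part of the client: the helper names read_credentials and connect and the variable names SYNAPSE_USERNAME and SYNAPSE_PASSWORD are assumptions made for illustration, and the login call itself is the one shown above.

```python
import os


def read_credentials(environ=os.environ):
    """Return (username, password) taken from environment variables.

    SYNAPSE_USERNAME and SYNAPSE_PASSWORD are hypothetical names chosen for
    this sketch; the client itself does not define them.
    """
    try:
        return environ["SYNAPSE_USERNAME"], environ["SYNAPSE_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError("Set %s before connecting" % missing) from None


def connect(environ=os.environ):
    """Create a Synapse object and log in with credentials from the environment."""
    # Imported here so read_credentials can be used without the package installed.
    import synapseclient

    username, password = read_credentials(environ)
    syn = synapseclient.Synapse()
    syn.login(username, password)
    return syn
```

With both variables exported in the shell, syn = connect() then gives a logged-in Synapse object for the examples that follow.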
Several components of the synapseclient can be imported as needed:
from synapseclient import Activity
from synapseclient import Entity, Project, Folder, File, Link
from synapseclient import Evaluation, Submission, SubmissionStatus
from synapseclient import Wiki
Synapse identifiers are used to refer to projects and data, which are represented by Entity
objects. For example, the entity syn1899498 represents a tab-delimited
file containing a 100 by 4 matrix. Getting the entity retrieves an object that holds metadata describing the matrix,
and also downloads the file to a local cache:
entity = syn.get('syn1899498')
View the entity’s metadata in the Python console, for example with print(entity).
This is one simple way to read in a small matrix:
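rows = []
with open(entity.path) as f:
    # The first line holds the column names.
    header = f.readline().rstrip('\n').split('\t')
    # Every remaining line is one row of numbers.
    for line in f:
        row = [float(x) for x in line.split('\t')]
        rows.append(row)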
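The same read can be done with the standard library's csv module, which also tolerates blank lines at the end of the file. This is a sketch under stated assumptions: read_matrix is a hypothetical helper, and a temporary file with made-up values stands in for the cached download at entity.path.

```python
import csv
import os
import tempfile


def read_matrix(path):
    """Read a tab-delimited file with one header row into (header, rows)."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        # csv yields an empty list for a blank line; skip those.
        rows = [[float(x) for x in record] for record in reader if record]
    return header, rows


# Stand-in for the cached file; with a real entity, pass entity.path instead.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write("a\tb\n1.0\t2.5\n3.0\t4.0\n\n")
    demo_path = tmp.name

header, rows = read_matrix(demo_path)
os.remove(demo_path)
```

For larger matrices, the optional pandas dependency is likely the more convenient route.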
View the entity in the browser, for example with syn.onweb('syn1899498').
Organizing Data in a Project¶
You can create your own projects and upload your own data sets. Synapse stores entities in a hierarchical or tree structure. Projects are at the top level and must be uniquely named:
import synapseclient
from synapseclient import Project, Folder, File, Link

project = Project('My uniquely named project')
project = syn.store(project)
Creating a folder:
data_folder = Folder('Data', parent=project)
data_folder = syn.store(data_folder)
Adding files to the project:
test_entity = File('/path/to/data/file.xyz', description='Fancy new data', parent=data_folder)
test_entity = syn.store(test_entity)
In addition to simple data storage, Synapse entities can be annotated with key/value metadata, described in markdown documents (wikis), and linked together in provenance graphs to create a reproducible record of a data analysis pipeline.
Annotating Synapse Entities¶
Annotations are arbitrary metadata attached to Synapse entities, for example:
test_entity.genome_assembly = "hg19"
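An annotation set this way lives only on the local object until the entity is stored again. The sketch below sets several annotations and saves them in one step. It assumes syn is a logged-in Synapse object and entity is an entity such as test_entity; the helper name annotate_and_store and the "species" key are hypothetical, and only attribute assignment and syn.store from this page are used.

```python
def annotate_and_store(syn, entity, annotations):
    """Set each key/value pair as an annotation, then save the entity."""
    for key, value in annotations.items():
        # Same effect as writing entity.<key> = value by hand.
        setattr(entity, key, value)
    # Storing the entity sends the new annotations to Synapse.
    return syn.store(entity)


# Usage with the objects created earlier (needs a logged-in session):
# test_entity = annotate_and_store(
#     syn, test_entity, {"genome_assembly": "hg19", "species": "human"})
```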
Synapse provides tools for tracking ‘provenance’, or the transformation of raw data into processed results, by linking derived data objects to source data and the code used to perform the transformation.
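One way to record such a link is at store time. This minimal sketch assumes the used and executed keyword arguments of syn.store; the helper name store_with_provenance and the ID, path and URL in the usage comment are placeholders rather than real resources.

```python
def store_with_provenance(syn, entity, sources, code):
    """Store an entity and record what it was derived from.

    sources: Synapse IDs, entities or URLs of the input data.
    code: Synapse IDs, entities or URLs of the code that was run.
    """
    return syn.store(entity, used=sources, executed=code)


# Usage (placeholder values, needs a logged-in session):
# result = store_with_provenance(
#     syn,
#     File('/path/to/result.csv', parent=data_folder),
#     sources=['syn1899498'],
#     code=['https://example.org/analysis/run.py'])
```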
Tables can be built up by adding sets of rows that follow a user-defined schema and queried using a SQL-like syntax.
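As an illustration of the query side, the sketch below assembles a SQL-like query string and passes it to the client's tableQuery method. The helper names, the table ID and the column name in the usage comment are hypothetical, and the where clause is inserted verbatim, so it should not come from untrusted input.

```python
def build_query(table_id, where=None, limit=None):
    """Assemble a SQL-like query string for a Synapse table."""
    query = "select * from %s" % table_id
    if where:
        query += " where %s" % where
    if limit is not None:
        query += " limit %d" % limit
    return query


def query_table(syn, table_id, where=None, limit=None):
    """Run the query through a logged-in Synapse object and return the result."""
    return syn.tableQuery(build_query(table_id, where=where, limit=limit))


# Usage (hypothetical table ID and column, needs a logged-in session):
# results = query_table(syn, 'syn123', where="age > 30", limit=10)
# df = results.asDataFrame()  # requires the optional pandas dependency
```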
Wiki pages can be attached to a Synapse entity (e.g. a project, folder or file). Text and graphics can be composed in markdown and rendered in the web view of the object.
An evaluation is a Synapse construct useful for building processing pipelines and for scoring predictive modelling and data analysis challenges.
By default, data sets in Synapse are private to your user account, but they can easily be shared with specific users, groups, or the public.
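Sharing with specific users or teams can be sketched with the client's setPermissions method. This is an illustration under stated assumptions: share_read_only is a hypothetical helper, the principal ID in the usage comment is a placeholder, and "READ" plus "DOWNLOAD" is assumed to be the access needed to view and fetch data.

```python
READ_ONLY = ["READ", "DOWNLOAD"]


def share_read_only(syn, entity, principal_ids):
    """Grant view and download access to each listed user or team."""
    return [
        syn.setPermissions(entity, principalId=pid, accessType=READ_ONLY)
        for pid in principal_ids
    ]


# Usage (placeholder principal ID, needs a logged-in session):
# share_read_only(syn, project, [1234567])
```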
Accessing the API Directly¶
The client's low-level REST methods (restGET, restPOST, restPUT and restDELETE) enable access to the Synapse REST(ish) API, taking care of details like endpoints and authentication. See the REST API documentation.
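As one illustration of direct API access, the sketch below fetches an entity's raw JSON with the Synapse object's restGET method. It assumes the /entity/{id} path of the REST API; the helper name fetch_entity_json is hypothetical.

```python
def fetch_entity_json(syn, synapse_id):
    """Fetch an entity's raw JSON from the REST API instead of using syn.get."""
    # The client supplies the endpoint prefix and the authentication headers.
    return syn.restGET("/entity/%s" % synapse_id)


# Usage (needs a logged-in session):
# raw = fetch_entity_json(syn, 'syn1899498')
# print(raw['name'])
```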
There is a companion module called synapseutils that provides higher-level functionality such as recursive copying of content, syncing with Synapse and additional query functionality.