This notebook shows how to use the TeselaGen Python API client to predict candidate CRISPR targets.
Here, we call the TeselaGen API with a genome sequence in the arguments. The arguments also specify the region of the sequence in which to look for target sequences. The tool returns its predictions in a format that can be easily parsed into a pandas DataFrame or plotted in a Jupyter notebook.
First, we do some imports:
import platform
from pathlib import Path
import pandas as pd
import fastaparser
from dna_features_viewer import GraphicFeature, GraphicRecord
from teselagen.api.evolve_client import EVOLVEClient
print(f"python version : {platform.python_version()}")
print(f"pandas version : {pd.__version__}")
# Connect to your TeselaGen instance by passing it as the 'host_url' argument of EVOLVEClient(host_url=host_url)
#host_url = "https://your.teselagen.instance.com"
#client_e = EVOLVEClient(host_url = host_url)
client_e = EVOLVEClient()
The method design_crispr_grnas receives the sequence as a string. If the sequence is stored in a FASTA file, you therefore have to read it first. Here we load a file that contains a dummy genome (dummy_organism.fasta). You can replace this file path with one of your own, or refer to the Hello_World_DESIGN_module notebook and import sequences directly from the platform.
with open("dummy_organism.fasta") as fasta_file:
    parser = fastaparser.Reader(fasta_file)
    # Keep only the first record in the file
    for seq in parser:
        fasta_seq = seq.sequence_as_string()
        print(f"Loaded sequence: {seq.id}-{seq.description}")
        break
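If fastaparser is not available, reading the first record can be approximated with the standard library alone. The sketch below is an illustration only and is not part of the TeselaGen client. The function name read_first_fasta_record and the sample FASTA text are hypothetical, introduced here for the example.

```python
import io


def read_first_fasta_record(handle):
    """Return (header, sequence) for the first record in a FASTA stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here, so stop reading
            header = line[1:]
        elif header is not None:
            chunks.append(line)  # sequence lines of the first record
    if header is None:
        raise ValueError("no FASTA record found")
    return header, "".join(chunks)


# Hypothetical two-record FASTA text, used only for illustration.
example = io.StringIO(">seq1 dummy organism\nACGT\nTTGA\n>seq2 other\nCCCC\n")
header, sequence = read_first_fasta_record(example)
print(header, sequence)
```

With a real file, the same function could be called on the handle returned by open("dummy_organism.fasta"), and the resulting string passed on as the sequence.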
Now we have a sequence string loaded into our Python environment, and we are ready to use the tool for designing guide RNAs.
The CRISPR tool is hosted on TeselaGen's EVOLVE platform. You can instantiate the client and call design_crispr_grnas directly (as shown in the next code cell, which will prompt you to log in), or log in before calling this method, as shown in the Hello_World_TEST_module notebook.
With design_crispr_grnas, you specify the reference (organism) sequence with the sequence argument. You also need to specify the targeting sequence. You can pass it as a string with the target_sequence argument (this string must occur in the reference sequence), or you can give its location with the target_indexes argument, a list containing the start and end indexes, counted from zero (e.g. target_indexes=[500, 600]). The following example uses the target_sequence argument:
res = client_e.design_crispr_grnas(
    sequence=fasta_seq,
    target_sequence='TAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTTGACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGC')
print('Done!')
display(pd.DataFrame(res['guides']))
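As an alternative to target_sequence, the location of the target can be computed locally and passed as target_indexes. The following is a minimal sketch of that lookup, not part of the TeselaGen client: find_target_indexes is a hypothetical helper, the sequences are made up, and the end index is assumed to be exclusive (start plus target length), which the text above does not confirm. It also searches the forward strand only.

```python
def find_target_indexes(reference, target):
    """Return [start, end] of the first occurrence of target in reference.

    Indexes are zero-based. ASSUMPTION: end is exclusive, i.e.
    start + len(target); check this against your instance's convention.
    """
    start = reference.upper().find(target.upper())
    if start == -1:
        raise ValueError("target sequence not found in reference")
    return [start, start + len(target)]


# Hypothetical sequences, used only for illustration.
reference = "AAACCCGGGTTTACGTACGT"
target = "GGGTTT"
indexes = find_target_indexes(reference, target)
print(indexes)  # [6, 12]
```

The resulting list could then be passed as target_indexes=indexes in place of target_sequence. Comparing it with the target_indexes entry returned by the service is a simple way to check the assumed end-index convention.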
The call returns a dictionary whose guides entry is a list of dictionaries, which can be easily parsed into a pandas DataFrame, as shown above. Each row describes one candidate guide: its sequence, its position (start and end), the onTargetScore and offTargetScore (both ranging from 0 to 100; higher is better), the associated pam sequence, and a forward flag that is set to False if the guide lies on the reverse strand.
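Since higher scores are better for both metrics, a natural next step is to filter and rank the candidates. The sketch below does this on plain dictionaries with the same keys as described above. The guide records are invented for illustration and are not output of the tool. rank_guides and the min_off_target threshold are hypothetical choices, not a recommendation from TeselaGen.

```python
# Hypothetical guide records with the keys described above (made-up values).
guides = [
    {"sequence": "ACGTACGTACGTACGTACGT", "start": 10, "end": 29,
     "onTargetScore": 62.5, "offTargetScore": 90.0, "pam": "AGG", "forward": True},
    {"sequence": "TTGACCGGTTAACCGGTTAA", "start": 35, "end": 54,
     "onTargetScore": 81.0, "offTargetScore": 40.0, "pam": "TGG", "forward": False},
    {"sequence": "GGCCAATTGGCCAATTGGCC", "start": 58, "end": 77,
     "onTargetScore": 74.2, "offTargetScore": 77.3, "pam": "CGG", "forward": True},
]


def rank_guides(guides, min_off_target=0.0):
    """Drop guides below an off-target threshold, then sort best first."""
    kept = [g for g in guides if g["offTargetScore"] >= min_off_target]
    # Sort by on-target score, breaking ties with the off-target score.
    return sorted(
        kept,
        key=lambda g: (g["onTargetScore"], g["offTargetScore"]),
        reverse=True,
    )


best = rank_guides(guides, min_off_target=50.0)
for g in best:
    strand = "+" if g["forward"] else "-"
    print(g["start"], g["end"], strand, g["onTargetScore"], g["offTargetScore"])
```

With the real output, the same function could be applied to res['guides']. In pandas, DataFrame.sort_values on the two score columns gives an equivalent ordering.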
Finally, we use the dna_features_viewer library to plot the results. The green annotation shows the main targeting sequence, and the pink bars show the positions of the guide candidates the algorithm found.
def show_crispr_grna_results(sequence, res, indexes=None):
    targeting_seq_feat = []
    if indexes is not None:
        # Annotate the targeting sequence when its indexes are known
        targeting_seq_feat = [GraphicFeature(start=indexes[0], end=indexes[1], color="#cffccc", label="Sequence", strand=+1)]
    else:
        # Otherwise, use the span covered by the guides as the plotting window
        indexes = [min([x['start'] for x in res]), max([x['end'] for x in res])]
    record = GraphicRecord(
        sequence=sequence,
        features=targeting_seq_feat +
        [GraphicFeature(
            start=x['start'],
            end=x['end']+1,
            color="#ffcccc",
            label=f"onTargetScore: {x['onTargetScore']}",
            strand=+1 if x['forward'] else -1) for x in res])
    record = record.crop((indexes[0]-10, indexes[1]+11))  # crop
    ax, _ = record.plot(figure_width=20)
    record.plot_sequence(ax)
show_crispr_grna_results(sequence=fasta_seq, res=res['guides'], indexes=res['target_indexes'])