CRISPR Tool example

This notebook shows how to use the TeselaGen Python API client to predict target candidates for CRISPR.

Here, we call the TeselaGen API with a genome sequence as an argument. The arguments also specify the region of the sequence in which to look for target sequences. The tool returns its predictions in a format that can be easily parsed into a pandas DataFrame or plotted in a Jupyter notebook.

First we do some imports:

import platform
from pathlib import Path
import pandas as pd

import fastaparser
from dna_features_viewer import GraphicFeature, GraphicRecord

from teselagen.api.evolve_client import EVOLVEClient

print(f"python version     : {platform.python_version()}")
print(f"pandas version     : {pd.__version__}")

python version     : 3.6.9
pandas version     : 1.1.5

# Connect to your TeselaGen instance by passing its URL as the 'host_url' argument: EVOLVEClient(host_url=host_url)
#host_url = "https://your.teselagen.instance.com"
#client_e = EVOLVEClient(host_url = host_url)
client_e = EVOLVEClient()

Loading sequence from FASTA file

The method design_crispr_grnas receives the sequence as a string. If the sequence is stored in a FASTA file, you have to read it first. Here we load a file that contains a dummy genome (dummy_organism.fasta). You can replace this file path with one of your own, or refer to the Hello_World_DESIGN_module notebook and import sequences directly from the platform.

with open("dummy_organism.fasta") as fasta_file:
    parser = fastaparser.Reader(fasta_file)
    for seq in parser:
        # Only the first record is needed, so stop after reading it.
        fasta_seq = seq.sequence_as_string()
        print(f"Loaded sequence: {seq.id}-{seq.description}")
        break

Loaded sequence: TG00001-Dummy organism complete genome

Now the sequence is loaded as a string in our Python environment, and we are ready to use the tool for designing guide RNAs.
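If fastaparser is not installed, the first record of a FASTA file can also be read with plain Python. The following is a minimal sketch, not part of the TeselaGen client: read_first_fasta_record is a hypothetical helper, and the in-memory record is made-up data used only to keep the example self-contained.

```python
import io


def read_first_fasta_record(handle):
    """Return (header, sequence) for the first record of a FASTA stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here: stop reading
            header = line[1:]
        elif header is not None:
            chunks.append(line)  # sequence lines are joined below
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


# Toy record (made-up data, not the dummy genome used in this notebook).
example = io.StringIO(">seq1 toy record\nACGT\nACGT\n>seq2 ignored\nTTTT\n")
header, sequence = read_first_fasta_record(example)
print(header, sequence)
```

With a real file, pass the open file handle instead of the io.StringIO object.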

CRISPR Guide RNAs Tool

The CRISPR tool is hosted on TeselaGen's EVOLVE platform. You can instantiate the client and call design_crispr_grnas directly (as in the next code cell, which will prompt you to log in), or log in before calling this method, as shown in Hello_World_TEST_module.

With design_crispr_grnas you specify the reference (organism) sequence with the sequence argument. You also need to specify the target sequence, in one of two ways. The target_sequence argument takes the target as a string, which must be contained in the reference sequence. Alternatively, the target_indexes argument takes a list with the start and end indexes of the target within the reference sequence, counting from zero (e.g. target_indexes=[500, 600]). In the following example we use only the target_sequence argument:

res = client_e.design_crispr_grnas(
            sequence=fasta_seq,
            target_sequence='TAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTTGACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGC')
print('Done!')
display(pd.DataFrame(res['guides']))

Connection Accepted
Done!
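The target_indexes alternative mentioned above can be prepared with a few lines of Python. The sketch below derives zero-based start and end indexes from a target string with str.find. find_target_indexes is a hypothetical helper, the toy sequence is made up, and treating the end index as inclusive is an assumption (the notebook does not state it explicitly), so adjust it if the API expects an exclusive end.

```python
def find_target_indexes(sequence, target_sequence):
    """Return [start, end] of target_sequence within sequence.

    Indexes are zero-based; the end index is inclusive (an assumption).
    """
    start = sequence.find(target_sequence)
    if start == -1:
        raise ValueError("target_sequence is not contained in sequence")
    return [start, start + len(target_sequence) - 1]


toy_sequence = "AAAACCCGGGTTTT"  # made-up sequence for illustration
indexes = find_target_indexes(toy_sequence, "CCCGGG")
print(indexes)  # [4, 9]

# Hypothetical call using the arguments described above (requires login):
# res = client_e.design_crispr_grnas(sequence=fasta_seq, target_indexes=indexes)
```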

The algorithm returns a list of dictionaries that can be easily parsed into a pandas DataFrame, as shown above. Each row describes one candidate guide: its sequence, its position (start and end), the onTargetScore and offTargetScore (both ranging from 0 to 100; higher is better), the associated pam sequence, and a forward flag that is False when the guide lies on the reverse strand.

Finally, we use the dna_features_viewer library to plot the results. The green annotation marks the target sequence, and the pink bars mark the positions of the guide candidates found by the algorithm.

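def show_crispr_grna_results(sequence, res, indexes=None):
    """Plot the target region (green) and the candidate guides (pink)."""
    targeting_seq_feat = []
    if indexes is not None:
        # Annotate the target region when its indexes are known.
        targeting_seq_feat = [GraphicFeature(start=indexes[0], end=indexes[1], color="#cffccc", label="Sequence", strand=+1)]
    else:
        # Otherwise, use the region spanned by the guides themselves.
        indexes = [min(x['start'] for x in res), max(x['end'] for x in res)]
    # One feature per guide, drawn on the strand given by its 'forward' flag.
    guide_feats = [GraphicFeature(
                       start=x['start'],
                       end=x['end']+1,
                       color="#ffcccc",
                       label=f"onTargetScore: {x['onTargetScore']}",
                       strand=+1 if x['forward'] else -1) for x in res]
    record = GraphicRecord(sequence=sequence, features=targeting_seq_feat + guide_feats)
    # Crop to the region of interest with a 10-base margin, clamped to the sequence bounds.
    record = record.crop((max(0, indexes[0]-10), min(len(sequence), indexes[1]+11)))
    ax, _ = record.plot(figure_width=20)
    record.plot_sequence(ax)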
    
show_crispr_grna_results(sequence=fasta_seq, res=res['guides'], indexes=res['target_indexes'])

	end	sequence	offTargetScore	start	forward	pam	onTargetScore
0	562	ATTTTTGCCGAACTTTTGAC	100	543	True	GGG	58.3
1	528	TGATATTGGGTAAAGCATCC	100	509	False	TGG	55.1
2	557	AAGTTCGGCAAAAATACGTT	100	538	False	CGG	51.2
3	584	GACTCGCCGCCGCCCAGCCG	100	565	True	GGG	49.9
4	596	CAGCGGGAACCCCGGCTGGG	100	577	False	CGG	47.2
5	599	CGCCAGCGGGAACCCCGGCT	100	580	False	GGG	44.7
6	572	GGCGAGTCCCGTCAAAAGTT	100	553	False	CGG	42.9
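The coordinates in the table can be sanity-checked against the target sequence used in the call. The sketch below assumes that start and end are zero-based with an inclusive end, that reverse-strand guides are reported as the reverse complement of the reference, and that the target sequence begins at index 500 of the genome (in the notebook this value would come from res['target_indexes']). Both helper functions are hypothetical and not part of the TeselaGen client; the two guide rows are copied from the table above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def guide_matches_reference(reference, guide, offset=0):
    """Check that a guide equals the reference site it points to.

    offset is the genome index at which `reference` starts, and the
    guide's end index is treated as inclusive (both are assumptions).
    """
    start = guide["start"] - offset
    end = guide["end"] - offset
    site = reference[start:end + 1]
    expected = site if guide["forward"] else reverse_complement(site)
    return expected == guide["sequence"]


# Target sequence passed to design_crispr_grnas in this notebook.
target = ("TAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTTGAC"
          "GGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGC")
TARGET_START = 500  # assumed start of the target within the genome

# Rows 0 and 1 of the guides table.
guides = [
    {"start": 543, "end": 562, "forward": True,
     "sequence": "ATTTTTGCCGAACTTTTGAC"},
    {"start": 509, "end": 528, "forward": False,
     "sequence": "TGATATTGGGTAAAGCATCC"},
]

checks = [guide_matches_reference(target, g, offset=TARGET_START) for g in guides]
print(checks)
```

If a check fails for your own data, the coordinate convention (offset or inclusive end) is the first thing to revisit.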