CRISPR Tool example

This notebook shows how to use the TeselaGen Python API client to predict target candidates for CRISPR.

Here, we call the TeselaGen API with a genome sequence as an argument. The arguments also specify the region of the sequence in which to look for target sequences. The tool returns its predictions in a format that can be easily parsed into a pandas DataFrame or plotted in a Jupyter notebook.

First we do some imports:

In [1]:
import platform
from pathlib import Path
import pandas as pd

import fastaparser
from dna_features_viewer import GraphicFeature, GraphicRecord

from teselagen.api.evolve_client import EVOLVEClient

print(f"python version     : {platform.python_version()}")
print(f"pandas version     : {pd.__version__}")
python version     : 3.6.9
pandas version     : 1.1.5
In [2]:
# Connect to your TeselaGen instance by passing its URL as the 'host_url' argument: EVOLVEClient(host_url=host_url)
#host_url = "https://your.teselagen.instance.com"
#client_e = EVOLVEClient(host_url = host_url)
client_e = EVOLVEClient()

Loading sequence from FASTA file

The design_crispr_grnas method receives the sequence as a string. If the sequence is stored in a FASTA file, you have to read it first. Here we load a file that contains a dummy genome (dummy_organism.fasta). You can replace this file path with that of your own file, or refer to the Hello_World_DESIGN_module notebook to import sequences directly from the platform.

In [3]:
with open("dummy_organism.fasta") as fasta_file:
    parser = fastaparser.Reader(fasta_file)
    for seq in parser:
        fasta_seq=seq.sequence_as_string()
        print(f"Loaded sequence: {seq.id}-{seq.description}")
        break
Loaded sequence: TG00001-Dummy organism complete genome

Now we have the sequence loaded as a string in our Python environment, and we are ready to use the tool for designing guide RNAs.

CRISPR Guide RNAs Tool

The CRISPR tool is hosted on TeselaGen's EVOLVE platform. You can instantiate the client and call design_crispr_grnas directly (as in the next code cell, which will prompt you to log in), or log in before calling the method, as shown in the Hello_World_TEST_module notebook.

design_crispr_grnas takes the reference (organism) sequence in the sequence argument. You also need to specify the target region, in one of two ways: pass it as a string in the target_sequence argument (the string must occur in the reference sequence), or pass its position in the target_indexes argument as a list with the start and end indexes, counted from zero (e.g. target_indexes=[500, 600]). The following example uses the target_sequence argument:

In [4]:
res = client_e.design_crispr_grnas(
            sequence=fasta_seq,
            target_sequence='TAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTTGACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGC')
print('Done!')
display(pd.DataFrame(res['guides']))
Connection Accepted
Done!
   end              sequence  offTargetScore  start  forward  pam  onTargetScore
0  562  ATTTTTGCCGAACTTTTGAC             100    543     True  GGG           58.3
1  528  TGATATTGGGTAAAGCATCC             100    509    False  TGG           55.1
2  557  AAGTTCGGCAAAAATACGTT             100    538    False  CGG           51.2
3  584  GACTCGCCGCCGCCCAGCCG             100    565     True  GGG           49.9
4  596  CAGCGGGAACCCCGGCTGGG             100    577    False  CGG           47.2
5  599  CGCCAGCGGGAACCCCGGCT             100    580    False  GGG           44.7
6  572  GGCGAGTCCCGTCAAAAGTT             100    553    False  CGG           42.9
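
If you would rather pass target_indexes than target_sequence, the indexes can be derived from the reference string with plain Python. The following is only a sketch: find_target_indexes is a hypothetical helper, not part of the TeselaGen client, and it assumes that indexes are zero-based with an inclusive end. This notebook does not state the end convention explicitly, so check it against your instance before relying on it.

```python
def find_target_indexes(reference, target):
    """Return [start, end] of the first occurrence of target in reference.

    Hypothetical helper (not part of the TeselaGen client). Indexes are
    zero-based; the end index is ASSUMED to be inclusive.
    """
    start = reference.upper().find(target.upper())
    if start == -1:
        raise ValueError("target sequence not found in the reference sequence")
    return [start, start + len(target) - 1]


# Toy reference: 5 filler bases, then the target, then 3 filler bases.
toy_reference = "AAAAA" + "GATTACA" + "CCC"
toy_indexes = find_target_indexes(toy_reference, "GATTACA")
print(toy_indexes)  # [5, 11]
```

With the data from this notebook, the helper would be called with fasta_seq as the reference, and its result passed as target_indexes in place of target_sequence.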

The call returns a list of dictionaries that can be easily parsed into a pandas DataFrame, as shown above. Each row describes one candidate guide: its sequence, its position (start and end), its onTargetScore and offTargetScore (both range from 0 to 100; higher is better), the associated pam sequence, and a forward flag that is False when the guide lies on the reverse strand.
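
The position fields can be cross-checked against the reference sequence. The sketch below assumes that start and end are zero-based and inclusive (in the rows above, end - start + 1 equals the 20-nt guide length) and that guides with forward set to False are reported as the reverse complement of the reference span. Both are inferences from the output, not documented behavior. reverse_complement and guide_matches_reference are hypothetical helpers, not part of the TeselaGen client. To keep the example self-contained, the target sequence from the call above stands in for the genome; the offset of 500 is inferred from the first guide's coordinates.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def guide_matches_reference(reference, guide, offset=0):
    """Check that a guide row agrees with the reference sequence.

    ASSUMES zero-based, inclusive start/end (inferred from the output).
    `offset` is the genome position of reference[0], for cases where only
    a slice of the genome is at hand.
    """
    span = reference[guide["start"] - offset : guide["end"] - offset + 1]
    if not guide["forward"]:
        span = reverse_complement(span)
    return span.upper() == guide["sequence"].upper()


# The target sequence from the call above, used as a stand-in reference.
target = (
    "TAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTTGACGGG"
    "ACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGC"
)
# Two rows copied from the results table (one per strand).
guides = [
    {"start": 543, "end": 562, "forward": True, "sequence": "ATTTTTGCCGAACTTTTGAC"},
    {"start": 509, "end": 528, "forward": False, "sequence": "TGATATTGGGTAAAGCATCC"},
]
checks = [guide_matches_reference(target, g, offset=500) for g in guides]
print(checks)  # [True, True]
```

Against the full genome, the same check would be run as guide_matches_reference(fasta_seq, g) for each g in res['guides'], with the default offset of 0.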

Finally, we use the dna_features_viewer library to plot the results. The green annotation shows the target sequence, and the pink bars show the positions of the guide candidates the algorithm found.

In [5]:
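def show_crispr_grna_results(sequence, res, indexes=None):
    """Plot the guide candidates in `res` along `sequence`.

    `indexes` is the [start, end] of the target region; when omitted, the
    plotted window is derived from the guides themselves.
    """
    targeting_seq_feat = []
    if indexes is not None:
        # Green annotation marking the target region.
        targeting_seq_feat = [GraphicFeature(start=indexes[0], end=indexes[1], color="#cffccc", label="Sequence", strand=+1)]
    else:
        indexes = [min(x['start'] for x in res), max(x['end'] for x in res)]
    # One pink bar per guide; the strand follows the 'forward' flag.
    guide_feats = [
        GraphicFeature(
            start=x['start'],
            end=x['end'] + 1,
            color="#ffcccc",
            label=f"onTargetScore: {x['onTargetScore']}",
            strand=+1 if x['forward'] else -1)
        for x in res]
    record = GraphicRecord(sequence=sequence, features=targeting_seq_feat + guide_feats)
    # Crop to the region of interest plus a small margin, staying within the sequence bounds.
    record = record.crop((max(0, indexes[0] - 10), min(len(sequence), indexes[1] + 11)))
    ax, _ = record.plot(figure_width=20)
    record.plot_sequence(ax)
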
show_crispr_grna_results(sequence=fasta_seq, res=res['guides'], indexes=res['target_indexes'])