OS | Version | Chrome | Firefox | Microsoft Edge | Safari |
---|---|---|---|---|---|
Linux | Ubuntu 18.04 | 87.0.4280 | Not tested | n/a | n/a |
macOS | High Sierra | 87.0.4280 | 84.0 | n/a | 13.1.2 |
Windows | 10 | 87.0.4280 | 83.0 | 87.0.664.66 | n/a |
CRISPR-Cas systems are appealing models for studying adaptive bacterial and archaeal immunity against viral infection, and they have also been investigated extensively because of their applications in powerful genome editing tools. Meanwhile, during the host-parasite arms race, viruses have evolved multiple anti-defense mechanisms that specifically inhibit CRISPR-Cas, such as diverse anti-CRISPR proteins (Acrs), which have enormous potential as modulators of genome editing tools.
CRISPRimmunity, an updated CRISPR-oriented web tool, dissects the key molecular events in the co-evolution of CRISPR and anti-CRISPR mechanisms. It provides CRISPR-Cas system detection and annotation, anti-CRISPR (Acr) protein and Acr-associated (Aca) gene annotation, self-targeting spacer search, repeat classification, prophage detection, bacteria-phage interaction detection based on spacer-protospacer matching, and integrated pipelines for novel Class 2 effector protein screening and Acr prediction.
Users can upload a prokaryotic genome sequence file at any assembly level (complete, chromosome, scaffold or contig). Each genome file must contain only one prokaryotic strain, and the file must be in (Multi-)FASTA or (Multi-)GenBank format.
Eight parameters are provided for the corresponding analysis modules: six can be modified by users and two are fixed.
Parameter definition | Default value | Corresponding module |
---|---|---|
Number of proteins upstream or downstream of the predicted CRISPR array | 10 | CRISPR-Cas System |
Range around the predicted CRISPR array searched for signature genes | 20,000 bp | CRISPR-Cas Classification |
Mismatch threshold for reporting self-targeting hits | 2 | Self-targeting prediction |
Coverage threshold for reporting self-targeting hits (value between 0 and 1) | 1 | Self-targeting prediction |
Mismatch threshold for reporting interacting bacteriophages | 2 | Spacer targeting MGE detection |
Coverage threshold for reporting interacting bacteriophages (value between 0 and 1) | 0.9 | Spacer targeting MGE detection |
Identity threshold for annotating known Acr proteins or Aca genes (fixed) | 1 | Known Acr protein and Aca gene annotation |
Coverage threshold for annotating known Acr proteins or Aca genes (fixed) | 1 | Known Acr protein and Aca gene annotation |
The overview table of the prediction module contains the following columns:
Header | Description |
---|---|
Contig_ID | Accession ID of the input prokaryotic genome |
Contig_def | Definition of the input prokaryotic genome |
CRISPR array number | The number of CRISPR arrays detected in the prokaryotic genome |
Signature genes | The Cas proteins detected within 20 kb of the CRISPR array |
Self-targeting spacer number | The number of spacers targeting the prokaryotic genome itself |
MGE targeting spacer number | The number of spacers targeting an MGE genome included in our in-house MGE database |
Prophage number | The number of prophage regions in the prokaryotic genome |
Anti-CRISPR protein number | The number of experimentally verified Anti-CRISPR proteins |
CRISPRimmunity reports the CRISPR-Cas system prediction results for the prokaryotic genome. A prokaryotic genome may contain both CRISPR array(s) and Cas genes, CRISPR array(s) only, Cas gene(s) only, or neither. The (sub)type of each identified CRISPR-Cas system is reported based on its (sub)type signature Cas genes, and the (sub)type of each identified repeat is reported based on the repeat profile established by STSS. In addition to text results, CRISPRimmunity provides a visualization of the predicted systems.
(Sub)types of predicted CRISPR-Cas systems
Two classes and six main types of CRISPR-Cas systems, defined by the defense mechanism and the cas genes involved, are detected.
Class 1 systems function through a multi-subunit Cas protein complex, whereas class 2 systems require only a single Cas protein (Cas9, Cas12 or Cas13) in the crRNA-effector complex.
(Sub)types that can be identified by CRISPRimmunity:
Class | Type | Subtype | Target | tracrRNA |
---|---|---|---|---|
1 | I | I-A, I-B, I-C, I-D, I-E, I-F, I-U | DNA | No |
1 | III | III-A, III-B, III-C, III-D | DNA,RNA | No |
1 | IV | IV-A | DNA? | No |
2 | II | II-A, II-B, II-C | DNA | Yes |
2 | V | V-A(cas12a), V-B(cas12b), V-C(cas12c), V-D(casY), V-E(casX), V-F1(cas14a), V-F2(cas14b), V-G(cas12g), V-H(cas12h), V-I(cas12i), V-J(cas12j), V-K(cas12k), V-U(c2c4,c2c8,c2c10,c2c9), Cas14 family(c-k, U) | DNA,RNA(cas12g) | Partial Yes(V-B, V-E, V-F(cas14a), V-G, V-K) |
2 | VI | VI-A(cas13a), VI-B(cas13b), VI-C(cas13c), VI-D(cas13d) | RNA | No |
The result table of the CRISPR-Cas system detection and annotation module contains the following columns:
Header | Description |
---|---|
CRISPR_ID | CRISPR locus ID of the input prokaryotic genome |
CRISPR_location | Location of the current CRISPR locus |
CRISPR_type | CRISPR type annotated by effector Cas protein(s) |
Repeat_type | Repeat type annotated by sequence alignment against the database of repeats of known type |
Spacer_info | Number of spacers in the CRISPR array; click the icon to obtain the detailed spacer sequences |
Cas protein info | Cas proteins within 20 kbp upstream and downstream of the CRISPR array |
CRISPR-Cas_info | Click the Detail button to get the detailed information of CRISPR-Cas systems |
Self-targeting has been suggested to be a form of autoimmunity[1], and has been used as a genomic marker to screen for CRISPR-Cas inhibitor genes in Listeria monocytogenes[2]. CRISPRimmunity aligns each spacer identified by the CRISPR-Cas system module against the genome it came from, and reports every target that lies outside the bounds of the detected arrays and passes the screening criteria.
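The screening idea can be illustrated with a minimal Python sketch. It is not the CRISPRimmunity implementation: it assumes an ungapped, forward-strand, full-length comparison (coverage of 1), reads the mismatch parameter as an upper bound, and uses 0-based half-open coordinates. The function names `hamming` and `find_self_targets` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_self_targets(genome, spacer, arrays, max_mismatch=2):
    """Return (start, end, mismatches) hits of `spacer` in `genome`.

    `arrays` is a list of (start, end) intervals of predicted CRISPR
    arrays; any window overlapping one of them is skipped, so a spacer
    never "targets" its own array. Assumption: forward strand only.
    """
    hits = []
    n = len(spacer)
    for start in range(len(genome) - n + 1):
        end = start + n
        # Skip windows that overlap a predicted CRISPR array.
        if any(start < a_end and end > a_start for a_start, a_end in arrays):
            continue
        mismatches = hamming(genome[start:end], spacer)
        if mismatches <= max_mismatch:
            hits.append((start, end, mismatches))
    return hits
```

A real search would also scan the reverse complement and would normally rely on an alignment tool rather than a brute-force scan.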
The result table of the self-targeting analysis module contains the following columns:
Header | Description |
---|---|
CRISPR_ID | CRISPR locus ID of the input prokaryotic genome. |
Spacer_info | Spacer information of the current CRISPR locus, including CRISPR locus ID, spacer ID, spacer start location, spacer length, prokaryotic genome ID, method for predicting CRISPR array. |
Spacer_region | The region of the spacer in the prokaryotic genome. |
Spacer_length | The length of the spacer. |
Hit_ID | The accession ID of the genome targeted by the spacer. |
Protospacer_location | The location of the protospacer in the targeted genome. |
Mismatch | The number of mismatched bases between spacer and protospacer. |
Identity | The sequence identity between spacer and protospacer. Values range from 0 to 1. |
Phage-host interactions are appealing systems for studying bacterial adaptive evolution and are increasingly recognized as playing an important role in human health and disease. They may contribute to novel therapeutic agents, such as phage therapy to combat multi-drug resistant infections. CRISPRimmunity aligns each spacer identified by the CRISPR-Cas system module against the phage genomes in a pre-established phage database, and reports every target that passes the screening criteria. The phage genome database contains 10,463 complete phage genome sequences collected from Millardlab[7], which were extracted from the GenBank database in May 2019.
The result table of the bacteria-phage interaction detection module contains the following columns:
Header | Description |
---|---|
CRISPR_ID | CRISPR locus ID of the input prokaryotic genome |
Spacer_info | Spacer information of the current CRISPR locus, including CRISPR locus ID, spacer ID, spacer start location, spacer length, prokaryotic genome ID, method for predicting CRISPR array. |
Spacer_region | The region of the spacer in the prokaryotic genome. |
Spacer_length | The length of the spacer. |
Hit_phage_ID | The accession ID of the phage genome targeted by the spacer. |
Hit_phage_def | The definition of the phage genome targeted by the spacer. |
Protospacer_location | The location of the protospacer in the targeted phage genome. |
Mismatch | The number of mismatched bases between spacer and protospacer. |
Identity | The sequence identity between spacer and protospacer. Values range from 0 to 1. |
The relationship between bacteria and phages can be identified because many phages insert their genomes into those of their hosts; the integrated phages are known as prophages. Prediction by prophage analysis involves two key steps. The first step identifies the prophage region in the bacterial genome. The second step annotates the prophage region with BLASTN or BLASTP by checking the DNA or protein sequence similarity between the prophage region and UniProt virus genomes.
DBSCAN-SWA detects prophages by combining DBSCAN with a sliding window algorithm. It uses density-based spatial clustering of applications with noise (DBSCAN)[3] to predict clusters of phage or phage-like genes, and a sliding window algorithm to pick up regions that DBSCAN does not predict but that contain enough phage-like proteins. More details about DBSCAN-SWA can be found in the article[5].
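The sliding window part can be pictured with the short Python sketch below. It only illustrates the general technique and is not DBSCAN-SWA's code: the window size, the threshold and the function name `phage_rich_windows` are all assumptions made for the example.

```python
def phage_rich_windows(is_phage_like, window=10, min_hits=6):
    """Return merged (start, end) protein index ranges, end exclusive.

    `is_phage_like` holds one 0/1 flag per protein in genome order.
    A window of `window` consecutive proteins is kept when at least
    `min_hits` of them are phage-like; overlapping or adjacent kept
    windows are merged into one region. Both values are hypothetical.
    """
    regions = []
    for start in range(len(is_phage_like) - window + 1):
        if sum(is_phage_like[start:start + window]) >= min_hits:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Extend the previous region instead of opening a new one.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

For a genome of 25 proteins where proteins 5 to 12 are phage-like, the sketch reports one merged region; a genome with no phage-like proteins yields an empty list.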
The result table of the prophage detection module contains the following columns:
Header | Description |
---|---|
Region | The index of the prophage region. |
Region Position | The position of the prophage region. |
Protein_number | The number of proteins within the prophage region. |
Hit_taxonomy | Genus of the most similar phage integrated into the prokaryotic genome; the probability values are given in brackets. |
Key_proteins | Key phage-related proteins |
Att_site | Information on attL and attR; the value is set to NA if attL/attR cannot be detected. |
Prophage annotation | Click the Detail button to get the detailed information of prophage region. |
The detail table of each prophage region contains the following columns:
Header | Description |
---|---|
Protein_ID | Accession ID of the protein within the prophage region. |
Protein_Def | Definition of the protein within the prophage region. |
Hit_ID | Homologous protein in UniProt database. |
Hit_Def | The organism of the homologous protein. |
E-value | The E-value of the protein sequence alignment. |
Identity | The identity of the protein sequence alignment. The interval of the value is from 0 to 1. |
Three strategies are used to discover inhibitors of the CRISPR-Cas system. The first identifies homologs of known Anti-CRISPR proteins. The second identifies novel Anti-CRISPR proteins based on their association with Acr-associated (Aca) proteins. The third uses the presence of stable self-targeting CRISPR sequences as a potential indicator of genomes or mobile genetic elements (MGEs) harboring CRISPR inhibitors[6-7].
The result table of the Anti-CRISPR and Acr-associated (Aca) protein detection module contains the following columns:
Header | Description |
---|---|
Acr_ID | Accession ID of the Anti-CRISPR protein candidate |
Acr_info | The information of the Anti-CRISPR protein candidate |
Acr size | The size of the Anti-CRISPR protein candidate |
Homology with known anti | The homologous known Anti-CRISPR protein of the Anti-CRISPR protein candidate |
Neighbor Aca | Information on the Aca in the neighborhood of the novel Anti-CRISPR protein candidate |
in_prophage | The location of the prophage region containing the novel Anti-CRISPR protein candidate |
ST in prophage | The number of self-targeting hits in the prophage region |
Neighbor HTH | Information on the HTH in the neighborhood of the novel Anti-CRISPR protein candidate |
Predict by AcRanker | Whether the Anti-CRISPR protein candidate is predicted by AcRanker |
Three strategies are used to discover inhibitors of the CRISPR-Cas system. The first identifies homologs of known Anti-CRISPR proteins. The second identifies novel Anti-CRISPR proteins based on their association with Acr-associated (Aca) proteins. The third uses the presence of stable self-targeting CRISPR sequences as a potential indicator of genomes or mobile genetic elements (MGEs) harboring CRISPR inhibitors[6-7].
Three modifiable parameters are provided for this analysis module.
Parameter definition | Default value |
---|---|
Identity threshold for predicting homologs of known anti-CRISPR proteins (value between 0 and 1). | 0.4 |
Coverage threshold for predicting homologs of known anti-CRISPR proteins (value between 0 and 1). | 0.7 |
Number of upstream and downstream proteins examined when predicting novel anti-CRISPR proteins by "guilt by association". | 3 |
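The "guilt by association" neighborhood in the last parameter can be sketched in Python as follows. This is an illustration only: it assumes proteins are held in genome order in a plain list and that the anchor (for example an Aca gene or a known Acr homolog) is identified by its index. The function name `neighbours` is hypothetical.

```python
def neighbours(proteins, anchor_index, flank=3):
    """Return proteins within `flank` positions of the anchor.

    The anchor itself is excluded, and the slice is clipped at the
    ends of the contig, so fewer than 2 * flank proteins may come back.
    """
    lo = max(0, anchor_index - flank)
    hi = min(len(proteins), anchor_index + flank + 1)
    return proteins[lo:anchor_index] + proteins[anchor_index + 1:hi]
```

With the default of 3, an anchor in the middle of a contig yields up to six candidate proteins, three on each side.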
CRISPRimmunity predicts novel class 2 CRISPR-Cas loci using a comparative genomics screening approach, as displayed in the following flowchart.
Users can upload or paste a prokaryotic genome sequence file at any assembly level (complete, chromosome, scaffold or contig). Each genome file must contain only one prokaryotic strain, and the file must be in (Multi-)FASTA or (Multi-)GenBank format.
Four modifiable parameters are provided for this analysis module.
Parameter definition | Default value |
---|---|
Minimum size of the novel effector protein. | 500 aa |
Maximum distance from the CRISPR array, in number of proteins, for predicting a novel effector protein. | 5 |
Identity threshold for finding homologs of candidate novel effector proteins (value between 0 and 1). | 0.3 |
Coverage threshold for finding homologs of candidate novel effector proteins (value between 0 and 1). | 0.6 |
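The first two parameters amount to a size and distance filter, sketched below in Python under stated assumptions. The `Protein` class, its fields and the idea of expressing the array position as an ordinal gene index are all hypothetical simplifications. The homolog search controlled by the identity and coverage parameters is not shown.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    index: int      # ordinal position of the gene on the contig (assumed)
    length_aa: int  # protein length in amino acids


def candidate_effectors(proteins, array_index, min_size=500, max_distance=5):
    """Keep large proteins encoded close to the CRISPR array.

    `array_index` is the ordinal position of the array among the genes
    (an assumption of this sketch). A protein is kept when it is at
    least `min_size` aa long and at most `max_distance` genes away.
    """
    return [
        p for p in proteins
        if p.length_aa >= min_size and abs(p.index - array_index) <= max_distance
    ]
```

For example, with the array at position 5, a 1200 aa protein at position 3 is kept, while a 300 aa protein at position 4 and a 1000 aa protein at position 12 are discarded.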
We collected 99 known Anti-CRISPR proteins from anti-CRISPRdb. The bacterial genomes encoding these proteins were analysed using the integrated programs of CRISPRimmunity, including CRISPR-Cas, novel effector protein identification, self-targeting, bacteria-phage interaction, prophage and Anti-CRISPR protein analysis.
Definition
FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes[8-9]. A FASTA format sequence starts with a single comment line and is followed by sequence lines. A greater-than (“>”) symbol is used before the first character of the comment line to distinguish it from sequence lines.
Example
>sequenceID-001 description
AAGTAGGAATAATATCTTATCATTATAGATAAAAACCTTCTGAATTTGCTTAGTGTGTAT
ACGACTAGACATATATCAGCTCGCCGATTATTTGGATTATTCCCTG
A multi-FASTA file contains multiple FASTA formatted sequences.
Example
>sequenceID-001 description
AAGTAGGAATAATATCTTATCATTATAGATAAAAACCTTCTGAATTTGCTTAGTGTGTAT
ACGACTAGACATATATCAGCTCGCCGATTATTTGGATTATTCCCTG
>sequenceID-002 description
CAGTAAAGAGTGGATGTAAGAACCGTCCGATCTACCAGATGTGATAGAGGTTGCCAGTAC
AAAAATTGCATAATAATTGATTAATCCTTTAATATTGTTTAGAATATATCCGTCAGATAA
TCCTAAAAATAACGATATGATGGCGGAAATCGTC
>sequenceID-003 description
CTTCAATTACCCTGCTGACGCGAGATACCTTATGCATCGAAGGTAAAGCGATGAATTTAT
CCAAGGTTTTAATTTG
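The FASTA and multi-FASTA layouts described above can be read with a few lines of Python. The sketch below is a minimal illustration, not the parser used by CRISPRimmunity; the function name `parse_fasta` is hypothetical, and it assumes the whole file fits in memory as one string.

```python
def parse_fasta(text):
    """Parse (multi-)FASTA text into a list of (header, sequence) tuples.

    The header is the comment line without the leading ">".
    Sequence lines of one record are joined into a single string.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Applied to the multi-FASTA example above, the function would return three records whose headers start with sequenceID-001, sequenceID-002 and sequenceID-003.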
Definition
A GenBank (full) format file records many biological features of a sequence, including the locus line, sequence length, molecule type, record definition, accession ID and others. The format (GenBank Flat File Format) consists of an annotation section and a sequence section. The annotation section starts with a line beginning with the word “LOCUS”. The sequence section starts with a line beginning with the word “ORIGIN” and ends with a line containing only “//”.
Example
LOCUS EU490707 1302 bp DNA linear PLN 05-MAY-2008
DEFINITION Selenipedium aequinoctiale maturase K (matK) gene, partial cds;
chloroplast.
ACCESSION EU490707
VERSION EU490707.1 GI:186972394
KEYWORDS .
SOURCE chloroplast Selenipedium aequinoctiale
ORGANISM Selenipedium aequinoctiale
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; Liliopsida; Asparagales; Orchidaceae;
Cypripedioideae; Selenipedium.
REFERENCE 1 (bases 1 to 1302)
AUTHORS Neubig,K.M., Whitten,W.M., Carlsward,B.S., Blanco,M.A.,
Endara,C.L., Williams,N.H. and Moore,M.J.
TITLE Phylogenetic utility of ycf1 in orchids
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 1302)
AUTHORS Neubig,K.M., Whitten,W.M., Carlsward,B.S., Blanco,M.A.,
Endara,C.L., Williams,N.H. and Moore,M.J.
TITLE Direct Submission
JOURNAL Submitted (14-FEB-2008) Department of Botany, University of
Florida, 220 Bartram Hall, Gainesville, FL 32611-8526, USA
FEATURES Location/Qualifiers
source 1..1302
/organism="Selenipedium aequinoctiale"
/organelle="plastid:chloroplast"
/mol_type="genomic DNA"
/specimen_voucher="FLAS:Blanco 2475"
/db_xref="taxon:256374"
gene <1..>1302
/gene="matK"
CDS <1..>1302
/gene="matK"
/codon_start=1
/transl_table=11
/product="maturase K"
/protein_id="ACC99456.1"
/db_xref="GI:186972395"
/translation="IFYEPVEIFGYDNKSSLVLVKRLITRMYQQNFLISSVNDSNQKG
FWGHKHFFSSHFSSQMVSEGFGVILEIPFSSQLVSSLEEKKIPKYQNLRSIHSIFPFL
EDKFLHLNYVSDLLIPHPIHLEILVQILQCRIKDVPSLHLLRLLFHEYHNLNSLITSK
KFIYAFSKRKKRFLWLLYNSYVYECEYLFQFLRKQSSYLRSTSSGVFLERTHLYVKIE
HLLVVCCNSFQRILCFLKDPFMHYVRYQGKAILASKGTLILMKKWKFHLVNFWQSYFH
FWSQPYRIHIKQLSNYSFSFLGYFSSVLENHLVVRNQMLENSFIINLLTKKFDTIAPV
ISLIGSLSKAQFCTVLGHPISKPIWTDFSDSDILDRFCRICRNLCRYHSGSSKKQVLY
RIKYILRLSCARTLARKHKSTVRTFMRRLGSGLLEEFFMEEE"
ORIGIN
1 attttttacg aacctgtgga aatttttggt tatgacaata aatctagttt agtacttgtg
61 aaacgtttaa ttactcgaat gtatcaacag aattttttga tttcttcggt taatgattct
121 aaccaaaaag gattttgggg gcacaagcat tttttttctt ctcatttttc ttctcaaatg
181 gtatcagaag gttttggagt cattctggaa attccattct cgtcgcaatt agtatcttct
241 cttgaagaaa aaaaaatacc aaaatatcag aatttacgat ctattcattc aatatttccc
301 tttttagaag acaaattttt acatttgaat tatgtgtcag atctactaat accccatccc
361 atccatctgg aaatcttggt tcaaatcctt caatgccgga tcaaggatgt tccttctttg
421 catttattgc gattgctttt ccacgaatat cataatttga atagtctcat tacttcaaag
481 aaattcattt acgccttttc aaaaagaaag aaaagattcc tttggttact atataattct
541 tatgtatatg aatgcgaata tctattccag tttcttcgta aacagtcttc ttatttacga
601 tcaacatctt ctggagtctt tcttgagcga acacatttat atgtaaaaat agaacatctt
661 ctagtagtgt gttgtaattc ttttcagagg atcctatgct ttctcaagga tcctttcatg
721 cattatgttc gatatcaagg aaaagcaatt ctggcttcaa agggaactct tattctgatg
781 aagaaatgga aatttcatct tgtgaatttt tggcaatctt attttcactt ttggtctcaa
841 ccgtatagga ttcatataaa gcaattatcc aactattcct tctcttttct ggggtatttt
901 tcaagtgtac tagaaaatca tttggtagta agaaatcaaa tgctagagaa ttcatttata
961 ataaatcttc tgactaagaa attcgatacc atagccccag ttatttctct tattggatca
1021 ttgtcgaaag ctcaattttg tactgtattg ggtcatccta ttagtaaacc gatctggacc
1081 gatttctcgg attctgatat tcttgatcga ttttgccgga tatgtagaaa tctttgtcgt
1141 tatcacagcg gatcctcaaa aaaacaggtt ttgtatcgta taaaatatat acttcgactt
1201 tcgtgtgcta gaactttggc acggaaacat aaaagtacag tacgcacttt tatgcgaaga
1261 ttaggttcgg gattattaga agaattcttt atggaagaag aa
//
A multi-GenBank (full) file contains multiple GenBank (full) formatted sequences.
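The three markers named in the definition (“LOCUS”, “ORIGIN” and “//”) are enough to split a (multi-)GenBank file into records and recover each sequence. The Python sketch below shows this under simple assumptions: it ignores the annotation section entirely, and the function name `split_genbank` is hypothetical.

```python
def split_genbank(text):
    """Return a list of (locus_name, sequence) tuples from GenBank text.

    A record starts at a line beginning with "LOCUS", its sequence
    starts after the "ORIGIN" line, and it ends at a line holding
    only "//". Annotation lines in between are skipped.
    """
    records = []
    locus, seq, in_origin = None, [], False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            locus = fields[1] if len(fields) > 1 else ""
            seq, in_origin = [], False
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.strip() == "//":
            if locus is not None:
                records.append((locus, "".join(seq)))
            locus, seq, in_origin = None, [], False
        elif in_origin:
            # Sequence lines carry a position number and spaced blocks;
            # keep only the letters.
            seq.append("".join(ch for ch in line if ch.isalpha()))
    return records
```

For production use, an established parser such as the one in Biopython is a safer choice, since it also handles features and qualifiers.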
Definition
Repeat sequences can be obtained from the results of CRISPR detection tools such as CRISPRCasFinder, PILER-CR and CRT. The input file for repeat type annotation lists the repeats in (multi-)FASTA format.
Example
>NC_003155_11_CRT
CGACCCACCTCCGCTCGCGCGGAGAGAAC
>NC_003155_11_PILER-CR
CGACCCACCTCCGCTCGCGCGGAGAG
>NC_003155_12_PILER-CR
CGGTTCACCTCCGCTCGCGCGGAGAGCAC
>NC_003155_12_CRISPRCasFinder
GTGCTCTCCGCGCGAGCGGAGGTGAACCG
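A repeat input file like the example above can be assembled with a short Python helper. This is a sketch only: the header pattern (genome ID, array number and tool name joined by underscores) is inferred from the example and is not a documented requirement, and the function name `repeats_to_fasta` is hypothetical.

```python
def repeats_to_fasta(repeats):
    """Format repeats as multi-FASTA text.

    `repeats` is an iterable of (genome_id, array_number, tool, sequence)
    tuples; each becomes a ">" header line followed by the sequence.
    """
    lines = []
    for genome_id, array_number, tool, sequence in repeats:
        lines.append(f">{genome_id}_{array_number}_{tool}")
        lines.append(sequence.upper())
    return "".join(line + "\n" for line in lines)
```

The returned string can be written to a file and uploaded as the repeat input.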
[1] Stern, A., Keren, L., Wurtzel, O., Amitai, G. & Sorek, R. Self-targeting by CRISPR: gene regulation or autoimmunity? Trends Genet. 26, 335–340 (2010).
[2] Bondy-Denomy J. Protein Inhibitors of CRISPR-Cas9. ACS Chem Biol. 2018 Feb 16;13(2):417-423. doi: 10.1021/acschembio.7b00831. Epub 2018 Jan 17. PMID: 29251498; PMCID: PMC6049822.
[3] Ester,M., Kriegel,H.P., Sander,J. and Xu,X. (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. KDD-1996 Proceedings. AAAI Press, Menlo Park, CA, pp. 226–231.
[4] You Zhou, Yongjie Liang, Karlene H. Lynch, Jonathan J. Dennis, David S. Wishart (2010) "PHAST: A Fast Phage Search Tool" (Nucleic Acid Research submitted)
[5] Bland C, Ramsey TL, Sabree F, Lowe M, Brown K, Kyrpides NC, Hugenholtz P. CRISPR Recognition Tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics. 2007 Jun 18;8(1):209.
[6] Rauch BJ, Silvis MR, Hultquist JF, Waters CS, McGregor MJ, Krogan NJ, Bondy-Denomy J. Inhibition of CRISPR-Cas9 with Bacteriophage Proteins. Cell. 2017 Jan 12;168(1-2):150-158.e10. doi: 10.1016/j.cell.2016.12.009. Epub 2016 Dec 29. PMID: 28041849; PMCID: PMC5235966.
[7] http://millardlab.org/bioinformatics/bacteriophage-genomes/