Name Description Download
PHIS database for standalone This database is packaged for download with the standalone program deposited at https://github.com/HIT-ImmunologyLab/PHISDetector. It includes the PHIS custom databases (Phage Genome and Protein Database (PGPD), Bacterial Genome and Protein Database (BGPD), Sequence Composition Database (SCD), Prophage DNA and Protein Database (PDPD), CRISPR Spacer Database (CSD), and Protein-Protein Interaction (PPI) Database (PPID)) and the trained machine learning models. The md5 checksum is c485de234bf1fb5bc66cae83f678b6a5
DBSCAN-SWA database for standalone This database is packaged for download with the standalone program deposited at https://github.com/HIT-ImmunologyLab/DBSCAN-SWA. It includes the Phage Genome and Protein Database (PGPD) and the UniProt TrEMBL database
From PHISDetector


From Zenodo
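The md5 checksum listed for the PHIS standalone database can be used to confirm that a download completed intact. The following is a minimal sketch of such a check in Python, using only the standard library. The archive file name in the usage comment is a placeholder, since this page does not state the name of the downloaded file; the function names are illustrative only.

```python
import hashlib

# Checksum published on this page for the PHIS standalone database.
EXPECTED_MD5 = "c485de234bf1fb5bc66cae83f678b6a5"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value.
    The comparison ignores case and surrounding whitespace."""
    return md5_of_file(path) == expected.strip().lower()


# Hypothetical usage (replace the placeholder with the real archive path):
# print(verify_download("path/to/downloaded_PHIS_database_archive"))
```

If the function returns False, the file is most likely truncated or corrupted and should be downloaded again.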
Prophage
bacteria_prophage_info.txt 63,352 prophage regions identified in 9,646 bacterial genomes using Phage_Finder or DBSCAN-SWA (our in-house prophage detection tool), stored in the Prophage DNA and Protein Database (PDPD)
prophage_summary_PHASTER.txt 41,428 prophage regions on 12,839 bacterial genomes from NCBI predicted by PHASTER
elife-08490-fig1-data3-v3-Virsorter.txt 2,499 prophage regions on 11,936 bacterial genomes from NCBI predicted by VirSorter
prophinder_prophage_regions.txt 1,109 prophage regions on 359 bacterial genomes from NCBI predicted by Prophinder
prophage_merge.txt 52,009 prophage regions on 24,335 bacterial genomes from NCBI integrated from the PHASTER, VirSorter and Prophinder predictions
CRISPR Repeat sequences
CRISPRmap_REPEATS.fa 3,527 repeat sequences gathered from the CRISPRmap database
CRISPR spacer sequences
merge_nr_spacer.fa 418,766 spacer sequences compiled in the CRISPR Spacer Database (CSD)
all-bacteria-spacer.spc 1,217,629 spacer sequences predicted in microbial genome and metagenome sequences from NCBI and NCBI whole genome sequencing (WGS)
Bacterial genomes
complete_bacteria_list.txt Information on 13,055 completely assembled bacterial genomes in the Bacterial Genome and Protein Database (BGPD)
bacteria_assembly_genome_def_info.txt Information on the 20,393 complete bacterial genomes collected from NCBI
Phage genomes
phage_inf.txt Information on 10,463 phage genome sequences collected from millardlab (http://millardlab.org/bioinformatics/bacteriophage-genomes/) in the Phage Genome and Protein Database (PGPD)
phage_ncbi_refseq_def_info.txt Information on the 10,230 complete phage genomes collected from NCBI
Protein-protein Interaction
pro_pro_int_list.txt 912 non-redundant PPIs considered to be correlated with phage-host interactions in Protein-Protein Interaction (PPI) Database (PPID).
domain_domain_int.txt 318 non-redundant DDIs considered to be correlated with phage-host interactions in Protein-Protein Interaction (PPI) Database (PPID).
intact_interaction_info.txt 1,426,040 protein-protein interactions from the IntAct database
Virulence Factors (VF)
ShortBRED_VF_2017_markers.faa Collection of 86,136 markers (mid-2017) for microbial virulence factors, built from protein sequences compiled from Victors, VFDB, and MvirDB
VFDB_setB_pro.fas 28,434 virulence factors from VFDB
Victors_pro_format.faa 4,958 virulence factors from Victors
Antibiotic resistance genes (ARGs)
ShortBRED_CARd_2017_markers.faa Collection of 3,237 markers (mid-2017) for antibiotic resistance factors, based on the Comprehensive Antibiotic Resistance Database (CARD)
Known phage-host pairs
phage-bacteria-pairs.txt 2,928 known phage-host pairs extracted from NCBI based on the host annotation in GenBank format
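The sequence counts listed above give a quick way to sanity-check a downloaded file. The sketch below counts the records in a FASTA file and compares the result with the number stated on this page. It assumes that the .fa, .fas and .faa files are standard FASTA (one header line starting with ">" per record) and that each listed item corresponds to exactly one record; neither assumption is confirmed by this page, so a mismatch does not by itself prove that a file is damaged. The function names and the directory argument are illustrative only.

```python
import os

# Counts as stated on this page, keyed by file name.
LISTED_COUNTS = {
    "CRISPRmap_REPEATS.fa": 3527,
    "merge_nr_spacer.fa": 418766,
    "ShortBRED_VF_2017_markers.faa": 86136,
    "VFDB_setB_pro.fas": 28434,
    "Victors_pro_format.faa": 4958,
    "ShortBRED_CARd_2017_markers.faa": 3237,
}


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


def compare_with_listed_counts(directory):
    """For each listed file found in `directory`, return a mapping
    file name -> (listed count, observed count). Missing files are skipped."""
    report = {}
    for name, listed in LISTED_COUNTS.items():
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            report[name] = (listed, count_fasta_records(path))
    return report


# Hypothetical usage, assuming the files were saved to ./downloads:
# for name, (listed, observed) in compare_with_listed_counts("downloads").items():
#     print(name, listed, observed, "OK" if listed == observed else "DIFFERS")
```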