Download

Stand-alone Version

The stand-alone version of PhaBOX for large-scale inputs can be downloaded via https://github.com/KennthShang/PhaBOX.

Please note that the local version of PhaBOX does not generate the visualization files. However, all intermediate files, such as the network files and significant protein alignments, are still provided as outputs.

Protein cluster database

The protein cluster database and the protein annotations are provided for users who want to analyze the alignment results further.

Dataset

Below, we provide the sources of the training and test data for users who want to use them in their own studies. Because some of the benchmark datasets were curated by other research groups, we list the name of the paper and the link to each dataset. All the data are public, and we are grateful for their contributions to our study.

Because some datasets are very large, only the accessions and labels are given in CSV format. In these cases, some websites and tools can help you download the sequences by accession.
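As an illustration of working with such accession/label CSV files, here is a minimal Python sketch. It makes several assumptions that the files themselves may not satisfy: that the CSV has a header row with columns named `accession` and `label` (hypothetical names), and that the accessions are NCBI nucleotide accessions that can be fetched through the NCBI E-utilities `efetch` service. The helper names `read_accessions`, `efetch_fasta_url`, and `batches` are made up for this example. The sketch only builds the request URLs; it does not download anything.

```python
import csv
import io
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint. Using it here assumes the accessions
# are NCBI nucleotide accessions, which the CSV files may or may not be.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(csv_text, accession_col="accession", label_col="label"):
    """Return (accession, label) pairs from CSV text that has a header row.

    The column names are assumptions; adjust them to the actual file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row[accession_col].strip(), row[label_col].strip()) for row in reader]


def batches(items, size=200):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_fasta_url(accessions, db="nuccore"):
    """Build one efetch URL requesting FASTA records for a batch of accessions."""
    query = urlencode({
        "db": db,
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


# Illustrative input; the labels are hypothetical.
example = "accession,label\nNC_001416,phage\nNC_000913,bacteria\n"
pairs = read_accessions(example)
for batch in batches([accession for accession, _ in pairs]):
    print(efetch_fasta_url(batch))
```

Each printed URL can then be retrieved with any HTTP client. Batching keeps individual requests small; for large lists, consult the NCBI usage guidelines on request rates before downloading.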

PhaMer
  • [Phages database]

  • [Bacteria database]

  • Mock metagenomic data: https://www.ebi.ac.uk/ena/browser/view/PRJEB19901

    From paper: Kleiner, M., Thorson, E., Sharp, C. E., Dong, X., Liu, D., Li, C., & Strous, M. (2017). Assessing species biomass contributions in microbial communities via metaproteomics. Nature communications, 8(1), 1-14.

  • IMG/VR v3 data: https://genome.jgi.doe.gov/portal/IMG_VR/IMG_VR.home.html

    From paper: Simon Roux, David Páez-Espino, I-Min A Chen, Krishna Palaniappan, Anna Ratner, Ken Chu, T B K Reddy, Stephen Nayfach, Frederik Schulz, Lee Call, Russell Y Neches, Tanja Woyke, Natalia N Ivanova, Emiley A Eloe-Fadrosh, Nikos C Kyrpides, IMG/VR v3: an integrated ecological and evolutionary framework for interrogating genomes of uncultivated viruses, Nucleic Acids Research, Volume 49, Issue D1, 8 January 2021, Pages D764–D775.

PhaGCN

The ICTV taxa are from: https://ictv.global/taxonomy

PhaTYP
CHERRY
  • [The VHM dataset]

    From paper: Ahlgren, N. A., Ren, J., Lu, Y. Y., Fuhrman, J. A., & Sun, F. (2017). Alignment-free oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences. Nucleic acids research, 45(1), 39-53.

  • [The TEST dataset]

    From paper: Lu, C., Zhang, Z., Cai, Z., Zhu, Z., Qiu, Y., Wu, A., ... & Peng, Y. (2021). Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics. BMC biology, 19(1), 1-11.

  • Hi-C dataset: https://github.com/mmarbout/HGP-Hi-C

    From paper: Marbouty, M., Thierry, A., Millot, G. A., & Koszul, R. (2021). MetaHiC phage-bacteria infection network reveals active cycling phages of the healthy human gut. Elife, 10, e60608.

PhaVIP

The stand-alone version of PhaVIP for large-scale inputs can be downloaded via https://github.com/KennthShang/PhaVIP.

Released information and annotation of the proteins