Microbial bioinformatics, WGS bacteria, plasmids, PostDoc @JLUGiessen, father of 2, husband, astrophotographer
And for the sake of completeness, there's already a v1.9.1 patch release catching 2 minor bugs 😆
https://github.com/oschwengers/bakta/releases/tag/v1.9.1
(7/7)
We replaced HMMER with PyHMMER and updated Pyrodigal to v3.1. Furthermore, we bumped various dependencies to their most recent versions.
(6/7)
We introduce auxiliary scripts for common downstream tasks, such as the extraction of annotations for certain subregions or the aggregation of annotation stats across multiple genomes. Ideas, contributions & PRs are highly welcome!
(5/7)
Bakta now annotates and exports spacer and repeat sequences within CRISPR arrays.
(4/7)
Currently, only the import of CDS coordinates is supported, but more might come later.
BTW, to add functional annotations for these CDS, you can provide the related aa sequences with custom annotations via --proteins.
(3/7)
We introduce a new --regions parameter supporting user-provided pre-annotated feature regions in Genbank/GFF3 format.
For example, CDS coordinates are imported, supersede ab initio-predicted CDS, and then are subject to the regular internal annotation workflow.
(2/7)
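A minimal sketch of how --regions and --proteins might be combined in one run. It only assembles the command line and does not execute anything. The file names are hypothetical, and --db and --output are assumed here from Bakta's general command-line usage rather than taken from this thread.

```python
import shlex


def build_bakta_command(genome, db, regions=None, proteins=None, output=None):
    """Assemble a Bakta command line as an argument list (nothing is executed)."""
    cmd = ["bakta", "--db", str(db)]  # --db: assumed database path option
    if regions is not None:
        # Pre-annotated feature regions (Genbank/GFF3), as described in the thread
        cmd += ["--regions", str(regions)]
    if proteins is not None:
        # User-provided aa sequences carrying custom annotations
        cmd += ["--proteins", str(proteins)]
    if output is not None:
        cmd += ["--output", str(output)]  # --output: assumed output directory option
    cmd.append(str(genome))
    return cmd


# Hypothetical file names, for illustration only
cmd = build_bakta_command(
    "genome.fasta",
    db="db",
    regions="regions.gff3",
    proteins="custom.faa",
    output="results",
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where Bakta and its database are installed.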
🦠🧬💻 Just released Bakta 1.9.0 with new features & various improvements:
- new --regions option to provide pre-annotated feature regions
- annotation of spacer & repeat sequences in CRISPR arrays
https://github.com/oschwengers/bakta/releases/tag/v1.9.0
More information below 👇 (1/7)
We fixed some rare occasions of wrong 5' / 3' ("prime") characters in product descriptions causing issues in downstream analyses. (6/6)
Now "bakta_proteins" writes its full annotation results as a comprehensive JSON - just like the main workflow. (5/6)
Compatibility with NCBI Bankit was improved:
- setting genome sequences' attributes "location" and "plasmid-name" (explicitly or auto-generated)
- removing strain designation from "organism"
(4/6)
and improved the --plasmid parameter, which now automatically sets input sequence attributes (complete, circular) for more convenience in most cases.
This can be overridden via a replicon table (--replicons) (3/6)
We introduced a new --force option that explicitly allows overwriting existing output directories (2/6)
🦠🧬💻 Just released Bakta 1.8.0 with various improvements:
- new --force option & improved --plasmid parameter
- improved @NCBI #Bankit compatibility
- increased sensitivity for user protein sequences
https://github.com/oschwengers/bakta/releases/tag/v1.8.0
More information below 👇 (1/6)
🦠🚀FYI: Bakta is now available via @galaxyproject!
Thanks & kudos to @Pi_R_Marin@twitter.com and the #UseGalaxy team.
v1.6.1 is available for all Galaxy instances via IUC.
So, if you're interested in using Bakta in Galaxy, kindly ask your local admins 😉
In addition, tons of minor fixes and improvements have been applied. (10/10)
Of note, we updated Pyrodigal to the most recent v2.1.0, fixing a bug in the SD motif detection on reverse contig edges (9/10)
We introduced a "--meta" option to run gene prediction in metagenome mode. Of course, still only bacterial genome features and proteins will be annotated. (8/10)
Also, several 3-letter gene symbol creation/curation steps were implemented for tRNAs/rRNAs/ncRNAs. (7/10)
"TssL" is automatically extracted from the protein description and added to a list of alternative gene symbols. Finally, the best-matching gene symbol for each gene is selected in line with its neighbors:
tssH T4SS ... TssH
tssL T4SS ... TssL
tssK T4SS ... TssK
(6/10)
We implemented several gene symbol curation steps for CDS operons.
For example:
tssH T4SS ATPase TssH
icmH T4SS protein TssL
tssK T4SS baseplate subunit TssK
... (5/10)
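The operon curation described in these two posts can be sketched roughly as follows. This is an illustrative simplification, not Bakta's actual implementation: the regular expression, the function names `extract_symbol` and `curate_operon`, and the scoring by shared 3-letter prefix are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed pattern: a protein-style name such as "TssL" at the end of a product description
PROTEIN_NAME = re.compile(r"\b([A-Z][a-z]{2}[A-Z])\s*$")


def extract_symbol(product):
    """Derive an alternative gene symbol from a product description, e.g. 'TssL' -> 'tssL'."""
    match = PROTEIN_NAME.search(product)
    if match is None:
        return None
    name = match.group(1)
    return name[0].lower() + name[1:]


def curate_operon(genes):
    """Pick, for each (symbol, product) pair, the candidate symbol that best fits its neighbors."""
    # Collect the current symbol plus any alternative extracted from the product
    candidates = []
    for symbol, product in genes:
        options = [symbol]
        alternative = extract_symbol(product)
        if alternative and alternative not in options:
            options.append(alternative)
        candidates.append(options)

    curated = []
    for i, options in enumerate(candidates):
        # Count the 3-letter prefixes among all candidate symbols of the other genes
        neighbor_prefixes = Counter(
            option[:3]
            for j, other in enumerate(candidates)
            if j != i
            for option in other
        )
        # max() keeps the first option on ties, so the current symbol wins by default
        curated.append(max(options, key=lambda s: neighbor_prefixes[s[:3]]))
    return curated


operon = [
    ("tssH", "T4SS ATPase TssH"),
    ("icmH", "T4SS protein TssL"),
    ("tssK", "T4SS baseplate subunit TssK"),
]
result = curate_operon(operon)
print(result)  # ['tssH', 'tssL', 'tssK']
```

Here icmH is replaced by tssL because the alternative symbol shares its prefix with both neighbors, while the other two genes keep their symbols.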