Sup' #python #bioinformatics ! Didn't announce it so far, but now that it's working, I'm pleased to say you can now pip install BLAST+ (and more)! Technicalities below🧶 1/12
Disclaimer: this is not an official @NCBI thing or anything, but it builds pretty heavily on the NCBI C++ Toolkit (https://www.ncbi.nlm.nih.gov/t...), a unified API to the NCBI algorithms and data model, so kudos to the developers who made that possible 🤝 2/12
Among other things, the C++ Toolkit has a BLAST API, which allows getting the same results as the BLAST+ binaries we love. However, the project is huge! ~2000 C++ files to compile to get the BLAST+ functionalities. 3/12
Since a codebase that size is not super practical to link statically à la PyHMMER (where you get the whole HMMER library linked statically inside pyhmmer.plan7), I instead set up a more indirect system, where the C++ Toolkit libraries are distributed as dynamic libraries in their own package 4/12
This package (https://pypi.org/project/pyncb...) is built with @JFrog's Conan package manager, and distributed as Python-agnostic, platform-specific wheels, which means you should not have to rebuild it across installs (at least on macOS 13+ and Linux) 5/12
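Quick aside for the curious: "Python-agnostic, platform-specific" can be read straight off a wheel's filename tags. Here is a minimal sketch that only uses the standard wheel naming convention; the filenames and helper names are made up for illustration and are not the actual files published for this project.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    python: str    # e.g. "py3" or "cp312"
    abi: str       # e.g. "none" or "cp312"
    platform: str  # e.g. "any" or "manylinux_2_28_x86_64"


def parse_wheel_tags(filename: str) -> WheelTags:
    """Extract the compatibility tags from a wheel filename.

    Wheel names follow: name-version[-build]-python-abi-platform.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    python, abi, platform = parts[-3:]
    return WheelTags(python, abi, platform)


def is_python_agnostic(tags: WheelTags) -> bool:
    # No ABI requirement and only generic "py*" interpreter tags.
    generic = all(tag.startswith("py") for tag in tags.python.split("."))
    return tags.abi == "none" and generic


def is_platform_specific(tags: WheelTags) -> bool:
    # "any" is the tag used by wheels that run on every platform.
    return tags.platform != "any"


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    for name in (
        "example_runtime-1.0.0-py3-none-manylinux_2_28_x86_64.whl",
        "example_bindings-1.0.0-cp312-cp312-macosx_13_0_arm64.whl",
    ):
        tags = parse_wheel_tags(name)
        print(name, is_python_agnostic(tags), is_platform_specific(tags))
```

A wheel tagged like the first one can be installed under any Python 3 interpreter on that platform, which is why a heavy set of shared libraries only needs to be built once per platform rather than once per Python version.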
Then the main package (https://pypi.org/project/pyncb...) contains the Cython bindings linking against the runtime components (with some dynamic RUNPATH stuff that was a pain to get working), meaning I can still update them often without expecting you to perform a 1h build downstream 6/12
Once you have pyncbitk set up, you can run your BLAST from the Python interpreter, and you even get #mypy type annotations for the parameters. 7/12
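If you want to sanity-check the install before trying anything BLAST-related, a minimal sketch using only the standard library could look like this. It deliberately avoids the pyncbitk API itself (see the docs for that); `installed_version` is just a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pyncbitk")
    if version is None:
        print("pyncbitk is not installed in this environment")
    else:
        print(f"pyncbitk {version} is installed")
```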
The docs and interface are WIP, but you can already find some working examples that show how to prepare data for BLASTn and how to recover results (https://pyncbitk.readthedocs.i...) 8/12
As a proof-of-concept, I ported @torstenseemann's ABRicate into a pure-Python package (https://pypi.org/project/pyabr...) and it's working like a charm! 9/12
Overall it's still rough around the edges, but I think that's a major personal milestone in pushing forward the technical soundness of bioinformatics foundation tools like BLAST+! 10/12
The C++ Toolkit is actually really feature-rich: it also contains APIs to handle taxonomy, and other algorithms like Gnomon or Dustmasker (though you can already use @apcamargo_'s excellent pydustmasker for that), so I'll keep expanding functionalities in the Python part. 11/12
Code on GitHub of course (https://github.com/althonos/py...) if you wanna have a look. Happy coding 🤖 ! 12/12




