#BioPerl

2025-02-04

Here's a #BioPerl blast from the past @hyphaltip - github.com/OBF/wp-content/blob

(Spotted while working with @gedankenstuecke et al on the trial @OpenBio website migration from WordPress to Hugo on GitHub Pages)

Christos Argyropoulos MD, PhD (ChristosArgyrop@mstdn.science)
2024-03-22

Wrapping #hmmer (hmmer.org/) made me appreciate AUTOLOAD in #perl @Perl
The actual code I had to write was minimal, i.e. about 23 lines in the pm file and ~85 in the alienfile, but it ended up "containerizing" (inside perlbrew) all 41 programs of the HMMER and EASEL suites #bioinformatics
#Github repo:
github.com/chrisarg/alien-seqa
#cpan:
metacpan.org/pod/Alien::SeqAli

#bioperl relevant modules:
metacpan.org/pod/Bio::Tools::R
metacpan.org/pod/Bio::Index::H
Great start for building one's own programs.
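
To illustrate why AUTOLOAD keeps such a wrapper small, here is a minimal sketch of the general technique, not the code from the repository above. The package name `My::ToolSuite`, the `%COMMAND_FOR` table and both command names are hypothetical stand-ins; they call the running perl itself so the example works without HMMER installed.

```perl
package My::ToolSuite;    # hypothetical name, not the real Alien module
use strict;
use warnings;

our $AUTOLOAD;

# Hypothetical map from method name to command line. A real wrapper would
# point at the installed suite binaries instead of at perl itself.
my %COMMAND_FOR = (
    perl_version => [ $^X, '-e', 'print $]' ],
    echo_args    => [ $^X, '-e', 'print join(q{,}, @ARGV)' ],
);

sub new { return bless {}, shift }

# Defined explicitly so object destruction never falls into AUTOLOAD.
sub DESTROY { }

# One catch-all method stands in for a hand-written sub per program.
sub AUTOLOAD {
    my ( $self, @args ) = @_;
    my $name = $AUTOLOAD =~ s/.*:://r;    # strip the package prefix
    my $cmd  = $COMMAND_FOR{$name}
      or die "No wrapped program named '$name'\n";

    # List-form pipe open runs the command without going through a shell.
    open( my $fh, '-|', @$cmd, @args ) or die "Cannot run $name: $!";
    my $out = do { local $/; <$fh> } // '';
    close $fh;
    return $out;
}

package main;

my $suite = My::ToolSuite->new;
print $suite->echo_args( 'a', 'b' ), "\n";    # prints "a,b"
```

Adding another program under this scheme means adding one entry to the table rather than writing a new method, which is presumably where the line savings come from.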

Christos Argyropoulos MD, PhD (ChristosArgyrop@mstdn.science)
2023-10-01

The beauty, succinctness & speed of #bioperl #perl
(creating and accessing an index of 191,106 sequences, ~275 MB of biological (human #cDNA and #ncRNA) sequence data)

5 sec to create the index (using BerkeleyDB) and 12 sec to traverse the sequence data
#bioinformatics @Perl

(Code in alt text of the left image edited to fit the character limit)

use LWP::Simple;
use FindBin qw($Bin);
use File::Basename;
use File::Spec;
use Bio::DB::Fasta;   
use Memory::Usage;   
my $download_dir = File::Spec->catdir( $Bin, 'fastaloc' );
mkdir $download_dir unless -e $download_dir;
my @fasta_files = qw(
  https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/ncrna/Homo_sapiens.GRCh38.ncrna.fa.gz
  https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/cds/Homo_sapiens.GRCh38.cds.all.fa.gz
);
my @index_files;
for my $dataset (@fasta_files) {
    # download the files into $download_dir and unpack them
    my $dataset_fname             = basename($dataset);
    my $uncompressed_dataset_name = $dataset_fname =~ s/\.gz$//r;
    $dataset_fname = File::Spec->catfile( $download_dir, $dataset_fname );
    $uncompressed_dataset_name =
      File::Spec->catfile( $download_dir, $uncompressed_dataset_name );
    unless ( -e $dataset_fname || -e $uncompressed_dataset_name ) {
        my $rc = getstore( $dataset, $dataset_fname );
        if ( is_error($rc) ) {
            # report the failed download and skip to the next dataset
            warn "getstore of <$dataset> failed with $rc\n";
            next;
        }
    }
    system 'gzip', '-d', $dataset_fname
      unless -e $uncompressed_dataset_name;
    push @index_files,$uncompressed_dataset_name;
}
my $mu = Memory::Usage->new();
$mu->record('Before db');
my $db = Bio::DB::Fasta->new( $download_dir);
$mu->record('After db');
my $stream  = $db->get_PrimarySeq_stream;
$mu->record('After stream');
while ( my $seq = $stream->next_seq ) {
    my $sequence = $seq->seq();
}
$mu->record('Acc seq');
$mu->dump();
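
The script above records memory use but not the timings quoted in the post. One way to collect wall-clock figures is sketched below with the core Time::HiRes module. The `timed` helper and the stand-in workload are hypothetical and were not part of the original code; in practice one would wrap the `Bio::DB::Fasta->new` call and the `next_seq` loop instead.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a code block and return
# (elapsed wall-clock seconds, the block's return value).
sub timed {
    my ($code) = @_;
    my $t0     = [gettimeofday];
    my $result = $code->();
    return ( tv_interval($t0), $result );
}

# Stand-in workload so the sketch runs without BioPerl or any data files.
my ( $elapsed, $total ) = timed(
    sub {
        my $sum = 0;
        $sum += length( 'ACGT' x $_ ) for 1 .. 1000;
        return $sum;
    }
);

printf "%-10s %.3f s (%d bases)\n", 'workload', $elapsed, $total;
```

Wall-clock numbers taken this way include disk caching effects, so a first run (which builds the index) and later runs (which reuse it) can differ considerably.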
