@kaasbaas #wikipedia (English language) is only ~22GB of #bzip2 compressed #xml (uncompressed size is ~86GB).
is it possible to access it without decompressing the whole thing? I guess #random #access to .xml.bz2 should be a solved problem, right?
we routinely use gzip with random access in #bioinformatics, e.g. via #samtools or #tabix (both rely on BGZF, a blocked gzip variant)
EDIT:
the Wikipedia xml.bz2 dump does support random access in its multistream version. does @kiwix or any other wiki reader support it? I couldn't find any info on their websites...
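Here's a rough sketch of how the multistream trick can work, using only Python's standard `bz2` module. The dump is many bz2 streams concatenated together. A separate index file is assumed to have one `offset:page_id:title` line per article, where offset is the byte position of the stream containing that page. To read one article, you seek to that offset and decompress just that one stream. The function names and file names below are hypothetical.

```python
import bz2
from typing import BinaryIO, Dict, Iterable, Tuple

def parse_index_line(line: str) -> Tuple[int, int, str]:
    """Split an assumed 'offset:page_id:title' line; titles may contain colons."""
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return int(offset), int(page_id), title

def load_index(lines: Iterable[str]) -> Dict[str, int]:
    """Map article title -> byte offset of the bz2 stream holding it."""
    index: Dict[str, int] = {}
    for line in lines:
        if line.strip():
            offset, _page_id, title = parse_index_line(line)
            index[title] = offset
    return index

def read_stream(f: BinaryIO, offset: int, chunk_size: int = 64 * 1024) -> bytes:
    """Decompress exactly one bz2 stream starting at byte `offset`.

    BZ2Decompressor stops at the end of the first stream (sets .eof);
    any bytes belonging to the next stream land in .unused_data and are ignored.
    """
    f.seek(offset)
    decomp = bz2.BZ2Decompressor()
    parts = []
    while not decomp.eof:
        chunk = f.read(chunk_size)
        if not chunk:
            raise EOFError("truncated bz2 stream")
        parts.append(decomp.decompress(chunk))
    return b"".join(parts)
```

usage (hypothetical file names): `index = load_index(bz2.open("multistream-index.txt.bz2", "rt", encoding="utf-8"))`, then `read_stream(open("multistream.xml.bz2", "rb"), index["Bzip2"])`. this returns a batch of `<page>` elements (reportedly ~100 per stream). it's not a standalone XML document, so you still need to pick out the matching `<title>`.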