Do you know #UnifiedLookupTables? With #ULT you leverage classical, CPU-only compression methods to encode data of any modality (natural language, chemical reactions, images, you name it), then use a ULT-pretrained #LLM to perform tasks on your data without ever exposing the raw data to the model.
With ULT, making a #languageModel #multimodal becomes a matter of preprocessing your data on a CPU, with no architectural changes to the model. Read it here: https://doi.org/10.1088/2632-2153/ae143c
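To make the "compress, then look up" idea concrete, here is a toy round-trip sketch. It is purely my illustration and not the paper's actual ULT scheme: it compresses arbitrary bytes on a CPU with zlib, then maps each compressed byte through a fixed lookup table of made-up pseudo-tokens that a text model could consume.

```python
import re
import zlib

# Toy sketch (NOT the paper's method): one invented pseudo-token per
# possible byte value, e.g. byte 65 -> "<b65>".
LOOKUP = [f"<b{i}>" for i in range(256)]

def encode(data: bytes) -> str:
    """Compress any modality's bytes on CPU, then render the
    compressed stream as a sequence of lookup-table tokens."""
    compressed = zlib.compress(data)
    return "".join(LOOKUP[b] for b in compressed)

def decode(text: str) -> bytes:
    """Invert the lookup table, then decompress back to raw bytes."""
    byte_vals = [int(m) for m in re.findall(r"<b(\d+)>", text)]
    return zlib.decompress(bytes(byte_vals))

# Works for any byte stream: text, a SMILES reaction string, image bytes...
sample = "CCO>>CC=O".encode()
tokens = encode(sample)
assert decode(tokens) == sample
```

The point of the sketch is only that the whole pipeline is plain preprocessing: no change to the model architecture, just a deterministic mapping from bytes to a token string and back.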
#AI