
OneLLM

One Framework to Align All Modalities with Language

December 29th, 2024

About OneLLM

OneLLM is a framework presented in the paper 'OneLLM: One Framework to Align All Modalities with Language'. It aligns multiple modalities with language using a single, unified framework made up of modality tokenizers, a universal encoder, a universal projection module (UPM), and a large language model (LLM). Together, these components let OneLLM align modalities such as images, videos, and depth/normal maps with textual information. It i...

Key Features

  • Modality tokenizers for transforming input signals into tokens.
  • Universal encoder for representing input modalities in a common embedding space.
  • Universal projection module (UPM) for mapping the modalities to the language space.
  • Large language model (LLM) for generating textual descriptions from the aligned modalities.
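The four components above form a pipeline: each modality has its own tokenizer, while the encoder and projection module are shared across modalities. The following NumPy sketch traces tensor shapes through such a pipeline under clearly stated assumptions. The patch-based tokenizer, the single-layer stand-in encoder, and the router-weighted mixture of projection experts used for the UPM are illustrative choices, and every name and size (`modality_tokenizer`, `universal_projection`, `PATCH`, `N_EXPERTS`, and so on) is hypothetical, not taken from the OneLLM code. Weights are random, so the sketch shows data flow, not learned alignment.

```python
import numpy as np

# Hypothetical sketch: names, sizes, and layer choices are assumptions,
# not OneLLM's actual implementation.
rng = np.random.default_rng(0)

D = 32          # shared embedding width of the universal encoder
PATCH = 4       # patch size used by each modality tokenizer
N_EXPERTS = 3   # number of projection experts in the UPM
LLM_DIM = 48    # hidden size of the (not implemented) language model


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def modality_tokenizer(signal, weight):
    """Cut a 2D signal into PATCH x PATCH patches and project each to D dims."""
    h, w = signal.shape
    gh, gw = h // PATCH, w // PATCH
    patches = (
        signal[: gh * PATCH, : gw * PATCH]
        .reshape(gh, PATCH, gw, PATCH)
        .transpose(0, 2, 1, 3)
        .reshape(gh * gw, PATCH * PATCH)
    )
    return patches @ weight  # (num_tokens, D)


def universal_encoder(tokens, weight):
    """Stand-in for the shared encoder: one layer reused by every modality."""
    return np.tanh(tokens @ weight)  # (num_tokens, D)


def universal_projection(features, experts, router):
    """Mix expert projections using softmax gates computed from pooled features."""
    gates = softmax(features.mean(axis=0) @ router)      # (N_EXPERTS,)
    outputs = np.stack([features @ e for e in experts])  # (N_EXPERTS, T, LLM_DIM)
    return np.tensordot(gates, outputs, axes=1), gates   # (T, LLM_DIM), gates


# One tokenizer per modality; the encoder and projection weights are shared.
tokenizers = {
    "image": rng.standard_normal((PATCH * PATCH, D)),
    "depth": rng.standard_normal((PATCH * PATCH, D)),
}
encoder_w = rng.standard_normal((D, D)) / np.sqrt(D)
experts = [rng.standard_normal((D, LLM_DIM)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router_w = rng.standard_normal((D, N_EXPERTS))


def align(modality, signal, text_embeds):
    tokens = modality_tokenizer(signal, tokenizers[modality])
    features = universal_encoder(tokens, encoder_w)
    projected, gates = universal_projection(features, experts, router_w)
    # Aligned modality tokens are prepended to the text embeddings for the LLM.
    return np.concatenate([projected, text_embeds], axis=0), gates


text = rng.standard_normal((5, LLM_DIM))  # hypothetical prompt embeddings
image_seq, image_gates = align("image", rng.standard_normal((16, 16)), text)
depth_seq, depth_gates = align("depth", rng.standard_normal((8, 12)), text)
print(image_seq.shape, depth_seq.shape)  # (21, 48) (11, 48)
```

In this sketch, supporting another modality only requires a new tokenizer entry, because the encoder and projection weights are shared; a 16 x 16 image becomes 16 aligned tokens followed by the 5 prompt tokens.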

Use Cases

  • Image captioning: Generating captions for images.
  • Visual question answering: Answering questions based on visual content.
  • Multimodal sentiment analysis: Analyzing sentiment from combined textual and visual information.
  • Cross-modal retrieval: Retrieving relevant information across different modalities.
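The source does not say how OneLLM performs cross-modal retrieval. A common generic approach, shown here only as an assumption, is to rank candidates by cosine similarity once query and candidates are embedded in one shared space, like the common embedding space the universal encoder is described as producing. The embeddings and the `retrieve` helper below are hypothetical toy values, not OneLLM outputs.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def retrieve(query_emb, gallery_embs, k=2):
    """Hypothetical helper: indices of the k gallery items most similar to the query."""
    scores = normalize(gallery_embs) @ normalize(query_emb)
    order = np.argsort(-scores)
    return order[:k].tolist(), scores


# Hypothetical pooled embeddings assumed to live in one shared space.
text_query = np.array([1.0, 0.0, 1.0])
gallery = np.array([
    [0.9, 0.1, 1.1],   # e.g. an image embedding
    [-1.0, 0.5, 0.0],  # e.g. an audio clip embedding
    [0.0, 1.0, 0.2],   # e.g. a video clip embedding
])
top, scores = retrieve(text_query, gallery)
print(top)  # [0, 2]
```

Normalizing first makes the ranking depend only on direction, not on embedding magnitude, which is the usual choice when different modalities produce vectors of different scales.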
