Perceiver IO
We have built the perceiver-io library, a modular implementation of the Perceiver family of model architectures (Perceiver, Perceiver IO, and Perceiver AR) in PyTorch. Our library integrates with PyTorch Lightning for distributed training and Hugging Face for inference, making it easy to train and deploy these models.
The Perceiver architectures introduce several key innovations over standard transformer architectures, particularly in handling diverse and large-scale inputs and outputs. We have implemented and demonstrated these capabilities through various examples, including:
- Masked language modeling: We provide pretrained and fine-tuned models for language understanding tasks.
- Sentiment analysis: We fine-tuned a text classification model on the IMDb dataset to predict the sentiment of movie reviews.
- Image classification: We implement image classification using pretrained weights and provide utilities to train custom image classification models.
- Optical flow: We implement optical flow estimation, predicting the apparent motion of each pixel between two consecutive video frames.
- Causal language modeling: We support the Perceiver AR architecture for causal language modeling using pretrained weights.
- Symbolic audio modeling: We showcase how a Perceiver AR audio model, trained on the GiantMIDI-Piano dataset, can generate symbolic (MIDI) audio data.
These examples highlight the key strengths of the Perceiver architectures, including their ability to handle very large and diverse inputs, produce structured outputs, and maintain efficiency on long sequences.
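The efficiency on large inputs comes from the Perceiver's latent bottleneck: inputs are cross-attended into a small, fixed-size latent array, self-attention runs only in that latent space, and Perceiver IO decodes structured outputs with learned output queries. The following is a minimal conceptual sketch of that encode-process-decode pattern in plain PyTorch; all class and parameter names here are illustrative assumptions, not the perceiver-io library's API.

```python
import torch
import torch.nn as nn

class PerceiverIOSketch(nn.Module):
    """Conceptual encode-process-decode sketch (not the library's API)."""

    def __init__(self, input_dim=64, latent_dim=128, num_latents=32,
                 query_dim=128, num_heads=4):
        super().__init__()
        # Small, fixed-size latent array: decouples compute from input length.
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        # Encoder cross-attention: latents attend to the (possibly huge) input.
        self.encode = nn.MultiheadAttention(latent_dim, num_heads,
                                            kdim=input_dim, vdim=input_dim,
                                            batch_first=True)
        # Latent self-attention: cost depends only on num_latents, not input size.
        self.process = nn.TransformerEncoderLayer(latent_dim, num_heads,
                                                  batch_first=True)
        # Decoder cross-attention: output queries attend to the latents.
        self.decode = nn.MultiheadAttention(query_dim, num_heads,
                                            kdim=latent_dim, vdim=latent_dim,
                                            batch_first=True)

    def forward(self, x, output_queries):
        # x: (batch, M, input_dim) with arbitrary input length M
        b = x.shape[0]
        lat = self.latents.unsqueeze(0).expand(b, -1, -1)
        lat, _ = self.encode(lat, x, x)                  # O(M * num_latents)
        lat = self.process(lat)                          # O(num_latents ** 2)
        out, _ = self.decode(output_queries, lat, lat)   # one output per query
        return out

model = PerceiverIOSketch()
x = torch.randn(2, 1000, 64)       # long input sequence
queries = torch.randn(2, 10, 128)  # one learned query per desired output
out = model(x, queries)
print(out.shape)  # torch.Size([2, 10, 128])
```

The shape of the output is set entirely by the output queries, which is how Perceiver IO produces structured outputs of arbitrary size (e.g. per-pixel flow fields or per-token logits) from the same latent representation.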
The project provides both ported official models and custom models used in training examples.
Related articles:
- Training compute-optimal Perceiver AR language models
- A gentle introduction to Rotary Position Embedding
Links: