Perceiver IO

We have built the perceiver-io library, a modular implementation of the Perceiver family of model architectures (Perceiver, Perceiver IO, and Perceiver AR) in PyTorch. The library integrates with PyTorch Lightning for distributed training and with Hugging Face for inference, making these models easy to train and deploy.

The Perceiver architectures introduce several key innovations over standard transformer architectures, particularly in handling diverse and large-scale inputs and outputs. We have implemented and demonstrated these capabilities through various examples, including:

  • Masked language modeling: We provide pretrained and fine-tuned models for language understanding tasks.
  • Sentiment analysis: We fine-tuned a text classification model on the IMDb dataset to predict the sentiment of movie reviews.
  • Image classification: We implement image classification using pretrained weights and provide utilities to train custom image classification models.
  • Optical flow: We implement optical flow estimation, predicting the apparent motion of each pixel between two consecutive video frames.
  • Causal language modeling: We support the Perceiver AR architecture for causal language modeling using pretrained weights.
  • Symbolic audio modeling: We train Perceiver AR on the GiantMIDI-Piano dataset and show how to use the resulting model to generate symbolic (MIDI) audio data.

These examples highlight the key strengths of the Perceiver architectures, including their ability to handle very large and diverse inputs, produce structured outputs, and maintain efficiency on long sequences.

The project provides both ported official models and custom models used in training examples.

 


Examples

The following shows a side-by-side comparison of an input video and the optical flow estimated by Perceiver IO. Optical flow is the apparent motion of each pixel between two consecutive frames. For each pixel, the model estimates a flow vector that specifies the direction and magnitude of the predicted motion.
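To make the per-pixel flow vector concrete, here is a minimal NumPy sketch that converts a dense flow field into a magnitude map and a direction map. It does not use the perceiver-io API: the function name `flow_to_polar`, the `(H, W, 2)` layout with `(dx, dy)` in the last axis, and the toy values are all assumptions chosen for illustration.

```python
from typing import Tuple

import numpy as np


def flow_to_polar(flow: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Split a dense flow field into per-pixel magnitude and direction.

    Assumes `flow` has shape (H, W, 2), where the last axis holds the
    horizontal (dx) and vertical (dy) displacement in pixels.
    """
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)   # length of each flow vector
    angle = np.arctan2(dy, dx)     # direction in radians, in [-pi, pi]
    return magnitude, angle


# Hypothetical 2x2 flow field, one (dx, dy) vector per pixel.
flow = np.array(
    [
        [[3.0, 4.0], [0.0, 0.0]],
        [[-1.0, 0.0], [0.0, 2.0]],
    ]
)

magnitude, angle = flow_to_polar(flow)
print(magnitude)
print(angle)
```

Flow visualizations commonly map the angle to hue and the magnitude to brightness or saturation, so pixels moving in the same direction share a color; whether the comparison above uses exactly that scheme is not stated here.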
