Rutam Prita Mishra
Rutam21
Passionate Software Developer
Updated 1 year ago
This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).
model dvc git arxiv
Updated 1 year ago
This is a fine-tuned version of the multi-modal LayoutLM model for the task of question answering on documents.
dvc git
Updated 1 year ago
This model is a checkpoint of DistilBERT-base-uncased, fine-tuned on SST-2.
model dvc git
Updated 1 year ago
BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.
model dvc git
Updated 1 year ago
This is a fine-tuned version of the multi-modal LayoutLM model for the task of question answering on invoices and other documents.
model dvc git
Updated 1 year ago
Vision-and-Language Transformer (ViLT) model fine-tuned on VQAv2. ViLT is up to tens of times faster than previous VLP models.
model dvc git
Updated 1 year ago
Pix2Struct is an image-encoder/text-decoder model trained on image-text pairs for various tasks, including image captioning and visual question answering.
model dvc git
Updated 1 year ago
This model is based on a multi-stage text-to-video generation diffusion model, which inputs a description text and returns a video that matches the text description.
model dvc git
Updated 1 year ago
BLIP is a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks.
model dvc git
Updated 1 year ago
This is an image captioning model trained by @ydshieh in flax.
model dvc git
Updated 1 year ago
This is a model pretrained on English-language text using a masked language modeling (MLM) objective.
model dvc git
Updated 2 years ago
ScanObjectNN is a newly published real-world dataset comprising 2902 3D objects in 15 categories.
dataset dvc git 3d model
Updated 2 years ago
SSP-3D is an evaluation dataset consisting of 311 images of sportspersons in tight-fitted clothes, with a variety of body shapes and poses.
dataset dvc git 3d model
Updated 2 years ago
Urban Sound 8K is an audio dataset that contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes.
dataset audio dvc git
Updated 2 years ago
The FSDnoisy18k dataset is an open dataset containing 42.5 hours of audio across 20 sound event classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.
dataset audio dvc git