
Available models and sources

thingsvision currently supports models from several different sources, i.e. the libraries or repositories from which the model architectures and weights are loaded. On this page you can find which models are available from each source, along with notes on their usage.

torchvision

thingsvision supports all models from the torchvision.models module. You can find a list of all available models here.

Example (later examples on this page assume the same import):

from thingsvision import get_extractor

model_name = 'alexnet'
source = 'torchvision'
device = 'cpu'

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True
)

Model names are case-sensitive and must be spelled exactly as they are in the torchvision documentation (e.g., alexnet, resnet18, vgg16, …).
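
Whatever the source, the returned extractor is used in the same way. Below is a minimal sketch of extracting features with the AlexNet extractor from above; the image directory ./images and the module name features.10 are placeholders you should adapt, and extractor.show_model() prints the module names you can choose from:

from thingsvision.utils.data import ImageDataset, DataLoader

# build a dataset and loader that use the extractor's own preprocessing
dataset = ImageDataset(
  root='./images',  # placeholder: directory containing your images
  out_path='./features',
  backend=extractor.get_backend(),
  transforms=extractor.get_transformations(resize_dim=256, crop_dim=224)
)
batches = DataLoader(
  dataset=dataset,
  batch_size=32,
  backend=extractor.get_backend()
)

# extract the activations of a single module; flatten_acts=True returns 2D features
features = extractor.extract_features(
  batches=batches,
  module_name='features.10',  # placeholder: pick a module via extractor.show_model()
  flatten_acts=True
)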

If you use pretrained=True, the model is by default pretrained on ImageNet; otherwise it is initialized randomly. For some models, torchvision provides multiple weight initializations, in which case you can pass the name of the weights in the model_parameters argument. For example, to get the extractor for a RegNet Y 32GF model pretrained using SWAG and fine-tuned on ImageNet:

model_name = 'regnet_y_32gf'
source = 'torchvision'
device = 'cpu'

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters={'weights': 'IMAGENET1K_SWAG_LINEAR_V1'}
)

For a list of all available weights, please refer to the torchvision documentation.
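
Recent torchvision versions (0.14 and later) also let you enumerate these weights programmatically, e.g.:

from torchvision.models import get_model_weights

# list every weight variant torchvision ships for this architecture
for weights in get_model_weights('regnet_y_32gf'):
    print(weights.name)  # e.g. IMAGENET1K_SWAG_LINEAR_V1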

timm

thingsvision supports all models from the timm package. You can find a list of all available models here.

Example:

model_name = 'tf_efficientnet_b0'
source = 'timm'
device = 'cpu'

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True
)

Model names are case-sensitive and must be spelled exactly as they are in the timm documentation (e.g., tf_efficientnet_b0, densenet121, mixnet_l, …).

If you use pretrained=True, the model is pretrained as described in the corresponding model documentation; otherwise it is initialized randomly.
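
If you are unsure of the exact spelling, timm itself can list its registered models, optionally filtered by a wildcard pattern:

import timm

# all pretrained model names matching the pattern
print(timm.list_models('tf_efficientnet*', pretrained=True))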

ssl

thingsvision provides various self-supervised learning models that are loaded from the VISSL library or the PyTorch Hub:

  • SimCLR (simclr-rn50)
  • MoCo v2 (mocov2-rn50)
  • Jigsaw (jigsaw-rn50)
  • RotNet (rotnet-rn50)
  • SwAV (swav-rn50)
  • PIRL (pirl-rn50)
  • Barlow Twins (barlowtwins-rn50)
  • VICReg (vicreg-rn50)
  • DINO (dino-rn50)

All of these models use the ResNet-50 architecture and are pretrained on ImageNet-1K; here, the model name describes the pre-training method rather than the model architecture.

DINO models are also available in ViT (Vision Transformer) and XCiT (Cross-Covariance Image Transformer) variants. For ViT models trained using DINO, the following models are available: dino-vit-small-p8, dino-vit-small-p16, dino-vit-big-p8, dino-vit-big-p16, where the trailing number describes the image patch resolution in the ViT (i.e. either 8x8 or 16x16). For the XCiT models, we have dino-xcit-small-12-p16, dino-xcit-small-12-p8, dino-xcit-medium-24-p16, dino-xcit-medium-24-p8, where the penultimate number represents the model depth (12 = small, 24 = medium).

Example:

model_name = 'simclr-rn50'
source = 'ssl'
device = 'cpu'

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True
)
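
Loading one of the DINO transformer variants listed above works the same way, e.g.:

model_name = 'dino-vit-small-p16'
source = 'ssl'
device = 'cpu'

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True
)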

keras

thingsvision supports all models from the keras.applications module. You can find a list of all available models here.

Example:

model_name = 'VGG16'
source = 'keras'
device = 'cpu'

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True
)

Model names are case-sensitive and must be spelled exactly as they are in the keras.applications documentation (e.g., VGG16, ResNet50, InceptionV3, …).

If you use pretrained=True, the model is pretrained on ImageNet; otherwise it is initialized randomly.

custom

In addition, the custom source provides several models that are not available from the other sources. These models are:

CORnet

We provide all CORnet models from this paper. Available model names are:

  • cornet-s
  • cornet-r
  • cornet-rt
  • cornet-z

Example:

model_name = 'cornet-s'
source = 'custom'
device = 'cpu'

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True
)

Models trained on Ecoset

We provide models trained on the Ecoset dataset, which contains 1.5 million images from 565 categories, selected to be both frequent in linguistic usage and rated as concrete by human observers. Available model names are:

  • Alexnet_ecoset
  • Resnet50_ecoset
  • VGG16_ecoset
  • Inception_ecoset

Example:

model_name = 'Alexnet_ecoset'
source = 'custom'
device = 'cuda'

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True
)
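
This example requests device='cuda'; if a GPU may not be available on your machine, you can choose the device at runtime (standard PyTorch, not specific to thingsvision):

import torch

# fall back to the CPU when no CUDA device is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'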

Models trained on ImageNet and fine-tuned on SalObjSub

We provide an AlexNet model pretrained on ImageNet and fine-tuned on SalObjSub. The available model name is:

  • AlexNet_SalObjSub

Example:

model_name = 'AlexNet_SalObjSub'
source = 'custom'
device = 'cpu'

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True
)

Official CLIP and OpenCLIP

We provide models trained using CLIP, both from the official repository and from OpenCLIP. Available model names are:

  • clip
  • OpenCLIP

Both provide multiple model architectures; OpenCLIP additionally provides multiple training datasets. These can be specified using the model_parameters argument. For example, to get a ViT-B/32 model from the official CLIP repository:

model_name = 'clip'
source = 'custom'
device = 'cpu'
model_parameters = {
    'variant': 'ViT-B/32'
}

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters
)

ViT-B/32 is the default model architecture, so you can also leave out the model_parameters argument. For a list of all available architectures, please refer to the CLIP repo.
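
If you have the official clip package installed, you can also list the valid variants directly (this queries the clip package, not thingsvision):

import clip

# architecture names accepted as the 'variant' parameter above
print(clip.available_models())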

For OpenCLIP, you can additionally specify the dataset used for training for most models. For example, to get a ViT-B/32 model trained on the LAION-400M dataset:

model_name = 'OpenCLIP'
source = 'custom'
device = 'cpu'
model_parameters = {
    'variant': 'ViT-B/32',
    'dataset': 'laion400m_e32'
}

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters
)

For a list of all available architectures and datasets, please refer to the OpenCLIP repo.
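
Similarly, the open_clip package can enumerate its valid architecture/dataset combinations; note that open_clip itself spells architectures with dashes (e.g. ViT-B-32):

import open_clip

# each entry is an (architecture, pretrained dataset) pair,
# e.g. ('ViT-B-32', 'laion400m_e32')
for variant, dataset in open_clip.list_pretrained():
    print(variant, dataset)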

Harmonization

We provide Harmonization models from the official repo. The following variants are currently available:

  • ViT_B16
  • ResNet50
  • VGG16
  • EfficientNetB0
  • tiny_ConvNeXT
  • tiny_MaxViT
  • LeViT_small

Example:

model_name = 'Harmonization'
source = 'custom'
device = 'cpu'
model_parameters = {
    'variant': 'ViT_B16'
}

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters
)