CLIP

Introduction

CLIP (Contrastive Language-Image Pre-training) encodes text and images into a shared embedding space, so a caption and the image it describes end up with similar vectors. This note walks through vectorizing a sentence and an image with the open_clip implementation, running everything in a container against the LAION ViT-B-16 checkpoint.

Prepare the Pre-trained Model

curl -LO https://huggingface.co/laion/CLIP-ViT-B-16-laion2B-s34B-b88K/resolve/main/open_clip_pytorch_model.bin
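
If curl is not convenient, the same checkpoint can be fetched with the Hugging Face client instead. A minimal sketch, assuming the huggingface_hub package is installed (it is not part of the base image below):

    # download-model.py -- optional alternative to curl
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="laion/CLIP-ViT-B-16-laion2B-s34B-b88K",
        filename="open_clip_pytorch_model.bin",
        local_dir=".",  # save next to the scripts used later in this guide
    )
    print(path)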

Prepare the Base Image

  • the open-clip-torch dependency is too large to install on every container run, so we prepare a base image with it pre-installed
  • prepare base.dockerfile
    • FROM docker.io/library/python:3.12.1-bullseye
      
      # install open-clip-torch from the Aliyun PyPI mirror (any PyPI index works)
      RUN pip install -i https://mirrors.aliyun.com/pypi/simple/ open-clip-torch==2.24.0
      
      
  • build the base image
    • podman build -t open-clip-base:latest -f base.dockerfile .
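
To confirm the image built correctly before writing any scripts, a quick sanity check can be run inside it; mount the file into the container the same way as the scripts below and execute it with python3. A sketch (list_pretrained() returns the (architecture, pretrained-tag) pairs the library knows about):

    # verify-base.py -- smoke test for the open-clip-base image
    import open_clip

    print(open_clip.__version__)             # expected: 2.24.0
    print(open_clip.list_pretrained()[:3])   # a few known (model, tag) pairs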
      

Text Vectorization

  1. prepare text-vectorization.py
    • import os
      import torch
      import open_clip
      
      # configuration comes from the environment, with defaults matching this guide
      model_path = os.getenv("MODEL_PATH", default="/app/model/open_clip_pytorch_model.bin")
      model_name = os.getenv("MODEL_NAME", default="ViT-B-16")
      sentence = os.getenv("SENTENCE", default="hello world")
      device = "cuda" if torch.cuda.is_available() else "cpu"
      
      # load the downloaded checkpoint into the requested architecture
      model, _, preprocess = open_clip.create_model_and_transforms(
          model_name=model_name, pretrained=model_path, device=device
      )
      tokenizer = open_clip.get_tokenizer(model_name)
      
      # tokenize, then move the token tensor to the same device as the model
      text = tokenizer([sentence]).to(device)
      with torch.no_grad(), torch.cuda.amp.autocast():
          text_features = model.encode_text(text)
          # L2-normalize so dot products become cosine similarities
          text_features /= text_features.norm(dim=-1, keepdim=True)
          print(text_features)
      
      
  2. run it in a container
    • podman run --rm \
          -v $(pwd)/open_clip_pytorch_model.bin:/app/model/open_clip_pytorch_model.bin \
          -v $(pwd)/text-vectorization.py:/app/text-vectorization.py \
          -e MODEL_PATH=/app/model/open_clip_pytorch_model.bin \
          -e MODEL_NAME=ViT-B-16 \
          -e SENTENCE="On a freezing New Year's Eve, a poor young girl, shivering, bareheaded and barefoot, unsuccessfully tries to sell matches in the street." \
          -it localhost/open-clip-base:latest \
              python3 /app/text-vectorization.py
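
The script prints a tensor of shape (1, 512): ViT-B-16 projects text into a 512-dimensional embedding space. Because the features are L2-normalized, the dot product of two embeddings is their cosine similarity. A minimal sketch of comparing two sentences this way (the file name and sentences are made up for illustration):

    # text-similarity.py -- compare two sentences via cosine similarity
    import torch
    import open_clip

    model, _, _ = open_clip.create_model_and_transforms(
        model_name="ViT-B-16", pretrained="/app/model/open_clip_pytorch_model.bin"
    )
    tokenizer = open_clip.get_tokenizer("ViT-B-16")

    tokens = tokenizer(["a girl selling matches", "a stock market chart"])
    with torch.no_grad():
        feats = model.encode_text(tokens)
        feats /= feats.norm(dim=-1, keepdim=True)

    # cosine similarity between the two sentences, a scalar in [-1, 1]
    print((feats[0] @ feats[1]).item())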
      

Image Vectorization

  1. prepare image-vectorization.py
    • import os
      import torch
      import open_clip
      
      from PIL import Image
      
      # configuration comes from the environment, with defaults matching this guide
      model_path = os.getenv("MODEL_PATH", default="/app/model/open_clip_pytorch_model.bin")
      model_name = os.getenv("MODEL_NAME", default="ViT-B-16")
      image_path = os.getenv("IMAGE_PATH", default="/app/image.jpg")
      device = "cuda" if torch.cuda.is_available() else "cpu"
      
      model, _, preprocess = open_clip.create_model_and_transforms(
          model_name=model_name, pretrained=model_path, device=device
      )
      
      # preprocess resizes, crops, and normalizes; unsqueeze adds the batch dimension
      image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
      with torch.no_grad(), torch.cuda.amp.autocast():
          image_features = model.encode_image(image)
          # L2-normalize so dot products become cosine similarities
          image_features /= image_features.norm(dim=-1, keepdim=True)
          print(image_features)
      
      
  2. prepare image.jpg
    • put any test image in the working directory as image.jpg
  3. run it in a container
    • podman run --rm \
          -v $(pwd)/open_clip_pytorch_model.bin:/app/model/open_clip_pytorch_model.bin \
          -v $(pwd)/image-vectorization.py:/app/image-vectorization.py \
          -v $(pwd)/image.jpg:/app/image.jpg \
          -e MODEL_PATH=/app/model/open_clip_pytorch_model.bin \
          -e MODEL_NAME=ViT-B-16 \
          -e IMAGE_PATH=/app/image.jpg \
          -it localhost/open-clip-base:latest \
              python3 /app/image-vectorization.py
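
Text and image embeddings live in the same 512-dimensional space, so the two kinds of vectors can be compared directly. A sketch of zero-shot matching in the style of the open_clip README (the candidate captions and file name are made up for illustration):

    # zero-shot-demo.py -- score an image against candidate captions
    import torch
    import open_clip
    from PIL import Image

    model, _, preprocess = open_clip.create_model_and_transforms(
        model_name="ViT-B-16", pretrained="/app/model/open_clip_pytorch_model.bin"
    )
    tokenizer = open_clip.get_tokenizer("ViT-B-16")

    captions = ["a diagram", "a dog", "a girl selling matches"]
    image = preprocess(Image.open("/app/image.jpg")).unsqueeze(0)
    tokens = tokenizer(captions)

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(tokens)
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        # scaled cosine similarities -> probabilities over the captions
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    for caption, prob in zip(captions, probs[0].tolist()):
        print(f"{caption}: {prob:.3f}")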
      

References

  • https://github.com/openai/CLIP
  • https://github.com/mlfoundations/open_clip
  • https://huggingface.co/laion/CLIP-ViT-B-16-laion2B-s34B-b88K