fasttext
Less than 1 minute
fasttext
Introduction
- FastText is a library for efficient learning of word representations and sentence classification
- It allows users to learn text representations and text classifiers
- It is written in C++ and supports distributed training
- A popular idea in modern machine learning is to represent words by vectors. These vectors capture hidden information about a language, like word analogies or semantic. It is also used to improve performance of text classifiers.
prepare
- prepare pre-trained model
curl -LO https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.en.zip unzip wiki.en.zip
- prepare python script
vectorization.py
import fasttext import os model_path = os.getenv("MODEL_PATH", default="/app/model/wiki.en.bin") sentence = os.getenv("SENTENCE", default="hello world") model = fasttext.load_model(model_path) vector = model.get_sentence_vector(sentence) print(vector)
run with container
# NOTE: need more than 16GB memory
podman run --rm \
-v $(pwd)/wiki.en.bin:/app/model/wiki.en.bin \
-v $(pwd)/vectorization.py:/app/vectorization.py \
-e MODEL_PATH=/app/model/wiki.en.bin \
-e SENTENCE="On a freezing New Year's Eve, a poor young girl, shivering, bareheaded and barefoot, unsuccessfully tries to sell matches in the street." \
-it docker.io/library/python:3.12.1-bullseye \
bash -c "pip install -i https://mirrors.aliyun.com/pypi/simple/ fasttext==0.9.2 && python3 /app/vectorization.py"
reference
- https://github.com/facebookresearch/fastText