Introducing the Ettin Reranker Family
TL;DR

Today I'm releasing six new Sentence Transformers CrossEncoder rerankers, state-of-the-art at their respective sizes, built on top of the Ettin ModernBERT encoders, together with the data and full training recipe that produced them:

- cross-encoder/ettin-reranker-17m-v1
- cross-encoder/ettin-reranker-32m-v1
- cross-encoder/ettin-reranker-68m-v1
- cross-encoder/ettin-reranker-150m-v1
- cross-encoder/ettin-reranker-400m-v1
- cross-encoder/ettin-reranker-1b-v1

The models were trained with a distillation recipe: pointwise MSE on mixedbread-ai/mxbai-rerank-large-v2 scores over cross-encoder/ettin-reranker-v1-data, which is a subset of lightonai/embeddings-pre-training mixed with a reranked subset of lightonai/embeddings-fine-tuning.

Our six rerankers paired with google/embeddinggemma-300m on MTEB(eng, v2) Retrieval. See Results for five more embedder pairings.

If you're new to rerankers and want the "why" first, jump to What is a reranker, and why pair one with an embedder?. If you just want to plug a model in, jump to Usage. If you want to train your own, jump to Training.

I bootstrapped the training recipe below with the new train-sentence-transformers Agent Skill shipped in Sentence Transformers v5.5.0. Install it with hf skills add train-sentence-transformers [--global] [--claude] and ask your AI coding agent (Claude Code, Codex, Cursor, Gemini CLI, ...) to fine-tune a SentenceTransformer, CrossEncoder, or SparseEncoder model on your data.

Table of contents

- What is a reranker, and why pair one with an embedder?
- Usage
- Architecture Details
- Results
- Training
- Conclusion
- Acknowledgements

What is a reranker, and why pair one with an embedder?

A reranker (a.k.a. pointwise cross-encoder) is a neural model that takes a (query, document) pair and outputs a single relevance score.
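As a rough illustration of that pointwise interface, here is a toy sketch in plain Python. The names `toy_pair_score` and `rerank` are hypothetical, and the token-overlap score is only a stand-in for a neural model's forward pass. It is meant to show the shape of the computation: one call per (query, document) pair, one float out, then a sort.

```python
from typing import Callable, List, Tuple


def toy_pair_score(query: str, document: str) -> float:
    """Hypothetical stand-in for a reranker forward pass.

    A real cross-encoder feeds both texts through a transformer together;
    this sketch only counts shared lowercase tokens so it stays runnable.
    """
    query_tokens = set(query.lower().split())
    document_tokens = set(document.lower().split())
    return float(len(query_tokens & document_tokens))


def rerank(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float] = toy_pair_score,
) -> List[Tuple[int, float]]:
    """Score each (query, document) pair once, then sort by descending score."""
    scored = [(index, score_fn(query, doc)) for index, doc in enumerate(documents)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


documents = [
    "the fuji apple is a cultivar from japan",
    "apple was founded in cupertino",
]
ranking = rerank("where was apple founded", documents)
print(ranking)  # [(1, 3.0), (0, 1.0)]
```

The scoring function is called once for every candidate document, so the work grows linearly with the number of candidates per query.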
Unlike an embedding model, which encodes the query and document separately and computes their similarity from the two embedding vectors, a reranker lets the two texts attend to each other through every transformer layer. That joint encoding is more accurate but also more expensive: the model has to be run once per (query, document) pair rather than once per text.

Because cross-encoders are too expensive to run over a full corpus, the common production pattern is retrieve-then-rerank: a fast embedding model retrieves the top-K candidates (cheap), then a cross-encoder re-orders just those K with high accuracy. The total cost stays bounded while the final ranking is much closer to what an exhaustive cross-encoder pass would produce.

Throughout this blogpost I'll use "reranker" and "cross-encoder" interchangeably.

Usage

The released models are normal Sentence Transformers CrossEncoder models, so you can use them with just 3 lines of code:

    from sentence_transformers import CrossEncoder

    model = CrossEncoder("cross-encoder/ettin-reranker-32m-v1")
    scores = model.predict([
        ("Where was Apple founded?", "Apple Inc. was founded in Cupertino, California in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne."),
        ("Where was Apple founded?", "The Fuji apple is an apple cultivar developed in the late 1930s and brought to market in 1962."),
    ])
    print(scores)
    # [11.393298 2.968891] <- larger means more relevant

For a query and a list of candidates, you can also use rank to get back sorted indices and scores:

    ranked = model.rank(
        query="Which planet is known as the Red Planet?",
        documents=[
            "Venus is often called Earth's twin because of its similar size and proximity.",
            "Mars,…
