📱 Android Demo |
📣 GGUF |
✒️ Sherry Paper (ACL 2026) |
📖 Documentation |
🤗 AngelSlim |
💬 WeChat
Hy-MT1.5-1.8B translation quality scores. Source: HY-MT1.5 Technical Report
For more details, see [AngelSlim] and [HY-MT].
World-Class Translation Quality
Hy-MT1.5-1.8B-1.25bit is built on the Hy-MT1.5-1.8B foundation model, a specialized translation model developed by the Tencent Hunyuan Team through a multi-stage training pipeline that integrates MT-oriented pre-training, supervised fine-tuning, on-policy distillation, and reinforcement learning. The base model natively supports 33 languages, 5 dialects/minority languages, and 1,056 translation directions. With only 1.8B parameters, it comprehensively outperforms much larger open-source models (e.g., Tower-Plus-72B, Qwen3-32B) and mainstream commercial translation APIs (e.g., Microsoft Translator, Doubao Translator). For full details, see HY-MT1.5-1.8B and the HY-MT1.5 Technical Report.
Sherry: Extreme 1.25-bit Quantization
This model employs Sherry (accepted at ACL 2026), a hardware-efficient ternary quantization framework. Sherry introduces a 3:4 fine-grained sparsity strategy: in every group of 4 weights, the 3 most important are stored as 1-bit values ({-1, +1}) and the remaining 1 is zeroed out. A group therefore has 4 × 2³ = 32 possible states (4 positions for the zero, 8 sign patterns), which fits exactly in 5 bits. This gives an effective width of 1.25 bits per weight with power-of-two alignment and compresses the original 3.3GB FP16 model to just 440MB with minimal accuracy loss.
Sherry fine-grained sparsity: for every 4 weights, the 3 most important are stored in 1-bit, and the remaining 1 is zeroed out.
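The 3:4 scheme can be illustrated with a small packing sketch. This is not the Sherry implementation or the STQ1_0 storage layout: the bit order (2 bits for the zero position, then 3 sign bits) is a hypothetical choice, weight magnitude stands in for "importance" as an assumption, and any scaling factors a real format would need are left out.

```python
from typing import List, Sequence

GROUP = 4  # weights per group: 3 kept as signs, 1 zeroed


def pack_group(weights: Sequence[float]) -> int:
    """Encode 4 weights as one 5-bit code (hypothetical layout)."""
    if len(weights) != GROUP:
        raise ValueError("expected exactly 4 weights")
    # Assumption: the smallest-magnitude weight is the least important one.
    zero_pos = min(range(GROUP), key=lambda i: abs(weights[i]))
    code = zero_pos  # bits 0-1 hold the position of the zeroed weight
    bit = 2
    for i, w in enumerate(weights):
        if i == zero_pos:
            continue
        if w >= 0:
            code |= 1 << bit  # bit set means +1, bit clear means -1
        bit += 1
    return code


def unpack_group(code: int) -> List[int]:
    """Decode a 5-bit code back into 4 ternary weights."""
    zero_pos = code & 0b11
    out = []
    bit = 2
    for i in range(GROUP):
        if i == zero_pos:
            out.append(0)
        else:
            out.append(1 if (code >> bit) & 1 else -1)
            bit += 1
    return out


example = [0.7, -0.05, -1.2, 0.3]
code = pack_group(example)
print(code, unpack_group(code))       # 21 [1, 0, -1, 1]
print(5 / GROUP, "bits per weight")   # 1.25 bits per weight
```

Every one of the 32 codes decodes to a distinct group, which is why 5 bits are exactly enough for 4 weights.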
Performance comparison of different model sizes on the Flores-200 Chinese-Foreign mutual translation benchmark:
FP16 (8x speed) vs. 1.25-bit speed comparison. Demo device: Snapdragon 888, 8GB RAM:
We provide a ready-to-use Android demo for offline translation. The demo features a background word extraction mode that works across any app on your phone — browse emails, webpages, or chat messages and get instant translations without switching apps. No network required, no data collection, one-time download for permanent use.
Download Demo:
https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit-GGUF/resolve/main/Hy-MT-demo.apk
Demo device: Snapdragon 865, 8GB RAM.
Demo device: Snapdragon 7+ Gen 2, 16GB RAM.
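# Clone llama.cpp and switch to a local branch for pull request 22836
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
git fetch origin pull/22836/head:pr-22836-stq_0
git checkout pr-22836-stq_0
# Install the Python dependencies used by the conversion script
pip install -r requirements.txt
# Configure and build the command-line tools
cmake -B build
cmake --build build --config Release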
# Download the model weights from Hugging Face
pip install huggingface_hub
huggingface-cli download AngelSlim/Hy-MT1.5-1.8B-1.25bit \
  --local-dir model_zoo/Hy-MT1.5-1.8B-1.25bit
# Convert the Hugging Face checkpoint to a BF16 GGUF file
python convert_hf_to_gguf.py model_zoo/Hy-MT1.5-1.8B-1.25bit \
  --outfile model_zoo/Hy-MT1.5-1.8B-bf16.gguf \
  --outtype bf16
# Quantize the BF16 GGUF file to the STQ1_0 type
./build/bin/llama-quantize \
  model_zoo/Hy-MT1.5-1.8B-bf16.gguf \
  model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf \
  STQ1_0
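After quantization it can be useful to confirm that the output is a readable GGUF file before running inference. The following is a minimal sketch, not part of llama.cpp: `read_gguf_header` is a hypothetical helper, and it assumes a little-endian GGUF file of version 2 or later, where the header is the 4-byte magic `GGUF`, a 32-bit version, and two 64-bit counts.

```python
import struct
from pathlib import Path


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumes little-endian GGUF v2 or later (64-bit counts).
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # uint32 version, uint64 tensor count, uint64 metadata key/value count
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count


if __name__ == "__main__":
    # Path taken from the quantize step; nothing happens if it is absent.
    model = Path("model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf")
    if model.exists():
        version, tensors, kvs = read_gguf_header(model)
        size_mb = model.stat().st_size / 1e6
        print(f"GGUF v{version}: {tensors} tensors, {kvs} metadata entries, {size_mb:.0f} MB")
```

The reported file size gives a quick check against the 440MB figure quoted earlier.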
The prompt format is documented at HY-MT1.5-1.8B.
# Run a single-turn translation on the CPU (-ngl 0), generating at most 64 tokens
./build/bin/llama-completion \
  --model model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf \
  -p "Translate the following segment into Chinese, without additional explanation. Hello " \
  --jinja \
  -ngl 0 \
  -n 64 -st
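To translate other text or target other languages, the command above can be assembled programmatically. This is a sketch under assumptions: `build_command` and `PROMPT_TEMPLATE` are hypothetical names, the template simply mirrors the single-line prompt shown in the command (the authoritative format is the one documented at HY-MT1.5-1.8B), and only flags that already appear above are used.

```python
import shlex
from typing import List

# Assumption: mirrors the prompt in the command above, with the target
# language and source text made configurable.
PROMPT_TEMPLATE = (
    "Translate the following segment into {target}, "
    "without additional explanation. {text}"
)


def build_command(
    text: str,
    target: str = "Chinese",
    model: str = "model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf",
    binary: str = "./build/bin/llama-completion",
    max_tokens: int = 64,
) -> List[str]:
    """Build the argument list for one single-turn, CPU-only translation."""
    prompt = PROMPT_TEMPLATE.format(target=target, text=text)
    return [
        binary,
        "--model", model,
        "-p", prompt,
        "--jinja",
        "-ngl", "0",            # keep all layers on the CPU
        "-n", str(max_tokens),  # cap on generated tokens
        "-st",                  # single turn
    ]


cmd = build_command("Hello")
print(shlex.join(cmd))  # shell-quoted form, handy for copy and paste
```

Passing the list to `subprocess.run(cmd, check=True)` from the llama.cpp directory would execute it; using an argument list rather than a shell string avoids quoting problems when the source text contains quotes.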
# Benchmark the quantized model on the CPU
./build/bin/llama-bench -m model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf -ngl 0
The code for this project is open-sourced under the License for AngelSlim.
@misc{huang2026sherry,
title={Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification},
author={Hong Huang and Decheng Wu and Qiangqiang Hu and Guanghua Yu and Jinhai Yang and Jianchen Zhu and Xue Liu and Dapeng Wu},
year={2026},
eprint={2601.07892},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2601.07892},
}
@article{angelslim2026,
title={AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression},
author={Hunyuan AI Infra Team},
journal={arXiv preprint arXiv:2602.21233},
year={2026}
}
@misc{zheng2025hymt,
title={HY-MT1.5 Technical Report},
author={Mao Zheng and Zheng Li and Tao Chen and Mingyang Song and Di Wang},
year={2025},
eprint={2512.24092},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.24092},
}