Running Jina Embeddings V3 on Text Embeddings Inference (TEI)
See branch: TEI-support
Changes made to match the GTE-style architecture (a sketch of the conversion follows the list):
- Removed the "roberta" prefix from all tensor names.
- Renamed "mixer" to "attention" in encoder layers.
- Converted "Wqkv" to "qkv_proj" for combined query, key, value projections.
- Renamed "mlp.fc1" to "mlp.up_proj" and "mlp.fc2" to "mlp.down_proj".
- Created "mlp.up_gate_proj" by duplicating and expanding "mlp.up_proj".
- Renamed "norm1" to "attn_ln" and "norm2" to "mlp_ln" in encoder layers.
- Changed "emb_ln" to "embeddings.LayerNorm".
- Renamed "weight" to "gamma" and "bias" to "beta" for layer normalization layers.
- Removed LoRA-related tensors.
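The renaming above can be expressed as a small script along the following lines. This is a minimal sketch, not the actual script on the TEI-support branch: the file names, the exact key patterns, and the way "up_gate_proj" is built by stacking two copies of "up_proj" are all assumptions made for illustration.

```python
# Hypothetical conversion sketch; paths and key patterns are assumptions.
import torch
from safetensors.torch import load_file, save_file

state = load_file("model.safetensors")  # original Jina Embeddings V3 checkpoint (assumed path)
converted = {}

for name, tensor in state.items():
    # Drop LoRA adapter tensors entirely.
    if "lora" in name.lower():
        continue

    new_name = name
    new_name = new_name.replace("roberta.", "")            # strip the "roberta" prefix
    new_name = new_name.replace(".mixer.", ".attention.")  # mixer -> attention
    new_name = new_name.replace("Wqkv", "qkv_proj")        # fused query/key/value projection
    new_name = new_name.replace("mlp.fc1", "mlp.up_proj")
    new_name = new_name.replace("mlp.fc2", "mlp.down_proj")
    new_name = new_name.replace("norm1", "attn_ln")
    new_name = new_name.replace("norm2", "mlp_ln")
    new_name = new_name.replace("emb_ln", "embeddings.LayerNorm")

    # GTE-style layer norms are expected as gamma/beta instead of weight/bias.
    if "LayerNorm" in new_name or "_ln" in new_name:
        new_name = new_name.replace(".weight", ".gamma").replace(".bias", ".beta")

    converted[new_name] = tensor

# Approximate the gated MLP input: build up_gate_proj by duplicating up_proj
# along the output dimension (a rough stand-in, not a faithful GTE gate).
for name in [n for n in converted if ".mlp.up_proj." in n]:
    gate_name = name.replace("up_proj", "up_gate_proj")
    converted[gate_name] = torch.cat([converted[name], converted[name]], dim=0)

save_file(converted, "model-gte.safetensors")

# Print the final tensor names and shapes to aid debugging.
for key, value in converted.items():
    print(key, tuple(value.shape))
```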
Features:
- Structural Compatibility: The renamed model now closely matches the expected GTE architecture, allowing it to load without "tensor not found" errors.
- Preservation of Core Weights: Most of the original model's weights are preserved, maintaining some of the learned features.
- Adaptability: The script can handle various naming conventions and structures, making it somewhat flexible for future adjustments.
- Transparency: The script provides a clear view of the tensor names and shapes after conversion, aiding in debugging.
Limitations:
- Approximated Architecture: The conversion is an approximation of the GTE architecture, not an exact match. This may affect model performance.
- Loss of LoRA Adaptations: By removing LoRA-related tensors, we've lost the fine-tuning adaptations, potentially impacting the model's specialized capabilities.
- Up-Gate Projection Approximation: The "up_gate_proj" is created by duplicating weights, which may not accurately represent the intended GTE architecture.
- Potential Performance Impact: The structural changes, especially in the MLP layers, may affect the model's performance and output quality.
- Lack of Positional Embedding Handling: We haven't specifically addressed positional embeddings, which may differ between the XLM-RoBERTa and GTE architectures.
- Possible Missing Specialized Layers: There might be specialized layers or components in the GTE architecture that we haven't accounted for.
- No Guarantee of Functional Equivalence: While the model now loads, there's no guarantee it will function identically to a true GTE model.
- Config File Mismatch: We haven't addressed potential mismatches in the config.json file, which might cause issues during model initialization or inference; a quick way to check for them is sketched below.
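One way to spot config.json mismatches is to diff the key fields of the source config against a GTE-style reference. This is only an illustrative check: the reference repository (Alibaba-NLP/gte-multilingual-base) and the field list are assumptions, not part of the conversion itself.

```python
# Hypothetical config comparison; the reference GTE repo and the fields
# compared here are assumptions chosen for illustration.
from transformers import AutoConfig

src = AutoConfig.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)
ref = AutoConfig.from_pretrained("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)

for field in ("hidden_size", "num_hidden_layers", "num_attention_heads",
              "intermediate_size", "vocab_size", "max_position_embeddings",
              "layer_norm_eps"):
    # Any disagreement here is a candidate for the config mismatches noted above.
    print(f"{field}: source={getattr(src, field, None)}  gte={getattr(ref, field, None)}")
```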