---
language:
- en
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:557850
- loss:DenoisingAutoEncoderLoss
base_model: google-bert/bert-base-cased
widget:
- source_sentence: A man his
  sentences:
  - A construction worker peeking out of a manhole while his coworker sits on the
    sidewalk smiling.
  - A man is jumping unto his filthy bed.
  - A man is sitting in a chair and looking at something that he is holding.
- source_sentence: A and a woman walking with a a
  sentences:
  - A man and a woman is walking with a dog across a beach
  - A baby smiles while swinging in a blue infant swing.
  - A man uses a projector to give a presentation.
- source_sentence: blue
  sentences:
  - A baby wearing a bib makes a funny face at the camera.
  - The man is wearing a blue shirt.
  - There are three policemen on bikes making sure that the streets are cleared for
    the president.
- source_sentence: Two boys and
  sentences:
  - Two boys sitting and eating ice cream.
  - A man with a hat, boots, and brown pants, is playing the violin outside in front
    of a black structure.
  - A man is a safety suit walking outside while another man in a dark suit walks
    into a building.
- source_sentence: A finds humorous that.
  sentences:
  - A older gentleman finds it humorous that he is getting his picture taken while
    doing his laundry.
  - A dark-skinned man smoking a cigarette near a green trashcan.
  - A woman walks on a sidewalk wearing a white dress with a blue plaid pattern.
datasets:
- sentence-transformers/all-nli
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# SentenceTransformer based on google-bert/bert-base-cased
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [google-bert/bert-base-cased](https://huggingface.co/google-bert/bert-base-cased) on the [all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [google-bert/bert-base-cased](https://huggingface.co/google-bert/bert-base-cased)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
- [all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli)
- **Language:** en
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
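Loading the published checkpoint with `SentenceTransformer(...)` restores this stack automatically. For reference, an equivalent (untrained) stack can be assembled from the individual modules roughly as follows; this is a sketch of the architecture, not the training setup:
```python
from sentence_transformers import SentenceTransformer, models

# BERT encoder followed by mean pooling over the token embeddings.
word_embedding_model = models.Transformer("google-bert/bert-base-cased", max_seq_length=512)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),  # 768
    pooling_mode="mean",
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```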
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("jinoooooooooo/bert-base-cased-nli-tsdae")
# Run inference
sentences = [
    'A finds humorous that.',
    'A older gentleman finds it humorous that he is getting his picture taken while doing his laundry.',
    'A woman walks on a sidewalk wearing a white dress with a blue plaid pattern.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
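Beyond pairwise similarity, the same embeddings can be used for retrieval. A small example using the `util.semantic_search` helper; the query and corpus sentences here are purely illustrative:
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("jinoooooooooo/bert-base-cased-nli-tsdae")

corpus = [
    "A man is sitting in a chair and looking at something that he is holding.",
    "A baby smiles while swinging in a blue infant swing.",
    "Two boys sitting and eating ice cream.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("Two children enjoying ice cream", convert_to_tensor=True)

# For each query, returns the top-k corpus entries with their cosine similarity scores.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], hit["score"])
```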
## Training Details
### Training Dataset
#### all-nli
* Dataset: [all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
* Size: 557,850 training samples
* Columns: `damaged` and `original`
* Approximate statistics based on the first 1000 samples:
  |      | damaged | original |
  |:-----|:--------|:---------|
  | type | string  | string   |
* Samples:
  | damaged            | original                                                        |
  |:-------------------|:----------------------------------------------------------------|
  | `a horse jumps a`  | `A person on a horse jumps over a broken down airplane.`         |
  | `at`               | `Children smiling and waving at camera`                          |
  | `boy jumping a.`   | `A boy is jumping on skateboard in the middle of a red bridge.`  |
* Loss: [DenoisingAutoEncoderLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#denoisingautoencoderloss)
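The `damaged` column is produced by randomly deleting tokens from the `original` sentence, following the TSDAE recipe; the decoder in the loss then has to reconstruct the original sentence from the pooled embedding of the damaged one. A minimal sketch of how such pairs can be generated and how the loss is configured; the 60% deletion ratio and the tied decoder are assumptions based on the TSDAE defaults, not a record of the exact training script:
```python
import nltk
from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.datasets import DenoisingAutoEncoderDataset

nltk.download("punkt")  # required by the token-deletion noise function

# Create a "damaged" input by deleting ~60% of the tokens (TSDAE default).
original = "A person on a horse jumps over a broken down airplane."
damaged = DenoisingAutoEncoderDataset.delete(original, del_ratio=0.6)

# The loss attaches a decoder (here tied to the encoder weights) that must
# reconstruct the original sentence from the embedding of the damaged sentence.
model = SentenceTransformer("google-bert/bert-base-cased")
loss = losses.DenoisingAutoEncoderLoss(
    model,
    decoder_name_or_path="google-bert/bert-base-cased",
    tie_encoder_decoder=True,
)
```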
### Evaluation Dataset
#### all-nli
* Dataset: [all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
* Size: 6,584 evaluation samples
* Columns: `damaged` and `original`
* Approximate statistics based on the first 1000 samples:
  |      | damaged | original |
  |:-----|:--------|:---------|
  | type | string  | string   |
* Samples:
  | damaged | original |
  |:--------|:---------|
  | `Two while packages.` | `Two women are embracing while holding to go packages.` |
  | `young children, with the number one with 2 are standing wooden in a bathroom in sink.` | `Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.` |
  | `A a during world city of` | `A man selling donuts to a customer during a world exhibition event held in the city of Angeles` |
* Loss: [DenoisingAutoEncoderLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#denoisingautoencoderloss)
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
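These settings map onto the Sentence Transformers trainer API roughly as shown below. This is a sketch only: the noise generation mirrors the TSDAE recipe above, and the dataset subset, column construction, and `output_dir` are illustrative assumptions rather than the exact training script:
```python
import nltk
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.datasets import DenoisingAutoEncoderDataset

nltk.download("punkt")  # required by the token-deletion noise function

model = SentenceTransformer("google-bert/bert-base-cased")
loss = losses.DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)

# Build (damaged, original) columns from AllNLI sentences via token deletion.
nli = load_dataset("sentence-transformers/all-nli", "pair")

def add_noise(row):
    return {
        "damaged": DenoisingAutoEncoderDataset.delete(row["anchor"]),
        "original": row["anchor"],
    }

train_dataset = nli["train"].map(add_noise, remove_columns=nli["train"].column_names)
eval_dataset = nli["dev"].map(add_noise, remove_columns=nli["dev"].column_names)

args = SentenceTransformerTrainingArguments(
    output_dir="bert-base-cased-nli-tsdae",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # column order: damaged, original
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()
```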
#### All Hyperparameters