Natural Language Inference
5.4. Natural Language Inference
The SBERT paper reports a slight improvement in STS performance when the models are also trained on NLI data. The assin and assin2 datasets used previously also annotate the relationship between the sentences of each pair. Similarly to SNLI, they contain a label feature indicating whether one sentence entails the other (0), whether the two have no apparent relationship (1), or whether they contradict each other (2).
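As a minimal sketch of the label scheme just described, the snippet below maps the integer labels to names and validates one sentence pair. The field names `premise`, `hypothesis`, and `label`, the `NLIExample` class, the `to_nli_example` helper, and the example sentences are all assumptions made for illustration; they are not taken from the datasets or from the training code used here.

```python
from dataclasses import dataclass

# Label ids as described in the text:
# 0 = entailment, 1 = no apparent relationship (neutral), 2 = contradiction.
NLI_LABELS = {0: "entailment", 1: "neutral", 2: "contradiction"}


@dataclass(frozen=True)
class NLIExample:
    """One labeled sentence pair (hypothetical container)."""
    premise: str
    hypothesis: str
    label: int


def to_nli_example(row: dict) -> NLIExample:
    """Convert a raw row into a validated example.

    The keys 'premise', 'hypothesis' and 'label' are assumed field names.
    """
    label = int(row["label"])
    if label not in NLI_LABELS:
        raise ValueError(f"unexpected NLI label: {label}")
    return NLIExample(row["premise"], row["hypothesis"], label)


# Made-up pair for illustration only; it is not drawn from assin or assin2.
row = {
    "premise": "Um homem toca violão.",
    "hypothesis": "Uma pessoa toca um instrumento.",
    "label": 0,
}
example = to_nli_example(row)
print(NLI_LABELS[example.label])
```

Rejecting unknown label ids early keeps a mislabeled or differently encoded split from silently reaching the classifier.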
We trained the large models on the assin and assin2 NLI labels with a batch size of 8 for five epochs and a learning rate of 1e-5. For the base model, we used the Adam optimization algorithm with a learning rate of 4e-5 and a batch size of 32 for ten epochs.
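The two training setups can be written down as plain configuration objects, which also makes it easy to see how many optimizer steps each one implies. This is only a sketch: `FineTuneConfig`, `total_steps`, and the dataset size `num_pairs` are hypothetical names and values, and the optimizer for the large models is left unset because the text names Adam only for the base model.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters for one NLI fine-tuning run (hypothetical container)."""
    batch_size: int
    epochs: int
    learning_rate: float
    optimizer: Optional[str] = None  # None means the text does not state it

    def total_steps(self, num_pairs: int) -> int:
        """Optimizer steps over all epochs, counting a final partial batch."""
        return math.ceil(num_pairs / self.batch_size) * self.epochs


# Values taken from the text above.
LARGE = FineTuneConfig(batch_size=8, epochs=5, learning_rate=1e-5)
BASE = FineTuneConfig(batch_size=32, epochs=10, learning_rate=4e-5, optimizer="Adam")

# Assumed training-set size, chosen only to make the arithmetic concrete.
num_pairs = 10_000
print(LARGE.total_steps(num_pairs))  # 1250 steps per epoch * 5 epochs = 6250
print(BASE.total_steps(num_pairs))   # 313 steps per epoch * 10 epochs = 3130
```

With the same data, the large setup takes roughly twice as many (smaller) steps as the base setup, despite running for half as many epochs.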
This type of fine-tuning generated two different SBERT variants:
?
?