7.1. Achievements
This section is still under construction.
We created initial versions of datasets containing pre-processed text that can be utilised in future tasks, together with the scripts needed to reproduce the results. Over the course of this work, we also created several language models that can be fine-tuned for downstream tasks such as Question Answering. The BERTimbau variants we developed were trained entirely on Portuguese data and adapted to the Portuguese legal domain. Both Legal-BERTimbau-base and Legal-BERTimbau-large perform better than BERTimbau on Portuguese court documents.
Our SBERTimbau variants are the first publicly available Portuguese SBERT models fine-tuned for Semantic Textual Similarity (STS). All of them are also adapted to the Portuguese legal domain, because they derive from the domain-adapted BERTimbau versions. Legal-SBERTimbau-sts-large-v2 and Legal-SBERTimbau-sts-large-ma outperform state-of-the-art multilingual language models on the assin and assin2 datasets, which were annotated by Portuguese and Brazilian researchers.