- tfi
Kim, SW., Gil, JM. Research paper classification systems based on TF-IDF and LDA schemes. Hum. Cent. Comput. Inf. Sci. 9, 30 (2019).
Bowman, S.R., Angeli, G., Potts, C., Manning, C.D.: A large annotated corpus for learning natural language inference. In: Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (2015).
- Mul
Williams, A., Nangia, N., Bowman, S.R.: A broad-coverage challenge corpus for sentence understanding through inference. In: NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference (2018).
- Agia
Eneko Agirre, Mona Diab, Daniel Cer, and Aitor Gonzalez-Agirre. 2012. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. In Proceedings of the First Joint Conference on Lexical and Computational Semantics, SemEval '12, pages 385–393, Stroudsburg, PA, USA. Association for Computational Linguistics.
- Agib
Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, and Weiwei Guo. 2013. *SEM 2013 shared task: Semantic Textual Similarity. In Second Joint Conference on Lexical and Computational Semantics (*SEM), pages 32–43, Atlanta, Georgia, USA. Association for Computational Linguistics.
- Agic
Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2014. SemEval-2014 Task 10: Multilingual Semantic Textual Similarity. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 81–91, Dublin, Ireland. Association for Computational Linguistics.
- Agid
Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Inigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, and Janyce Wiebe. 2015. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 252–263, Denver, Colorado. Association for Computational Linguistics.
- Agie
Eneko Agirre, Carmen Banea, Daniel M. Cer, Mona T. Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2016. SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation. In Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2016, San Diego, CA, USA, June 16-17, 2016, pages 497–511.
- Cer
Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, and Lucia Specia. 2017. SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 1–14, Vancouver, Canada.
- Mar
Marco Marelli, Stefano Menini, Marco Baroni, Luisa Bentivogli, Raffaella Bernardi, and Roberto Zamparelli. 2014. A SICK cure for the evaluation of compositional distributional semantic models. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), pages 216–223, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Nun
Cordeiro, N., “NLP Applied To Portuguese Consumer Law,” Master’s thesis, Instituto Superior Técnico, 2022 (to be published).
- AIL+15
Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya Razenshteyn, and Ludwig Schmidt. Practical and optimal LSH for angular distance. 2015.
- AINguytitleenR
Alexandr Andoni, Piotr Indyk, Huy L. Nguyễn, and Ilya Razenshteyn. Beyond locality-sensitive hashing. doi:10.1137/1.9781611973402.76.
- AR15
Alexandr Andoni and Ilya Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. 2015. doi:10.1145/2746539.2746553.
Andrew Blair-Stanek, Nils Holzenberger, and Benjamin Van Durme. Tax Law NLP Resources. 2020. doi:10.7281/T1/N1X6I4.
- BWG+20
Łukasz Borchmann, Dawid Wisniewski, Andrzej Gretkowski, Izabela Kosmala, Dawid Jurkiewicz, Łukasz Szałkiewicz, Gabriela Pałka, Karol Kaczmarek, Agnieszka Kaliska, and Filip Graliński. Contract discovery: dataset and a few-shot semantic retrieval challenge with competitive baselines. November 2020.
- CGG+21
Fredrik Carlsson, Amaru Cuba Gyllensten, Evangelia Gogoulou, Erik Ylipää Hellqvist, and Magnus Sahlgren. Semantic re-tuning with contrastive tension. In International Conference on Learning Representations. 2021.
- DC19
Zhuyun Dai and Jamie Callan. Deeper text understanding for IR with contextual neural language modeling. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'19, 985–988. New York, NY, USA, 2019. Association for Computing Machinery. doi:10.1145/3331184.3331303.
- DCLT19
J. Devlin, M. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2019.
- ER11
Güneş Erkan and Dragomir R. Radev. LexRank: graph-based lexical centrality as salience in text summarization. CoRR, 2011. arXiv:1109.2128.
- FSCA16
E. Fonseca, L. Santos, Marcelo Criscuolo, and S. Aluísio. ASSIN: Avaliação de Similaridade Semântica e Inferência Textual. In Computational Processing of the Portuguese Language, 12th International Conference, Tomar, Portugal, 13–15. 2016.
- GYC21
Tianyu Gao, Xingcheng Yao, and Danqi Chen. SimCSE: simple contrastive learning of sentence embeddings. 2021. doi:10.48550/ARXIV.2104.08821.
- GMS+20
Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. Don't stop pretraining: adapt language models to domains and tasks. In Proceedings of ACL. 2020.
- HBCB21
Dan Hendrycks, Collin Burns, Anya Chen, and Spencer Ball. CUAD: an expert-annotated NLP dataset for legal contract review. 2021.
- HLT+21
Zihan Huang, Charles Low, Mengqiu Teng, Hongyi Zhang, Daniel E. Ho, Mark S. Krass, and Matthias Grabmair. Context-aware legal citation recommendation using deep learning. 2021. doi:10.1145/3462757.3466066.
- KB15
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. 2015.
- K+05
Philipp Koehn and others. Europarl: a parallel corpus for statistical machine translation. 2005.
- LZH+20
Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, and Lei Li. On the sentence embeddings from pre-trained language models. CoRR, 2020. arXiv:2011.05864.
- LPC+18
Marco Lippi, Przemyslaw Palka, Giuseppe Contissa, Francesca Lagioia, Hans-Wolfgang Micklitz, Giovanni Sartor, and Paolo Torroni. CLAUDETTE: an automated detector of potentially unfair clauses in online terms of service. 2018. arXiv:1805.01217.
- May21
Philip May. Machine translated multilingual STS benchmark dataset. 2021.
- MCCD13
Tomáš Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In Yoshua Bengio and Yann LeCun, editors, 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings. 2013.
- PSM14
Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: global vectors for word representation. 2014. doi:10.3115/v1/D14-1162.
- RFO20
Livy Real, Erick Fonseca, and Hugo Goncalo Oliveira. The ASSIN 2 shared task: a quick overview. In International Conference on Computational Processing of the Portuguese Language, 406–412. Springer, 2020.
- RG19
Nils Reimers and Iryna Gurevych. Sentence-BERT: sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2019.
- RG20
Nils Reimers and Iryna Gurevych. Making monolingual sentence embeddings multilingual using knowledge distillation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2020.
- RZ09
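Stephen Robertson and Hugo Zaragoza. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval, 3(4):333–389, 2009.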
- Ron16
X. Rong. word2vec Parameter Learning Explained. 2016.
- RLLT21
Federico Ruggeri, Francesca Lagioia, Marco Lippi, and Paolo Torroni. Detecting and explaining unfairness in consumer contracts through memory networks. 2021.
- SNL19
Fábio Souza, Rodrigo Nogueira, and Roberto Lotufo. Portuguese named entity recognition using BERT-CRF. arXiv preprint arXiv:1909.10649, 2019.
- SNL20
Fábio Souza, Rodrigo Nogueira, and Roberto Lotufo. BERTimbau: pretrained BERT models for Brazilian Portuguese. In Ricardo Cerri and Ronaldo C. Prati, editors, Intelligent Systems, 403–417. Cham, 2020. Springer International Publishing.
- TRRuckle+21
Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. BEIR: A heterogenous benchmark for zero-shot evaluation of information retrieval models. CoRR, 2021. arXiv:2104.08663.
- VSP+17
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
- VGO+20
Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E. A. Quintero, Charles R. Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17:261–272, 2020. doi:10.1038/s41592-019-0686-2.
- WWIV18
Jorge Wagner, Rodrigo Wilkens, Marco Idiart, and Aline Villavicencio. The brWaC corpus: a new open resource for Brazilian Portuguese. 2018.
- WRG21
Kexin Wang, Nils Reimers, and Iryna Gurevych. TSDAE: using transformer-based sequential denoising auto-encoder for unsupervised sentence embedding learning. 2021. doi:10.48550/ARXIV.2104.06979.
- WTRG21
Kexin Wang, Nandan Thakur, Nils Reimers, and Iryna Gurevych. GPL: generative pseudo labeling for unsupervised domain adaptation of dense retrieval. CoRR, 2021. arXiv:2112.07577.
- InternationalJoITaEEIJITEEVD19
V. Vaissnave and P. Deepalakshmi. An Artificial Intelligence based Analysis in Legal domain. International Journal of Innovative Technology and Exploring Engineering (IJITEE), 2019.
- MinisterioPPortugal
Ministério Público Portugal. Supremo Tribunal de Justiça.