Projects
Current Projects
The aim of this project is to develop a search engine specific to the legal domain using natural language processing, text mining, big data solutions, and machine learning algorithms. A second aim is to develop a matching algorithm specific to the field of law that runs faster and performs better, integrate it into our search engine interface, and make it available to lawyers. |
Despite rapidly increasing crime and lawsuit rates in Turkey, very few people know the law or their own rights. New crimes, cases, and legislation emerge every day, and keeping up with them is difficult not only for citizens but also for legal professionals. Our goal is to make laws and litigation outcomes more accessible and understandable to everyone. To that end, we want to build a question answering (Q&A) system that gives the most accurate answers possible to users' legal questions. |
Previous Projects
In this study, we developed a new method called Domain Classification, together with a natural language generator system that applies it, to improve model performance in natural language processing studies on corpora of Turkish legal texts. In short, the method holds that a deep learning model for the legal field performs better when it is trained on a dedicated sub-field dataset classified by legal discipline. To test the method during development, the natural language generator was designed with an architecture based on Recurrent Neural Networks that can work as a hybrid and, drawing on interdisciplinary study, can be trained and run even on low-end devices. In addition, the texts that the generator produced in different fields of Turkish law were examined, and we discuss the areas in which the Domain Classification method and the natural language generator could benefit lawyers and the judicial system in general. |
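The data preparation step implied by Domain Classification can be illustrated with a minimal sketch: a labeled corpus is partitioned by legal discipline so that a separate model could be trained on each subset. The function name `split_by_discipline`, the discipline labels, and the sample texts are all hypothetical and not taken from the study; the model training itself is not shown.

```python
from collections import defaultdict


def split_by_discipline(documents):
    """Group (discipline, text) pairs into one training set per discipline."""
    subsets = defaultdict(list)
    for discipline, text in documents:
        subsets[discipline].append(text)
    return dict(subsets)


# Hypothetical toy corpus; labels and texts are illustrative only.
corpus = [
    ("criminal", "The defendant was sentenced to two years."),
    ("commercial", "The contract between the companies is void."),
    ("criminal", "The court acquitted the accused."),
]

subsets = split_by_discipline(corpus)
for discipline, texts in sorted(subsets.items()):
    # Each subset would feed its own generator model;
    # here we only report how many texts each subset holds.
    print(discipline, len(texts))
```

Each resulting subset would then serve as the training set for one discipline-specific model, rather than training a single model on the mixed corpus.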
One application of Natural Language Processing (NLP) is processing free text to extract information. Information extraction takes various forms, such as Named Entity Recognition (NER), which detects named entities in free text. Biomedical named entity recognition extracts entities such as drugs, diseases, and organs from medical texts. In our study, we improve models commonly used in this domain, such as biLSTM+CRF, by using transformer-based language models like BERT and its domain-specific variant BioBERT in the embedding layer. We conduct experiments on several benchmark biomedical datasets using a variety of model and embedding combinations, such as BioBERT+biLSTM+CRF, BERT+biLSTM+CRF, Fasttext+biLSTM+CRF, and Graph Convolutional Networks. Our results show clear improvements of 4% to 13% on several datasets when the baseline biLSTM+CRF model is initialized with pretrained language models such as BERT, and especially with a domain-specific one like BioBERT. Project link |
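To make the NER output concrete, here is a minimal sketch of how per-token tags from a sequence tagger such as biLSTM+CRF can be turned into entity spans. It assumes the common BIO tagging scheme, which the text above does not specify; the function name `bio_to_spans`, the entity types, and the example sentence are hypothetical.

```python
def bio_to_spans(tokens, tags):
    """Convert per-token BIO tags into (entity text, entity type) pairs.

    An I- tag that does not continue an open entity of the same type
    is treated like O and dropped.
    """
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continue the open entity.
            current_tokens.append(token)
        else:
            # O tag or inconsistent I- tag: close any open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical tagger output for one sentence.
tokens = ["Aspirin", "reduces", "the", "risk", "of", "heart", "attack", "."]
tags = ["B-DRUG", "O", "O", "O", "O", "B-DISEASE", "I-DISEASE", "O"]

entities = bio_to_spans(tokens, tags)
print(entities)  # [('Aspirin', 'DRUG'), ('heart attack', 'DISEASE')]
```

In a full pipeline the `tags` list would come from the model's decoding step; only the span extraction is sketched here.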
Murat Can Ganiz (Principal Investigator), Berna Altınel (Research Assistant, PhD student) |
Mithat Poyraz, Burak Görener, Murat Diker |
Dilara Torunoğlu (COME MS), Abdülkerim Canbay (COME), Hamdi Atacan Oğul (COME) |
Zeynep Hilal Kilimci (COME MS), Işıl Çoşkun (ISE-graduated with honors) |
İsmail Murat Engün (ISE-graduated with honors), Süleyman Kaan Yeloğlu (ISE), Abdülhadi Çelenlioğlu (ISE) |
Mithat Poyraz, Çağla Şahinli |
Erkan Köşlük (ISE-graduated), Inigo Ortiz de Urbina (COME Erasmus) |
Mithat Poyraz (COME MS), Duygu Taylan (COME-graduated with honors) |