Current Projects

  • Semantic Search Engine for the Law Domain
  • The aim of the project is to develop a search engine specific to the law domain using natural language processing techniques, text mining techniques, big data solutions, and machine learning algorithms. We also aim to develop a faster, higher-performing matching algorithm specific to the field of law, integrate it into our search engine interface, and make it available to lawyers.

  • Q&A Systems in Law
  • Despite rapidly increasing crime and lawsuit rates in Turkey, very few people know their own rights and the laws that apply to them. New crimes, new cases, and new legislation emerge every day, and keeping up with all of them is very difficult not only for citizens but also for people working in the field of law. Our goal is to make laws and litigation outcomes more accessible and understandable to everyone. To that end, we want to build a question answering (Q&A) system that gives the most accurate answers to the legal questions users ask.

Previous Projects

  • Language Modeling in Turkish Legal Corpus: Improving Model Performance with Domain Classification by Using Recurrent Neural Networks
  • In this study, we developed a new method called Domain Classification, together with a natural language generator system that applies it, in order to improve model performance in natural language processing studies on corpora of Turkish legal texts. In short, the method states that a deep learning model trained in the field of law performs better when it is trained on a specialized, sub-field-based dataset classified according to legal disciplines. To test the method during development, the natural language generator was designed, drawing on interdisciplinary work, with a Recurrent Neural Network architecture that can work as a hybrid and can be trained and run even on low-resource devices. In addition, we examined the texts that the generator produced in different fields of Turkish law and discussed the areas in which the Domain Classification method and the natural language generator could benefit lawyers and the judicial system in general.

  • Biomedical Named Entity Recognition Using Transformers with biLSTM + CRF and Graph Convolutional Neural Networks
  • One application of Natural Language Processing (NLP) is processing free-text data to extract information. Information extraction takes various forms, such as Named Entity Recognition (NER), which detects named entities in free text. The biomedical named entity extraction task is about extracting named entities such as drugs, diseases, and organs from texts in the medical domain. In our study, we improve models commonly used in this domain, such as the biLSTM+CRF model, by using transformer-based language models like BERT and its domain-specific variant BioBERT in the embedding layer. We conduct experiments on several benchmark biomedical datasets using a variety of combinations of models and embeddings, such as BioBERT+biLSTM+CRF, BERT+biLSTM+CRF, Fasttext+biLSTM+CRF, and Graph Convolutional Networks. Our results show a clear improvement of 4% to 13% on several datasets when the baseline biLSTM+CRF model is initialized with pretrained language models such as BERT, and especially with a domain-specific one such as BioBERT.

  • Large Scale Supervised Learning for Unstructured Big Data Analytics
  • Deep Learning Algorithms for Supervised Word Embeddings
  • Social Media Bot Detection using Big Data Analytics
  • A Deep Learning-based Word Embedding Framework for Mining of Turkish Documents
  • A Preprocessing Framework for Twitter Bot Detection using NoSQL DBs and Python-based Data Engineering Tools
  • A Data Collection and Preprocessing Framework for Opinion Mining and Sentiment Analysis using NoSQL DBs and Python-based Data Engineering Tools
  • Concept-based, Aspect-aware Sentiment Analysis using NoSQL DBs and Python-based Data Engineering Tools or Machine Learning Libraries for Big Data (Apache Spark / MLlib / Mahout)
  • University Student Profiling and Comparison from Twitter using NoSQL DBs and Python-based Data Engineering Tools
  • Class Based Semantics for Supervised Word Sense Disambiguation (WSD)
  • Opinion Leader Detection on Social Networks
  • Semantic Supervised and Unsupervised Term Weighting Metrics
  • Semi-Supervised Semantic Text Classification Algorithms (Random Walk Algorithms, Manifold Regularization)
  • Concept-level Analysis of Natural Language (bag-of-concepts based language processing systems)
  • Information Extraction, specifically Named Entity Recognition (NER) algorithms for Highly Noisy and Short Turkish Texts (Tweets)
  • HR Analytics using NoSQL DBs and Python-based Data Engineering Tools or Machine Learning Libraries for Big Data (Apache Spark / MLlib / Mahout)
  • TÜBİTAK 3501 Career Award, 111E239, Development of Semantic Semi-Supervised Algorithms for Textual Data Mining (Successfully completed with several SCI Journal and International Conference publications)
    Murat Can Ganiz (Principal Investigator), Berna Altınel (Research Assistant, PhD student)
  • Using Domain Knowledge for Improving Text Mining Algorithms
    Mithat Poyraz, Burak Görener, Murat Diker
  • Semantic Smoothing Algorithms for Text Mining
    Dilara Torunoğlu (COME MS), Abdülkerim Canbay (COME), Hamdi Atacan Oğul (COME)
  • Smoothing Methods for Bayesian Algorithms
    Zeynep Hilal Kilimci (COME MS), Işıl Çoşkun (ISE-graduated with honors)
  • Semi-supervised Learning Algorithms for Text Mining
    İsmail Murat Engün (ISE-graduated with honors), Süleyman Kaan Yeloğlu (ISE), Abdülhadi Çelenlioğlu (ISE)
  • Concept Ranking Algorithms for Natural Language Processing
    Mithat Poyraz, Çağla Şahinli
  • Abnormal Event Detection on BGP Traffic
    Erkan Köşlük (ISE-graduated), Inigo Ortiz de Urbina (COME Erasmus)
  • Intelligent Focused Web Crawler Project
    Mithat Poyraz (COME MS), Duygu Taylan (COME-graduated with honors)
  • Preprocessing Methods for Turkish Text Classification
  • Spectral Algorithms for Document Clustering