Researcher:
Kara, Atakan

Job Title: Undergraduate Student
First Name: Atakan
Last Name: Kara

Name Variants: Kara, Atakan

Publications
  • GECTurk: grammatical error correction and detection dataset for Turkish
    Kara, Atakan; Sofian, Farrin Marouf; Yong-Xern Bond, Andrew; Şahin, Gözde Gül. Association for Computational Linguistics, 2023.
    Affiliations: Department of Computer Engineering; College of Engineering; Graduate School of Sciences and Engineering
    Grammatical Error Detection and Correction (GEC) tools have proven useful for native speakers and second language learners. Developing such tools requires a large amount of parallel, annotated data, which is unavailable for most languages. Synthetic data generation is a common practice to overcome the scarcity of such data. However, it is not straightforward for morphologically rich languages like Turkish due to complex writing rules that require phonological, morphological, and syntactic information. In this work, we present a flexible and extensible synthetic data generation pipeline for Turkish covering more than 20 expert-curated grammar and spelling rules (a.k.a. writing rules) implemented through complex transformation functions. Using this pipeline, we derive 130,000 high-quality parallel sentences from professionally edited articles. Additionally, we create a more realistic test set by manually annotating a set of movie reviews. We implement three baselines that formulate the task as (i) neural machine translation, (ii) sequence tagging, and (iii) prefix tuning with a pretrained decoder-only model, achieving strong results. Furthermore, we perform exhaustive experiments on out-of-domain datasets to gain insights into the transferability and robustness of the proposed approaches. Our results suggest that our corpus, GECTurk, is high-quality and allows knowledge transfer to the out-of-domain setting. To encourage further research on Turkish GEC, we release our datasets, baseline models, and the synthetic data generation pipeline at https://github.com/GGLAB-KU/gecturk.
  • Beta poisoning attacks against machine learning models: extensions, limitations and defenses
    Kara, Atakan; Köprücü, Nursena; Gürsoy, Mehmet Emre. IEEE, 2022.
    Affiliations: Department of Computer Engineering; College of Engineering
    The rise of machine learning (ML) has made ML models lucrative targets for adversarial attacks. One of these attacks is Beta Poisoning, a recently proposed training-time attack based on heuristic poisoning of the training dataset. While Beta Poisoning was shown to be effective against linear ML models, it was originally developed with a fixed Gaussian Kernel Density Estimator (KDE) for likelihood estimation, and its effectiveness against more advanced, non-linear ML models has not been explored. In this paper, we advance the state of the art in Beta Poisoning attacks by making three novel contributions. First, we extend the attack so that it can be executed with arbitrary KDEs and norm functions. We integrate Gaussian, Laplacian, Epanechnikov and Logistic KDEs with three norm functions, and show that the choice of KDE can significantly impact attack effectiveness, especially when attacking linear models. Second, we empirically show that Beta Poisoning attacks are ineffective against non-linear ML models (such as neural networks and multi-layer perceptrons), even with our extensions. These results suggest that the attack's effectiveness decreases as model non-linearity and complexity increase. Third, we develop a discriminator-based defense against Beta Poisoning attacks. Our defense achieves 99% and 93% accuracy in identifying poisoning samples on the MNIST and CIFAR-10 datasets, respectively.
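The Beta Poisoning abstract describes running the attack's likelihood estimate with arbitrary KDE kernels and norm functions. The following Python sketch illustrates only that pluggable-KDE idea. It is not the authors' code and does not implement the attack or the defense. The kernel shapes are the standard Gaussian, Laplacian, Epanechnikov and Logistic profiles (left unnormalized). The three norms (L1, L2, L-infinity) are an assumption, since the abstract does not name them. The names `kde_likelihood`, `KERNELS` and `NORMS` and the `bandwidth` parameter are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

# Kernel profiles K(u) evaluated at a scaled, non-negative distance u.
# Normalizing constants are omitted: the values are only compared against
# each other, so a constant factor per kernel does not change the ranking.
KERNELS: Dict[str, Callable[[float], float]] = {
    "gaussian": lambda u: math.exp(-0.5 * u * u),
    "laplacian": lambda u: math.exp(-u),
    "epanechnikov": lambda u: max(0.0, 1.0 - u * u),
    # Logistic kernel 1 / (e^u + 2 + e^-u), rewritten as e^-u / (1 + e^-u)^2
    # so that large distances cannot overflow math.exp.
    "logistic": lambda u: math.exp(-u) / (1.0 + math.exp(-u)) ** 2,
}

# Assumed norm functions: the abstract says three were used but not which.
NORMS: Dict[str, Callable[[Sequence[float]], float]] = {
    "l1": lambda d: sum(abs(v) for v in d),
    "l2": lambda d: math.sqrt(sum(v * v for v in d)),
    "linf": lambda d: max(abs(v) for v in d),
}


def kde_likelihood(
    x: Sequence[float],
    samples: Sequence[Sequence[float]],
    kernel: str = "gaussian",
    norm: str = "l2",
    bandwidth: float = 1.0,
) -> float:
    """Hypothetical helper: average kernel response of point x over samples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    k = KERNELS[kernel]
    dist = NORMS[norm]
    total = 0.0
    for s in samples:
        diff = [a - b for a, b in zip(x, s)]
        total += k(dist(diff) / bandwidth)
    return total / len(samples)


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 0.0]]
    for name in KERNELS:
        near = kde_likelihood([0.2, 0.0], data, kernel=name, norm="l2")
        far = kde_likelihood([5.0, 5.0], data, kernel=name, norm="l2")
        print(f"{name:>12}: near={near:.4f} far={far:.4f}")
```

Swapping `kernel` or `norm` changes only a dictionary lookup, which mirrors the abstract's point that the KDE choice becomes a free parameter of the attack rather than being fixed to a Gaussian.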