Publications with Fulltext

Permanent URI for this collection: https://hdl.handle.net/20.500.14288/6

Search Results

Now showing 1 - 8 of 8
  • Publication (Open Access)
    Discovering Black Lives Matter events in the United States: Shared Task 3, CASE 2021
    (Association for Computational Linguistics (ACL), 2021) Giorgi, Salvatore; Zavarella, Vanni; Tanev, Hristo; Stefanovitch, Nicolas; Hwang, Sy; Hettiarachchi, Hansi; Ranasinghe, Tharindu; Kalyan, Vivek; Tan, Paul; Tan, Shaun; Andrews, Martin; Hu, Tiancheng; Stoehr, Niklas; Re, Francesco Ignazio; Vegh, Daniel; Atzenhofer, Dennis; Curtis, Brenda; Department of Sociology; Hürriyetoğlu, Ali; Teaching Faculty; Department of Sociology; College of Social Sciences and Humanities
    Evaluating state-of-the-art event detection systems on how well they determine the spatio-temporal distribution of events on the ground is performed infrequently. Yet the ability to both (1) extract events "in the wild" from text and (2) properly evaluate event detection systems has the potential to support a wide variety of tasks, such as monitoring the activity of socio-political movements, examining media coverage and public support of these movements, and informing policy decisions. Therefore, we study the performance of the best event detection systems on detecting Black Lives Matter (BLM) events from tweets and news articles. The murder of George Floyd, an unarmed Black man, at the hands of police officers received global attention throughout the second half of 2020. Protests against police violence emerged worldwide, and the BLM movement, which was once mostly relegated to the United States, was now seeing activity globally. This shared task asks participants to identify BLM-related events from large unstructured data sources, using systems pretrained to extract socio-political events from text. We evaluate several metrics, assessing each system's ability to capture the evolution of protest events both temporally and spatially. Results show that identifying daily protest counts is an easier task than classifying spatial and temporal protest trends simultaneously, with maximum performance of 0.745 (Spearman) and 0.210 (Pearson r), respectively. Additionally, all baselines and participant systems suffered from low recall (max. 5.08), confirming the high impact of media sourcing in the modelling of protest movements.
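The abstract above reports system quality as Spearman and Pearson correlations. As a rough illustration of what those two coefficients measure, here is a minimal sketch that compares a system's daily protest counts with reference counts. The function names and the sample counts are hypothetical, and the shared task's actual evaluation protocol (aggregation level, spatial grid, handling of missing days) is not described in this record, so none of it should be read as the organizers' scoring code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical daily protest counts (illustrative only, not task data).
gold_counts = [3, 10, 25, 40, 18, 7, 2]
system_counts = [1, 4, 9, 30, 12, 2, 0]

print("Spearman:", spearman(gold_counts, system_counts))
print("Pearson r:", pearson(gold_counts, system_counts))
```

Spearman only rewards getting the ordering of days right, while Pearson is also sensitive to how the magnitudes line up, so the two coefficients are not directly comparable with each other.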
  • Publication (Open Access)
    3D microprinting of iron platinum nanoparticle-based magnetic mobile microrobots
    (Wiley, 2021) Giltinan, Joshua; Sridhar, Varun; Bozüyük, Uğur; Sheehan, Devin; Department of Mechanical Engineering; Sitti, Metin; Faculty Member; Department of Mechanical Engineering; School of Medicine; College of Engineering; 297104
    Wireless magnetic microrobots are envisioned to revolutionize minimally invasive medicine. While many promising medical magnetic microrobots have been proposed, the ones using hard magnetic materials are mostly not biocompatible, and the ones using biocompatible soft magnetic nanoparticles are magnetically very weak and, therefore, difficult to actuate. Thus, biocompatible hard magnetic micro/nanomaterials are essential for easy-to-actuate and clinically viable 3D medical microrobots. To fill this crucial gap, this study proposes ferromagnetic and biocompatible iron platinum (FePt) nanoparticle-based 3D microprinting of microrobots using the two-photon polymerization technique. A modified one-pot synthesis method is presented for producing FePt nanoparticles in large volumes and 3D printing of helical microswimmers made from biocompatible trimethylolpropane ethoxylate triacrylate (PETA) polymer with embedded FePt nanoparticles. The 30 μm long helical magnetic microswimmers are able to swim at speeds of over five body lengths per second at 200 Hz, making them the fastest helical swimmers in the tens-of-micrometers length scale at the corresponding low-magnitude actuation fields of 5-10 mT. It is also verified experimentally in vitro that the synthesized FePt nanoparticles are biocompatible. Thus, such 3D-printed microrobots are biocompatible and easy to actuate, a step toward clinically viable future medical microrobots.
  • Publication (Open Access)
    PROTEST-ER: retraining BERT for protest event extraction
    (Association for Computational Linguistics (ACL), 2021) Caselli, Tommaso; Basile, Angelo; Department of Sociology; Department of Computer Engineering; Hürriyetoğlu, Ali; Mutlu, Osman; Teaching Faculty; Researcher; Department of Sociology; Department of Computer Engineering; College of Social Sciences and Humanities; College of Engineering
    We analyze the effect of further pre-training BERT with different domain-specific data as an unsupervised domain adaptation strategy for event extraction. Portability of event extraction models is particularly challenging, with large performance drops affecting data on the same text genres (e.g., news). We present PROTEST-ER, a retrained BERT model for protest event extraction. PROTEST-ER outperforms a corresponding generic BERT on out-of-domain data by 8.1 points. Our best performing models reach 51.91-46.39 F1 across both domains.
  • Publication (Open Access)
    Federated dropout learning for hybrid beamforming with spatial path index modulation in multi-user MMWave-MIMO systems
    (Institute of Electrical and Electronics Engineers (IEEE), 2021) Mishra, Kumar Vijay; Department of Electrical and Electronics Engineering; Ergen, Sinem Çöleri; Elbir, Ahmet Musab; Faculty Member; Department of Electrical and Electronics Engineering; College of Engineering; 7211; N/A
    Millimeter wave multiple-input multiple-output (mmWave-MIMO) systems with a small number of radio-frequency (RF) chains have limited multiplexing gain. Spatial path index modulation (SPIM) helps improve this gain by utilizing additional signal bits modulated by the indices of spatial paths. In this paper, we introduce model-based and model-free frameworks for beamformer design in multi-user SPIM-MIMO systems. We first design the beamformers via a model-based manifold optimization algorithm. Then, we leverage federated learning (FL) with dropout learning (DL) to train a learning model on the local datasets of users, who estimate the beamformers by feeding the model with their channel data. The DL randomly selects a different set of model parameters during training, thereby further reducing the transmission overhead compared to conventional FL. Numerical experiments show that the proposed framework exhibits higher spectral efficiency than the state-of-the-art SPIM-MIMO methods and mmWave-MIMO, which relies on the strongest propagation path. Furthermore, the proposed FL approach provides at least 10 times lower transmission overhead than the centralized learning techniques.
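The abstract above says dropout learning picks a random subset of model parameters during training so that less has to be transmitted than in conventional federated learning. The following toy sketch shows that general idea on a plain linear least-squares model, not on a beamforming network. Everything here (`local_update`, `federated_dropout_round`, the masking rule, the per-parameter averaging, the learning rate) is a hypothetical illustration chosen for brevity; the paper's actual model, mask selection, and aggregation scheme are not given in this record.

```python
import random


def local_update(weights, mask, data, lr=0.1):
    """One least-squares gradient step on the kept weights only.

    Dropped weights are treated as absent from the client's sub-model.
    Returns a dict {index: updated_weight} holding only the kept entries,
    which is all the client would need to send back.
    """
    grad = [0.0] * len(weights)
    for x, y in data:
        pred = sum(w * xi for w, xi, keep in zip(weights, x, mask) if keep)
        err = pred - y
        for i, xi in enumerate(x):
            if mask[i]:
                grad[i] += 2 * err * xi / len(data)
    return {i: weights[i] - lr * grad[i] for i in range(len(weights)) if mask[i]}


def federated_dropout_round(weights, client_data, keep_prob, rng):
    """One communication round with an independent random mask per client.

    Returns the new global weights and the number of parameters uploaded.
    """
    sums = [0.0] * len(weights)
    counts = [0] * len(weights)
    sent = 0
    for data in client_data:
        mask = [rng.random() < keep_prob for _ in weights]
        update = local_update(weights, mask, data)
        sent += len(update)
        for i, w in update.items():
            sums[i] += w
            counts[i] += 1
    # Average each parameter over the clients that actually trained it;
    # parameters nobody trained this round keep their previous value.
    new_weights = [
        sums[i] / counts[i] if counts[i] else weights[i]
        for i in range(len(weights))
    ]
    return new_weights, sent


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical clients with (features, target) pairs.
    clients = [
        [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0)],
        [((1.0, 1.0), 1.0)],
    ]
    weights = [0.0, 0.0]
    for _ in range(5):
        weights, sent = federated_dropout_round(weights, clients, 0.5, rng)
        print(weights, "parameters uploaded:", sent)
```

With `keep_prob` below 1, each client uploads on average only that fraction of the parameters per round, which is the source of the overhead reduction in this sketch. The sketch counts uplink traffic only.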
  • Publication (Open Access)
    Multilingual protest news detection - shared task 1, CASE 2021
    (Association for Computational Linguistics (ACL), 2021) Liza, Farhana Ferdousi; Kumar, Ritesh; Ratan, Shyam; Department of Sociology; Department of Computer Engineering; Hürriyetoğlu, Ali; Yörük, Erdem; Mutlu, Osman; Teaching Faculty; Faculty Member; Researcher; Department of Sociology; Department of Computer Engineering; College of Social Sciences and Humanities; Graduate School of Sciences and Engineering; N/A; 28982; N/A
    Benchmarking state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection is achieved in the scope of the shared task Socio-political and Crisis Events Detection at the workshop CASE @ ACL-IJCNLP 2021. Socio-political event data is utilized for national and international policy- and decision-making. Therefore, the reliability and validity of such datasets are of utmost importance. We split the shared task into three parts to address the three aspects of data collection (Task 1), fine-grained semantic classification (Task 2), and evaluation (Task 3). Task 1, which is the focus of this report, is on multilingual protest news detection and comprises four subtasks: document classification (subtask 1), sentence classification (subtask 2), event sentence coreference identification (subtask 3), and event extraction (subtask 4). All subtasks have English, Portuguese, and Spanish for both training and evaluation data. Data in Hindi is available only for the evaluation of subtask 1. The majority of the submissions, 238 in total, were created using multi- and cross-lingual approaches. Best scores are between 77.27 and 84.55 F1-macro for subtask 1, between 85.32 and 88.61 F1-macro for subtask 2, between 84.23 and 93.03 CoNLL 2012 average score for subtask 3, and between 66.20 and 78.11 F1-macro for subtask 4 in all evaluation settings. The performance of the best system for subtask 4 is above 66.20 F1 for all available languages. Although there is still significant room for improvement in cross-lingual and zero-shot settings, the best submissions for each evaluation scenario yield remarkable results. Monolingual models outperformed the multilingual models in a few evaluation scenarios in which a relatively large amount of training data is available.
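Several of the scores above are macro-averaged F1. For readers unfamiliar with the metric, a minimal sketch of the standard definition follows: F1 is computed per class and the per-class values are averaged with equal weight, so a rare class counts as much as a frequent one. The function name and the toy labels are hypothetical, and taking the label set as the union of gold and predicted labels is an assumption; the shared task's own scorer may differ in such details.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy document-level labels: 1 = protest news, 0 = other (illustrative only).
gold = [1, 1, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 1]
print(macro_f1(gold, predicted))
```

The value returned lies in [0, 1]; the scores quoted in the abstract appear to be on a 0-100 scale, which corresponds to multiplying by 100.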
  • Publication (Open Access)
    Classification of imbalanced data with a geometric digraph family
    (Journal of Machine Learning Research (JMLR), 2016) Department of Mathematics; Manukyan, Artur; Ceyhan, Elvan; PhD Student; Undergraduate Student; Faculty Member; Department of Mathematics; Graduate School of Sciences and Engineering; College of Sciences
    We use a geometric digraph family called class cover catch digraphs (CCCDs) to tackle the class imbalance problem in statistical classification. CCCDs provide graph theoretic solutions to the class cover problem and have been employed in classification. We assess the classification performance of CCCD classifiers by extensive Monte Carlo simulations, comparing them with other classifiers commonly used in the literature. In particular, we show that CCCD classifiers perform relatively well when one class is more frequent than the other in a two-class setting, an example of the class imbalance problem. We also point out the relationship between class imbalance and class overlapping problems, and their influence on the performance of CCCD classifiers and other classification methods as well as some state-of-the-art algorithms which are robust to class imbalance by construction. Experiments on both simulated and real data sets indicate that CCCD classifiers are robust to the class imbalance problem. CCCDs substantially undersample from the majority class while preserving the information on the discarded points during the undersampling process. Many state-of-the-art methods, however, keep this information by means of ensemble classifiers, but CCCDs yield only a single classifier with the same property, making them both appealing and fast.
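To make the class cover idea above more concrete, here is a small sketch in the spirit of a "pure" class cover: each point of a class gets a ball whose radius is its distance to the nearest point of the other class, a greedy pass keeps only a few balls that together cover the class, and a new point is assigned to the class with the smallest radius-scaled distance to one of its balls. This is an assumption-laden illustration, not the paper's implementation: the greedy rule, the scaled-distance classifier, and all names (`greedy_cover`, `classify`) are hypothetical, and the CCCD variants actually evaluated in the paper may differ.

```python
from math import dist


def greedy_cover(target, other):
    """Greedy ball cover of `target` points that excludes all `other` points.

    Returns a list of (center, radius) pairs. Each radius is the distance
    from the center to its nearest point of the other class, so no ball
    contains an opposite-class point.
    """
    radii = [min(dist(x, y) for y in other) for x in target]
    covers = [
        {j for j, z in enumerate(target) if dist(x, z) < r}
        for x, r in zip(target, radii)
    ]
    uncovered = set(range(len(target)))
    balls = []
    while uncovered:
        best = max(range(len(target)), key=lambda i: len(covers[i] & uncovered))
        if not covers[best] & uncovered:
            break  # remaining points coincide with the other class
        balls.append((target[best], radii[best]))
        uncovered -= covers[best]
    return balls


def classify(point, balls_by_class):
    """Assign the label whose cover has the smallest radius-scaled distance."""
    def score(label):
        return min(dist(point, c) / r for c, r in balls_by_class[label])
    return min(balls_by_class, key=score)


# Imbalanced toy data: six majority points, two minority points.
majority = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0, 0.5)]
minority = [(5, 5), (5, 6)]
balls = {
    0: greedy_cover(majority, minority),
    1: greedy_cover(minority, majority),
}
print(balls)
print(classify((0.2, 0.2), balls), classify((5, 5.5), balls))
```

In this toy case the six majority points collapse to a single ball, which illustrates the undersampling described in the abstract: most majority points are discarded, while the retained radius still records how far the class extends before meeting the other class.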
  • Publication (Open Access)
    Challenges and applications of automated extraction of socio-political events from text (CASE 2021): workshop and shared task report
    (Association for Computational Linguistics (ACL), 2021) Tanev, Hristo; Zavarella, Vanni; Piskorski, Jakub; Yeniterzi, Reyyan; Villavicencio, Aline; Department of Sociology; Department of Computer Engineering; Hürriyetoğlu, Ali; Yörük, Erdem; Mutlu, Osman; Yüret, Deniz; Teaching Faculty; Faculty Member; Researcher; Faculty Member; Department of Sociology; Department of Computer Engineering; College of Social Sciences and Humanities; Graduate School of Sciences and Engineering; College of Engineering; N/A; 28982; N/A; 179996
    This workshop is the fourth in a series of workshops on automatic extraction of sociopolitical events from news, organized by the Emerging Market Welfare Project, with the support of the Joint Research Centre of the European Commission and with contributions from many other prominent scholars in this field. The purpose of this series of workshops is to foster research and development of reliable, valid, robust, and practical solutions for automatically detecting descriptions of sociopolitical events, such as protests, riots, wars and armed conflicts, in text streams. This year's workshop contributors make use of state-of-the-art NLP technologies, such as deep learning, word embeddings, and Transformers, and cover a wide range of topics from text classification to news bias detection. Around 40 teams registered and 15 teams contributed to three tasks: i) multilingual protest news detection, ii) fine-grained classification of socio-political events, and iii) discovering Black Lives Matter protest events. The workshop also highlights two keynote and four invited talks about various aspects of creating event data sets and multi- and cross-lingual machine learning in few- and zero-shot settings.
  • Publication (Open Access)
    MSVD-Turkish: a comprehensive multimodal video dataset for integrated vision and language research in Turkish
    (Springer, 2021) Çıtamak, Begüm; Çağlayan, Ozan; Kuyu, Menekşe; Erdem, Erkut; Madhyastha, Pranava; Specia, Lucia; Department of Computer Engineering; Erdem, Aykut; Faculty Member; Department of Computer Engineering; College of Engineering; 20331
    Automatic generation of video descriptions in natural language, also called video captioning, aims to understand the visual content of the video and produce a natural language sentence depicting the objects and actions in the scene. This challenging integrated vision and language problem, however, has been predominantly addressed for English. The lack of data and the linguistic properties of other languages limit the success of existing approaches for such languages. In this paper, we target Turkish, a morphologically rich and agglutinative language whose properties differ considerably from those of English. To do so, we create the first large-scale video captioning dataset for this language by carefully translating the English descriptions of the videos in the MSVD (Microsoft Research Video Description Corpus) dataset into Turkish. In addition to enabling research in video captioning in Turkish, the parallel English-Turkish descriptions also enable the study of the role of video context in (multimodal) machine translation. In our experiments, we build models for both video captioning and multimodal machine translation and investigate the effect of different word segmentation approaches and different neural architectures to better address the properties of Turkish. We hope that the MSVD-Turkish dataset and the results reported in this work will lead to better video captioning and multimodal machine translation models for Turkish and other morphologically rich and agglutinative languages.