Research Outputs

Permanent URI for this community: https://hdl.handle.net/20.500.14288/2

Search Results

Now showing 1 - 8 of 8
  • Publication (Open Access)
    A gated fusion network for dynamic saliency prediction
    (Institute of Electrical and Electronics Engineers (IEEE), 2022) Kocak, Aysun; Erdem, Erkut; Department of Computer Engineering; Erdem, Aykut; Faculty Member; Department of Computer Engineering; College of Engineering; 20331
    Predicting saliency in videos is a challenging problem due to the complex modeling of interactions between spatial and temporal information, especially when the ever-changing, dynamic nature of videos is considered. Recently, researchers have proposed large-scale data sets and models that take advantage of deep learning as a way to understand what is important for video saliency. These approaches, however, learn to combine spatial and temporal features in a static manner and do not adapt much to changes in the video content. In this article, we introduce the gated fusion network for dynamic saliency (GFSalNet), the first deep saliency model capable of making predictions in a dynamic way via the gated fusion mechanism. Moreover, our model also exploits spatial and channelwise attention within a multiscale architecture that further allows for highly accurate predictions. We evaluate the proposed approach on a number of data sets, and our experimental analysis demonstrates that it outperforms or is highly competitive with the state of the art. Importantly, we show that it has good generalization ability and, moreover, exploits temporal information more effectively via its adaptive fusion scheme.
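    The abstract does not spell out the fusion mechanism itself; a minimal sketch of gating spatial and temporal feature maps, assuming a single convolutional gate (module name, layer sizes, and toy inputs are illustrative, not the authors' GFSalNet code), could look like this:
    ```python
    import torch
    import torch.nn as nn

    class GatedFusion(nn.Module):
        """Illustrative gated fusion: a gate predicted from both streams decides,
        per position and channel, how much the spatial vs. temporal features
        contribute to the fused representation."""
        def __init__(self, channels: int):
            super().__init__()
            self.gate = nn.Sequential(
                nn.Conv2d(2 * channels, channels, kernel_size=1),
                nn.Sigmoid(),  # gate values in [0, 1]
            )

        def forward(self, spatial_feat, temporal_feat):
            g = self.gate(torch.cat([spatial_feat, temporal_feat], dim=1))
            return g * spatial_feat + (1 - g) * temporal_feat

    # toy usage: fuse 64-channel spatial and temporal features of a 32x32 map
    fusion = GatedFusion(64)
    fused = fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
    print(fused.shape)  # torch.Size([1, 64, 32, 32])
    ```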
  • Publication
    Communicative cues for reach-to-grasp motions: From humans to robots
    (Assoc Computing Machinery, 2018) N/A; N/A; N/A; Department of Computer Engineering; Department of Computer Engineering; Kebüde, Doğancan; Eteke, Cem; Sezgin, Tevfik Metin; Akgün, Barış; Master Student; PhD Student; Faculty Member; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; N/A; N/A; 42946; 258784
    Intent communication is an important challenge in the context of human-robot interaction. The aim of this work is to identify subtle non-verbal cues that make communication among humans fluent and to use them to generate intent-expressive robot motion. A human-human reach-to-grasp experiment (n = 14) identified two temporal and two spatial cues: (1) relative time to reach maximum hand aperture (MA), (2) overall motion duration (OT), (3) exaggeration in motion (Exg), and (4) change in grasp modality (GM). Results showed a statistically significant difference in the temporal cues between the no-intention and intention conditions. In a follow-up experiment (n = 30), reach-to-grasp motions of a simulated robot containing different cue combinations were shown to participants, who were asked to guess the target object during the robot's motion, based on the assumption that intent-expressive motion would result in earlier and more accurate guesses. Results showed that OT, GM, and several cue combinations led to faster and more accurate guesses, which implies they can be used to generate communicative motion. However, MA had no effect, and surprisingly, Exg had a negative effect on expressiveness.
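    As a rough illustration of the kind of per-participant comparison the first experiment reports, a paired test on one temporal cue could be run as below; the numbers are synthetic stand-ins, not the study's measurements.
    ```python
    import numpy as np
    from scipy import stats

    # Hypothetical per-participant values of a temporal cue (e.g. overall motion
    # duration, OT) under the no-intention and intention conditions.
    rng = np.random.default_rng(0)
    no_intention = rng.normal(1.20, 0.15, size=14)  # seconds, 14 participants
    intention = rng.normal(1.45, 0.15, size=14)

    # Paired comparison across the same 14 participants
    t_stat, p_value = stats.ttest_rel(no_intention, intention)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    ```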
  • Publication
    Communicative cues for reach-to-grasp motions: from humans to robots: robotics track
    (International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), 2018) N/A; N/A; N/A; Department of Computer Engineering; Department of Computer Engineering; Kebüde, Doğancan; Eteke, Cem; Sezgin, Tevfik Metin; Akgün, Barış; Master Student; Master Student; Faculty Member; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; N/A; N/A; 18632; 258784
    Intent communication is an important challenge in the context of human-robot interaction. The aim of this work is to identify subtle non-verbal cues that make communication among humans fluent and to use them to generate intent-expressive robot motion. A human-human reach-to-grasp experiment (n = 14) identified two temporal and two spatial cues: (1) relative time to reach maximum hand aperture (MA), (2) overall motion duration (OT), (3) exaggeration in motion (Exg), and (4) change in grasp modality (GM). Results showed a statistically significant difference in the temporal cues between the no-intention and intention conditions. In a follow-up experiment (n = 30), reach-to-grasp motions of a simulated robot containing different cue combinations were shown to participants, who were asked to guess the target object during the robot's motion, based on the assumption that intent-expressive motion would result in earlier and more accurate guesses. Results showed that OT, GM, and several cue combinations led to faster and more accurate guesses, which implies they can be used to generate communicative motion. However, MA had no effect, and surprisingly, Exg had a negative effect on expressiveness.
  • Publication
    Learning markerless robot-depth camera calibration and end-effector pose estimation
    (Ml Research Press, 2023) Department of Computer Engineering; Sefercik, Buğra Can; Akgün, Barış; Department of Computer Engineering; Koç Üniversitesi İş Bankası Yapay Zeka Uygulama ve Araştırma Merkezi (KUIS AI)/ Koç University İş Bank Artificial Intelligence Center (KUIS AI); College of Engineering; Graduate School of Sciences and Engineering
    Traditional approaches to extrinsic calibration use fiducial markers, and learning-based approaches rely heavily on simulation data. In this work, we present a learning-based markerless extrinsic calibration system that uses a depth camera and does not rely on simulation data. We learn models for end-effector (EE) segmentation, single-frame rotation prediction, and keypoint detection from automatically generated real-world data. We use a transformation trick to get EE pose estimates from rotation predictions and a matching algorithm to get EE pose estimates from keypoint predictions. We further utilize the iterative closest point algorithm, multiple frames, filtering, and outlier detection to increase calibration robustness. Our evaluations with training data from multiple camera poses and test data from previously unseen poses give sub-centimeter and sub-deciradian average calibration and pose estimation errors. We also show that a carefully selected single training pose gives comparable results.
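    The ICP refinement step mentioned above can be sketched with Open3D; the point clouds, the initial transform, and the 2 cm correspondence threshold are placeholder assumptions, not the system's actual data or parameters.
    ```python
    import numpy as np
    import open3d as o3d

    # Stand-in end-effector model and a fake observed cloud offset by ~1 cm.
    ee_model = np.random.rand(500, 3) * 0.1
    observed = ee_model + np.array([0.01, 0.0, 0.005])

    source = o3d.geometry.PointCloud()
    source.points = o3d.utility.Vector3dVector(observed)
    target = o3d.geometry.PointCloud()
    target.points = o3d.utility.Vector3dVector(ee_model)

    init_transform = np.eye(4)  # e.g. from the learned rotation/keypoint predictors
    result = o3d.pipelines.registration.registration_icp(
        source, target,
        max_correspondence_distance=0.02,  # 2 cm search radius
        init=init_transform,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
    )
    print(result.transformation)  # refined rigid transform
    ```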
  • Publication
    Natural language communication with robots
    (Association for Computational Linguistics (ACL), 2016) Bisk, Yonatan; Marcu, Daniel; Department of Computer Engineering; Yüret, Deniz; Faculty Member; Department of Computer Engineering; College of Engineering; 179996
    We propose a framework for devising empirically testable algorithms for bridging the communication gap between humans and robots. We instantiate our framework in the context of a problem setting in which humans give instructions to robots using unrestricted natural language commands, with instruction sequences being subservient to building complex goal configurations in a blocks world. We show how one can collect meaningful training data and we propose three neural architectures for interpreting contextually grounded natural language commands. The proposed architectures allow us to correctly understand/ground the blocks that the robot should move when instructed by a human who uses unrestricted language. The architectures have more difficulty in correctly understanding/grounding the spatial relations required to place blocks correctly, especially when the blocks are not easily identifiable.
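    As a toy illustration of grounding an instruction to a block (not one of the paper's three architectures), a command encoding can simply be scored against per-block features:
    ```python
    import torch
    import torch.nn as nn

    class BlockGrounder(nn.Module):
        """Toy grounding model: embed the command tokens, embed each block's
        features, and return a distribution over which block to move."""
        def __init__(self, vocab_size=1000, dim=64, block_feat_dim=16):
            super().__init__()
            self.text_enc = nn.EmbeddingBag(vocab_size, dim)
            self.block_enc = nn.Linear(block_feat_dim, dim)

        def forward(self, token_ids, block_feats):
            cmd = self.text_enc(token_ids.unsqueeze(0))      # (1, dim)
            blocks = self.block_enc(block_feats)             # (num_blocks, dim)
            return (blocks @ cmd.squeeze(0)).softmax(dim=0)  # prob. over blocks

    grounder = BlockGrounder()
    tokens = torch.tensor([5, 42, 7, 99])       # hypothetical token ids of a command
    block_features = torch.randn(10, 16)        # 10 blocks in the scene
    print(grounder(tokens, block_features).argmax())  # index of the predicted block
    ```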
  • Publication
    Object and relation centric representations for push effect prediction
    (Elsevier, 2024) Tekden, Ahmet E.; Asfour, Tamim; Uğur, Emre; Department of Computer Engineering; Erdem, Aykut; Department of Computer Engineering; College of Engineering
    Pushing is an essential non-prehensile manipulation skill used for tasks ranging from pre-grasp manipulation to scene rearrangement and reasoning about object relations in the scene, and thus pushing actions have been widely studied in robotics. The effective use of pushing actions often requires an understanding of the dynamics of the manipulated objects and adaptation to the discrepancies between prediction and reality. For this reason, effect prediction and parameter estimation with pushing actions have been heavily investigated in the literature. However, current approaches are limited because they either model systems with a fixed number of objects or use image-based representations whose outputs are not very interpretable and quickly accumulate errors. In this paper, we propose a graph neural network based framework for effect prediction and parameter estimation of pushing actions by modeling object relations based on contacts or articulations. Our framework is validated both in real and simulated environments containing different shaped multi-part objects connected via different types of joints and objects with different masses, and it outperforms image-based representations on physics prediction. Our approach enables the robot to predict and adapt the effect of a pushing action as it observes the scene. It can also be used for tool manipulation with never-seen tools. Further, we demonstrate 6D effect prediction in the lever-up action in the context of robot-based hard-disk disassembly.
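    A generic object-and-relation-centric predictor of this sort can be sketched as a small message-passing network; the feature sizes, aggregation, and output (a planar pose change per object) are illustrative assumptions, not the paper's architecture.
    ```python
    import torch
    import torch.nn as nn

    class PushEffectGNN(nn.Module):
        """Illustrative graph network: nodes are objects, edges are contact or
        joint relations, output is a per-object pose change after a push."""
        def __init__(self, node_dim=8, edge_dim=4, hidden=64, out_dim=3):
            super().__init__()
            self.edge_mlp = nn.Sequential(
                nn.Linear(2 * node_dim + edge_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden))
            self.node_mlp = nn.Sequential(
                nn.Linear(node_dim + hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, out_dim))

        def forward(self, nodes, edges, edge_index):
            senders, receivers = edge_index  # (2, E) object indices
            msgs = self.edge_mlp(
                torch.cat([nodes[senders], nodes[receivers], edges], dim=-1))
            # aggregate incoming messages per receiving object
            agg = torch.zeros(nodes.size(0), msgs.size(-1)).index_add_(0, receivers, msgs)
            return self.node_mlp(torch.cat([nodes, agg], dim=-1))  # (dx, dy, dtheta)

    # toy scene: 3 objects, 2 contact edges (0->1, 1->2)
    nodes = torch.randn(3, 8)
    edges = torch.randn(2, 4)
    edge_index = torch.tensor([[0, 1], [1, 2]])
    print(PushEffectGNN()(nodes, edges, edge_index).shape)  # torch.Size([3, 3])
    ```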
  • Publication
    Realtime engagement measurement in human-computer interaction
    (Institute of Electrical and Electronics Engineers Inc., 2020) Department of Computer Engineering; Department of Computer Engineering; Department of Computer Engineering; N/A; N/A; N/A; Sezgin, Tevfik Metin; Yemez, Yücel; Erzin, Engin; Türker, Bekir Berker; Numanoğlu, Tuğçe; Kesim, Ege; Faculty Member; Faculty Member; Faculty Member; PhD Student; Master Student; Master Student; Department of Computer Engineering; College of Engineering; College of Engineering; College of Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; 18632; 107907; 34503; N/A; N/A; N/A
    Social robots are expected to understand their interlocutors and behave accordingly, as humans do. Endowing robots with the capability of monitoring user engagement during their interactions with humans is one of the crucial steps towards achieving this goal. In this work, an interactive game played with a robot is designed and implemented. During the interaction, user engagement is monitored in realtime via detection of user gaze, turn-taking, laughter/smiles, and head nods from audio-visual data. In the experiments conducted, the realtime monitored engagement is found to be consistent with the human-annotated engagement levels.
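    A crude way to turn such per-frame cue detections into a running engagement estimate is a weighted, windowed average; the cue weights and window length here are made-up placeholders, not the paper's model.
    ```python
    from collections import deque

    CUE_WEIGHTS = {"gaze": 0.4, "turn_taking": 0.2, "smile": 0.2, "nod": 0.2}
    WINDOW = 30  # frames (~1 s at 30 fps)
    history = deque(maxlen=WINDOW)

    def update_engagement(cues: dict) -> float:
        """cues maps each cue name to 0/1 for the current frame."""
        frame_score = sum(CUE_WEIGHTS[c] * cues.get(c, 0) for c in CUE_WEIGHTS)
        history.append(frame_score)
        return sum(history) / len(history)  # smoothed engagement in [0, 1]

    print(update_engagement({"gaze": 1, "smile": 1}))  # 0.6 for this single frame
    ```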
  • Publication (Open Access)
    The role of roles: physical cooperation between humans and robots
    (Sage, 2012) Moertl, Alexander; Lawitzky, Martin; Hirche, Sandra; N/A; Department of Computer Engineering; Department of Mechanical Engineering; Küçükyılmaz, Ayşe; Sezgin, Tevfik Metin; Başdoğan, Çağatay; PhD Student; Faculty Member; Faculty Member; Department of Computer Engineering; Department of Mechanical Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 18632; 125489
    Since the strict separation of the working spaces of humans and robots has been softened by recent robotics research achievements, close interaction between humans and robots is rapidly coming into reach. In this context, physical human-robot interaction raises a number of questions regarding a desired intuitive robot behavior. The continuous bilateral information and energy exchange requires an appropriate continuous robot feedback. Investigating a cooperative manipulation task, the desired behavior is a combination of an urge to fulfill the task, a smooth instant reactive behavior to human force inputs, and an assignment of the task effort to the cooperating agents. In this paper, a formal analysis of human-robot cooperative load transport is presented. Three different possibilities for the assignment of task effort are proposed. Two proposed dynamic role exchange mechanisms adjust the robot's urge to complete the task based on the human feedback. For comparison, a static role allocation strategy not relying on the human agreement feedback is investigated as well. All three role allocation mechanisms are evaluated in a user study that involves large-scale kinesthetic interaction and full-body human motion. Results show tradeoffs between subjective and objective performance measures, indicating a clear objective advantage of the proposed dynamic role allocation scheme.
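    The idea of adjusting the robot's share of the effort from the human's force feedback can be sketched as a simple role-weight update; the agreement measure, gain, and bounds are illustrative assumptions rather than the controllers evaluated in the paper.
    ```python
    import numpy as np

    def update_role(alpha, f_human, v_robot_desired, gain=0.05):
        """Raise the robot's effort share alpha in [0, 1] when the human force
        points along the robot's planned direction, lower it when it opposes."""
        agreement = np.dot(f_human, v_robot_desired) / (
            np.linalg.norm(f_human) * np.linalg.norm(v_robot_desired) + 1e-9)
        return float(np.clip(alpha + gain * agreement, 0.0, 1.0))

    alpha = 0.5  # start with equally shared effort
    alpha = update_role(alpha, np.array([1.0, 0.1]), np.array([1.0, 0.0]))
    print(round(alpha, 3))  # slightly above 0.5: the human pushes along the plan
    ```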