Publication: Multimodal speaker identification with audio-video processing
| dc.conference.date | SEP 14-17, 2003 | |
| dc.conference.location | Barcelona, Spain | |
| dc.conference.organizer | IEEE International Conference on Image Processing | |
| dc.contributor.department | MVGL (Multimedia, Vision and Graphics Laboratory) | |
| dc.contributor.facultymember | Yes | |
| dc.contributor.kuauthor | Erzin, Engin | |
| dc.contributor.kuauthor | Kanak, Alper | |
| dc.contributor.kuauthor | Tekalp, Ahmet Murat | |
| dc.contributor.kuauthor | Yemez, Yücel | |
| dc.contributor.schoolcollegeinstitute | Laboratory | |
| dc.date.accessioned | 2024-11-10T00:06:40Z | |
| dc.date.issued | 2003 | |
| dc.description.abstract | In this paper we present a multimodal audio-visual speaker identification system. The objective is to improve recognition performance over conventional unimodal schemes. The proposed system decomposes the information in a video stream into three components: speech, face texture and lip motion. Lip motion between successive frames is first computed in terms of optical flow vectors and then encoded as a feature vector in a magnitude-direction histogram domain. The feature vectors obtained along the whole stream are then interpolated to match the rate of the speech signal and fused with the mel frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a Hidden Markov Model (HMM) based identification system. Face texture images are treated separately in the eigenface domain and integrated into the system through decision fusion. Experimental results are included to demonstrate the system's performance. | |
| dc.description.fulltext | Yes | |
| dc.description.harvestedfrom | Manual | |
| dc.description.indexedby | WOS | |
| dc.description.indexedby | Scopus | |
| dc.description.openaccess | Green OA | |
| dc.description.peerreviewstatus | N/A | |
| dc.description.publisherscope | International | |
| dc.description.readpublish | N/A | |
| dc.description.sponsoredbyTubitakEu | N/A | |
| dc.description.studentonlypublication | No | |
| dc.description.studentpublication | Yes | |
| dc.description.version | Post-print | |
| dc.identifier.embargo | No | |
| dc.identifier.endpage | 8 | |
| dc.identifier.filenameinventoryno | IR06888 | |
| dc.identifier.isbn | 0780377508 | |
| dc.identifier.quartile | N/A | |
| dc.identifier.scopus | 2-s2.0-0345565788 | |
| dc.identifier.startpage | 5 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14288/16652 | |
| dc.identifier.wos | 000187010500002 | |
| dc.keywords | Speech | |
| dc.keywords | Pattern recognition | |
| dc.keywords | Video processing | |
| dc.language.iso | eng | |
| dc.publisher | Institute of Electrical and Electronics Engineers | |
| dc.relation.affiliation | Koç University | |
| dc.relation.collection | Koç University Institutional Repository | |
| dc.relation.ispartof | 2003 International Conference on Image Processing, Vol 3, Proceedings | |
| dc.relation.openaccess | Yes | |
| dc.rights | Other | |
| dc.subject | Computer science | |
| dc.subject | Artificial intelligence | |
| dc.subject | Imaging systems | |
| dc.subject | Photography | |
| dc.title | Multimodal speaker identification with audio-video processing | |
| dc.type | Conference Proceeding | |
| dspace.entity.type | Publication | |
| local.contributor.kuauthor | Yemez, Yücel | |
| local.contributor.kuauthor | Kanak, Alper | |
| local.contributor.kuauthor | Erzin, Engin | |
| local.contributor.kuauthor | Tekalp, Ahmet Murat | |
| relation.isOrgUnitOfPublication | cb6bbbf6-fd19-4052-b581-f591a9748d21 | |
| relation.isOrgUnitOfPublication.latestForDiscovery | cb6bbbf6-fd19-4052-b581-f591a9748d21 | |
| relation.isParentOrgUnitOfPublication | 20385dee-35e7-484b-8da6-ddcc08271d96 | |
| relation.isParentOrgUnitOfPublication.latestForDiscovery | 20385dee-35e7-484b-8da6-ddcc08271d96 |
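
The abstract describes interpolating the lip-motion feature vectors to the rate of the speech signal and fusing them with the MFCC vectors. The following is only a minimal sketch of that rate-matching and concatenation step, not the authors' implementation: it assumes plain linear interpolation, and the function names (`interpolate_features`, `fuse`), the frame rates and the toy values are hypothetical choices made for illustration.

```python
def interpolate_features(video_feats, video_rate, audio_rate, n_audio_frames):
    """Linearly interpolate video-rate feature vectors onto the audio frame grid.

    Assumption: both streams start at time 0 and use uniform frame spacing.
    """
    if not video_feats:
        raise ValueError("video_feats must contain at least one vector")
    last = len(video_feats) - 1
    out = []
    for k in range(n_audio_frames):
        # Position of audio frame k measured in (fractional) video frames.
        pos = k * video_rate / audio_rate
        i = int(pos)
        if i >= last:
            # At or beyond the final video frame: hold the last vector.
            out.append(list(video_feats[last]))
            continue
        w = pos - i  # weight of the next video frame
        a, b = video_feats[i], video_feats[i + 1]
        out.append([(1 - w) * x + w * y for x, y in zip(a, b)])
    return out


def fuse(mfcc_frames, lip_frames):
    """Concatenate per-frame MFCC and lip-motion vectors into joint vectors."""
    if len(mfcc_frames) != len(lip_frames):
        raise ValueError("both streams must have the same number of frames")
    return [list(m) + list(l) for m, l in zip(mfcc_frames, lip_frames)]


# Toy example: 3 video frames at 1 Hz upsampled to 5 audio frames at 2 Hz.
lip = interpolate_features([[0.0], [2.0], [4.0]], video_rate=1, audio_rate=2,
                           n_audio_frames=5)
mfcc = [[1.0, 2.0] for _ in range(5)]  # placeholder MFCC values, not real data
joint = fuse(mfcc, lip)
```

In this sketch each joint vector is the audio feature vector followed by the interpolated lip-motion vector for the same frame. Sequences of such vectors are the kind of input an HMM-based classifier would be trained on.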
