27(2) / 2022 / 12 / pp. 31-46
Investigation of Feature Processing Modules and Attention Mechanisms in Speaker Verification System
Authors
Ting-Wei Chen *
(Department of Computer Science and Engineering, National Sun Yat-sen University)
Wei-Ting Lin
(Department of Computer Science and Engineering, National Sun Yat-sen University)
Chia-Ping Chen
(Department of Computer Science and Engineering, National Sun Yat-sen University)
Chung-Li Lu
(Chunghwa Telecom Laboratories)
Bo-Cheng Chan
(Chunghwa Telecom Laboratories)
Yu-Han Cheng
(Chunghwa Telecom Laboratories)
Hsiang-Feng Chuang
(Chunghwa Telecom Laboratories)
Wei-Yu Chen
(Chunghwa Telecom Laboratories)
Abstract
In this paper, we combine several feature front-end modules and attention mechanisms to improve the performance of our speaker verification system. An improved variant of ECAPA-TDNN serves as the baseline. We replace and combine different feature front-end modules and attention mechanisms, compare the resulting models, and select the most effective combination as our final system. All models are trained on the VoxCeleb 2 dataset and evaluated on several test sets. On the VoxSRC2022 validation set, the final model improves on the baseline by 16%, giving better results for our speaker verification system.
Keywords
Speaker Verification; Front-end Module; Attention Mechanism; Time Delay Neural Network