Taiwan Academic Journals Open Access Platform

27(2) / 2022 / 12 / pp. 31-46

探討語者驗證系統中特徵處理模組與注意力機制

Investigation of Feature Processing Modules and Attention Mechanisms in Speaker Verification System

Authors

Ting-Wei Chen *

(Department of Computer Science and Engineering, National Sun Yat-sen University)

Wei-Ting Lin

(Department of Computer Science and Engineering, National Sun Yat-sen University)

Chia-Ping Chen

(Department of Computer Science and Engineering, National Sun Yat-sen University)

Chung-Li Lu

(Chunghwa Telecom Laboratories)

Bo-Cheng Chan

(Chunghwa Telecom Laboratories)

Yu-Han Cheng

(Chunghwa Telecom Laboratories)

Hsiang-Feng Chuang

(Chunghwa Telecom Laboratories)

Wei-Yu Chen

(Chunghwa Telecom Laboratories)

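Chinese Abstract (translated)

This paper builds and swaps in different audio feature front-end (preprocessing) modules and attention mechanisms to improve a speaker verification system. We take an improved model based on ECAPA-TDNN as the baseline and compare variants obtained by replacing and combining different front-end modules and attention mechanisms, selecting the best combination as the final proposed model. The models are trained on the VoxCeleb 2 dataset and evaluated on multiple test sets. On the VoxSRC2022 validation set, the final model improves on the baseline by 16%, yielding better speaker verification performance.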

English Abstract

In this paper, we use several combinations of feature front-end modules and attention mechanisms to improve the performance of our speaker verification system. An updated version of ECAPA-TDNN is chosen as the baseline. We replace and integrate different feature front-end and attention mechanism modules, compare the resulting models, and take the most effective design as our final system. We use the VoxCeleb 2 dataset as our training set and test the performance of our models on several test sets. Our final proposed model improves performance by 16% over the baseline on the VoxSRC2022 validation set, achieving better results for our speaker verification system.
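The 16% figure reads most naturally as a relative improvement over the baseline. The following is a minimal sketch of that arithmetic, assuming a lower-is-better error metric (the abstract does not name the metric). The function name and the scores are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the fractional reduction of a lower-is-better error metric."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error values for illustration only; not taken from the paper.
baseline = 2.50
proposed = 2.10
print(f"relative improvement: {relative_improvement(baseline, proposed):.0%}")
```

Under this reading, a drop from 2.50 to 2.10 is a 16% relative improvement, even though the absolute difference is only 0.40.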

Chinese Keywords (translated)

Speaker Verification; Preprocessing Module; Attention Mechanism; Time Delay Neural Network

English Keywords

Speaker Verification; Frontend Module; Attention Mechanism; Time Delay Neural Network