本論文針對消防指揮記錄中心無線電語料中,語句長、噪聲多的特點,提出了一種在長語句中辨識意圖的方法。此方法首先使用自監督學習 (self-supervised learning) 模型來進行語音的特徵提取,然後再使用兩個下游模型分別對語音中的意圖進行偵測和辨識。此方法在無線電語料長語音意圖辨識任務上與 Whisper+BERT 的方法相比,錯誤減少率 (error reduction rate, ERR) 為 33.2%。在關鍵詞發現 (keyword spotting) 任務上與區域提案網絡 (region proposal network, RPN) 方法相比,在每小時誤報次數 (false alarm per hour, FAH) 相近的情況下,錯誤拒絕率 (false rejection rate, FRR) 的 ERR 為 73.2%。在短語句分類任務上和 Whisper+BERT 方法相比,ERR 為 4.3%。同時與 Whisper+BERT 方法相比,推理算力需求下降了 91.4%。我們所提出的方法,可以廣泛地應用在從長語音(電話或無線電對話等)中提取關鍵資訊。
This paper addresses the long utterances and heavy noise that characterize the radio corpus of the fire command record center, and proposes a method for identifying intents in long utterances. The method first uses a self-supervised learning model to extract speech features, and then uses two downstream models to detect and recognize the intents in the speech, respectively. On the long-speech intent recognition task over the radio corpus, the method achieves an error reduction rate (ERR) of 33.2% relative to the Whisper+BERT method. On the keyword spotting task, at a similar number of false alarms per hour (FAH), it reduces the false rejection rate (FRR) relative to the region proposal network (RPN) method, with an ERR of 73.2%. On the short sentence classification task, the ERR relative to the Whisper+BERT method is 4.3%. In addition, the inference compute requirement is 91.4% lower than that of the Whisper+BERT method. The proposed method can be widely applied to extracting key information from long speech, such as telephone or radio conversations.
意圖分類;長語句;自監督學習;關鍵詞發現;語句分類
Intent Classification; Long Speech Sentence; Self-supervised Learning; Keyword Spotting; Speech Sentence Classification