Vol. 28, No. 1, June 2023, pp. 35-52
Taiwanese-Mandarin Neural Machine Translation
Authors
周廷軒 Ting-Hsuan Chou
(Department of Computer Science and Engineering, National Taiwan Ocean University)
林川傑 Chuan-Jie Lin *
(Department of Computer Science and Engineering, National Taiwan Ocean University)
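Chinese Abstract (translated)
This paper presents a neural machine translation system that translates between Taiwanese and Mandarin in both directions, trained on the large Taiwanese corpora currently available. To find the best system for each translation direction, we compare words versus characters as the unit of translation, self-trained versus pre-trained embedding models, and several strategies for handling unknown words. The resulting Mandarin-to-Taiwanese system outperforms all currently known Taiwanese translation systems, reaching a BLEU score of 75.02 on news articles and 38.13 on prose articles. We also build the first Taiwanese-to-Mandarin system, which achieves BLEU scores of 73.38 and 35.32 on these two text genres, respectively.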
English Abstract
This paper proposes a Taiwanese-Mandarin neural machine translation system trained on all available Taiwanese corpora and translation datasets. The unit of translation is either the Chinese word or the Chinese character, and the embeddings are either self-trained or pre-trained. Several strategies are also proposed to handle out-of-vocabulary (OOV) output. The final Mandarin-to-Taiwanese system outperforms all known Taiwanese MT systems: its best BLEU score is 75.02 on news articles and 38.13 on literary prose articles. We also built the first Taiwanese-to-Mandarin NMT system in the world, which achieves BLEU scores of 73.38 and 35.32 on these two genres, respectively.
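BLEU is computed over tokens, so the same output can receive different scores depending on whether it is scored as words or as characters. The following minimal Python sketch implements standard corpus-level BLEU, assuming a single reference per sentence, uniform 1- to 4-gram weights, and no smoothing. It is not the authors' evaluation script. The function names and the example sentences are hypothetical, and this section does not state the paper's actual tokenization or BLEU settings.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence, uniform n-gram
    weights and no smoothing (an illustrative sketch); returns 0-100."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped matches: a hypothesis n-gram is credited at most as
            # many times as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    # Without smoothing, any n-gram order with zero matches gives BLEU = 0.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty discourages hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Hypothetical example: one output word differs from the reference.
ref_words = ["我", "今天", "去", "學校"]
hyp_words = ["我", "今天", "去", "學堂"]
word_score = corpus_bleu([hyp_words], [ref_words])
char_score = corpus_bleu([list("".join(hyp_words))], [list("".join(ref_words))])
print(f"word-level BLEU: {word_score:.2f}, character-level BLEU: {char_score:.2f}")
```

In this toy case, the word-level score is 0 because no 4-gram matches. The character-level score is about 76 because most character n-grams still match. This is one reason BLEU values reported for word-based and character-based systems should be compared under the same tokenization.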
Chinese Keywords (translated)
Taiwanese; Machine Translation; Neural Network; Embedding Vector
English Keywords
Taiwanese; Machine Translation; Neural Network; Embedding Model