Vietnamese is an isolating language with rich, productive compounding, but there is no morphosyntactic, phonotactic or phonological evidence for positing a linguistic level between the syllable and the phrase (Schiering et al. 2010). Following Nguyen and Ingram (2007), we model an artificial listener with a Random Forest Classifier to study the phonetic distinguishability of compounds and phrases. This machine learning algorithm represents the maximal potential of a system to differentiate the two classes on the basis of phonetics alone. It also ranks each phonetic correlate by its importance for differentiating the classes, so that we can interpret not only whether a difference exists on a particular phonetic dimension, but also how important that difference is. The results confirm that the two classes can be phonetically separated only under conditions of maximal contrast, and that maximal contrast is realized through juncture marking. Furthermore, we show that the two classes cannot be perfectly separated even under conditions of maximal contrast. In addition, the phonetic data yield an across-the-board preference for a compound interpretation, even when the Random Forest Classifier is trained on maximal contrast data.
Phonetics; Random Forest Classification; Prosodic Hierarchy; Speech Production; Chunking; Phonology
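The modelling idea summarized in the abstract, a classifier standing in for a listener plus a ranking of phonetic correlates by importance, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the three feature names (`juncture_pause`, `duration_ratio`, `f0_range`) and their effect sizes are hypothetical, the forest is restricted to depth-one trees (stumps) to keep the code short, and permutation importance is used as one common way to rank features. It is not the study's data, feature set, or implementation.

```python
import random

# Hypothetical phonetic cues; names and effect sizes are illustrative assumptions.
FEATURES = ["juncture_pause", "duration_ratio", "f0_range"]

def make_data(n, rng):
    """Synthetic tokens: label 0 = compound, 1 = phrase."""
    rows = []
    for _ in range(n):
        y = rng.randrange(2)
        x = [rng.gauss(2.0 * y, 1.0),   # strong cue (assumed)
             rng.gauss(0.5 * y, 1.0),   # weak cue (assumed)
             rng.gauss(0.0, 1.0)]       # uninformative cue
        rows.append((x, y))
    return rows

def fit_stump(rows, feats):
    """Pick (feature, threshold, label_above) with the best training accuracy."""
    n, best = len(rows), None
    for f in feats:
        for x, _ in rows:
            t = x[f]
            # Correct predictions if values above t are labelled 1.
            hits = sum((xi[f] > t) == (yi == 1) for xi, yi in rows)
            for above, acc in ((1, hits), (0, n - hits)):
                if best is None or acc > best[0]:
                    best = (acc, f, t, above)
    return best[1:]

def fit_forest(rows, n_trees, rng):
    """Bagged stumps, each trained on a bootstrap sample and a random feature subset."""
    forest = []
    for _ in range(n_trees):
        boot = [rng.choice(rows) for _ in rows]
        feats = rng.sample(range(len(FEATURES)), 2)
        forest.append(fit_stump(boot, feats))
    return forest

def predict(forest, x):
    """Majority vote over all stumps."""
    votes = sum(above if x[f] > t else 1 - above for f, t, above in forest)
    return 1 if 2 * votes > len(forest) else 0

def accuracy(forest, rows):
    return sum(predict(forest, x) == y for x, y in rows) / len(rows)

def permutation_importance(forest, rows, rng):
    """Drop in accuracy when one feature column is shuffled across tokens."""
    base = accuracy(forest, rows)
    scores = {}
    for f, name in enumerate(FEATURES):
        col = [x[f] for x, _ in rows]
        rng.shuffle(col)
        shuffled = [(x[:f] + [v] + x[f + 1:], y) for (x, y), v in zip(rows, col)]
        scores[name] = base - accuracy(forest, shuffled)
    return scores

rng = random.Random(0)
train, test = make_data(150, rng), make_data(300, rng)
forest = fit_forest(train, 25, rng)
test_accuracy = accuracy(forest, test)
importance = permutation_importance(forest, test, rng)
ranking = sorted(importance, key=importance.get, reverse=True)
print("held-out accuracy:", round(test_accuracy, 3))
print("cue ranking:", ranking)
```

In this toy setting the classes overlap by construction, so held-out accuracy stays below ceiling, and the ranking simply recovers the effect sizes built into the synthetic data. With real measurements, a full-depth forest from an established library would be the more natural choice.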