Taiwanese translation test

Supaplex · June 16, 2022, 9:20am

A test question, to check what problems Taiwanese (Taiwanese Hokkien) may run into with the major machine translation tools:

「你共我講袂用得，為何按呢?」

The translation result is completely wrong.

Qgil-WMF · June 16, 2022, 5:47pm

@VChang_WMF you have been testing Taiwanese as well, right?

Supaplex · June 17, 2022, 8:01am

Right now, the existing machine translation tools do not work on Taiwanese Hokkien. They treat it as Mandarin if it is written in Chinese characters, or as Vietnamese if it is written in POJ or Taiwan's MOE Tâi-lô system. Some Chinese people can't recognize Taiwanese Hokkien written in POJ, and the embedded machine translation they rely on thinks it is Vietnamese.
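To illustrate why this kind of confusion can happen, here is a minimal Python sketch of a toy guesser that looks only at the writing system. It is not how Google Translate or any real tool works; the function names and the "Latin letters with tone marks means Vietnamese" rule are hypothetical and exist only to show that the script alone cannot tell Taiwanese Hokkien apart from Mandarin or Vietnamese.

```python
import unicodedata


def dominant_script(text: str) -> str:
    """Return 'han', 'latin', or 'other' for the most common script among letters."""
    counts = {"han": 0, "latin": 0, "other": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # skip punctuation, digits, spaces, combining marks
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            counts["han"] += 1
        elif "LATIN" in name:
            counts["latin"] += 1
        else:
            counts["other"] += 1
    if sum(counts.values()) == 0:
        return "other"  # no letters at all
    return max(counts, key=counts.get)


def has_tone_marks(text: str) -> bool:
    """True if the text contains combining diacritics after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return any(unicodedata.category(ch) == "Mn" for ch in decomposed)


def naive_language_guess(text: str) -> str:
    """Hypothetical script-only heuristic; returns 'zh', 'vi', or 'und'."""
    script = dominant_script(text)
    if script == "han":
        return "zh"  # any Han text is assumed to be Mandarin
    if script == "latin" and has_tone_marks(text):
        return "vi"  # any heavily accented Latin text is assumed to be Vietnamese
    return "und"  # undetermined


if __name__ == "__main__":
    print(naive_language_guess("你共我講袂用得，為何按呢?"))
    print(naive_language_guess("Tâi-oân-ōe"))
```

With this toy rule, both the Han-character test sentence and a POJ word such as "Tâi-oân-ōe" get assigned to the wrong language, which mirrors the behaviour described above.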

I am glad that WMF is optimistic about machine translation technology, but for small or regional languages like Taiwanese Hokkien, it doesn't work right now.

Qgil-WMF · June 17, 2022, 8:05am

Would you like to contribute your findings to "Supporting automatic translations of languages existing on wiki but not supported by google translate"?

VChang_WMF · June 17, 2022, 8:20am

Unfortunately, this is a case where Google Translate does not support the language. @Iyumu raised this issue as well.

@Supaplex is there any tool or approach that comes to mind that could potentially address this issue? To-siā!

Supaplex · June 18, 2022, 7:27am

I am not a language expert, so I can only point you to the existing resources on Wikipedia. The more Hokkien grammar a sentence uses, the more likely a machine translation that relies on Mandarin will go wrong. Likewise, the more Hokkien-only characters a sentence contains, or the more its usage differs from Mandarin, the more likely such a translation will go wrong.

Hokkien language grammar on zh Wikipedia
Taiwanese Hokkien characters on zh Wikipedia

Iyumu · June 23, 2022, 2:51am

Góa bo̍k-chêng chai-iáⁿ Tâi-oân ū chi̍t-tīn lâng teh chhòng Tâi-oân-ōe ê Common Voice, m̄-koh in chú-iàu chhái-iōng ê sī Hàn-jī kap Tâi-lô, án-ne khióng-kiaⁿ sī khah bô hāu-lut ê. In-ūi chhin-chhiūⁿ lí kóng ê, ko͘ Hàn-jī, ke-khì to̍h pháiⁿ liáu-kái ah, koh khah bián kóng sī Hàn-jī hām Lô-má-jī lām-ēng.

As far as I know, there's already a Common Voice project run by the community here in Taiwan. However, they are working mainly with texts in Han characters and Tailo romanization, which might not be efficient enough for the machine to understand the language. As you mentioned, sentences written only in Han characters cannot be processed well by machines right now, not to mention sentences that mix Han characters and romanization (I'm not sure, but that seems even more complicated to me).

Iyumu · June 23, 2022, 7:20am

Siūⁿ-beh chhiáⁿ-mn̄g lán che ke-khì hoan-e̍k ê būn-tê, kám mā ū khan-siap tio̍h ISO gí-giân hoan-hō (ISO639-3 code). Nā sī ū, lán Tâi-oân-ōe ê hoan-hō khó-lêng tī chit 2 nî ē ū piàn-tōng. In-ūi kū-nî, ū lâng chiam-tùi Bân-lâm-gú ê “nan” hoan-hō, hiòng ISO639-3 úi-oân-hōe the̍h-chhut siong-koan ê sin-chhéng. Chhiáⁿ-khòaⁿ chit-ê liân-kiat.

I was wondering whether these machine translation issues are also related to the language code (ISO 639-3). If so, the code for Taigi (Taiwanese) might change within the next two years, because last year someone submitted a change request to the ISO 639-3 committee that aims to separate Taigi from the Min Nan code (nan). Please see this link.
