Added german_cleaners #1642
Hi @padmalcom, thank you for the PR. Could you please explain why you avoid lowercasing for German here? Are abbreviations used in German? Is this cleaner general for German, or specific to certain datasets? (Excuse me, I don't speak German.) Thank you.
Hi @BenoitWang, sure! In German, nouns start with a capital letter; verbs, adverbs, etc. don't. For example, "Zahlen" (numbers) and "zahlen" (to pay) have different meanings, so case matters. Yes, abbreviations are used. The dataset I use is generated from MP3s: the text is extracted with speech-to-text models and then fixed by a punctuation model, so I'm fairly sure it does not contain abbreviations. The cleaner will work for every dataset.
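To illustrate the point about case, here is a minimal sketch of a case-preserving cleaner next to a lowercasing one, assuming cleaners are built as small string-to-string functions. The body of `german_cleaners` below is illustrative and not the code from this PR; `lowercasing_cleaners` and `collapse_whitespace` are hypothetical helpers.

```python
import re

# Matches any run of whitespace (spaces, tabs, newlines).
_whitespace_re = re.compile(r"\s+")


def collapse_whitespace(text: str) -> str:
    """Replace each run of whitespace with a single space (hypothetical helper)."""
    return _whitespace_re.sub(" ", text)


def lowercasing_cleaners(text: str) -> str:
    """Hypothetical English-style cleaner: normalizes whitespace, then lowercases."""
    return collapse_whitespace(text).lower()


def german_cleaners(text: str) -> str:
    """Illustrative German cleaner: normalizes whitespace but keeps case,
    so 'Zahlen' (numbers) and 'zahlen' (to pay) remain distinct.
    Umlauts and ß are left untouched (no ASCII transliteration)."""
    return collapse_whitespace(text).strip()
```

With the lowercasing cleaner, `"Zahlen"` and `"zahlen"` collapse to the same string, while `german_cleaners` keeps them apart, which is the distinction described above.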
OK, I see. Please fix the failing tests (there are some simple linter issues). You are also welcome to create a list of German abbreviations if you have time.
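A German abbreviation list could be applied roughly as sketched below, assuming a simple regex substitution pass. The entries in `_ABBREVIATIONS` and the function name `expand_abbreviations` are hypothetical examples, not part of the PR. The patterns are deliberately case-sensitive, since case carries meaning in German.

```python
import re

# Hypothetical, illustrative abbreviation list (not from the PR).
_ABBREVIATIONS = {
    "z.B.": "zum Beispiel",
    "usw.": "und so weiter",
    "bzw.": "beziehungsweise",
    "Dr.": "Doktor",
    "Nr.": "Nummer",
}

# (?<!\w) keeps an abbreviation from matching inside a longer word.
# No re.IGNORECASE: matching stays case-sensitive.
_ABBREVIATION_PATTERNS = [
    (re.compile(r"(?<!\w)" + re.escape(abbr)), expansion)
    for abbr, expansion in _ABBREVIATIONS.items()
]


def expand_abbreviations(text: str) -> str:
    """Replace each known abbreviation with its spelled-out form."""
    for pattern, expansion in _ABBREVIATION_PATTERNS:
        text = pattern.sub(expansion, text)
    return text
```

One limitation of this sketch: an abbreviation at the end of a sentence also consumes the sentence-final period (for example, "Milch usw." becomes "Milch und so weiter").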
Sorry, what is the issue here?
I added a German cleaner to avoid lowercasing text.