This paper was accepted at the EMNLP Workshop on Computational Approaches to Linguistic Code-Switching (CALCS).
Code-switching (CS), i.e. mixing different languages in a single sentence, is a common phenomenon in communication and can be challenging in many Natural Language Processing (NLP) settings. Previous studies on CS speech have shown promising results for end-to-end speech translation (ST), but have been limited to offline scenarios and to translation into one of the languages present in the source (monolingual transcription).
In this paper, we focus on two essential yet unexplored areas for real-world CS speech translation: streaming settings, and translation into a third language (i.e., a language not included in the source). To this end, we extend the Fisher and Miami test and validation datasets to include new targets in Spanish and German. Using this data, we train a model for both offline and streaming ST and establish baseline results for the two settings mentioned above.