Neural text-to-speech (TTS) can produce quality close to natural speech when a sufficient quantity of high-quality speech material is available for training. However, acquiring speech data for TTS training is costly and time-consuming, especially when the goal is to generate different speaking styles. In this work, we show that we can transfer speaking style across speakers and improve the quality of synthetic speech by training a multi-speaker multi-style (MSMS) model with long-form recordings, in addition to regular TTS recordings. In particular, we show that 1) multi-speaker modeling improves the overall TTS quality, 2) the proposed MSMS approach outperforms the pre-training and fine-tuning approach when utilizing additional multi-speaker data, and 3) the long-form speaking style is highly rated regardless of the target text domain.