Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted its accuracy to a new level. E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven beneficial. However, LM fusion has certain drawbacks, such as its inability to address the domain mismatch inherent to the internal AM. Drawing inspiration from the concept of LM fusion, we propose integrating an external AM into the E2E system to better address this domain mismatch. With this approach, we achieve a significant reduction in word error rate, with a drop of up to 14.3% across varied test sets. We also find that AM fusion is particularly beneficial for improving named entity recognition.
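
To make the idea of fusion concrete, the following is a minimal sketch of log-linear score interpolation over an n-best list, in the style commonly used for shallow LM fusion. The abstract does not state the actual fusion rule, so the interpolation form, the `Hypothesis` fields, the `weight` parameter, and all scores below are illustrative assumptions rather than the method evaluated here.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    e2e_logp: float  # log-probability assigned by the E2E model
    ext_logp: float  # log-score from a hypothetical external model (LM or AM)


def fuse_score(hyp: Hypothesis, weight: float) -> float:
    """Log-linear interpolation: E2E score plus a weighted external score."""
    return hyp.e2e_logp + weight * hyp.ext_logp


def rescore(nbest: list[Hypothesis], weight: float) -> list[Hypothesis]:
    """Return the hypotheses sorted best-first by fused score."""
    return sorted(nbest, key=lambda h: fuse_score(h, weight), reverse=True)


# Made-up scores, chosen so that the external model changes the ranking.
nbest = [
    Hypothesis("call john smith", e2e_logp=-4.0, ext_logp=-9.0),
    Hypothesis("call jon smyth", e2e_logp=-4.5, ext_logp=-3.0),
]

best_without = rescore(nbest, weight=0.0)[0].text  # E2E score only
best_with = rescore(nbest, weight=0.3)[0].text     # E2E plus external score
```

With `weight=0.0` the ranking depends on the E2E score alone; a positive weight lets the external model promote a hypothesis that the E2E model ranked lower, which is the mechanism by which an external model can correct for domain mismatch.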