Inspired by the developments in foundation models for language-vision modeling, we explore the use of transformers and large-scale pretraining on biosignals. In this study, our goal is to design a general-purpose architecture for biosignals that can be easily trained on multiple modalities and adapted to new modalities or tasks with ease.
The proposed model is designed with three key features: (i) a frequency-aware architecture that can efficiently capture local and global information from biosignals by leveraging global filters in the frequency domain; (ii) a channel-independent design that shares the encoder's weights across different channels using either general-purpose or modality-specific filters; (iii) a modality-combining transformer capable of effectively fusing an arbitrary number of modalities. We demonstrate the robustness of the proposed architecture on several biosignal datasets, where we show that the proposed architecture not only performs better than single-modality models, but also outperforms them in transfer learning tasks.
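The frequency-domain global filter in feature (i) can be illustrated with a minimal NumPy sketch. The function name, shapes, and the identity-filter example below are our own illustrative assumptions, not details from the paper: the idea is that multiplying the signal's spectrum by a learnable complex weight is equivalent to a circular convolution whose kernel spans the whole sequence, so a single layer mixes local and global information.

```python
import numpy as np

def global_filter(x, w):
    """Apply a learnable global filter in the frequency domain.

    x: (channels, time) real-valued biosignal
    w: (channels, time // 2 + 1) complex filter weights,
       one weight per rFFT frequency bin

    Elementwise multiplication in frequency equals circular
    convolution in time with a kernel as long as the input.
    """
    X = np.fft.rfft(x, axis=-1)                      # to frequency domain
    return np.fft.irfft(X * w, n=x.shape[-1], axis=-1)  # back to time domain

# Toy check: an all-ones (identity) filter leaves the signal unchanged.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 64))                     # 2 channels, 64 samples
w = np.ones((2, 64 // 2 + 1), dtype=complex)         # 33 frequency bins
y = global_filter(x, w)
assert np.allclose(x, y)
```

In a trained model, `w` would be a learned parameter (per channel or shared, matching the channel-independent design in feature (ii)) rather than the identity used here for the sanity check.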