Sleep staging is a clinically important task for diagnosing numerous sleep disorders, but it remains difficult to deploy at scale because, among other reasons, it requires clinical expertise. Deep learning models can perform the task, but at the expense of large labeled datasets, which are infeasible to procure at scale. While self-supervised learning (SSL) can mitigate this need, recent studies on SSL for sleep staging have shown that performance gains saturate after training with labeled data from only tens of subjects, and hence cannot match the peak performance attained with larger datasets. We hypothesize that this rapid saturation stems from applying a pretraining scheme that pretrains only a portion of the architecture, i.e., the feature encoder but not the temporal encoder. We therefore propose adopting an architecture that seamlessly couples feature and temporal encoding, together with a suitable pretraining scheme that pretrains the entire model. On a sample sleep staging dataset, we find that the proposed scheme offers performance gains that do not saturate with the size of the labeled training dataset (e.g., a 3-5% improvement in balanced accuracy across low- to high-labeled data settings), which translate into significant reductions in the amount of labeled training data needed for high performance (e.g., by 800 subjects). Based on our findings, we recommend adopting this SSL paradigm for subsequent work on SSL for sleep staging.
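The architectural idea can be sketched in minimal NumPy under stated assumptions: all names, dimensions, and encoder internals below are hypothetical stand-ins (the actual feature and temporal encoders are deep networks), but the sketch shows why coupling matters — the temporal encoder is composed with the feature encoder in one forward pass, so any pretraining loss on the output reaches both stages rather than the feature encoder alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: a night is a sequence of 30-s epochs,
# each represented here as a flat vector of raw samples.
n_epochs, epoch_len, d_feat = 8, 100, 16

def feature_encoder(x, W):
    """Per-epoch feature encoder: maps each raw epoch to a d_feat vector.
    x: (n_epochs, epoch_len), W: (epoch_len, d_feat)."""
    return np.tanh(x @ W)                       # (n_epochs, d_feat)

def temporal_encoder(h):
    """Temporal encoder: a causal running mean that mixes context
    across epochs (stand-in for an RNN/transformer over the night)."""
    cums = np.cumsum(h, axis=0)
    counts = np.arange(1, len(h) + 1)[:, None]
    return cums / counts                        # (n_epochs, d_feat)

def full_model(x, W):
    """Coupled model: feature and temporal encoding applied end to end,
    so a self-supervised loss on this output pretrains the whole model,
    not just the feature encoder."""
    return temporal_encoder(feature_encoder(x, W))

x = rng.standard_normal((n_epochs, epoch_len))
W = 0.1 * rng.standard_normal((epoch_len, d_feat))
z = full_model(x, W)
print(z.shape)
```

In the saturating baseline schemes the abstract criticizes, only `feature_encoder` would be pretrained and `temporal_encoder` would be trained from scratch on the small labeled set; pretraining `full_model` end to end is the change being proposed.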