A household robot should be able to navigate to target locations without requiring users to first annotate everything in their home. Current approaches to this object navigation challenge do not test on real robots and rely on expensive semantically labeled 3D meshes. In this work, our goal is an agent that builds self-supervised models of the world through exploration, much as a child might. We propose an end-to-end self-supervised embodied agent that leverages exploration to train a semantic segmentation model of 3D objects, and uses those representations to learn an object navigation policy purely from self-labeled 3D meshes. The key insight is that embodied agents can use location consistency as a supervision signal: collecting images of the same place from different viewpoints and applying contrastive learning to fine-tune a semantic segmentation model. In our experiments, we observe that our framework outperforms other self-supervised baselines and is competitive with supervised baselines, both in simulation and when deployed in real homes.
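To make the location-consistency idea concrete, the sketch below shows one hypothetical way such a contrastive objective could look: image features from views of the same 3D location are treated as positives and all other features as negatives, in an InfoNCE-style loss. This is an illustrative assumption, not the paper's exact formulation; the function name, the use of NumPy, and the temperature value are all invented for the example.

```python
import numpy as np

def location_contrastive_loss(features, locations, temperature=0.1):
    """Illustrative InfoNCE-style loss using location consistency.

    features:  (N, D) array of per-view feature vectors.
    locations: length-N list of 3D-location IDs; views sharing an ID
               are positives, all other views are negatives.
    """
    # L2-normalize features so similarities are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = (f @ f.T) / temperature  # pairwise scaled similarities

    n = len(locations)
    total, count = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if locations[j] == locations[i]]
        if not positives:
            continue  # no positive pair for this view
        # log of the partition function over all other views
        log_z = np.log(np.sum(np.exp(sim[i, others])))
        for j in positives:
            # -log( exp(sim_ij) / sum_k exp(sim_ik) )
            total += log_z - sim[i, j]
            count += 1
    return total / count
```

Under this objective, pulling same-location features together and pushing different-location features apart drives the loss toward zero, which is the behavior the abstract's "location consistency as a supervision signal" describes.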