Novel view synthesis from a single image requires inferring occluded regions of objects and scenes while simultaneously maintaining semantic and physical consistency with the input. Existing approaches condition neural radiance fields (NeRF) on local image features, projecting points to the input image plane and aggregating 2D features to perform volume rendering. However, under severe occlusion, this projection fails to resolve uncertainty, resulting in blurry renderings that lack detail. In this work, we propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test time. We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D-consistent virtual views from the CDM samples and fine-tunes the NeRF on the improved virtual views. Our approach significantly outperforms existing NeRF-based and geometry-free approaches on challenging datasets, including ShapeNet, ABO, and Clevr3D.
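The test-time procedure described above alternates between rendering virtual views from the NeRF, refining them with the CDM, and fine-tuning the NeRF on the refined views. The following is a minimal toy sketch of that control flow only: `render_nerf`, `cdm_refine`, and `finetune_step` are hypothetical stand-ins (simple NumPy placeholders), not the paper's actual networks or update rules.

```python
import numpy as np

rng = np.random.default_rng(0)

def render_nerf(nerf_params, pose):
    """Render a virtual view from the current NeRF at the given camera pose.
    Stand-in: treat the NeRF as a parameter grid plus a pose-dependent term."""
    return nerf_params + 0.1 * np.sin(pose)

def cdm_refine(rendering, input_image):
    """Refine a (possibly blurry) NeRF rendering with the CDM.
    Stand-in: nudge the rendering toward the conditioning input image."""
    return 0.5 * rendering + 0.5 * input_image

def finetune_step(nerf_params, refined_views, renderings, lr=0.5):
    """Fine-tune the NeRF on the refined virtual views.
    Stand-in: move the parameters toward the average refinement residual."""
    residual = np.mean([v - r for v, r in zip(refined_views, renderings)], axis=0)
    return nerf_params + lr * residual

input_image = rng.normal(size=(4, 4))   # the single conditioning view
nerf_params = np.zeros((4, 4))          # NeRF initialized from the input view
poses = [0.0, 1.0, 2.0]                 # virtual camera poses around the object

# NeRF-guided distillation at test time: render virtual views, refine them
# with the CDM, then fine-tune the NeRF on the refined views, and repeat.
for step in range(20):
    renderings = [render_nerf(nerf_params, p) for p in poses]
    refined = [cdm_refine(r, input_image) for r in renderings]
    nerf_params = finetune_step(nerf_params, refined, renderings)
```

In this toy version the loop converges because each refinement pulls the renderings toward the input image; in the actual method, the CDM instead resolves occlusion uncertainty, and fine-tuning on the refined views keeps the virtual views 3D-consistent.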