GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user, and is the latest capability we are making broadly available. Incorporating additional modalities (such as image inputs) into large language models (LLMs) is viewed by some as a key frontier in artificial intelligence research and development. Multimodal LLMs offer the possibility of expanding the impact of language-only systems with novel interfaces and capabilities, enabling them to solve new tasks and provide novel experiences for their users. In this system card, we analyze the safety properties of GPT-4V. Our work on safety for GPT-4V builds on the work done for GPT-4, and here we dive deeper into the evaluations, preparation, and mitigation work done specifically for image inputs.