ChatGPT's image generation gets an upgrade

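During a livestream on Tuesday, OpenAI CEO Sam Altman announced the first major upgrade to ChatGPT's image-generation capabilities in more than a year. ChatGPT can now use the company's GPT-4o model to natively create and modify images and photos. GPT-4o has long underpinned the AI chatbot platform, but until now the model could only generate and edit text, not images.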

Altman said GPT-4o's native image generation is now live in ChatGPT and in Sora, OpenAI's AI video-generation product, for subscribers to the company's $200-per-month Pro plan. OpenAI said the feature will roll out soon to ChatGPT Plus and free users, as well as to developers using the company's API service.

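GPT-4o with image output "thinks" longer than DALL-E 3, the image-generation model it effectively replaces, which lets it produce what OpenAI describes as more accurate and detailed images. GPT-4o can also edit existing images, including images with people in them, by transforming them or "inpainting" details such as foreground and background objects.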

OpenAI has not disclosed what image data it used to build the new image-generation feature. Many generative AI vendors view training data as a competitive advantage and therefore keep it, and any information about it, closely guarded. But training-data details could also trigger intellectual-property litigation, another reason companies are reluctant to disclose much.

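OpenAI offers an opt-out form that lets creators request that their works be removed from its training datasets. The company also said it respects requests to bar its web-scraping bots from collecting training data, including images, from websites.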

ChatGPT's upgraded image-generation capabilities come on the heels of Google's experimental native image output for one of its flagship models, Gemini 2.0 Flash. That powerful feature has spread quickly on social media, though not always for the right reasons. Gemini 2.0 Flash's image component has few guardrails, allowing people to remove watermarks and create images depicting copyrighted characters.