The chatbot that millions of people have used to write term papers, computer code and fairy tales doesn’t just do words. ChatGPT, the artificial-intelligence-powered tool from OpenAI, can analyze images, too — describing what’s in them, answering questions about them and even recognizing specific people’s faces. The hope is that, eventually, someone could upload a picture of a broken-down car’s engine or a mysterious rash and ChatGPT could suggest the fix.
What OpenAI doesn’t want ChatGPT to become is a facial recognition machine.
For the last few months, Jonathan Mosen has been among a select group of people with access to an advanced version of the chatbot that can analyze images. On a recent trip, Mr. Mosen, an employment agency chief executive who is blind, used the visual analysis to determine which dispensers in a hotel room bathroom were shampoo, conditioner and shower gel. It went far beyond the performance of image analysis software he had used in the past.
“It told me the milliliter capacity of each bottle. It told me about the tiles in the shower,” Mr. Mosen said. “It described all of this in a way that a blind person needs to hear it. And with one picture, I had exactly the answers that I needed.”
For the first time, Mr. Mosen is able to “interrogate images,” he said. He gave an example: Text accompanying an image that he came across on social media described it as a “woman with blond hair looking happy.” When he asked ChatGPT to analyze the image, the chatbot said it was a woman in a dark blue shirt, taking a selfie in a full-length mirror. He could ask follow-up questions, like what kind of shoes she was wearing and what else was visible in the mirror’s reflection.
“It’s extraordinary,” said Mr. Mosen, 54, who lives in Wellington, New Zealand, and has demonstrated the technology on a podcast he hosts about “living blindfully.”
In March, when OpenAI announced GPT-4, the latest software model powering its A.I. chatbot, the company said it was “multimodal,” meaning it could respond to text and image prompts. While most users have been able to converse with the bot only in words, Mr. Mosen was given early access to the visual analysis by Be My Eyes, a start-up that typically connects blind users to sighted volunteers and provides accessible customer service to corporate customers. Be My Eyes teamed up with OpenAI this year to test the chatbot’s “sight” before the feature’s release to the general public.
Recently, the app stopped giving Mr. Mosen information about people’s faces, saying they had been obscured for privacy reasons. He was disappointed, feeling that he should have the same access to information as a sighted person.
The change reflected OpenAI’s concern that it had built something with a power it didn’t want to release.
The company’s technology can identify primarily public figures, such as people with a Wikipedia page, said Sandhini Agarwal, an OpenAI policy researcher, but does not work as comprehensively as tools built for finding faces on the internet, such as those from Clearview AI and PimEyes. The tool can recognize OpenAI’s chief executive, Sam Altman, in photos, Ms. Agarwal said, but not other people who work at the company.
Making such a feature publicly available would push the boundaries of what was generally considered acceptable practice by U.S. technology companies. It could also cause legal trouble in jurisdictions, such as Illinois and Europe, that require companies to get citizens’ consent to use their biometric information, including a faceprint.
Additionally, OpenAI worried that the tool would say things it shouldn’t about people’s faces, such as assessing their gender or emotional state. OpenAI is figuring out how to address these and other safety concerns before releasing the image analysis feature widely, Ms. Agarwal said.
“We very much want this to be a two-way conversation with the public,” she said. “If what we hear is like, ‘We actually don’t want any of it,’ that’s something we’re very on board with.”
Beyond the feedback from Be My Eyes users, the company’s nonprofit arm is also trying to come up with ways to get “democratic input” to help set rules for A.I. systems.
Ms. Agarwal said the development of visual analysis was not “unexpected,” because the model was trained by looking at images and text collected from the internet. She pointed out that celebrity facial recognition software already existed, such as a tool from Google. Google offers an opt-out for well-known people who don’t want to be recognized, and OpenAI is considering that approach.
Ms. Agarwal said OpenAI’s visual analysis could produce “hallucinations” similar to what had been seen with text prompts. “If you give it a picture of someone on the threshold of being famous, it might hallucinate a name,” she said. “Like if I give it a picture of a famous tech C.E.O., it might give me a different tech C.E.O.’s name.”
The tool once inaccurately described a remote control to Mr. Mosen, confidently telling him there were buttons on it that were not there, he said.
Microsoft, which has invested $10 billion in OpenAI, also has access to the visual analysis tool. Some users of Microsoft’s A.I.-powered Bing chatbot have seen the feature appear in a limited rollout; after uploading images to it, they have gotten a message informing them that “privacy blur hides faces from Bing chat.”
Sayash Kapoor, a computer scientist and doctoral candidate at Princeton University, used the tool to decode a captcha, a visual security check meant to be intelligible only to human eyes. Even while breaking the code and recognizing the two obscured words supplied, the chatbot noted that “captchas are designed to prevent automated bots like me from accessing certain websites or services.”
“A.I. is just blowing through all of the things that are supposed to separate humans from machines,” said Ethan Mollick, an associate professor who studies innovation and entrepreneurship at the University of Pennsylvania’s Wharton School.
Since the visual analysis tool suddenly appeared in Mr. Mollick’s version of Bing’s chatbot last month — making him, without any notification, one of the few people with early access — he hasn’t shut down his computer for fear of losing it. He gave it a photo of condiments in a refrigerator and asked Bing to suggest recipes for those ingredients. It came up with “whipped cream soda” and a “creamy jalapeño sauce.”
Both OpenAI and Microsoft seem aware of the power — and potential privacy implications — of this technology. A spokesman for Microsoft said that the company wasn’t “sharing technical details” about the face-blurring but was working “closely with our partners at OpenAI to uphold our shared commitment to the safe and responsible deployment of AI technologies.”