How ChatGPT learns about the world while protecting privacy
A plain-language guide to model training, privacy safeguards, and the privacy choices available in ChatGPT.

ChatGPT is becoming more capable across domains, helping people with complex, real-world work like coding, research, analysis, and multi-step tasks across tools. Those gains in capability are driven by training on a wide variety of data that helps our models build broad knowledge of the world and apply it to new tasks.

As OpenAI continues to develop frontier models, we work hard to help ensure that our model training process respects privacy. We have developed state-of-the-art technologies that help our models learn useful general patterns rather than private information about individuals, and we offer a number of user controls and policies that help keep individuals in control of their data. This post explains what information may be used in model training, how we reduce the processing of personal information in that process, and how users can control whether their ChatGPT conversations help improve our models.

To develop the models that power ChatGPT, we use a mix of information sources, including publicly available information, information we access through partnerships, and information provided or generated by users, contractors, and researchers. This data helps models build general knowledge and respond more reliably and safely.

For publicly available internet content, we use only information that is freely and openly accessible. For example, if you participate in a public online discussion forum, or publish a blog or other public post, we may use that publicly accessible content for model training. Before information is used in training, we apply safeguards designed to reduce personal information in our datasets.
One of those safeguards is OpenAI Privacy Filter, which identifies and masks personal information in text. In our evaluations, Privacy Filter is more effective at removing personal information than any other tool of its kind. We use an internal version of Privacy Filter at multiple stages in the training process, including on public datasets that we use for training and on user conversations when "Improve the model for everyone" is enabled. We have also made Privacy Filter available to other developers for free, to help the broader industry protect privacy in their workflows.

Users can choose whether their conversations with ChatGPT help train future models. Go to Settings, then Data Controls, and turn off "Improve the model for everyone." Once this setting is off, new conversations still appear in chat history but are not used to train ChatGPT.

Temporary Chat offers another option. To start one, open a new chat and click the "Temporary" button in the top-right corner of the page. Temporary Chats do not appear in chat history, do not create memories, and are not used to improve our…
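To make the masking idea concrete: a tool like Privacy Filter detects spans of personal information and replaces them with placeholders before text enters a training dataset. The toy sketch below is not OpenAI's Privacy Filter (the real system is far more sophisticated than pattern matching); it simply illustrates the detect-and-mask concept with a few assumed regular expressions.

```python
import re

# Illustrative only: a handful of regex patterns standing in for the
# much richer detection a production privacy filter would perform.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_personal_info(text: str) -> str:
    """Replace each detected personal-information span with a type placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

sample = "Contact Jane at jane.doe@example.com or 555-123-4567."
print(mask_personal_info(sample))  # Contact Jane at [EMAIL] or [PHONE].
```

The key property is that masking happens before any downstream use, so the training pipeline only ever sees placeholders like `[EMAIL]` rather than the underlying personal details.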

