{"id":949,"date":"2025-08-17T14:12:00","date_gmt":"2025-08-17T12:12:00","guid":{"rendered":"http:\/\/nis2.management\/?p=949"},"modified":"2025-08-18T11:24:41","modified_gmt":"2025-08-18T09:24:41","slug":"when-privacy-becomes-training-data","status":"publish","type":"post","link":"https:\/\/nis2.management\/en\/2025\/08\/17\/when-privacy-becomes-training-data\/","title":{"rendered":"When privacy becomes training data"},"content":{"rendered":"<p>Researchers found millions of images of passports, credit cards, r\u00e9sum\u00e9s, and faces in DataComp CommonPool, a massive AI training dataset scraped from the web.<\/p>\n\n\n\n<p>After auditing just 0.1% of the data, they estimate that the full dataset contains hundreds of millions of images likely containing personally identifiable information (PII), including sensitive employment and health details.<\/p>\n\n\n\n<p>Automated face blurring missed an estimated 102 million faces, and captions and metadata still expose names, addresses, and locations.<\/p>\n\n\n\n<p>The dataset has been downloaded more than 2 million times, so many AI models may already have been trained on this data, raising privacy and consent concerns.<\/p>\n\n\n\n<p>Experts warn that if something is online, it has probably been scraped, which underscores the urgent need for new laws and ethical standards for AI data use.<\/p>\n\n\n\n<p>Read <a href=\"https:\/\/www.technologyreview.com\/2025\/07\/18\/1120466\/a-major-ai-training-data-set-contains-millions-of-examples-of-personal-data\/\">the MIT Technology Review article<\/a> for more information.<\/p>","protected":false},"excerpt":{"rendered":"<p>Researchers found millions of passports, credit cards, r\u00e9sum\u00e9s, and faces in DataComp CommonPool, a massive AI training dataset scraped from the web. 
Auditing just 0.1% revealed hundreds of millions of<\/p>","protected":false},"author":2,"featured_media":950,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[92,102,99,104,101,100,103,91,98],"class_list":["post-949","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bcm","tag-ai","tag-blurring","tag-consent","tag-ethics","tag-health","tag-model","tag-pii","tag-privacy","tag-training"],"_links":{"self":[{"href":"https:\/\/nis2.management\/en\/wp-json\/wp\/v2\/posts\/949","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nis2.management\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nis2.management\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nis2.management\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/nis2.management\/en\/wp-json\/wp\/v2\/comments?post=949"}],"version-history":[{"count":1,"href":"https:\/\/nis2.management\/en\/wp-json\/wp\/v2\/posts\/949\/revisions"}],"predecessor-version":[{"id":951,"href":"https:\/\/nis2.management\/en\/wp-json\/wp\/v2\/posts\/949\/revisions\/951"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/nis2.management\/en\/wp-json\/wp\/v2\/media\/950"}],"wp:attachment":[{"href":"https:\/\/nis2.management\/en\/wp-json\/wp\/v2\/media?parent=949"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nis2.management\/en\/wp-json\/wp\/v2\/categories?post=949"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nis2.management\/en\/wp-json\/wp\/v2\/tags?post=949"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}