OpenAI says it’s “impossible” to create useful AI models without copyrighted material

sculd@beehaw.org · 10 months ago


Critical_Insight@feddit.uk · 10 months ago

There’s not a musician who hasn’t heard other songs before. Not a painter who hasn’t seen other paintings. No comedian who hasn’t heard jokes. No writer who hasn’t read books.

AI haters are not applying the same standards to humans that they do to generative AI. Obviously this is not to say that AI can’t plagiarize. If it’s spitting out sentences that are direct quotes from an article someone wrote before and doesn’t disclose the source then yeah that is an issue. There is, however, a limit beyond which the output differs enough from the input that you can’t claim it’s stealing, even if it perfectly mimics someone else’s style.

Just because DALL-E creates pictures that have a Getty Images watermark on them doesn’t mean the picture itself is a direct copy from their database. If anything, it’s the use of the logo that’s the issue, not the picture.

sculd@beehaw.org · 10 months ago

I said this in another thread, but I will repeat it here: AIs are not humans. Their creative and learning processes are also different.

AIs are being used to make profit for executives while creators suffer.

Critical_Insight@feddit.uk · 10 months ago

That sucks for the creators, of course, but if AI creates better content, that’s where people will go. That’s a big if, though, especially in the near future.

sour@kbin.social · 10 months ago

better

BraveSirZaphod@kbin.social · 10 months ago

AI haters are not applying the same standards to humans that they do to generative AI

I don’t think it should go unquestioned that the same standards should apply. No human is able to look at billions of creative works and then create a million new works in an hour. There’s a meaningfully different level of scale here, and so it’s not necessarily obvious that the same standards should apply.

If it’s spitting out sentences that are direct quotes from an article someone wrote before and doesn’t disclose the source then yeah that is an issue.

A fundamental issue is that LLMs simply cannot do this. They can query a webpage, find a relevant chunk, and spit that back at you with a citation, but it is simply impossible for them to generate a response to a query, realize that they’ve generated a meaningful amount of copyrighted material, and disclose its source, because they literally do not know the source. This is not a fixable issue unless the fundamental approach to these models changes.