• 0 Posts
  • 44 Comments
Joined 2 years ago
cake
Cake day: June 12th, 2023

help-circle
  • The language model isn’t teaching anything it is changing the wording of something and spitting it back out. And in some cases, not changing the wording at all, just spitting the information back out, without paying the copyright source.

    You could honestly say the same about most “teaching” that a student without a real comprehension of the subject does for another student. But ultimately, that’s beside the point. Because changing the wording, structure, and presentation is all that is necessary to avoid copyright violation. You cannot copyright the information. Only a specific expression of it.

    There’s no special exception for AI here. That’s how copyright works for you, me, the student, and the AI. And if you’re hoping that copyright is going to save you from the outcomes you’re worried about, it won’t.



  • If I understand correctly they are ruling you can by a book once, and redistribute the information to as many people you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime)

    A student can absolutely buy a text book and then teach the other students the information in it for free. That’s not redistribution. Redistribution would mean making copies of the book to hand out. That’s illegal for people and companies.


  • It seems like a lot of people misunderstand copyright so let’s be clear: the answer is yes. You can absolutely digitize your books. You can rip your movies and store them on a home server and run them through compression algorithms.

    Copyright exists to prevent others from redistributing your work so as long as you’re doing all of that for personal use, the copyright owner has no say over what you do with it.

    You even have some degree of latitude to create and distribute transformative works with a violation only occurring when you distribute something pretty damn close to a copy of the original. Some perfectly legal examples: create a word cloud of a book, analyze the tone of news article to help you trade stocks, produce an image containing the most prominent color in every frame of a movie, or create a search index of the words found on all websites on the internet.

    You can absolutely do the same kinds of things an AI does with a work as a human.



  • Tariffs are a net negative. Always. The things produced will not be competitive on the global market, if they were, we’d already be making them. The higher prices always destroy more jobs than they create. Retaliatory tariffs destroy even more jobs. The higher prices drive down demand and make the working class consumer poorer. Always.

    There’s no economic upside to tariffs, over any time horizon. They create a small number of jobs in a specific sector at a very expensive cost. Some politicians might decide that the enormous economic cost is worth it for other reasons, but a net positive they are not.










  • When you use (read, view, listen to…) copyrighted material you’re subject to the licensing rules, no matter if it’s free (as in beer) or not.

    You’ve got that backwards. Copyright protects the owner’s right to distribution. Reading, viewing, listening to a work is never copyright infringement. Which is to say that making it publicly available is the owner exercising their rights.

    This means that quoting more than what’s considered fair use is a violation of the license, for instance. In practice a human would not be able to quote exactly a 1000 words document just on the first read but “AI” can, thus infringing one of the licensing clauses.

    Only on very specific circumstances, with some particular coaxing, can you get an AI to do this with certain works that are widely quoted throughout its training data. There may be some very small scale copyright violations that occur here but it’s largely a technical hurdle that will be overcome before long (i.e. wholesale regurgitation isn’t an actual goal of AI technology).

    Some licensing on copyrighted material is also explicitly forbidding to use the full content by automated systems (once they were web crawlers for search engines)

    Again, copyright doesn’t govern how you’re allowed to view a work. robots.txt is not a legally enforceable license. At best, the website owner may be able to restrict access via computer access abuse laws, but not copyright. And it would be completely irrelevant to the question of whether or not AI can train on non-internet data sets like books, movies, etc.




  • a much stronger one would be to simply note all of the works with a Creative Commons “No Derivatives” license in the training data, since it is hard to argue that the model checkpoint isn’t derived from the training data.

    Not really. First of all, creative commons strictly loosens the copyright restrictions on a work. The strongest license is actually no explicit license i.e. “All Rights Reserved.” No derivatives is already included under full, default, copyright.

    Second, derivative has a pretty strict legal definition. It’s not enough to say that the derived work was created using a protected work, or even that the derived work couldn’t exist without the protected work. Some examples: create a word cloud of your favorite book, analyze the tone of news article to help you trade stocks, produce an image containing the most prominent color in every frame of a movie, or create a search index of the words found on all websites on the internet. All of that is absolutely allowed under even the strictest of copyright protections.

    Statistical analysis of copyrighted materials, as in training AI, easily clears that same bar.



  • They do, though. They purchase data sets from people with licenses, use open source data sets, and/or scrape publicly available data themselves. Worst case they could download pirated data sets, but that’s copyright infringement committed by the entity distributing the data without the legal authority.

    Beyond that, copyright doesn’t protect the work from being used to create something else, as long as you’re not distributing significant portions of it. Movie and book reviewers won that legal battle long ago.