@VoterFrog

VoterFrog@lemmy.world · 16 days ago

The language model isn’t teaching anything; it is changing the wording of something and spitting it back out. And in some cases, not changing the wording at all, just spitting the information back out, without paying the copyright source.

You could honestly say the same about most “teaching” that a student without a real comprehension of the subject does for another student. But ultimately, that’s beside the point. Because changing the wording, structure, and presentation is all that is necessary to avoid copyright violation. You cannot copyright the information. Only a specific expression of it.

There’s no special exception for AI here. That’s how copyright works for you, me, the student, and the AI. And if you’re hoping that copyright is going to save you from the outcomes you’re worried about, it won’t.

VoterFrog@lemmy.world · 17 days ago

Makes sense to me. Search indices tend to store large amounts of copyrighted material yet they don’t violate copyright. What matters is whether or not you’re redistributing illegal copies of the material.

VoterFrog@lemmy.world · 17 days ago

If I understand correctly they are ruling you can buy a book once, and redistribute the information to as many people as you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime)

A student can absolutely buy a textbook and then teach the other students the information in it for free. That’s not redistribution. Redistribution would mean making copies of the book to hand out. That’s illegal for people and companies.

VoterFrog@lemmy.world · edit-2 17 days ago

It seems like a lot of people misunderstand copyright so let’s be clear: the answer is yes. You can absolutely digitize your books. You can rip your movies and store them on a home server and run them through compression algorithms.

Copyright exists to prevent others from redistributing your work, so as long as you’re doing all of that for personal use, the copyright owner has no say over what you do with it.

You even have some degree of latitude to create and distribute transformative works, with a violation only occurring when you distribute something pretty damn close to a copy of the original. Some perfectly legal examples: create a word cloud of a book, analyze the tone of news articles to help you trade stocks, produce an image containing the most prominent color in every frame of a movie, or create a search index of the words found on all websites on the internet.

As a human, you can absolutely do the same kinds of things with a work that an AI does.
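To make the word cloud example concrete, here is a minimal sketch of the counting step behind one. The function name `word_frequencies` and the sample sentence are made up for illustration. The point is that the output is a table of counts, not a copy of the text.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=5):
    """Return the top_n most common words in text, ignoring case."""
    # Lowercase the text and pull out runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

# Hypothetical stand-in for the contents of a book.
sample = "the cat sat on the mat and the dog sat too"
print(word_frequencies(sample, top_n=2))  # [('the', 3), ('sat', 2)]
```

You can't reconstruct the original sentence from those counts, which is roughly why this kind of analysis isn't treated as distributing a copy.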

VoterFrog@lemmy.world · 5 months ago

Wikipedia has a whole list of citations on this very sentence lol.

There is near unanimous consensus among economists that tariffs are self-defeating and have a negative effect on economic growth and economic welfare

https://en.m.wikipedia.org/wiki/Tariff

VoterFrog@lemmy.world · 5 months ago

Tariffs are a net negative. Always. The things produced will not be competitive on the global market; if they were, we’d already be making them. The higher prices always destroy more jobs than they create. Retaliatory tariffs destroy even more jobs. The higher prices drive down demand and make the working class consumer poorer. Always.

There’s no economic upside to tariffs, over any time horizon. They create a small number of jobs in a specific sector at a very expensive cost. Some politicians might decide that the enormous economic cost is worth it for other reasons, but a net positive they are not.

VoterFrog@lemmy.world · 6 months ago

I mean, Agile doesn’t really demand that you do or don’t use tickets. You can definitely use tickets without scrum.

VoterFrog@lemmy.world · 7 months ago

ITT: A bunch of people who have never heard of information theory suddenly have very strong feelings about it.

VoterFrog@lemmy.world · 7 months ago

Models are not improving? Since when? Last week? Newer models have been scoring higher and higher in both objective and subjective blind tests consistently. This sounds like the kind of delusional anti-AI shit that the OP was talking about. I mean, holy shit, to try to pass off “models aren’t improving” with a straight face.

VoterFrog@lemmy.world · 7 months ago

Seconded, though I would advise getting the DLC after completing the main game.

VoterFrog@lemmy.world · 8 months ago

Is enshittification the scummiest thing you can think of? While other multinationals are paying for goon squads that kill people in other countries? While banks reorder daily transactions from largest to smallest so they can charge more overdraft fees, literally stealing from poor people? Even if enshittification is literally your biggest problem, you’d have to be living under a rock to think Google’s products are the most enshittified of all the garbage out there. You’ve never heard of anything from Meta? Amazon? Netflix? Microsoft?
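For anyone who hasn't seen how the transaction reordering trick works, here is a rough sketch of the arithmetic. The starting balance and the charges are made-up numbers, and `count_overdraft_fees` is a hypothetical helper that assumes one fee for every charge that posts while the balance is below zero. Real fee rules vary by bank.

```python
def count_overdraft_fees(balance, charges):
    """Post charges in the given order; count charges that leave the balance negative."""
    fees = 0
    for amount in charges:
        balance -= amount
        if balance < 0:
            fees += 1  # assumed rule: one fee per charge posted while overdrawn
    return fees

# Hypothetical day: $100 in the account, four purchases in the order they happened.
start = 100
charges = [10, 20, 30, 100]

chronological = count_overdraft_fees(start, charges)
largest_first = count_overdraft_fees(start, sorted(charges, reverse=True))

print(chronological)  # 1
print(largest_first)  # 3
```

Same purchases, same total, but posting the big one first drains the account early so every small charge after it triggers its own fee.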

VoterFrog@lemmy.world · 8 months ago

I don’t know man. There are a lot shittier business practices out there than paying to be the default search engine - which is laughably easy to change on any browser. Like marketplaces and services that pay to be exclusive sources of content and then use the fact that they’re the only source for most content to force extortionate deals on content creators and enshittify every aspect of the end user experience. Just to name one.

VoterFrog@lemmy.world · 9 months ago

This is not the same thing at all. Trump instituted a zero tolerance policy, separating any family caught crossing illegally with the stated intent to dissuade families from making the trip.

Normally (including under Biden) the government separates children from suspected human traffickers or members of gangs that engage in trafficking. This is not to deter families. It’s to protect children - sending a child back to Mexico with a human trafficker is an abhorrent thing to do.

Stop carrying water for Trump.

VoterFrog@lemmy.world · 10 months ago

No mention of Gemini in their blog post on SGE. And their AI principles doc says:

We acknowledge that large language models (LLMs) like those that power generative AI in Search have the potential to generate responses that seem to reflect opinions or emotions, since they have been trained on language that people use to reflect the human experience. We intentionally trained the models that power SGE to refrain from reflecting a persona. It is not designed to respond in the first person, for example, and we fine-tuned the model to provide objective, neutral responses that are corroborated with web results.

So a custom model.

VoterFrog@lemmy.world · edit-2 10 months ago

When you use (read, view, listen to…) copyrighted material you’re subject to the licensing rules, no matter if it’s free (as in beer) or not.

You’ve got that backwards. Copyright protects the owner’s right to distribution. Reading, viewing, listening to a work is never copyright infringement. Which is to say that making it publicly available is the owner exercising their rights.

This means that quoting more than what’s considered fair use is a violation of the license, for instance. In practice a human would not be able to quote a 1,000-word document exactly after just one read, but “AI” can, thus infringing one of the licensing clauses.

Only in very specific circumstances, with some particular coaxing, can you get an AI to do this with certain works that are widely quoted throughout its training data. There may be some very small scale copyright violations that occur here but it’s largely a technical hurdle that will be overcome before long (i.e. wholesale regurgitation isn’t an actual goal of AI technology).

Some licensing on copyrighted material also explicitly forbids use of the full content by automated systems (once they were web crawlers for search engines)

Again, copyright doesn’t govern how you’re allowed to view a work. robots.txt is not a legally enforceable license. At best, the website owner may be able to restrict access via computer access abuse laws, but not copyright. And it would be completely irrelevant to the question of whether or not AI can train on non-internet data sets like books, movies, etc.
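To illustrate what robots.txt actually is on the technical side, here is a small sketch using Python’s standard `urllib.robotparser`. The rules, the `ExampleBot` user agent, and the example.com URLs are all made up for illustration. Note that the check happens inside the crawler’s own code: the file is a request that a well-behaved bot chooses to honor, and nothing in the protocol stops a bot from skipping the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, supplied inline instead of fetched.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# The crawler asks itself whether it should fetch each URL.
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/page.html")
allowed = parser.can_fetch("ExampleBot", "https://example.com/public/page.html")

print(blocked)  # False
print(allowed)  # True
```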

VoterFrog@lemmy.world · 10 months ago

It wasn’t Gemini, but the AI generated suggestions added to the top of Google search. But that AI was specifically trained to regurgitate and reference directly from websites, in an effort to minimize the number of hallucinated answers.

VoterFrog@lemmy.world · 10 months ago

Point is that accessing a website with an adblocker has never been considered a copyright violation.

VoterFrog@lemmy.world · edit-2 10 months ago

a much stronger one would be to simply note all of the works with a Creative Commons “No Derivatives” license in the training data, since it is hard to argue that the model checkpoint isn’t derived from the training data.

Not really. First of all, Creative Commons strictly loosens the copyright restrictions on a work. The strongest license is actually no explicit license, i.e. “All Rights Reserved.” No derivatives is already included under full, default copyright.

Second, derivative has a pretty strict legal definition. It’s not enough to say that the derived work was created using a protected work, or even that the derived work couldn’t exist without the protected work. Some examples: create a word cloud of your favorite book, analyze the tone of news articles to help you trade stocks, produce an image containing the most prominent color in every frame of a movie, or create a search index of the words found on all websites on the internet. All of that is absolutely allowed under even the strictest of copyright protections.

Statistical analysis of copyrighted materials, as in training AI, easily clears that same bar.
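As a rough picture of the search index example, here is a toy inverted index. The `build_index` helper, the page URLs, and the page text are all hypothetical. The index is built by reading every page, but what it stores is a mapping from words to locations rather than the pages themselves.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

# Hypothetical crawled pages.
pages = {
    "a.example/1": "tariffs raise prices",
    "b.example/2": "copyright protects expression",
    "c.example/3": "prices and copyright",
}

index = build_index(pages)
print(sorted(index["copyright"]))  # ['b.example/2', 'c.example/3']
```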

VoterFrog@lemmy.world · 10 months ago

We’re not just doing this for the money.

We’re doing it for a shitload of money!

VoterFrog@lemmy.world · edit-2 10 months ago

They do, though. They purchase data sets from people with licenses, use open source data sets, and/or scrape publicly available data themselves. Worst case they could download pirated data sets, but that’s copyright infringement committed by the entity distributing the data without the legal authority.

Beyond that, copyright doesn’t protect the work from being used to create something else, as long as you’re not distributing significant portions of it. Movie and book reviewers won that legal battle long ago.