

Trauma responses are hard. I think it’s great you’re actively working on it and are conscious of your own biases, that’s huge. Good luck!
Relocated from: @fiat_lux@lemmy.world ⛓️💥(04-2026)


Melted like butter on piping hot toast.


This list is weird, aside from the length. They must be using a very greedy regexp for this many instances to have their names partially censored.
The text “buds” has been censored, all the instances using the TLD “university” have had “univer” removed, and the word “hangout” is also gone. “Shitpisscum” made it through, so it can’t just be about slightly naughty words. Also annihilation.social is listed 3 times for some reason.
Are these slurs in a culture I’m not familiar with? Does piefed do this everywhere?
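For what it's worth, the damage pattern described above is exactly what a naive substring blocklist would produce. A hypothetical reproduction — the word list and the domain names here are my guesses for illustration, not piefed's actual code or real instances:

```python
import re

# Guessed blocklist entries. Matching "univer" as a bare substring would
# also gut every domain on the "university" TLD, as observed.
banned = ["buds", "univer", "hangout"]
pattern = re.compile("|".join(map(re.escape, banned)), re.IGNORECASE)

print(pattern.sub("", "chat.university"))    # → "chat.sity"
print(pattern.sub("", "gardenbuds.social"))  # → "garden.social"
```

A filter built this way censors inside longer words because nothing anchors the match to word boundaries, which would explain the partially-censored names.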


I’m happy for both of you, this is adorable.
I’m assuming the red part detaches easily from the rest? I hope your bumper crop means this will never be necessary, but should you find yourself in a pinch, the search term “Sequin Horsehair Crinoline Tube” might help you in the future.
We can see that it’s solved by the fact that AI models continue to get better despite an increasing amount of AI-generated data being present in the world that training data is being drawn from.
Even if model improvement logically implied that model collapse is a solved problem, which it absolutely doesn't, the premise that models are improving to a significant degree is itself up for debate.
[Chart: Massive Multitask Language Understanding (MMLU) benchmark scores vs. time, 07-2023 to 01-2026]
A lot of people really want to believe that AI is going to just “go away” somehow, and this notion of model collapse is a convenient way to support that belief
Model collapse may, for some people, be an argument used to support a hope that AI will go away, but whether that hope is realistic has no bearing on the validity of the model collapse problem.
You can tell it’s not a solved problem because researchers are still trying to quantify the risk and severity of collapse - as you can see even just from the abstracts in the links I provided.
Some choice excerpts from the abstracts, for those who don’t want to click the links:
Our results show that even the smallest fraction of synthetic data (e.g., as little as 1% of the total training dataset) can still lead to model collapse
…we establish … that collapse can be avoided even as the fraction of real data vanishes. On the other hand, we prove that some assumptions … are indeed necessary: Without them, model collapse can occur arbitrarily quickly, even when the original data is still present in the training set.
It can’t only be from data from previous generations, even if the initial demonstration used that, because that would mean a single piece of human-generated text is sufficient to avoid collapse.
The loss of data from generation to generation is one way model collapse can occur, but it's only one way. The actual issues that cause collapse are replication of errors and increasing data homogeneity. In a world where an unknown quantity of new data is AI-generated, it is not possible to ensure only a certain quantity is used as future training data.
Additionally, as new human-generated content comes to be based on information provided by AI, even when AI isn't used in the construction of the text itself, the error-replication and data-diversity issues cross over from being only an AI-generated-content problem to an all-content problem. You can see examples of this happening now in the media, where a journalist relies on AI output to fact-check and the article containing the error then gets republished by other outlets.
Real AI training methods may stave off some model collapse, if we ignore existing issues around the cultural homogeneity of training data from across all time periods, or assume the models are sufficiently weighted to mitigate those issues, but it’s by no means settled that collapse is a non-problem.
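As a toy illustration of the homogeneity point (my own sketch, not the setup from any of the papers above): fit a simple distribution to some data, generate the next "generation" of data entirely from the fit, and repeat. Estimation noise compounds and the spread of the data tends to drift downward over generations:

```python
import random
import statistics

random.seed(1)

def run_generations(n_samples, n_generations):
    """Repeatedly fit a Gaussian, then resample the next generation
    entirely from the fitted model (no fresh real data at all)."""
    data = [random.gauss(0.0, 1.0) for _ in range(n_samples)]  # "human" data
    sigmas = []
    for _ in range(n_generations):
        mu = statistics.fmean(data)
        sigma = statistics.stdev(data)
        sigmas.append(sigma)
        data = [random.gauss(mu, sigma) for _ in range(n_samples)]
    return sigmas

sigmas = run_generations(n_samples=20, n_generations=200)
print(f"first-generation sigma: {sigmas[0]:.3f}")
print(f"last-generation sigma:  {sigmas[-1]:.3f}")
```

The fitted sigma performs a multiplicative random walk whose log has negative drift, so in a typical run the data's diversity shrinks over many generations — a crude analogue of the homogeneity problem, with none of the complications of real training.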
You’ve mentioned using data mixing to prevent collapse, but some of the research suggests that even iterative mixing isn’t sufficient, depending on the ratio of real to synthetic data. Strong Model Collapse (2024) by Dohmatob, Feng, Subramonian, and Kempe goes into that, and since then there’s been When Models Don’t Collapse: On the Consistency of Iterative MLE (2025) by Barzilai and Shamir, which presents one theoretical case where collapse won’t occur provided some assumptions hold, though the math is beyond me. They also note multiple situations where near-instant collapse can occur.
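To make the mixing point concrete, here's the same kind of toy sketch (again my own construction, not the cited papers' models): each generation refits on a blend of fresh real samples and the previous fit's synthetic output. The real-data anchor damps the drift, though as the papers show, any actual guarantee depends heavily on the real/synthetic ratio and on modeling assumptions:

```python
import random
import statistics

random.seed(2)

def run_mixed(n_samples, n_generations, real_fraction):
    """Each generation refits a Gaussian on a mix of fresh real data
    and samples drawn from the previously fitted model."""
    n_real = int(n_samples * real_fraction)
    mu, sigma = 0.0, 1.0  # start at the true distribution
    for _ in range(n_generations):
        real = [random.gauss(0.0, 1.0) for _ in range(n_real)]
        synthetic = [random.gauss(mu, sigma) for _ in range(n_samples - n_real)]
        mixed = real + synthetic
        mu = statistics.fmean(mixed)
        sigma = statistics.stdev(mixed)
    return sigma

print(f"all synthetic: sigma after 200 gens = {run_mixed(20, 200, 0.0):.3f}")
print(f"half real:     sigma after 200 gens = {run_mixed(20, 200, 0.5):.3f}")
```

In the half-real case the fit keeps getting pulled back toward the true distribution each round, whereas the all-synthetic case is free to wander — which is the intuition behind mixing, without saying anything about whether it suffices for real models.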
How much data poisoning might affect any of that is not at all clear; it would need to be present in sufficient quantity to affect a given model, but it certainly wouldn’t help. The recent Bixonimania scandal suggests it’s feasible.
“model collapse” was demonstrated by repeatedly training generation after generation of models on the output of previous generations
the best models these days are trained largely on synthetic data - data that’s been pre-processed by other AIs to turn it into stuff that makes for better training material
You can prevent model collapse simply by enriching the training data with good data - stuff that is already archived, that can’t be “contaminated."
This feels like an odd juxtaposition.
If model collapse can be avoided by enriching with uncontaminated data, and model collapse comes from using training data generated by previous generations, doesn’t that imply that:
Congratulations. There’s something about convincing a cat you’re a source of enjoyment that is ridiculously rewarding. You earned those purrs.
I hope he remembers that he enjoyed this experience so you can both keep enjoying future purring!
Panels 3 and 4 aren’t quite right.
The guy in Panel 3 didn’t just remix it, he cherry-picked the parts most likely to rank under either a short- or long-tail keyword strategy, depending on the size and business of his client or employer.
And that guy doesn’t have his paper taken away in Panel 4. He’s feeding the AI as many papers as he can, tailored for “Answer Engine Optimization” or “Generative Engine Optimization” (the industry hasn’t settled on a catchy name yet for what is largely the same thing, even if some claim they’re different).
The techniques have changed slightly but SEO has been a filthy game for much longer than AI. Google made sure of that with their auction house, “featured snippet” sections and backlink authority ranking systems.
I laughed when the song kicked in after the intro. Your description of the whole thing was completely accurate.
That drummer was definitely way too good, and that’s probably why he featured more heavily in the clip than I think I’ve ever seen a drummer feature. I hope he’s doing something fun now.
If you know you’re alone at home and then hear voices, that might be one way. There are ways to distinguish the presence of people beyond sight.
Blindness covers much more than total blindness, which describes only a minority of blind people. There are different definitions, but the World Health Organization defines it as visual acuity below 3/60, or a visual field of less than 10 degrees, in the better-seeing eye. That basically means that if you need to be more than 20 times closer to an object to see the same level of detail as a typical eye, or you have almost no peripheral vision, you qualify.
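The 20× figure falls straight out of the Snellen-style fraction; a quick sanity check on the arithmetic (the function name here is mine, purely for illustration):

```python
def times_closer(numerator, denominator):
    """Snellen-style fraction a/b: you see at distance a what a typical
    eye sees at distance b, so you must be b/a times closer."""
    return denominator / numerator

# WHO blindness threshold of 3/60: 60 / 3 = 20 times closer.
print(times_closer(3, 60))  # → 20.0
```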