cross-posted from: https://lemmy.world/post/28011368

So I started by doing research and by research I mean watching two videos on YouTube about basic recommendation algorithms.

I did watch a 30-minute video of a Netflix software engineer talking about using machine learning, complex matrix methods, and bandit-style algorithms to recommend TV shows and movies. The basic conclusion is that all these complex techniques give about a 50% improvement over their baseline measurement, where baseline means traditional, pre-neural-network algorithms.

The way I interpret it is that the basics take you a long way, and the basics are just representing every PeerTube video as a vector and every watcher as a vector as well. The idea is that videos that are similar to each other make good recommendations: if a watcher watched and liked one of them, recommend the others, and if they didn’t like it, don’t recommend anything similar. Once the videos are vectorized, the watcher’s vector can be updated in a basic way: more watch time means it’s more of what they want, a like gives it a boost, and a comment could act as a boost multiplier.
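
To make that concrete, here is a minimal sketch of the idea, not a finished design. The three dimensions, the example videos, and the boost values in `engagement_weight` (0.5 for a like, 1.5x for a comment) are all made-up assumptions for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def engagement_weight(watch_fraction, liked=False, commented=False):
    """Hypothetical weighting: watch time is the base, a like adds, a comment multiplies."""
    weight = watch_fraction
    if liked:
        weight += 0.5   # assumed boost value
    if commented:
        weight *= 1.5   # assumed multiplier
    return weight

def update_watcher(watcher, video, weight):
    """Nudge the watcher vector toward (or, with a negative weight, away from) a video."""
    return [w + weight * v for w, v in zip(watcher, video)]

# Assumed dimensions: [cooking, sports, music]
videos = {
    "pasta": [1.0, 0.0, 0.0],
    "bread": [0.9, 0.0, 0.1],
    "football": [0.0, 1.0, 0.0],
}

watcher = [0.0, 0.0, 0.0]
# The watcher saw 90% of the pasta video and liked it.
watcher = update_watcher(watcher, videos["pasta"], engagement_weight(0.9, liked=True))

# Rank the videos they have not watched by similarity to their vector.
ranked = sorted(
    (name for name in videos if name != "pasta"),
    key=lambda name: cosine_similarity(watcher, videos[name]),
    reverse=True,
)
print(ranked)
```

A dislike could be handled by passing a negative weight to `update_watcher`, which pushes the watcher vector away from similar videos.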

I’d say the watcher’s vector can be stored locally while the video vectors are public. It will take a while to figure out a function/algorithm that adapts to the watcher. Does the watcher’s taste change? Do they like multiple things? Should the algorithm adapt fast or slow as new videos come in? How should it balance novelty and consistency? I don’t expect this problem to be solved anytime soon, but recommendation algorithms will simply evolve and split off, each with its own unique benefits and drawbacks.
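
One simple way to picture the fast-versus-slow question is an exponential moving average, where a single `rate` parameter controls how quickly the watcher vector follows new videos. This is only a sketch under that assumption; the two dimensions and the rates 0.1 and 0.6 are arbitrary choices for illustration.

```python
def adapt(watcher, video, rate):
    """Move the watcher vector toward the video vector.

    rate=0 never changes the watcher, rate=1 replaces it with the latest video.
    """
    return [(1 - rate) * w + rate * v for w, v in zip(watcher, video)]

# Assumed dimensions: [cooking, sports]
# The watcher sees five cooking videos, then two sports videos.
history = [[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 2

slow = [0.0, 0.0]
fast = [0.0, 0.0]
for video in history:
    slow = adapt(slow, video, rate=0.1)
    fast = adapt(fast, video, rate=0.6)

print("slow:", slow)  # still leans toward cooking
print("fast:", fast)  # has already swung toward sports
```

With the slow rate the long-running cooking habit still dominates after two sports videos, while the fast rate has already flipped. Which behaviour is right is exactly the open question, and different algorithms could pick different rates.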

The foundation to start with is a standard for the video vector. A video can be described both quantitatively and qualitatively. There are only a few measurable quantities, like video length and existing views. Qualitative attributes like “is it a cooking tutorial”, “is it a sports commentary”, or “is it a livestream VOD” are going to require that the vector be stored in a format that can adapt to an expanding number of dimensions, one for each quality a PeerTube video can have. The next issue is turning qualities into actual numbers: if something is sports related or sports adjacent, would a 1 mean yes, and would a 0 mean neutral/agnostic or no?
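
As a sketch of what an expandable format could look like, the vector can be stored sparsely as key/value pairs, so new dimensions can be added without touching old videos. The field names (`length_seconds`, `views`, `qualities`) and the convention of +1 for yes, -1 for no, and a missing key for neutral/unknown are assumptions made here for illustration, not a settled standard.

```python
import json

def to_dense(qualities, dimensions):
    """Expand a sparse quality mapping into a list; missing dimensions read as 0.0 (neutral)."""
    return [qualities.get(name, 0.0) for name in dimensions]

# Hypothetical video description: two measured quantities plus sparse qualities.
video = {
    "length_seconds": 754,
    "views": 1200,
    "qualities": {"cooking_tutorial": 1.0, "sports": -1.0},
}

# The description survives a JSON round trip unchanged.
encoded = json.dumps(video, sort_keys=True)
decoded = json.loads(encoded)

# A newer client knows about a dimension this video was never rated on.
dimensions = ["cooking_tutorial", "sports", "livestream_vod"]
dense = to_dense(decoded["qualities"], dimensions)
print(dense)
```

Using -1 for "no" keeps 0 free to mean "nobody has rated this yet", which is one possible answer to the yes/neutral/no question.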

The last and simplest issue would be communicating the algorithm that updates the watcher’s vector, since that can be done via updates from the PeerTube server or GitHub.

  • 🍪CRUMBGRABBER🍪@lemm.ee
    7 months ago

    We shouldn’t fear these algorithms; the key, though, is that everything has to be under the control of the user. A simple way is just tags, which are widely used almost everywhere. If the user has their own panel and can block tags or words, or give a rating from +10 to -10 for any keyword or tag, you can accomplish a hell of a lot with no black box fuckery.
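
    A rough sketch of how such a tag panel could score videos follows. The tag names, ratings, and the rule of summing ratings are assumptions for illustration; the only parts taken from the comment are the block list and the +10 to -10 range.

```python
def score_video(tags, ratings, blocked):
    """Return None if any tag is blocked, otherwise the sum of the user's tag ratings.

    Tags the user has not rated count as 0.
    """
    if any(tag in blocked for tag in tags):
        return None
    return sum(ratings.get(tag, 0) for tag in tags)

# Hypothetical user panel: ratings from -10 to +10, plus a block list.
ratings = {"cooking": 8, "drama": -6, "linux": 10}
blocked = {"gambling"}

videos = {
    "sourdough basics": ["cooking", "baking"],
    "distro drama": ["linux", "drama"],
    "casino stream": ["gambling", "livestream"],
}

scores = {title: score_video(tags, ratings, blocked) for title, tags in videos.items()}

# Blocked videos are dropped; the rest are sorted best first.
feed = sorted(
    (title for title, score in scores.items() if score is not None),
    key=lambda title: scores[title],
    reverse=True,
)
print(feed)
```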

    • Cattail@lemmy.worldOP
      7 months ago

      I like that idea for a stupid simple algorithm. Ironically, I plan for there to be a variety of algorithms, both user-only and aggregate. Really I’m trying to pin down a standardized video vector that can describe any video to any level of detail.

      • 🍪CRUMBGRABBER🍪@lemm.ee
        7 months ago

        Another thing I think is fine is “because you watched ____” and suggestions based on that. Like, we see that the user watched a whole video or liked it, and then we find videos similar to that. I don’t think it’s that invasive, and it may be helpful.

        As long as it’s transparent and the user has control over whether to use it, I think that’s the key. The fediverse, however, is enormously sensitive about privacy and corporate influence, so I can see that it is a tight line to walk here.

        • Cattail@lemmy.worldOP
          7 months ago

          Really I’m thinking about what data is okay to share and what data should be kept with the user. Basically I determined that the description of the video is the only thing that can be public, and that the people/bots describing it are okay to share (like associating their channel with a description they make of a specific video), and the watcher’s device can collect video metadata to find suggestions.

  • asudox@lemmy.asudox.dev
    7 months ago

    Looks cool. It’s pretty similar to how I thought of doing this as well. Do you have a prototype? You might be able to adjust the algorithm if you receive feedback from some PeerTube veterans.

    • Cattail@lemmy.worldOP
      7 months ago

      I haven’t made anything yet. I just wanted to articulate that a basic algorithm can be done ethically, where an instance, a watcher, or the fediverse in general can make a vector to define a video, which could be shared via ActivityPub, and the user can have a vector for themselves and even their own algorithm to sift through videos.

      I’m just starting, and right now I have to figure out how to format the video vector: do I want .json, .csv, or .xml?

      • asudox@lemmy.asudox.dev
        7 months ago

        Why would the video vectors be stored and calculated on the server though, let alone be federated? Let alone stored on the instance? These things can be calculated instantly on the device.

        • Cattail@lemmy.worldOP
          7 months ago

          It’d be better to store the video vector on an instance so that watchers can retrieve it; that’s just logistics. The video vector (element) can be calculated anywhere and just communicated to an instance; the idea is to be flexible. The ActivityPub protocol has made the decision easy: the video vector has to be a JSON element in the video’s JSON data.

          It would be better to store the results of a calculation to avoid repeated calculations. I’m looking into music classification, and the entire video could be sent off to be parsed to see if it’s music or not, the tempo, the genre. I’d assume that would be fairly costly to calculate, or the instance can send the video vector that states all that information.
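
          As a rough sketch of the "JSON element in the video's JSON data" idea: the property name `recommendationVector` below is purely hypothetical and not part of ActivityPub, ActivityStreams, or PeerTube, and the video object is trimmed to a few fields for illustration.

```python
import json

VECTOR_KEY = "recommendationVector"  # hypothetical property name, not part of any spec

def attach_vector(video_object, vector):
    """Return a copy of the video's JSON object with the precomputed vector embedded."""
    enriched = dict(video_object)
    enriched[VECTOR_KEY] = vector
    return enriched

def read_vector(video_object):
    """Read the embedded vector, or an empty mapping if the instance did not send one."""
    return video_object.get(VECTOR_KEY, {})

# Trimmed-down video object; real objects carry many more fields.
video_object = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Video",
    "name": "Lo-fi study mix",
}

# The instance computes the costly classification once and stores the result...
published = json.dumps(attach_vector(video_object, {"music": 1.0, "tempo_bpm": 82}))

# ...and every watcher just reads it back instead of re-analysing the video.
received = json.loads(published)
vector = read_vector(received)
print(vector)
```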

  • Twoafros@sh.itjust.works
    7 months ago

    I’m not technical enough to know what the details mean, but I’m excited by the idea of a simple algorithm for PeerTube.

  • manicdave@feddit.uk
    7 months ago

    What would these deltas look like? It’d be hard to anonymise and protect from abuse?

    Personally I’d be happy to share my likes and watch times, but I know some people worry about that.

    • Cattail@lemmy.worldOP
      7 months ago

      I’m not at the aggregating data stage yet, but you could just put a random ID on a data set.