Tubes and Bubbles

Topological confinement of recommendations on YouTube

Supplemental material to the article published in PLOS ONE by Camille Roth, Antoine Mazieres and Telmo Menezes

Made in Berlin at Centre Marc Bloch (CNRS / Humboldt) | Computational Social Science team


In order to anonymize the data, YouTube video and channel IDs have been mapped to integers that are consistent across all datasets. See the paper for a detailed description of the methodology.

  • We considered ~ 650 YouTube videos as seeds.
  • Half are related to the 2019 European Parliament election; the other half were picked based on their popularity on Reddit and Wikipedia. The file ./seeds.csv maps the video IDs to their sample origin.

  • We gathered YouTube's suggestions for these seeds, every 10 minutes, at least 2000 times (~ 15 days).
  • Each line of ./long_crawl.csv logs the video ID of the seed, followed by the ordered list of suggestions found. This data reveals a clear plateau of highly frequent suggestions.
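To look at the plateau in ./long_crawl.csv, one option is to compute, for each seed, the share of crawls in which each suggestion appears. The sketch below is only an illustration, not the code used for the paper. It assumes comma-separated lines without a header row, and the function name suggestion_frequencies is hypothetical.

```python
import csv
from collections import Counter, defaultdict


def suggestion_frequencies(rows):
    """Return {seed_id: {suggestion_id: share of crawls listing it}}.

    Assumption: each row is "seed,suggestion_1,suggestion_2,...",
    comma-separated, with no header row.
    """
    counts = defaultdict(Counter)  # seed -> suggestion -> number of crawls
    crawls = Counter()             # seed -> number of crawls seen
    for row in csv.reader(rows):
        if not row:
            continue  # skip blank lines
        seed = row[0]
        suggestions = {s for s in row[1:] if s}
        crawls[seed] += 1
        # A suggestion counts once per crawl, even if it is listed twice.
        counts[seed].update(suggestions)
    return {
        seed: {vid: n / crawls[seed] for vid, n in counter.items()}
        for seed, counter in counts.items()
    }


# Tiny made-up sample: seed 1 crawled twice, seed 2 crawled once.
sample = ["1,10,11,12", "1,10,12,13", "2,20,21"]
freqs = suggestion_frequencies(sample)
```

With the real file, the same function can be given an open file handle, for example open("long_crawl.csv", newline=""). Suggestions whose share is close to 1 for a seed are candidates for that seed's plateau.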

  • For each seed, we recursively crawled frequent suggestions up to depth 3.
  • Each line in ./recursice-crawls/{videoID}.csv logs the node's video ID, the video ID of a suggestion belonging to the plateau, and the depth at which it was suggested.
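The per-seed crawl files can be read as a directed suggestion graph. The following minimal sketch assumes three comma-separated columns per line (node, suggestion, depth) and no header row. The function name read_crawl and the choice to attach the depth to the suggested video are assumptions made for illustration.

```python
import csv
from collections import defaultdict


def read_crawl(rows):
    """Build a suggestion graph from one recursive-crawl file.

    Assumption: each row is "node_id,suggestion_id,depth".
    Returns (adjacency, reached), where adjacency maps a node to the set
    of its plateau suggestions and reached maps a depth to the set of
    videos suggested at that depth.
    """
    adjacency = defaultdict(set)
    reached = defaultdict(set)
    for row in csv.reader(rows):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        node, suggestion, depth_text = row
        try:
            depth = int(depth_text)
        except ValueError:
            continue  # skip a header row, if the file has one
        adjacency[node].add(suggestion)
        reached[depth].add(suggestion)
    return dict(adjacency), dict(reached)


# Made-up sample for a crawl starting at seed 1.
sample = ["1,10,1", "1,11,1", "10,11,2", "10,12,2", "12,1,3"]
adjacency, reached = read_crawl(sample)
```

Keeping the videos in sets makes it easy to check, for instance, how many videos suggested at depth 3 were already seen at shallower depths.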

  • Lastly, for each visited page, we gathered some metadata about the video.
  • Each line in ./video_metadata.csv logs: video ID, channel ID, nb. of subscribers, nb. of views, category ID (see ./categories_ids.csv), nb. of likes, nb. of dislikes, age of the video (in seconds).
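As an illustration of how the metadata could be loaded, the sketch below maps each line to a small record. It assumes the columns appear in the order listed above, are comma-separated, and hold integers. The class name VideoMetadata, its field names, and the function read_metadata are hypothetical.

```python
import csv
from dataclasses import dataclass


@dataclass
class VideoMetadata:
    # Field order follows the column order given in the description.
    video_id: int
    channel_id: int
    subscribers: int
    views: int
    category_id: int
    likes: int
    dislikes: int
    age_seconds: int

    @property
    def age_days(self) -> float:
        """Age of the video in days (86,400 seconds per day)."""
        return self.age_seconds / 86_400


def read_metadata(rows):
    """Return {video_id: VideoMetadata}, skipping rows that do not parse."""
    videos = {}
    for row in csv.reader(rows):
        try:
            video = VideoMetadata(*(int(value) for value in row))
        except (TypeError, ValueError):
            # Header row, missing value, or unexpected number of columns.
            continue
        videos[video.video_id] = video
    return videos


# Made-up sample: one header-like row and one valid row.
sample = [
    "video,channel,subs,views,cat,likes,dislikes,age",
    "42,7,1500,120000,25,300,12,172800",
]
videos = read_metadata(sample)
```

Rows that fail to parse are skipped silently here. If the real file contains missing values, it may be better to count or log those rows than to drop them.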