hiina 54a3eab511 fix typos and a few boogs, link git

2025-04-11 18:05:09 -06:00

title	toc
/vrg/ Archive	false

/vrg/ Archive

Welcome to the archive of Virtual Reality General threads from 4chan's /vg/ board.

While the b4k archive also has /vg/ archived, it's slow and is missing data from before ~August 2019. Thanks to an anon from Bibliotheca Anonoma, I got a copy of the text archive that goes all the way back to the first thread in ~2016. And thanks to the industrial revolution and its consequences, you can actually query the entire archive pretty efficiently in your browser.

I don't have thumbnails or images (yet), but I'm working on it. Until then, enjoy the data.

All queries run in your browser, which means you'll download a fair amount (~100MB) of data. So probably don't browse this on your phone.

Thread Browser

Browse old threads in a somewhat faithful format.

Substring Search

See how frequently certain substrings occur in the posts over time, a la Google's ngram thing.
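For a rough idea of what "substring frequency over time" means, here's a minimal plain-JavaScript sketch. It is not how this site does it (the real queries are SQL running in DuckDB), and the post shape `{ time, text }` and the function name `substringFrequencyByMonth` are made up for illustration.

```javascript
// Hypothetical post shape: { time: <Unix seconds>, text: <post body> }.
// These field names are assumptions, not the archive's actual schema.
function substringFrequencyByMonth(posts, needle) {
  const lowerNeedle = needle.toLowerCase();
  const buckets = new Map(); // "YYYY-MM" -> { total, matches }
  for (const post of posts) {
    // Bucket each post by its UTC year and month.
    const month = new Date(post.time * 1000).toISOString().slice(0, 7);
    const bucket = buckets.get(month) ?? { total: 0, matches: 0 };
    bucket.total += 1;
    // Case-insensitive substring match.
    if (post.text.toLowerCase().includes(lowerNeedle)) bucket.matches += 1;
    buckets.set(month, bucket);
  }
  // Sort by month and report the share of posts that matched.
  return [...buckets.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([month, { total, matches }]) => ({
      month,
      total,
      matches,
      share: matches / total,
    }));
}

// Made-up sample posts, not real archive data.
const samplePosts = [
  { time: 1546300800, text: "just got my Index" },      // 2019-01-01 UTC
  { time: 1546400000, text: "index controllers when" }, // 2019-01-02 UTC
  { time: 1548979200, text: "quest is fine" },          // 2019-02-01 UTC
];
console.log(substringFrequencyByMonth(samplePosts, "index"));
```

Dividing by the monthly post count matters because thread activity changes a lot over the years, so raw match counts alone would mostly track how busy the thread was.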

Full-text Search

Search posts with an inverted index. Freakishly fast.
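If you've never seen an inverted index, here's a toy version in plain JavaScript: a map from each word to the set of posts containing it, so a search only touches the posts that share words with the query. This is only a sketch of the general technique, not the site's actual index; the post shape `{ id, text }` and the names `buildInvertedIndex` and `search` are hypothetical.

```javascript
// Build a toy inverted index: token -> Set of post ids.
// The post shape { id, text } is an assumption for illustration.
function buildInvertedIndex(posts) {
  const index = new Map();
  for (const { id, text } of posts) {
    // Lowercase and split into runs of ASCII letters and digits.
    for (const token of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(id);
    }
  }
  return index;
}

// Return ids of posts that contain every query token (AND semantics).
function search(index, query) {
  const tokens = query.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  if (tokens.length === 0) return [];
  const postings = tokens.map((t) => index.get(t) ?? new Set());
  // Start from the smallest posting set so the intersection stays cheap.
  postings.sort((a, b) => a.size - b.size);
  const [smallest, ...rest] = postings;
  return [...smallest].filter((id) => rest.every((set) => set.has(id)));
}

// Made-up sample posts, not real archive data.
const index = buildInvertedIndex([
  { id: 1, text: "Index controllers when" },
  { id: 2, text: "my index arrived" },
  { id: 3, text: "quest controllers drift" },
]);
console.log(search(index, "index controllers")); // [ 1 ]
```

The speed comes from never scanning post text at query time: each lookup is a map access plus a set intersection.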

const archive_href = FileAttachment("data/vrgarchive.parquet").href;

FAQ

wtf is this

It's an archive of /vrg/ with a bunch of JavaScript so you can query it efficiently in your browser.

Where did you get the archive data?

Data after August 2019 is scraped from b4k, and the older data is from an anon on the Bibliotheca Anonoma matrix channel who happened to have a private archiver. Thanks anon for uploading it for me.

Where are the images?

Don't have them yet, but I'm working on it. 2 more weeks.

Can I get the raw data?

Yeah, download the ${html<a href=${archive_href}>vrgarchive.parquet</a>} file. It's not quite as raw as what comes out of the 4chan API, but it's easier to query and it's only ~90MB or so.

How is the data so small?

The data compresses extremely well with ZSTD and Parquet. The uncompressed data is ~1.5GB, but I guess after all these years we've only posted ~80MB of insightful, original text.

How does it work?

This site uses Observable Framework, which ships a DuckDB wasm build that queries the archive as Parquet files. It's kind of horrifying, yeah, but also cool.

https://git.vrg.party/hiina/vrg-archive has the source if you want to stare at the SQL.

I don't have any scraper code uploaded yet, but full disclosure: it's (almost) all one-shot Python slop by Gemini 2.5 Pro, so you might as well ask "I want to scrape a fuuka-based archiver for a single thread" and have it slop it out for you yourself.

Can you add X feature?

Maybe, post in the thread about it. If you don't want to wait, you can also just download the raw data and query it yourself with DuckDB or whatever.

archives are bad

Yeah, I'm kind of ambivalent, but I have autism for data visualization and awful javascript frameworks, so I did it anyway.

How can I contact you?

Post in the thread, I'll see it.
