Discontinuing Bibliogram - cadence's weblog (personal blog)

Before we start: If Bibliogram has been helpful to you, please consider making a donation! Donations help me pay for servers on my current and future projects, not to mention the time I put into writing code.

BGM: The Mayor's Lament

the short version

Instagram is really annoying, and I've given enough, and I don't want to deal with it anymore. Bibliogram will remain mostly broken unless somebody steps up to fix it. The main instance, bibliogram.art, will shut down, unless somebody wants to take it over. If you want to try fixing Bibliogram, you should read through the rest of this post for helpful tips about the current situation.

The origin

I started Bibliogram in early 2020 because Instagram was pissing me off. This may sound surprising, but I had never really used Instagram before starting Bibliogram, and I have never had an account, so I didn't personally care much about looking at Instagram posts. The story is that I encountered Instagram links sent by friends too frequently for my taste and wanted to make a workaround.

It was annoying to see a login wall in the browser when the server didn't try to stop you from accessing the data at all. So Bibliogram accessed the server and put the posts into a friendlier page layout and that was it.

A much-requested feature I added early on was RSS feeds. This ended up getting quickly turned off for the main instance, because RSS usage was dwarfing interactive usage. Many of these feeds had been added to people's readers and forgotten about. Even today I still receive a decent number of requests for these forgotten feeds, which haven't returned useful data for more than two years. Feed requests aren't free. For each one, Bibliogram needs to make an outgoing web request, wait for it, and convert the response data. This also uses up a piece of Bibliogram's rate limit to Instagram, even if nobody's there to see the feed that Bibliogram generates.

The rate limit

Instagram rate-limits access to its servers to stop people from doing the exact thing I'm doing. I'll try to document here the phases that I went through, but I might have forgotten some of those phases.

Don't panic: I am not documenting any of the currently working workarounds, only the past ones which are useless except for historical interest.

I'll be using rkrkrk as a sample username.

Before my time: rhx_gis

There was a parameter in profile pages called rhx_gis, and your application needed to remember this parameter so that it could use it in subsequent requests. If you used the wrong rhx_gis, you were locked out. Instagram used this in the past, but didn't require it when I started working on Bibliogram in January 2020.

January 2020: main profile page

After 100 or so requests to profile pages like instagram.com/rkrkrk they'll stop returning a useful response until you cool down. Timeline continuations weren't limited, but you could only access the timeline if you knew the internal user ID, and you could only get that ID from the profile page. So if you'd accessed a profile page in the past, you could store the ID and you only needed to access the timeline continuation from then on, which wasn't limited. Problem solved.
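The ID-caching trick can be sketched roughly like this. It's a minimal illustration, not Bibliogram's actual code: `knownIDs`, `planRequests` and `rememberID` are hypothetical names, and the real thing stored its mappings in a database rather than in memory.

```javascript
// Sketch only: these names are illustrative, not Bibliogram's real API.
const knownIDs = new Map(); // username -> internal user ID

// Decide which requests are needed to show a user's timeline.
// "profile" is the rate-limited request; "timeline" was not limited.
function planRequests(username) {
  if (knownIDs.has(username)) {
    return ["timeline"];
  }
  return ["profile", "timeline"];
}

// Call this after a successful profile page request.
function rememberID(username, id) {
  knownIDs.set(username, String(id));
}
```

The point is that the expensive request happens at most once per username. Every later visit only costs a timeline request.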

(Currently, the limit on requests to profile pages is way less than 100. It's more like 3. I don't remember when they lowered it.)

June 2020: profile page blocked for servers

You can now only access a profile page if you're in somebody's house in real life — so not if you're a server on the internet. This era was documented here, and it was the first time people saw the notice that an instance was blocked. This limitation means Bibliogram could only load profiles it already knew the user IDs of. I developed a few ways of working around this:

For finding user IDs, the assistants feature was added. Trusted people could run the assistant program at home, which would collect user IDs (and nothing else) on behalf of Bibliogram, and Bibliogram instances could share between each other all the user IDs they already knew about.

Similarly, there was the import script, which copied all user ID mappings from one instance to another. It would output numbers like Imported 492381 entries (37161 new, 138 overwritten, 455082 skipped) which means that 37,161 previously unknown users can now be looked up on that instance thanks to sharing IDs.
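Here's a rough sketch of how a merge like that could produce those three counters. The classification rules are my assumption about what the numbers mean (new = username not seen before, overwritten = seen with a different ID, skipped = already identical), and `importEntries` and `formatSummary` are hypothetical names, not the real import script.

```javascript
// Sketch: merge user ID mappings from another instance into a local Map.
// The rules for each counter are an assumption, not the real script.
function importEntries(local, incoming) {
  const counts = { total: 0, added: 0, overwritten: 0, skipped: 0 };
  for (const [username, id] of incoming) {
    counts.total++;
    if (!local.has(username)) {
      // Previously unknown user: can now be looked up locally.
      local.set(username, id);
      counts.added++;
    } else if (local.get(username) !== id) {
      // Known user but the ID differs: trust the incoming value.
      local.set(username, id);
      counts.overwritten++;
    } else {
      // Already have exactly this mapping.
      counts.skipped++;
    }
  }
  return counts;
}

function formatSummary(c) {
  return `Imported ${c.total} entries (${c.added} new, ${c.overwritten} overwritten, ${c.skipped} skipped)`;
}
```

Under these rules the three counters always add up to the total, which matches the sample output above (37161 + 138 + 455082 = 492381).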

Finally, there was a browser userscript people could install to let them access a specific user ID.

These bypass methods are all part of Bibliogram's code still, but they're not used any more because they're totally useless.

July 2020: /feed/ bypass

I'm Cool. I was messing around with google search and entered the query site:instagram.com just to see what would come up. Curiously, I found a URL like instagram.com/rkrkrk/feed/ which is just like a regular profile URL but with /feed/ on the end. I clicked it, but the page didn't load properly. I checked out the page source, and all the data needed for Bibliogram to work is in there. And then I decided to check it out on my server, and it wasn't blocked at all. In conclusion, Instagram's internal code is absolutely dogshit. You'll see more instances of their dogshit code as we continue.

I put /feed/ into the Bibliogram code and all is well. Total bypass.

December 2020: /feed/ blocked for some servers

Here's the update post for this one. This is INSTAGRAM_BLOCK_TYPE_DECEMBER in the code. /feed/ requests are now blocked for some servers - but not all. I moved Bibliogram to Iperweb, the first suggestion I got, and it works again. (Iperweb has poor value for money servers though, I wouldn't recommend you use that particular company.) Requests now work most of the time. My memory is a bit fuzzy on this one.

Late January 2021: graphql mostly blocked

Each graphql request has a different set of rules based on the matrix of whether you're at home, on a cloud server, or accessing via Tor, and which query_hash you're accessing. There are about 4 different behaviours, each fixed for a particular location-endpoint combination, but seemingly assigned at random. Why? Probably because its code is dogshit. Anyway, I route through Tor but only for the ones that work through Tor.
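In code, that kind of per-endpoint routing might look something like the sketch below. The hash names are placeholders (I'm not listing real query_hash values), and `ROUTES` and `chooseRoute` are hypothetical names. Which combinations actually worked had to be found by trial and error.

```javascript
// Sketch: pick a route per request type. All entries are placeholders.
const ROUTES = {
  // query_hash: routes observed to work, in order of preference
  "example_hash_a": ["direct", "tor"],
  "example_hash_b": ["tor"],
  "example_hash_c": [], // blocked everywhere that was tried
};

function chooseRoute(queryHash, torAvailable) {
  const working = ROUTES[queryHash] || [];
  for (const route of working) {
    // Skip Tor if there is no Tor connection to send the request through.
    if (route === "tor" && !torAvailable) continue;
    return route;
  }
  return null; // no known working route for this request
}
```

Keeping this as a lookup table rather than a pile of if statements means that when the behaviour of one combination changes, only one entry needs updating.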

After fixing this, I guess nothing really happened for a while? Bibliogram was in a state of mostly-working. It worked the best I could make it work while always scraping from logged-out resources. If you're logged into an account, it's another matter entirely. A whole world of endpoints opens up to you, especially ones used by the official app. However, if you make a wrong move, Instagram will not hesitate to shut down the account for supposedly suspicious activity, and creating an account also means agreeing to the Instagram terms of service, which I do not wish to do.

July 2022: overhaul

Instagram radically changed the way it internally arranges the data in its pages, requiring new ways to make requests and new ways to parse through it.

For the profile page, there are 4 different ways that the data might be provided, and your extractor has to handle all 4. It seems to switch which format is being used every few hours. If it's not the right time, then the extractor you're using will fail. Here are the formats:

iweb, pass in the username, get user object json. rate limit maybe 50 or 100 per hour?
instagram.com/rkrkrk/?__a=1 ajax after original page load, tiny tiny tiny rate limit.
instagram.com/rkrkrk and extract _sharedData, similarly tiny tiny tiny rate limit. You can try the /feed/ workaround, which used to give more requests, but this appears to finally be patched now. /feed/ might only work from specific classes of IP addresses.
instagram.com/rkrkrk or /feed/ and extract PolarisQueryPreloaderCache.

While it is still possible to write code to handle these methods and switch between them, some of them are rate limited too heavily to make Bibliogram viable at those times. Tor seems to be restricted further, though not completely.
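For anyone picking this up, switching between the methods could be structured roughly as below. This is a sketch under assumptions, not code from Bibliogram: `fetchUser` is a hypothetical name, each method's `run` function is a stub you'd have to write yourself, and real ones would be async network requests rather than plain function calls.

```javascript
// Sketch: try each extraction method in turn until one returns a user.
// The functions behind each method are hypothetical stubs.
function fetchUser(username, methods) {
  const failures = [];
  for (const method of methods) {
    try {
      const user = method.run(username);
      if (user) {
        return { user, via: method.name };
      }
      failures.push(`${method.name}: no data`);
    } catch (error) {
      // Rate limited, blocked, or this format is not active right now.
      failures.push(`${method.name}: ${error.message}`);
    }
  }
  throw new Error(`All methods failed for ${username} (${failures.join("; ")})`);
}
```

You'd pass it a list like `[{name: "iweb", run: ...}, {name: "_sharedData", run: ...}]`, ordered with the most generous rate limit first. Collecting the failure reasons makes it easier to tell "the format switched" apart from "we're blocked everywhere".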

Bots accessing Bibliogram

As early as mid-2020 I began to deal with a serious problem of poorly coded bots accessing Bibliogram and using up its rate limit. These bots were created without regard for whether their requests succeed, and they don't acknowledge my requests to slow down. They were designed specifically to scrape data from Bibliogram, and the owners were apparently too lazy to run their own instance of Bibliogram, or to contact me asking for help setting one up. The bots are a problem because they appear to be unmonitored, and they're using up rate limit that would be better spent helping real people. In August 2020 I blocked various proxy networks from accessing my site, then dealt with the really bad offenders on a case-by-case basis. Here's the list that's currently being used.

Later on, I'd create a system where Bibliogram dynamically applies its own rate-limiting system to anyone accessing it.

I think a mistake I made here was the faceless approach to blocking people who are being a problem. They treat being blocked as a puzzle to overcome, and Bibliogram as just another faceless website, rather than something being run by real people who want to help. In the future I'd approach this differently by giving bots a custom error message that appeals to the operators' humanity and asks them to contact me rather than just trying to work around the block. Kind of like the anti-piracy screen in Just Shapes And Beats.

Why is Bibliogram discontinued?

The simultaneous crackdown on /feed/ and Tor, combined with needing to write new code to scrape the page, is too much for me to bother with, especially when I am working on it in my spare time and have no personal interest or incentive.

What does discontinued mean?

You can't look at profiles. You can still look at individual posts, but if this breaks in the future, I probably won't fix it.

The main instance, bibliogram.art, will shut down unless somebody offers to take over running it.

Can it be revived?

Bibliogram is open source, and it is still excellent code to build on top of, since it has the interface design, the post models, and the structures set up to perform several workarounds.

All that is needed to make it work is a function in collectors.js or body.js to access the data in the new format.

More Instagram workarounds are definitely possible due to its code still being dogshit, but I don't have the energy to look for more workarounds myself.

Yes, Bibliogram can be revived. But I won't be the one to make it happen.

If you decide to take up the task, I have three simple but important requests.

Get in touch. Reach out to me via email or Matrix or something. Maybe even join the existing chatroom. Let me know that my efforts weren't in vain. I might even be able to help you work with my code. I'm a generous person and I would really love to see Bibliogram revived!
Please immediately edit the homepage and the readme to not mention me or my repos. I don't want people to assume that I'm still in charge of the project. Either put your own links in those places like this, or cut them out entirely.
I would suggest changing the name. If you're stuck for names, "Bibliograph" has been bouncing around in my head for a while, try that as a name. I'm okay with you keeping the name the same if you don't want to change it.

Shoutouts

Huge huge massive thanks to the volunteers that translated Bibliogram's interface to their language, and I'm sorry that I let down your hard work a bit by discontinuing Bibliogram. I've credited you all again here to show just how important you are!

Esmail, for the Arabic (العربية) translation
Plamen, for the Bulgarian (Български) translation
Philipp Beckers, for the German (Deutsch) translation
tagomago, for the Spanish (Español) translation
Mostafa, for the Persian (فارسي) translation and their patience helping me understand bidirectional text
bopol, for the French (français) translation
XoseM, for the Galician (Galego) translation
Musthafa, for the Indonesian (Bahasa Indonesia) translation
Saverio Morelli, for the Italian (Italiano) translation
learnpastsole, for the Malay (Bahasa Melayu) translation
sech1p, for the Polish (Polski) translation
tmpod, for the Portuguese (Português) translation
TotalCaesar, for the Russian (Русский) translation
Ahmet, for the Turkish (Türkçe) translation
Суспільне, for the Ukrainian (Українська) translation

Thanks to the Bibliogram chatroom for keeping me company and coming along on the ride with me.

Thanks to Austin Huang for creating a real alternative to the Instagram app. May Barinsta rest in peace.

For the people in #87806986 on /g/, you are fucking hilarious. Maybe the best way to cheer somebody up really is by showing them a clown.

What's next?

Something very exciting is coming next. Stay tuned for my next project: Announcing BreezeWiki.

See ya!

— Cadence