With the support of an entire group, a hacker was able to download nearly 57 TB of content posted on Parler, the American hard-right social network, before it was taken offline. The database contains videos, photos and geolocation data. It could be of great help to the various ongoing investigations into the assault on the Capitol, which many pro-Trump activists documented on the app.
Parler has been offline since Sunday, January 10, after its host, Amazon Web Services, decided to terminate their contract. But the content of the American hard-right social network can still be found online, and for good reason: a Twitter user, @donk_enby, archived 56.7 terabytes of content published on the platform.
According to Vice, this gigantic dataset contains 412 million files, including 150 million photos and over 1 million videos. That represents 96 to 99% of everything posted on the hard-right social network, according to various sources.
To pull off this titanic backup job, the hacker was able to count on the indispensable support of the Archive Team, a group of hackers and researchers whose goal is to save, on a voluntary basis, the data of dying sites. She had started collecting on January 6, but had to pick up the pace considerably and begin a real race against the clock when Amazon announced its intention to unplug Parler.
She announced her project publicly, and that is when the Archive Team offered its help. The collective covered the cost of storage, and even created a tool that let any Twitter user contribute their own bandwidth to the download, says Vice. A few hours after the tool was deployed, it allowed a download speed of 50 GB per second.
Metadata in public files, a gold mine
This database contains only public content, which was accessible to any user of the platform. Users' passwords, private chats and other confidential information have not been compromised, or at least not in this way. On the other hand, each photo and video comes with its metadata, since Parler did not strip it from the files, unlike the main social networks. Concretely, when you take a photo with your smartphone, your device attaches contextual data to the image file: for example, the date and time at which the photo was taken, as well as your geolocation at that moment.
In other words, by analyzing the metadata of the photos and videos published on Parler, anyone can trace the movements of the platform's users or identify their gatherings, hour by hour. It is therefore a gold mine for researchers, investigators and journalists interested in the assault by Trump supporters on the Capitol, as the American outlet Gizmodo has already demonstrated in an article. It must be said that many Parler users involved in the attack filmed and photographed the events live, a careless practice that made it possible to identify many of the participants in the assault.
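To make the metadata point concrete, here is a minimal sketch in Python of how a location stored in a photo's metadata can be turned into map coordinates. It assumes the degrees/minutes/seconds layout and tag names used by the EXIF standard; the function name `dms_to_decimal` and the sample record are invented for illustration and do not come from the Parler archive.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return -value if ref in ("S", "W") else value


# Hypothetical metadata record, shaped like the EXIF fields a phone attaches.
photo_metadata = {
    "DateTimeOriginal": "2021:01:01 12:00:00",
    "GPSLatitude": (40, 30, 0.0),
    "GPSLatitudeRef": "N",
    "GPSLongitude": (74, 15, 0.0),
    "GPSLongitudeRef": "W",
}

lat = dms_to_decimal(*photo_metadata["GPSLatitude"], photo_metadata["GPSLatitudeRef"])
lon = dms_to_decimal(*photo_metadata["GPSLongitude"], photo_metadata["GPSLongitudeRef"])
print(photo_metadata["DateTimeOriginal"], lat, lon)
```

A platform that strips metadata on upload removes these fields before the file is published, which is why the same analysis is not possible on most large social networks.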
Without basic protection, Parler was easy to copy
To copy Parler's content, Vice points out, @donk_enby used only a “jailbroken” iPad (one on which some of the safeguards put in place by Apple have been removed) and a reverse-engineering tool called Ghidra. This equipment is inexpensive and easy to obtain.
The hacker took advantage of an “absurdly basic” bug in Parler's architecture, as Wired put it. The social network's handling of the URLs (in other words, the addresses) of its posts was catastrophic, and made it easy to “scrape” all of its content.
Take the URL of a Twitter message: it is built in the form twitter.com/username/type of message/a long string of random numbers. On Parler, the URL contained only the numeric component. Worse, the number sequence was not random, but sequential. Concretely, if a post from Karen sent at 9:21 p.m. contained the number 23134 in its URL, the photo from Chad sent 10 seconds later had the number 23135 in its URL. The URLs of the posts therefore followed one another in chronological order.
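The difference between the two schemes can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of how either platform actually generates identifiers: `next_sequential_id` and `new_random_id` are hypothetical helpers, and the random variant simply uses Python's standard `secrets` module.

```python
import secrets


def next_sequential_id(current_id):
    """Sequential scheme: the next post's number is trivially predictable."""
    return current_id + 1


def new_random_id():
    """Random scheme: 128 bits from a cryptographic source, as 32 hex characters."""
    return secrets.token_hex(16)


# With sequential numbering, knowing Karen's post reveals Chad's.
karen_post = 23134
chad_post = next_sequential_id(karen_post)
print(chad_post)

# With random identifiers, one value says nothing about the next.
print(new_random_id())
print(new_random_id())
```

With the sequential scheme, a single known URL is enough to derive every other one; with random identifiers, an outsider would have to guess among an astronomically large number of possibilities.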
Parler did not detect automated scripts
Starting from this observation, automating the collection of all the data was easy for anyone with the right skills. A simple script written in Python could download each post one after the other, just by increasing the number in the URL by 1. Because of this architectural flaw, the bot ran no risk of stumbling on a non-existent URL or of missing an existing one.
The lack of script detection was Parler's second major weakness.
Today, the vast majority of sites of this kind have a system for detecting scripts. These defenses can limit the number of connections if they detect suspicious activity, such as millions of pages being visited in just a few hours from a single device. Parler, however, had not deployed any such protection, basic and readily available as it is, and therefore did nothing to limit the copying of all of its content. That is a boon for the investigators, both private and public, who will know how to use this database.
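The principle described above can be sketched as follows. This is not @donk_enby's script, which the article does not reproduce: `BASE_URL` is a placeholder address, `enumerate_posts` is a hypothetical helper, and an in-memory dictionary stands in for the network so that the example runs offline.

```python
BASE_URL = "https://example.invalid/post/"  # placeholder, not a real endpoint


def enumerate_posts(fetch, first_id, last_id):
    """Yield (url, body) for every post number from first_id to last_id inclusive.

    `fetch` is any callable that returns the page body for a URL,
    or None when nothing exists at that address.
    """
    for post_id in range(first_id, last_id + 1):
        url = f"{BASE_URL}{post_id}"
        body = fetch(url)
        if body is not None:
            yield url, body


# Stand-in for the site: three consecutive posts held in a dictionary.
fake_site = {f"{BASE_URL}{n}": f"post {n}" for n in range(23134, 23137)}

archived = dict(enumerate_posts(fake_site.get, 23134, 23136))
for url, body in archived.items():
    print(url, body)
```

Because the numbering has no gaps and no randomness, the loop needs no crawling or link discovery: the counter alone covers every post.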
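As an illustration of the kind of defense mentioned here, the following Python sketch limits each client to a fixed number of requests per time window. It is one simple technique among many (a sliding-window counter), not a description of what any particular site deploys; the class name, the thresholds and the `device-1` identifier are assumptions chosen for the example.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id, now):
        hits = self._hits[client_id]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many recent requests: refuse or slow down
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60.0)
decisions = [limiter.allow("device-1", now=t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
print(decisions)
```

A real deployment would typically combine such a counter with other signals, but even this basic check would slow down a single device requesting millions of pages in a few hours.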
*This article was translated from content published by Numerama at cyberguerre.numerama.com.