Luca 🔨

Like the tool. But broken.

(Please no funny replies.)

Luca 🔨luca@det.social
2022-12-26

It's time for me to go. Thank you @janboehm, Unterhaltungsfernsehen Ehrenfeld and @stefan for hosting me on det.social. It was a great six months.

Now I want more, and for that I need my own instance. I have already moved 63k Tweets and 561 Toots. Followings and followers are next.

You can find me at @luca@social.luca.run from now on 👋 :det:

Luca 🔨luca@det.social
2022-12-25

I thought importing Mastodon data would be easier, but it's not. Instead of getting the text in the form it was posted in, I get HTML, and now I have to convert usernames back to the @<username>@<server> format before converting the rest to text with BeautifulSoup.
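The conversion described above could look roughly like this. It is a minimal sketch, assuming mentions are rendered as `<a href="https://<server>/@<username>" class="u-url mention">` links, which is how Mastodon renders them; other server software may use different profile URL shapes. It swaps in the standard library's `html.parser` for BeautifulSoup so it has no dependencies, and `TootTextExtractor` and `toot_html_to_text` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class TootTextExtractor(HTMLParser):
    """Collect plain text from Mastodon status HTML, rewriting mention links."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities like &amp; arrive decoded
        self.parts = []
        self.in_mention = False  # True inside a mention <a>; its visible text is skipped

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # hashtag links also carry the "mention" class, so exclude them
        if tag == "a" and "mention" in classes and "hashtag" not in classes:
            # assumed profile URL shape: https://<server>/@<username>
            url = urlparse(attrs.get("href") or "")
            username = url.path.rstrip("/").rsplit("/", 1)[-1].lstrip("@")
            self.parts.append(f"@{username}@{url.hostname}")
            self.in_mention = True
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "a" and self.in_mention:
            self.in_mention = False
        elif tag == "p":
            self.parts.append("\n\n")  # paragraphs become blank-line separated

    def handle_data(self, data):
        if not self.in_mention:
            self.parts.append(data)


def toot_html_to_text(html):
    parser = TootTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

It would be applied to the HTML `content` of each exported post.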

And I had to increase the delete limit as well, because I noticed my misunderstandings too late. At least I had a list of the broken posts, so I could delete them easily.

Luca 🔨luca@det.social
2022-12-25

What is this? I think I will just use the API to move from instance to instance. #MastodonDataExport
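Moving via the API instead of the export could start with paging through an account's posts. This is a minimal sketch, assuming the `GET /api/v1/accounts/:id/statuses` endpoint with its `limit` and `max_id` parameters; `make_fetch_page` and `iter_account_statuses` are hypothetical helpers, and the fetch function is injected so the paging logic can be tried without a server.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

FetchPage = Callable[[Optional[str]], list]


def make_fetch_page(base_url: str, account_id: str, token: str, limit: int = 40) -> FetchPage:
    """Return a function that fetches one page of statuses older than max_id."""
    def fetch_page(max_id: Optional[str]) -> list:
        params = {"limit": limit}
        if max_id is not None:
            params["max_id"] = max_id  # only statuses older than this ID
        query = urllib.parse.urlencode(params)
        req = urllib.request.Request(
            f"{base_url}/api/v1/accounts/{account_id}/statuses?{query}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch_page


def iter_account_statuses(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield all statuses, newest first, until an empty page comes back."""
    max_id = None
    while True:
        page = fetch_page(max_id)
        if not page:
            return
        yield from page
        max_id = page[-1]["id"]  # pages are newest-first, so the last item is the oldest
```

Iterating over `iter_account_statuses(make_fetch_page("https://det.social", account_id, token))` would then walk the whole timeline from newest to oldest, with `account_id` and `token` as placeholders.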

Extracted archive of a Mastodon export showing a deeply nested folder structure to get to the media files.
Luca 🔨luca@det.social
2022-12-25

Now that Twitter works, let's try the same with Mastodon.

And then #Instagram, #Path and #Vine. Maybe #Google+ and #Facebook, but I will have to look at the archives first to see if those should even be saved.

Path will be added as "only me". Once again, I would love to make it visible to a list of people, but #Mastodon isn't up to that task. (I will probably do the same dance in a year when I move to #GoToSocial.)

Mastodon settings showing started data export:
Dec 25, 2022, 15:28
Compiling your archive...
Luca 🔨luca@det.social
2022-12-25

@vyr works on extending #search: github.com/VyrCossont/mastodon

It would be nice to allow others to search my posts. I won't test it yet, but I'll keep an eye on it.

Luca 🔨luca@det.social
2022-12-25

Alt texts work.

Screenshot of a screenshot of a Twitter list on a mastodon instance with alt text in the source code in the developer tools beside it: Screenshot of a Twitter List called Path friends.
Luca 🔨luca@det.social
2022-12-25

Something went wrong with the alt texts. They are shown as "#<ActionDispatch::Http::UploadedFile:0x0000151f32ec4b98>" instead of the text.

Oh. I put them into the files field instead of the data field. Let's try again.

Screenshot of Firefox developer tools showing the broken alt text of an image as source code:

<img src="https://social.luca.run/system/media_attachments/files/109/574/785/241/766/476/small/8137e2cfae25a576.jpg" srcset="https://social.luca.run/system/media_attachments/files/109/574/785/241/766/476/original/8137e2cfae25a576.jpg 1142w, https://social.luca.run/system/media_attachments/files/109/574/785/241/766/476/small/8137e2cfae25a576.jpg 592w" alt="#<ActionDispatch::Http::UploadedFile:0x0000151f32ec4b98>" title="#<ActionDispatch::Http::UploadedFile:0x0000151f32ec4b98>" style="object-position: 50% 50%;" sizes="263px">
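The fix, sketched for `requests`: the image goes into `files` and the alt text into `data`. This assumes the `POST /api/v2/media` endpoint with its `file` and `description` form fields; `media_upload_kwargs` is a hypothetical helper, not the author's code.

```python
from typing import BinaryIO, Union


def media_upload_kwargs(file_obj: Union[BinaryIO, bytes], filename: str, alt_text: str) -> dict:
    """Keyword arguments for requests.post() when uploading one image with alt text."""
    return {
        # multipart file part: only the image itself goes here
        "files": {"file": (filename, file_obj)},
        # ordinary form field; sent via `files`, it arrives as an uploaded file,
        # which matches the "#<ActionDispatch::Http::UploadedFile:...>" alt texts
        "data": {"description": alt_text},
    }
```

It would be used as `requests.post(f"{base_url}/api/v2/media", headers=headers, **media_upload_kwargs(f, "photo.jpg", alt_text))` with `f` opened in binary mode; `base_url` and `headers` are placeholders.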
Luca 🔨luca@det.social
2022-12-25

While I wait for the last 30k Tweets to import, let's see if I can mod the search to show results for all accounts and not just my own.

Luca 🔨luca@det.social
2022-12-25

Of my last 1994 Tweets with media attachments, 911 had alt texts.

Jupyter notebook cells with python code.

# get alt texts for specific tweet IDs via the Twitter API v2 tweet lookup
import requests

twitter_bearer_token = "xxx"
twitter_url = "https://api.twitter.com/2/tweets"
twitter_headers = {"Authorization": f"Bearer {twitter_bearer_token}"}
alt_texts = {}  # media_key -> alt text

def retrieve_alt_texts(tweet_ids):
    # the lookup endpoint accepts up to 100 comma-separated IDs per request
    twitter_params = {'ids': ','.join(tweet_ids),
                      'tweet.fields': 'text,attachments,entities',
                      'expansions': 'attachments.media_keys',
                      'media.fields': 'alt_text'
                     }
    resp = requests.get(twitter_url, headers=twitter_headers, params=twitter_params)
    resp.raise_for_status()
    resp_json = resp.json()

    # 'includes' is missing if none of the returned tweets has media
    for media in resp_json.get('includes', {}).get('media', []):
        if 'alt_text' in media:
            alt_texts[media['media_key']] = media['alt_text']

# `tweets` is assumed to hold the tweets loaded from the Twitter archive
tweets_with_media = [tweet for tweet in tweets[50100:] if 'media' in tweet['entities']]
len(tweets_with_media)

output: 1994

tweet_ids = [str(tweet['id']) for tweet in tweets_with_media]
batches = [tweet_ids[idx:idx+100] for idx in range(0, len(tweet_ids), 100)]
len(batches)

output: 20
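To count tweets with alt texts, or to attach them during the import, the media-level alt texts have to be mapped back to their tweets. This is a sketch, assuming the v2 response shape requested above: tweets in `data` list their `attachments.media_keys`, and the media objects in `includes.media` carry `media_key` and, if set, `alt_text`. `alt_texts_by_tweet` and `count_tweets_with_alt_text` are hypothetical helpers.

```python
def alt_texts_by_tweet(resp_json: dict) -> dict:
    """Map each tweet ID to the alt texts of its media (None where a medium has none)."""
    media_alt = {
        media["media_key"]: media.get("alt_text")
        for media in resp_json.get("includes", {}).get("media", [])
    }
    result = {}
    for tweet in resp_json.get("data", []):
        keys = tweet.get("attachments", {}).get("media_keys", [])
        result[tweet["id"]] = [media_alt.get(key) for key in keys]
    return result


def count_tweets_with_alt_text(per_tweet: dict) -> int:
    """Count tweets where at least one attached medium has a non-empty alt text."""
    return sum(1 for alts in per_tweet.values() if any(alts))
```

Summed over all batches, this is one way to arrive at a figure like the 911 reported above.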
Luca 🔨luca@det.social
2022-12-25

I wanted to go the easy route and use a library for the Twitter API stuff. Turns out I currently don't have the capacity to understand the design decisions of the libraries. I guess I will use requests for that as well.

(It's funny because working with the Twitter API was a crucial part of my work for the last ten years.)

Luca 🔨luca@det.social
2022-12-25

Morning.

Luca 🔨luca@det.social
2022-12-24

New error: "Mastodon::PrivateNetworkAddressError".

Wasn't able to find much about it.

Luca 🔨luca@det.social
2022-12-24

It's "media_ids[]" not "media_ids". Now* I need to fix all posts with media that were created without media.

*In the next few days.
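The difference is in how form-encoded arrays are written. Rails, and therefore Mastodon, collects repeated `key[]` pairs into a list. Below is a sketch of the form fields for `POST /api/v1/statuses`. `status_form` is a hypothetical helper, and the caller has to choose the visibility.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlencode


def status_form(text: str, visibility: str, media_ids: Iterable = ()) -> List[Tuple[str, str]]:
    """Form fields for creating a status with attachments."""
    fields = [("status", text), ("visibility", visibility)]
    # one "media_ids[]" pair per attachment; repeated "key[]" pairs become an array
    fields += [("media_ids[]", str(media_id)) for media_id in media_ids]
    return fields
```

`urlencode(status_form("hi", "private", [1, 2]))` shows the encoded body. With `requests`, passing the list as `data=status_form(...)` sends the same repeated fields.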

Luca 🔨luca@det.social
2022-12-24

Next issue: I can't post image-only posts. But I'm on my mobile, so I will just skip it for now.

Luca 🔨luca@det.social
2022-12-24

Surprisingly, there was only one bug in the script: sometimes Tweets have an "in_reply_to_screen_name" field, but no "in_reply_to_status_id" field.
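A defensive reply lookup avoids that bug. This is a sketch, assuming the archive's `in_reply_to_status_id_str` and `in_reply_to_status_id` fields. `id_map` is a hypothetical mapping from already imported tweet IDs to the new Mastodon status IDs, and `mastodon_reply_target` is likewise hypothetical.

```python
from typing import Dict, Optional


def mastodon_reply_target(tweet: dict, id_map: Dict[str, str]) -> Optional[str]:
    """Return the Mastodon status ID to reply to, or None to post without a thread."""
    # prefer the string variant of the ID; fall back to the numeric field
    parent = tweet.get("in_reply_to_status_id_str") or tweet.get("in_reply_to_status_id")
    if parent is None:
        # e.g. only in_reply_to_screen_name is set: addressed to a user, not to a tweet
        return None
    # also None if the parent tweet was not (yet) imported
    return id_map.get(str(parent))
```

The returned ID would go into the `in_reply_to_id` parameter when creating the status.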

An issue I could have expected: I used twitpic and img.ly a lot in the early days. Those images aren't in my archive, but I think I still have a backup of them somewhere. I will have to look into adding them manually or automatically through an edit.

Currently 20k of 70k imported.

Screenshot of Mastodon profile stats showing 20k posts, 2 followings and 373 followers. Progress bar in a Jupyter notebook at 41%: 29456 of 50000 iterations, 2.55 per second.
Luca 🔨luca@det.social
2022-12-23

@anlomedad_real Unlisted posts are still distributed to followers. I don't think they would enjoy getting 70k posts within a few hours. And neither would their instance admins.

Luca 🔨luca@det.social
2022-12-23

@ixs Mastodon already supports scheduled posts. The change is only useful for backdating stuff. And I understand that there is much potential for abuse.

Luca 🔨luca@det.social
2022-12-23

It's actually 300 posts per 3 hours, but the import hit that within 5 minutes. And why does it apply to admin accounts as well? I raised my own limit to 30k.

github.com/mastodon/mastodon/b

Luca 🔨luca@det.social
2022-12-23

Here are my incomplete notes:
github.com/lucahammer/fediport

I don't know yet whether I will put much more effort into it. It kind of works for what I want, and I believe it will never become easy to use. A pull request to Mastodon would change that, but from what I have read, it's not wanted. I don't want to maintain a fork either. Maybe glitch-soc or Hometown are interested. (But I am already running on empty. Bye.)

Luca 🔨luca@det.social
2022-12-23

Because Twitter didn't add alt texts until 2016, I started by importing earlier posts and was quickly reminded that Mastodon has rate limits as well: 300 posts per 5 minutes and 30 media uploads per 30 minutes.

Should I mod Mastodon further or slow down the import script?
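Slowing down the script could look like this: a sliding-window limiter that blocks until another call fits into the window. This is a minimal sketch, not the author's script. `SlidingWindowLimiter` is a hypothetical name, and the clock and sleep functions are injectable only so that it can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Block until fewer than max_calls calls happened in the last `period` seconds."""

    def __init__(self, max_calls: int, period: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of recent calls, oldest first

    def wait(self) -> None:
        while True:
            now = self.clock()
            # drop calls that have left the window
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # window is full: sleep until the oldest call drops out, then re-check
            self.sleep(self.period - (now - self.calls[0]))
```

With the limits quoted here, the script would call one `SlidingWindowLimiter(300, 5 * 60)` before every post and one `SlidingWindowLimiter(30, 30 * 60)` before every media upload.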

Maybe just put what I have on Github and spend more time with my family.

Screenshot of a Jupyter notebook with a progress bar stuck at 56%. It's red to indicate there was an error. 250 of 450, 5.48 iterations per second.
