Rules for the sustainability of LinuxFr.org accounts, personal data and one-year effect

In February 2023, we announced the introduction of a retention period for personal data (DCP, for “données à caractère personnel”) on LinuxFr.org, with the following taking effect from June 28, 2023:

  • closure of accounts inactive for three years, and deletion of their stored data that is unnecessary for the service;
  • for accounts closed for more than a year, deletion of the associated data that is unnecessary for the service.
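The two rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: the names `Action` and `retention_action`, the use of a closure date, and the 365-day year are hypothetical and are not taken from the actual LinuxFr.org script.

```python
from datetime import datetime, timedelta
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    CLOSE_AND_MINIMIZE = "close_and_minimize"
    PURGE_UNNECESSARY_DATA = "purge_unnecessary_data"


# Assumption: a year is approximated as 365 days.
ONE_YEAR = timedelta(days=365)
THREE_YEARS = timedelta(days=3 * 365)


def retention_action(now, last_activity, closed_at=None):
    """Return the action the two retention rules would call for."""
    if closed_at is not None:
        # Rule 2: account closed for more than a year.
        if now - closed_at > ONE_YEAR:
            return Action.PURGE_UNNECESSARY_DATA
        return Action.KEEP
    # Rule 1: open account inactive for three years.
    if now - last_activity >= THREE_YEARS:
        return Action.CLOSE_AND_MINIMIZE
    return Action.KEEP
```

For example, an open account last seen in January 2020 and evaluated on June 1, 2024 would be closed and minimized, while an account closed in January 2024 would be left alone until a year has passed.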

The site help explains:

Since May 31, 2023, the date of last activity has been recorded for each account. Note that since September 2023, access to this information has also been restricted to what the service needs: you can see the information for your own account, while the admins only need to know whether the last activity is less than a month, less than a year, less than three years, or three years or more ago, because of the aforementioned rules.
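The reduced view available to admins can be pictured as a coarse bucketing of the last-activity date. The sketch below is hypothetical: the function name `activity_bucket`, the labels, and the 30-day and 365-day approximations are assumptions, not the site's actual code.

```python
from datetime import date, timedelta


def activity_bucket(last_activity: date, today: date) -> str:
    """Reduce an exact last-activity date to a coarse range."""
    age = today - last_activity
    if age < timedelta(days=30):       # assumption: a month is 30 days
        return "less than a month"
    if age < timedelta(days=365):      # assumption: a year is 365 days
        return "less than a year"
    if age < timedelta(days=3 * 365):
        return "less than three years"
    return "three years or more"
```

Exposing only the bucket, rather than the date itself, is one way to keep the information limited to what the retention rules need.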

So here we are a year later, and this part of the rule applies for the first time. We will detail the effects in the second part of the dispatch.

Data minimization script and normal week

The removal of data unnecessary to the service currently relies on an external minimization script, launched manually. One reason it is still manual is that we had not yet passed the first year, which marks a threshold, as we will see later.

The previous run of the script took place on May 19, 2024 at 11 a.m. (Paris time), so let’s see what has accumulated over 12 days and a few hours:

    Started at vendredi 31 mai 2024, 22:19:15 (UTC+0200)
    Dry run mode
    13 inactive accounts never used to purge
    0 users to minimize
    0 accounts to minimize because inactive and not seen since 1 year
    0 active accounts not seen since 3 years to inactivate and minimize
    12 users without comments/contents to purge
    12 accounts to purge
    6 logs to purge
    12 friendly_id_slugs to purge
    0 taggings to purge
    0 oauth_access_grants for an oauth_application to purge
    0 oauth_access_tokens for an oauth_application to purge
    0 oauth_applications to purge
    0 oauth_access_grants to purge
    0 oauth_access_tokens to purge
    0 deleted comments to minimize
    0 comments from non-public contents to purge
    0 taggings from non-public contents to purge
    0 wiki_versions from non-public wiki_pages to purge
    0 slugs from non-public wiki_pages to purge
    0 non-public wiki_pages to purge
    0 slugs from non-public trackers to purge
    0 non-public trackers to purge
    0 slugs from non-public posts to purge
    0 non-public posts to purge
    0 poll_answers to from non-public polls to purge
    0 slugs from non-public polls to purge
    0 non-public polls to purge
    0 slugs from non-public bookmarks to purge
    0 non-public bookmarks to purge
    0 slugs from non-public diaries to purge
    0 diaries converted into non-public news to purge
    0 non-public diaries to purge
    1 news_versions from non-public news to purge
    10 paragraphs from non-public news to purge
    0 links from non-public news to purge
    1 slugs from non-public news to purge
    1 non-public news to purge
    1 non-public contents to purge

In pre-“1 year” operation, we only have a few accounts that were created but never used to clean up, along with everything associated with them: the accounts themselves (“accounts”), the individuals (“users”), the associated logs (“logs”) if there are any, and the shortcuts for site addresses (“slugs”). To this we add non-public, and therefore non-visible, content with its comments and tags, which are no longer necessary. So we are talking about a handful of accounts and related records per week.

“1 year” effect

A few hours later, the result is no longer the same:

    Started at Sat Jun 1 10:55:34 CEST 2024
    Dry run mode
    15 inactive accounts never used to purge
    250 users to minimize
    2616 accounts to minimize because inactive and not seen since 1 year
    0 active accounts not seen since 3 years to inactivate and minimize
    1412 users without comments/contents to purge
    1412 accounts to purge
    2285 logs to purge
    1412 friendly_id_slugs to purge
    6 taggings to purge
    0 oauth_access_grants for an oauth_application to purge
    0 oauth_access_tokens for an oauth_application to purge
    0 oauth_applications to purge
    15 oauth_access_grants to purge
    47 oauth_access_tokens to purge
    147 deleted comments to minimize
    98 comments from non-public contents to purge
    288 taggings from non-public contents to purge
    0 wiki_versions from non-public wiki_pages to purge
    0 slugs from non-public wiki_pages to purge
    0 non-public wiki_pages to purge
    0 slugs from non-public trackers to purge
    0 non-public trackers to purge
    166 slugs from non-public posts to purge
    165 non-public posts to purge
    10 poll_answers to from non-public polls to purge
    2 slugs from non-public polls to purge
    2 non-public polls to purge
    46 slugs from non-public bookmarks to purge
    46 non-public bookmarks to purge
    27 slugs from non-public diaries to purge
    0 diaries converted into non-public news to purge
    27 non-public diaries to purge
    139 news_versions from non-public news to purge
    1278 paragraphs from non-public news to purge
    33 links from non-public news to purge
    66 slugs from non-public news to purge
    61 non-public news to purge
    301 non-public contents to purge

We have certainly gained 2 more never-used accounts to clean up, but above all we will minimize several thousand accounts and delete or minimize hundreds of pieces of content, comments and tags. This is the moment when the hand must not shake and when we must trust the cleaning script and our database backups, because we will have to run it for real, and not just in “dry run” (rehearsal) mode.
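One common way to get a “dry run” mode is to perform the work inside a transaction and roll it back instead of committing. The sketch below illustrates that pattern only; it is not the LinuxFr.org script. The table layout, the `never_used` column and the function names are hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3


def demo_connection():
    """Build a throwaway in-memory database with three hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, never_used INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts (never_used) VALUES (?)", [(1,), (0,), (1,)]
    )
    conn.commit()
    return conn


def purge_never_used(conn, dry_run=True):
    """Delete never-used accounts and return how many rows were affected.

    In dry-run mode the deletion is rolled back, so only the count is kept.
    """
    cur = conn.execute("DELETE FROM accounts WHERE never_used = 1")
    affected = cur.rowcount
    if dry_run:
        conn.rollback()   # report only, leave the data untouched
    else:
        conn.commit()     # run it for real
    return affected


def count_accounts(conn):
    """Return the number of rows currently in the accounts table."""
    return conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
```

With this pattern the dry run exercises exactly the same statements as the real run, which is part of what makes the reported counts trustworthy before committing.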

In practice, we ran into a few very minor problems with the large transaction executed on the database: a deletion-order problem, and the impossibility of setting the email address to an empty string, because a uniqueness index covers that column (an address in .invalid, specific to each account, will therefore be used).
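The email problem can be reproduced in miniature. In the sketch below, an in-memory SQLite table with a UNIQUE column stands in for the real database, and the address format `account-<id>@minimized.invalid` is a made-up example: the article only says that a per-account address in .invalid is used (.invalid being a top-level domain reserved so that it never resolves).

```python
import sqlite3


def placeholder_email(account_id):
    """Hypothetical per-account placeholder under the reserved .invalid TLD."""
    return f"account-{account_id}@minimized.invalid"


def demo_connection():
    """Two hypothetical accounts in a table whose email column must be unique."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO accounts (id, email) VALUES (?, ?)",
        [(1, "a@example.org"), (2, "b@example.org")],
    )
    conn.commit()
    return conn


def blank_emails(conn, account_ids):
    """Naive approach: fails once a second account receives the empty string."""
    try:
        for account_id in account_ids:
            conn.execute("UPDATE accounts SET email = '' WHERE id = ?", (account_id,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()   # undo the partial update
        return False


def minimize_emails(conn, account_ids):
    """Replace each address with a placeholder that stays unique per account."""
    for account_id in account_ids:
        conn.execute(
            "UPDATE accounts SET email = ? WHERE id = ?",
            (placeholder_email(account_id), account_id),
        )
    conn.commit()
```

Deriving the placeholder from the account identifier keeps the uniqueness index satisfied without storing anything personal.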

After execution, if we restart the script, we just end up with the number of accounts still open but without activity for a year:

    Started at Sat Jun 1 13:30:16 CEST 2024
    Dry run mode
    0    inactive accounts never used to purge
    0    users to minimize
    905  accounts to minimize because inactive and not seen since 1 year
    (…)
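To compare two such reports, the counters can be parsed into dictionaries and diffed. This is a small sketch, assuming the report has one “count label” entry per line; the helper names are hypothetical, and the sample text reuses the first lines of the May 31 and June 1 reports shown earlier.

```python
import re

# A report entry is a count followed by a free-text label.
ENTRY = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")

MAY_31 = """Started at vendredi 31 mai 2024, 22:19:15 (UTC+0200)
Dry run mode
13 inactive accounts never used to purge
0 users to minimize
0 accounts to minimize because inactive and not seen since 1 year
"""

JUNE_1 = """Started at Sat Jun 1 10:55:34 CEST 2024
Dry run mode
15 inactive accounts never used to purge
250 users to minimize
2616 accounts to minimize because inactive and not seen since 1 year
"""


def parse_report(text):
    """Map each label to its count, skipping lines that do not start with a number."""
    counts = {}
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def changed(before, after):
    """Return {label: (before, after)} for every counter whose value differs."""
    labels = sorted(set(before) | set(after))
    return {
        label: (before.get(label, 0), after.get(label, 0))
        for label in labels
        if before.get(label, 0) != after.get(label, 0)
    }
```

Calling `changed(parse_report(MAY_31), parse_report(JUNE_1))` lists only the counters that moved between the two runs, which is easier to read than two long reports side by side.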

What’s the difference?

Let’s look at the account statistics before and after the “1 year” cleaning (the changes have been made visible with a red dot):

Interpretation: these are the account states in order of database identifier (that is, chronologically by creation date), grouped in batches of 10,000 consecutive identifiers. There is almost no change on very old accounts (there are far fewer of them), and the changes are concentrated on accounts from recent years. Afterwards we have fewer closed accounts (we were able to purge some) and therefore more purged accounts (i.e. identifiers that are no longer used in the database). The rest of the changes correspond to normal site visits.

We can compare the statistics just before:

  • 53,667 users who have or have had accounts (and are still present in the database)
  • 33,216 accounts
  • 2,205 accounts used on the site over the last three months, with 20.2 days on average without a visit and a standard deviation of 25.3 days
  • 10 pending accounts
  • 2,809 closed accounts

And the current ones (at the time of writing this article):

  • 51,943 users who have or have had accounts (and are still present in the database)
  • 31,492 accounts
  • 2,208 accounts used on the site over the last three months, with 20.0 days on average without a visit and a standard deviation of 25.3 days
  • 1 pending account
  • 1,089 closed accounts
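The differences between the two snapshots can be computed directly from the figures above. The dictionary keys are shorthand chosen for this example; the numbers are the ones quoted in the two lists.

```python
# Figures quoted just before and after the "1 year" cleaning.
before = {
    "users": 53667,
    "accounts": 33216,
    "used_last_3_months": 2205,
    "pending": 10,
    "closed": 2809,
}
after = {
    "users": 51943,
    "accounts": 31492,
    "used_last_3_months": 2208,
    "pending": 1,
    "closed": 1089,
}

# Negative values mean the counter went down after the cleaning.
delta = {key: after[key] - before[key] for key in before}

for key, value in delta.items():
    print(f"{key}: {value:+d}")
```

Users and accounts both drop by 1,724 and closed accounts by 1,720, while the number of recently used accounts barely moves, which fits the reading that the cleaning touched closed and never-used accounts rather than active ones.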

We also reoptimized the database tables (well, we told the database to optimize what it could, with an OPTIMIZE TABLE). A priori, this should have somewhere between no effect and an imperceptible effect on performance.

And on the backup side, the gzip-compressed dump went from 2,088,253,834 bytes before to 2,086,608,391 bytes after: a whopping gain of 0.08%, in short, nothing.
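The 0.08% figure follows from the two dump sizes. A quick check, with variable names chosen for this example:

```python
# Sizes of the gzip-compressed dump, in bytes, as quoted above.
size_before = 2_088_253_834
size_after = 2_086_608_391

saved_bytes = size_before - size_after
gain_percent = 100 * saved_bytes / size_before

print(f"{saved_bytes} bytes saved, {gain_percent:.2f}% of the original dump")
```

That is roughly 1.6 MB saved on a dump of about 2 GB.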

What next?

Now that “1 year” has passed, each week we will have the few created-but-never-used accounts to clean up, along with the few pieces of unnecessary non-public content and their comments and tags, but also the accounts that reach one year of inactivity during that week (probably one or two dozen). This will continue until “3 years”.

From “3 years”, we will start to close accounts and there will be even more data affected each week.

And then we will have reached the nominal rate of closing accounts and minimizing associated data.

See you for the “3 years” in June 2026.
