
HD saturation due to the unexpected creation of postgresql files

Published: January 7, 2020 - 8:57 PM
by Sophie
Hello,
First of all, best wishes for the new year.

Since Monday, after a 15-day holiday, our WAPT server has been crashing: the hard drive fills up with PostgreSQL files in /var/lib/postgresql/9.6/main/base/16385/, each file growing to 1048576 KB (1 GB), until the disk is full.

Below is the configuration of our virtual server:
  • Wapt version 1.7.4.6165
  • OS Debian 4.9
  • 4 GB RAM
  • 40GB HD
We have approximately 1800 machines in our inventory.

When the console is started, the following error message is displayed:
Unable to get hosts list: Error on server:
OperationalError('could not connect to server: No such file or directory\n\tIs the server running locally and accepting\n\tconnections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?\n',)


Below is the result of waptserver service status
waptserver.service - WAPT Server startup script
Loaded: loaded (/usr/lib/systemd/system/waptserver.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-01-07 12:17:02 CET; 8h ago
Main PID: 664 (python)
Tasks: 1 (limit: 4915)
CGroup: /system.slice/waptserver.service
└─664 /opt/wapt/bin/python /opt/wapt/waptserver/server.py

Jan 07 20:40:36 srv-wapt15 python[664]: conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Jan 07 20:40:36 srv-wapt15 python[664]: OperationalError: could not connect to server: No such file or directory
Jan 07 20:40:36 srv-wapt15 python[664]: Is the server running locally and accepting
Jan 07 20:40:36 srv-wapt15 python[664]: connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
Jan 07 20:40:36 srv-wapt15 python[664]: , instance:
Jan 07 20:40:36 srv-wapt15 python[664]: 2020-01-07 20:40:36,449 CRITICAL SocketIO pong error for uuid 4C4C4544-004B-3710-8059-B1C04F4D3632 and sid b270e
Jan 07 20:40:36 srv-wapt15 python[664]: File "/opt/wapt/waptserver/server_socketio.py", line 278, in on_wapt_pong
Jan 07 20:40:36 srv-wapt15 python[664]: with wapt_db.atomic() as trans:
Jan 07 20:40:36 srv-wapt15 python[664]: File "/opt/wapt/lib/python2.7/site-packages/peewee.py", line 3533, in __enter__
Jan 07 20:40:36 srv-wapt15 python[664]: return self._helper.__enter__()


Restoring this VM to a previous state does not solve the problem: after a few minutes the phenomenon recurs.

Any ideas, please?
Thank you for your feedback

Re: HD saturation due to the unexpected creation of postgresql files

Published: January 8, 2020 - 11:38 AM
by dcardon
Hello Sophie,
Sophie wrote: January 7, 2020 - 8:57 PM First of all, best wishes for the new year.

Since Monday, following a 15-day holiday, our Wapt server has crashed, saturating the hard drive with PostgreSQL files in /var/lib/postgresql/9.6/main/base/16385/
with files reaching a size of 1048576 Kb until the HD is saturated.

Below is the configuration of our virtual server:
  • Wapt version 1.7.4.6165
  • OS Debian 4.9
  • 4 GB RAM
  • 40GB HD
We have approximately 1800 machines in our inventory.

When the console is started, the following error message is displayed:
Unable to get hosts list: Error on server:
OperationalError('could not connect to server: No such file or directory\n\tIs the server running locally and accepting\n\tconnections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?\n',)
It appears that the size of the JSON inventories (local WMI queries + dmidecode on local machines) has increased significantly, and the PostgreSQL database can't complete its VACUUMs quickly enough. Do you see messages like the ones below in your logs?

Code: Select all

ERROR: canceling autovacuum task
CONTEXT: automatic vacuum of table "segmentation.pg_toast.pg_toast_237738400"
LOG: checkpoints are occurring too frequently (8 seconds apart)
HINT: Consider increasing the configuration parameter "max_wal_size".
You can free up some space on the machine so that Postgres can restart (leave the WAPT service stopped at first), then run a VACUUM FULL on the WAPT database and check that it frees up space.

Code: Select all

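df -h                        # free space before the cleanup
systemctl stop waptserver    # keep WAPT stopped for now
rm some_unneeded_files_to_free_up_a_little_space    # placeholder: delete files you can spare
systemctl restart postgresql
sudo -u postgres psql wapt -c "VACUUM FULL"
df -h                        # free space after the VACUUM FULL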


This doesn't solve the problem, but at least it confirms that this is indeed the issue. We have a client with a similar problem, and we're currently debugging it with them. A recent Windows update has probably changed something in the WMI queries.

Sincerely,

Denis

Re: HD saturation due to the unexpected creation of postgresql files

Published: January 8, 2020 - 2:31 PM
by dcardon
Hello again,

Just to check: do you happen to have an audit package that regularly triggers WAPT.register()?

Regards,

Denis

Re: HD saturation due to the unexpected creation of postgresql files

Published: January 8, 2020 - 3:59 PM
by stephaneB
Hello,
thank you for your reply (I'm a colleague of Sophie's who started this thread).

We do indeed have pgsql logs like this:
"Checkpoints are occurring too frequently
(every 25 seconds)
... Consider increasing the 'max_wal_size' parameter."

The VACUUM did reclaim the space and let us restart the services.

Would it be a good idea to schedule a pgsql vacuum cron job in the meantime?

Regarding a possible package containing a wapt.register() function, I don't think so, but I'll look into it.
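
For the 'max_wal_size' hint in those logs, a minimal sketch of how the setting could be raised is given here. The `2GB` value and the `wal_size_sql` helper name are illustrative assumptions, not recommendations from this thread. A larger value also lets the WAL use more disk space, and it does not by itself shrink the bloated tables.

```shell
#!/bin/sh
# Hypothetical helper: prints the SQL that raises max_wal_size.
# The default of 2GB is an arbitrary example value.
wal_size_sql() {
    size="${1:-2GB}"
    printf "ALTER SYSTEM SET max_wal_size = '%s';\n" "$size"
    # max_wal_size only needs a configuration reload, not a restart.
    printf "SELECT pg_reload_conf();\n"
}

# Usage (as root on the database server):
#   wal_size_sql 2GB | sudo -u postgres psql
#   sudo -u postgres psql -c "SHOW max_wal_size"
```

The function only prints SQL, so it can be reviewed before being piped to psql.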

Re: HD saturation due to the unexpected creation of postgresql files

Published: January 8, 2020 - 4:40 PM
by dcardon
Hello StephaneB,
stephaneB wrote: January 8, 2020 - 3:59 PM Hello,
Thank you for this reply (I'm a colleague of Sophie who started this thread).

We do indeed have pgsql logs like this:
"Checkpoints are occurring too frequently
(every 25 seconds)
... Consider increasing the 'max_wal_size' parameter."

The vacuum function did allow us to reclaim the space and restart the services.

Would it be a good idea to schedule a pgsql vacuum cron job in the meantime?

Regarding a possible package containing a wapt.register() function, I don't think so, but I'll look into it.
The similar problem we identified is related to an excessive number of full inventory updates (around 100 to 150 per second). The database doesn't have enough time to clean up its data files (AUTOVACUUM), so the only way to reclaim space is to run a VACUUM FULL, which takes an exclusive lock on the table (and therefore blocks other connections while it cleans up). Normally the exclusive lock shouldn't last more than a few seconds, but during that time nothing can be written to the table.

While waiting to find what is generating unnecessary full inventories, you can set up a cron job with VACUUM FULL.
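
As a stopgap, such a job could look like the sketch below. The script path, the 03:00 schedule and the `DRY_RUN` switch are assumptions made for illustration; the database name `wapt` comes from the command earlier in this thread. Keep in mind that VACUUM FULL rewrites each table, so it needs some free disk space while it runs.

```shell
#!/bin/sh
# Sketch of a maintenance script, e.g. /usr/local/sbin/wapt-vacuum.sh (hypothetical path).

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# VACUUM FULL takes an exclusive lock on each table while it rewrites it.
vacuum_wapt() {
    run sudo -u postgres psql wapt -c "VACUUM FULL"
}

# Only act when called with "run", so that sourcing this file has no side effects.
if [ "${1:-}" = "run" ]; then
    vacuum_wapt
fi

# Example /etc/cron.d/wapt-vacuum entry (every night at 03:00, as root):
#   0 3 * * * root /usr/local/sbin/wapt-vacuum.sh run
```

Running it once with `DRY_RUN=1` shows the command that would be executed without touching the database.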

Sincerely,

Denis

Note: to be more precise, the VACUUM issue does not affect the Hosts table itself but the corresponding TOAST table, which stores the jsonb and TEXT blobs.
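
To see which TOAST tables are taking the space, a query along these lines can be run against the database. It only uses standard PostgreSQL catalogs and size functions; the `toast_size_sql` helper name and the limit of 5 rows are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper: prints a query that lists the tables with the
# largest TOAST storage, next to the size of their main (heap) data.
toast_size_sql() {
    cat <<'SQL'
SELECT c.relname AS table_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS heap_size,
       pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast_size
FROM pg_class c
WHERE c.relkind = 'r' AND c.reltoastrelid <> 0
ORDER BY pg_relation_size(c.reltoastrelid) DESC
LIMIT 5;
SQL
}

# Usage:
#   toast_size_sql | sudo -u postgres psql wapt
```

Comparing the output before and after a VACUUM FULL shows how much of the TOAST storage was reclaimable bloat.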

Re: HD saturation due to the unexpected creation of postgresql files

Published: January 8, 2020 - 6:04 PM
by dcardon
OK, in the other customer's case, the excessive number of inventory updates comes from a package whose control file has an incorrect value for the `forced_install_on` parameter. Do you use this parameter in your control files?

Re: HD saturation due to the unexpected creation of postgresql files

Published: January 10, 2020 - 3:54 PM
by Sophie
dcardon wrote: January 8, 2020 - 6:04 PM Okay, in the other case, the excessive number of inventory updates is due to a package with a control file that has an incorrect value for the forced_install_on parameter. Did you use this parameter in your control file?
Hello,
No, we never use this parameter as we don't yet know exactly what it's for.

Re: HD saturation due to the unexpected creation of postgresql files

Published: January 13, 2020 - 8:38 AM
by stephaneB
Hello, as far as we can tell, no one has used the "forced_install_on" parameter in our package control files so far. With around 200 packages, checking them one by one would be tedious... Is there a way, such as a query, to find it quickly? (A grep in the package directory returned nothing, as expected...).
I opened the packages modified within the last month and didn't find a value for this parameter. (Some packages don't mention it; in most, it is present but empty.)
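
One possible way to run such a search is sketched below. It assumes that each `.wapt` file is a zip archive with its control file at `WAPT/control`, and that `unzip` is installed; the function names and the repository path in the usage line are hypothetical.

```shell
#!/bin/sh
# Print the value of forced_install_on from a control file read on stdin
# (prints an empty line if the key is present without a value).
forced_install_value() {
    sed -n 's/^forced_install_on[[:space:]]*:[[:space:]]*//p'
}

# List the packages in a directory whose forced_install_on is not empty.
# Assumes .wapt files are zip archives containing WAPT/control.
scan_repo() {
    repo_dir="$1"
    for pkg in "$repo_dir"/*.wapt; do
        [ -f "$pkg" ] || continue
        value=$(unzip -p "$pkg" WAPT/control 2>/dev/null | forced_install_value)
        if [ -n "$value" ]; then
            echo "$pkg: $value"
        fi
    done
}

# Usage (repository path is an assumption, adjust to your server):
#   scan_repo /var/www/wapt
```

Packages where the key is absent or empty produce no output, so only the ones worth inspecting are listed.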

Re: HD saturation due to the unexpected creation of postgresql files

Published: January 14, 2020 - 10:16 AM
by dcardon
Hello stephaneB,

The problem doesn't necessarily stem from the parameter in question. It's simply that, for some reason, the local dependency calculation engine malfunctions and relaunches an inventory to check its package filtering parameters. `update-status` then runs in a loop, which causes the problem you're experiencing.

You would need to find the machines that are causing the issue (look for machines whose "last connection" date is always within the last few minutes), then restart the service locally in debug mode, or alternatively, remove all packages from the machine in question and add them back one by one to find the one causing the problem. Once the package is found, you can post the control file in this thread to see what the issue is.
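
To spot such machines from the database side, something like the query below could be run a few times, a minute or two apart; hosts that always stay at the top are candidates. The table name `hosts` and the column names `computer_fqdn` and `last_seen_on` are assumptions about the WAPT schema (check them with `\d hosts` in psql), and the helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints a query listing the most recently seen hosts.
# Arguments (all assumed defaults, verify against your schema):
#   $1 = name column, $2 = last-connection column, $3 = number of rows
recent_hosts_sql() {
    name_col="${1:-computer_fqdn}"
    seen_col="${2:-last_seen_on}"
    limit="${3:-20}"
    printf 'SELECT %s, %s FROM hosts ORDER BY %s DESC NULLS LAST LIMIT %s;\n' \
        "$name_col" "$seen_col" "$seen_col" "$limit"
}

# Usage:
#   recent_hosts_sql | sudo -u postgres psql wapt
```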

Regards,

Denis

Re: HD saturation due to the unexpected creation of postgresql files

Published: January 14, 2020 - 10:46 AM
by vcardon
stephaneB wrote: January 13, 2020 - 8:38 AM Hello, it seems no one has used the "forced_install_on" parameter in our package control files yet. With around 200 packages, it's going to be tedious to check them one by one... Is there a way, a query, that would allow us to find it quickly? (A grep in the package directory has no effect, as expected...).
I edited the most recent packages modified in the last month and I didn't find a value for this parameter. (Some packages don't mention it, and in most it's listed as having no value).
Hello stephaneB,

We can provide you with more interactive support by phone with a support contract. It's very affordable, and it appears you may be eligible for the MESR Software Grouping offer and the promotion valid until the end of 2020. Contact us through our website https://www.tranquil.it/gerer-parc-info ... re-support if that option interests you.

Sincerely.

Vincent