Hard drive saturation caused by runaway creation of PostgreSQL files

Sophie
Messages: 2
Registration: Apr 26, 2018 - 12:22

January 7, 2020 - 8:57 PM

Hello,
First of all, best wishes for the new year.

Since Monday, after a 15-day holiday, our WAPT server has been crashing: the hard drive fills up with PostgreSQL files in /var/lib/postgresql/9.6/main/base/16385/,
each growing to 1048576 KB (1 GB), until the disk is full.

Below is the configuration of our virtual server:
  • Wapt version 1.7.4.6165
  • OS Debian 4.9
  • 4 GB RAM
  • 40GB HD
We have approximately 1800 machines in our inventory.

When the console is started, the following error message is displayed:
Unable to get hosts list: Error on server:
OperationalError('could not connect to server: No such file or directory\n\tIs the server running locally and accepting\n\tconnections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?\n',)


Below is the output of the waptserver service status command:
waptserver.service - WAPT Server startup script
Loaded: loaded (/usr/lib/systemd/system/waptserver.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-01-07 12:17:02 CET; 8h ago
Main PID: 664 (python)
Tasks: 1 (limit: 4915)
CGroup: /system.slice/waptserver.service
└─664 /opt/wapt/bin/python /opt/wapt/waptserver/server.py

Jan 07 20:40:36 srv-wapt15 python[664]: conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Jan 07 20:40:36 srv-wapt15 python[664]: OperationalError: could not connect to server: No such file or directory
Jan 07 20:40:36 srv-wapt15 python[664]: Is the server running locally and accepting
Jan 07 20:40:36 srv-wapt15 python[664]: connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
Jan 07 20:40:36 srv-wapt15 python[664]: , instance:
Jan 07 20:40:36 srv-wapt15 python[664]: 2020-01-07 20:40:36,449 CRITICAL SocketIO pong error for uuid 4C4C4544-004B-3710-8059-B1C04F4D3632 and sid b270e
Jan 07 20:40:36 srv-wapt15 python[664]: File "/opt/wapt/waptserver/server_socketio.py", line 278, in on_wapt_pong
Jan 07 20:40:36 srv-wapt15 python[664]: with wapt_db.atomic() as trans:
Jan 07 20:40:36 srv-wapt15 python[664]: File "/opt/wapt/lib/python2.7/site-packages/peewee.py", line 3533, in __enter__
Jan 07 20:40:36 srv-wapt15 python[664]: return self._helper.__enter__()


Restoring this VM to a previous state does not solve the problem: after a few minutes the phenomenon recurs.

Any ideas, please?
Thank you for your feedback
dcardon
WAPT Expert
Messages: 1932
Registration: June 18, 2014 - 09:58
Location: Saint Sébastien sur Loire

January 8, 2020 - 11:38

Hello Sophie,
It appears that the size of the JSON inventories (local WMI queries + dmidecode on local machines) has increased significantly, and the PostgreSQL database isn't handling VACUUMs quickly enough. Are you seeing messages like the ones below in your logs?

Code: Select all

ERROR: canceling autovacuum task
CONTEXT: automatic vacuum of table "segmentation.pg_toast.pg_toast_237738400"
LOG: checkpoints are occurring too frequently (8 seconds apart)
HINT: Consider increasing the configuration parameter "max_wal_size".
You can free up some space on the machine so that PostgreSQL can restart (leave the WAPT service stopped at first), then run a VACUUM FULL on the wapt database and check that it frees up space.

Code: Select all

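df -h                       # free space before the cleanup
systemctl stop waptserver   # keep WAPT stopped for now
# placeholder: delete a few unneeded files to free up a little space
rm some_unneeded_files
systemctl restart postgresql
sudo -u postgres psql wapt -c "VACUUM FULL"
df -h                       # compare with the first reading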


This doesn't fix the root cause, but it at least confirms the diagnosis. We have a client with a similar problem, and we're currently debugging it with them. A recent Windows update has probably changed something in the WMI queries.

Sincerely,

Denis
Denis Cardon - Tranquil IT
Share your experiences on WAPT! Send us your blog and article URLs in the "Your Opinion of the forum, and we'll feature them on the WAPT
dcardon

January 8, 2020 - 2:31 PM

Hello again,

Out of curiosity, do you happen to have an audit package that regularly triggers WAPT.register()?

Regards,

Denis
stephaneB
Messages: 13
Registration: January 8, 2020 - 3:48 PM

January 8, 2020 - 3:59 PM

Hello,
Thank you for your reply (I'm a colleague of Sophie, who started this thread).

We do indeed have pgsql logs like this:
"Checkpoints are occurring too frequently
(every 25 seconds)
... Consider increasing the 'max_wal_size' parameter."

The VACUUM FULL did reclaim the space, and we were able to restart the services.

Would it be a good idea to schedule a pgsql vacuum cron job in the meantime?

As for a package calling WAPT.register(), I don't think we have one, but I'll look into it.
dcardon

January 8, 2020 - 4:40 PM

Hello StephaneB,
The similar problem we identified is caused by an excessive number of full inventory updates (around 100 to 150 per second). The database doesn't have time to clean up its data files (autovacuum), and the only way to reclaim the space is to run a VACUUM FULL, which takes an exclusive lock on the table (and therefore blocks other connections while it cleans up). Normally the exclusive lock shouldn't last more than a few seconds, but no writes to the table can occur during that time.
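
To check whether a server is in this situation, the update rate can be estimated from PostgreSQL's own statistics view pg_stat_user_tables. This is only a sketch: it assumes the database is named wapt and is reachable with the same sudo -u postgres psql invocation used earlier in this thread. The function names and the 10-second sampling window are arbitrary choices for the example, not part of WAPT.

```shell
#!/bin/sh
# Command used to reach PostgreSQL (assumption: same invocation as in this thread).
# It is expanded unquoted on purpose so that it splits into command + arguments.
PSQL="${PSQL:-sudo -u postgres psql}"

# Print the total number of row updates recorded so far for all user tables.
total_updates() {
    $PSQL "$1" -At -c "SELECT COALESCE(sum(n_tup_upd), 0) FROM pg_stat_user_tables"
}

# Sample the counter twice and print the average number of updates per second.
update_rate() {
    db="${1:-wapt}"
    window="${2:-10}"
    first=$(total_updates "$db") || return 1
    sleep "$window"
    second=$(total_updates "$db") || return 1
    echo "$(( (second - first) / window )) updates/s"
}
```

Calling update_rate wapt 10 prints a single figure that can be compared with the order of magnitude mentioned above.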

Until you find what is generating the unnecessary full inventories, you can set up a cron job that runs VACUUM FULL.
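
As a stopgap, such a job could look like the sketch below. The script path, the cron file name, the log file and the nightly schedule are all assumptions chosen for illustration; only the VACUUM FULL command on the wapt database comes from this thread. Keep in mind that VACUUM FULL writes a new copy of each table before dropping the old one, so it needs some free disk space to run.

```shell
#!/bin/sh
# Hypothetical /usr/local/sbin/wapt-vacuum-full.sh, meant to be called by cron,
# for example from a hypothetical /etc/cron.d/wapt-vacuum file containing:
#   30 2 * * * root /usr/local/sbin/wapt-vacuum-full.sh --run >> /var/log/wapt-vacuum.log 2>&1

# Command used to reach PostgreSQL (assumption: same invocation as in this thread).
PSQL="${PSQL:-sudo -u postgres psql}"

# Print the size of the given database in bytes.
db_size() {
    $PSQL "$1" -At -c "SELECT pg_database_size(current_database())"
}

# Run VACUUM FULL on the database and report how much space it gave back.
vacuum_full() {
    db="${1:-wapt}"
    before=$(db_size "$db") || return 1
    $PSQL "$db" -c "VACUUM FULL" >/dev/null || return 1
    after=$(db_size "$db") || return 1
    echo "$db: $before -> $after bytes ($(( before - after )) reclaimed)"
}

# Only act when explicitly asked to, so the file can also be sourced safely.
if [ "${1:-}" = "--run" ]; then
    vacuum_full wapt
fi
```

Logging the size before and after makes it easy to see, day after day, whether the bloat is still building up at the same pace.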

Sincerely,

Denis

Note: to be more precise, the VACUUM issue does not arise for the Hosts table itself but for the corresponding TOAST table, which stores the jsonb and TEXT blobs.
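
To see this on a given server, the main table files and their TOAST tables can be sized separately with PostgreSQL's catalog and size functions. A minimal sketch, assuming the database is named wapt and is reachable with sudo -u postgres psql as in the commands of this thread; the function name toast_report is made up for this example.

```shell
#!/bin/sh
# Command used to reach PostgreSQL (assumption: same invocation as in this thread).
PSQL="${PSQL:-sudo -u postgres psql}"

# For the five user tables with the largest TOAST storage, print the size of
# the main table file next to the size of its TOAST table (TOAST index included).
toast_report() {
    $PSQL "${1:-wapt}" -c "
        SELECT c.relname AS table_name,
               pg_size_pretty(pg_relation_size(c.oid)) AS main_size,
               pg_size_pretty(pg_total_relation_size(c.reltoastrelid)) AS toast_size
        FROM pg_class c
        JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE c.relkind = 'r'
          AND c.reltoastrelid <> 0
          AND n.nspname NOT IN ('pg_catalog', 'information_schema')
        ORDER BY pg_total_relation_size(c.reltoastrelid) DESC
        LIMIT 5"
}
```

If the diagnosis above applies, toast_size for the hosts table should dwarf main_size and shrink sharply after a VACUUM FULL.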
dcardon

January 8, 2020 - 6:04 PM

OK, in the other case, the excessive number of inventory updates turned out to be caused by a package whose control file has an incorrect value for the `forced_install_on` parameter. Do you use this parameter in your control files?
Sophie

January 10, 2020 - 3:54 PM

Hello,
No, we never use this parameter as we don't yet know exactly what it's for.
stephaneB

January 13, 2020 - 08:38

Hello, it seems that no one here has used the "forced_install_on" parameter in our package control files so far. With around 200 packages, checking them one by one would be tedious. Is there a way, such as a query, to find it quickly? (A grep in the package directory found nothing, as expected.)
I opened the packages modified within the last month and did not find a value for this parameter. (Some packages don't mention it at all; in most, it is listed with no value.)
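
A plain grep is expected to find nothing because .wapt packages are zip archives, so the control file is stored compressed. One possible way to check every package at once is sketched below. It assumes that unzip is installed and that the control file sits at WAPT/control inside each archive; the function names are made up for this example.

```shell
#!/bin/sh
# Print the control file stored inside one .wapt package (a zip archive).
control_of() {
    unzip -p "$1" WAPT/control
}

# Read control text on stdin and print the forced_install_on value,
# but only when the field is present and not empty.
forced_install_value() {
    sed -n 's/^forced_install_on[[:space:]]*:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p'
}

# List every package of a directory that sets forced_install_on.
scan_repo() {
    for pkg in "$1"/*.wapt; do
        [ -e "$pkg" ] || continue
        value=$(control_of "$pkg" 2>/dev/null | forced_install_value)
        if [ -n "$value" ]; then
            echo "$pkg: forced_install_on = $value"
        fi
    done
    return 0
}
```

For example, scan_repo /var/www/wapt would print one line per package that sets the parameter; /var/www/wapt is only a guess at the repository location on a Debian server.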
dcardon

January 14, 2020 - 10:16

Hello stephaneB,

The problem doesn't necessarily stem from the parameter in question. It's simply that, for some reason, the local dependency calculation engine goes wrong and triggers a new inventory to check its package filtering parameters. The resulting `update-status` loop then causes the problem you're experiencing.

You would need to find the machines that are causing the issue (look for machines whose "last connection" date is always within the last few minutes), then restart the service locally in debug mode, or alternatively, remove all packages from the machine in question and add them back one by one to find the one causing the problem. Once the package is found, you can post the control file in this thread to see what the issue is.

Regards,

Denis
vcardon
WAPT Expert
Messages: 278
Registration: Oct 06, 2017 - 10:55 p.m.
Location: Nantes, France

January 14, 2020 - 10:46

Hello stephaneB,

We can provide you with more interactive support by phone with a support contract. It's very affordable, and it appears you may be eligible for the MESR Software Grouping offer and the promotion valid until the end of 2020. Contact us through our website https://www.tranquil.it/gerer-parc-info ... re-support if that option interests you.

Sincerely.

Vincent
Vincent CARDON
Tranquil IT