Database maintenance is a crucial ingredient for sustained performance and data integrity. For platform engineers, automating routine tasks within a PostgreSQL environment ensures both reliability and relieves team members from repetitive manual work. Here's a quick rundown of effective approaches: Fine-Tuning Autovacuum PostgreSQL's built-in autovacuum daemon automates essential VACUUM and ANALYZE operations. To get the most out of it: Verifying Autovacuum Enablement Ensure autovacuum

Marcelo Altmann
2024-03-27 · 3 min read
Database maintenance is a crucial ingredient for sustained performance and data integrity. For platform engineers, automating routine tasks within a PostgreSQL environment ensures both reliability and relieves team members from repetitive manual work. Here's a quick rundown of effective approaches:
PostgreSQL's built-in autovacuum daemon automates essential VACUUM and ANALYZE operations. To get the most out of it:
Ensure autovacuum is actively running on your databases:
SHOW autovacuum;
autovacuum
------------
on
(1 row)
Adjust autovacuum thresholds to align with your database's update patterns and size. Postgres uses this calculation to trigger an autovacuum:
autovacuum threshold = autovacuum_vacuum_threshold + (table_size * autovacuum_vacuum_scale_factor)
The default threshold value is 50 tuples, and the default scale factor is 0.2. Thus, a table with 1000 tuples will be autovacuumed after 250 have been updated/deleted. But if you have a one million row table, vacuuming won’t happen until more than 200,000 tuples have been updated, possibly leading to degraded performance. Decreasing the autovacuum_vacuum_scale_factor as your table grows will lead to more vacuuming.
Utilize autovacuum_vacuum_cost_delay and related parameters to fine-tune when vacuuming operations trigger, balancing performance impact with maintenance needs.
You might want to run scheduled VACUUM operations for predictive or aggressive cleanup. For on-demand VACUUM operations, create an Ansible playbook targeting your PostgreSQL databases. This offers centralized control and the ability to target specific databases or tables. This is easy using the community.general.postgresql_query module, which allows you to execute arbitrary SQL queries. This module is part of the community.general collection, so ensure you have it installed in your Ansible environment.
Here’s an example of an Ansible playbook that performs a VACUUM operation on a specific PostgreSQL database.
---
- name: Run VACUUM on PostgreSQL Database
hosts: database_servers
become: yes
tasks:
- name: VACUUM PostgreSQL database
community.general.postgresql_query:
db: "your_database_name"
login_user: "your_database_user"
login_password: "your_database_password"
login_host: "localhost"
login_port: "5432"
query: "VACUUM;"
register: vacuum_result
- name: Print VACUUM output
debug:
var: vacuum_result
Adjust this to your needs. For instance, use VACUUM ANALYZE instead of VACUUM to update statistics for the query planner.
Write custom scripts (in your preferred language) for advanced VACUUM and ANALYZE operations. Vacuum tables based on factors like size, dead tuples, or the last modification time. If you’re working in Python, you can try something like this:
# conditional_vacuum.py
def vaccum_table(table_name):
cursor = conn.cursor()
# Get the number of dead tuples and total tuples in the table
cursor.execute("""
SELECT n_dead_tup, n_live_tup
FROM pg_stat_user_tables
WHERE relname = %s;
""", (table_name,))
result = cursor.fetchone()
dead_tuples = result[0]
total_tuples = result[0] + result[1]
# Calculate the percentage of dead tuples
dead_tuple_percentage = (dead_tuples / total_tuples) * 100
# Logic for vacuuming - you can adjust the conditions below
if dead_tuple_percentage > 20:
cursor.execute(f"VACUUM ANALYZE {table_name}")
conn.commit()
print(f"Table {table_name} vacuumed and analyzed")
else:
print(f"Table {table_name} does not meet vacuum criteria")
By calculating the percentage of dead tuples, we ensure that the script triggers VACUUM only on tables that have a significant amount of reclaimable space. Customize these thresholds to your needs.
You then want to leverage tools like cron to execute your scripts during off-peak hours. Example for running the script daily at 2 AM:
0 2 * * * /usr/bin/python3 /path/to/your/conditional_vacuum.py
These extensions allow index rebuilding and defragmentation without the heavy locking restrictions commonly associated with index maintenance.
Set up scheduled jobs to identify and automatically reindex fragmented indexes during periods of low database usage.
By following these guidelines, platform engineers can efficiently automate their PostgreSQL database maintenance routines, enabling greater resource optimization and helping sustain database health over time.
Readyset Cloud is a MySQL and PostgreSQL wire-compatible caching platform that offers significant database scalability enhancements while reducing the complexity of query optimization. Sign up for Readyset Cloud today.
Modern applications demand instant performance, even under unpredictable load. Readyset helps you eliminate slow queries, stabilize latency, and scale confidently.
Revolutionize your database performance with Readyset
Serve requests at sub-millisecond latencies with the modern database scaling and query caching system for MySQL and PostgreSQL.
Join our newsletter
Stay updated with the latest news, insights, and developments from Readyset — straight to your inbox.