In any digital application, ensuring data consistency across development and production environments is crucial. The core nature of a website is to gather data from users while allowing the admin to make changes and updates in production. For instance, as the admin of my portfolio website, I need to create blog posts, update existing ones, and monitor user data on the go. This dynamic interaction between users and the admin highlights the importance of maintaining an up-to-date database.
If I add or update a blog post on the go, or manage user access to my portfolio content, those changes must be reflected accurately in both environments. My website includes features such as a claps system that captures users' appreciation for each blog post. With these aspects in mind, having a constant sync between the production and local environments is essential to ensure seamless operations.
The solution
Manually backing up and restoring databases each time the local environment is set up can be inefficient and prone to errors. Automating this process ensures that the latest production data is always available locally, reducing the risk of discrepancies and improving workflow efficiency. Additionally, by structuring backups systematically, we prevent clutter and maintain an organized record of database versions.
Approach
To address this challenge, I have adopted a widely used approach of implementing an automated database backup and synchronization process within the codebase. This process securely connects to the production server, copies the latest database files to a dedicated local backup folder, and replaces outdated versions in the working environment. By leveraging this strategy, manual intervention is minimized, enabling a more efficient development workflow.
Implementation
To ensure that the database synchronization process runs only in the appropriate environment, I use the FLASK_ENV environment variable. This variable helps distinguish between development and production setups, preventing accidental overwrites or data loss.
In the .flaskenv file, I set the environment:
FLASK_ENV=development
On the production server, the environment variable is set to:
FLASK_ENV=production
Before running the backup process, the script checks the environment variable and prompts the user for confirmation:
if __name__ == '__main__':
    # Call the backup function before starting the Flask app
    if os.environ.get('WERKZEUG_RUN_MAIN') == 'true':
        flask_env = os.getenv('FLASK_ENV', 'development')
        if flask_env == 'development':
            # Prompt user for backup decision
            user_input = input("Would you like to perform a database backup? (yes/no): ").strip().lower()
            if user_input in ['yes', 'y']:
                print("Starting database backup process...")
                backup_and_replace_databases()
            else:
                print("Database backup skipped based on user response.")
This check ensures that database backups are only initiated in the local development environment, where the script prompts the user for confirmation before proceeding, preventing unintended overwrites. The WERKZEUG_RUN_MAIN guard also keeps the prompt from firing twice when Flask's debug reloader is active, since that variable is set to 'true' only in the reloader's child process.
In my setup, I use Python and Flask for web development, with Render serving as the production environment. The automation process involves the following steps:
1. Secure SSH connection: I configured SSH key authentication to allow secure communication between my local machine and the production server.
# Generate SSH key (if not already created)
ssh-keygen -t ed25519 -C "your-email@example.com"
# Copy the public key to Render
cat ~/.ssh/id_ed25519.pub | pbcopy # Copy to clipboard (pbcopy is macOS-specific)
# Paste this key in Render SSH settings
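Before relying on the key, it's worth confirming that the connection actually works. Assuming the key path from the step above and the host string shown in Render's SSH settings, a quick test looks like:
# Verify the SSH connection (host comes from Render's SSH settings)
ssh -i ~/.ssh/id_ed25519 your-host@ssh.render.com
If this drops you into a shell on the server, scp will work with the same credentials.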
2. Database backup and replacement: Before starting the local development server, the script runs conditionally based on the environment settings to prevent unintended overwrites in production. It automatically archives the existing local database files into a dated backup folder and then pulls the latest copies from the server:
import os
import subprocess
import shutil
from datetime import datetime
SSH_KEY = "./render"
SSH_HOST = "your-host@ssh.render.com"
REMOTE_PATH = "/opt/render/project/src"
LOCAL_PATH = os.getcwd()
BACKUP_FOLDER = os.path.join(LOCAL_PATH, "Backup")
DATABASE_FILES = ["blog.db", "database.db"]
def backup_and_replace_databases():
    # Create the backup folder on first run
    if not os.path.exists(BACKUP_FOLDER):
        os.makedirs(BACKUP_FOLDER)
    current_date_str = datetime.now().strftime('%m-%d-%Y')
    for db_file in DATABASE_FILES:
        remote_file_path = os.path.join(REMOTE_PATH, db_file)
        local_file_path = os.path.join(LOCAL_PATH, db_file)
        # Remove any backup of this database already taken today
        for backup_file in os.listdir(BACKUP_FOLDER):
            if db_file.replace('.db', '') in backup_file and current_date_str in backup_file:
                os.remove(os.path.join(BACKUP_FOLDER, backup_file))
        # Move the current local copy into a dated backup file
        backup_file_name = f"{db_file.replace('.db', '')}_{current_date_str}.db"
        backup_path = os.path.join(BACKUP_FOLDER, backup_file_name)
        if os.path.exists(local_file_path):
            shutil.move(local_file_path, backup_path)
        # Pull the latest copy from the production server
        scp_command = f"scp -i {SSH_KEY} {SSH_HOST}:{remote_file_path} {LOCAL_PATH}"
        subprocess.run(scp_command, shell=True, check=True)
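One refinement worth considering: scp can fail if the network drops or a remote file is missing, and check=True will then raise and abort the whole run. A small sketch of how the copy step could report per-file failures instead; copy_from_render is a hypothetical helper, not part of the original script:
import subprocess

def copy_from_render(scp_command: str, db_file: str) -> None:
    # Hypothetical helper: run scp and report success or failure per file
    try:
        subprocess.run(scp_command, shell=True, check=True)
        print(f"Successfully copied {db_file} from Render to local.")
    except subprocess.CalledProcessError as e:
        print(f"Failed to copy {db_file}: {e}")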
By structuring backups in a dedicated folder, I ensure that old backups do not clutter the system, and I can always retrieve past versions if needed.
Why remove same-day files?
Since I often work multiple times in a single day, retaining multiple backup instances would create unnecessary duplicates and occupy disk space. Therefore, the system automatically removes older backups from the same day, ensuring only the latest copy is retained. This approach keeps the backup folder organized and easy to manage.
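For illustration, given the %m-%d-%Y naming pattern used above, a run on March 1, 2025 (a hypothetical date) would leave the folder looking like this:
Backup/
├── blog_03-01-2025.db
└── database_03-01-2025.db
Restoring a past version is then a single copy back into the project root, e.g. cp Backup/blog_03-01-2025.db blog.db.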
Monitoring the process
To ensure clarity and facilitate debugging, the logging setup differs for production and development environments. In the development environment, verbose logs provide detailed insights to help troubleshoot issues, while in production, concise logging is used to capture essential events without cluttering logs. This distinction enhances maintainability and allows for efficient monitoring.
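The script currently reports progress with plain print statements, shown below. A minimal sketch of how this environment-aware logging could be formalized with Python's logging module, keyed off the same FLASK_ENV variable; this configuration is my assumption, not the script's current code:
import logging
import os

# Verbose DEBUG logs locally, concise INFO logs in production (assumed setup)
flask_env = os.getenv('FLASK_ENV', 'development')
logging.basicConfig(
    level=logging.DEBUG if flask_env == 'development' else logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
)
logging.info("Starting database backup and replacement process...")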
The automation script logs each step of the process, making it easy to track progress and identify potential issues. I can monitor messages in the terminal, such as confirmations of successful backups, errors in copying files, or database replacements, which helps maintain transparency and control.
print("Starting database backup and replacement process...")
print(f"Successfully copied {db_file} from Render to local.")
With this automated approach, I have streamlined the database management process, ensuring I always have the latest production data locally without manual effort.