Upgrading Apache Cassandra to a newer version is a significant task that database administrators undertake so their systems benefit from new features, security improvements, and better performance. This guide walks through upgrading Apache Cassandra from version 3.11.15 or later to the latest 4.1.x release on Ubuntu 20.04.5 LTS, with an emphasis on pre-upgrade cleanup operations that help manage disk space.
Pre-upgrade Preparation
Back Up the Configuration Directory:
Before initiating the upgrade, it’s crucial to back up the Cassandra configuration directory. This precaution allows for a swift restoration of the configuration should any issues arise during the upgrade process. Utilize the following command to create a backup, incorporating the current date into the folder name for easy identification:
# cp -r /etc/cassandra/ /root/cassandra-conf-bkp-$(date +%Y%m%d)
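To confirm the backup is complete before going further, the copy can be checked against the source. The following is a minimal sketch, not part of the original procedure: backup_cassandra_conf is a hypothetical function name, and it uses cp -a (which, unlike cp -r, also preserves ownership and permissions) followed by diff -r to verify the copy.

```shell
# Hypothetical helper: copy a config directory into a dated backup folder
# and confirm that the copy is identical to the source.
backup_cassandra_conf() {
  src_dir="${1:-/etc/cassandra}"          # directory to back up
  dest_parent="${2:-/root}"               # where the dated copy is created
  dest_dir="$dest_parent/cassandra-conf-bkp-$(date +%Y%m%d)"

  if [ -e "$dest_dir" ]; then
    echo "backup already exists: $dest_dir" >&2
    return 1
  fi

  # -a keeps ownership, permissions and timestamps (useful when restoring)
  cp -a "$src_dir" "$dest_dir" || return 1

  # diff -r exits non-zero if any file differs or is missing
  diff -r "$src_dir" "$dest_dir" >/dev/null || return 1
  echo "$dest_dir"
}
```

Run it as root, for example backup_cassandra_conf /etc/cassandra /root. It prints the backup path on success and refuses to overwrite a backup already taken on the same day.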
Pre-upgrade Cleanup Operations
Preparation is key to a smooth upgrade. Begin with maintenance commands that check data integrity and free up disk space, which is especially important on nodes with limited disk space.
Scrub Data:
Run nodetool scrub to rewrite the SSTables on disk and discard any data it finds to be corrupt. Scrub can take a long time on nodes with a large amount of data, and by default it takes a snapshot of each table before it starts, so check that the node has enough free disk space first.
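Snapshots are hard links, so they take no extra space at first, but every SSTable that scrub rewrites stays on disk through the snapshot until the snapshot is cleared. A rough pre-check is therefore to compare the size of the data directory with the free space on its filesystem. This is a minimal sketch: the function name has_disk_headroom, the default path /var/lib/cassandra/data (the Debian package's usual data directory), and the rule of thumb that free space should at least equal the data size are assumptions, not an official Cassandra requirement.

```shell
# Hypothetical pre-check: is there at least as much free space on the
# filesystem as the data directory currently uses?
has_disk_headroom() {
  data_dir="${1:-/var/lib/cassandra/data}"   # assumed default data directory

  # du -sk: total size of the data directory in KiB
  used_kb=$(du -sk "$data_dir" | awk '{print $1}')
  # df -Pk: POSIX output; column 4 of the second line is available KiB
  free_kb=$(df -Pk "$data_dir" | awk 'NR == 2 {print $4}')

  echo "data: ${used_kb} KiB, free: ${free_kb} KiB"
  [ "$free_kb" -ge "$used_kb" ]
}

# Usage: has_disk_headroom || echo "clear snapshots or scrub with --no-snapshot"
```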
Clear Snapshots:
To free more disk space, use nodetool clearsnapshot to remove existing snapshots. If the node is running out of space and none of its snapshots are still needed (for example, as backups), delete all of them with the --all flag:
# nodetool clearsnapshot --all
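Because clearsnapshot --all cannot be undone, it can help to list what will be removed first. A minimal sketch with a hypothetical wrapper, clear_all_snapshots, that runs nodetool listsnapshots and only deletes when called with the argument yes:

```shell
# Hypothetical wrapper: list snapshots first, delete only when asked to.
clear_all_snapshots() {
  # listsnapshots shows each snapshot's name, keyspace, table and size
  nodetool listsnapshots || return 1

  if [ "$1" != "yes" ]; then
    echo "dry run: call 'clear_all_snapshots yes' to delete them" >&2
    return 0
  fi
  # removes every snapshot on this node
  nodetool clearsnapshot --all
}

# Usage: clear_all_snapshots       # review the list
#        clear_all_snapshots yes   # then delete
```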
Cleanup Data:
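Run nodetool cleanup to remove data that no longer belongs to the node, such as data left behind after new nodes joined the cluster. Where disk space is at a premium, also keep in mind that nodetool scrub snapshots the data by default; run the scrub without a snapshot to avoid that extra copy: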
# nodetool scrub --no-snapshot
Draining and Stopping Cassandra
Drain the Node:
Before stopping the Cassandra service, run nodetool drain to flush all memtables to disk and stop the node from accepting new writes:
# nodetool drain
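nodetool netstats reports the node's operation mode on its first line, so it can confirm that the drain finished before the service is stopped. A minimal sketch; drain_and_confirm is a hypothetical name:

```shell
# Hypothetical helper: drain the node, then confirm via netstats that the
# operation mode is DRAINED before the service is stopped.
drain_and_confirm() {
  nodetool drain || return 1
  # netstats starts with a line such as "Mode: NORMAL" or "Mode: DRAINED"
  mode=$(nodetool netstats | awk -F': *' '/^Mode:/ {print $2; exit}')
  echo "mode: $mode"
  [ "$mode" = "DRAINED" ]
}
```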
Stop the Cassandra Service:
Stop the Cassandra service so the packages can be upgraded safely:
# systemctl stop cassandra.service
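systemctl stop normally waits for the service to stop, but it is worth confirming that no Cassandra JVM is left running before the packages are replaced. A minimal sketch, assuming the process can be identified by Cassandra's main class, org.apache.cassandra.service.CassandraDaemon, on its command line; stop_and_confirm and its retry defaults are hypothetical:

```shell
# Hypothetical helper: stop the service, then wait until no JVM running
# Cassandra's main class (CassandraDaemon) is left on the host.
stop_and_confirm() {
  unit="${1:-cassandra.service}"
  max_checks="${2:-30}"
  interval="${3:-2}"                       # seconds between checks

  systemctl stop "$unit" || return 1
  i=0
  while [ "$i" -lt "$max_checks" ]; do
    # pgrep -f matches the full command line; exit status 1 means no match
    if ! pgrep -f org.apache.cassandra.service.CassandraDaemon >/dev/null; then
      echo "no Cassandra process running"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "Cassandra still running after $max_checks checks" >&2
  return 1
}
```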
Upgrading Cassandra
Update Source List:
Point the APT repository at the 4.1 release series by overwriting the cassandra.sources.list file:
# echo "deb https://debian.cassandra.apache.org 41x main" > /etc/apt/sources.list.d/cassandra.sources.list
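A slightly safer variant keeps a copy of the previous source list so the change can be reverted. This is a sketch with a hypothetical function name, set_cassandra_repo; the .bak suffix is an arbitrary choice, and APT only reads files ending in .list or .sources from this directory, so the copy is not used as a package source:

```shell
# Hypothetical helper: keep a copy of the current source list, then point
# it at the given Cassandra release series (for example 41x).
set_cassandra_repo() {
  series="$1"
  list_file="${2:-/etc/apt/sources.list.d/cassandra.sources.list}"

  [ -n "$series" ] || { echo "usage: set_cassandra_repo SERIES [FILE]" >&2; return 1; }
  # APT ignores the .bak copy because it does not end in .list or .sources
  if [ -f "$list_file" ]; then
    cp -p "$list_file" "$list_file.bak" || return 1
  fi
  echo "deb https://debian.cassandra.apache.org $series main" > "$list_file"
}

# Usage: set_cassandra_repo 41x && apt update && apt-cache policy cassandra
```

After apt update, apt-cache policy cassandra should show a 4.1.x version as the candidate.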
Upgrade Packages:
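With the repository updated, refresh the package list and upgrade the packages. Note that apt upgrade upgrades every upgradable package on the system, not only Cassandra. When it asks whether to replace a locally modified configuration file (such as cassandra.yaml) with the package maintainer's version, press Enter to accept the default, ‘N’ (No), which keeps your current file: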
# apt update && apt upgrade
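If you prefer not to answer prompts, apt-get can upgrade only the cassandra package and keep every locally modified configuration file automatically, which has the same effect as answering ‘N’. A minimal sketch; upgrade_cassandra_only is a hypothetical name, and unlike apt upgrade it leaves other packages on the system untouched:

```shell
# Hypothetical non-interactive variant: upgrade only the cassandra package
# and keep locally modified config files (the same as answering 'N').
upgrade_cassandra_only() {
  apt-get update || return 1
  # --only-upgrade: do not install cassandra if it is not installed yet
  # --force-confold: keep the installed version of modified conffiles
  apt-get install -y --only-upgrade \
    -o Dpkg::Options::=--force-confold cassandra || return 1
  # print the package version that is now installed
  dpkg-query -W -f='${Version}\n' cassandra
}
```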
Modify Configuration:
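Because the 3.11 cassandra.yaml was kept, it still contains Thrift-related and other options that were removed in Cassandra 4.0. Cassandra 4.x refuses to start if cassandra.yaml contains properties it does not recognize, so comment them out (or delete them):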
# for var in thrift_prepared_statements_cache_size_mb start_rpc rpc_port rpc_server_type thrift_framed_transport_size_in_mb request_scheduler; do sed -i "/$var:/s/^/#/" /etc/cassandra/cassandra.yaml; done
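To confirm the loop worked, the file can be checked for any of these options that are still active. A minimal sketch; check_removed_options is a hypothetical helper that treats a line as active when it starts, optionally indented, with the option name followed by a colon:

```shell
# Hypothetical check: report options that are still active (not commented
# out) in cassandra.yaml. Returns 1 if any are found, 0 otherwise.
check_removed_options() {
  yaml="$1"; shift
  status=0
  for opt in "$@"; do
    # an active setting starts the line, optionally indented, with "name:"
    if grep -Eq "^[[:space:]]*${opt}:" "$yaml"; then
      echo "still active: $opt"
      status=1
    fi
  done
  return "$status"
}

# Usage:
# check_removed_options /etc/cassandra/cassandra.yaml \
#   thrift_prepared_statements_cache_size_mb start_rpc rpc_port \
#   rpc_server_type thrift_framed_transport_size_in_mb request_scheduler
```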
Update JAMM Library:
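Cassandra loads JAMM (the Java Agent for Memory Measurements) as a Java agent. The 4.1 package ships jamm-0.3.2.jar, but if you kept your existing cassandra-env.sh during the package upgrade, it still refers to jamm-0.3.0.jar from 3.11. Update the reference so the JVM can find the agent: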
# sed -i 's|jamm-0.3.0.jar|jamm-0.3.2.jar|g' /etc/cassandra/cassandra-env.sh
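To catch a mismatch before starting the node, you can check that the jar named in cassandra-env.sh actually exists. A minimal sketch; check_jamm_jar is a hypothetical name, and /usr/share/cassandra/lib is assumed to be where the Debian package installs Cassandra's jars:

```shell
# Hypothetical check: find the JAMM jar that cassandra-env.sh refers to and
# confirm that the jar exists in Cassandra's lib directory.
check_jamm_jar() {
  env_file="${1:-/etc/cassandra/cassandra-env.sh}"
  lib_dir="${2:-/usr/share/cassandra/lib}"    # assumed Debian package layout

  jar=$(grep -o 'jamm-[0-9.]*\.jar' "$env_file" | head -n 1)
  if [ -z "$jar" ]; then
    echo "no jamm jar referenced in $env_file" >&2
    return 1
  fi
  if [ -f "$lib_dir/$jar" ]; then
    echo "ok: $jar"
  else
    echo "missing: $lib_dir/$jar" >&2
    return 1
  fi
}
```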
Back up and update the JVM options file:
Back up configuration files before changing them. This step renames the jvm-server.options file installed by the 4.1 package to jvm-server.options.orig, keeping it for reference. It then copies your existing jvm.options file from 3.11, including any heap or garbage-collection settings you tuned, to jvm-server.options, the file Cassandra 4.x reads for its server JVM options.
# cd /etc/cassandra/
# mv jvm-server.options jvm-server.options.orig && cp -p jvm.options jvm-server.options
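Because this replaces the 4.1 defaults with your 3.11 settings, it is worth reviewing how the two files differ. A minimal sketch using two hypothetical helpers: active_jvm_options strips comments and blank lines and sorts the rest, and compare_jvm_options prints only the settings found in one file but not the other:

```shell
# Hypothetical review aid: print the non-comment, non-blank lines of a JVM
# options file, trailing whitespace removed, in sorted order.
active_jvm_options() {
  grep -Ev '^[[:space:]]*(#|$)' "$1" | sed 's/[[:space:]]*$//' | sort
}

# Settings present in only one file: "<" lines come from the first file,
# ">" lines from the second.
compare_jvm_options() {
  a=$(mktemp); b=$(mktemp)
  active_jvm_options "$1" > "$a"
  active_jvm_options "$2" > "$b"
  diff "$a" "$b" | grep '^[<>]'
  rm -f "$a" "$b"
}

# Usage (in /etc/cassandra):
# compare_jvm_options jvm-server.options.orig jvm-server.options
```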
Optimization and Verification
Optimize Memory Usage:
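With the packages upgraded and Cassandra still stopped, move any memory pages held in swap back into RAM by turning swap off and on again. The Cassandra documentation recommends disabling swap on Cassandra nodes altogether, because swapping can severely degrade performance: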
# swapoff -a && swapon -a
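swapoff -a has to move everything currently in swap back into RAM, so it can fail or stall if there is not enough free memory. A minimal pre-check sketch; swap_fits_in_ram is a hypothetical name, and it compares swap in use (SwapTotal minus SwapFree) with MemAvailable from /proc/meminfo:

```shell
# Hypothetical pre-check for "swapoff -a": the swap in use should fit in
# the memory that is currently available.
swap_fits_in_ram() {
  meminfo="${1:-/proc/meminfo}"
  # /proc/meminfo reports these values in kB
  awk '
    /^MemAvailable:/ { avail = $2 }
    /^SwapTotal:/    { total = $2 }
    /^SwapFree:/     { free  = $2 }
    END {
      used = total - free
      printf "swap used: %d kB, memory available: %d kB\n", used, avail
      if (used <= avail) exit 0
      exit 1
    }
  ' "$meminfo"
}
```

To disable swap permanently, as the Cassandra documentation recommends, run swapoff -a without swapon -a and comment out the swap entries in /etc/fstab so swap stays off after a reboot.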
Start the Cassandra Service:
Start Cassandra on the new version:
# systemctl start cassandra.service
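Cassandra can take a while to start, so it helps to wait until the node accepts CQL clients before moving on. A minimal sketch that polls nodetool statusbinary, which prints running once the native transport is up; wait_for_cql and its retry defaults are hypothetical, and the log path assumes the Debian package layout:

```shell
# Hypothetical helper: wait until the node accepts CQL clients again.
wait_for_cql() {
  max_checks="${1:-60}"
  interval="${2:-5}"                        # seconds between checks
  i=0
  while [ "$i" -lt "$max_checks" ]; do
    # statusbinary prints "running" once the native transport is up;
    # nodetool errors while the JVM is still starting are discarded
    if [ "$(nodetool statusbinary 2>/dev/null)" = "running" ]; then
      echo "native transport is running"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "not ready after $max_checks checks; see /var/log/cassandra/system.log" >&2
  return 1
}
```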
Verify Upgrade:
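Confirm that the upgrade succeeded by inspecting the cluster’s topology and state. nodetool describecluster lists the schema versions the nodes report (once every node is upgraded, they should all agree on one), and nodetool status should show every node as UN (Up/Normal):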
# nodetool describecluster
# nodetool status
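On larger clusters, reading nodetool status by eye is error-prone. A minimal sketch of an automated check; check_cluster_health is a hypothetical name, and it assumes node lines in nodetool status begin with a two-letter state such as UN or DN:

```shell
# Hypothetical check: print this node's release version and fail if any
# node listed by "nodetool status" is not UN (Up/Normal).
check_cluster_health() {
  nodetool version || return 1            # prints "ReleaseVersion: ..."
  status_out=$(nodetool status) || return 1
  # node lines start with a two-letter state: U/D (up/down) + N/L/J/M
  not_un=$(printf '%s\n' "$status_out" |
    awk '/^[UD][NLJM] / && $1 != "UN" {print $1, $2}')
  if [ -n "$not_un" ]; then
    echo "nodes not Up/Normal:" >&2
    echo "$not_un" >&2
    return 1
  fi
  echo "all nodes UN"
}
```

Run it on each node after the upgrade; nodetool version should report a 4.1.x release.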
By adhering to this comprehensive guide, database administrators can effectively upgrade Apache Cassandra to version 4.1.x, capitalizing on the latest advancements and optimizations the platform has to offer, while ensuring data integrity and system performance through careful pre-upgrade preparations.
Post-upgrade Maintenance
After successfully upgrading Apache Cassandra to version 4.1.x and ensuring the cluster is fully operational, it’s crucial to conduct post-upgrade maintenance to optimize the performance and security of your database system. This section outlines essential steps and considerations to maintain a healthy and efficient Cassandra environment.
Monitor Performance and Logs
In the immediate aftermath of the upgrade, closely monitor the system’s performance, including CPU, memory usage, and disk I/O, to identify any unexpected behavior or bottlenecks. Additionally, review the Cassandra system logs for warnings or errors that may indicate potential issues requiring attention.
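A quick way to start the log review is to count warnings and errors and look at the most recent ones. A minimal sketch; scan_cassandra_log is a hypothetical name, and it assumes the default log location of the Debian package, /var/log/cassandra/system.log, and that each entry starts with its level (for example WARN or ERROR):

```shell
# Hypothetical log scan: count WARN and ERROR entries and show the most
# recent ones. Continuation lines such as stack traces are not counted.
scan_cassandra_log() {
  log="${1:-/var/log/cassandra/system.log}"
  lines="${2:-20}"
  # entries start with the level, e.g. "WARN  [main] 2024-..."
  echo "WARN:  $(grep -c '^WARN ' "$log")"
  echo "ERROR: $(grep -c '^ERROR ' "$log")"
  grep -E '^(WARN|ERROR) ' "$log" | tail -n "$lines"
}
```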
Tune and Optimize
Based on the performance monitoring insights, you may need to adjust Cassandra’s configuration settings for optimal performance. Consider tuning parameters related to JVM options, compaction, and read/write performance, keeping in mind the specific workload and data patterns of your application.
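Before changing anything, it is useful to record the current values so later changes can be compared or reverted. A minimal sketch; record_tuning_baseline and its output path are hypothetical, and it captures only a few runtime values reported by nodetool:

```shell
# Hypothetical helper: save current throughput settings and compaction
# activity to a file before any tuning changes are made.
record_tuning_baseline() {
  out_file="${1:-/root/cassandra-tuning-$(date +%Y%m%d-%H%M%S).txt}"
  {
    echo "# recorded $(date)"
    nodetool getcompactionthroughput
    nodetool getstreamthroughput
    nodetool compactionstats
  } > "$out_file" || return 1
  echo "$out_file"
}
```

Values changed at runtime with nodetool setcompactionthroughput or setstreamthroughput are not persisted across restarts, so also update the matching settings in cassandra.yaml once you settle on new values.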
Run nodetool upgradesstables
To make sure all SSTables are rewritten in the current format, run nodetool upgradesstables on each node in the cluster. It rewrites only the SSTables that are not already in the current format, which is needed to take full advantage of the improvements in Cassandra 4.1.x. Because the rewrite needs free disk space, check the space first and, if required, delete all snapshots as described in the pre-upgrade cleanup steps:
# time nodetool upgradesstables
This process can be resource-intensive and should be scheduled during off-peak hours to minimize impact on live traffic.
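Once upgradesstables has finished on a node, you can check that no data files in an older format remain. A minimal sketch; find_old_sstables is a hypothetical name, the data directory is assumed to be /var/lib/cassandra/data, and the expected prefix nb is an assumption about the SSTable format Cassandra 4.1 writes, so confirm it against the name of a freshly flushed Data.db file on an upgraded node. Snapshot and backup directories are skipped because upgradesstables does not rewrite them:

```shell
# Hypothetical check: list Data.db files whose format prefix differs from
# the expected one, skipping snapshots and incremental backups.
find_old_sstables() {
  data_dir="${1:-/var/lib/cassandra/data}"
  current="${2:-nb}"           # assumed current SSTable format prefix
  find "$data_dir" -type f -name '*-Data.db' \
    ! -path '*/snapshots/*' ! -path '*/backups/*' |
  while read -r f; do
    case "$(basename "$f")" in
      "$current"-*) ;;         # already in the expected format
      *) echo "$f" ;;
    esac
  done
}
```

An empty result means every live SSTable uses the expected format. While the rewrite is still running, nodetool compactionstats shows its progress.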
Implement Security Enhancements
Cassandra 4.1.x includes several security enhancements. Review the latest security features and best practices, such as enabling client-to-node encryption, node-to-node encryption, and advanced authentication mechanisms, to enhance the security posture of your Cassandra cluster.
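As a starting point for that review, you can print the relevant settings from cassandra.yaml. A minimal sketch; show_security_settings is a hypothetical name, it assumes the top-level authenticator, authorizer and role_manager keys and the server_encryption_options and client_encryption_options blocks found in Cassandra 4.1's cassandra.yaml, and it only scans lines rather than fully parsing YAML:

```shell
# Hypothetical audit: print authentication/authorization settings and the
# main switches inside the two encryption option blocks.
show_security_settings() {
  yaml="${1:-/etc/cassandra/cassandra.yaml}"
  # top-level keys
  grep -E '^(authenticator|authorizer|role_manager):' "$yaml"
  # remember the current top-level key; print selected keys nested under
  # server_encryption_options or client_encryption_options
  awk '
    /^[a-z0-9_]+:/ { section = $1 }
    section ~ /^(server|client)_encryption_options:$/ && /^ +(internode_encryption|enabled|optional):/ { print section, $1, $2 }
  ' "$yaml"
}
```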
Review and Update Backup Strategies
With the new version in place, reassess your backup strategies to ensure they are still effective and meet your recovery objectives. Verify that your backup and restore procedures are compatible with Cassandra 4.1.x and consider leveraging new tools or features that may have been introduced in this release for more efficient data management.
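One simple step is to take a named snapshot right after the upgrade as a new baseline. Snapshots are hard links on the node's own disk, so they still need to be copied elsewhere to count as a backup. A minimal sketch; post_upgrade_snapshot and its default tag are hypothetical:

```shell
# Hypothetical helper: take a tagged snapshot as a post-upgrade baseline,
# then confirm that it appears in the snapshot list.
post_upgrade_snapshot() {
  tag="${1:-post-upgrade-4.1-$(date +%Y%m%d)}"
  # -t sets the snapshot name; with no keyspace given, all keyspaces are included
  nodetool snapshot -t "$tag" || return 1
  nodetool listsnapshots | grep -qF -- "$tag" || {
    echo "snapshot $tag not listed" >&2
    return 1
  }
  echo "$tag"
}
```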