Next problem to solve. Unexpected as always.
Who would have thought? My test system is still at AIX 7.3 TL1 SP1! It is July 2024 and we have AIX 7.3 TL2 SP2 out there. I must update it!
Of course I have some systems with AIX 7.3 and some with AIX 7.2. I must confess I still have AIX 7.1 and AIX 6.1. What I don’t have is AIX 5.3 and earlier. Thank you, my colleagues, for not needing them anymore! But we are not talking about upgrades now. If you want to upgrade your systems, read my previous newsletter. Today we are talking about updates.
AIX patch management
The security strategy of almost every company knows the word “patching” or two words - “security patching”. It usually means we should install security fixes in a relatively timely manner. Of course we can install each interim fix IBM produces. One day I will write about it, but not today. Sometimes there is really no need to install each fix. Sometimes IBM has already delivered a new service pack or a new technology level and it is easier to install the complete service pack instead of installing separate interim fixes. This is our case today. My test system is not important enough to install interim fixes within 3 days of their delivery. I install only new SPs there.
AIX Live Kernel Update
One word before going deeply technical. There is a wonderful AIX technology called Live Kernel Update or Live Update. I don’t use it. It is not implemented in our infrastructure. Or, frankly speaking, I never searched for a way to implement it. I suppose it could be implemented, because all of our AIX LPARs are fully virtualized and we use IBM storage. It should work, and if I find time somewhere in the future I may implement it. Anyway, there is no such requirement and I have my downtime to update systems.
Plan your downtime!
If we talk about downtime, the real question is: how much downtime do I need to update a single AIX server? Similar to an upgrade, I need 10 to 15 minutes if everything goes according to plan. Does it mean that I need 100 minutes if I update 10 servers? No way! I still need 10 to 15 minutes to update 10 or 100 servers, because the updates are prepared in advance and the servers can be rebooted in parallel. You know it - the only time you need to update your AIX server is the time to shut down your application, restart your server and start up the application again. I have no idea how long it takes to restart your application. I have seen applications which start in less than 1 minute and applications which need more than 1 hour. Anyway, I don’t think I have seen an AIX server which required more than 10 minutes for a reboot during the last 10 years.
OK, I am lying. I saw such AIX servers. But they were misconfigured and had network problems. That’s why they required more than 10 minutes to start. It is your task as a system administrator to know how your systems work. You must know how much time they need to restart (including the applications running on them), what is installed and what must be updated during the update procedure.
I can give you some simple steps to reduce your downtime and make your updates fun. But whether they work in your environment is up to you. You must take the pieces of information from this newsletter or other articles on the Internet, mix them with your knowledge about your environment and create the automation to update your environment.
How do I proceed with updates?
Similar to upgrades, I prepare them before I have a downtime or maintenance window. I use the good old altinst_rootvg procedure to make it almost online. It means I need a separate disk which is at least as big as my original rootvg disk.
I use a NIM server. Let me be honest. I use a single NIM server for the whole environment. I don’t see any need for high availability solutions for a NIM server. If the NIM server is out, it only means that it is not available. Maybe I must restart it or restore it from the backup. I can always reinstall it using my Ansible playbooks and nothing happens. Of course if you (mis)use NIM for some other functionality, like using it as the main jump server for the whole environment, it may be painful to lose the server. But in this case it is not NIM which requires high availability but your other functionality. Maybe you should move this functionality to another server and make that one highly available.
I saw environments using different NIM servers for different AIX versions.
I saw environments using different NIM servers for different networks.
I saw environments using different NIM servers for different stages.
I don’t like these concepts. I use a single NIM server for all AIX versions, all networks and all stages. Of course YMMV. Of course you may have other requirements and be ready to have many NIM servers.
If we have a NIM server, our AIX server must be registered as a NIM client and the NIM client must be working. I wrote about it last time, but I repeat it one more time. If the nimsh service is started and running on your AIX server, it doesn’t mean that it works. As for me, the easiest way to check that nimsh works is to show the boot log of the AIX client from the NIM server, like:
nim -o showlog -a log_type=boot aix01
If I get the log within 1 minute, everything is OK. If it takes longer, I should investigate the cause before going further.
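The same check could be automated. This is only a sketch and not part of the playbook below. It assumes that the NIM client object is named like the short inventory hostname and that the NIM server from the nim_server variable is reachable by Ansible for delegation. The task-level timeout keyword makes the task fail if the log doesn't arrive in 60 seconds.

```yaml
# Sketch: run the nimsh check on the NIM server and give up after 60 seconds.
# Assumption: the NIM client object has the same name as the short inventory hostname.
- name: Check that nimsh answers in time
  ansible.builtin.command:
    cmd: "nim -o showlog -a log_type=boot {{ inventory_hostname_short }}"
  delegate_to: "{{ nim_server }}"
  timeout: 60
  changed_when: false
```

If your NIM object names differ from your inventory names, replace inventory_hostname_short with whatever maps to them in your environment.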
The next step is to have lpp_source on the NIM with the AIX version we want to update to. If you read my previous newsletter, you know how I name my lpp_sources:
I hope you remember. One of the reasons for this scheme is the ibm.power_aix.nim module. But we don’t use it this time. Let’s see what we use.
Variables
No, you will not get a list of Ansible modules used in the playbook. But you will get a list of variables used in the playbook.
As in the AIX upgrade playbook, we must know the name of the disk for our altinst_rootvg (alt_hdisk), the target AIX version (aix_version) and the full AIX level (target_level). The playbook will be executed on the NIM client, and even though I could get the name of the NIM server from it, I define it in a variable.
# alt_hdisk: (str) hdisk device to create altinst_rootvg
alt_hdisk: hdiskX
# aix_version: (str) AIX must be at version
aix_version: 7.3
# target_level: (str) target AIX level
target_level: 7300-02-02
# nim_server: (str) our NIM server
nim_server: nim
How to get the name of the NIM master from NIM client? It is easy:
awk -F= '/NIM_MASTER_HOSTNAME/ {print $2}' /etc/niminfo
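If you prefer not to hardcode the name, the command above could be wrapped into two tasks. This is a minimal sketch. The register name niminfo_master is made up for illustration, and the tasks assume that /etc/niminfo exists on the client.

```yaml
# Sketch: use the NIM master from /etc/niminfo when nim_server is not set.
# niminfo_master is a hypothetical variable name.
- name: Read NIM master name from /etc/niminfo
  ansible.builtin.command:
    cmd: "awk -F= '/NIM_MASTER_HOSTNAME/ {print $2}' /etc/niminfo"
  register: niminfo_master
  changed_when: false

- name: Use it as nim_server if the variable is not defined
  ansible.builtin.set_fact:
    nim_server: "{{ niminfo_master.stdout | trim }}"
  when: nim_server is not defined
```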
Playbook
OK, we defined all variables and can start with the playbook. First let’s check if we run on AIX:
- name: Get current AIX version
  ansible.builtin.setup:
    gather_subset: distribution_version
- name: Stop if the target system is not AIX
  when: "ansible_facts.distribution != 'AIX'"
  block:
    - name: Print message
      ansible.builtin.debug:
        msg: "The target host is not AIX: {{ ansible_facts.distribution }}. Aborting update."
    - name: Abort update
      ansible.builtin.meta: end_play
I hope you already know why I do it and why I stop the playbook gracefully in this case. No, I don’t do it in every playbook, but only if I suppose that the playbook can accidentally be run on a non-AIX server.
Next we check if we have the correct AIX version. You can’t update AIX 7.2 to AIX 7.3 or vice versa. You must have AIX 7.2 to update it to one of the AIX 7.2 service packs, or AIX 7.3 if you want to install an AIX 7.3 service pack. Otherwise it doesn’t work.
- name: Stop if the target system has wrong AIX version
  when: "ansible_facts.distribution_version | string != aix_version | string"
  ansible.builtin.fail:
    msg: "We don't support AIX {{ ansible_facts.distribution_version }} for update to {{ target_level }}. Aborting update."
Even if we have the correct AIX version (like 7.3 and we want to update to 7300-02-02), there is still a possibility that the server is already updated (it has 7300-02-02) or has an even newer version (like 7300-02-03). We should get its full level and compare it to our target_level:
- name: Get full AIX version
  ansible.builtin.command:
    cmd: oslevel -s
  changed_when: false
  failed_when: false
  register: oslevel
- name: "Stop if we already have AIX {{ target_level }} or newer"
  when: "oslevel.stdout_lines.0 is version(target_level, '>=')"
  block:
    - name: Print message
      ansible.builtin.debug:
        msg: "AIX is already updated to {{ target_level }}. Aborting update."
    - name: Abort upgrade
      ansible.builtin.meta: end_play
Now we are sure that we can update this AIX server. We must also be sure that we have a disk for our altinst_rootvg:
- name: Get devices information
  ansible.builtin.setup:
    gather_subset: devices
- name: Check that hdisk for alt_rootvg exists
  ansible.builtin.fail:
  when: ansible_facts.devices[alt_hdisk].state != 'Available'
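An additional safeguard you may want here is to check that the disk doesn’t belong to some other volume group before it gets cleaned up. This is a sketch under my own assumptions, not part of the original playbook. It relies on the third column of the lspv output being the volume group name (None for a free disk), and alt_hdisk_vg is a made-up register name.

```yaml
# Sketch: refuse to touch a disk which belongs to a foreign volume group.
# alt_hdisk_vg is a hypothetical variable name.
- name: Find out which volume group owns the target disk
  ansible.builtin.shell:
    cmd: "lspv | awk -v d={{ alt_hdisk }} '$1 == d {print $3}'"
  register: alt_hdisk_vg
  changed_when: false

- name: Stop if the disk belongs to another volume group
  ansible.builtin.assert:
    that:
      - "(alt_hdisk_vg.stdout | trim) in ['None', 'altinst_rootvg', 'old_rootvg']"
    fail_msg: "{{ alt_hdisk }} belongs to {{ alt_hdisk_vg.stdout | trim }}. Aborting update."
```

The list of allowed names covers a free disk and the leftovers of previous update attempts. Adjust it if you rename your alternate volume groups.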
If we have the disk, we can clean it up from all previous attempts to update the server:
- name: Clean up altinst_rootvg if it exists
  ibm.power_aix.alt_disk:
    action: clean
- name: Clean up old_rootvg if it exists
  ibm.power_aix.alt_disk:
    action: clean
    allow_old_rootvg: true
- name: Clean up hdisk for alt_rootvg
  ansible.builtin.command:
    cmd: "chpv -C {{ alt_hdisk }}"
The disk is prepared now. We must find the name of the lpp_source and find its directory on the NIM server:
- name: Get names of lpp_source resources
  ansible.builtin.shell:
    cmd: "nimclient -l -t lpp_source | grep {{ target_level }} | sort | tail -1"
  register: lppsrc
  changed_when: false
  failed_when: false
- name: Set target lpp_source
  ansible.builtin.set_fact:
    target_lppsource: "{{ lppsrc.stdout_lines.0 | split | first }}"
- name: Check that target lpp_source exists
  ansible.builtin.command:
    cmd: "nimclient -l {{ target_lppsource }}"
  changed_when: false
- name: Get lpp_source directory
  ansible.builtin.shell:
    cmd: "nimclient -ll {{ target_lppsource }} | awk -F= '/location +=/ {print $2}'"
  changed_when: false
  register: lppsource_dir
We checked everything we could. At least I hope that we did. It is time to start the update procedure. Our first step is to create altinst_rootvg:
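Because the first task never fails (failed_when: false), an empty result only shows up as a rather cryptic error in the set_fact task. A small assertion placed right after "Get names of lpp_source resources" could give a clearer message. This is a sketch, not something from the original playbook:

```yaml
# Sketch: fail early with a readable message if grep found no lpp_source.
- name: Stop if no lpp_source matches the target level
  ansible.builtin.assert:
    that:
      - lppsrc.stdout_lines | length > 0
    fail_msg: "No lpp_source for {{ target_level }} found on {{ nim_server }}."
```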
- name: Create altinst_rootvg
  ibm.power_aix.alt_disk:
    action: copy
    force: true
    targets:
      - "{{ alt_hdisk }}"
After it is created, we “wake” it up to start working with it. Unfortunately there is no ready-to-use Ansible module for it right now, so I use the command module:
- name: Wake up altinst_rootvg
  ansible.builtin.command:
    cmd: "alt_rootvg_op -W -d {{ alt_hdisk }}"
After altinst_rootvg is ready to use I mount my lpp_source from NIM into altinst_rootvg:
- name: Mount lpp_source into altinst_rootvg
  ibm.power_aix.mount:
    node: "{{ nim_server }}"
    mount_dir: "{{ lppsource_dir.stdout_lines.0 | trim }}"
    mount_over_dir: /alt_inst/mnt
    options: "soft,ro"
    state: mount
Did I already say that there is a shell script behind each Ansible playbook? If not, now is the time! I create a small script to remove all interim fixes from my altinst_rootvg and update it.
- name: Copy update script to altinst_rootvg
  ansible.builtin.copy:
    content: |
      #!/bin/sh
      export INUCLIENTS=1
      # emgr -l prints ID, STATE and LABEL; the label is the third field
      emgr -l | awk '/^[0-9]/ { print "emgr -rL "$3 }' | sh
      installp -c all update_all -Yd /mnt
    dest: /alt_inst/tmp/update.sh
    owner: root
    group: system
    mode: "0755"
- name: Execute update script
  ansible.builtin.command:
    cmd: chroot /alt_inst /tmp/update.sh
    chdir: /alt_inst
After the updates are installed we should check if everything went OK. We check new AIX level and consistency of packages. If something is wrong, we will get a failure here:
- name: Check AIX version
  ansible.builtin.command:
    cmd: chroot /alt_inst /usr/bin/oslevel -s
    chdir: /alt_inst
  changed_when: false
  register: oslevel
- name: Stop if we have problems with the AIX version
  when: "oslevel.stdout_lines.0 is version(target_level, '<')"
  ansible.builtin.fail:
    msg: "AIX update failed! oslevel returns {{ oslevel.stdout_lines.0 }}"
- name: Check LPP consistency
  ansible.builtin.command:
    cmd: chroot /alt_inst /usr/bin/lppchk -vm3
    chdir: /alt_inst
  changed_when: false
If the update was successful we can unmount lpp_source and close our altinst_rootvg:
- name: Unmount lpp_source
  ansible.posix.mount:
    path: /alt_inst/mnt
    state: unmounted
    fstab: /dev/null
- name: Close altinst_rootvg
  ansible.builtin.command:
    cmd: "alt_rootvg_op -S -t -d {{ alt_hdisk }}"
That’s it! Next time we have downtime, we can reboot the server from our new rootvg and get the updated AIX version.
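Depending on how altinst_rootvg was created, the boot list may or may not already point to the new disk. Before the maintenance window you could check it and, if needed, set it. This is a sketch under my own assumptions and not part of the original playbook. It assumes that the first word of the bootlist -m normal -o output is the first boot device, and bootlist_out is a made-up register name.

```yaml
# Sketch: make sure the next reboot starts from the updated disk.
# bootlist_out is a hypothetical variable name.
- name: Show current boot list
  ansible.builtin.command:
    cmd: bootlist -m normal -o
  register: bootlist_out
  changed_when: false

- name: Point the boot list to the updated disk if needed
  ansible.builtin.command:
    cmd: "bootlist -m normal {{ alt_hdisk }}"
  when: "(bootlist_out.stdout.split() | first | default('')) != alt_hdisk"
```

The reboot itself is deliberately left out. It belongs to your downtime together with stopping and starting the applications.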
Ansible Automation Platform
No, sorry! This time I was too busy and didn’t prepare any screenshots. You can look in the previous newsletter and find screenshots there. This time they wouldn’t be much different.
Did you miss something?
It is because I forgot something! :-)
There is much more to AIX updates than a simple installation of all filesets from an lpp_source. But that is not for today. Today you can check if the playbook works for you and update your AIX server using it.
Have fun with AIX updates!