SSVM System VM fails to add static route to NFS server (ip route add x.x.x.x via null) #10163

Open
KobesM opened this issue Jan 5, 2025 · 11 comments

Comments

@KobesM

KobesM commented Jan 5, 2025

ISSUE TYPE
  • Bug Report
COMPONENT NAME
secondarystoragevm
CLOUDSTACK VERSION
4.20.0.0
CONFIGURATION

Advanced Networking
Single Zone
physical network

  • Management gw 10.143.51.1, mask 255.255.255.0, vlan://untagged, start 10.143.51.151, end 10.143.51.200
  • Public gw 10.143.51.1, mask 255.255.255.0, vlan://51, start 10.143.51.101, end 10.143.51.150
OS / ENVIRONMENT

AlmaLinux 9.5 on the management server and KVM hosts

SUMMARY

I just deployed Apache CloudStack 4.20 as a fresh install but can't upload images to the secondary storage. When uploading an image through the UI, I receive the following error: "Failed to upload Template - Error: Network Error".

STEPS TO REPRODUCE

When logging into the secondarystoragevm (ssh from the KVM host), I found the following errors in /var/log/cloud.log:

2025-01-05T20:45:36,290 WARN  [cloud.agent.Agent] (agentRequest-Handler-3:[]) Caught: com.cloud.utils.exception.CloudRuntimeException: Failed to get root directory from secondary storage URL [nfs://<nfs path>], using NFS version [null], due to [Unable to mount /192.168.1.21:<nfs path> at /mnt/SecStorage/d096bb1f-552a-3424-ab65-63c694685108 due to mount.nfs: No route to host].

and

2025-01-05T20:19:35,110 WARN  [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-2:[]) Execution of process [3587] for command [/bin/bash -c ip route add 192.168.1.21 via null ] failed.

The route table shows no route to 192.168.1.21 (the NFS server):

# ip route
default via 10.143.51.1 dev eth2
10.143.51.0/24 dev eth1 proto kernel scope link src 10.143.51.196
10.143.51.0/24 dev eth2 proto kernel scope link src 10.143.51.101
169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.46.109
192.168.1.22 via 10.143.51.1 dev eth1
EXPECTED RESULTS
System VM would add a route to the NFS server.
ACTUAL RESULTS
It seems to me that, for some reason, the system VM is not able to add the correct route to the NFS server, which in this case is a Synology NAS. The log shows it trying to add a static route with "null" as the gateway (ip route add 192.168.1.21 via null). The consoleproxy VM does not have this issue and adds the static route to 192.168.1.21 without problems (the Synology NAS also acts as the internal DNS server).
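
One plausible way a command like the one in the log comes about, shown as a minimal sketch and not as the actual CloudStack code: in Java, concatenating a null reference into a string produces the literal text "null". The class and method names below are hypothetical, and whether the SSVM agent builds its command this way is an assumption.

```java
// Hypothetical illustration; not taken from the CloudStack source.
class RouteCommandSketch {

    // Naive builder: a null gateway is silently rendered as the text "null".
    static String buildNaive(String destinationIp, String gatewayIp) {
        return "ip route add " + destinationIp + " via " + gatewayIp;
    }

    // Defensive builder: refuses to produce a command without a gateway.
    static String buildChecked(String destinationIp, String gatewayIp) {
        if (gatewayIp == null || gatewayIp.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "No gateway available for route to " + destinationIp);
        }
        return "ip route add " + destinationIp + " via " + gatewayIp;
    }

    public static void main(String[] args) {
        // prints "ip route add 192.168.1.21 via null"
        System.out.println(buildNaive("192.168.1.21", null));
        // prints "ip route add 192.168.1.21 via 10.143.51.1"
        System.out.println(buildChecked("192.168.1.21", "10.143.51.1"));
    }
}
```

Failing early on a missing gateway would surface the configuration gap directly instead of as a later "No route to host" from mount.nfs.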
@iishitahere

Dear @KobesM @DaanHoogland ,

I hope this message finds you well.

I have reviewed the details of the issue regarding the secondary storage VM failing to add the correct route to the NFS server. It seems like a routing misconfiguration or a network connectivity issue is causing the problem.

I would like to take this opportunity to work on resolving this issue. If the issue is still open, could you please assign it to me? I will ensure to provide updates regularly and propose a fix after thoroughly investigating the root cause.

Looking forward to your confirmation and any additional input or guidance you may have regarding this issue.

Best regards,
Ishita Jaiswal

@iishitahere

Hi @DaanHoogland ,
I have reviewed the issue and worked out an approach to address it. The problem seems to be that the system VM is unable to add the correct route to the NFS server: the error logs show null being used in the route command, which leads to the "No route to host" error.

My approach to solve this issue would involve the following steps:

Investigate Route Table Configuration: I'll start by reviewing the route table configuration on the system VM and identifying why the static route to the NFS server is not being added correctly. It seems that the null value is being passed where an actual device or gateway should be specified.

Verify Network Interfaces and Gateway Configuration: I will check the network interfaces on the secondary storage VM and confirm that the correct gateway is set, ensuring it is able to route traffic to the 192.168.1.21 NFS server.

Check NFS Mount Configuration: I'll verify the NFS configuration and ensure that the system VM has proper access permissions and that the NFS server is reachable from the secondary storage VM.

Manually Add Route (Testing): I’ll test adding the static route manually using the ip route add command to see if it resolves the issue and confirm that the system VM can reach the NFS server.

Implement Solution in Code: Once I have identified the root cause, I will make any necessary changes to the code to ensure the system VM correctly adds the route to the NFS server. This may involve updating the networking or route handling logic within CloudStack.

I will proceed with debugging the issue and keep you updated on my progress. Let me know if there are any additional details that could assist in troubleshooting.

Best Regards,
Ishita Jaiswal

@weizhouapache
Member

@iishitahere
In my opinion, the root cause is obvious and the fix seems simple.
You can create a PR if you have the fix.

By the way, do you have an environment to verify the fix?

@iishitahere

Yes, I have an environment to verify the fix. I’m currently working on the PR and will update you once it's ready.
I will submit it by the day after tomorrow for sure!

@KobesM
Author

KobesM commented Jan 10, 2025

A small update from my side: I checked whether manually adding the route to the NFS server would work.

After logging into the SSVM, /var/log/cloud.log shows the same warning as reported earlier:

2025-01-08T21:53:23,824 WARN  [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-2:[]) Execution of process [3611] for command [/bin/bash -c ip route add 192.168.1.21 via null ] failed.

At this point there is no route to 192.168.1.21:

root@s-2-VM:~# ip route
default via 10.143.71.1 dev eth2
10.143.51.0/24 dev eth1 proto kernel scope link src 10.143.51.111
10.143.71.0/24 dev eth2 proto kernel scope link src 10.143.71.102
169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.17.84
192.168.1.22 via 10.143.51.1 dev eth1

So I manually added the route:

root@s-2-VM:~# ip route add 192.168.1.21 via 10.143.51.1

The route to 192.168.1.21 now exists:

root@s-2-VM:~# ip route
default via 10.143.71.1 dev eth2
10.143.51.0/24 dev eth1 proto kernel scope link src 10.143.51.111
10.143.71.0/24 dev eth2 proto kernel scope link src 10.143.71.102
169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.17.84
192.168.1.21 via 10.143.51.1 dev eth1
192.168.1.22 via 10.143.51.1 dev eth1

/var/log/cloud.log also reports that it found the NFS server and created the required folders:

2025-01-10T19:34:57,532 INFO  [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-5:[]) Determined host nfs01.local.ironhive.nl corresponds to IP 192.168.1.21
2025-01-10T19:34:57,927 INFO  [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-5:[]) snapshots directory created/exists on Secondary Storage.
2025-01-10T19:34:57,939 INFO  [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-5:[]) volumes directory created/exists on Secondary Storage.

At this point I was also able to upload an image from the UI.
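
The before/after check of the route table can also be expressed as a small sketch. It assumes text in the format printed by "ip route" and only looks for an exact host route ("<destination> via <gateway> ..."); it does not do prefix matching or consider the default route. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper; parses lines in the format printed by "ip route".
class HostRouteLookup {

    // Returns the gateway of an exact-match host route for destinationIp, if any.
    static Optional<String> gatewayFor(List<String> routeLines, String destinationIp) {
        for (String line : routeLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 3
                    && fields[0].equals(destinationIp)
                    && fields[1].equals("via")) {
                return Optional.of(fields[2]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Route table after the manual "ip route add", as shown in this comment.
        List<String> routes = Arrays.asList(
            "default via 10.143.71.1 dev eth2",
            "10.143.51.0/24 dev eth1 proto kernel scope link src 10.143.51.111",
            "10.143.71.0/24 dev eth2 proto kernel scope link src 10.143.71.102",
            "169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.17.84",
            "192.168.1.21 via 10.143.51.1 dev eth1",
            "192.168.1.22 via 10.143.51.1 dev eth1");
        System.out.println(gatewayFor(routes, "192.168.1.21").orElse("no host route"));
    }
}
```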

@iishitahere

Dear @DaanHoogland @weizhouapache,

I hope you're doing well.

I am currently working on an issue related to uploading images to secondary storage in Apache CloudStack, specifically with the SecondaryStorageVM component in version 4.20.0.0.

While investigating this locally and reviewing the logs from the SSVM, I’ve encountered limitations in replicating the full CloudStack environment. I believe that testing in the official environment will provide more accurate results.

In my local environment, I was able to manually add the route to the NFS server, but I need to confirm the fix in the complete setup and infrastructure. Access to the official environment would greatly enhance my ability to troubleshoot and implement a comprehensive solution.

I would appreciate it if you could grant me access to the necessary resources for further testing and resolution of this issue.

Thank you for your time and consideration.

Best regards,
Ishita

@weizhouapache
Member

@iishitahere
I can give you some hints.

If there is no storage network, CloudStack uses the management network.
In that case the storage IP in the SSVM is null, and we should use the management network IP as the storage IP in the SSVM.
So the fix is as simple as:

// No storage network configured: fall back to the management (eth1) IP.
if (_storageIp == null) {
    _storageIp = eth1ip;
}
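
The hint above can be sketched as a self-contained fallback helper. This is only an illustration of the idea under stated assumptions: the class and method names are hypothetical and do not mirror NfsSecondaryStorageResource, and the addresses are taken from the route tables posted in this issue.

```java
// Hypothetical sketch of the fallback idea; not the actual CloudStack change.
class StorageNetworkFallback {

    // Returns the storage value when it is set, otherwise the management value.
    static String orManagement(String storageValue, String managementValue) {
        if (storageValue == null || storageValue.trim().isEmpty()) {
            return managementValue;
        }
        return storageValue;
    }

    public static void main(String[] args) {
        String storageIp = null;                // no storage network configured
        String managementIp = "10.143.51.111";  // eth1 address from the report
        // prints "10.143.51.111"
        System.out.println(orManagement(storageIp, managementIp));
    }
}
```

The same helper could be applied to the gateway, since the failing command has null in the gateway position; whether the real fix needs that is not stated in this thread.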

@iishitahere

Thanks for the suggestion. I'll proceed with this approach and ensure the management IP is used when no storage network is configured. Please let me know if there are any additional considerations, or whether I can submit the pull request once the changes are made.

@iishitahere

Hi @DaanHoogland @weizhouapache ,

I hope you're doing well.

I have submitted a PR that addresses the issue related to static route configuration in the secondary storage VM (NfsSecondaryStorageResource). This fix resolves the problem where the route addition fails due to a null storage IP, causing errors like ip route add ... via null.

I sincerely apologize for not being active in recent weeks. Due to my exams and some unforeseen circumstances, I was unable to contribute as consistently as I had hoped. However, I am fully available now to respond to feedback and make any changes necessary to improve the PR.

Thank you for your time and for reviewing my contribution. I look forward to your feedback and guidance.

Best regards,
Ishita Jaiswal

@DaanHoogland
Contributor

Thanks @iishitahere, I have edited #10304, on the assumption that it is the PR you mention. We will get to reviewing it. I have added a reference to this issue in it.

One other initial remark: you have based the fix on main, but the issue is reported on 4.20. Do you think you could rebase your solution on the 4.20 release branch?

Please don't apologise. Some of us have other duties at work, others have studies, and others again work completely in their own time. But thanks for letting us know anyway.

@iishitahere

Thank you for understanding! I appreciate the support and flexibility.
