Forum Replies Created
You are most likely running into the IPv6 vs IPv4 issue. The site you are trying to reach, linux.mellanox.com, is IPv4-only. FIU is an IPv4 site, which is why it works seamlessly there, whereas DALL and SEAT are IPv6-only sites. Since you are using IP forwarding rules, as you mentioned, you may have to add those rules to ip6tables as well.
The FABRIC VMs use the FABRIC DNS server, which has NAT64 capability on these racks. So I believe the issue most likely lies with your forwarding rules.
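If it helps, here is a rough sketch of what mirroring IPv4 forwarding rules into ip6tables can look like. I cannot see your actual rules, so the interfaces (enp3s0 upstream, enp7s0 downstream) and the rules themselves are placeholders; RUN defaults to echo so the commands are printed rather than applied.

```shell
# Dry-run sketch: mirror a typical pair of iptables FORWARD rules into
# ip6tables. Interface names are placeholders -- substitute the ones your
# forwarding setup actually uses. RUN defaults to `echo` (print only);
# set RUN= and run as root to apply.
RUN="${RUN:-echo}"
$RUN ip6tables -A FORWARD -i enp7s0 -o enp3s0 -j ACCEPT
$RUN ip6tables -A FORWARD -i enp3s0 -o enp7s0 -m state --state RELATED,ESTABLISHED -j ACCEPT
$RUN sysctl -w net.ipv6.conf.all.forwarding=1
```

Remember that iptables and ip6tables keep completely separate rule tables, so an IPv4 FORWARD rule does nothing for IPv6 traffic.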
I just did.
I see the issue on your VM. I believe you are using FABNetv4Ext and FABNetv6Ext. During configuration, you may have accidentally added the NIC to the default route. This left the system with two default routes going out two different interfaces.
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP group default qlen 1000
link/ether fa:16:3e:07:66:5e brd ff:ff:ff:ff:ff:ff
inet 10.30.6.168/23 metric 100 brd 10.30.7.255 scope global dynamic enp3s0
valid_lft 58217sec preferred_lft 58217sec
inet6 2001:400:a100:3030:f816:3eff:fe07:665e/64 scope global dynamic mngtmpaddr noprefixroute
valid_lft 86383sec preferred_lft 14383sec
inet6 fe80::f816:3eff:fe07:665e/64 scope link
valid_lft forever preferred_lft forever
3: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
link/ether 2a:77:c8:60:d3:bf brd ff:ff:ff:ff:ff:ff
inet 10.129.130.253/24 scope global enp7s0
valid_lft forever preferred_lft forever
inet 23.134.235.195/28 scope global enp7s0
valid_lft forever preferred_lft forever
inet6 2602:fcfb:101::3/28 scope global
valid_lft forever preferred_lft forever
inet6 2602:fcfb:101:0:2877:c8ff:fe60:d3bf/64 scope global dynamic mngtmpaddr
valid_lft 2591807sec preferred_lft 604607sec
inet6 fe80::2877:c8ff:fe60:d3bf/64 scope link
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether ca:03:f6:60:d5:34 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
root@SenderSTAR:~# ip -6 route show
::1 dev lo proto kernel metric 256 pref medium
2001:400:a100:3030::/64 dev enp3s0 proto ra metric 100 expires 86371sec pref medium
2001:400:a300::/48 via 2602:fcfb:101::1 dev enp7s0 metric 1024 pref medium
2602:fcfb:101::/64 dev enp7s0 proto kernel metric 256 expires 2591914sec pref medium
2602:fcf0::/28 dev enp7s0 proto kernel metric 256 pref medium
fe80::a9fe:a9fe via fe80::f816:3eff:fe79:edec dev enp3s0 proto ra metric 100 expires 271sec pref medium
fe80::/64 dev enp3s0 proto kernel metric 256 pref medium
fe80::/64 dev enp7s0 proto kernel metric 256 pref medium
default via fe80::f816:3eff:fe79:edec dev enp3s0 proto ra metric 100 expires 271sec mtu 9000 pref medium
default via fe80::c28b:2aff:fe82:6d02 dev enp7s0 proto ra metric 1024 expires 1714sec hoplimit 64 pref medium
As soon as I disabled the enp7s0 NIC with ip link set dev enp7s0 down, the VM became reachable via ssh again.
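A less disruptive alternative to downing the whole NIC is to delete only the surplus IPv6 default route. This is a dry-run sketch (RUN defaults to echo, so nothing is applied); the via/dev values are taken from the route dump above.

```shell
# Dry-run sketch: drop only the surplus default route on enp7s0 instead of
# downing the interface. Addresses come from the `ip -6 route show` output
# above. The route was learned via RA (proto ra), so also disable accept_ra
# on that interface or it will reappear. Set RUN= and run as root to apply.
RUN="${RUN:-echo}"
$RUN ip -6 route del default via fe80::c28b:2aff:fe82:6d02 dev enp7s0
$RUN sysctl -w net.ipv6.conf.enp7s0.accept_ra=0
```

This keeps the dataplane addresses on enp7s0 usable while outbound IPv6 traffic goes out the management interface.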
It is possible that there is a routing issue when FABNetv6 is used at STAR together with the STAR bastion. I will ask for this use case to be investigated.
The good news is that I have narrowed the issue down to your VM at STAR. Does the issue occur with VMs at other sites too?
Can you post the full ssh command? The issue may be in the connection between STAR and the destination rack.
I do see successful logins from your ID in the STAR bastion logs, even from early this morning.
The fablib logs do not say which bastion it tried; enabling verbose debug may show that. The other thing we can try is to ssh directly to each bastion, one by one, to see which one fails to connect (the SSH login itself will fail on all of them, since direct SSH is not allowed).
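One way to sketch that per-bastion check from the client side: resolve every address behind bastion.fabric-testbed.net and test whether TCP port 22 can be opened to each one. This only checks reachability; an actual SSH login would still be refused, and the hostname and port here are assumptions based on the standard setup.

```shell
#!/usr/bin/env bash
# Sketch: probe TCP port 22 on every address that bastion.fabric-testbed.net
# resolves to, to spot which bastion (IPv4 or IPv6) is unreachable from your
# network. Only connectivity is tested; the SSH login itself would be refused.
probe() {  # probe <address> [port]
  local addr="$1" port="${2:-22}"
  if timeout 3 bash -c "exec 3<>/dev/tcp/${addr}/${port}" 2>/dev/null; then
    echo "reachable:   ${addr}"
  else
    echo "unreachable: ${addr}"
  fi
}

# Resolve all A and AAAA records and probe each unique address:
getent ahosts bastion.fabric-testbed.net | awk '{print $1}' | sort -u |
while read -r addr; do probe "$addr"; done
```

An address that shows as unreachable here, while others work, points at the bastion (or address family) your network cannot reach.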
Here is the list of all the bastions: https://learn.fabric-testbed.net/knowledge-base/frequently-asked-starter-questions/ (last question in the FAQ).
Hello Jiri,
The problem you show is different. Your attempt fails at authenticating to the bastion, which indicates you are using an incorrect bastion key; the others could not even connect to the bastion. Please make sure you are using the right bastion key in your configuration.
One thing that stands out is that I don't see any IPv6 addresses for the bastion host in your name lookup. We had been seeing issues with IPv6 from home networks, but we believe the workaround we put in place has worked, as reported by other users. I would also like to see the fablib.log file. Also, could you provide your source IP, as it's possible that one of the bastions has banned it?
Hello Ilya,
We have added two new bastions recently and modified one of the pre-existing ones. Can you please post the result of
“nslookup bastion.fabric-testbed.net” from the machine where this failed?
The issue has been resolved. Thank you for your patience.
September 22, 2025 at 1:37 pm in reply to: can’t see nvidia card though VM shows component assigned #9027
You should be able to see the Nvidia card in your VM now.
Hello Nirmala,
The permission has to be 600 for ssh private keys.
chmod 600 /home/fabric/work/fabric_config/Nirmala
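As a quick self-check, a small sketch like this verifies and repairs the mode in one step. The key path is the one from your setup; `fix_key_perms` is just an illustrative helper name, not part of any FABRIC tooling.

```shell
#!/usr/bin/env bash
# Sketch: ensure an ssh private key has mode 600 (owner read/write only);
# ssh refuses private keys that are group- or world-readable.
# `fix_key_perms` is an illustrative helper name.
fix_key_perms() {
  local key="$1" mode
  [ -f "$key" ] || { echo "no such key: $key" >&2; return 1; }
  mode=$(stat -c '%a' "$key")
  if [ "$mode" != "600" ]; then
    echo "fixing $key: mode was $mode"
    chmod 600 "$key"
  fi
}

# Example (path from the post above):
# fix_key_perms /home/fabric/work/fabric_config/Nirmala
```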
June 26, 2025 at 11:15 am in reply to: FABRIC NEWY and FABRIC LBNL – Network Maintenance on June 27 at 4:30 pm EST #8656
No, there should be no data loss; there will be a loss of network connectivity only. Expected downtime is 2 hours, but it may be extended. You should be able to access your nodes as soon as the maintenance is complete.
I will email you the Zoom link for the meeting in a few minutes. Let me know if you do not see any email by tomorrow.