Forum Replies Created
Hi Raghav,
The data plane interfaces on your VMs connected via L2STS do not have IP addresses configured.
The enp1s0 and enp3s0 interfaces shown in the snapshot below are the management interfaces and should be used solely for SSH access. For your experiment, please use the data plane interfaces, which are enp6s0 and enp7s0 on the respective VMs.
I recommend exploring the JH example, Wide Area Link (Layer 2), using manual, auto, or user-defined configurations, as it demonstrates how IP addresses should be set up. Please let us know if you encounter any further issues.
Snapshot from the VMs:
root@4f3a79fa-6e29-454e-9ec4-d1bfbda81a17-bapi-v2:~# ifconfig -a
enp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
        inet 10.30.6.167 netmask 255.255.254.0 broadcast 10.30.7.255
        inet6 fe80::f816:3eff:fe82:7b9 prefixlen 64 scopeid 0x20
        inet6 2001:400:a100:3070:f816:3eff:fe82:7b9 prefixlen 64 scopeid 0x0
        ether fa:16:3e:82:07:b9 txqueuelen 1000 (Ethernet)
        RX packets 51778 bytes 150077282 (150.0 MB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 25537 bytes 2608566 (2.6 MB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp6s0: flags=4098<BROADCAST,MULTICAST> mtu 1500
        ether 06:b7:27:d2:b5:0b txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10
        loop txqueuelen 1000 (Local Loopback)
        RX packets 178 bytes 23663 (23.6 KB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 178 bytes 23663 (23.6 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

root@bd65ee61-46a2-4cb2-b89e-c6b385052336-bapi-vm1:~# ifconfig -a
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
        inet 10.20.5.38 netmask 255.255.254.0 broadcast 10.20.5.255
        inet6 fe80::f816:3eff:fe55:c84f prefixlen 64 scopeid 0x20
        ether fa:16:3e:55:c8:4f txqueuelen 1000 (Ethernet)
        RX packets 15231 bytes 146475806 (146.4 MB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 13258 bytes 1020159 (1.0 MB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp7s0: flags=4098<BROADCAST,MULTICAST> mtu 1500
        ether 16:8a:89:5e:75:97 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10
        loop txqueuelen 1000 (Local Loopback)
        RX packets 238 bytes 37767 (37.7 KB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 238 bytes 37767 (37.7 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
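As a rough illustration of what "configuring IP addresses on the data plane interfaces" could look like, here is a minimal sketch that only builds the ip commands to run on each VM. The subnet 192.168.1.0/24 and the labels vm1/vm2 are assumptions chosen for illustration; the interface names are the ones with no address in the snapshot above. It does not use fablib and is not how the JH example does it.

```python
import ipaddress


def plan_dataplane_commands(subnet, interfaces):
    """Return {vm_label: [ip commands]} giving each VM one host address.

    subnet     -- CIDR string for the L2 network (an assumption, pick your own)
    interfaces -- mapping of VM label to its data plane interface name
    """
    network = ipaddress.ip_network(subnet)
    hosts = iter(network.hosts())  # usable host addresses, in order
    plan = {}
    for vm_label, ifname in interfaces.items():
        addr = next(hosts)
        plan[vm_label] = [
            f"sudo ip addr add {addr}/{network.prefixlen} dev {ifname}",
            f"sudo ip link set dev {ifname} up",
        ]
    return plan


# Hypothetical labels; interface names taken from the snapshot.
example_plan = plan_dataplane_commands(
    "192.168.1.0/24", {"vm1": "enp6s0", "vm2": "enp7s0"}
)
for vm_label, commands in example_plan.items():
    for command in commands:
        print(f"{vm_label}: {command}")
```

Once both interfaces are up with addresses in the same subnet, a ping from one VM to the other's data plane address is a quick way to confirm the L2STS link carries traffic.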
Thanks,
Komal
Hi Kriti,
I think the attachment got lost. Could you please email it to me directly at kthare10@renci.org?
Thanks,
Komal
Hi Kriti,
I can confirm that your slice is in the StableOK state. Please check the portal to verify this.
This looks like a bug where fablib may be reporting a stale state. If possible, could you please share your notebook so I can try to reproduce this on my end and fix the bug? I appreciate your help with this.
Slice Name: Eibp_large_PRIN
Slice ID: 02ff4c5b-140e-4e2a-ab2a-02ca8bd45ca3
Project ID: 787adfc9-d37e-42f2-8efe-8e32793e0bb8
Project Name: Expedited Internet Bypass Protocol
Graph ID: 53e16605-016c-4e2a-8754-336effb71188
Slice owner: { name: orchestrator, guid: orchestrator-guid, oidc_sub_claim: 5cf08403-b13e-458a-a17d-ee69abdacffa}
Slice state: StableOK
Lease time: 2025-02-05 18:27:56+00:00
Thanks,
Komal
Hi Kriti,
Could you please share your slice id?
Thanks,
Komal
February 4, 2025 at 1:30 pm in reply to: Error message: strptime() argument 1 must be str, not None #8157
Hi Vaneshi,
I am unable to reproduce this with any of the JH containers. I did notice a small error in the API call posted above: the quiet parameter is a boolean.
Could you please check the following?
- You have a valid token in ~/.tokens.json
- Try the snippet below
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()

cx5_column_name = 'nic_connectx_5_available'
cx6_column_name = 'nic_connectx_6_available'

sites_connectx_json = fablib.list_sites(
    output="json",
    quiet=True,
    filter_function=lambda x: x[cx6_column_name] > 0 or x[cx5_column_name] > 0,
    latlon=False,
)
print(sites_connectx_json)
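For the token check, a small standalone sketch like the one below may help. It only confirms that the token file exists, parses as JSON, and reports any top-level fields whose value is null. Whether a null field is what triggers the strptime() error is an assumption, and the sketch makes no claim about which keys the file is supposed to contain.

```python
import json
from pathlib import Path


def find_null_token_fields(path="~/.tokens.json"):
    """Return the sorted top-level keys in the token file whose value is null.

    Raises FileNotFoundError if the file is missing and ValueError if it is
    not valid JSON or is not a JSON object.
    """
    token_path = Path(path).expanduser()
    with token_path.open() as handle:
        data = json.load(handle)  # json.JSONDecodeError is a ValueError
    if not isinstance(data, dict):
        raise ValueError(f"{token_path} does not contain a JSON object")
    return sorted(key for key, value in data.items() if value is None)


if __name__ == "__main__":
    try:
        null_fields = find_null_token_fields()
        print("Null fields:", null_fields or "none")
    except (FileNotFoundError, ValueError) as exc:
        print("Token file problem:", exc)
```

If the file is missing or lists null fields, refreshing the token before retrying the snippet is a reasonable first step.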
Please let me know if you still run into errors.
Thanks,
Komal
Hi Raghav,
Could you please check if the interfaces on the VMs have the IP addresses configured?
Also, please share the Slice ID for your slice. This will help us take a look at it as well.
Thanks,
Komal
Hi Ilya,
Thanks for the kind words—we appreciate the feedback!
To restore the network interface configuration after a reboot, please use the following code to reconfigure all nodes in the slice:
for n in slice.get_nodes(): n.config()
This will restore the network configurations.
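If some nodes are still coming back up after the reboot, a slightly more defensive variant of the loop above may be useful. This is only a sketch: the helper name reconfigure_nodes is hypothetical, and it assumes nothing beyond the get_nodes(), config(), and get_name() calls on the slice and node objects.

```python
def reconfigure_nodes(slice_obj):
    """Call config() on every node and return (node name, error) pairs.

    slice_obj is assumed to be the fablib slice object already loaded in
    your notebook (called `slice` in the loop above).
    """
    failures = []
    for node in slice_obj.get_nodes():
        try:
            node.config()
        except Exception as exc:
            # Keep going so one unreachable node does not block the rest.
            failures.append((node.get_name(), str(exc)))
    return failures


# Usage in a notebook, assuming `slice` is already defined:
#   for name, error in reconfigure_nodes(slice):
#       print(f"{name}: {error}")
```

An empty result means every node accepted the reconfiguration; any node listed can be retried once it is reachable over SSH again.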
Regarding DPUs, we’re currently exploring BlueField 2 DPUs and targeting summer for initial support, with more details coming soon. Stay tuned for updates!
Please let us know if you run into any other issues.
Best Regards,
Komal
Hi Sourya,
It appears that your slice utilizes a Port Mirror service, which may not yet be supported by the Slice Viewer. We will check with Yaxue, who works on the Portal, to confirm this. We are working on adding support for this feature in the next release. Apologies for any inconvenience.
Thanks,
Komal
Hi Tanay,
Based on the details of the ConnectX-6 from one of the FABRIC VMs and the DPDK documentation you shared (DPDK MLX5 Guide), it appears that the ConnectX-6 available in FABRIC is not supported. We are currently working on integrating new BlueField DPUs, which may provide a suitable solution.
[root@Node2 ~]# lspci | grep X-6
06:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
07:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
[root@Node2 ~]# mstconfig -d 06:00.0 query
Device #1:
----------
Device type: ConnectX6
Name: MCX653106A-ECA_Ax
Description: ConnectX-6 VPI adapter card; H100Gb/s (HDR100; EDR IB and 100GbE); dual-port QSFP56; PCIe3.0 x16; tall bracket; ROHS R6
Device: 06:00.0
Thanks,
Komal
Hi Rajiv,
L2STS links should work with SharedNICs. Could you please share your slice details where this is not working?
Thanks,
Komal
I also just noticed the attached screenshot. The Permission error is related to SmartNIC usage. To use SmartNICs for your project, you still need the Component.SmartNIC permission.
Thanks,
Komal
Hi Vaneshi,
The fix for this has been deployed. If you still see this error, please let me know and I will check again.
Please note that this requires both the listener and the monitored node to be in the same slice.
Thanks,
Komal
January 21, 2025 at 9:05 am in reply to: SSH from Windows Terminal not working sent by Raghav Sinha 17.January 2025 #8108
Hi Jiri,
Could you please share the fingerprint of your old SSH key, which has not expired but no longer works?
Thanks,
Komal
January 18, 2025 at 11:11 am in reply to: Issue Connecting via SSH to Specific Node in Topology #8100
Hi Yuanjun,
Your slice is already in a Dead state, meaning all associated resources have been released.
Please try creating your slice again and let us know if the issue persists. To help us investigate potential problems before expiration, consider extending your slice’s lifetime if you encounter this issue again.
Slice Name: byteps_8node_GPN_lamb
Slice ID: 0e99c5ea-76d2-4189-ba2e-817a80fa8d29
Project ID: 34a45f8f-be0e-4efc-a91c-38358ce4ca29
Project Name: Ensemble Inference
Graph ID: 070d665f-5fcc-467e-9afa-d1d9f2c2f11c
Slice owner: { name: orchestrator, guid: orchestrator-guid, oidc_sub_claim: 82e78849-be30-4290-a225-50040c065e4e, email: yuanjun.dai@case.edu}
Slice state: Dead
Lease time: 2025-01-31 02:15:43+00:00
Thanks,
Komal
January 17, 2025 at 4:39 pm in reply to: L2 Interfaces on my slice transitioning to DOWN State #8097
Subject: Network Configuration Issue on Slice VMs
Hi Prateek,
I checked your Slice. Could you share the VMs and sites where the network configuration was lost?
The WASH and STAR site workers were rebooted due to another issue, which may have caused this disruption. Please note that, in the current version, fablib configures interfaces using ip commands, which are not persistent across reboots.
We are working on making this configuration persistent across reboots. In the meantime, please consider using NetworkManager or netplan to configure the interfaces in a way that persists after a reboot.
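For the netplan route, a minimal sketch of a static-address configuration is shown below, rendered from Python so the address can be validated first. The interface name enp7s0, the address 192.168.1.1/24, and the file name 60-dataplane.yaml are all assumptions for illustration; substitute the values from your own slice.

```python
import ipaddress


def render_netplan(ifname, cidr):
    """Return netplan YAML that assigns a static address to one interface.

    Raises ValueError if cidr is not a valid address/prefix string.
    """
    ipaddress.ip_interface(cidr)  # validation only; raises on bad input
    return (
        "network:\n"
        "  version: 2\n"
        "  ethernets:\n"
        f"    {ifname}:\n"
        "      addresses:\n"
        f"        - {cidr}\n"
    )


# Hypothetical values; replace with your data plane interface and address.
print(render_netplan("enp7s0", "192.168.1.1/24"))
```

The rendered text would typically be saved on the VM as a file under /etc/netplan/ (for example 60-dataplane.yaml) and activated with sudo netplan apply, after which the address should survive a reboot.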
Additionally, we are addressing the underlying issue that required the worker node reboots.
Apologies for the inconvenience, and thank you for your patience!
Best,
Komal