
Charles Carpenter

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 18 total)
  • in reply to: Infrastructure-metrics query locally #8175
    Charles Carpenter
    Participant

      Try again now & check your email for more information.

       

      in reply to: Infrastructure-metrics query locally #8173
      Charles Carpenter
      Participant

        It looks like the formatter is messing with the double dash and changing it to a single dash. There should be a double dash before data-urlencode in the below command.

        curl https://infrastructure-metrics.fabric-testbed.net/query -H "Authorization: fabric-token xxxx" --data-urlencode 'query=rate(ifHCOutOctets[5m])'

         

        in reply to: Infrastructure-metrics query locally #8168
        Charles Carpenter
        Participant

          There is a minor typo in the above query: it should be a double dash (--) before data-urlencode.

          curl https://infrastructure-metrics.fabric-testbed.net/query -H "Authorization: fabric-token xxxx" --data-urlencode 'query=rate(ifHCOutOctets[5m])'

          I did find an error in the new code that would cause a 500 error if the given token did not represent a valid user.

           

          in reply to: Infrastructure-metrics query locally #8167
          Charles Carpenter
          Participant

            Here is the code snippet.
            curl https://infrastructure-metrics.fabric-testbed.net/query -H "Authorization: fabric-token YOURTOKEN" --data-urlencode 'query=rate(ifHCOutOctets[5m])'

            in reply to: Infrastructure-metrics query locally #8163
            Charles Carpenter
            Participant

              I’m not sure I understand what the failure is that gives you the 500 error.

              Are you saying that you can see the data on the website Grafana GUI but making the same calls to https://infrastructure-metrics.fabric-testbed.net/query does not work?

              Or do some calls to  https://infrastructure-metrics.fabric-testbed.net/query work and some do not depending on where the call is made from?

              There were some updates made late Friday to the query code, but those “should” not affect the calls. I can still use the example curl calls successfully from JupyterHub or from my local laptop.
              If you were trying on Friday, you may have unluckily hit the update window.

              in reply to: Power consumption of VM #8118
              Charles Carpenter
              Participant

                We currently do not have any way to monitor the power usage of a VM.
                I will take a look at the tools you have mentioned.
                You could also try installing those tools on a VM to see if they are compatible. I would be interested in those results.

                in reply to: Infrastructure-metrics queries #7939
                Charles Carpenter
                Participant

                  I am not sure how you define the “correct” value for the HC counter. It can cover exa-scale values (10^18), which would be hundreds of years at 100M packets per second. I assume the HC counter is reset on device restart, so rollover should not be a worry. I don’t know what other events would cause the counter to be reset.

                  In general, interfaces that are:
                  < 20 Mbps use 32-bit counters
                  > 20 Mbps & < 650 Mbps use 32- or 64-bit counters
                  > 650 Mbps use 64-bit counters
                  See https://www.cisco.com/c/en/us/support/docs/ip/simple-network-management-protocol-snmp/26007-faq-snmpcounter.html#toc-hId–1387592458
                  and https://www.ietf.org/rfc/rfc2233.txt
                  32-bit counters should reset to 0 on overflow.

                  Counters are relative to the time frame you are analyzing. A benefit of using Prometheus to query the data is its built-in functions.
                  Consider using the “rate” function. It is aware of counter resets and will adjust the values as needed. See https://prometheus.io/docs/prometheus/latest/querying/functions/#rate
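
                  For example, a few PromQL sketches built on those functions (the 5m and 1h windows are just illustrative choices):

                  # per-second octet rate over the last 5 minutes, per interface
                  rate(ifHCOutOctets[5m])
                  # approximate bits per second
                  rate(ifHCOutOctets[5m]) * 8
                  # total octets counted over the last hour, adjusted for any counter resets
                  increase(ifHCOutOctets[1h])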

                  Charles Carpenter
                  Participant

                    Here are the instructions for getting metric data programmatically using curl.

                     Create a Jupyter notebook with the following cells.

                    # Import MFLib Class
                    from mflib.mflib import MFLib
                     slice_name = "<your slice name>"
                    mf = MFLib(slice_name)

                    # Get the ht_user & ht_password for the slice’s meas_node.
                    data = {}
                    # Set the info you want to get.
                     data["get"] = ["ht_user", "ht_password"]
                     # Call info using data
                     info_results = mf.info("prometheus", data)
                    print(info_results)

                    Alternatively you can just add the second cell above to the existing prometheus_grafana.ipynb notebook.

                     Create a tunnel through the bastion host for port 9090. This is similar to the tunnel needed for accessing Grafana, but using port 9090:localhost:9090 instead of 10010:localhost:443.
                    The above cell should print out the meas_node_ip as “Found meas node as meas-node at <your meas_node_ip>”

                    ssh -L 9090:localhost:9090 -F ssh_config -i slice_key ubuntu@<your meas_node ip>

                     Then make API calls using curl, Python requests, etc.
                    Here is a simple example using the ht_user and ht_password retrieved above to get the latest up metrics.

                    curl -k -u <ht_user>:<ht_password> https://localhost:9090/api/v1/query\?query\=up
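
                     If you prefer Python requests, here is a rough equivalent of the curl call above. It assumes the 9090 tunnel is up and that ht_user and ht_password hold the values returned by mf.info; verify=False mirrors curl's -k flag since the meas node uses a self-signed certificate.

                     import requests

                     # Query the Prometheus HTTP API through the SSH tunnel created above.
                     resp = requests.get(
                         "https://localhost:9090/api/v1/query",
                         params={"query": "up"},
                         auth=(ht_user, ht_password),  # values from the mf.info("prometheus", data) call
                         verify=False,                 # self-signed cert on the meas node, like curl -k
                     )
                     print(resp.json())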

                    Charles Carpenter
                    Participant

                      There is a REST API that can be used to access the metrics programmatically in addition to the Grafana views.  I will have to add the documentation for that. I’ll post back here once that is done.

                       

                      in reply to: MFLIib overhead/measurements #7396
                      Charles Carpenter
                      Participant

                        The measurement framework sets up its own network to get data from the experimental nodes, so experimental networks are mostly unaffected. You could use the metrics node_network_receive_packets_total and node_network_transmit_packets_total to see data going in/out of the network interface used for the measurement network to get an idea of the network use.
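
                        For example, a PromQL sketch for those metrics (the device label value here is just a placeholder for whichever interface your measurement network uses):

                        # packets per second into and out of the measurement-network interface
                        rate(node_network_receive_packets_total{device="ens8"}[5m])
                        rate(node_network_transmit_packets_total{device="ens8"}[5m])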

                        Most of the Prometheus/Grafana & ELK monitoring processes take place on the measurement node, so they have little effect on the experiment nodes.
                        What runs on the experiment nodes are the node_exporters & Filebeats.
                        The node exporter is a binary that only uses resources when it is asked for metrics. See https://github.com/prometheus/node_exporter for information about the resources it uses. You can see what it is doing via top/htop. The node exporter is written in Go and exports some go_* metrics about its own resource usage. Use the Explore page to search for

                        { __name__=~"go_.*", job="node"}

                        ELK beats are similar but may use more resources since they monitor logs and periodically ship out data. You may also see Filebeat, Metricbeat and Packetbeat resource usage using top/htop. I am not sure of the current status of self-monitoring for the ELK beats.

                        Charles Carpenter
                        Participant

                          MFLib currently only performs measurements on a slice.

                          The FABRIC Portal (fabric-testbed.net) has information about available resources. Some of that information is available via fablib, as KC points out in the previous answer.

                          Specific infrastructure metrics are available on the infrastructure-metrics.fabric-testbed.net site. These include memory use, CPU load, etc. on head and worker nodes. These values are visible using Grafana. A REST API for querying those values programmatically will also be available soon.

                          in reply to: Mflib – Prometheus instrumentize error #7091
                          Charles Carpenter
                          Participant

                            The ELK mirror problem with Centos/Rocky 8 has been fixed.

                             

                            in reply to: Mflib – Prometheus instrumentize error #7042
                            Charles Carpenter
                            Participant

                              The MeasurementFramework has been updated to fix the docker conflict.
                              The Prometheus system is now working. There is an error in a script due to a mirror problem, but this does not affect the Prometheus install.

                              The ELK install has a fatal mirror problem that remains to be fixed. I will post here when that is completed.

                              in reply to: Mflib – Prometheus instrumentize error #7040
                              Charles Carpenter
                              Participant

                                Thanks for bringing this to our attention.

                                In the above result, setting up Prometheus returned 'success': False, 'msg': 'Prometheus playbook install failed'. This means that Grafana, which is part of the Prometheus install, was most likely not installed.
                                The ssh tunnel trying to connect to Grafana's port cannot connect since there is no Grafana running, hence the channel 3: open failed: connect failed: Connection refused error.
                                The error was caused by an Ansible-related update which broke the installation process. We have found the problem and will be pushing out a fix today.

                                 

                                Charles Carpenter
                                Participant
                                  There are a couple of methods to export metrics from your running Python code.
                                  For a general overview of writing Prometheus exporters, see https://prometheus.io/docs/instrumenting/writing_exporters/
                                  Option 1) Create your own exporter in Python. This runs a small HTTP server that allows Prometheus to query your code every x seconds (30 seconds is usually the default). This is best for a consistently running process. Python has a prometheus_client module (pip install prometheus-client) that handles most of the work for you. You will need to add a function that will be called whenever the Prometheus instance makes a request. See https://prometheus.github.io/client_python/getting-started/three-step-demo/ and https://pypi.org/project/prometheus-client/ and https://github.com/prometheus/client_python
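
                                  As a minimal sketch of what that looks like with prometheus_client (the metric name, port and read_value function are illustrative, not part of MFLib):

                                  from prometheus_client import Gauge, start_http_server
                                  import time

                                  def read_value():
                                      # Replace with whatever your experiment actually measures.
                                      return 42.0

                                  # The gauge's function is called each time Prometheus scrapes /metrics.
                                  my_value = Gauge('my_experiment_value', 'Example value exported from experiment code')
                                  my_value.set_function(read_value)

                                  start_http_server(8000)  # serves http://<node-address>:8000/metrics
                                  while True:
                                      time.sleep(60)       # keep the process alive; scrapes happen in the background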
                                  You can do a test of the running exporter using curl or wget with the address of the exporter and the path /metrics.
                                  Next, configure the Prometheus instance on the meas_node to scrape your newly created exporter. The config file is /opt/fabric_prometheus/prometheus/prometheus_config.yml. SSH to the meas_node, sudo vim /opt/fabric_prometheus/prometheus/prometheus_config.yml, and add the new scrape section at the end of the file.
                                  Something like

                                   

                                  # My Exporter
                                  - job_name: 'my_exporter_name'
                                    static_configs:
                                    - targets: ['my-exporter-address:port']
                                  The scrape will default to the /metrics path.

                                  Save the file and restart the Prometheus container:

                                  docker restart fabric_prometheus_prometheus
                                  Use the Explore tab in Grafana with the PromQL {job="my_exporter_name"} to see the metrics.

                                   

                                  Option 2) Use the node_exporter's textfile collector. This is best for sporadic metrics, perhaps a cron job that runs hourly. The collector reads text files found in the /var/lib/node_exporter directory. The files need to be in the format described at https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format . There is a Python module for writing out the text files; see https://prometheus.github.io/client_python/exporting/textfile/ and https://github.com/prometheus/node_exporter#textfile-collector
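
                                  A minimal sketch of the textfile approach with the same Python module (the metric name and file name are illustrative; the directory is the /var/lib/node_exporter path mentioned above):

                                  from prometheus_client import CollectorRegistry, Gauge, write_to_textfile

                                  registry = CollectorRegistry()
                                  g = Gauge('my_cron_job_result', 'Value recorded by the hourly cron job', registry=registry)
                                  g.set(3.14)  # replace with the real measurement

                                  # node_exporter's textfile collector picks up *.prom files from this directory.
                                  write_to_textfile('/var/lib/node_exporter/my_cron_job.prom', registry)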

                                  That should get you started. Let me know if you have more questions.
                                  -Charles

                                   

                                Viewing 15 posts - 1 through 15 (of 18 total)