
Charles Carpenter

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 18 total)
  • in reply to: Infrastructure-metrics query locally #8175
    Charles Carpenter
    Participant

      Try again now & check your email for more information.

       

      in reply to: Infrastructure-metrics query locally #8173
      Charles Carpenter
      Participant

        It looks like the formatter is messing with the double dash and changing it to a single dash. There should be a double dash before data-urlencode in the below command.

        curl https://infrastructure-metrics.fabric-testbed.net/query -H "Authorization: fabric-token xxxx" --data-urlencode 'query=rate(ifHCOutOctets[5m])'

         

        in reply to: Infrastructure-metrics query locally #8168
        Charles Carpenter
        Participant

          There is a minor typo in the above query: it should be a double dash (--) before data-urlencode.

          curl https://infrastructure-metrics.fabric-testbed.net/query -H "Authorization: fabric-token xxxx" --data-urlencode 'query=rate(ifHCOutOctets[5m])'

          I did find an error in the new code that would cause a 500 error if the given token did not represent a valid user.

           

          in reply to: Infrastructure-metrics query locally #8167
          Charles Carpenter
          Participant

            Here is the code snippet.
            curl https://infrastructure-metrics.fabric-testbed.net/query -H "Authorization: fabric-token YOURTOKEN" --data-urlencode 'query=rate(ifHCOutOctets[5m])'

            in reply to: Infrastructure-metrics query locally #8163
            Charles Carpenter
            Participant

              I’m not sure I understand what the failure is that gives you the 500 error.

              Are you saying that you can see the data on the website Grafana GUI but making the same calls to https://infrastructure-metrics.fabric-testbed.net/query does not work?

              Or do some calls to  https://infrastructure-metrics.fabric-testbed.net/query work and some do not depending on where the call is made from?

              There were some updates made late Friday to the query code, but those “should” not affect the calls. I can still use the example curl calls successfully from JupyterHub or from my local laptop.
              If you were trying on Friday, you may have unluckily hit the update window.

              in reply to: Power consumption of VM #8118
              Charles Carpenter
              Participant

                We currently do not have any way to monitor the power usage of a VM.
                I will take a look at the tools you have mentioned.
                You could also try installing those tools on a VM to see if they are compatible. I would be interested in those results.

                in reply to: Infrastructure-metrics queries #7939
                Charles Carpenter
                Participant

                  I am not sure how you define the “correct” value for the HC counter. It can cover exa-scale values (10^18), which would be hundreds of years at 100M packets per second. I assume the HC counter is reset on device restart, so rollover should not be a worry. I don’t know what other events would cause the counter to be reset.

                  In general, interfaces that are:
                  < 20 Mbps use 32-bit counters
                  > 20 Mbps & < 650 Mbps use 32- or 64-bit counters
                  > 650 Mbps use 64-bit counters
                  See https://www.cisco.com/c/en/us/support/docs/ip/simple-network-management-protocol-snmp/26007-faq-snmpcounter.html#toc-hId–1387592458
                  and https://www.ietf.org/rfc/rfc2233.txt
                  32-bit counters should reset to 0 on overflow.

                  Counters are relative to the time frame you are analyzing. A benefit of using Prometheus to query the data is its built-in functions.
                  Consider using the “rate” function. It is aware of counter resets and will adjust the values as needed. See https://prometheus.io/docs/prometheus/latest/querying/functions/#rate
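
                  For example, a few PromQL sketches built on those functions (the 5m and 1h windows are just illustrative choices):

                  # per-second octet rate over the last 5 minutes, per interface
                  rate(ifHCOutOctets[5m])
                  # approximate bits per second
                  rate(ifHCOutOctets[5m]) * 8
                  # total octets counted over the last hour, adjusted for any counter resets
                  increase(ifHCOutOctets[1h])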

                  Charles Carpenter
                  Participant

                    Here are the instructions for getting metric data programmatically using curl.

                     Create a Jupyter notebook with the following cells.

                    # Import MFLib Class
                    from mflib.mflib import MFLib
                     slice_name = "<your slice name>"
                    mf = MFLib(slice_name)

                    # Get the ht_user & ht_password for the slice’s meas_node.
                    data = {}
                    # Set the info you want to get.
                     data["get"] = ["ht_user", "ht_password"]
                     # Call info using data
                     info_results = mf.info("prometheus", data)
                    print(info_results)

                    Alternatively you can just add the second cell above to the existing prometheus_grafana.ipynb notebook.

                     Create a tunnel through the bastion host for port 9090. This is similar to the tunnel needed for accessing Grafana, but using port 9090:localhost:9090 instead of 10010:localhost:443.
                    The above cell should print out the meas_node_ip as “Found meas node as meas-node at <your meas_node_ip>”

                    ssh -L 9090:localhost:9090 -F ssh_config -i slice_key ubuntu@<your meas_node ip>

                     Then make API calls using curl, Python requests, etc.
                    Here is a simple example using the ht_user and ht_password retrieved above to get the latest up metrics.

                    curl -k -u <ht_user>:<ht_password> https://localhost:9090/api/v1/query\?query\=up
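
                     If you prefer Python requests, here is a rough equivalent of the curl call above. It assumes the 9090 tunnel is up and that ht_user and ht_password hold the values returned by mf.info; verify=False mirrors curl's -k flag since the meas node uses a self-signed certificate.

                     import requests

                     # Query the Prometheus HTTP API through the SSH tunnel created above.
                     resp = requests.get(
                         "https://localhost:9090/api/v1/query",
                         params={"query": "up"},
                         auth=(ht_user, ht_password),  # values from the mf.info("prometheus", data) call
                         verify=False,                 # self-signed cert on the meas node, like curl -k
                     )
                     print(resp.json())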

                    Charles Carpenter
                    Participant

                      There is a REST API that can be used to access the metrics programmatically in addition to the Grafana views.  I will have to add the documentation for that. I’ll post back here once that is done.

                       

                      in reply to: MFLIib overhead/measurements #7396
                      Charles Carpenter
                      Participant

                        The measurement framework sets up its own network to get data from the experimental nodes, so experimental networks are mostly unaffected. You could use the metrics node_network_receive_packets_total and node_network_transmit_packets_total to see data going in/out of the network interface used for the measurement network to get an idea of the network use.
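
                        For example, a PromQL sketch for those metrics (the device label value here is just a placeholder for whichever interface your measurement network uses):

                        # packets per second into and out of the measurement-network interface
                        rate(node_network_receive_packets_total{device="ens8"}[5m])
                        rate(node_network_transmit_packets_total{device="ens8"}[5m])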

                        Most of the Prometheus/Grafana & ELK monitoring processes take place on the measurement node, so they have little effect on the experiment nodes.
                        What runs on the experiment nodes are the node_exporters & Filebeats.
                        The node exporter is a binary that only uses resources when it is asked for metrics. See https://github.com/prometheus/node_exporter for information about the resources it uses. You can see what it is doing via top/htop. The node exporter is written in Go and exports some go_* metrics about its own resource usage. Use the Explore page to search for

                        { __name__=~"go_.*", job="node"}

                        ELK beats are similar but may use more resources since they monitor logs and periodically ship out data. You may also see Filebeat, Metricbeat and Packetbeat resource usage using top/htop. I am not sure of the current status of self-monitoring for the ELK beats.

                        Charles Carpenter
                        Participant

                          MFLib currently only performs measurements on a slice.

                          The FABRIC Portal (fabric-testbed.net) has information about available resources. Some of that information is available via fablib, as KC points out in the previous answer.

                          Specific infrastructure metrics are available on the infrastructure-metrics.fabric-testbed.net site. These include memory use, CPU load, etc. on head and worker nodes. These values are visible using Grafana. A REST API for querying those values programmatically will also be available soon.

                          in reply to: Mflib – Prometheus instrumentize error #7091
                          Charles Carpenter
                          Participant

                            The ELK mirror problem with Centos/Rocky 8 has been fixed.

                             

                            in reply to: Mflib – Prometheus instrumentize error #7042
                            Charles Carpenter
                            Participant

                              The MeasurementFramework has been updated to fix the docker conflict.
                              The Prometheus system is now working. There is an error in a script due to a mirror problem, but this does not affect the Prometheus install.

                              The ELK install has a fatal mirror problem that remains to be fixed. I will post here when that is completed.

                              in reply to: Mflib – Prometheus instrumentize error #7040
                              Charles Carpenter
                              Participant

                                Thanks for bringing this to our attention.

                                In the above result, setting up Prometheus returned 'success': False, 'msg': 'Prometheus playbook install failed'. This means that Grafana, which is part of the Prometheus install, was most likely not installed.
                                The ssh tunnel trying to connect to Grafana's port cannot connect since there is no Grafana running, hence the channel 3: open failed: connect failed: Connection refused error.
                                The error was caused by an Ansible-related update which broke the installation process. We have found the problem and will be pushing out a fix today.

                                 

                                Charles Carpenter
                                Participant
                                  There are a couple of methods to export metrics from your running Python code.
                                  For a general overview of writing Prometheus exporters, see https://prometheus.io/docs/instrumenting/writing_exporters/
                                  Option 1) Create your own exporter in Python. This runs a small HTTP server that allows Prometheus to query your code every x seconds (30 seconds is usually the default). This is best for a consistently running process. Python has a prometheus_client module (pip install prometheus-client) that handles most of the work for you. You will need to add a function that will be called whenever the Prometheus instance makes a request. See https://prometheus.github.io/client_python/getting-started/three-step-demo/ and https://pypi.org/project/prometheus-client/ and https://github.com/prometheus/client_python
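
                                  As a minimal sketch of what that looks like with prometheus_client (the metric name, port and read_value function are illustrative, not part of MFLib):

                                  from prometheus_client import Gauge, start_http_server
                                  import time

                                  def read_value():
                                      # Replace with whatever your experiment actually measures.
                                      return 42.0

                                  # The gauge's function is called each time Prometheus scrapes /metrics.
                                  my_value = Gauge('my_experiment_value', 'Example value exported from experiment code')
                                  my_value.set_function(read_value)

                                  start_http_server(8000)  # serves http://<node-address>:8000/metrics
                                  while True:
                                      time.sleep(60)       # keep the process alive; scrapes happen in the background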
                                  You can do a test of the running exporter using curl or wget with the address of the exporter and the path /metrics.
                                  Next, configure the Prometheus instance on the meas_node to scrape your newly created exporter. The config file is /opt/fabric_prometheus/prometheus/prometheus_config.yml. SSH to the meas_node, sudo vim /opt/fabric_prometheus/prometheus/prometheus_config.yml, and add the new scrape section at the end of the file.
                                  Something like

                                   

                                  # My Exporter
                                  - job_name: 'my_exporter_name'
                                    static_configs:
                                    - targets: ['my-exporter-address:port']
                                  The scrape will default to the /metrics path.

                                  Save the file and restart the Prometheus container:

                                  docker restart fabric_prometheus_prometheus
                                  Use the Explore tab in Grafana with the PromQL {job="my_exporter_name"} to see the metrics.

                                   

                                  Option 2) Use the node_exporter's textfile collector. This is best for sporadic metrics, perhaps a cron job that runs hourly. The collector reads text files found in the /var/lib/node_exporter directory. The files need to be in the format described at https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format . There is a Python module for writing out the text files; see https://prometheus.github.io/client_python/exporting/textfile/ and https://github.com/prometheus/node_exporter#textfile-collector
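
                                  A minimal sketch of the textfile approach with the same Python module (the metric name and file name are illustrative; the directory is the /var/lib/node_exporter path mentioned above):

                                  from prometheus_client import CollectorRegistry, Gauge, write_to_textfile

                                  registry = CollectorRegistry()
                                  g = Gauge('my_cron_job_result', 'Value recorded by the hourly cron job', registry=registry)
                                  g.set(3.14)  # replace with the real measurement

                                  # node_exporter's textfile collector picks up *.prom files from this directory.
                                  write_to_textfile('/var/lib/node_exporter/my_cron_job.prom', registry)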

                                  That should get you started. Let me know if you have more questions.
                                  -Charles

                                   

                                Viewing 15 posts - 1 through 15 (of 18 total)