Kubernetes | 17 November 2023

Troubleshooting Agents Installed with the connectware-agent Helm Chart

Prerequisites

Troubleshooting Agent Problems

When you have problems with agents installed using the connectware-agent Helm chart, the first step is usually to delete any pod that is stuck in a state other than Running and Ready. This can happen easily because the agents are deployed as StatefulSets, whose pods are not automatically replaced if they are unhealthy when their controller is updated, so they need manual intervention.

Deleting Pods in Faulty State

Use the kubectl get pod -l app.kubernetes.io/component=protocol-mapper-agent command to display all agent pods, then delete any pod that is in a faulty state using the kubectl delete pod <podname> command.

Example

kubectl get pod -l app.kubernetes.io/component=protocol-mapper-agent -n <namespace>
kubectl -n <namespace> delete pod <podname>
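
With many agents installed, scanning the pod list by eye gets tedious. The following is a minimal sketch of a shell helper that reads the output of kubectl get pod --no-headers and prints the names of pods that are not Running or not fully ready. The function name faulty_pods is hypothetical, and the sketch assumes the default kubectl column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: print names of pods that are not Running and Ready.
# Reads `kubectl get pod --no-headers` output on stdin and assumes the
# default column order: NAME READY STATUS RESTARTS AGE.
faulty_pods() {
  awk '{
    split($2, ready, "/")   # READY looks like "1/1" or "0/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Usage (not executed here):
#   kubectl get pod -n <namespace> --no-headers \
#     -l app.kubernetes.io/component=protocol-mapper-agent | faulty_pods
```

The helper only prints names; deleting the listed pods is left as a deliberate manual step.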

If this does not help, look at the faulty pod's events and logs to check for helpful error messages.

Depending on the pod's state, different details will help you find the issue:

| Pod state | Kind of problem | Where to check |
|---|---|---|
| Pending, ContainerCreating | Kubernetes is trying to create the pod. | Pod events or description (see Checking pod state). |
| Running, but not ready or not behaving as expected. | Pod unready, application not working correctly. | Current pod logs (see Checking agent pod logs). |
| Unknown | Pod status is unknown, Kubernetes cluster problem. | Kubernetes cluster state and events (see https://kubernetes.io/docs/tasks/debug/debug-cluster/). |
| ImagePullBackOff | Image for the pod can’t be pulled. | Helm value configuration (see Verifying container image configuration). |
| CrashLoopBackOff | Application is crashing. | Previous pod logs (see Checking agent pod logs). |

Checking Pod State

When a pod is not being scheduled, the possible reasons fall into two categories:

  • Issues with your configuration.
  • Issues with your Kubernetes cluster.

For both categories, you will find events associated with the pod that detail the problem. We assume that you have already identified the pod through the previous steps in this article. You need to know the name and namespace of the pod you are trying to debug.

Use the following command to display events associated with your pod:

kubectl get event -n <namespace> --field-selector involvedObject.name=<podname>

Info: You can also view the events at the end of the output of kubectl describe pod <podname>

Issues with your Kubernetes cluster can take many forms and are beyond the scope of this article, but you can use Debug pods as a starting point to investigate any events that indicate a problem with your Kubernetes cluster.

Common Problems

The following are a few common scenarios caused by issues with your configuration, and how to address them.

| Event mentions | Likely problem | Likely solution |
|---|---|---|
| FailedScheduling, Insufficient cpu, Insufficient memory | You specified CPU and memory resources for your agents that your Kubernetes cluster can’t provide. | Review Configuring compute resources for the connectware-agent Helm chart and adjust the configured resources to something that is available in your Kubernetes cluster. |
| FailedScheduling, didn’t match pod anti-affinity rules | There are no available Kubernetes nodes that can schedule the agent because of podAntiAffinity rules. | Review Configuring podAntiAffinity for the connectware-agent Helm chart and adjust your settings, or add additional nodes to your Kubernetes cluster. |
| FailedMount in combination with the names you chose as mTLS secret or CA chain, or “mtls-agent-keypair” / “mtls-ca-chain” | You enabled mTLS for an agent without providing the necessary ConfigMap and Secret for CA chain and key pair. | Review Using Mutual Transport Layer Security (mTLS) for agents with the connectware-agent Helm chart and adjust your configuration accordingly. |
| FailedMount in combination with the names of volumes (starting with “data-”) | The currently used storage provider is unable to provide the necessary volumes. | Review Configuring agent persistence for the connectware-agent Helm chart and choose a Kubernetes StorageClass that can provide the necessary volumes. |
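
The lookup in the table above can be sketched as a small shell function that takes one event line and prints the topic to review. This is only an illustration: the function name event_hint is hypothetical, the patterns are simple substring matches on the keywords listed in the table, and any FailedMount event that does not mention a "data-" volume is assumed to relate to the mTLS Secret or ConfigMap.

```shell
# Hypothetical helper: map an event line to the topic to review, following
# the table of common problems. Takes the event text as its first argument.
event_hint() {
  case "$1" in
    *FailedScheduling*Insufficient*)   echo "compute resources" ;;
    *FailedScheduling*anti-affinity*)  echo "podAntiAffinity" ;;
    *FailedMount*data-*)               echo "agent persistence" ;;
    # Assumption: remaining FailedMount events concern mTLS volumes.
    *FailedMount*)                     echo "mTLS secret and CA chain" ;;
    *)                                 echo "unknown" ;;
  esac
}

# Usage (not executed here):
#   event_hint "$(kubectl get event -n <namespace> --no-headers \
#     --field-selector involvedObject.name=<podname> | tail -n 1)"
```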

Checking Agent Pod Logs

When your pods are scheduled but don’t work the way you expect, are unready, or keep crashing (status “CrashLoopBackOff”), check the logs of the affected pod for details.

For pods that are ready or unready, check the current logs. For pods in status “CrashLoopBackOff” you need to check the logs of the previous container, to see why it crashed.

Checking Current Pod Logs

To check the current logs of your pod, use the kubectl logs command with the pod name, and look for error messages.

kubectl logs -n <namespace> <podname>

Checking Previous Pod Logs

To check the logs of a previous container, follow Checking current pod logs, but add the --previous flag to the command:

kubectl logs -n <namespace> <podname> --previous
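
Agent logs can be long, so it may help to narrow them down to the more severe entries first. The sketch below filters JSON log lines by their numeric "level" field. It rests on an assumption drawn from the sample agent log in this article, where the informational line has "level":30 and the failure line has "level":50: that a level of 50 or above marks an error. The function name error_lines is hypothetical, and log lines that are not JSON are dropped by this filter.

```shell
# Hypothetical helper: keep only JSON log lines with a level of 50 or above.
# Assumption: higher numbers mean more severe entries, as in the sample log
# in this article where the failure message carries "level":50.
error_lines() {
  grep -E '"level":([5-9][0-9]|[0-9]{3,})'
}

# Usage (not executed here):
#   kubectl logs -n <namespace> <podname> --previous | error_lines
```

If the filter prints nothing, read the unfiltered log, since not every relevant message is necessarily logged at that level.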

Common Problems

Symptom: Agent with mTLS enabled is not connecting to the broker.
Agent log shows: Reconnecting to mqtts://connectware:8883
Broker log shows: [warning] can't authenticate client {"ssl",<<"someName">>} from someIp due to <<"Authentication denied">>
Likely problem: mTLS is not enabled in Connectware.
Likely solution: Enable mTLS in Connectware. Set the Helm value global.authentication.mTLS.enabled to true.

Symptom: Agent is not connecting to the broker when mTLS is enabled in Connectware.
Agent log shows: VRPC agent connection to broker lost
Reconnecting to mqtts://someIp:8883
Likely problem: mTLS is enabled in Connectware, but not in the agent.
Likely solution: Enable mTLS in the agent as described in Using Mutual Transport Layer Security (mTLS) for agents with the connectware-agent Helm chart.

Symptom: Agent with mTLS enabled does not connect to the broker.
Agent log shows: Error: Client network socket disconnected before secure TLS connection was established
Likely problem: The agent is connecting to the wrong MQTTS port of the broker.
Likely solution: If your setup requires manual configuration due to additional NAT or something similar, review Configuring target Connectware for the connectware-agent Helm chart and adjust your configuration accordingly. If you are not aware of any special requirements of your environment, try removing all advanced MQTT target parameters.

Symptom: Agent with mTLS enabled does not connect to the broker.
Agent log shows: Failed to read certificates during mTLS setup please check the configuration
Likely problem: The certificates provided to the agent are either not found or faulty.
Likely solution: Review Using Mutual Transport Layer Security (mTLS) for agents with the connectware-agent Helm chart and Full mTLS Examples for the connectware-agent Helm chart, and make sure your mTLS certificates fulfill the requirements.

Symptom: Allowing an mTLS enabled agent in the Connectware Client Registry fails with the message “An Error has occurred – Registration failed”.
auth-server log shows: Unable to process request: 'POST /api/client-registry/confirm', because: Certificate Common Name does not match the username. CN: someCN, username: agentName
Likely problem: The agent’s certificate is invalid.
Likely solution: Review Using Mutual Transport Layer Security (mTLS) for agents with the connectware-agent Helm chart and Full mTLS Examples for the connectware-agent Helm chart, and make sure your mTLS certificate CN matches the name of the agent.

Symptom: Agent with mTLS enabled does not connect to the broker.
Agent log shows: Can not register protocol-mapper agent, because: socket hang up
Likely problem: The agent’s certificate is invalid.
Likely solution: Review Using Mutual Transport Layer Security (mTLS) for agents with the connectware-agent Helm chart and Full mTLS Examples for the connectware-agent Helm chart, and make sure your mTLS certificate is signed by the correct Certificate Authority (CA).

Symptom: Agent with mTLS enabled does not connect to the broker.
Agent log shows: Failed to register agent. Response: 409 Conflict. A conflicting registration might be pending, or a user with the same username
Likely problem: The username of the agent is already taken.
Likely solution: Every agent needs a user whose username is the value configured in the Helm value name for this agent.
  • Verify that the agent’s name is unique.
  • Verify that there is no old agent with the same name. If there is:
      • Delete the agent using the Systems => Agents UI.
      • Delete the user using the User Management => Users and Roles UI.
  • If you created a user with the agent’s name for something else, you have to choose a different name for the agent.

Symptom: Agent pod enters state CrashLoopBackOff.
Agent log shows:
{"level":30,"time":1670940068658,"pid":8,"hostname":"welder-robots-0","service":"protocol-mapper","msg":"Re-starting using cached credentials"}
{"level":50,"time":1670940068759,"pid":8,"hostname":"someName","service":"protocol-mapper","msg":"Failed to query license at https://someIp/api/system/info probably due to authentication\": 401 Unauthorized."}
Likely problem: The agent’s credentials are no longer correct.
Likely solution: The agent needs to be re-registered:
  • Delete the agent using the Systems => Agents UI.
  • Delete the user using the User Management => Users and Roles UI.
  • Delete the agent’s StatefulSet:
    kubectl -n <namespace> delete sts <release-name>-<chart-name>-<agent-name>
  • Delete the agent’s PersistentVolumeClaim:
    kubectl -n <namespace> delete pvc data-<release-name>-<chart-name>-<agent-name>-0
  • Re-apply your configuration through helm upgrade as described in Configuring agents with the connectware-agent Helm chart.
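
The StatefulSet and PersistentVolumeClaim names in the re-registration steps follow a fixed pattern, so the two delete commands can be assembled from the namespace, release name, chart name, and agent name. The sketch below only prints the commands so that they can be reviewed before being run. The function name agent_cleanup_commands and the example values in the usage comment are hypothetical.

```shell
# Hypothetical helper: print the kubectl commands for removing an agent's
# StatefulSet and PersistentVolumeClaim, using the naming pattern from the
# re-registration steps. It prints the commands and deletes nothing itself.
agent_cleanup_commands() {
  namespace="$1"; release="$2"; chart="$3"; agent="$4"
  sts="${release}-${chart}-${agent}"
  echo "kubectl -n ${namespace} delete sts ${sts}"
  echo "kubectl -n ${namespace} delete pvc data-${sts}-0"
}

# Usage with made-up values:
#   agent_cleanup_commands my-namespace my-release connectware-agent welder-robots
```

Printing instead of executing is a deliberate choice here: deleting the PersistentVolumeClaim removes the agent's stored data, so the resulting names are worth checking against kubectl get sts,pvc first.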

Verifying Container Image Configuration

When an agent pod is in the status “ImagePullBackOff”, Kubernetes is unable to pull the container image required for this agent.

By default, Connectware agents use the official protocol-mapper image from Cybus' official container registry. Pulling from this registry requires a valid secret of the type kubernetes.io/dockerconfigjson, which you can provide in different ways. Alternatively, you can provide the images through a mirror, or even use custom images.

This gives you many options to control the image, and you have to find the right combination for your use case. How to configure these parameters is discussed in these articles:

To see the effect of your settings, you need to inspect the complete image definition of your agent pods.

To do so, you can use this command:

kubectl -n <namespace> get pod -l app.kubernetes.io/component=protocol-mapper-agent -o custom-columns="NAME:metadata.name,IMAGE:spec.containers[0].image"

Example

In this example you can see that agent “painter-robots” is trying to use an invalid image name, which needs to be corrected using the image.name Helm value inside the agent’s entry in the protocolMapperAgents section of the Helm values.
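
If you know which image your agents are supposed to run, a deviating entry can be spotted mechanically. The following sketch reads the custom-columns output of the command above, including its header line, and prints every pod whose image differs from an expected image reference passed as the first argument. The function name unexpected_images is hypothetical, and the sketch assumes that all agents are meant to use the same image, which may not hold if you configured different images per agent.

```shell
# Hypothetical helper: list pods whose image differs from the expected one.
# Reads the NAME/IMAGE custom-columns output (header included) on stdin.
# Assumption: every agent is supposed to use the same image reference.
unexpected_images() {
  awk -v want="$1" 'NR > 1 && $2 != want { print $1 ": " $2 }'
}

# Usage (not executed here):
#   kubectl -n <namespace> get pod -l app.kubernetes.io/component=protocol-mapper-agent \
#     -o custom-columns="NAME:metadata.name,IMAGE:spec.containers[0].image" \
#     | unexpected_images "<expected-image>"
```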
