Pepr Best Practices
Core Development
When developing new features in Pepr Core, it is recommended to use `npx pepr deploy -i pepr:dev`, which will deploy Pepr’s Kubernetes manifests to the cluster with the development image. This allows you to test your changes without having to build a new image and push it to a registry.
The workflow for developing features in Pepr is:
- Run `npm test`, which will create a k3d cluster and build a development image called `pepr:dev`
- Deploy the development image into the cluster with `npx pepr deploy -i pepr:dev`
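Taken together, the inner loop looks like this:

```sh
npm test                      # creates a k3d cluster and builds the pepr:dev image
npx pepr deploy -i pepr:dev   # deploys the development image into that cluster
```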
Debugging
Welcome to the debugging section! 🐛
Pepr is composed of Modules (i.e., what happens when you issue `npx pepr init`), Capabilities like `hello-pepr.ts`, and Actions (i.e., the blocks of code containing filters and `Mutate`, `Validate`, `Watch`, `Reconcile`, `OnSchedule`). You can have as many Capabilities as you would like in a Module.
Pepr is a webhook-based system, meaning it is event-driven. When a resource is created, updated, or deleted, Pepr is called to perform the actions you have defined in your Capabilities. It’s common for multiple webhooks to exist in a cluster, not just Pepr. When there are multiple webhooks, the order in which they are called is not guaranteed. The only guarantee is that all of the `MutatingWebhooks` will be called before all of the `ValidatingWebhooks`. After the admission webhooks are called, the `Watch` and `Reconcile` actions run: both create a watch on the resources specified in the `When` block, and those resources are watched for changes after admission. The difference between the two is that `Reconcile` processes events in a queue, guaranteeing that they are handled in order, whereas `Watch` does not.
Considering that many webhooks may be modifying the same resource, it is best practice to validate the resource after mutations are made to ensure that the resource is in a valid state if it has been changed since the last mutation.
```typescript
When(a.Pod)
  .IsCreated()
  .InNamespace("my-app")
  .WithName("database")
  .Mutate(pod => {
    pod.SetLabel("pepr", "true");
  })
  // another mutating webhook could remove the label after this Mutate runs
  .Validate(pod => {
    if (pod.Raw.metadata?.labels?.pepr === "true") {
      return pod.Approve();
    }
    return pod.Deny("Needs pepr label set to true");
  });
```
If you think your Webhook is not being called for a given resource, check the `*WebhookConfiguration`.
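For example, to list the webhook configurations in the cluster:

```sh
kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations
```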
Debugging During Module Development
Pepr supports breakpoints in the VSCode editor. To use breakpoints, run `npx pepr dev` in the root of a Pepr module using a JavaScript Debug Terminal. This command starts the Pepr development server at `localhost:3000` with the `*WebhookConfiguration` configured to send `AdmissionRequest` objects to the local address.

This allows you to set breakpoints in `Mutate()`, `Validate()`, `Reconcile()`, `Watch()`, or `OnSchedule()` and step through module code.
Note that you will need a cluster running:
```sh
k3d cluster create pepr-dev --k3s-arg '--debug@server:0' --wait
```
```typescript
When(a.Pod)
  .IsCreated()
  .InNamespace("my-app")
  .WithName("database")
  .Mutate(pod => {
    // Set a breakpoint here
    pod.SetLabel("pepr", "true");
  })
  .Validate(pod => {
    // Set a breakpoint here
    if (pod.Raw.metadata?.labels?.pepr !== "true") {
      return pod.Deny("Label 'pepr' must be 'true'");
    }
    return pod.Approve();
  });
```
Logging
Pepr can deploy two types of controllers: Admission and Watch. The controllers deployed are dictated by the Actions called for by a given set of Capabilities (Pepr only deploys what is necessary). Within those controllers, the default log level is `info`, but that can be changed to `debug` by setting the `LOG_LEVEL` environment variable to `debug`.
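As a sketch, assuming a module whose admission controller Deployment is named `pepr-static-test` (check the actual names with `kubectl get deploy -n pepr-system`), the log level could be changed with:

```sh
kubectl set env deployment/pepr-static-test -n pepr-system LOG_LEVEL=debug
```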
To pull logs for all controller pods:
```sh
kubectl logs -l app -n pepr-system
```
Admission Controller
If the focus of the debug is on a `Mutate()` or `Validate()`, the relevant logs will be from pods with the label `pepr.dev/controller: admission`.
```sh
kubectl logs -l pepr.dev/controller=admission -n pepr-system
```
More refined admission logs, which can optionally be filtered by the module UUID, can be obtained with `npx pepr monitor`:

```sh
npx pepr monitor
```
Watch Controller
If the focus of the debug is a `Watch()`, `Reconcile()`, or `OnSchedule()`, look for logs from pods with the label `pepr.dev/controller: watcher`.
```sh
kubectl logs -l pepr.dev/controller=watcher -n pepr-system
```
Internal Error Occurred
```
Error from server (InternalError): Internal error occurred: failed calling webhook "<pepr_module>pepr.dev": failed to call webhook: Post ...
```
When an internal error occurs, check the deployed `*WebhookConfiguration` resources’ timeout and `failurePolicy` configurations. If the `failurePolicy` is set to `Fail` and a request cannot be processed within the timeout, that request will be rejected. If the `failurePolicy` is set to `Ignore`, given the same timeout conditions, the request will (perhaps surprisingly) be allowed to continue.

If you have a validating webhook, the recommendation is to set the `failurePolicy` to `Fail` to ensure that the request is rejected if the webhook fails.
```yaml
failurePolicy: Fail
matchPolicy: Equivalent
timeoutSeconds: 3
```
The `failurePolicy` and timeout can be set in the Module’s `package.json` file, under the `pepr` configuration key. If changed, the settings will be reflected in the `*WebhookConfiguration` after the next build:
"pepr": {
"uuid": "static-test",
"onError": "ignore",
"webhookTimeout": 10,
}
Read more on customization here.
Pepr Store
If you need to read all store keys, or you think the PeprStore is malfunctioning, you can check the PeprStore CR:
```sh
kubectl get peprstore -n pepr-system -o yaml
```
You should run in `npx pepr dev` mode to debug the issue; see the Debugging During Module Development section for more information.
Deployment
Production environment deployments should be declarative in order to avoid mistakes. The Pepr module manifests should be generated with `npx pepr build` and moved into the appropriate location.

Development environment deployments can use `npx pepr deploy` to deploy Pepr’s Kubernetes manifests into the cluster, or `npx pepr dev` to actively debug the Pepr module with breakpoints in the code editor.
Keep Modules Small
Modules are minified and built JavaScript files that are stored in a Kubernetes Secret in the cluster. The Secret is mounted in the Pepr Pod and is processed by Pepr Core. Because the module is packaged in a Secret, it is recommended to keep modules as small as possible to avoid hitting the 1MiB size limit of Secrets.
Recommendations for keeping modules small are:
- Don’t repeat yourself
- Only import the parts of library modules that you need, as shown in the sketch below
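For example, a minimal sketch of the import guidance (the `lodash` imports are purely illustrative):

```typescript
// Named imports from pepr pull in only what this module uses
import { Capability, a } from "pepr";

// Prefer a targeted import over pulling in a whole library:
// import _ from "lodash";          // bundles the entire package
import merge from "lodash/merge";   // bundles a single function
```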
It is suggested to lint and format your modules using `npx pepr format`.
Monitoring
Pepr can monitor Mutations and Validations from the Admission Controller through the `npx pepr monitor [module-uuid]` command. This command displays a neatly formatted log showing approved and rejected Validations as well as Mutations. If `[module-uuid]` is not supplied, it uses all Pepr admission controller logs as the data source. If you are unsure which modules are currently deployed, issue `npx pepr uuid` to display the modules and their descriptions.
```
✅ MUTATE pepr-demo/pepr-demo (50c5d836-335e-4aa5-8b56-adecb72d4b17)
✅ VALIDATE pepr-demo/example-2 (01c1d044-3a33-4160-beb9-01349e5d7fea)
❌ VALIDATE pepr-demo/example-evil-cm (8ee44ca8-845c-4845-aa05-642a696b51ce)
[ 'No evil CM annotations allowed.' ]
```
Multiple Modules or Multiple Capabilities
Each module has its own Mutating and Validating webhook configurations, Admission and Watch Controllers, and Stores. This allows each module to be deployed independently of the others. However, creating multiple modules adds overhead to the kube-apiserver and the cluster.

Due to the overhead costs, it is recommended to deploy multiple Capabilities that share the same resources (when possible). This will simplify analysis of which Capabilities are responsible for changes on resources.

However, there are some cases where multiple modules make sense: for instance, different teams owning separate modules, or one module for Validations and another for Mutations. If you have a use case where you need to deploy multiple modules, it is recommended to separate concerns by operating in different namespaces.
OnSchedule
`OnSchedule` is backed by a `PeprStore` to safeguard against schedule loss following a pod restart. It is used at the top level of a module, not within a `Validate`, `Mutate`, `Reconcile`, or `Watch`. Recommended intervals are 30 seconds or longer, and jobs should be idempotent, meaning that if the code is applied or executed multiple times, the outcome should be the same as if it had been executed only once. A major use case for `OnSchedule` is day-2 operations.
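A minimal sketch of a top-level schedule (the job name and body are hypothetical):

```typescript
import { OnSchedule, Log } from "pepr";

// Declared at the top level of the module, not inside an action callback
OnSchedule({
  name: "day-2-sweep", // hypothetical job name
  every: 30,
  unit: "seconds",
  run: async () => {
    // Keep this idempotent: running it twice must yield the same outcome
    Log.info("Running scheduled day-2 operation");
  },
});
```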
Security
To enhance the security of your Pepr Controller, we recommend following these best practices:
- Regularly update Pepr to the latest stable release.
- Secure Pepr through RBAC using scoped mode, taking into account the access to the Kubernetes API server needed by the callbacks.
- Practice the principle of least privilege when assigning roles and permissions and avoid giving the service account more permissions than necessary.
- Use NetworkPolicy to restrict traffic from Pepr Controllers to the minimum required.
- Limit calls from Pepr to the Kubernetes API server to the minimum required.
- Set webhook failure policies to `Fail` to ensure that the request is rejected if the webhook fails (more below).
When using Pepr as a Validating Webhook, it is recommended to set the Webhook’s `failurePolicy` to `Fail`. This can be done in your Pepr module either in the `values.yaml` file of the Helm chart by setting `admission.failurePolicy` to `Fail`, or in the `package.json` under `pepr` by setting the `onError` flag to `reject`, then running `npx pepr build` again.
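For example, the `package.json` equivalent of the earlier snippet, switched to reject on webhook failure:

```json
"pepr": {
  "uuid": "static-test",
  "onError": "reject",
  "webhookTimeout": 10
}
```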
By following these best practices, you can help protect your Pepr Controller from potential security threats.
Reconcile
`Reconcile()` fills a similar niche to `Watch()` (and runs in the Watch Controller), but it employs a Queue to force sequential processing of resource states once they are returned by the Kubernetes API. This allows things like operators to handle bursts of events without overwhelming the system or the Kubernetes API. It provides a mechanism to back off when the system is under heavy load, enhancing overall stability and maintaining the state consistency of Kubernetes resources, as the order of operations can impact the final state of a resource. For example, creating and then deleting a resource should be processed in that exact order to avoid state inconsistencies.
```typescript
When(WebApp)
  .IsCreatedOrUpdated()
  .Validate(validator)
  .Reconcile(async instance => {
    // Do work here: events for each resource are processed in order
  });
```
Pepr Store
The store is backed by etcd in a `PeprStore` resource, and updates happen at 5-second intervals when an array of patches is sent to the Kubernetes API Server. The store is intentionally not designed to be transactional; instead, it is built to be eventually consistent, meaning that the last operation within the interval will be persisted, potentially overwriting other operations. In simpler terms, changes to the data are made without a guarantee that they will occur simultaneously, so caution is needed in managing errors and ensuring consistency.
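A minimal sketch of store access from a Capability (the capability name and keys are illustrative):

```typescript
import { Capability } from "pepr";

const HelloPepr = new Capability({
  name: "hello-pepr",
  description: "Illustrative store usage",
});

// The Store is scoped to the Capability that registers it
const { Store } = HelloPepr;

Store.onReady(() => {
  // Writes are batched and flushed on the 5-second interval; within one
  // interval the last write to a key wins, so avoid read-modify-write races
  Store.setItem("example-1", "was-here");

  // Reads are served from the controller's local cache
  const value = Store.getItem("example-1");
});
```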
Watch
Pepr streamlines the process of receiving timely change notifications on resources by employing the `Watch` mechanism. It is advisable to opt for `Watch` over `Mutate` or `Validate` when dealing with more extended operations, as `Watch` does not face any timeout limitations. Additionally, `Watch` proves particularly advantageous for monitoring previously existing resources within a cluster. One compelling scenario for leveraging `Watch` is when there is a need to chain API calls together, allowing `Watch` operations to be sequentially executed following `Mutate` and `Validate` actions.
```typescript
When(a.Pod)
  .IsCreated()
  .InNamespace("my-app")
  .WithName("database")
  .Mutate(pod => {
    // ....
  })
  .Validate(pod => {
    // ....
  })
  .Watch(async (pod, phase) => {
    Log.info(pod, `Pod was ${phase}.`);
    // do consecutive api calls
  });
```
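As a sketch of the chaining scenario, assuming `When` comes from your Capability and using the fluent `K8s` client that pepr re-exports (the Deployment name is hypothetical):

```typescript
import { a, K8s, kind, Log } from "pepr";

// const { When } = HelloPepr; // provided by your Capability

When(a.Pod)
  .IsCreated()
  .InNamespace("my-app")
  .WithName("database")
  .Watch(async (pod, phase) => {
    Log.info(pod, `Pod was ${phase}.`);
    // Follow-on API call after admission: fetch a related Deployment
    const deploy = await K8s(kind.Deployment)
      .InNamespace("my-app")
      .Get("database"); // hypothetical Deployment name
    Log.info(deploy, "Fetched related Deployment");
  });
```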