Thursday, January 3, 2019

Integrating the workshop notes with the image

If you are still following this series of blog posts, we now have a dashboard for our workshop environment which combines workshop notes with the interactive terminal in the users browser.

This enabled us to have the instructions right next to the terminal where workshop attendees execute the commands. Further, it was possible to have it so that they need only click on the commands and they would be automatically executed in the terminal. This way they didn't need to manually enter commands or cut and paste them, both which can be prone to mistakes.

In this blog post we will look at how workshop notes for a specific workshop can be combined with the dashboard to create a self contained image, which can be deployed to a local container runtime, to OpenShift as a standalone instance, or using JupyterHub in a multi user workshop.

Building a custom dashboard image

We already covered previously the methods for creating a custom version of the terminal image which included additional command line tools and source files required for a workshop. The process is the same, but instead of using the terminal base image, the dashboard base image is used instead.

The two methods that could be used were to use the base image as a Source-to-Image (S2I) builder, or to build a custom image from a Dockerfile.

If you were running the build in OpenShift as an S2I build, with your files in a hosted Git repository, you could create the build by running the command:

  oc new-build quay.io/openshiftlabs/workshop-dashboard:latest~https://your-repository-url \
      --name my-workshop-dashboard

All you now need to know is the layout for the workshop notes so they are picked up and displayed correctly.

Layout for the custom workshop notes

The workshop notes were being hosted using Raneto, a knowledge base application implemented in Node.js. This was installed as part of the dashboard image which extends the terminal base image. When the image is started, Raneto will be started along side Butterfly and the proxy, with supervisord managing them all.

When building the custom image, the files for Raneto should be placed in the raneto sub directory. Within the raneto directory, the Markdown files for the workshop notes should be placed in the content subdirectory. Files should be Markdown files with .md extension.

The Markdown files can be placed directly into the content directory, or you can create further subdirectories to represent a category or set of multi part exercises.

By default, when pages are listed in the catalog on the home page for the workshop notes, they will appear in sorted order based on the names of the files. In the case of a subdirectory, they will be sorted within that category. If you need to control sorting, you can use meta data within the Markdown files to control relative file order, or using a sort file when needing to control the order that categories are shown. Check the Raneto documentation for more details on sorting.

If you want to replace the default home page which shows the categories and files within each category, you can provide an index.md file in the content directory.

A typical layout for the files might therefore be:

  raneto/content/index.md
  raneto/content/setup.md
  raneto/content/exercise-01/01-first-part-of-exercise.md
  raneto/content/exercise-01/02-second-part-of-exercise.md
  raneto/content/exercise-01/03-third-part-of-exercise.md
  raneto/content/finish.md

The index.md file would be a custom home page giving an overview of the workshop. The setup.md file would be any pre-requisite setup that may be required, such as logging in from the command line, creating projects, adding special roles etc. The finish.md file would be a final summary shown at the end of the workshop.

The subdirectories will then be where the exercises are kept, with the exercises broken into parts of a manageable size.

Defining the navigation path for the workshop

As Raneto is intended as being a knowledge base, it normally doesn't dictate a navigation path to direct the order in which pages should be visited. The templates used with the dashboard add to Raneto the ability to define a path to follow by specifying the navigation path as meta data in the Markdown files.

The metadata at the head of the index.md file would be:

  ---
  Title: Workshop Overview
  NextPage: setup
  ExitSign: Setup Environment
  Sort: 1
  ---

That for setup.md would be:

  ---
  Title: Setup Environment
  PrevPage: index
  NextPage: exercise-01/01-first-part-of-exercise
  ExitSign: Start Exercise 1
  Sort: 2
  ---

That for the last part of the last exercise:

  ---
  PrevPage: 02-second-part-of-exercise
  NextPage: ../finish
  ExitSign: Finish Workshop
  ---

And that for finish.md:

  ---
  Title: Workshop Summary
  PrevPage: exercise-02/02-second-part-of-exercise
  Sort: 3
  ---

The Title field allows the generated page title based on the name of the file to be overridden. The NextPage and PrevPage define the navigation path. The ExitSign allows you to override the label on the button at the end of a page which you click on to progress to the next page in the navigation path.

Each of the pages making up the exercises should similarly be chained together to define the navigation path to be followed.

Interpolation of variables into workshop notes

A set of predefined variables are available which can be automatically interpolated into the page content. These variables are:

  • base_url - The base URL for the root of the workshop notes.
  • username - The name of the OpenShift user/service account when deployed with JupyterHub.
  • project_namespace - The name of an existing project namespace that should be used.
  • cluster_subdomain - The sub domain used for any application routes created by OpenShift.

How these are set will depend on how the dashboard image is being deployed. Workshop notes will have to be tolerant of different deployment options and may need to specify alternate sets of steps if expected to be deployed in different ways.

To use the variables in the Markdown files, use the syntax:

  %username%

Additional variables for interpolation can be supplied for a workshop by providing the file raneto/config.js, including:

  var config = {
      variables: [
        {
          name: 'name',
          content: 'value'
        }
      ]
  };

  module.exports = config;

The contents of this config file will be merged with that for Raneto. The contents of the file must be valid Javascript code. You can use code in the file to lookup environment variables or files to work out what to set values to. The Javascript code will be executed on the server, in the container for the users environment.

This config file can also be used to override the default title for the workshop by setting the site_title attribute of the config dictionary.

Formatting and executable/copyable code blocks

You can use GitHub flavoured Markdown when needing to do formatting in pages.

In order to mark code blocks so that clicking on them will cause them to be copied to the first terminal and run, add execute or execute-1 after the leading triple back quotes which starts the code block, on the same line and with no space between then. If you want the second terminal to be used instead, use execute-2.

To have the contents of the code block copied into the copy and paste buffer when clicked on, use copy instead of execute.

Embedded images in workshop notes

The workshop notes can include embedded images. Place the image in the same directory as the Markdown file and use appropriate Markdown syntax to embed it.

  ![Screenshot](./screenshot.png)

Using HTML for more complex formatting

As with many Markdown parsers, it is possible to include HTML markup. This could be used to render complex tables, or could also be used in conjunction with Javascript, to add conditional sections based on the values passed in using the interpolated values. Note that any such Javascript is executed in the browser, not on the server.

Defining additional build and runtime steps

As when creating a custom terminal image using the S2I build process, you can define an executable shell script .workshop/build to specify additional steps to run. This can be used to check out Git repositories and pre-build application artifacts for anything used in the workshop. Similarly, a .workshop/setup script can be included to define steps that should be run each time the container is started.

Coming up next, deploying the full workshop

This post provides a rough guide on how to add workshop notes to the image when using the dashboard image in an S2I build. How to deploy the dashboard image for a multi user workshop is similar to what was described previously, you just need to supply the custom dashboard image instead of the custom terminal image.

Deploying a multi user workshop where the users are known in advance by virtue of performing user authentication against OpenShift, isn't the only way that JupyterHub could be used to create a workshop environment.

In the next post we will revise how to deploy the multi user workshop using the custom dashboard image and the existing JupyterHub deployment, but we will have a look at an alternate scenario as well.

This scenario is where you accomodate anonymous users, where they are given an ephemeral identity in the cluster using a service account, along with a temporary project namespace to work in that is automatically deleted when they are done.

Wednesday, January 2, 2019

Dashboard combining workshop notes and terminal

The workshop environment described so far in this series of posts only targeted the problem of providing an in browser interactive terminal session for workshop attendees. This approach was initially taken because we already had a separate tool called workshopper for hosting and displaying workshop notes.

I didn't want to try and modify the existing workshopper tool as it was implemented in a programming language I wasn't familiar with, plus starting with it would have meant I would have had to extend it to know about user authentication, as well as add the capability to spawn terminals for each user which would be embedded in the workshopper page.

In the interests of trying not to complicate things and create too much work for myself, I therefore went the path of keeping the per user terminal instances completely separate from the existing tool. Now that I have deployment of terminals for multiple users working using JupyterHub as the spawner, what can be done about also hosting the workshop notes in a combined dashboard view.

Self contained workshop in a container

There are a couple of different approaches I have seen used to create these dashboard views which combine workshop notes with a terminal.

The first way, is to have a single common web site which provides the workshop notes. A user would visit this web site, going through a login page if necessary. Once they have been assigned an identifier, a backend instance for the terminal would be dynamically created. The common web site would embed the view for that terminal in the dashboard for that user, along with the workshop notes.

The second way, has the common web site only be a launchpad, with it dynamically creating a backend instance which hosts a web application which provides both the workshop notes and the terminal. That is, each user has their own instance of the application hosting the workshop notes, rather than it being provided by a single frontend application.

In both cases, depending on what the training system was for, the backend could have used a virtual machine (or set of virtual machines) for each user, or it could have used containers hosted on some container platform.

Whatever they use, these were always hosted services and the only way to do the workshop, was to go to their service.

When we run workshops, we don't have an always running site or infrastructue where someone wanting to do the workshop could go. We would host the workshop notes and terminals in the one OpenShift cluster where workshop attendees are also running the exercises for the workshop. When the workshop was over, the OpenShift cluster would be destroyed.

At the end of the workshop, we would inevitably be asked, how can I go through the workshop later, or can I share the workshop with a colleague so they can do it. Because it was an ephemeral system, this wasn't really possible.

One of the goals I had in mind in coming up with what I did, was the idea that everything you needed was self contained, with the container image being the packaging mechanism. So although we might run a multi user workshop and use JupyterHub as a means of spawning each users environment with the interactive terminal, you weren't dependent on having JupyterHub.

You have already seen this play out where I showed how you could deploy the terminal image standalone in an OpenShift cluster of your own, without needing JupyterHub. The only thing missing right now is including the workshop notes and way to view it in that same image.

The deployment model used is therefore sort of similar to the second option above, except that you only get a container for your terminal instance as a container in Kubernetes, you aren't being given a complete set of VMs or cluster of your own. When you run the exercises, you would be working in the same cluster as the terminal was deployed, and the same cluster that other workshop attendees were using.

This approach works well with OpenShift, because unlike plain out of the box Kubernetes, OpenShift enforces multi tenancy and isolation between projects and users. If necessary, you can further lock down users and what they can do through quotas and additional role based access control, although the default access controls are usually sufficient.

Because of the multi tenancy features of OpenShift, it is quite reasonable goal therefore, to package up everything for a workshop in a container image, and allow them to pull down that image later and deploy it into their own OpenShift cluster, knowing that it should work.

As well as using an image to deliver up a self contained workshop as an image which can be deployed in any cluster, it could also prove useful for packaging up the instructions, and anything else required, for when demonstrating how to deploy applications in OpenShift to customers.

Image implementing a combined dashboard view

In creating a combined dashboard view, because a standalone terminal was still required for use with the existing workshopper tool, addition of a dashboard was done using an image of its own. Rather than this being completely from scratch, the base terminal image was designed in a way that it could be extended, with plugins being able to be registered with the proxy to add additional route handlers.

This way if it was decided to stick with the workshopper tool for the workshop notes, just the terminal base image would be used. If a combined dashboard view with integrated workshop notes was more appropriate, the dashboard image which extends the terminal image would be used.

To see how the dashboard view would look, you can deploy the dashboard image using:

$ oc new-app https://raw.githubusercontent.com/openshift-labs/workshop-terminal/develop/templates/production.json \
    --param TERMINAL_IMAGE=quay.io/openshiftlabs/workshop-dashboard:latest

This uses the same approach before of relying on the OpenShift cluster to perform user authentication, and only an admin of the project can access it. Once user authentication has been completed, a user is redirected back to the dashboard.

img-alternative-text

Although the dashboard image doesn't have any pre-loaded content for a specific workshop yet, you can see how on the left you have the workshop notes. On the right you have a pair of terminals.

In the workshop notes, you can have one or more pages, with pages grouped into sub directories as categories if necessary. You can then define how the pages are chained together so users are guided from one page to the next as they perform the exercises.

The content for the workshop notes can be specified using markdown formatting. In the case of code blocks, these can be optionally annotated so that if the code block is clicked on, it will automatically be copied to a terminal on the right hand side and executed. This saves the user needing to type the commands. A code block can also be annotated so that if clicked on, the content will be copied into the browser copy buffer. This can then be pasted into the terminal or other web page.

Embedding a view for workshop notes

In the dashboard view, the Butterfly application is still used for the embedded terminal view. As Butterfly supports multiple sessions from the one instance, it was a simple matter of referring to a different named session in each of the terminal panes.

For the workshop notes, like with Butterfly, I preferred to not write a half baked implementation of my own, so I opted to use a knowledge base application called Raneto which fitted the requirements I had. These were that markdown could be used, it supported variable substitution, supported page meta data, and allowed for the theme and layout to be overriden in templates and CSS.

Raneto even has some features that I am not using just yet which could prove useful down the track. One of those is that it is possible to edit content through Raneto. Unfortunately that feature doesn't work properly when hosting Raneto at a sub URL, but I have a pull request submitted to the author which fixes that, so hopefully editing can be allowed at some point for when you are authoring content.

In order to combine the workshop notes hosted by Raneto, and the terminal panes implemented using Butterfly, I had to write some custom code, specifically the dashboard framing, with adjustable sizing of panels. For this part, I relied on the fact that the proxy had been made pluggable and overrode the default page handler to redirect to the dashboard view. I then used PUG in Node.js express to create the layout.

People will know me as a Python fan boy and this was really the first time using Javascript, Node.js and express on anything serious. I must say it felt a bit liberating, although I had to rely on a lot of googling, and cutting and pasting of Javascript code snippets until I got it doing what I wanted. It could still do with a bit more polish and tweaking as there are still a couple of issues I haven't been able to solve due to a lack of understanding of Javascript and the DOM page rendering model.

Coming up next, adding workshop notes

In addition to being able to run up the dashboard standalone, you can also deploy it using JupyterHub for a multi user workshop environment. The next step therefore is to add your own workshop notes.

This is done in the same was as when extending the terminal to add additional command line tools, or source files, including the ability to be able to add a build script to run your own steps during the build phase. That is, using a Source-to-Image (S2I) build, or a build from a Dockerfile. The result is the same, a self contained container image, but this time adding the workshop notes and the way of displaying them in a single dashboard view.

In the next post I will explain how you would layout your content for your workshop notes, how to markup code blocks so they can be clicked on to be executed, and how to link pages together to create the path users should follow when working through the exercises.

Tuesday, January 1, 2019

Administration features of JupyterHub

You have seen now in the last post how you can use JupyterHub to deploy a multi user workshop environment where each user is given access to their own interactive shell environment in their web browser. This is by having JupyterHub spawn a terminal application instead of the usual Jupyter notebooks it would be used for.

The aim in being able to provide this out of the box experience, is that it avoids the problem of workshop attendees wasting time trying to install any command line tools on their own local computer that may be needed for a workshop. Instead, everything they need, be it command line tools, or copies of source code files, are ready for them and they can immediately get started with their workshop. This can save up to 20 minutes or more at the start of a workshop, especially if running the workshop at a conference where there is poor wifi connectivity and downloads are slow.

For this, we are relying on JupyterHub to co-ordinate user authentication and spawning of a user environment, but JupyterHub can do more than that, providing a means to monitor user sessions, including allowing a course instructor to access an admin page where they can see logged in users and control their sessions.

The JupyterHub adminstration panel

In order to designate that a specific user has administration privileges, when deploying the multi user workshop environment, you can specify a list of users as a template parameter.

$ oc new-app https://raw.githubusercontent.com/openshift-labs/workshop-jupyterhub/master/templates/hosted-workshop-production.json \
    --param ADMIN_USERS="opentlc-mgr"

The list of admin users can also be set after deployment by updating the ADMIN_USERS environment variable in the deployment config.

When a user is designated as an administrator, they can access the JupyterHub admin panel using the /hub/admin URL path.

img-alternative-text

From the admin panel, a course instructor is able to see all the users who have logged in and whether that users instance is currently running. If necessary they can stop or start a users instance.

If a workshop attendee is having some sort of problem and the course instructor needs to be able to see the error they are getting for the command they have run, or needs to debug an issue in that particular users environment, they can from the admin panel click on "access server" for that user. This allows the course instructor to impersonate that user and see the exact same terminal view as the user themselves sees. Each can enter in commands in their own browser view and the other will see it.

The admin panel provides the ability to perform basic operations related to users. For more control, JupyterHub also provides a REST API. This could be used for ad-hoc operations, or you could create a separate web application as an alternative to the bundled admin panel, exposing more of the operations which can be performed via the REST API.

Customising JupyterHub configuration

The workshop environment created using the template, and the image it uses, is intended to provide a reasonable starting point suitable for any basic workshop. If you need to perform heavy customisation for your own purposes, you would fork the Git repository and directly modify it. For simple configuration changes, it is possible to override the JupyterHub configuration by adding additional configuration into a Kubernetes config map for the deployed JupyterHub instance.

In the case where the application deployment is called terminals the name of the config map is terminals-cfg. You can place any configuration in here that JupyterHub or the components it uses understands, although what you may want to override is probably limited.

One example of what you can use the config map for, is to define a user whitelist. A user whitelist is used where you don't want any user in the OpenShift cluster to be able to access the workshop environment, but instead want to limit it to a set list of users.

img-alternative-text

Culling of idle user terminal sessions

The intent with the deployment and configuration is that it caters for the typical use case, providing default configuration for features that you would often want to be used. You can then override exactly how that feature works through template parameters at the point of creation, or by updating environment variables in the deployment configuration.

One example of a feature that comes pre-configured is idle termination of user sessions.

In the case of this feature, because JupyterHub is able to monitor web traffic passing through the configurable HTTP proxy it uses to direct traffic, it knows when a terminal session is no longer being used. This enables JupyterHub to shutdown the pod for the user session if the period of inactivity exceeds a certain time. This might be useful to reduce resource usage for terminal sessions which were never used.

By default the inactivity timeout is set to 7200 minutes (2 hours), but you can override it using the IDLE_TIMEOUT template parameter, or deployment config environment variable.

Coming up next, hosting workshop notes

The purpose of this post is to again show that by using JupyterHub we get a lot of features for free without needing to implement them ourselves. JupyterHub is very configurable though and so if necessary things can be tweaked even further. This includes for example being able to override the templates used to render JupyterHub pages such as the spawn progress page, control panel and admin panel. So if you wanted to give it your own look and feel or add branding, you could do that as well.

Getting back to the original problem of running a workshop, what you have seen so far deals with the issue of being able to provide each user with an in browser interactive terminal where they can work, with all command line tools they need, ready to go.

During the workshop, the workshop attendees still need the workshop notes for the exercises they need to work though. In the next post, we will look at how the hosting of workshop notes can be integrated with the workshop environment we have running so far.

Monday, December 31, 2018

Deploying a multi user workshop environment

In this fourth post of this series of posts, we finally get to how to deploy a multi user workshop environment using OpenShift, or at least, how to deploy an interactive terminal session per user accessible in the browser. Right now we are still working on the assumption that the workshop notes are deployed separately, but we will get to that in a later post.

From the second post in this series we know we could have each user log into OpenShift and deploy an instance of the terminal themselves, but that implies they know how to do that using just the OpenShift web console, or they need to be instructed how to do it.

In the interests of making things easy for attendees of the workshop at the beginning, a better workflow is to give them all the same URL. When they visit this URL, they would log in with the access credentials they were given, and be immediately dropped into their own terminal session. They can then proceed to follow the workshop notes and do the exercises.

As already mentioned, the solution I am using to handle the user authentication and spawning of a per user environment is JupyterHub. The JupyterHub software is usually used to spawn Jupyter notebooks, but it can be used to spawn other applications as well.

When using JupyterHub for applications other than Jupyter notebooks, and access to the per user instance needs to be protected, the application will though need to authorise access by checking with JupyterHub that the user is allowed to access that instance. As described, this is already catered for by the terminal image. Similarly, the other requirement of the application being able to be hosted at a set sub URL is also handled already.

Deploying the JupyterHub application

Deploying JupyterHub to a plain Kubernetes cluster can be a fiddly process. The process for doing this which the Jupyter project team provides is by using a set of Helm templates, but Helm is not a standard part of Kubernetes. Thus it would be necessary if using plain Kubernetes to deploy Helm first.

When using OpenShift, instead of using Helm templates, when deploying JupyterHub it is possible to use native OpenShift templates instead.

The original OpenShift templates for deploying JupyterHub to OpenShift were created as part of the Jupyter on OpenShift project. You can find them as part of the JupyterHub Quickstart example repository.

That example repository is intended to get you started with JupyterHub on OpenShift. It provides the source code for a JupyterHub image that can be deployed to OpenShift, along with example templates.

For deploying our workshop environment, we will be using the JupyterHub image created out of that repository, but will be using our own templates, which combines the custom configuration for JupyterHub that we need, with that image, and then deploys it.

The command for deploying an initial workshop enviroment is:

$ oc new-app https://raw.githubusercontent.com/openshift-labs/workshop-jupyterhub/master/templates/hosted-workshop-production.json

You do not need to be a cluster admin to run this command. As long as you have an OpenShift user account and have a project you can use which has sufficient quota for running JupyterHub and an instance of the terminal for each user, you are good to go.

When the oc new-app command is run, it will create a deployment config called terminals, along with a bunch of other resources. If you want to monitor the progress of the deployment so you know when it is complete, you can run:

$ oc rollout status deploymentconfig terminals

When complete, you can determine the URL for accessing the workshop environment, the URL which you would give the workshop attendees, by running:

$ oc get route terminals

The template being used for the deployment in this case is setup to use OpenShift for performing user authentication. The workshop attendee when visiting the URL will therefore be redirected to the user login page for the OpenShift cluster, where they should enter the user credentials they were provided for the workshop.

img-alternative-text

Once login has been completed, they will be redirected back to JupyterHub. If they are one of the first users to login and the terminal image hasn't yet been pulled to the node in the OpenShift cluster, they may briefly see a JupyterHub page which tracks progress as their instance is started.

img-alternative-text

When it is ready, they will end up in the interactive terminal session in their browser. They will still need to login from the command line using their user credentials, but they will not need to indicate the address of the OpenShift cluster, as that has already been setup.

img-alternative-text

If projects have been pre-created for users, these will be visible, otherwise the user would need to create any projects as the workshop notes describe.

Persistent storage for user sessions

When running applications in containers, the file system accessible to the application is by default ephemeral. This means that if the container is shutdown, any updates made to the file system will be lost when a new instance of the application to replace it is created.

For a workshop, where exercises may involve creating or modifying files pre-populated into the image, this would mean loss of any progress and it would be necessary to start over.

Because this would be unacceptable in a workshop that may go for a whole day, the configuration for JupyterHub used with this template will allocate a persistent volume for each user session.

With how the terminal image is setup, the home directory for a user in the container will be /opt/app-root/src. A user can though write to anywhere under the /opt/app-root directory. As a result, the persistent volume will be mounted at /opt/app-root.

If you are aware of how file system mounting works, you may see a problem here. If a persistent volume is mounted at /opt/app-root, it will hide any files that may have been added in a custom terminal image.

To avoid this, when the user environment is started for the first time, an init container is used in the Kubernetes pod to mount the persistent volume at a temporary location. The contents of the /opt/app-root directory will be copied into the persistent volume from the image. For the main container, the persistent volume will then be mounted at /opt/app-root.

Using this mechanism, the persistent volume can be populated the first time with the files from the image. Any subsequent updates to any files under /opt/app-root will therefore be persistent. In the event that the users instance is shutdown, they need only refresh their web browser and JupyterHub will re-create the user environment, but where the workspace is using what was already saved away in the persistent volume, and so no work is lost.

Using your custom terminal image

When the template above is used to deploy JupyterHub for the workshop, it will by default use the workshop terminal base image. The image will provide a default set of command line tools, but will not contain any files specific to the workshop.

If a custom terminal image has been created using the steps explained in the prior blog post, this image can be used for the workshop, by passing its name as parameter to the template when deploying JupyterHub.

$ oc new-app https://raw.githubusercontent.com/openshift-labs/workshop-jupyterhub/master/templates/hosted-workshop-production.json \
    --param TERMINAL_IMAGE=my-workshop-terminal:latest

If the JupyterHub instance has already been deployed, you can override what terminal image it is using by setting the TERMINAL_IMAGE environment variable on the deployment config.

$ oc set env deploymentconfig terminals TERMINAL_IMAGE=my-workshop-terminal:latest

The value given to TERMINAL_IMAGE in each case, can be a full name for an image hosted on an image registry, or it can be a reference to an image stream and tag in the current project.

Coming up next, other JupyterHub goodies

I found JupyterHub convenient because it meant I didn't have to reinvent the wheel. JupyterHub could be setup to work with an external OAuth provider for user authentication, in this case the same sign on system as the OpenShift cluster was configured to use. Spawning and management of each users instance was also managed by JupyterHub, we just happened to be spinning up a terminal application rather than a Jupyter notebook.

In the next post I will go through some of the other built in functionality that JupyterHub provides. This includes its administration web interface and REST API, as well as its ability to the monitor web traffic passing through to a users application and culling of idle user sessions.

Sunday, December 30, 2018

Creating your own custom terminal image

In this series of posts I have been talking about some of the work I have been doing with creating environments to host workshops when needing to train users in using a software product such as OpenShift.

In the first post I explained how JupyterHub can be used to deploy web applications other than Jupyter notebooks, and how I use it to deploy user environments which give each attendee of a workshop, access to an interactive command line shell in their browser, with all the command line client tools and files they need for the workshop.

In the second post I delved into how the container image was constructed, as well as how it could be run independent of using JupyterHub where a user may want to deploy it themselves and wasn't running a workshop.

In this post I am going to explain more about how the terminal image is constructed and how it can be extended to add additional command line tools and files required for a specific workshop.

Contents of the base container image

In addition to the Butterfly application, which provides the interactive terminal in the browser, and the auth proxy, which handles authentication and authorisation, the terminal base image includes a range of command line tools that may be needed in the workshops.

As the main target of any workshops is going to be learning about Kubernetes and OpenShift, the command line clients kubectl, oc and odo are included. You also have commonly used UNIX command utilities such as git, vi, nano, wget, and curl, as well as tools required to work with specific programming languages. The language runtimes which are provided are Java, Node.js and Python.

When you run specific workshops, you may need additional command line tools to be available, or you may want a users environment to be pre-populated with files for the workshop. This could include a checkout of an existing remote Git respository, so the user will already have all the files necessary present, without needing to pull them down themselves.

In order to add these on top of the base container image, you can extend it using a Source-to-Image (S2I) build, or using buildah or docker builds.

Running a Source-to-Image build

A Source-to-Image (S2I) build is a means of creating custom container images. It works by starting out with a S2I builder image, running it, injecting source files into it, and then running an assemble script within the container which prepares the contents of the image being created. A container image is then created from the running container.

If you are familar with hosting services such as Heroku, or Cloud Foundry, which use what are called build packs, S2I is similar to that, except that S2I was purpose built to create container images. In fact, the S2I builder is itself a container image and in our case, the terminal image is what is called S2I enabled, and acts as the S2I builder image.

An S2I build can be run from the command line on a host where the docker container runtime is available, using the s2i tool. An S2I build can also be run inside of any container platform which supports it, such as OpenShift.

When using the s2i command line tool, you have two options for where the files you want to inject into the build come from. The first is that files can be injected from the local file system, and the second is for the files to be pulled from a hosted Git repository, such as on GitHub, Gitlab, or Bitbucket.

In either case, when the S2I build is run, the directory of files used as input will be placed in the directory /opt/app-root/src of the image. This is the same directory which acts as the home directory for a users session when they access the terminal session through the browser.

In addition to any files from the build directory or Git repository being available, you can also create other files as part of the build process for the custom terminal image. This is done by supplying an executable shell script file at the location .workshop/build. This script will be run after files have been placed in the /opt/app-root/src directory.

One use for the build script might be to checkout copies of a remote Git repository so that it is included in the image and thus available to the user immediately they access the terminal session. You might also pre-compile any build artifacts from the source code.

#!/bin/bash

set -x
set -eo pipefail

git clone https://github.com/openshift-evangelists/Wild-West-Backend.git backend
git clone https://github.com/openshift-evangelists/Wild-West-Frontend.git frontend

By pre-compiling build artifacts once as part of the image build, it avoids the need for every user in a workshop to have to run the build. The only time they would need to re-compile build artifacts is if they make code changes, and even then it would only be necessary to rebuild anything corresponding to the changes.

Note that to ensure that the S2I build of the image fails if any build step fails, you should include the line:

set -eo pipefail

in the .workshop/build script.

By including this line, the whole build will fail immediately any time one specific step fails, and you do not need to manually check the result of every step and explicitly exit the script if a step fails. Failing the build like this ensures that you don't unknowingly create and use an image where a build step didn't fully succeed, but the remainder of the build ran to completion anyway.

With any files you want to include in place, or any build script, if you had them in the current directory, you run s2i as:

s2i build . quay.io/openshiftlabs/workshop-terminal:latest my-workshop-terminal

If you had the files in a hosted Git repository, replace the directory path "." with the Git repository URL.

Once the build has finished, you can verify that the image has been built correctly using docker by running:

docker run --rm -p 10080:10080 my-workshop-terminal

and accessing http://localhost:10080.

img-alternative-text

If you were using OpenShift, and wanted to do the build in the same project where a workshop will be run from, you can use:


oc new-build quay.io/openshiftlabs/workshop-terminal:latest~https://your-repository-url \
  --name my-workshop-terminal

If necessary you could also use a binary input build and build from files in a local directory.

Building an image from a Dockerfile

If you don't want to use the s2i tool, but want to use buildah build or docker build instead, you can use a Dockerfile.

When using a Dockerfile you could manually perform all the steps yourself, but it is usually better to setup everything as if an S2I build was triggered and then run the S2I assemble script. In this case the Dockerfile would be:

FROM quay.io/openshiftlabs/workshop-terminal:latest

USER root

COPY . /tmp/src

RUN rm -rf /tmp/src/.git* && \
    chown -R 1001 /tmp/src && \
    chgrp -R 0 /tmp/src && \
    chmod -R g+w /tmp/src

USER 1001

RUN /usr/libexec/s2i/assemble

You do not need to provide CMD or ENTRYPOINT as these are inherited from the base image.

When ready, run buildah build or docker build to create the custom image.

Building a custom image using these tools from a Dockerfile would be necessary where you want to install additional system packages that can only be installed as the root user. This is because an S2I build is always run as an unprivileged user.

Defining steps for when the image is run

The .workshop/build script is run during the build phase when using an S2I build using s2i or when emulating it using the above Dockerfile. If you want to have special steps run when a container is started up from the image, you can supply an executable shell script called .workshop/setup. Depending on the environment in which the image is known to be running, this might run steps such as creating projects in OpenShift for use by the user, or set up special service accounts or role bindings, so the user doesn't need to do those things themselves.

Note that the .workshop/setup script can only define commands to run, it cannot be used to set environment variables that would be inherited by the users terminal session. If you need to set environment variables, you can create a shell script file in the directory /opt/app-root/etc/profile.d. For example, if using a Dockerfile build you had installed the Ruby package from the Software Collections Library (SCL), you would create the file /opt/app-root/etc/profile.d/ruby.sh which contained:

source scl_source enable rh-ruby25

This would activate the Ruby runtime and ensure that any appropriate environment variables are set and the PATH updated to include the bin directory for the Ruby distribution.

Coming up next, deploying the workshop

In this post we covered how the terminal base container image can be extending using the Source-to-Image (S2I) build process, or a build using a Dockerfile. It touched on some of the extension points used by this specific S2I builder implementation, but also when the image is run in a container. This isn't all the ways that the base image can be extended. How the web interface for the terminal can for example be customised will be covered in a later post.

The benefit of using a base image and then building a custom image on top which contains everything needed for a workshop, is that the image becomes the packaging mechanism for the content required, but also contains everything needed to run it standalone if necessary.

One of the issues that comes up with running workshops is that people will ask how they can do over the workshop themselves again later. If the only way to work through the exercises is to deploy the complete multi user workshop environment, that isn't necessarily practical. Because everything needed is in the image, they can run the image themselves, as was covered in the prior post. So if you wanted to allow attendees to do it again, you need only host the image on a public image repository such as quay.io so they can pull it down and use it later.

Now that you can see how you can create a custom terminal image with special content for a workshop, we will move onto how you could run a multi user workshop using JupyterHub. That will be the topic of the next post in this series.

Saturday, December 29, 2018

Running an interactive terminal in the browser

In the last blog post I explained that JupyterHub can be used to spawn instances of applications other than Jupyter notebooks. Because JupyterHub can also handle user authentication, this made it a handy way of spinning up distinct user environments when running workshops. Each attendee of the workshop would be given access to a shell environment running in a separate pod inside of Kubernetes, with access via an interactive terminal running in their web browser. This meant that they did not need to install anything on their local computer in order to complete the workshop.

For the interactive terminal, the Butterfly application was used. This is a backend Python application implemented using the Tornado web framework, with the in browser terminal support being based on term.js.

Butterfly being a self contained application provides an easy way of deploying an interactive terminal session to a remote system in your web browser without needing to implement a custom web application from scratch. It isn't a complete drop in solution though, the reason being that for authentication it relies on PAM. That is, it requires that it be run as the root user, and that a user and corresponding password exist in the UNIX user database.

For a container based deployment using OpenShift, because the default security model prohibits the running of containers as root, this meant that the existing container image for Butterfly isn't suitable. We also don't want Butterfly handling authentication anyway and want JupyterHub to handle it, albeit that we still need access to the Butterfly instance be authorised in some way.

The latter is necessary because although JupyterHub handles user authentication, once that is done, all subsequent web traffic for the users instance only traverses the JupyterHub configurable HTTP proxy, which doesn't authorise subsequent access to the users instance.

When deploying Jupyter notebooks behind JupyterHub, to ensure that subsequent access is correctly authorised, the Jupyter notebook instance will perform its own OAuth handshake against JupyterHub to get an access token for the user, determine who the user is, and then check whether the user should have access to that instance.

If a similar handshake is not done when deploying an application other than a Jupyter notebook behind JupyterHub, once a user has logged in, a second user could use the URL for the first users instance and wouldn't be blocked.

Access via a lightweight auth proxy

The solution which can be used to add the required authorisation handshake is to place in front of Butterfly a lightweight auth proxy. For the proxy, Node.js was used as packages were available which made it easy to create a HTTP and web socket proxy in conjunction with authentication/authorisation mechanisms such as OAuth.

With the proxy being a separate process, and using a different language runtime, many will argue you must use a separate container for the proxy, adding it as a side car container to the first container in the same Kubernetes pod. If you want to be purist then yes, you could do this, but to be frank, where building an integrated solution where the fact that a proxy was used should be transparent, and where deployment should be as easy as possible, it was simpler in this case to run the proxy as a separate process in the same container as Butterfly is running.

When running applications in containers, under most circumstances the application is the only thing which is running, and so you would run it as process ID 1 inside of the container. Because both Butterfly and the lightweight auth proxy needed to be run at the same time, supervisord was added to the container to manage running them. Having supervisord present, actually turned out to be useful later on when wanting to layer on extra functionality for displaying content in a dashboard, as well as for special cases where for a workshop it might be necessary to run additional local services in the container.

Deploying the terminal standalone

Discussion up till now has been focused on providing interactive terminals to multiple users at the same time for a workshop, using JupyterHub to control user authentication and spawning of the per user environments. Although that is a primary goal, we still want to enable a single user to create an instance of the terminal themselves in their own project in OpenShift, without needing to deploy JupyterHub.

This can be done in OpenShift using:

$ oc new-app --docker-image quay.io/openshiftlabs/workshop-terminal:latest --name terminal
$ oc expose svc terminal

You can then use:

$ oc get route terminal

to determine the public URL for accessing the terminal in your web browser. With access to the terminal, then login to the OpenShift cluster from the command line using your username and password, or with a token, and start working with the cluster.

img-alternative-text

This method of deployment is quick, but is not protected in any way. That is, you are not challenged to supply any login credentials to get to the terminal, nor is the connection secure. This means that if someone else knew the URL, they could also access it.

A slightly better way is to use:

$ oc new-app --docker-image quay.io/openshiftlabs/workshop-terminal:latest --name terminal \
    --env AUTH_USERNAME=grumpy --env AUTH_PASSWORD=password
$ oc create route edge terminal --service=terminal

This activates the requirement to provide login credentials using HTTP Basic authentication. A secure connection is also used so no one can sniff your session traffic.

img-alternative-text

This ability to secure access to the terminal works as use of HTTP Basic authentication was added as one option to the auth proxy, in addition to the support added for using the terminal in conjunction with JupyterHub.

Linking to OpenShift authentication

Having support for HTTP Basic authentication is only a little bit more work to setup and because it isn't linked to using JupyterHub, it means you can technically run the terminal image in a local container runtime using podman run or docker run and still have some access control.

For the case of deploying the terminal standalone to an OpenShift cluster, a better arrangement would be to use the same authentication system as the OpenShift cluster itself uses. This way you wouldn't have to worry about supplying a username and password at all when deploying the terminal, and you would use OpenShift to handle user authentication.

Because of the benefits, this particular scenario is also supported, but because it involves additional configuration that is not easily done when deploying from the command line, an OpenShift template is provided to make it easier.

To deploy the terminal in your OpenShift project using this method use:

$ oc new-app https://raw.githubusercontent.com/openshift-labs/workshop-terminal/develop/templates/production.json

This will create the following resources.

serviceaccount "terminal-user" created
rolebinding.authorization.openshift.io "terminal-admin" created
imagestream.image.openshift.io "terminal" created
deploymentconfig.apps.openshift.io "terminal" created
service "terminal" created
route.route.openshift.io "terminal" created

What is different for this deployment is that a service account and rolebinding are also created.

The service account is necessary as part of the mechanism for using OpenShift for user authentication, specifically the service account ends up being the client ID when using OAuth against OpenShift.

With this in place and the auth proxy configured to use it, it means that when you visit the public URL for the terminal, you will be redirected to OpenShift to authenticate.

img-alternative-text

You then login using the same credentials as the OpenShift cluster. Once that is successful, you will be redirected back to your terminal.

img-alternative-text

One difference using this deployment method is that since OpenShift is used to handle user authentication, and so can be viewed with a little bit more trust, as a convenience so you don't then have to login to the OpenShift cluster from the command line, this will be done for you automatically. This means you can immediately start working in the current project.

This works because the service account created to support the use of OAuth for user authentication is also used for the pod when deploying the terminal. That service account is also given admin rights over the project it is deployed in via the role binding which was created. You can therefore deploy further applications, or even delete the project.

Do note though that the service account is not a normal authenticated user. This means it cannot create new projects, it can only work within the current project. If it was necessary that through the service account access was needed to other projects, an appropriate role binding would need to be created in the other project to allow access. Alternatively, the user can login from the command line with oc using their own username and password, or an access token. The user could then access any other projects they are allowed to, or create new projects if that capability hasn't been disabled for authenticated users.

One other very important point. When using OpenShift for user authentication in this way, similar to what happens when the terminal is used with JupyterHub, it only authenticates the user, it doesn't then authorise whether that user should have access to whatever that application is providing access to. For this reason, when using this configuration, the auth proxy will only allow an OpenShift user access to the terminal in the project, if the user is themselves an admin of the project the terminal is deployed in. If a user only had edit or view role, or was not even a member of that project, they will not be able to access the terminal.

Coming up next, extending the image

In this post we got a bit side tracked from the goal of explaining how the terminal can be used with JupyterHub by looking at how the terminal image can be deployed standalone without using JupyterHub.

This is useful as you may not be running a workshop, but want a way of being able to develop with OpenShift on your own project using the command line, but in your browser, without needing to install any tools locally. In the steps above the command line was used to do the deployment, but you could have done it from the OpenShift web console as well, totally avoiding ever needing to deploy the command line tools locally.

For the current terminal image it includes command line clients for OpenShift 3.10 and 3.11. From the terminal it will use the correct version corresponding to the version of OpenShift the terminal is deployed to. Especially for workshops, you may want to install additional command line tools, or populate the image with files used during the workshop.

In the next post I will talk about how you can extend the terminal image using a traditional docker build, as well as using a Source-to-Image (S2I) build. After that I will return to talking about integrating the terminal with JupyterHub for running multi user workshops.

Before moving on though, one last warning. The current incarnation of the work being described here should still be regarded as being in proof of concept phase. Earlier iterations have been used in actual workshops, but the current refresh of the code base to create the terminal image and integrate it with JupyterHub hasn't yet been used in anger. So use at your own risk. Do your own due diligence as to whether you believe it is secure. Be aware that names of images or locations of any respositories used to create them still may change as a proper home is found for them.

Friday, December 28, 2018

Using JupyterHub as a generic application spawner

As part of my day job I occasionally need to give workshops on using OpenShift, Red Hat's distribution of Kubernetes. The workshop notes that attendees follow to do the exercises we host using an internally developed tool called workshopper. We host the workshop notes using this tool inside of the same OpenShift cluster that users will be working in.

To access the OpenShift cluster and do the exercises, attendees use the OpenShift command line clients oc or odo. They may also need to use kubectl. Because these are client side tools they would need to be installed somewhere where the attendees can run them, usually this is on their own local computer.

Requiring that attendees install a client on their own local computer can often be a challenge. This is because when dealing with enterprise customers, their employee's computers may be locked down such that they are unable to install any additional software.

A solution one often sees to this problem is to enhance the tool used to host the workshop notes to embed an interactive terminal which is then used through the web browser. Behind the scenes that terminal would be connected up to some backend system where a shell environment is run for the specific user accessing the workshop. Finally, in that shell environment they would have access to all the command line tools, as well as other files, that may be needed for that workshop.

In my case, modifying the workshopper tool didn't look like a very attractive option. This is because it was implemented in Ruby, a programming language I don't know. Also, because workshopper only hosted content, it knew nothing of the concept of users and authentication, nor did it have any ability itself to spawn up separate isolated shell environments for each user.

Rather than modify the existing workshopper tool, another option would be to swap it out and use an open source tool such as Open edX which already encompassed such functionality. A problem with using Open edX is that it has a very large footprint and is way more complicated than we needed. The OpenShift clusters we used for training might only be running for a day, and setting up Open edX and loading it with content for the workshop would be a lot more work and wouldn't provide the one click setup solution we were after.

After thinking about the problem for a while it dawned on me, why not use JupyterHub, a tool I was well familar with and had already done a lot of work with around making it work on OpenShift and be easy to deploy.

Now you may be thinking, but JupyterHub is for spawning Jupyter notebook instances, how does that help. Although the Jupyter notebook web interface does have the capability to spawn an interactive terminal in the browser, it would be a quite confusing workflow for a user to access it in this way.

This is true, but what most people who may have been exposed to JupyterHub probably wouldn't know is that you don't have to use JupyterHub to spawn Jupyter notebooks. JupyterHub can in fact be used to spawn an arbitrary application, or when we are talking about Kubernetes, an arbitrary container image can be used for the per user environment that JupyterHub creates.

What does JupyterHub provide us?

JupyterHub calls itself a "multi-user server for Jupyter notebooks". It is made up of three sub systems:

  • a multi-user Hub (tornado process)
  • a configurable http proxy (node-http-proxy)
  • multiple single-user Jupyter notebook servers (Python/IPython/tornado)

The architecture diagram from the JupyterHub documentation is shown as:

img-alternative-text

As already noted, you aren't restricted to spawning Jupyter notebooks, it can be used to spawn any long running application, albeit usually a web application. That said, to integrate properly with JupyterHub as the spawner and proxy, the application you run does need to satisfy a couple of conditions. We'll get to what those are later.

The key things we get from JupyterHub by using it are:

  • can handle authentication of users using PAM, OAuth, LDAP and other custom user authenticators
  • can spawn applications in numerous ways, including as local processes, in local containers or to a Kubernetes cluster
  • proxies HTTP traffic, and monitors that traffic, facilitating detection and culling of idle application instances
  • provides an admin web interface for monitoring and controlling user instances of an application
  • provides a REST API for controlling JupyterHub and, monitoring and controlling user instances of an application

With these capabilities already existing, all we had to do for our use case was to configure JupyterHub to perform user authentication against the OAuth provider endpoint of the same OpenShift cluster they were using for the workshop. Next we configure JupyterHub to be able to spawn an application instance for each user as a distinct pod in Kubernetes.

The workflow for the user would be to visit the URL for the JupyterHub instance. This would redirect them to the login page for the OpenShift cluster where they would log in as the user they had been assigned. Once successfully logged in, they would be redirected back to JupyterHub and to their specific instance of an interactive terminal accessible through their web browser.

This interactive terminal would provide access to a shell environment within their own isolated pod in kubernetes. The image used for that pod is pre-loaded with the command line tools so they are all ready to go, without needing to install anything on their own local computer. The pod can also be backed by a per user persistent volume, so that any changes made to their workspace will not be lost in the event of their instance being restarted.

So the idea at this point is not to try and augment the original workshopper tool with an embedded terminal, but to provide access to the per user interactive terminal in a separate browser window.

In browser interactive terminal

In some respects, setting up JupyterHub is the easy part as through the prior work I had been doing on deploying JupyterHub to OpenShift, I already had a suitable JupyterHub image, and a workflow for customising its configuration to suit specific requirements.

What was missing and had to be created was an application to provide an in browser interactive terminal, except that I didn't need to even do that, as such an application already exists, I just had to package it up as a container image for JupyterHub to deploy in Kubernetes.

The application I used for the in browser interactive terminal was Butterfly. This is a Python application implemented using the Tornado web framework. It provides capabilities for handling one or more interactive terminal sessions for a user.

Because I was needing to deploy Butterfly to Kubernetes, I needed a container image. Such an image doesn't exist on DockerHub, but I couldn't use that. This is because to ensure access is fully protected and that only the intended user can access a specific instance, you need to perform a handshake with JupyterHub to verify the user. This is the first of those conditions I mentioned, and what is represented by the /api/auth call from the notebook to JupyterHub in the JupyterHub architecture diagram.

A second condition was the ability to force the web application being created for each user, to be hosted at a specific base URL. This is where you see only web traffic for /user/[name] being directed to a specific notebook instance in the JupyterHub architecture diagram. Butterfly already supported this capability, or at least it tried to. The hosting at a sub URL in Butterfly was actually broken, but with a bit of help from Florian Mounier, the author of Butterfly, I got a pull request accepted and merged which addressed the problem.

Coming up next, the terminal image

To cover all the details is going to be too much for a single post, so I am going to break everything up into a series of posts. This will include details of how the terminal application image was constructed and what it provides, as well as how JupyterHub was configured.

I will also in the series of posts explain how I then built on top of the basic functionality for providing a set of users with their own interactive terminal for the workshop, to include the capability to embed that terminal in a single dashboard along with the workshop notes, as well as potentially other embedded components such as the ability to access the OpenShift web console at the same time.

As the series of posts progresses I will reveal how you might try out what I have created yourself, but if eager to find out more before then, you can ping me on Twitter to get any conversation started.