Tanzu Challenge #1

Goals

Additional text from mail:

You are architecting the entire solution; consider how data (media content) would be provided to the Plex platform, how that data would be managed and protected alongside the application server itself (Plex app). Consider how any audit / logging could be provided, updates to the application, monitoring performance of the application and delivery of media streams, recovery/high availability of the platform. In the world of cloud native applications, applications should never be ‘down’.

Your solution can be documented in medium of choice but a blog post would be a preferred one so it can be then shared with the community.

The challenge is open until the 28th of January 2022.

Keith & Robbie

Research

To avoid reinventing the wheel, I always research existing open source solutions first.

I found this official Docker support and a very useful media server operator for k8s with interesting add-ons (additional sources, torrent support, ARM). Check out Tautulli as well; it is a nice add-on for monitoring the service at the user, category and media level. This is another nice write-up focusing on performance on bare metal.

For the PoC I decided to use a cloud storage service as a shared file system that can be deployed in various redundancy scenarios. Let’s discuss what “production grade” means in the Target Architecture section. My research also revealed an interesting clustering project which gives a lot of important hints about the limitations of the media server component itself.

Proof of Concept

I decided to give it a try and do a PoC on AWS, as I’m also preparing for the AWS Solutions Architect Professional certification. The installation instructions for TCE are well documented and clear; the additional complexity comes from the specifics of the given hyperscaler.

I started with the tanzu-bootstrap machine on which I installed all prerequisites for the TCE binary.

awsmgmt is the Tanzu-provisioned management cluster and plex-1 the workload cluster, both with bastion hosts (not really needed given the tanzu-bootstrap machine). This configuration costs more than $25 a day.
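For reference, a workload cluster configuration for TCE on AWS could look roughly like the sketch below. The variable names follow the Tanzu cluster configuration file format; the concrete values (region, availability zones, instance types, key pair) are assumptions and need to be adapted:

CLUSTER_NAME: plex-1
CLUSTER_PLAN: prod                       # prod plan spreads nodes across three AZs
INFRASTRUCTURE_PROVIDER: aws
AWS_REGION: eu-central-1                 # assumption: pick the region closest to the audience
AWS_NODE_AZ: eu-central-1a
AWS_NODE_AZ_1: eu-central-1b
AWS_NODE_AZ_2: eu-central-1c
CONTROL_PLANE_MACHINE_TYPE: t3.large
NODE_MACHINE_TYPE: t3.large
AWS_SSH_KEY_NAME: my-keypair             # assumption: an existing EC2 key pair
BASTION_HOST_ENABLED: "true"             # could be "false" when working from the tanzu-bootstrap machine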

The installation worked like a charm, and for the media library operator I just had to figure out how to use EFS as an NFS-backed persistent volume:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: mediaserver-pv
  labels:
    storage.k8s.io/name: nfs
spec:
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
    - ReadWriteMany
  capacity:
    storage: 10Gi
  storageClassName: ""
  persistentVolumeReclaimPolicy: Recycle
  volumeMode: Filesystem
  mountOptions:
    - nfsvers=4.1
    - rsize=1048576
    - wsize=1048576
    - hard
    - timeo=600
    - retrans=2
    - noresvport
  nfs:
    server: <your efs service ip>        # IP of the EFS mount target reachable from the nodes
    path: /
    readOnly: false

If you manage to open the network connection between the nodes and the EFS mount targets (NFS, port 2049), you get a managed service that keeps the shared data redundantly stored across availability zones with concurrent read/write access.
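To let the workload consume this volume, a matching PersistentVolumeClaim can bind to the static PV via its label. This is a minimal sketch; the claim name and namespace are assumptions:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mediaserver-pvc
  namespace: plex
spec:
  accessModes:
    - ReadWriteMany                      # EFS allows concurrent writers on several nodes
  storageClassName: ""                   # empty string: bind statically, skip dynamic provisioning
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      storage.k8s.io/name: nfs           # matches the label set on mediaserver-pv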

At this point I concluded my research and turned to the remaining open questions.

Target Architecture

Plex.tv is an interesting service that offers a central streaming directory which can be combined with user-managed content and shared with other users. So I see it as a gateway to integrated and subscribed content providers combined with a social platform for users. The freely provided Plex Media Server runs on a lot of platforms, even directly on NAS devices, and is optimized for a small footprint for personal use.

So if we try to pin down the scope of the term “production grade”, we have to define the business case: are we talking about a big streaming provider or a sharing platform for private users?

Business Models

Streaming Services for high bandwidth will rely on Content Delivery Networks to completely offload the streaming/transcoding work from the business components. But yes, if you want to create a CDN on k8s you could follow this lead. I think we should leave the delivery of large amounts of raw persistent data to specialized endpoints.

Member Services could be a more realistic use case for the given components; the question is how this model would differentiate itself from plex.tv if you simply offer uptime and storage in the cloud to run a personal instance per user (multi-tenancy by multi-instance).

Reliable Personal Service is a solution for a single user who wants to manage and share their content in the plex.tv community, maybe an artist or producer.

Technical Constraints

Plex Media Server works by scanning local drives (configurable) and by querying online content from plex.tv and other pluggable sources. It uses a SQLite database by default and does not document any support for running in a cluster. There is an API available, and this project has already started working on the necessary enhancements. A lot of the add-ons are open source, but I did not find the code for the server itself – any hints in the comments are appreciated.

Without code changes it won’t be easy to have a load balancer do proper session handling unless you enable sticky sessions. And you have to synchronize users/passwords between the instances via the API, if possible.
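As a sketch of the sticky-session workaround on plain Kubernetes (without a service mesh), client IP affinity can be enabled on the Service that exposes the instances; the names, namespace and timeout are assumptions:

apiVersion: v1
kind: Service
metadata:
  name: plex
  namespace: plex
spec:
  type: LoadBalancer
  selector:
    app: plex
  sessionAffinity: ClientIP              # keep a client on the same backend pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800              # 3 hours, roughly one streaming session
  ports:
    - name: plex
      port: 32400                        # default Plex Media Server port
      targetPort: 32400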

The file storage is no problem at all; with cloud storage models you can meet a wide range of requirements. It’s simply shared storage.
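Putting this together, the media server itself could be deployed as a StatefulSet that mounts the shared EFS-backed claim. This is only a sketch under the assumptions made above (the official plexinc/pms-docker image, the mediaserver-pvc claim and the plex namespace); configuration and transcoding directories are omitted:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: plex
  namespace: plex
spec:
  serviceName: plex
  replicas: 1
  updateStrategy:
    type: RollingUpdate                  # enables the rolling upgrades discussed below
  selector:
    matchLabels:
      app: plex
  template:
    metadata:
      labels:
        app: plex
    spec:
      containers:
        - name: plex
          image: plexinc/pms-docker:latest   # pin a concrete version in practice
          ports:
            - containerPort: 32400           # Plex web/API endpoint
          volumeMounts:
            - name: media
              mountPath: /data               # library path scanned by the server
      volumes:
        - name: media
          persistentVolumeClaim:
            claimName: mediaserver-pvc       # the EFS-backed shared storage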

Architectural Decision Record

Given the most probable use case and the technical limitations of the selected components, I will work on the variant for a Reliable Personal Service for an artist or producer who wants to spend the money to get a “production grade”, powerful platform where they can store, manage and share their content. For personal use in a home network I recommend using the media server as is – maybe the torrent service of the project I used could be a driver to run a cluster with a minimal footprint.

You are architecting the entire solution; consider how data (media content) would be provided to the Plex platform, how that data would be managed and protected alongside the application server itself (Plex app). Consider how any audit / logging could be provided, updates to the application, monitoring performance of the application and delivery of media streams, recovery/high availability of the platform. In the world of cloud native applications, applications should never be ‘down’.

  • Data is provided by the EFS service. It can be managed by directly accessing the file share or via the Plex client functionality.
  • Audit and logging could be provided by standard tooling like Prometheus, Grafana, Alertmanager, ELK…
  • Monitoring the performance of the application and the delivery of media streams could be done by Tautulli or by a customized Prometheus exporter that converts the periodically called API functions into the Prometheus metrics format (see the scrape configuration sketch after this list).
  • Updating should simply be possible with StatefulSets and rolling updates (see the StatefulSet sketch above). In addition, the high availability of the platform ensures a smooth upgrade as well. If you want to introduce application changes, a blue/green deployment can be enabled with service mesh solutions like Istio. I doubt that this would be needed if the full three-zone availability of TCE is implemented.
  • High availability is achieved by using three AZs in the AWS region where the artist’s content is most in demand. The data resides on redundant EFS storage across all three AZs. EFS is also used for backup and restore with Velero (see the schedule sketch after this list).
  • How many requirements could be met with the VMware Tanzu Mission Control Starter?
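As referenced in the monitoring point above, a custom exporter would simply be scraped by Prometheus. A minimal sketch of the scrape configuration, assuming a hypothetical plex-exporter service exposing metrics on port 9090:

scrape_configs:
  - job_name: plex-media-streams
    scrape_interval: 60s                 # the exporter polls the Plex API periodically
    static_configs:
      - targets:
          - plex-exporter.plex.svc.cluster.local:9090   # hypothetical exporter service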
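For the backup and restore point, a Velero schedule covering the plex namespace could look like this sketch; the schedule and retention values are assumptions:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: plex-daily
  namespace: velero
spec:
  schedule: "0 3 * * *"                  # daily at 03:00
  template:
    includedNamespaces:
      - plex
    ttl: 720h0m0s                        # keep backups for 30 days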

With these decisions in mind, I drew the setup on https://app.cloudcraft.co/


Click on the image to get redirected to the cloudcraft source.

This diagram depicts the use of AWS’s CloudFront CDN to completely free k8s from serving static content. This is only reasonable if you need to serve at high bandwidth. For other use cases I fall back on the EFS storage hosting the files.
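For the CDN variant, the CloudFront distribution could be described declaratively, for example with CloudFormation. The origin domain and cache behavior below are assumptions, just to illustrate the offloading idea:

Resources:
  MediaCdn:
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        Enabled: true
        Comment: Offloads static media delivery from the cluster
        Origins:
          - Id: media-origin
            DomainName: media.example.com        # assumption: endpoint serving the EFS content
            CustomOriginConfig:
              OriginProtocolPolicy: https-only
        DefaultCacheBehavior:
          TargetOriginId: media-origin
          ViewerProtocolPolicy: redirect-to-https
          ForwardedValues:
            QueryString: false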

For the implementation design, I recommend using the packages from the Tanzu open source components for ingress, certificate management, DNS management, logging and monitoring.
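Under the hood, the tanzu package install command creates PackageInstall resources; an ingress example for the Contour package could look like the following sketch (namespace, service account and version constraint are assumptions):

apiVersion: packaging.carvel.dev/v1alpha1
kind: PackageInstall
metadata:
  name: contour
  namespace: my-packages                 # assumption: namespace prepared beforehand
spec:
  serviceAccountName: contour-sa         # assumption: account with the required RBAC
  packageRef:
    refName: contour.community.tanzu.vmware.com
    versionSelection:
      constraints: ">=1.19.1"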

With Tanzu Community Edition, this architecture is applicable to other deployment targets as well; you can scale it from your homelab up to any hyperscaler.
It’s up to you whether you want to run a minimal cluster or a full-scale setup. You just have to adapt the storage class to your needs.

I am looking forward to learning more about VMware’s engagement in the open source space; I really appreciate it!

Thanks and Credits

Thanks to Keith & Robbie for creating an interesting challenge!

This challenge was conducted within the vExpert Application Modernization Subprogram of VMware. If you’re interested in applying for vExpert and want to know what the advantages are, check out vexpert.vmware.com.

