March 21, 2023

We are improving the quality of service

By CLORE.AI

From now on, only machines proven to be working will be visible on the marketplace.

What has changed?

  • A new monitoring system has been deployed as a container on all machines offered on clore.ai. This container monitors the GPU(s) in the machine and reports their status to the clore.ai platform. Machines that report issues are hidden from the marketplace until their owners fix them.
  • We also considered another scenario: Docker itself could have issues on a machine. Machines where the monitoring system isn't running are therefore hidden from the marketplace as well.
  • Right GPU type - GPUs in the machines are reported to the clore.ai platform in real time, so if a machine's GPU(s) are changed, the platform is updated automatically.
  • PCI-e lanes are reported and shown on the marketplace. This is especially useful when choosing the right machine for your workload, since different workloads may require different bandwidth between CPU and GPU. The platform shows both the PCI-e lane count and the PCI-e generation.
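
To illustrate how the lane count and generation relate to bandwidth, here is a minimal Python sketch that estimates the theoretical one-direction throughput of a PCI-e link. The function name and the table are our own illustration, not part of the clore.ai platform. The per-lane figures are the commonly quoted approximations after line-encoding overhead, and real-world throughput will be lower.

```python
# Approximate usable throughput per lane in GB/s (one direction),
# after encoding overhead: 8b/10b for Gen 1-2, 128b/130b for Gen 3-5.
# Illustrative table; not taken from the clore.ai platform.
PER_LANE_GB_PER_S = {
    1: 0.250,
    2: 0.500,
    3: 0.985,
    4: 1.969,
    5: 3.938,
}


def pcie_bandwidth_gb_per_s(generation: int, lanes: int) -> float:
    """Estimate theoretical one-direction bandwidth of a PCI-e link in GB/s."""
    if generation not in PER_LANE_GB_PER_S:
        raise ValueError(f"unsupported PCI-e generation: {generation}")
    if lanes <= 0:
        raise ValueError(f"lane count must be positive, got {lanes}")
    return PER_LANE_GB_PER_S[generation] * lanes


if __name__ == "__main__":
    # Compare a few link configurations you might see on the marketplace.
    for gen, lanes in [(3, 1), (3, 16), (4, 8), (4, 16)]:
        print(f"Gen {gen} x{lanes}: ~{pcie_bandwidth_gb_per_s(gen, lanes):.2f} GB/s")
```

Under these figures a Gen 4 x8 link offers roughly the same bandwidth as a Gen 3 x16 link, so it is worth looking at both values together rather than at the lane count alone.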

There are 5 server states

You can now see the server status in your server details

1. Offline

  • Your server is offline
  • Server is hidden from the marketplace

2. Starting up

  • Your server was turned on recently, but the monitoring system hasn't started yet, so it is uncertain whether your server works properly
  • Server is hidden from the marketplace

3. Docker failure

  • Your server has been online for over 10 minutes and the monitoring system isn't running; this is most likely an issue with the Docker installation on your machine
  • Server is hidden from the marketplace

4. CUDA failure

  • An issue with your GPU(s) was detected, for example a driver crash or a GPU failure
  • Server is hidden from the marketplace

5. Working properly

  • Your server is working properly
  • Server is available on the marketplace
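
The five states above can be read as a simple decision procedure. The Python sketch below shows one way to express it. It is only an illustration of the rules as described in this post, not the actual clore.ai implementation, and the function and parameter names (`classify_server`, `minutes_online`, `monitor_running`, `gpus_healthy`) are hypothetical.

```python
from enum import Enum


class ServerState(Enum):
    OFFLINE = "Offline"
    STARTING_UP = "Starting up"
    DOCKER_FAILURE = "Docker failure"
    CUDA_FAILURE = "CUDA failure"
    WORKING_PROPERLY = "Working properly"


# The post states that a server online for over 10 minutes without the
# monitoring system running is treated as a Docker failure.
STARTUP_GRACE_MINUTES = 10


def classify_server(online: bool, minutes_online: float,
                    monitor_running: bool, gpus_healthy: bool) -> ServerState:
    """Map observed conditions to one of the five server states (hypothetical helper)."""
    if not online:
        return ServerState.OFFLINE
    if not monitor_running:
        # Without the monitoring container, GPU health is unknown.
        if minutes_online > STARTUP_GRACE_MINUTES:
            return ServerState.DOCKER_FAILURE
        return ServerState.STARTING_UP
    if not gpus_healthy:
        return ServerState.CUDA_FAILURE
    return ServerState.WORKING_PROPERLY


def visible_on_marketplace(state: ServerState) -> bool:
    """Only servers that are working properly are listed."""
    return state is ServerState.WORKING_PROPERLY


if __name__ == "__main__":
    state = classify_server(online=True, minutes_online=15,
                            monitor_running=False, gpus_healthy=True)
    print(state.value, "| visible:", visible_on_marketplace(state))
```

In this sketch the monitoring check comes before the GPU check, because GPU status is only known once the monitoring container is reporting.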