With Data Science Stack, Canonical launches into data science

Thursday 19th September 2024 10:26 AM

Operating systems are great, but on their own they only take a vendor so far. So Canonical is looking to move up the infrastructure stack, and one layer of it is now becoming critical: the data science stack. Canonical is entering this market with an open source approach and its Data Science Stack solution.

Canonical is best known for its Linux distributions for servers, workstations, the embedded world and the maker world. But the vendor is looking for new horizons, and data is an obvious one. This week Canonical announced a new solution called Data Science Stack (DSS). The platform aims to simplify the setup and management of environments dedicated to data science (starting, of course, with ML and AI needs).

Completely open source and free, Data Science Stack is a software stack designed primarily for Ubuntu, although it also runs on other Linux distributions, on Windows via WSL, and on macOS via Multipass.

According to the vendor, DSS installs with three simple commands, and an initial configuration takes 10 to 30 minutes depending on the user’s expertise.

The software stack combines and integrates well-known tools from the data science toolchain, such as Jupyter Notebook for model development, MLflow for experiment tracking, and essential ML frameworks like PyTorch and TensorFlow. Users can also customize the stack by adding libraries specific to their needs.

A notable feature of DSS is the integration of Intel’s extensions for PyTorch and TensorFlow, IPEX and ITEX, which optimize hardware performance with technologies such as Advanced Vector Extensions (AVX) and GPU acceleration. For enterprise workloads, this should translate into shorter data processing times and faster AI experiments.
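To make the hardware point concrete, here is a minimal sketch of how one might check whether an x86 Linux machine advertises the vector extensions that such optimized builds can take advantage of. It is not part of DSS and does not call IPEX or ITEX: it only reads the CPU flags the Linux kernel exposes in /proc/cpuinfo. The function names and the short list of flags are illustrative assumptions.

```python
from pathlib import Path

# Illustrative subset of Linux CPU flag names (x86 only); not an exhaustive list.
VECTOR_FLAGS = ("avx", "avx2", "avx512f")


def vector_flags(cpuinfo_text: str) -> list[str]:
    """Return the flags from VECTOR_FLAGS found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in VECTOR_FLAGS if flag in present]
    return []


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the kernel's CPU description, or return "" if it is unavailable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    found = vector_flags(read_cpuinfo())
    print("Vector extensions reported:", ", ".join(found) or "none detected")
```

On a machine without /proc/cpuinfo (macOS, for instance) or on a non-x86 CPU, the script simply reports that nothing was detected.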

Canonical is also committed to maintaining the security of all included software packages, proactively patching vulnerabilities to protect software and data. By taking over dependency and version management, the stack reduces the technical challenges data scientists often face when deploying AI models and relieves IT administrators of complex patch management.

With Kubernetes integration and native Ubuntu support, Data Science Stack is optimized for deployments in hybrid or multi-cloud environments.

The fact remains that Canonical is entering a complicated and competitive market. Its “Data Science Stack” can be seen as a lower layer of data science platforms, and therefore as a direct competitor to the DataOps solution of the French company Saagie (and its DataFactory). But the offer ultimately also competes with the cloud platforms of the hyperscalers (Microsoft Fabric, Google Vertex AI, Amazon SageMaker, OVHcloud Data Platform), with multicloud platforms such as Dataiku, Databricks, DataRobot, Cloudera Data Platform, SAS Viya, Alteryx and others, and with open source platforms such as Posit, KNIME and RapidMiner (now in the Altair fold).

Canonical highlights not only its open source orientation and the accessibility of its solution but also its flexibility for companies and developers wishing to fully customize their data science working environment.
