This post is co-written with Marc Neumann, Amor Steinberg and Marinus Krommenhoek from BMW Group.
The BMW Group, headquartered in Munich, Germany, is driven by 149,000 employees worldwide and manufactures in over 30 production and assembly facilities across 15 countries. Today, the BMW Group is the world’s leading manufacturer of premium automobiles and motorcycles, and provider of premium financial and mobility services. The BMW Group sets trends in production technology and sustainability as an innovation leader with an intelligent material mix, a technological shift towards digitalization, and resource-efficient production.
In an increasingly digital and rapidly changing world, BMW Group’s business and product development strategies rely heavily on data-driven decision-making. With that, the need for data scientists and machine learning (ML) engineers has grown significantly. These skilled professionals are tasked with building and deploying models that improve the quality and efficiency of BMW’s business processes and enable informed leadership decisions.
Data scientists and ML engineers require capable tooling and sufficient compute for their work. Therefore, BMW established a centralized ML/deep learning infrastructure on premises several years ago and continuously upgraded it. To pave the way for the growth of AI, BMW Group needed to make a leap regarding scalability and elasticity while reducing operational overhead, software licensing, and hardware management.
In this post, we discuss how BMW Group, in collaboration with AWS Professional Services, built its Jupyter Managed (JuMa) service to address these challenges. JuMa is a service of BMW Group’s AI platform for its data analysts, ML engineers, and data scientists that provides a user-friendly workspace with an integrated development environment (IDE). It is powered by Amazon SageMaker Studio and provides JupyterLab for Python and Posit Workbench for R. This offering enables BMW ML engineers to perform code-centric data analytics and ML, increases developer productivity by providing self-service capability and infrastructure automation, and tightly integrates with BMW’s centralized IT tooling landscape.
JuMa is now available to all data scientists, ML engineers, and data analysts at BMW Group. The service streamlines ML development and production workflows (MLOps) across BMW by providing a cost-efficient and scalable development environment that facilitates seamless collaboration between data science and engineering teams worldwide. This results in faster experimentation and shorter idea validation cycles. Moreover, the JuMa infrastructure, which is based on AWS serverless and managed services, helps reduce operational overhead for DevOps teams and allows them to focus on enabling use cases and accelerating AI innovation at BMW Group.
Challenges of growing an on-premises AI platform
Prior to introducing the JuMa service, BMW teams worldwide were using two on-premises platforms that provided teams with JupyterHub and RStudio environments. These platforms were too limited regarding CPU, GPU, and memory to allow the scalability of AI at BMW Group. Scaling these platforms by managing more on-premises hardware, more software licenses, and support fees would have required significant up-front investments and high maintenance effort. In addition, only limited self-service capabilities were available, requiring high operational effort from the DevOps teams. More importantly, the use of these platforms was misaligned with BMW Group’s IT cloud-first strategy. For example, teams using these platforms lacked an easy path for migrating their AI/ML prototypes to an industrialized solution running on AWS. In contrast, the data science and analytics teams already using AWS directly for experimentation also had to build and operate their own AWS infrastructure while ensuring compliance with BMW Group’s internal policies, local laws, and regulations. This included a range of configuration and governance activities, from ordering AWS accounts, limiting internet access, and using allow-listed packages to keeping their Docker images up to date.
Overview of solution
JuMa is a fully managed, multi-tenant, security-hardened AI platform service built on AWS with SageMaker Studio at the core. By relying on AWS serverless and managed services as the main building blocks of the infrastructure, the JuMa DevOps team doesn’t need to worry about patching servers, upgrading storage, or managing any other infrastructure components. The service handles all these processes automatically, providing a powerful technical platform that is typically up to date and ready to use.
JuMa users can effortlessly order a workspace through a self-service portal to create a secure and isolated development and experimentation environment for their teams. After a JuMa workspace is provisioned, users can launch JupyterLab or Posit Workbench environments in SageMaker Studio with just a few clicks and start development immediately, using the tools and frameworks they are most familiar with. JuMa is tightly integrated with a range of BMW Central IT services, including identity and access management, roles and rights management, BMW Cloud Data Hub (BMW’s data lake on AWS), and on-premises databases. The latter helps AI/ML teams seamlessly access required data, provided they are authorized to do so, without needing to build data pipelines. Furthermore, the notebooks can be integrated into the corporate Git repositories to collaborate using version control.
The solution abstracts away all technical complexities associated with AWS account management, configuration, and customization for AI/ML teams, allowing them to fully focus on AI innovation. The platform ensures that the workspace configuration meets BMW’s security and compliance requirements out of the box.
The following diagram describes the high-level context view of the architecture.
User journey
BMW AI/ML team members can order their JuMa workspace using BMW’s standard catalog service. After approval by the line manager, the platform provisions the ordered JuMa workspace in a fully automated manner. The workspace provisioning workflow consists of the following steps (as numbered in the architecture diagram).
A data scientist team orders a new JuMa workspace in BMW’s Catalog. JuMa automatically provisions a new AWS account for the workspace. This ensures full isolation between the workspaces, following the federated model account structure mentioned in SageMaker Studio Administration Best Practices.
JuMa configures a workspace (which is a SageMaker domain) that only allows predefined Amazon SageMaker features required for experimentation and development, specific custom kernels, and lifecycle configurations. It also sets up the required subnets and security groups that ensure the notebooks run in a secure environment.
After the workspaces are provisioned, the authorized users log in to the JuMa portal and access the SageMaker Studio IDE within their workspace using a SageMaker pre-signed URL. Users can choose between opening a SageMaker Studio private space or a shared space. Shared spaces encourage collaboration between different members of a team who can work in parallel on the same notebooks, while private spaces provide a development environment for solitary workloads.
Using the BMW data portal, users can request access to on-premises databases or data stored in BMW’s Cloud Data Hub, making it available in their workspace for development and experimentation, from data preparation and analysis to model training and validation.
After an AI model is developed and validated in JuMa, AI teams can use the MLOps service of the BMW AI platform to deploy it to production quickly and effortlessly. This service provides users with a production-grade ML infrastructure and pipelines on AWS using SageMaker, which can be set up in minutes with just a few clicks. Users simply need to host their model on the provisioned infrastructure and customize the pipeline to meet their specific use case needs. In this way, the AI platform covers the entire AI lifecycle at BMW Group.
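The pre-signed URL step of the user journey can be illustrated with a minimal sketch. It assumes the portal calls the SageMaker `CreatePresignedDomainUrl` API through boto3; the helper names, the 60-second lifetime, and the way the client is passed in are illustrative assumptions, not details taken from JuMa.

```python
from typing import Any, Optional


def build_presigned_url_request(
    domain_id: str,
    user_profile_name: str,
    space_name: Optional[str] = None,
    url_ttl_seconds: int = 60,
) -> dict:
    """Assemble keyword arguments for SageMaker's CreatePresignedDomainUrl.

    Hypothetical helper: the parameter names on the left are our own,
    the dictionary keys are the ones the API expects.
    """
    request = {
        "DomainId": domain_id,
        "UserProfileName": user_profile_name,
        # Lifetime of the URL itself; a short value limits the window in
        # which a leaked link could be redeemed.
        "ExpiresInSeconds": url_ttl_seconds,
    }
    if space_name is not None:
        # Only set for shared spaces; omitted for the default landing page.
        request["SpaceName"] = space_name
    return request


def open_studio(sagemaker_client: Any, **kwargs: Any) -> str:
    """Request a pre-signed URL and return it to the caller (the portal)."""
    response = sagemaker_client.create_presigned_domain_url(
        **build_presigned_url_request(**kwargs)
    )
    return response["AuthorizedUrl"]
```

With real credentials, `sagemaker_client` would be `boto3.client("sagemaker")`. Keeping the request builder separate from the API call makes the parameters easy to check without touching AWS.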
JuMa features
Following best practices for architecting on AWS, the JuMa service was designed and implemented according to the AWS Well-Architected Framework. Architectural decisions for each Well-Architected pillar are described in detail in the following sections.
Security and compliance
To ensure full isolation between the tenants, each workspace receives its own AWS account, where the authorized users can jointly collaborate on analytics tasks as well as on developing and experimenting with AI/ML models. The JuMa portal itself enforces isolation at runtime using policy-based isolation with AWS Identity and Access Management (IAM) and the JuMa user’s context. For more information about this strategy, refer to Run-time, policy-based isolation with IAM.
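One common way to realize run-time, policy-based isolation is to generate an IAM session policy from the caller's context and pass it to `sts:AssumeRole`, so the resulting credentials can only reach that tenant's resources. The sketch below assumes this approach; the chosen actions, the function names, and the idea that the portal scopes access to one SageMaker domain are illustrative assumptions rather than JuMa's actual implementation.

```python
import json
from typing import Any


def tenant_session_policy(region: str, account_id: str, domain_id: str) -> dict:
    """Build a session policy limited to a single SageMaker domain.

    Hypothetical example: the two actions are just enough to look up the
    domain and create a pre-signed URL for one of its user profiles.
    """
    prefix = f"arn:aws:sagemaker:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sagemaker:DescribeDomain",
                    "sagemaker:CreatePresignedDomainUrl",
                ],
                "Resource": [
                    f"{prefix}:domain/{domain_id}",
                    f"{prefix}:user-profile/{domain_id}/*",
                ],
            }
        ],
    }


def assume_tenant_scoped_role(
    sts_client: Any, role_arn: str, session_name: str, policy: dict
) -> dict:
    """Assume a role with the session policy attached.

    The effective permissions are the intersection of the role's own
    policies and the session policy, so the session can never exceed
    the tenant scope.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        Policy=json.dumps(policy),
    )
    return response["Credentials"]
```

Because the policy is computed per request from the user's context, a bug elsewhere in the portal cannot accidentally reach another tenant's domain with these credentials.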
Data scientists can only access their domain through the BMW network via pre-signed URLs generated by the portal. Direct internet access is disabled within their domain. Their SageMaker domain privileges are built using Amazon SageMaker Role Manager personas to ensure least privilege access to the AWS services needed for development, such as SageMaker, Amazon Athena, Amazon Simple Storage Service (Amazon S3), and AWS Glue. This role implements ML guardrails (such as those described in Governance and control), including enforcing that ML training takes place either in Amazon Virtual Private Cloud (Amazon VPC) or without internet access, and allowing only the use of JuMa’s custom, vetted, and up-to-date SageMaker images.
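As an illustration of the training guardrail, the following sketch builds an IAM policy document that denies `CreateTrainingJob` unless the job either runs in a VPC or has network isolation enabled. It uses the SageMaker condition keys `sagemaker:VpcSubnets` and `sagemaker:NetworkIsolation`; the statement ID and function name are made up for this example, and the policy JuMa actually attaches may look different.

```python
import json


def training_network_guardrail() -> dict:
    """Deny training jobs that have neither a VPC config nor network isolation.

    Illustrative sketch only. All condition operators inside one statement
    must match for the Deny to apply, so a job is blocked only when BOTH
    of the following hold:
      * no VPC subnets were supplied (the key is null), and
      * network isolation is false or was not specified at all.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyTrainingWithoutVpcOrIsolation",  # hypothetical Sid
                "Effect": "Deny",
                "Action": "sagemaker:CreateTrainingJob",
                "Resource": "*",
                "Condition": {
                    "Null": {"sagemaker:VpcSubnets": "true"},
                    "BoolIfExists": {"sagemaker:NetworkIsolation": "false"},
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(training_network_guardrail(), indent=2))
```

An explicit Deny is used rather than a narrower Allow because a Deny cannot be overridden by any other policy attached to the same role.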
Because JuMa is designed for development, experimentation, and ad-hoc analysis, it implements retention policies to remove data after 30 days. To access data whenever needed and store it for the long term, JuMa seamlessly integrates with the BMW Cloud Data Hub and BMW on-premises databases.
Finally, JuMa supports multiple Regions to comply with specific local legal situations that, for example, require data to be processed locally to enable BMW’s data sovereignty.
Operational excellence
Both the JuMa platform backend and workspaces are implemented with AWS serverless and managed services. Using these services helps minimize the effort the BMW platform team spends maintaining and operating the end-to-end solution, striving to be a no-ops service. Both the workspace and portal are monitored using Amazon CloudWatch logs, metrics, and alarms to check key performance indicators (KPIs) and proactively notify the platform team of any issues. Additionally, the AWS X-Ray distributed tracing system is used to trace requests across multiple components and annotate CloudWatch logs with workspace-relevant context.
All changes to the JuMa infrastructure are managed and implemented through automation using infrastructure as code (IaC). This helps reduce manual efforts and human errors, increase consistency, and ensure reproducible and version-controlled changes across both the JuMa platform backend and workspaces. Specifically, all workspaces are provisioned and updated through an onboarding process built on top of AWS Step Functions, AWS CodeBuild, and Terraform. Therefore, no manual configuration is required to onboard new workspaces to the JuMa platform.
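To make the combination of Step Functions, CodeBuild, and Terraform more concrete, here is a minimal sketch of a state machine definition in Amazon States Language, expressed as a Python dictionary. It has a single task that starts a CodeBuild project (which would run Terraform) and waits for it to finish. The state name, the `WORKSPACE_ID` variable, the input field `workspace_id`, and the retry settings are assumptions for illustration; the real onboarding workflow is not described in this post and likely has more steps.

```python
import json


def onboarding_state_machine(codebuild_project: str) -> dict:
    """Return a one-step workflow that runs a CodeBuild project synchronously."""
    return {
        "Comment": "Hypothetical workspace onboarding flow",
        "StartAt": "ApplyTerraform",
        "States": {
            "ApplyTerraform": {
                "Type": "Task",
                # The .sync suffix makes Step Functions wait for the build.
                "Resource": "arn:aws:states:::codebuild:startBuild.sync",
                "Parameters": {
                    "ProjectName": codebuild_project,
                    "EnvironmentVariablesOverride": [
                        {
                            "Name": "WORKSPACE_ID",
                            "Type": "PLAINTEXT",
                            # Taken from the execution input at run time.
                            "Value.$": "$.workspace_id",
                        }
                    ],
                },
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 30,
                        "MaxAttempts": 2,
                        "BackoffRate": 2.0,
                    }
                ],
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    # "juma-workspace-terraform" is a placeholder project name.
    print(json.dumps(onboarding_state_machine("juma-workspace-terraform"), indent=2))
```

Passing the workspace identifier as an environment variable keeps one CodeBuild project reusable for every workspace, which fits the goal of onboarding without manual configuration.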
Cost optimization
By using AWS serverless services, JuMa provides on-demand scalability, pre-approved instance sizes, and a pay-as-you-go model for the resources used during development and experimentation activities, according to the AI/ML teams’ needs. To further optimize costs, the JuMa platform monitors and identifies idle resources within SageMaker Studio and shuts them down automatically to prevent expenses for non-utilized resources.
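The idle-resource check can be reduced to a small selection rule: given when each Studio app was last active, pick those that have been quiet for longer than a threshold. The sketch below shows only that rule. The `StudioApp` record, the one-hour default, and how last-activity timestamps are collected are assumptions, because the post does not describe how JuMa detects idleness.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class StudioApp:
    """Hypothetical record describing one running SageMaker Studio app."""

    name: str
    last_activity: datetime


def find_idle_apps(
    apps: Iterable[StudioApp],
    now: datetime,
    idle_after: timedelta = timedelta(hours=1),
) -> List[str]:
    """Return the names of apps with no activity for at least `idle_after`."""
    return [app.name for app in apps if now - app.last_activity >= idle_after]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    apps = [
        StudioApp("notebook-a", now - timedelta(minutes=10)),
        StudioApp("notebook-b", now - timedelta(hours=3)),
    ]
    print(find_idle_apps(apps, now))
```

A scheduled job could pass the returned names to the SageMaker `DeleteApp` API to stop the underlying instances. Taking `now` as an argument rather than reading the clock inside the function keeps the rule deterministic and easy to test.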
Sustainability
JuMa replaces BMW’s two on-premises platforms for analytics and deep learning workloads that consume a considerable amount of electricity and produce CO2 emissions even when not in use. By migrating AI/ML workloads from on premises to AWS, BMW will slash its environmental impact by decommissioning the on-premises platforms.
Furthermore, the mechanism for automatically shutting down idle resources, the data retention policies, and the workspace usage reports sent to workspace owners, all implemented in JuMa, help further minimize the environmental footprint of running AI/ML workloads on AWS.
Performance efficiency
By using SageMaker Studio, BMW teams benefit from easy adoption of the latest SageMaker features that can help accelerate their experimentation. For example, they can use Amazon SageMaker JumpStart capabilities to work with the latest pre-trained, open source models. Additionally, it helps reduce the effort for AI/ML teams moving from experimentation to solution industrialization, because the development environment provides the same AWS core services, restricted to development capabilities.
Reliability
SageMaker Studio domains are deployed in VPC-only mode to manage internet access and only allow access to intended AWS services. The network is deployed in two Availability Zones to protect against a single point of failure, achieving greater resiliency and availability of the platform for its users.
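In API terms, VPC-only mode corresponds to setting `AppNetworkAccessType` to `VpcOnly` when the domain is created. The sketch below assembles such a request for the SageMaker `CreateDomain` API. The function name, the `IAM` auth mode, and the rule of requiring at least two subnets are assumptions made for illustration; JuMa provisions its domains through Terraform, so this is not the platform's actual code.

```python
from typing import Sequence


def vpc_only_domain_request(
    domain_name: str,
    vpc_id: str,
    subnet_ids: Sequence[str],
    execution_role_arn: str,
) -> dict:
    """Assemble keyword arguments for SageMaker's CreateDomain in VPC-only mode."""
    if len(set(subnet_ids)) < 2:
        # Two distinct subnets are required here so that they CAN be placed
        # in two Availability Zones. This check does not verify the zones
        # themselves; the caller has to pick subnets in different zones.
        raise ValueError("provide at least two distinct subnets")
    return {
        "DomainName": domain_name,
        "AuthMode": "IAM",  # assumption; "SSO" is the other supported value
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        # Route all Studio traffic through the VPC instead of the
        # SageMaker-managed internet path.
        "AppNetworkAccessType": "VpcOnly",
        "DefaultUserSettings": {"ExecutionRole": execution_role_arn},
    }
```

With boto3, the result would be passed as `boto3.client("sagemaker").create_domain(**request)`. In VPC-only mode the subnets also need VPC endpoints (or another controlled egress path) for every AWS service the notebooks are meant to reach.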
Changes to JuMa workspaces are automatically deployed and tested in development and integration environments, using IaC and CI/CD pipelines, before upgrading customer environments.
Finally, data stored in Amazon Elastic File System (Amazon EFS) for SageMaker Studio domains is kept after volumes are deleted for backup purposes.
Conclusion
In this post, we described how BMW Group, in collaboration with AWS ProServe, developed a fully managed AI platform service on AWS using SageMaker Studio and other AWS serverless and managed services.
With JuMa, BMW’s AI/ML teams are empowered to unlock new business value by accelerating experimentation as well as time-to-market for disruptive AI solutions. Furthermore, by migrating from its on-premises platform, BMW can reduce the overall operational efforts and costs while also increasing sustainability and the overall security posture.
To learn more about running your AI/ML experimentation and development workloads on AWS, visit Amazon SageMaker Studio.
About the Authors
Marc Neumann is the head of the central AI Platform at BMW Group. He is responsible for developing and implementing strategies to use AI technology for business value creation across the BMW Group. His primary goal is to ensure that the use of AI is sustainable and scalable, meaning it can be consistently applied across the organization to drive long-term growth and innovation. Through his leadership, Neumann aims to position the BMW Group as a leader in AI-driven innovation and value creation in the automotive industry and beyond.
Amor Steinberg is a Machine Learning Engineer at BMW Group and the service lead of Jupyter Managed, a new service that aims to provide a code-centric analytics and machine learning workbench for engineers and data scientists at the BMW Group. His past experience as a DevOps Engineer at financial institutions enabled him to gain a unique understanding of the challenges that banks in the European Union face and to keep the balance between striving for technological innovation, complying with laws and regulations, and maximizing security for customers.
Marinus Krommenhoek is a Senior Cloud Solution Architect and a Software Developer at BMW Group. He is enthusiastic about modernizing the IT landscape with state-of-the-art services that add high value and are easy to maintain and operate. Marinus is a big advocate of microservices, serverless architectures, and agile working. He has a record of working with distributed teams across the globe within large enterprises.
Nicolas Jacob Baer is a Principal Cloud Application Architect at AWS ProServe with a strong focus on data engineering and machine learning, based in Switzerland. He works closely with enterprise customers to design data platforms and build advanced analytics and ML use cases.
Joaquin Rinaudo is a Principal Security Architect at AWS ProServe. He is passionate about building solutions that help developers improve their software quality. Prior to AWS, he worked across multiple domains in the security industry, from mobile security to cloud and compliance-related topics. In his free time, Joaquin enjoys spending time with family and reading science-fiction novels.
Shukhrat Khodjaev is a Senior Global Engagement Manager at AWS ProServe. He specializes in delivering impactful big data and AI/ML solutions that enable AWS customers to maximize their business value through data utilization.