“Machine Learning Is the Quantum Mechanics of Software Engineering”

12 October, 2021


Stefan Nica is a Software Engineer/Architect at SUSE who has over 15 years of experience in software development, architecture and design. He has expertise across a wide range of domains: AI/ML, MLOps, DevOps, cloud-native applications, cloud platforms, communication protocols, networking and virtualization technologies. Stefan is a promoter of open source, virtualization, and DevOps culture and practices. He loves good challenges and thrives in an innovation-friendly environment.

How and why did you decide to dive into the tech industry? What and/or who has inspired you to do so?

It would be tough to point out one exact thing and say that’s exactly how it started. But if I had to do so, I would probably say it began with simple curiosity. And the way that usually goes is:

Hey, you know, what is this?

This is a computer.

Oh, how interesting!

And then… WOW, I can programme it to do what I want it to do, and if it makes a mistake, it’s not someone else’s fault, it’s because I screwed something up. That combination of control and self-accountability, but also freedom, and all those things that you can do with a computer can quickly get really addictive, so it’s really an addiction type of thing. And yes, I guess that’s how it started.

Then I started doing this and it was all about getting this high and scratching this itch. It was more of a selfish endeavour in that sense. When I really started loving what I do with technology was much later, I guess maybe five years ago. Until then I was only doing it for myself, but five years ago something happened. I ran across this interesting thing called Software-Defined Networking, which was at the time a really disruptive idea. Then I really understood the power these ideas can have to change everything, to challenge the status quo of the industry, I guess. And those pioneers at the time were really trying to do exactly that, because networking is so very complicated and involves so many protocols. If you look at a diagram of protocols and want to print it out and put it on your wall, it will probably take up an entire wall. There are so many of them, hundreds of thousands, and these guys were all about making things accessible for everyone else. Simplifying it so that anyone can get involved, innovate, contribute and attend. And I guess that’s when I really started gaining some perspective on things, and I’ve been doing that ever since.

We’re not doing technology just for the sake of doing technology. Actually, I was kind of doing that before. Then I realised that technology is just a tool, something that everyone can use and needs to access.

What have you been working on for the past few years?

I think the last few years have been the most interesting years in my career, but then I could have said the same for every year, whenever you asked me that.

So before I get into details I think I need to come clean with something. In my bio and on my LinkedIn account I say that I work for SUSE. That is a lie. Let me explain. You know they say “Choose a job and do something that you love so you never have to work a day in your life.” And that’s the case with me. That’s how it feels working with an open-source company like SUSE. They say that open source is in our genes there and that’s true.

So, I started working maybe five, or more than five, years ago as an OpenStack cloud software developer. OpenStack is a complex cloud platform. It was also a disruptive cloud technology for a while, which is why I started doing that. Then, a few years later, I made the transition to something equally interesting: containerization, Kubernetes and containers. I was also a cloud-native application developer for a while.

But I guess the really interesting part of my story is when I started working with artificial intelligence and machine learning. In late 2020, I got involved with this amazing group of people, and we created this open-source project called FuseML. This is partially the reason why I gave this talk at OpenFest. And we recently launched it. We’re trying to improve things for everyone that wants to do something with machine learning.

You have a lot of experience in the tech world, starting from virtualization to SDN, to cloud computing, to containerization and cloud-native applications. Now you are exploring the AI/ML realm. Can you share what are the main challenges that stand in front of the AI/ML world?

Sure, so it depends on who you ask. For companies, I would say the challenge is to really reap the benefits of machine learning.

So far, only the tech giants and the big companies have been able to successfully do that, and that’s why you hear all these interesting stories about artificial intelligence and machine learning coming from companies like Google, Netflix, Uber and so on. The real challenge here is to democratise that type of success and to give everyone a fair chance. I think that’s what this new discipline of MLOps, machine learning operations, is trying to do, together with a set of best practices to help with that. So, it aims to facilitate everyone’s access to building machine learning systems that they can put in production and get revenue out of.

For us, end-users (well, not for me because I’m a developer, but I guess I’m also an end-user of machine learning)… So, for us end-users, and maybe even for society at large, the challenge is huge because it’s a world we need to understand. I don’t think there are a lot of people that really understand what machine learning is all about and what artificial intelligence really does. So I think that would be the next challenge for us: to inform ourselves and, for those who know about it, to popularise it and democratise it in a way that makes it accessible to anyone. In that way, we’ll make it easy for everyone to understand what the implications of machine learning are and how they affect them personally, because they will affect all of us as well.

And I think that is one way to do it. So, again, I keep coming back to the democratisation of machine learning. I think that is one of the best ways to tackle this problem and this challenge. Also the popularisation. I hope there’ll come a time when we need to standardise what we’re doing with machine learning in the industry.

What specialized interpretation of the traditional DevOps culture and methodologies is required to build and maintain a successful production Machine Learning system?

It’s not so much my opinion as it is something that is still developing. It’s something called MLOps, machine learning operations. So people have tried to apply standard, conventional DevOps to build machine learning systems and machine learning applications. And it didn’t really give the results they hoped for. I’m talking mainly about companies that are trying to put production machine learning systems out there, and the reason for that is the fact that machine learning systems are unique. You need to change the way you think about machine learning systems because they are unlike anything we’re doing within conventional software engineering.

So, applying DevOps to machine learning requires a more targeted approach, and that’s what MLOps is. It basically looks at machine learning and recognises that machine learning is weird. So, what I’m thinking is that machine learning is the quantum mechanics of software engineering. Quantum mechanics is really weird when compared to classical or Newtonian mechanics. Like quantum systems, machine learning models can behave in all sorts of unexpected ways, and they are really opaque. You cannot directly measure what they are doing, as you can in classical mechanics.

So you don’t really know what’s happening inside. You need to take a lot of measurements, you need to do a lot of experimentation with machine learning models to understand how they behave, and to change them to behave the way that you want them to. So it’s really, really unlike anything that we do with conventional engineering.
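As a rough illustration of that measure-and-experiment loop (an editorial sketch, not something described in the interview or taken from FuseML), the Python below trains a deliberately tiny model several times with different settings and records each run. The function names, the synthetic data and the learning rates are all assumptions made for the example.

```python
import random


def train(learning_rate, seed, steps=200):
    """Fit y = w * x by gradient descent; return the learned weight and final loss."""
    rng = random.Random(seed)
    # Synthetic data (illustrative only): the true weight is 3.0, plus a little noise.
    data = [(i / 10, 3.0 * (i / 10) + rng.gauss(0, 0.1)) for i in range(1, 21)]
    w = rng.uniform(-1, 1)  # random starting point, as in real training
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
    return w, loss


def run_experiments(learning_rates, seeds):
    """Train once per (learning rate, seed) pair and record what was observed."""
    runs = []
    for lr in learning_rates:
        for seed in seeds:
            w, loss = train(lr, seed)
            runs.append({"learning_rate": lr, "seed": seed, "weight": w, "loss": loss})
    return runs


runs = run_experiments([0.001, 0.01, 0.1], seeds=[0, 1, 2])
best = min(runs, key=lambda run: run["loss"])
print(f"{len(runs)} runs recorded; lowest loss at learning_rate={best['learning_rate']}")
```

The point of the sketch is that the settings are judged by comparing recorded runs rather than by inspecting the model's internals; experiment tracking tools do the same bookkeeping at a much larger scale.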

What AI and machine learning tools are you familiar with?

My experience with machine learning tools and additional registers comes mostly from what I do with the FuseML project.

And, gosh, I wish I had more than 24 hours in a day because there’s like an entire ecosystem of things, tools and ideas that is constantly expanding and evolving. More research papers are being published each and every day in the machine learning field. More tools are being built to capture those ideas, and I can only scratch the surface with what I’m doing.

But my experience, as an engineer working on something like MLOps, is maybe closer to the tools that you need to successfully put machine learning in production. So I can give some examples there. I have some experience with the tools that you use to track what you do with machine learning models, and to version-control data and all the artefacts that come out of machine learning development. Things like MLflow, which is a very popular tool for doing that. And not just that: also really popular in data science is DVC, which stands for Data Version Control and is another tool that I briefly worked with.

Then there are the tools that serve a model as a component that can be used to generate predictions; these are called prediction serving platforms. And today I’m also dealing with my voice. And, I mean, I can go on and on.

You need pipeline orchestration engines to implement DevOps-like workflows that are specific to machine learning, things like Tekton, Argo and Kubeflow Pipelines. And yeah, I think that there are maybe hundreds of thousands of tools out there. It’s very hard to keep track of all of them.

You have mentioned that you have dealt with a lot of machine learning problems. What kinds of machine learning problems have you tackled, and how did you tackle them?

Yeah, so those again are related to what I’m doing. What I’m working on is an orchestration tool for MLOps, and machine learning involves a lot of tools. So sometimes you need to glue several of those tools together.

Because every tool serves a localised purpose in what you’re doing to create production-grade machine learning systems, one challenge that we’ve been trying to tackle with FuseML is how to integrate all those tools together: how to take open-source tools from various open-source projects, unrelated to one another, and create a complete end-to-end workflow on top of them. How to integrate them in a way that, first of all, they work together, because sometimes they don’t. And the way we did that is by featuring all these extensibility mechanisms in the FuseML project. You can integrate all these tools and use them as components in your automated workflow with minimal friction.

I guess the way that an end-user, someone that uses FuseML, would see that is through abstractions. Abstractions are a very nice way of bringing this thought up again: extracting simplicity out of complexity and giving everybody a chance to interact with tools that can be complicated and really challenging to interact with otherwise. So that’s one problem that I tackled. And it’s not just me; it’s what the whole team from the project does.

FuseML is also an automation tool that automates the workflow of machine learning and of building machine learning systems. The question was how. So, how do you deal with it? How can you find a balance between automation on one hand and customization on the other?

So how do you have a tool that allows you to automate everything you want to automate, but at the same time gives you the control you need to customise the way it runs all the processes needed to implement the workflow you’re trying to automate? How do we tackle that one?

Well, first of all we recognise that there’s a problem that needs to be solved. So, let’s say you have some images, and you want to do object detection on those images. You can have an end-to-end, fully automatic workflow that does that for you: you just deploy it, apply it to those images and it does that for you. But you can also get very personal and particular about how you’re doing things. You can split that workflow into pieces and orchestrate them in a more customizable way.
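The trade-off between a fully automatic workflow and one split into pieces can be sketched in a few lines of Python. This is an editorial illustration only, not FuseML's API: every name below (`run_workflow`, `END_TO_END`, `skip_thumbnails` and the rest) is hypothetical, and the "detector" is a placeholder that does no real object detection.

```python
from typing import Callable, List

# A step takes the workflow context and returns an updated copy of it.
Step = Callable[[dict], dict]


def load_images(ctx: dict) -> dict:
    # Placeholder: real code would read image files; here images are just names.
    return {**ctx, "images": list(ctx["paths"])}


def detect_objects(ctx: dict) -> dict:
    # Placeholder detector: pretends every image contains exactly one object.
    return {**ctx, "detections": {name: ["object"] for name in ctx["images"]}}


def summarise(ctx: dict) -> dict:
    total = sum(len(found) for found in ctx["detections"].values())
    return {**ctx, "summary": f"{total} objects in {len(ctx['images'])} images"}


def run_workflow(steps: List[Step], ctx: dict) -> dict:
    """Run the steps in order, passing the context from one to the next."""
    for step in steps:
        ctx = step(ctx)
    return ctx


# Fully automatic: a ready-made pipeline that is applied as-is.
END_TO_END = [load_images, detect_objects, summarise]


def skip_thumbnails(ctx: dict) -> dict:
    # A user-supplied piece inserted between the standard ones.
    kept = [name for name in ctx["images"] if not name.startswith("thumb_")]
    return {**ctx, "images": kept}


# Customised: the same pieces, re-orchestrated with an extra step.
custom = [load_images, skip_thumbnails, detect_objects, summarise]

inputs = {"paths": ["cat.png", "thumb_cat.png"]}
auto_result = run_workflow(END_TO_END, inputs)
custom_result = run_workflow(custom, inputs)
print(auto_result["summary"])
print(custom_result["summary"])
```

Because each piece has the same shape, the ready-made pipeline and the customised one share the same runner; the user chooses how much of the orchestration to take over.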

What are the ethical implications of using machine learning?

Yeah, that’s what everybody is thinking about. I guess it doesn’t matter what we are doing with machine learning. I don’t know if I’m prepared enough to answer that question myself. Everybody has their own interpretation of things. It is a big challenge that the machine learning industry still needs to deal with, and more. I myself have to admit I’m still struggling to understand what those implications are, but probably time will show that. So I think machine learning has the capacity to transform society on a very deep level. I guess everyone is concerned with things like privacy, surveillance and, of course, bias and discrimination, because machine learning systems can do that if they’re not properly built.

And these are maybe personal things. So how does it affect me and how will it affect me? Will machine learning systems take away my job? How will it impact me? Will this benefit me as a person, as an individual?

I think the more pressing question and the more pressing ethical implication is how it will impact us as a society. There’s one example I can give: just think about how much we lie on a day-to-day basis, to friends, to relatives, to our children, to our employers and employees. We do that maybe even to ourselves sometimes, because we need some kind of reassurance on different things.

So as a society and as individuals, we really rely on a lot of stuff that is distorting the truth. So imagine that there will be an app out there at some point in the future, and it won’t take too long to come up with that kind of thing, that can detect whether you’re telling the truth. You can install it on your phone, you can screen your conversations, like the conversation that we’re having right now. So, everyone can have such an app installed on their phone or on their laptop, and it can track the eye movements, the way our lips move, the inflexions in our voice. It can tell us whether we’re telling the truth, or whether we’re not really truthful about what we’re saying.

Now imagine the implications of that and how we can deal with that as a society, as individuals. I think that’s the kind of thing that keeps me up at night. And I really wonder whether we as a society, as a human species, will be able to cope with those types of changes. Because everyone has their personal interpretation of those implications.

So let’s move on to OpenFest 2021. At this year’s OpenFest 2021 you presented “MLOps: specialized DevOps for Machine Learning”. Why did you choose this topic? How important is it?

Yeah, so it all goes back to what I was saying earlier about FuseML, this project that we’ve been working on at SUSE, with the ML team there. So MLOps is an emerging engineering discipline where there is still a lot of experimentation, a lot of thoughts being gathered, a lot of ideas being exchanged, and best practices are being collected. I think people need to be aware of what’s happening, because it only works through collaboration and a healthy exchange of ideas. There’s even an MLOps community, where people that are interested in that exchange ideas. And those are some of the reasons why I think everyone needs to know about MLOps.

What is your message to all beginners and tech newbies?

Well, if you want to be successful in this industry, which is really highly competitive, find something that is complicated to use and not widely available, and simplify it. Make it accessible to everyone. And I think in that way you’ll not only benefit yourself as someone who is passionate about technology but also contribute to the larger picture.