BUILD vs BUY: What's the best way to build AI systems? - EP 016
Show notes
In this episode, we dive deep into the debate of open source vs closed source AI models. Whether you’re considering them for your business or personal projects, making the right decision is crucial. Join Svetlana and Edgar as they explore the pros and cons of each, providing insights on how to navigate the build vs buy decisions.
Show transcript
00:00:02: Hello and welcome to this week's episode of the AI Boardroom.
00:00:07: Svetlana, give us some hints about what we're talking about today.
00:00:11: Yeah, apologies.
00:00:15: We're talking about open versus closed source models.
00:00:18: And if you're considering them for your use cases, how do you navigate this build versus buy kind of decision?
00:00:27: What are the pros and cons,
00:00:28: and what factors should you consider when weighing open versus closed systems for your use cases?
00:00:38: Yeah.
00:00:39: Let's start off with defining what is closed and what is open.
00:00:47: And for open source, we can also try to define what exactly is open and which parts are actually useful.
00:00:58: So yeah, closed source, we all know it,
00:01:01: use it all. At least everyone who has used ChatGPT or Gemini or something like that has used a closed model.
00:01:08: Closed means we don't know a lot about the model.
00:01:12: We just know what it is.
00:01:15: For GPT-4, we still don't actually know the size, apart from some leaks, how big the model is.
00:01:23: I think Google is also not disclosing how big their models are.
00:01:27: So yeah, closed means closed.
00:01:31: We don't have a lot of information.
00:01:35: On the contrary, we have open source, which usually means that everything's just open.
00:01:43: So you can look at the algorithms that were used to create the model.
00:01:47: They can mostly disclose what data was used.
00:01:54: And there it gets a bit...
00:01:57: interesting on the open source side too, and having the actual weights.
00:02:02: So like a model consists of billions of parameters, which are called weights.
00:02:07: And these weights are basically what defines how the model decides which token to generate next.
00:02:15: A token being the piece of text that it generates.
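To make that "weights decide the next token" idea concrete, here is a minimal sketch: the model's weights produce a score (a logit) for every token in its vocabulary, the scores are turned into probabilities, and one token is picked. The tiny vocabulary and the scores below are made up for illustration; real models do this over vocabularies of tens of thousands of tokens.

```python
import math

def softmax(logits):
    """Convert raw model scores (logits) into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-token vocabulary and the logits a tiny model might produce.
# In a real model, these scores are the end result of billions of weights.
vocab = ["the", "cat", "sat", "mat"]
logits = [1.2, 3.5, 0.3, 2.1]

probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]  # greedy decoding: take the most likely token
print(next_token)  # "cat", since it has the highest logit
```

Changing the weights (for example by retraining on new data, as discussed later in the episode) changes these scores, and therefore changes what the model says.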
00:02:19: And yeah, open source means you can use the model
00:02:26: under an open source license. These licenses differ a bit, but usually you can just use it.
00:02:33: Download it, use it.
00:02:35: Change it if you want to.
00:02:38: So that's basically like the two camps.
00:02:42: Yeah, I think I'll add, I think the biggest difference and maybe like the most impactful is the fact that one has these open parameters, which means that you can invite
00:02:55: you know, experts, data scientists, ML engineers to actually retrain the system on additional data.
00:03:02: So by doing that, you're ultimately changing up the weights, the parameters kind of of those systems.
00:03:09: And then you could get potentially a different version of the large language model or a different output.
00:03:16: So you have more control over the model itself and how it's built, kind of the inner workings.
00:03:21: I haven't found that...
00:03:24: I think some systems or models don't make all of the information visible, probably for legal reasons for the type of data that they were trained on.
00:03:35: But nonetheless, I think you still have a finished, complete model.
00:03:39: A lot of the initial costs to actually train the initial version of the model have been taken care of for you.
00:03:45: And so you are starting with some product, but you can mold it in the way that you see fit,
00:03:53: bringing in the experts.
00:03:55: With the closed systems, it is very much like a black box.
00:03:59: So you're given the black box, and the only ability you have, I would say, to do anything with it is to build around it.
00:04:09: You just kind of have to work around it: you know what the input and output are, but you don't know what's inside.
00:04:15: Yeah, but you have the option to fine tune stuff a lot of the times, not for the really big models.
00:04:23: But for the smaller closed models, you can fine tune them on your own data to work better on your use case.
00:04:29: So that's not completely off the table, but again, it's only within the small area they allow you to operate in.
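Providers that allow fine-tuning a closed model on your own data typically expect the training examples as JSONL, one chat example per line. A sketch of preparing such a file is below; the schema follows the commonly documented chat format, but the field names and the toy classification examples are assumptions, so verify the exact format against your provider's documentation.

```python
import json

# Toy training examples for a hypothetical ticket-classification fine-tune.
examples = [
    {"messages": [
        {"role": "user", "content": "Classify: 'Invoice overdue 30 days'"},
        {"role": "assistant", "content": "billing"},
    ]},
    {"messages": [
        {"role": "user", "content": "Classify: 'App crashes on login'"},
        {"role": "assistant", "content": "bug"},
    ]},
]

# JSONL: one independently parseable JSON object per line.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity check that every line parses on its own.
with open("train.jsonl") as f:
    lines = [json.loads(line) for line in f]
print(len(lines))  # 2
```

You would then upload this file to the provider and start a fine-tuning job through their API or dashboard; the model weights themselves never leave their servers, which is exactly the "small area they allow you to operate in" point.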
00:04:41: Okay.
00:04:42: Yeah.
00:04:42: So hopefully that helps paint the picture of what open versus closed source is.
00:04:48: And I think we wanted to dive into some factors for our audience to consider, starting with privacy.
00:04:54: Yeah, let's start with privacy.
00:04:59: Privacy also includes like where do you have your model?
00:05:04: Like, you can use open source on a hosted platform like NVIDIA or Azure or Google Cloud or AWS; they all have their own sphere where you can deploy open source models.
00:05:23: Open source models are mostly also available everywhere.
00:05:29: So you can use Llama basically everywhere you can use hosted models.
00:05:37: so, yeah, that's still like not on your computer.
00:05:42: It's still in the cloud, but at least you decide on where to deploy it.
00:05:47: And it's like in your own system of whatever cloud you use.
00:05:54: On the other hand, you have like completely hosted solutions.
00:05:59: of the closed source models, which are basically an API.
00:06:02: You have an endpoint, you can send data in, you get data back.
00:06:05: That's all.
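This "endpoint in, data out" relationship can be sketched in a few lines. The endpoint URL and the exact payload fields below are assumptions following the common chat-completions convention, not any specific provider's API; the point is that the request and response are the only things you ever see.

```python
import json

# Hypothetical endpoint of a closed-source model provider.
API_URL = "https://api.example.com/v1/chat/completions"

def build_request(user_message, model="some-closed-model"):
    """Build the JSON body you would POST to the provider."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": user_message},
        ],
    }

payload = build_request("Summarize this contract clause.")
print(json.dumps(payload, indent=2))

# You would POST this to API_URL with your access key and read the reply.
# Which machine handles it, in which region, and what gets logged
# server-side is completely invisible from this side of the API.
```

Everything about privacy in this part of the discussion follows from that asymmetry: your data crosses the API boundary, and nothing about what happens past it is observable to you.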
00:06:06: You don't know where the server is.
00:06:07: You don't know which server takes care of it.
00:06:09: Sometimes you can adjust the location, and it most likely chooses a location near you to keep the waiting time down as much as possible.
00:06:20: But in the end, yeah, there is no way that you can actually
00:06:29: influence where your data is going and what's going back.
00:06:32: And you just have to completely trust that it's not saved anywhere.
00:06:37: Yeah.
00:06:38: I think, I mean, you more or less trust that they have all the right infrastructure in place so that data leakage is not going to happen.
00:06:44: But I think for a lot of the close source models, they're pretty established companies, right?
00:06:51: Like I would trust that they have the right infrastructures in place, I think.
00:06:56: Some concerns were raised, I think, initially when a lot of people started using ChatGPT, which runs a GPT-4 model, for work, and specifically started uploading documents to it that had
00:07:06: IP in them.
00:07:07: And they're ultimately submitting that document into that API stream that goes back to OpenAI and whatever they end up doing with that file, potentially retraining the model on
00:07:21: it.
00:07:22: which is why your documents can become fair game for anything that you could query against.
00:07:29: It's even worse: you can't get it back out.
00:07:31: There is no way on earth right now to get the documents out once they're trained in.
00:07:39: Yeah, it's like finding a needle in a haystack, because these models are so huge.
00:07:44: Yeah, and so there's no traceability for where that document went.
00:07:48: versus open source, you have a lot more control.
00:07:51: You can build these control systems to take care of a lot of these privacy concerns.
00:07:56: But again, I think the biggest thing is you have to have the right talent to navigate it.
00:08:02: It's almost like a scale.
00:08:05: If you want more control, then you have to have the talent, and probably more budget, to work with open source models.
00:08:13: Or you can take the shortcut of working with closed models, but then take
00:08:18: or assume the risk of what happens with your data once you start working with them.
00:08:25: So one thing I really enjoy with most closed source models is that I'm trusting the agreement I have with the company.
00:08:39: There is something to be said about how safe that is.
00:08:44: We had an established security company that took down half the world with a faulty update.
00:08:52: Being established is not a security guarantee for anything.
00:08:58: So you just have to rely on their security systems and also all their employees that have access are actually using it in the proper way.
00:09:06: So yeah, there is still this amount of control you lose over where stuff is going.
00:09:15: Which, yeah, like I said, I trust. That said, honestly, it's really cost efficient and it's really easy to get going.
00:09:24: So I always will start my journey with some
00:09:30: hosted or closed solution, mostly GPT from OpenAI for my part,
00:09:39: because I just have the access key and I
00:09:42: throw it into the next application I wanna try out and I'm good to go.
00:09:46: And I don't have to rely on like any platform delivering me the same experience.
00:09:55: Because of course, if you have everything closed, and a closed model like Apple does it a lot, you have the full control and you're able to optimize the heck out of it.
00:10:07: And yeah, I think that's definitely something.
00:10:12: take into consideration. Whereas if you have an open source model, they are mostly just put out in public.
00:10:18: They're not hosted.
00:10:19: I think you can use Meta's.
00:10:22: They also have an API for Llama, but most of the time you either download it or host it on some cloud provider.
00:10:31: Yeah, that's something.
00:10:33: No system seems to be really a hundred percent safe.
00:10:42: So hosting your own model on your chosen destination is at least something to take into consideration if it's important and if you have sensitive data.
00:10:53: If there is any way you don't have sensitive data, I would suggest to use just the best tool for the job, which gets you onboarded the quickest.
00:11:06: Yeah.
00:11:07: So you mentioned something about cost.
00:11:10: Maybe
00:11:12: talk about that a little bit.
00:11:14: Yeah, the cost factor, that's the other thing why I always use the closed source stuff because it is optimized.
00:11:19: Optimized in AI means most of the time less cost.
00:11:27: Versus having to spin up your own GPUs or servers, maintain the stuff, and even have the staff to do that,
00:11:37: staff with enough knowledge, because it's not your typical server that you've run for the last 30 years.
00:11:43: You now need GPUs all of a sudden.
00:11:47: And yeah, and then that's definitely something you have to take into consideration because yeah, you have different challenges because hosting AI solutions is a different challenge
00:12:03: than it was before.
00:12:05: Yeah.
00:12:06: And I think what I wanted to also mention that there is a difference, again, depending on the volume of inferences that you plan to do.
00:12:20: I think the cost could be significant between the closed and open source.
00:12:26: I just want to make sure that people understand that there are costs in running it either way: either you're paying for the API calls directly
00:12:35: to someone like OpenAI, or you have hosting fees for using compute, and there are other infrastructure costs.
00:12:43: But the difference, I think, from what I've seen, is significant over the long term, because those costs will accumulate as you end up recruiting
00:12:58: more and more users.
00:12:59: So tapping into those network effects of using it more,
00:13:05: closed model systems tend to be a little bit more expensive than open.
00:13:09: And again, something that may be shorter term may not be a significant difference, but as you accumulate more users, more experiences, more use cases for inference, that difference
00:13:23: could be significant.
00:13:25: It is, at the moment at least, a bit of a mixed bag, because you have costs associated with calls
00:13:35: if you have an API.
00:13:36: So you have a call, you have input tokens and output tokens; that's basically how the cost is calculated.
00:13:41: And so if you have a big prompt to put in, the bigger the prompt, the more it costs you to just put it into the model. But that's the cheaper part; the
00:13:56: more expensive part is the actual result.
00:13:58: So you have different prices for putting stuff in and getting stuff out.
00:14:03: Getting stuff out is a lot more expensive, but usually your answers are a lot shorter than what you put in.
00:14:11: And yeah, so that's one thing: if you look at closed source pricing, you will always see this difference between input and output tokens.
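The input/output pricing split described above is easy to sketch as arithmetic. The per-million-token prices below are placeholders, not any provider's real rates; the shape of the calculation is the point.

```python
# Assumed placeholder rates: output tokens priced 3x input tokens,
# which mirrors the "getting stuff out is more expensive" pattern.
PRICE_PER_1M_INPUT = 0.50   # dollars per million input tokens (assumption)
PRICE_PER_1M_OUTPUT = 1.50  # dollars per million output tokens (assumption)

def call_cost(input_tokens, output_tokens):
    """Cost of one API call under the assumed rates."""
    return (input_tokens / 1_000_000) * PRICE_PER_1M_INPUT \
         + (output_tokens / 1_000_000) * PRICE_PER_1M_OUTPUT

# A big prompt with a short answer, the typical shape described above:
# the input dominates by volume, but each output token costs 3x as much.
cost = call_cost(input_tokens=4000, output_tokens=500)
print(f"${cost:.6f}")  # 4000 * 0.50/1e6 + 500 * 1.50/1e6 = 0.002 + 0.00075
```

Swapping in your provider's actual published rates turns this into a quick budgeting tool for estimating per-call and per-month spend.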
00:14:22: On the open source side, it is not a given that you have this kind of serverless way to do stuff.
00:14:30: Because if you have an API, you just make a call, you get an answer back.
00:14:34: You don't have to care at all about any hostings.
00:14:37: The only thing you might have to care about are some limits on your usage, because they all have quotas to apply.
00:14:45: If you have some...
00:14:50: open source models and you host them, for example, on Azure, then you have to have a dedicated server for that.
00:14:55: You have to rent the machine in the cloud, which will add up fixed costs directly.
00:15:02: It's just running and it costs you money if you do inference or you don't.
00:15:06: And not every open source model is directly available, at least in all regions, to be hosted in this serverless fashion.
00:15:17: And that's an issue because for me, like testing stuff out, I don't need a machine running, which costs me money.
00:15:24: Like, you also have RunPod, where you can just spin up GPUs.
00:15:29: And then you have to maintain it and like shut it down if you don't use it because you pay per hour for the machine.
00:15:37: While on the serverless solution with an API call, you just pay for the inference itself and then you're good to go.
00:15:45: Depending on the load, you are most of the time a lot cheaper using the pay-as-you-go model.
00:15:55: Because until you scale up to a point where this matters, a dedicated machine might not be the proper solution for you.
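That break-even point between pay-as-you-go and a dedicated machine can be estimated with back-of-the-envelope numbers. All figures below are illustrative assumptions, not real quotes from any provider; plug in actual prices for your case.

```python
# Assumed placeholder prices for the two hosting strategies.
SERVERLESS_COST_PER_1M_TOKENS = 1.00  # blended $/1M tokens, pay-as-you-go (assumption)
DEDICATED_COST_PER_HOUR = 2.50        # rented GPU machine, $/hour (assumption)

def monthly_serverless(tokens_per_month):
    """You pay only for the inference you actually run."""
    return tokens_per_month / 1_000_000 * SERVERLESS_COST_PER_1M_TOKENS

def monthly_dedicated(hours=24 * 30):
    """The machine bills around the clock, whether you run inference or not."""
    return hours * DEDICATED_COST_PER_HOUR

for tokens in (5_000_000, 500_000_000, 5_000_000_000):
    s, d = monthly_serverless(tokens), monthly_dedicated()
    winner = "serverless" if s < d else "dedicated"
    print(f"{tokens:>13,} tokens/month: serverless ${s:,.0f} vs dedicated ${d:,.0f} -> {winner}")
```

Under these assumed numbers, low and medium volumes clearly favor pay-as-you-go, and only very large sustained volumes justify the fixed cost of a dedicated machine, which is exactly the scaling point made above.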
00:16:08: I've learned a lot because you and I kind of balance each other.
00:16:13: You're very technical.
00:16:14: I tend to be like higher level, but I've learned a lot even through that.
00:16:18: So thank you for diving deep.
00:16:21: So we touched on controls, privacy, and I think flexibility.
00:16:27: We talked a little bit about that you have more control over fine tuning the models.
00:16:35: There's costs;
00:16:37: the thing to pay attention to there is the difference in cost structure.
00:16:41: What are some use cases, from your perspective, that you've seen open versus closed systems being best suited for?
00:16:51: So one thing I take into consideration or always took into consideration in this decision up until like a month ago was cutting edge models.
00:17:01: like OpenAI had just the best models.
00:17:04: There was no way around it.
00:17:06: And if you go away from OpenAI and you don't want to use them, there's Claude, then there is Google, which are all closed source, but these are the cutting edge models.
00:17:15: But then comes Meta and creates Llama 3 and now 3.1, which are hugely successful and highly capable models, especially for their size,
00:17:28: comparing on some tasks with models ten times bigger and doing it really well,
00:17:35: which opened up a whole lot of new use cases because you can now have a small model which is capable, which you can host wherever you want, even on your local machine, and use it.
00:17:49: And it's fast enough to be actually useful.
00:17:52: And that's where I mostly use open source.
00:17:55: And I always get the latest open source models on my machine, so I have them without the need for an internet connection, for example.
00:18:04: And also just to test out stuff.
00:18:06: It's a lot faster if I just do it on my local machine, compared to if I have to go somewhere and use a hosted solution, which might not even have such a
00:18:21: small and fast model in the first place.
00:18:24: For example, now OpenAI has GPT-4o mini, which is really fast and really small, but it still is
00:18:34: not as fast as, for example, Llama 3 8B hosted on something like Groq. It's not even close, because that's how fast that solution is.
00:18:46: And I can only use that solution with an open source model.
00:18:50: So you have some combinations of features.
00:18:55: For example, Groq is a company.
00:18:58: They have built a chip that is optimized for AI execution.
00:19:05: And they only provide open source models because they don't have their own models.
00:19:09: They just do hardware and have an API.
00:19:14: And you can use open source models on that, which are faster than anything you can use anywhere else.
00:19:21: That's something for latency-sensitive stuff which doesn't need the huge intelligence of a really big model: you can go ahead and just say, OK,
00:19:33: let's use Llama 8B.
00:19:35: Let's optimize for that.
00:19:37: Let's optimize the prompting for that.
00:19:39: Because there we have the maximum speed possible for the time being.
00:19:44: So that's really going to the nuances now.
00:19:47: But that's also what you have to do.
00:19:49: You have to really evaluate what's my situation.
00:19:53: And I now tend to recommend: go with the smallest, dumbest model you can
00:20:03: that still fits your needs, because that brings you really quickly to a point where you just have an efficient system, which then scales better, of
00:20:19: course.
00:20:21: Yeah.
00:20:21: And I think what I heard there, and I just wanted to connect the dots because you've said a lot.
00:20:26: So you talked about cutting edge.
00:20:32: Well, I want to connect the dots to why it matters and why I think what you've talked about is really important.
00:20:40: So cutting edge: you mentioned access to these cutting edge models through the closed systems, because these companies, the truth of the matter is that they're investing
00:20:51: a lot of money in ultimately competing with each other to create the best quality output models.
00:20:59: And by
00:21:00: basically being part of that closed systems group or club, you get access to these innovative, cutting edge models, because of the speed at which, I think,
00:21:13: the space of AI really changes.
00:21:16: If you're lagging, if you're still on, you know, GPT-3.5, which was released, I don't know, like a year and a half ago, that's already old news, because there's been such significant progress in
00:21:26: just that timeframe.
00:21:27: There have been leaps and bounds made in the AI space, so you have to kind of adapt, or have a modular enough infrastructure to be able to accommodate newer models, because otherwise you
00:21:41: can't get better output.
00:21:42: With the quality of the output of these cutting edge models, they're already taking care of a lot of the complexity that you had to build with previous versions of models, which could
00:21:55: be the case with some of the
00:21:57: open source systems.
00:21:58: So I think you have to kind of understand: do you want the cutting edge?
00:22:02: Do you want to be at the forefront, and then make sure that you're foolproofing or future proofing your solution?
00:22:07: If that's important to you.
00:22:08: Not always the case, but I think that's why it's really important to pay attention.
00:22:14: Especially if you're in that kind of like truly innovation driven type of an organization where kind of having the cutting edge or the best of the best really, really matters.
00:22:27: You also talked about inference speed.
00:22:30: I think that depends on the use case that you have.
00:22:36: Inference speed is important to satisfy, I think I want to say user expectations.
00:22:41: And because we live in the digital world, it's not natural for us to wait multiple seconds for a model to come up with some answer.
00:22:51: And a lot of these tools,
00:22:53: ChatGPT and things like that, users have already had access to.
00:22:57: It's like sub -second type of inference.
00:23:00: Like you get an immediate response.
00:23:01: So you have some interaction that users are expecting that as like a standard type of an experience.
00:23:06: So that's why inference speed matters.
00:23:09: And you have to kind of match the compute and kind of the servers kind of to support some of these models to satisfy these new standards of experiences.
00:23:18: So that's why I think having that inference speed consideration is really important.
00:23:24: But you also talked about size, which also could relate to costs as well.
00:23:29: then also something that we haven't talked about is, or maybe we spoke in the previous podcast, size.
00:23:38: The biggest and the baddest model is not always the most appropriate for your use case.
00:23:45: yeah, I think that size is a
00:23:47: big consideration, and I think that there are some really capable models
00:23:50: that are small in size.
00:23:51: I know that you have a few that you've used in the past before.
00:23:54: So I don't know if you can share with the listeners.
00:23:56: the top ones that you recommend potentially exploring.
00:24:00: Yeah.
00:24:02: So, I differentiate between text-only and really vision output.
00:24:11: There's still no proper way to use
00:24:15: Phi-3 Vision locally, but you can use it on the cloud providers.
00:24:22: And it's small compared to what it can do, and it delivers pretty good image analysis.
00:24:31: So that's definitely something to take a look at.
00:24:34: If you need vision capabilities on the open source side which are actually not slow, so to say.
00:24:43: "Fast" would be the wrong word, but it's not slow.
00:24:47: And yeah, like I repeatedly said, I think also in other episodes, I really like Llama 3.1 8B.
00:24:56: And I even use a quantized version, which is kind of a reduced version.
00:25:04: It's more of a technical thing, but they basically reduce the size of the weights.
00:25:10: So they lose a bit
00:25:15: of precision, but they are a lot more handleable.
00:25:23: I hope that's a word.
00:25:26: Easier to handle, maybe?
00:25:27: Easier to handle, yeah.
00:25:30: I think I took a German word and just converted it into English.
00:25:36: So yeah, and those are my go-to local models at the moment:
00:25:43: Llama 3.1, and Phi-3 for vision tasks.
00:25:52: There's so much more out there, but I tend to always fall back to them at the moment.
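The quantization idea mentioned above, reducing the size of the weights at a small cost in precision, can be shown with a toy sketch: map float weights onto 8-bit integers and reconstruct them. Real quantization schemes used for local models are more sophisticated (per-block scales, 4-bit formats, and so on), but the trade-off is the same; the weight values below are made up.

```python
# Made-up float32 weights from a hypothetical layer.
weights = [0.8132, -1.2071, 0.0045, 2.4519, -0.3321]

# One scale factor maps the largest-magnitude weight onto the int8 range.
scale = max(abs(w) for w in weights) / 127

quantized = [round(w / scale) for w in weights]   # stored as int8: 1 byte instead of 4
dequantized = [q * scale for q in quantized]      # reconstructed at inference time

max_error = max(abs(w - d) for w, d in zip(weights, dequantized))
print(f"max reconstruction error: {max_error:.4f}")  # small, but nonzero
```

Storing one byte per weight instead of four is why a quantized 8B model fits comfortably on a laptop; the rounding error is the "bit of precision" lost.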
00:26:00: Yeah.
00:26:01: And I think one of the other things that I wanted to kind of mention is, I think you've previously tested Haiku and Mistral as well.
00:26:08: So would you consider them as kind of like good backup options?
00:26:12: Yeah, so Mistral's models, especially the Mixtral version, which is a mixture of experts, are also highly capable.
00:26:21: That said, would I choose it over Llama 3.1 70B?
00:26:27: Depends, because then you go in a territory where you have to test it out and you have to see which one just fits your needs more.
00:26:37: I think training is also something: if you have a mixture of experts model,
00:26:41: I'm not sure, but I guess it's a bit harder to train than a single model.
00:26:50: Yeah, try it out.
00:26:52: So there is a mixture.
00:26:55: There is also, for example, the solutions of Databricks, which are more suited to data analytics and putting out SQL statements if you need stuff like that.
00:27:06: And that's the beauty of open source.
00:27:08: There is a lot of specialized stuff there.
00:27:11: You also have pre-trained or pre-fine-tuned models.
00:27:16: For example, you have Llama 3 8B,
00:27:21: and then you have another version, Llama 3 8B optimized for coding.
00:27:28: And, there is something else optimized for math.
00:27:32: So, you get a lot of community...
00:27:35: That's the beauty of open source.
00:27:36: You get a lot of community input.
00:27:38: that flows back to the open source stuff.
00:27:40: So you not only have the one model; you can actually use a whole bunch of models that the community makes, because it's an open source project and everyone's
00:27:51: contributing.
00:27:52: And you get a lot of versions of the same thing with different flavors.
00:27:58: And it actually makes a difference.
00:28:01: So maybe for your use case, there is already an open source model out there.
00:28:06: more suited than the base models are and can fit the task.
00:28:13: And that's the most important thing.
00:28:15: Yeah.
00:28:15: And I think, you're talking... I guess you can explore them on the models page.
00:28:21: Hugging Face has a lot of them available.
00:28:23: Yeah.
00:28:24: Hugging Face would be the go-to spot to explore them.
00:28:27: I think there's another platform.
00:28:28: Is it Poe?
00:28:33: If you want to explore the quality without doing all of the development and finessing, to really, truly explore the capabilities, I think the Poe website, like the Poe platform,
00:28:47: makes a playground for different models.
00:28:51: Isn't that even like a Chinese model basically?
00:28:56: But I'm not sure.
00:28:58: Yeah.
00:28:58: I think you could switch out the models.
00:29:02: and just basically play around with the output.
00:29:04: So before you even explore which model you want to use, if open source is really kind of your path, you can go to Poe's website and then go through the dropdown and
00:29:13: explore some of those use cases before you even get into development.
00:29:19: We talk a lot, and maybe in the future also, about the purpose of prototyping, and that really gets you to
00:29:29: test the solution. But there are different ways to do it: aligning the right solution to your use case, and then validating whether it's even worth investing into.
00:29:41: But I think maybe in the future, this is something that both Edgar and I really focus on, building, prototyping, de -risking solutions.
00:29:51: So maybe in the future, if anyone's interested in us exploring that path,
00:29:55: we could show how you could actually validate concepts quickly before ever investing further.
00:29:59: I think we have some really good topics to cover there.
00:30:04: Yeah, definitely.
00:30:05: So we have a great focus on actually doing useful stuff with the AI.
00:30:12: We do useful stuff just to emphasize.
00:30:16: yeah.
00:30:17: For me, it's a bit personal, because I hate when people talk shit.
00:30:24: Sorry, when people talk bad about AI and its usefulness, you see I get emotional.
00:30:34: Because it is helpful, it is useful, and we love to bring you content that emphasizes that there are ways and things that you can know, and if you know them, you can apply them and
00:30:48: actually get value from it.
00:30:50: So yeah, that's cool.
00:30:53: That's a call for practical topics.
00:30:56: So we'd love to do more.
00:30:57: And we do this at the end of our episodes.
00:31:01: We'd like to understand what kind of use cases, maybe what questions you have that you'd like some practical guidance.
00:31:09: Both Edgar and I come from practical experience of actually building these solutions.
00:31:14: And we'd like to deliver more helpful content to this audience.
00:31:18: So always love to hear and get your feedback.
00:31:21: So if there are any topics sitting top of mind for you that you want us to do a deep dive on, please let us know in the comments.
00:31:28: Yeah, definitely.
00:31:30: I would love to just add one side note.
00:31:34: Yeah.
00:31:35: We talked a bit about training models and stuff and what you need for training models is data.
00:31:39: There is one part, and this is basically only possible with open source, which is the generation of synthetic data for your training.
00:31:51: And there is a big Llama model out now.
00:31:54: There is something from Mistral and NVIDIA, which is called NeMo; these are models focused, not solely, but also on delivering synthetic data to actually train and fine tune your
00:32:08: stuff.
00:32:09: So that's something you are not even allowed to do with closed source models most of the time.
00:32:17: OpenAI, for example,
00:32:20: prohibits you from doing that in their guidelines.
00:32:23: Of course you can do it, but they might close your API access for that.
00:32:29: yeah.
00:32:31: It probably means that you shouldn't do it.
00:32:33: Yeah, you shouldn't do it.
00:32:34: In their agreement, which we didn't read and just accepted, it says, don't do it.
00:32:41: So that's something to also consider, because, yeah, it gets
00:32:47: more and more important every day, honestly.
00:32:51: And it's mostly only possible with open source models.
00:32:57: Yeah.
00:32:58: So I think a big takeaway for me is don't talk bad about AI in front of Edgar because he will get emotional and he will get upset.
00:33:07: But I hope you found this episode useful, helpful.
00:33:11: Please don't forget to subscribe, like, and comment on this video.
00:33:15: And anything else?
00:33:17: Closing comments from you?
00:33:20: Yeah, embrace the tools, and use the right tool for the job, to be honest.
00:33:27: It's easier said than done, of course, but that's why we do this.
00:33:30: So yeah, if you have any questions, if you have anything on the episode you need more information on, let us know down in the comments.
00:33:38: DM us on LinkedIn.
00:33:40: We are free to, well, open to all feedback, and we'd love to hear all your questions.
00:33:47: And don't forget to like and subscribe to the channel.
00:33:52: All right.
00:33:52: Well, thank you so much.
00:33:53: We'll see you on the next one.