Companies are investing heavily in AI projects because they see huge potential in generative AI. Consultancies have predicted opportunities to cut costs and grow revenue by deploying generative AI; McKinsey, for example, estimates that generative AI could add $2.6 trillion to $4.4 trillion annually to the global economy. Yet at the same time, AI and analytics projects have historically struggled to get from proof of concept to production.
Gartner has found that only around 54% of AI projects move from testing into production, which means a significant share of projects either stall or take longer to progress than expected. Projects that do ship can hit further problems operating in production and at scale: Accenture estimates that only two percent of AI projects meet operational responsible AI guidelines in production.
For companies involved in generative AI pilots, it is worth looking at the hurdles to getting to production. Some of these issues can be anticipated and planned for, like having company data prepared for use with generative AI systems. But other challenges are only now coming to light as more companies go through their own pilot projects.
Vectors and Data Embeddings
Creating vector data is one of the first steps required in any generative AI project. Without vectorized data, you cannot perform basic generative AI techniques such as retrieval-augmented generation (RAG), which searches for semantically similar data to provide additional context to your large language model (LLM).
What kind of problems might you run into when creating your vector data set? Getting your data ready involves splitting the text or contents into meaningful chunks, then transforming those chunks into numerical representations called embeddings, which generative AI processes like RAG rely on. That raises two decisions up front: what will your chunking strategy be, and which embedding model will you use to vectorize the data?
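To make those two decisions concrete, here is a minimal sketch of a chunk-and-embed pipeline in Python. It assumes the sentence-transformers library; the fixed-size chunking, the 500-character chunk size, and the all-MiniLM-L6-v2 model are illustrative choices rather than recommendations, and handbook.txt is a hypothetical source file.

```python
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with a small overlap."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Illustrative model choice; whichever model you pick, you must use it
# consistently for both documents and queries.
model = SentenceTransformer("all-MiniLM-L6-v2")

document = open("handbook.txt").read()   # hypothetical source document
chunks = chunk_text(document)
embeddings = model.encode(chunks)        # one 384-dimensional vector per chunk

# Each (chunk, embedding) pair is what gets stored in the vector index.
print(len(chunks), embeddings.shape)
```

A smarter chunker would split on sentence or section boundaries rather than raw character counts, but the shape of the pipeline stays the same: chunk, embed, store.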
Getting the preparation right is not the only challenge teams face around embeddings, though. Every time someone interacts with your application, their request must also be turned into an embedding before it can be used in the vector search. This step is easy to overlook: you might spend a huge amount of time and effort turning your own data into vector embeddings for search, but are you also looking at how incoming user requests get turned into embeddings?
Why is this so important? Unlike the vector data you prepared in advance, your users' requests must be embedded in real time. Any request a user makes, whether through text or by sending in an image, arrives as raw, unstructured data that must be transformed and then used in a vector search operation.
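Here is a sketch of that query-time path, continuing the earlier example. The raw request is embedded with the same model used for the stored chunks, and a brute-force cosine similarity stands in for whatever vector search your platform actually runs.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # must match the indexing model

def search(query: str, chunks: list[str], chunk_vectors: np.ndarray, k: int = 3):
    """Embed a raw user request in real time, then rank the stored chunks."""
    q = model.encode([query])[0]       # this encode happens on every request
    q = q / np.linalg.norm(q)
    m = chunk_vectors / np.linalg.norm(chunk_vectors, axis=1, keepdims=True)
    scores = m @ q                     # cosine similarity against every chunk
    top = np.argsort(scores)[::-1][:k]
    return [(chunks[i], float(scores[i])) for i in top]
```

The `model.encode` call inside the request path is the part that is easy to forget when planning: it runs once per user interaction, so its latency lands directly on the user.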
Supporting these operations means implementing a vector embedding process for requests as well. You can build this yourself or use a hosted service; to make the right decision, consider both developer productivity and the impact of latency on your transactions. Building your own request embedding process takes developer time to build and support, but it can deliver better performance than a hosted service, which adds a round trip to the service provider on every request.
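A rough way to weigh that trade-off is simply to time both paths. The sketch below uses sentence-transformers for the self-hosted option and the OpenAI embeddings API as one example of a hosted service; both are illustrative stand-ins for whatever you would actually deploy.

```python
import time
from sentence_transformers import SentenceTransformer
from openai import OpenAI

query = "How do I reset my account password?"

# Option 1: embed locally. No network round trip, but your team runs the model.
local_model = SentenceTransformer("all-MiniLM-L6-v2")
t0 = time.perf_counter()
local_vector = local_model.encode([query])[0]
print(f"local:  {time.perf_counter() - t0:.3f}s")

# Option 2: call a hosted embedding API. Less to operate, but every request
# pays the provider round trip. Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()
t0 = time.perf_counter()
response = client.embeddings.create(model="text-embedding-3-small", input=query)
hosted_vector = response.data[0].embedding
print(f"hosted: {time.perf_counter() - t0:.3f}s (includes network round trip)")
```

Run something like this with your real traffic patterns before deciding; the numbers depend heavily on your hardware, model size and network distance to the provider.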
Where Should You Run Your Embedding Process?
You can run the transformation to create embeddings as part of the application workflow and data upload process, or you can delegate the task to your data platform. Handling the transformation in the application gives developers more granular control over how each request is processed and then passed on to the insert or search operation, as in the sketch below. However, it also takes developer time to build and maintain the embedding code.
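Here is what the application-side option can look like. The VectorStoreClient here is a hypothetical stand-in for whatever vector database client you use; everything else is code your team now owns and maintains.

```python
from sentence_transformers import SentenceTransformer

class VectorStoreClient:
    """Hypothetical stand-in for a real vector database client."""
    def __init__(self, collection: str):
        self.collection, self.rows = collection, []
    def insert(self, id: str, text: str, vector: list[float]) -> None:
        self.rows.append({"id": id, "text": text, "vector": vector})

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")
store = VectorStoreClient("docs")

def upload_document(doc_id: str, text: str) -> None:
    """The application owns chunking and embedding before every insert."""
    chunks = chunk_text(text)
    vectors = model.encode(chunks)
    for i, (chunk, vec) in enumerate(zip(chunks, vectors)):
        store.insert(id=f"{doc_id}-{i}", text=chunk, vector=vec.tolist())
```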
Delegating this to your data platform makes sense: you are already using that infrastructure to carry out vector search operations, so having the embedding operation take place there is a natural extension. It also means your developers do not have to step in and build the functionality themselves. They never need to see or be aware of the embeddings being generated; they can focus on providing the most relevant results for the end user, as the contrasting sketch below illustrates.
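The client API in this sketch is hypothetical, modeled on the general pattern vector-capable platforms offer: the collection is configured once with an embedding model, and the application only ever handles raw text. The stub methods mark where the platform would do the embedding and search work server-side.

```python
class DataPlatformClient:
    """Hypothetical stand-in for a data platform that embeds server-side."""
    def create_collection(self, name: str, embedding_model: str):
        return Collection(name, embedding_model)

class Collection:
    def __init__(self, name: str, embedding_model: str):
        self.name, self.embedding_model, self.rows = name, embedding_model, []
    def insert(self, id: str, text: str) -> None:
        self.rows.append((id, text))     # the platform would embed `text` here
    def search(self, text: str, limit: int = 3):
        return self.rows[:limit]         # the platform would run vector search here

store = DataPlatformClient()

# One-time setup: the platform, not the application, owns the embedding model.
docs = store.create_collection("docs", embedding_model="all-MiniLM-L6-v2")

# Inserts and searches carry raw text only; no vectors appear in app code.
docs.insert(id="faq-1", text="To reset your password, open Settings...")
results = docs.search(text="how do I reset my password")
```

Compare the two sketches: in the second, the embedding model choice, the encode calls and their maintenance all move out of application code entirely.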
The overall goal around generative AI and data is that users should not feel any significant latency in the transaction, and developers should be able to focus on solving customers' problems. A generative AI search should work the same way as the data searches developers are used to, even though it may involve significantly more processing than a traditional search request. By moving the generative AI request, and the associated vector embedding work, into the data platform layer, you can reduce the workload on your developer team and improve performance.
When it comes to delivering on expectations around generative AI, performance and developer productivity will be key criteria. A poor experience, whether in latency or in the results returned, will affect how likely users are to return to a service once they have tried it. Asking application developers to start worrying about creating and managing embeddings will slow down feature velocity. By considering the whole user journey around generative AI, including how embeddings are created and used and how results are returned to the user, you can increase your chances of successful delivery.