Probably, a startup aiming to make large language models (LLMs) more reliable, has raised $9 million in seed funding from Andreessen Horowitz. The company is building a system designed to keep hallucinations and factual errors from reaching users, targeting 99.99% accuracy, a standard more commonly associated with deterministic systems. According to founder Peter Elias, reaching that level of precision means rethinking fundamental assumptions in AI engineering.

The company's first product is a data science tool that generates quick answers from complex datasets, with each result accompanied by citations and an audit trail. Elias described it as a 'data science mech suit' that pairs an LLM with a deterministic validator. The validator checks the LLM's initial responses against the dataset, and the model is trained against the validator to optimize for speed and accuracy.

Elias emphasized that better harness engineering lets weaker models perform effectively, reducing the need for high-end AI infrastructure. As a result, the tool can run on smaller, local hardware, which significantly cuts the token costs of AI usage. That matters as token costs rise and businesses reassess their AI budgets.

Elias also noted that the same engine can be adapted for other precision-sensitive applications, such as accounting or medical services. He pointed out that major AI labs have not attempted this approach, possibly because of financial incentives tied to model corrections. Reducing ambiguity and reliance on high-end models in this way would mark a significant shift in AI engineering practice.

Probably’s data science tool is built on a model that is 'four classes weaker than the frontier models,' which lets it run on local hardware such as a desktop computer. Elias attributed the system’s effectiveness to its ability to minimize ambiguity, so that the model can answer accurately without high computational power. That challenges the conventional reliance on large, high-end models and offers a cheaper alternative for businesses facing rising token costs. Because the same validation system can be extended to other precision-sensitive domains, the approach could have implications well beyond data science. It also reflects a growing emphasis in AI development on accuracy and cost-efficiency together.

At the core of Probably’s approach is the pairing of a large language model with a deterministic validator that checks whether each result agrees with the dataset. The model is trained against the validator, optimizing for both speed and accuracy. Elias explained that once the context is refined sufficiently, the model can answer correctly without large computational resources, which is what allows the tool to run on smaller, local hardware at significantly lower cost.
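
To make the LLM-plus-validator idea concrete, here is a minimal Python sketch of the general pattern, not Probably's actual implementation, which the company has not published. Every name in it (`Claim`, `validate`, `answer_with_validation`, `flaky_model`) is hypothetical, and the "model" is a stand-in function. It assumes the model proposes an aggregate together with the rows it cites, and that the validator simply recomputes the figure from those rows.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Claim:
    """A proposed answer: an aggregate over one column plus the rows it cites."""
    column: str
    operation: str              # "sum" or "mean" in this sketch
    value: float
    cited_rows: Tuple[int, ...]


def validate(claim: Claim, rows: Sequence[dict], tol: float = 1e-9) -> bool:
    """Deterministically recompute the claim from the cited rows (assumed check)."""
    if not claim.cited_rows:
        return False  # an answer without citations cannot be audited
    for i in claim.cited_rows:
        if i < 0 or i >= len(rows) or claim.column not in rows[i]:
            return False  # citation points at data that does not exist
    values = [rows[i][claim.column] for i in claim.cited_rows]
    if claim.operation == "sum":
        expected = sum(values)
    elif claim.operation == "mean":
        expected = sum(values) / len(values)
    else:
        return False  # unknown operations are rejected, never guessed
    return abs(expected - claim.value) <= tol


def answer_with_validation(
    propose: Callable[[int], Claim],
    rows: Sequence[dict],
    max_attempts: int = 3,
) -> Optional[Claim]:
    """Ask the (hypothetical) model until a claim passes, else return None."""
    for attempt in range(max_attempts):
        claim = propose(attempt)
        if validate(claim, rows):
            return claim
    return None  # nothing unverified reaches the user


# Toy dataset and a stand-in "model" that is wrong on its first attempt.
ROWS = [{"revenue": 10.0}, {"revenue": 20.0}, {"revenue": 30.0}]


def flaky_model(attempt: int) -> Claim:
    guess = 65.0 if attempt == 0 else 60.0
    return Claim("revenue", "sum", guess, (0, 1, 2))


result = answer_with_validation(flaky_model, ROWS)
```

In this toy run the first proposal is rejected and the second is accepted, so the user only ever sees a figure that was recomputed from the cited rows. The retry loop stands in for the far more involved step the article describes, in which the model is trained against the validator.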

Source: TechCrunch