
Bright Data Launches Web Data Infrastructure for AI

Bright Data, a web data collection platform, says the web holds far more data than previously thought and that 97% of AI organizations rely on real-time web data infrastructure.


Photo: Vladimir Srajber / Pexels

Bright Data is targeting the growing need for real-time, structured data to power AI systems. The company says AI models now need fresh, relevant, and trustworthy data to deliver accurate and contextually appropriate outputs, and that traditional collection methods built on static snapshots are no longer sufficient for today’s dynamic environments. To stay competitive, it argues, enterprises must build infrastructure that can handle millions of simultaneous interactions across diverse websites, languages, and access rules. In its view, that infrastructure is essential for maintaining the trust and effectiveness of AI systems in business settings. Source: MIT Technology Review

According to Bright Data CEO Or Lenchner, the challenge lies in retrieving real-time information, which is crucial for grounding AI outputs in current and verifiable data. Without this capability, he notes, AI systems risk delivering stale or inaccurate results, leading to poor business decisions and customer dissatisfaction. The company emphasizes that AI performance increasingly depends not just on model architecture but also on the system’s ability to retrieve data quickly and reliably. Competitor pricing, consumer sentiment, and market trends all fluctuate, so tracking them requires a constant feed of new information. Lenchner also points out that many AI systems still struggle to deliver current, contextually relevant, and trustworthy outputs in operational settings, even when they use retrieval-augmented generation (RAG).
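To make the staleness concern concrete, here is a minimal Python sketch of a freshness check applied to retrieved web snippets before they are used to ground a model's answer. It is an illustration only, not Bright Data's method: the `Snippet` type, the `fresh_snippets` and `build_context` helpers, and the choice of a maximum age are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Snippet:
    """A retrieved piece of web content plus the time it was fetched (hypothetical type)."""
    url: str
    text: str
    fetched_at: datetime


def fresh_snippets(snippets, max_age, now=None):
    """Return only the snippets fetched within max_age, newest first."""
    now = now or datetime.now(timezone.utc)
    recent = [s for s in snippets if now - s.fetched_at <= max_age]
    return sorted(recent, key=lambda s: s.fetched_at, reverse=True)


def build_context(snippets, max_age, now=None):
    """Join fresh snippets into a context string, tagging each with its URL and fetch time."""
    lines = [
        f"[{s.url} @ {s.fetched_at.isoformat()}] {s.text}"
        for s in fresh_snippets(snippets, max_age, now)
    ]
    return "\n".join(lines)
```

Keeping the URL and fetch time next to each snippet is one simple way to leave the output verifiable: a reader can see where a figure came from and how old it is. The right maximum age depends on the data; prices may go stale within hours, while reference material may stay valid far longer.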

The MIT Technology Review piece suggests that the next frontier in AI may depend on a new web data infrastructure layer that lets models discover and map the ever-expanding digital realm. This layer must navigate hundreds of millions of existing web domains and billions of new URLs created each week, delivering real-time information while overcoming technical barriers. It must also emulate human browsing behavior to reach content on websites that rely on JavaScript or deploy aggressive anti-bot software. Lenchner describes the challenge as one of scale and latency, noting that platforms must present themselves as ordinary web users, with identifying information such as IP addresses and locations, across millions of websites.
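The discovery side of such a layer can be pictured with a small Python sketch of a URL frontier, the queue a crawler uses to track pages it has found but not yet visited. This is a toy under stated assumptions: the `normalize` function and `Frontier` class are hypothetical names, and an in-memory set would not hold up against billions of new URLs a week, where distributed storage and probabilistic deduplication would likely be needed.

```python
from collections import deque
from urllib.parse import urldefrag, urlsplit, urlunsplit


def normalize(url):
    """Lowercase scheme and host, drop the fragment, and default an empty path to '/'."""
    url, _fragment = urldefrag(url)
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


class Frontier:
    """Queue of URLs to visit that skips URLs it has already seen (illustrative only)."""

    def __init__(self):
        self._seen = set()
        self._queue = deque()

    def add(self, url):
        """Queue url if it is new; return True when it was added."""
        key = normalize(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._queue.append(key)
        return True

    def next(self):
        """Return the next URL to visit, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None
```

Normalizing before deduplication means that trivially different spellings of the same address, such as a different letter case in the host or a trailing fragment, are fetched only once.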


Key points

Bright Data CEO Or Lenchner says the web holds far more data than previously thought.
According to Bright Data, 97% of AI organizations depend on real-time web data infrastructure.
AI performance increasingly depends on a system’s ability to quickly and reliably retrieve fresh, relevant, and trustworthy data.
Many AI systems still struggle to deliver current, contextually relevant, and trustworthy outputs in operational settings.
A new web data infrastructure layer must be able to navigate hundreds of millions of existing web domains and billions of new URLs created each week.
Platforms must emulate human browsing behavior to access content on websites that rely on JavaScript or deploy aggressive anti-bot software.
Bright Data says doing this in-house becomes a full-time engineering problem that competes with the actual AI work.

Source: MIT Technology Review

WRITTEN BY

Priya Anand

Emerging AI & Applications

Priya covers emerging AI applications and the wider impact of AI across industries.