A Hugging Face Space now lets users create a 3D figurine from a single photo without any traditional creative software. The pipeline calls two model endpoints in sequence. First, black-forest-labs/FLUX.2-dev generates a stylized portrait from a reference image and a prompt. Second, microsoft/TRELLIS.2 converts that portrait into a textured 3D mesh. No manual editing is required between the two steps. Users can upload their own photo and run the pipeline with their Hugging Face token, so the GPU work runs against their own quota. The resulting 3D model is delivered as a .glb file and displayed in a spinning preview. The studio also includes a collection of famous artworks, each generated with the same two models and different prompts. The application itself is a thin wrapper around the two endpoints, which keeps it lightweight and makes it usable without prior technical expertise. Source: huggingface
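The two-step flow described above can be sketched as a small orchestration function. This is an illustrative sketch, not the Space's actual code: `Stage`, `run_pipeline`, and the two stub functions are hypothetical names, and the real calls to the two models are replaced by placeholders that only rename files, so that the hand-off between steps is visible.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: a stage takes the previous artifact (a path or URL)
# plus the user's Hugging Face token and returns the next artifact.
StageFn = Callable[[str, str], str]


@dataclass
class Stage:
    name: str     # label for the step, here the model id from the article
    run: StageFn  # callable that would perform the remote model call


def run_pipeline(photo: str, hf_token: str, stages: List[Stage]) -> List[str]:
    """Feed each stage's output into the next and return every artifact."""
    artifacts = [photo]
    for stage in stages:
        artifacts.append(stage.run(artifacts[-1], hf_token))
    return artifacts


# Placeholder stages. A real implementation would call the model
# endpoints here; these stubs only derive a plausible output filename.
def stylize_stub(image: str, hf_token: str) -> str:
    return image.rsplit(".", 1)[0] + "_portrait.png"


def mesh_stub(image: str, hf_token: str) -> str:
    return image.rsplit(".", 1)[0] + ".glb"


STAGES = [
    Stage("black-forest-labs/FLUX.2-dev", stylize_stub),
    Stage("microsoft/TRELLIS.2", mesh_stub),
]
```

Calling `run_pipeline("me.jpg", token, STAGES)` walks the photo through both stages with no manual step in between, and the token is passed to every stage so that each call can run on the user's own quota.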
The system treats multimedia models as callable building blocks, so an agent can chain several models together with minimal integration code. Each Gradio Space on the Hub exposes an agents.md file that describes the API schema, the call endpoints, how to upload files, and how to authenticate. With this contract, an agent can use the Space without managing weights, provisioning GPUs, or installing client libraries. The work reduces to a series of HTTP calls in which the output of one Space becomes the input of the next. The creative stack shrinks from a series of applications to a sequence of model endpoints. The user's main task is to describe what is needed, and the agent handles the rest. Source: huggingface
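As a rough illustration of what "a series of HTTP calls" can look like, the sketch below uses only the Python standard library. It assumes the Space follows Gradio's documented two-step pattern (a POST that returns an `event_id`, then a GET that streams the result as server-sent events). The `/gradio_api/call` prefix, the endpoint name, and the helper names are assumptions for illustration; the actual paths and parameters should be read from the Space's own agents.md.

```python
import json
import urllib.request
from typing import Any, List, Optional

# Assumption: path prefix used by recent Gradio versions. Older versions
# use "/call"; the Space's own API description is authoritative.
API_PREFIX = "/gradio_api/call"


def build_submit_request(space_url: str, api_name: str, inputs: List[Any],
                         hf_token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST that queues a job on a Space."""
    headers = {"Content-Type": "application/json"}
    if hf_token:
        headers["Authorization"] = f"Bearer {hf_token}"
    body = json.dumps({"data": inputs}).encode("utf-8")
    url = f"{space_url.rstrip('/')}{API_PREFIX}/{api_name.lstrip('/')}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_sse_result(stream_text: str) -> List[Any]:
    """Return the payload of the 'complete' event in a server-sent-event stream."""
    event = None
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if event == "complete":
                return json.loads(payload)
            if event == "error":
                raise RuntimeError(f"Space reported an error: {payload}")
    raise RuntimeError("stream ended without a 'complete' event")


def call_space(space_url: str, api_name: str, inputs: List[Any],
               hf_token: Optional[str] = None) -> List[Any]:
    """Submit a job and read its result. Performs real network I/O."""
    submit = build_submit_request(space_url, api_name, inputs, hf_token)
    with urllib.request.urlopen(submit) as resp:
        event_id = json.load(resp)["event_id"]
    auth = {"Authorization": f"Bearer {hf_token}"} if hf_token else {}
    result_req = urllib.request.Request(f"{submit.full_url}/{event_id}", headers=auth)
    with urllib.request.urlopen(result_req) as resp:
        return parse_sse_result(resp.read().decode("utf-8"))
```

Chaining two Spaces then amounts to passing the relevant element of the first `call_space` result into the `inputs` of the second call. File inputs usually need an upload step first, which this sketch leaves out.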
The shift from traditional creative tools to model endpoints represents a fundamental change in how multimedia applications are built and used. Previously, creating a stylized 3D figurine required a pipeline of heavy creative software, each tool with its own file format and required expertise. Now the same result takes two HTTP calls, each served by a state-of-the-art model. Two things make this possible: open weights and the standardized agents.md contract, which together allow models from different labs to be chained with minimal integration code. The agent does not need to manage the underlying models; it reads the contract and executes the calls. This lowers the marginal cost of building a multimedia application and makes such applications more accessible to developers and end users alike. Source: huggingface