FIBO: The First JSON-Native, Open-Source Text-to-Image Model Built for Real-World Control and Accuracy

The world of generative AI has evolved rapidly, with text-to-image tools enabling creators, marketers, designers and enterprises to bring ideas to life with unprecedented ease. However, most existing models have a clear limitation: they prioritize imagination at the cost of control. Whether producing inconsistent styles, unpredictable lighting or drifting away from user prompts, traditional models …

olmOCR: Redefining Document Understanding with Vision-Language Models

The digital era has seen an explosion in the amount of information stored in PDFs, scanned documents and image-based files. From research papers and corporate reports to handwritten notes and invoices, these unstructured sources hold trillions of valuable data points. Yet, extracting and converting this data into structured, machine-readable text has long been a challenge. …

DeepSeek-V3: Pioneering Large-Scale AI Efficiency and Open Innovation

The field of artificial intelligence has entered a transformative phase – one defined by scale, specialization and accessibility. As the demand for larger and more capable language models grows, the challenge lies not only in achieving state-of-the-art performance but also in doing so efficiently and sustainably. DeepSeek-AI’s latest release, DeepSeek-V3, redefines what is possible at …

LongCat-Video: Meituan’s Groundbreaking Step Toward Efficient Long Video Generation with AI

In the rapidly advancing field of generative AI, the ability to create realistic, coherent, and high-quality videos from text or images has become one of the most sought-after goals. Meituan, one of the leading technology innovators in China, has made a remarkable stride in this domain with its latest open-source model, LongCat-Video. Designed as …

Agent Lightning By Microsoft: Reinforcement Learning Framework to Train Any AI Agent

Artificial Intelligence (AI) is rapidly moving from static models to intelligent agents capable of reasoning, adapting, and performing complex, real-world tasks. However, training these agents effectively remains a major challenge. Most frameworks today tightly couple the agent’s logic with the training process, making it hard to scale or transfer across use cases. Enter Agent Lightning, a …

Qwen3-VL-8B-Instruct — The Next Generation of Vision-Language Intelligence by Qwen

In the rapidly evolving landscape of multimodal AI, Qwen3-VL-8B-Instruct stands out as a groundbreaking leap forward. Developed by Qwen, this model represents the most advanced vision-language (VL) system in the Qwen series to date. As artificial intelligence continues to bridge the gap between text and vision, Qwen3-VL-8B-Instruct emerges as a powerful engine capable of comprehending …

AgentFly: The Future of Reinforcement Learning for Intelligent Language Model Agents

AgentFly is a cutting-edge framework developed by researchers at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) to revolutionize how large language models (LLMs) learn and act. It combines reinforcement learning (RL) with language model agents, enabling them to go beyond static prompt responses and learn through real-time feedback and experience. …

The Art of Scaling Reinforcement Learning Compute for LLMs: Top Insights from Meta, UT Austin & Harvard University

As Large Language Models (LLMs) continue to redefine artificial intelligence, a new research breakthrough has emerged from Meta, The University of Texas at Austin, University College London, UC Berkeley, Harvard University and Periodic Labs. Their paper, titled “The Art of Scaling Reinforcement Learning Compute for LLMs,” introduces a transformative framework for understanding how reinforcement learning …

Wan 2.1: Alibaba’s Open-Source Revolution in Video Generation

The landscape of artificial intelligence has been evolving rapidly, especially in the domain of video generation. Since OpenAI unveiled Sora in 2024, the world has witnessed an explosive surge in research and innovation within generative AI. However, most of these cutting-edge tools remained closed-source, limiting transparency and accessibility. Recognizing this gap, Alibaba Group introduced Wan, …

PaddleOCR-VL: Redefining Multilingual Document Parsing with a 0.9B Vision-Language Model

In an era where information is predominantly digital, the ability to extract, interpret and organize data from documents is crucial. From invoices and research papers to multilingual contracts and handwritten notes, document parsing stands at the intersection of vision and language. Traditional Optical Character Recognition (OCR) systems have made impressive strides, but they often fall …

Agentic Entropy-Balanced Policy Optimization (AEPO): Balancing Exploration and Stability in Reinforcement Learning for Web Agents

AEPO (Agentic Entropy-Balanced Policy Optimization) represents a major advancement in the evolution of Agentic Reinforcement Learning (RL). As large language models (LLMs) increasingly act as autonomous web agents – searching, reasoning and interacting with tools – the need for balanced exploration and stability has become crucial. Traditional RL methods often rely heavily on entropy to …