
URL: https://dev.to/future_x/futurex-physical-ai-daily-issue-31-0618-632

FutureX · Physical AI Daily: Issue 31 (06/18) - DEV Community


Today's Highlights

· Global robotaxi expansion advances on three fronts in a single day: WeRide × Uber launch in Zurich (their second European city in two weeks, after Madrid); Stellantis × Wayve × Uber sign an MoU on global L4 robotaxi cooperation; Uber × Lucid × Nuro name Houston as their next city, targeting mid-2027.

· Faraday Future (controlled by Jia Yueting, Chinese EV entrepreneur) unveils a "full-form" embodied robotics lineup across six series, including the $1,990 FX Navi education robot and a new Futurist humanoid, targeting home and K–12 education ecosystems; FFAI shares rose on the news.

· World models continue to attract capital and ship products: Physis (逆矩阵, Chinese world-model startup), founded by Peking University entrepreneur Chen Boyuan (born 2004), closes a seed++ round of more than $100 million, with participation from Matrix Partners China, Wuyuan Capital, and BAI Capital and a strategic investment from Ant Group; on the same day, Alibaba launches the real-time interactive world model "HappyOyster 1.0" and AutoNavi releases DreamX-World 1.0.

· Mifeng Technology (觅蜂科技, physical AI data platform spun off from Zhiyuan Robotics) raises a further Angel+ round in the hundred-million-RMB range, led by Guofang Capital, continuing the independent-incubation playbook built on the thesis that "data is the differentiator"; according to OFweek, cumulative embodied-AI funding in China from January to May has reached approximately 96.6 billion RMB.

· On the research side: Alibaba releases the Qwen-Robot technical reports (RobotManip / RobotNav); ACE-Ego-0 unifies human and robot egocentric data for VLA pre-training and open-sources the model (HF↑39).

I. Research

ACE-Ego-0: Unifying Human Egocentric Video and Robot Trajectories for VLA Pre-training · vla

VLA training is bottlenecked by scarce, expensive real-robot trajectories, while internet-scale first-person human video offers ready-made "supplementary supervision." The contribution here is combining two heterogeneous data types, which differ in action space, embodiment structure, temporal dynamics, and annotation quality, in a single pre-training framework rather than training on them separately; the model is open-sourced alongside the paper.

Hao Li et al. (ACE Robotics × CUHK) · arXiv 2606.17200 source · HF↑39

The team builds a scalable "egocentric video → action" pipeline that converts raw human video into pseudo-action trajectories in robot format, then uses unified representations to align action labels from both human and robot data to a comparable scale for joint pre-training. The companion open-source model ACE-Ego was released on the same day jointly by ACE Robotics and CUHK.
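The summary does not spell out how ACE-Ego-0 brings human and robot action labels "to a comparable scale." One common way to do that is per-source standardization, sketched below under stated assumptions: 7-D end-effector action vectors, synthetic data, and hypothetical helper names (`fit_source_stats`, `normalize`). It illustrates the idea only and is not the paper's method.

```python
import numpy as np

# Hypothetical sketch: per-source standardization so that human pseudo-actions
# and robot actions land on a comparable numeric scale before joint training.
def fit_source_stats(actions_by_source):
    """Per-dimension mean/std for each source, e.g. {"human": ..., "robot": ...}."""
    stats = {}
    for name, actions in actions_by_source.items():
        arr = np.asarray(actions, dtype=np.float64)
        std = arr.std(axis=0)
        std[std < 1e-8] = 1.0  # leave constant dimensions unscaled
        stats[name] = (arr.mean(axis=0), std)
    return stats


def normalize(actions, source, stats):
    """Map one source's actions to zero mean / unit variance per dimension."""
    mean, std = stats[source]
    return (np.asarray(actions, dtype=np.float64) - mean) / std


# Toy data: human pseudo-actions on a large scale (e.g. millimetres),
# robot actions on a small scale (e.g. metres); 7-D end-effector deltas.
rng = np.random.default_rng(0)
human = rng.normal(5.0, 20.0, size=(100, 7))
robot = rng.normal(0.0, 0.05, size=(50, 7))

stats = fit_source_stats({"human": human, "robot": robot})
human_n = normalize(human, "human", stats)
robot_n = normalize(robot, "robot", stats)
joint = np.concatenate([human_n, robot_n], axis=0)  # one pre-training pool
```

Keeping separate statistics per source is the point: a single global normalizer would let the large-scale human labels dominate the loss.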

Alibaba Qwen-Robot Technical Reports: Using "Alignment" to Unlock Scalable Robot Foundation Models · vla

Following yesterday's Tongyi "hands-feet-brain" triple release, Alibaba now fills in the methodological details. The core argument: manipulation data is inherently heterogeneous, costly to collect, and narrow in diversity, so simply pooling it causes conflicts. Representation, motion, and behavior must be aligned first; only then does large-scale multi-source training "add rather than cancel." This is a key test of whether the recipe behind language and multimodal foundation models transfers to robotics.

Haoqi Yuan et al. (Alibaba Tongyi) · arXiv 2606.17846 source (RobotManip) / 2606.18112 (RobotNav) · Commentary: Feynman Bits source (WeChat, CN)

RobotManip, built on Qwen-VL, proposes a unified alignment framework across three dimensions (representation, motion, and behavior) so that multi-source manipulation data can be trained jointly without mutual interference. RobotNav targets "agent-style navigation systems," providing a scalable navigation backbone whose observation strategy can be reconfigured externally at inference time: instruction following, object search, object tracking, and autonomous driving all share the same perception-planning backbone but consume visual streams differently. Robustness comes from randomizing task modes and observation parameters (token budget, per-camera weights) during training. According to a WeChat analysis, Qwen-VLA is effectively a ~5B unified-weight model: roughly a 4B Qwen3-VL backbone with an approximately 1.15B DiT flow-matching action decoder attached.
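The training-time randomization described for RobotNav can be pictured as sampling a fresh observation configuration per episode. The sketch below is a minimal illustration assuming a hypothetical `ObservationConfig`, a 256 to 2048 token-budget range, and three camera names; the report does not say which distributions Alibaba actually uses.

```python
import random
from dataclasses import dataclass

# Hypothetical task modes mirroring the four RobotNav uses listed above.
TASK_MODES = ["instruction_following", "object_search", "object_tracking", "autonomous_driving"]


@dataclass
class ObservationConfig:
    task_mode: str
    token_budget: int     # total visual tokens the backbone may consume
    camera_weights: dict  # camera name -> share of the budget (sums to 1)


def sample_observation_config(rng, cameras, min_tokens=256, max_tokens=2048):
    """Draw a random task mode, token budget and per-camera weighting (assumed ranges)."""
    mode = rng.choice(TASK_MODES)
    budget = rng.randint(min_tokens, max_tokens)  # inclusive on both ends
    raw = {cam: rng.random() + 1e-3 for cam in cameras}  # keep every weight > 0
    total = sum(raw.values())
    return ObservationConfig(mode, budget, {cam: w / total for cam, w in raw.items()})


def allocate_tokens(cfg):
    """Split the token budget across cameras in proportion to their weights."""
    alloc = {cam: int(cfg.token_budget * w) for cam, w in cfg.camera_weights.items()}
    # int() floors, so hand the rounding remainder to the highest-weight camera
    top = max(cfg.camera_weights, key=cfg.camera_weights.get)
    alloc[top] += cfg.token_budget - sum(alloc.values())
    return alloc


rng = random.Random(0)
cfg = sample_observation_config(rng, ["front", "left", "right"])
alloc = allocate_tokens(cfg)
```

At inference time the same `allocate_tokens` step would take an externally chosen config instead of a sampled one, which is what makes the observation strategy reconfigurable.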

MuseVLA: Equipping VLAs with On-Demand Multimodal Sensing · vla

Most VLAs consume only RGB and are blind to physical quantities that RGB cannot capture, such as temperature, sound, or radar response. This paper models the choice of "which sensor to activate and what to attend to" as a tool-call-like action, letting the model decide when to "open a third eye." This should scale better than indiscriminately stacking every available sensor.

Xingyuming Liu et al. (Peking University / Microsoft et al.) · arXiv 2606.17598 source · Commentary: Non-Embodied Non-Intelligent source (WeChat, CN)

Given a task instruction and visual context, MuseVLA first generates a "sensor token + target description" (equivalent to a parameterized tool call) that decides which sensing modality to invoke and what to attend to; the selected sensor's measurement is then converted into a unified intermediate "sensor image" and fed back into the policy. This effectively wires infrared, audio, radar, and other non-RGB modalities into the manipulation loop as on-demand, language-conditioned inputs.
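The tool-call pattern above can be sketched as parse, dispatch, and rasterize. Everything here is an assumption for illustration: the `<sensor> target` string format, the `SENSORS` table of stand-in drivers returning synthetic readings, and min-max scaling onto a 64×64 grid as the "sensor image" conversion. MuseVLA's actual token format and conversion are not described in this summary.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class SensorCall:
    sensor: str  # which modality to activate, e.g. "infrared"
    target: str  # what to attend to, e.g. "the kettle handle"


def parse_sensor_call(text):
    """Parse a hypothetical '<sensor> target description' string from the policy."""
    text = text.strip()
    if not text.startswith("<") or ">" not in text:
        return None  # no sensor requested: stay RGB-only
    end = text.index(">")
    return SensorCall(sensor=text[1:end], target=text[end + 1:].strip())


def to_sensor_image(reading, size=(64, 64)):
    """Rescale a 1-D or 2-D reading to [0, 255] on a fixed grid (one shared shape)."""
    arr = np.atleast_2d(np.asarray(reading, dtype=np.float64))
    lo, hi = arr.min(), arr.max()
    scaled = np.zeros_like(arr) if hi == lo else (arr - lo) / (hi - lo)
    # nearest-neighbour resampling onto a size[0] x size[1] grid
    rows = np.linspace(0, arr.shape[0] - 1, size[0]).round().astype(int)
    cols = np.linspace(0, arr.shape[1] - 1, size[1]).round().astype(int)
    return (scaled[np.ix_(rows, cols)] * 255).astype(np.uint8)


# Stand-in sensor drivers returning synthetic readings (the target is ignored here).
SENSORS = {
    "infrared": lambda target: np.random.default_rng(0).normal(40.0, 5.0, size=(8, 8)),
    "audio": lambda target: np.sin(np.linspace(0.0, 20.0, 1000)),
}


def run_tool_call(text):
    """Dispatch a parsed sensor call and return the sensor image, or None."""
    call = parse_sensor_call(text)
    if call is None or call.sensor not in SENSORS:
        return None
    return to_sensor_image(SENSORS[call.sensor](call.target))


ir_img = run_tool_call("<infrared> the kettle handle")
audio_img = run_tool_call("<audio> the boiling sound")
```

Mapping every modality onto one image shape is what lets a single vision encoder consume them; a real system would likely use modality-specific renderings (e.g. spectrograms for audio) rather than a raw waveform strip.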

EgoInfinity: Automatically Converting Arbitrary Web Video into 4D Hand-Object Interaction Data for Robot Learning · manipulation

Internet video is the largest "reservoir" of human manipulation knowledge, but turning arbitrary RGB clips into trainable robot data has been a persistent bottleneck. EgoInfinity is not another static dataset; it is a continuously operating "engine" for producing data — a higher-leverage contribution for open-world manipulation learning, which has long been bottlenecked by data scarcity.

Gaotian Wang et al. · arXiv 2606.17385 source

EgoInfinity is a modular 4D hand-object interaction data engine that chains perception, segmentation, reconstruction, interaction-aware refinement, and retargeting, converting web video into training data for arbitrary-viewpoint robot retargeting and video-to-action learning without any human-in-the-loop annotation. Its modular design also lets it benefit continuously from upstream advances in its component models.
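The "modular engine" property can be illustrated with a chain of named, swappable stages. In this minimal sketch the `DataEngine` class and the stage functions are hypothetical, and the stages only build placeholder strings where a real engine would call perception, segmentation, and 3D reconstruction models.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Stage = Callable[[Record], Record]


class DataEngine:
    """Run named stages in order; each stage reads the record and adds fields."""

    def __init__(self, stages: List[Tuple[str, Stage]]):
        self.stages = list(stages)

    def replace(self, name: str, fn: Stage) -> None:
        """Swap in an upgraded component (e.g. a newer reconstruction model)."""
        self.stages = [(n, fn if n == name else f) for n, f in self.stages]

    def run(self, clip: str) -> Record:
        record: Record = {"clip": clip, "trace": []}
        for name, fn in self.stages:
            record = fn(record)
            record["trace"].append(name)  # provenance: which stages touched it
        return record


# Placeholder stages standing in for real models.
def perceive(r):
    return {**r, "hands": f"hand_poses({r['clip']})"}


def segment(r):
    return {**r, "object_mask": f"mask({r['clip']})"}


def reconstruct(r):
    return {**r, "hoi_4d": f"recon_v1({r['hands']}, {r['object_mask']})"}


def refine(r):
    return {**r, "hoi_4d": f"refined({r['hoi_4d']})"}


def retarget(r):
    return {**r, "robot_traj": f"retarget({r['hoi_4d']})"}


engine = DataEngine([
    ("perceive", perceive), ("segment", segment), ("reconstruct", reconstruct),
    ("refine", refine), ("retarget", retarget),
])
before = engine.run("clip_001.mp4")

# Upgrading one component leaves the rest of the pipeline untouched.
engine.replace("reconstruct", lambda r: {**r, "hoi_4d": f"recon_v2({r['hands']}, {r['object_mask']})"})
after = engine.run("clip_001.mp4")
```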

Uncertainty Quantification for Flow-Based VLAs: Teaching Policies to Know When They Might Be Wrong · vla

VLA action heads trained with flow matching perform strongly but have almost no way to signal "I'm not confident about this step," so in non-stationary environments outside the pre-training distribution they can fail without warning. This paper proposes a failure-prediction method aimed at real deployments.

Ralf Römer et al. · arXiv 2606.18043 source

The authors derive an efficient method for estimating epistemic uncertainty: measuring velocity-field disagreement (VFD) across a small ensemble and using it for failure detection and unreliable-action identification. Compared with adding a heavy Bayesian head to a flow model, VFD incurs low computational overhead and is suitable for real-time control loops as a gate for "should I trust this action step?"
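A minimal sketch of the VFD idea, assuming that "disagreement" means the across-member variance of predicted velocities, summed over action dimensions and averaged over Euler integration steps; the paper's exact definition and normalization may differ. The ensemble members are toy velocity fields (`make_member` is hypothetical), and the gate threshold is an arbitrary placeholder.

```python
import numpy as np


def vfd_score(members, obs, action_dim, n_steps=10, seed=0):
    """Integrate the ensemble-mean flow from noise to an action and return
    (action, mean per-step disagreement), where disagreement is the summed
    across-member variance of the predicted velocities."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(action_dim)  # flow matching starts from Gaussian noise
    dt = 1.0 / n_steps
    total = 0.0
    for i in range(n_steps):
        t = i * dt
        v = np.stack([m(x, t, obs) for m in members])  # shape (K, action_dim)
        total += float(v.var(axis=0).sum())  # velocity-field disagreement at step i
        x = x + dt * v.mean(axis=0)  # Euler step along the ensemble-mean field
    return x, total / n_steps


def make_member(offset):
    """Toy velocity field pulling x toward a member-specific target (obs + offset)."""
    return lambda x, t, obs: (obs + offset) - x


obs = np.zeros(4)
agree = [make_member(0.0) for _ in range(3)]
disagree = [make_member(o) for o in (-1.0, 0.0, 1.0)]

_, score_agree = vfd_score(agree, obs, action_dim=4)
_, score_disagree = vfd_score(disagree, obs, action_dim=4)

THRESHOLD = 0.1  # hypothetical gate; would be calibrated on held-out rollouts
execute_agree = score_agree <= THRESHOLD
execute_disagree = score_disagree <= THRESHOLD
```

With offsets of -1, 0, and 1 the per-dimension variance is 2/3, so the disagreeing ensemble scores 8/3 over four action dimensions and the gate blocks it, while the agreeing ensemble scores (numerically) zero. The extra cost is K velocity evaluations per step, which is why a small ensemble keeps this cheap enough for a control loop.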

Looped World Models: Parameter-Shared Recurrent Transformers that Shrink World Models by 100× · world-model

World models face a fundamental tension: faithful long-horizon simulation demands deep computation, but deeper models are expensive to deploy and prone to error accumulation. LoopWM proposes treating "iterative latent depth" as a new scaling axis orthogonal to "building bigger models / adding more data" — a paradigm option for world modeling worth watching.

Hongyuan Adam Lu et al. · arXiv 2606.18208 source · HF↑5 · Commentary: AI Miaomaofang source (WeChat, CN)

LoopWM is billed as the first recurrent architecture for world modeling: a single parameter-shared Transformer block iteratively refines latent environment states, adapting its "computational depth" to the complexity of each prediction step. It reportedly achieves up to ~100× better parameter efficiency than conventional approaches at equivalent quality, with spectral constraints keeping rollouts stable at arbitrary lengths.
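Why a spectral constraint buys stability can be shown with a much smaller stand-in. This sketch replaces LoopWM's Transformer block with a single weight-shared tanh layer, rescales its recurrent weight to spectral norm 0.9 so the update is a contraction, and uses "stop when the latent stops changing" as an assumed adaptive-depth rule. The class and parameter names are hypothetical.

```python
import numpy as np


class LoopedBlock:
    """One weight-shared update applied repeatedly; a stand-in for LoopWM's
    parameter-shared Transformer block, using a single tanh layer instead."""

    def __init__(self, dim, seed=0, spectral_target=0.9):
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((dim, dim))
        # Spectral constraint: rescale so the largest singular value is < 1.
        # With tanh being 1-Lipschitz, the update becomes a contraction in h,
        # so iterating it cannot blow up however many loops are run.
        self.W = W * (spectral_target / np.linalg.norm(W, ord=2))
        self.U = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def step(self, h, x):
        return np.tanh(self.W @ h + self.U @ x)

    def refine(self, x, tol=1e-6, max_loops=500):
        """Loop until the latent stops changing: the depth adapts to the input."""
        h = np.zeros_like(x)
        for k in range(1, max_loops + 1):
            h_next = self.step(h, x)
            if np.linalg.norm(h_next - h) < tol:
                return h_next, k
            h = h_next
        return h, max_loops

    def n_params(self):
        return self.W.size + self.U.size  # independent of the number of loops


block = LoopedBlock(dim=16)
x_easy = np.zeros(16)  # input at the fixed point of the update: converges at once
x_hard = np.random.default_rng(1).standard_normal(16)
h_easy, loops_easy = block.refine(x_easy)
h_hard, loops_hard = block.refine(x_hard)
```

The parameter count stays at 2 × 16 × 16 whether the block runs once or a hundred times; depth is paid for in compute, not weights, which is the trade-off the paper frames as a separate scaling axis.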

EBench: Beyond Success Rate — Diagnostic Evaluation for General Mobile Manipulation Policies · benchmark

A single success-rate scalar hides the true capability profile of a policy. EBench decomposes evaluation along two groups of dimensions — capability and generalization — and benchmarks several leading general-purpose policies on a common scale, providing practical value for practitioners choosing between approaches.

Ning Gao et al. · arXiv 2606.18239 source

EBench contains 26 diverse and challenging manipulation tasks annotated across 5 capability dimensions and 4 generalization dimensions, and evaluates models including π₀, π₀.₅, XVLA, and InternVLA-A1. Key finding: models with similar success rates can have drastically different capability profiles — π₀.₅ leads on test success rate and train-test retention; InternVLA-A1 leads on mobile manipulation but collapses on dexterous tasks; XVLA's strong atomic-skill set barely overlaps with the others.
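The key finding (similar success rates, different capability profiles) is easy to reproduce with per-dimension averaging. The task labels and scores below are illustrative only, not EBench data, and `capability_profile` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def capability_profile(results, task_dims):
    """Average success rate per annotated dimension.

    results:   {task: success_rate in [0, 1]}
    task_dims: {task: [dimension, ...]} (one task can carry several labels)
    """
    buckets = defaultdict(list)
    for task, rate in results.items():
        for dim in task_dims[task]:
            buckets[dim].append(rate)
    return {dim: mean(rates) for dim, rates in buckets.items()}


# Illustrative task labels and scores (not EBench data).
task_dims = {
    "fetch_cup": ["mobile_manipulation"],
    "open_drawer": ["mobile_manipulation"],
    "insert_plug": ["dexterous"],
    "fold_cloth": ["dexterous"],
}
policy_a = {"fetch_cup": 0.8, "open_drawer": 0.8, "insert_plug": 0.2, "fold_cloth": 0.2}
policy_b = {"fetch_cup": 0.5, "open_drawer": 0.5, "insert_plug": 0.5, "fold_cloth": 0.5}

overall_a, overall_b = mean(policy_a.values()), mean(policy_b.values())
profile_a = capability_profile(policy_a, task_dims)
profile_b = capability_profile(policy_b, task_dims)
```

Both policies score 0.5 overall, yet policy A is strong at mobile manipulation (0.8) and weak at dexterous tasks (0.2) while policy B is flat at 0.5: the same pattern EBench reports for InternVLA-A1 versus its peers.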

DexLink Hand: A 16-DOF Linkage-Driven Dexterous Hand at 320g and Under $400 · manipulation

Dexterity, compactness, and affordability have long been mutually exclusive — high DOF typically implies complex actuation and transmission that is hard to fit within a human-hand form factor. This hand pushes cost into the low-hundreds-of-dollars range with a form factor amenable to mass production, representing a genuine tooling dividend for dexterous manipulation data collection and scaled research.

Hao Wu et al. · arXiv 2606.17418 source

DexLink Hand integrates 20 joints and 16 independent actuators within a human-hand-sized structure, with all actuation, sensing, and transmission components fully embedded. It uses a hybrid planar-and-spatial linkage mechanism, weighs approximately 320g, and costs under $400 in total, aiming for human-level dexterity with high structural integration and affordability.
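With 20 joints but 16 actuators, four joints must move through linkage coupling rather than their own motors. The sketch below expresses that as a linear actuator-to-joint map; the joint layout (four fingers of four joints plus a four-joint thumb), the choice of coupled joints, and the 0.8 ratio are all hypothetical, since the summary does not say how DexLink distributes its couplings.

```python
import numpy as np

N_JOINTS, N_ACTUATORS = 20, 16  # figures quoted for DexLink Hand


def build_coupling(coupled_pairs, ratio=0.8):
    """Linear map from actuator commands to joint angles for an underactuated hand.

    coupled_pairs: (dependent_joint, driving_joint) pairs; a linkage makes the
    dependent joint follow its driver at a fixed (hypothetical) ratio.
    """
    dependent = {d for d, _ in coupled_pairs}
    independent = [j for j in range(N_JOINTS) if j not in dependent]
    assert len(independent) == N_ACTUATORS, "each free joint needs its own actuator"
    col = {j: i for i, j in enumerate(independent)}
    C = np.zeros((N_JOINTS, N_ACTUATORS))
    for j in independent:
        C[j, col[j]] = 1.0
    for d, drv in coupled_pairs:
        C[d, col[drv]] = ratio
    return C


# Hypothetical layout: four fingers x 4 joints (0-15) plus thumb (16-19); in each
# finger the distal joint (index 3) follows the middle joint (index 2).
pairs = [(4 * f + 3, 4 * f + 2) for f in range(4)]
C = build_coupling(pairs)
actuators = np.linspace(0.1, 1.6, N_ACTUATORS)  # example commands (rad)
joints = C @ actuators
```

The map has rank 16: the hand reaches a 16-dimensional family of postures embedded in its 20-dimensional joint space, which is the usual price of saving actuators, weight, and cost.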

Other papers today: CAIP (contrastive visual pre-training that extracts human gestures from egocentric video as end-effector action proxies); ThinkingVLA (interleaved vision-language reasoning, unified autoregressive architecture for prediction + inverse dynamics); PearlVLA (relocating "deep thinking" into VLM latent space, balancing low-latency control with explicit planning); WAM-RL (online interactive reinforcement learning with a world-action model, co-evolving world model and policy); OmniDrive / DRIVE-CHOREO (LLM-orchestrated multi-agent driving world models, multi-view controllable video generation); VERITAS (generator-visual-verifier framework that guides and self-improves general policies at inference time); HumanoidArena (egocentric hierarchical whole-body learning benchmark testing the interface between high-level policy and low-level motion tracker); Damage Adaptation in Seconds (soft/metamaterial robots self-adapting to catastrophic damage within one minute).

Open Source · Tools · Benchmarks

· HRDX Dataset: A large-scale vector HD map dataset covering ~40 hours and 1,400 km of minimally overlapping road segments, with 6-camera surround view + 128-line LiDAR + centimeter-level RTK, paired with precisely aligned aerial orthophotos, 10 categories of vector maps, and 20+ semantic/topological attributes — several times the scale of existing public HD-map datasets (arXiv 2606.17080 source).

· ERQA-Plus: A diagnostic benchmark for embodied AI reasoning, with 1,766 QA pairs anchored to 711 robot-centric images, structured across categories including perception, action, social interaction, navigation-environment, and commonsense consequences — specifically designed to separate "genuine embodied reasoning" from "lucky visual/language shortcuts" (arXiv 2606.17639 source).

· WireCraft: A simulation benchmark for manipulating industrial flexible cables (deformable linear objects, DLOs) that brings simulated data, real-robot data, and a unified evaluation protocol together, filling the gap left by existing benchmarks that are either locked to specific hardware or lack industrial fixtures (arXiv 2606.18097 source).

· AnnotateAnything: An annotation framework that automatically converts passive 3D assets into "operable" assets — vision-language reasoning infers object semantics and interaction constraints, then large-scale parallel physics annotation produces executable manipulation labels (arXiv 2606.17446 source).

· DeepInsight: A unified evaluation infrastructure spanning the full Physical AI stack, using a single runtime to host heterogeneous evaluations ranging from single-step foundation model decoding to thousands of physics ticks for whole-body control (a gap of more than three orders of magnitude), enabling cross-layer regression diagnostics (arXiv 2606.17574 source).
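DeepInsight's API is not described here, so the following is only a hypothetical illustration of the idea in the last item: one runtime loop that hosts both a single-step decoding check and a 2,000-tick physics rollout (more than three orders of magnitude apart), plus a regression diff against a baseline run. `EvalSpec`, `run_eval`, `find_regressions`, and all scores are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalSpec:
    name: str
    layer: str                           # e.g. "foundation_model", "whole_body_control"
    steps: int                           # 1 for a decode step, thousands for physics rollouts
    step_metric: Callable[[int], float]  # per-step score in [0, 1]


def run_eval(spec: EvalSpec) -> float:
    """Same loop regardless of how many ticks an evaluation needs."""
    return sum(spec.step_metric(i) for i in range(spec.steps)) / spec.steps


def find_regressions(current, baseline, tol=0.02):
    """Names whose score fell by more than `tol` relative to the baseline run."""
    return sorted(n for n, v in current.items() if n in baseline and baseline[n] - v > tol)


specs = [
    EvalSpec("vlm_decode", "foundation_model", 1, lambda i: 0.91),
    EvalSpec("wbc_rollout", "whole_body_control", 2000, lambda i: 1.0 if i < 1800 else 0.0),
]
current = {s.name: run_eval(s) for s in specs}
baseline = {"vlm_decode": 0.92, "wbc_rollout": 0.95}  # hypothetical earlier run
regressed = find_regressions(current, baseline)
```

Because every layer reports through the same interface, a drop in the whole-body-control score (0.95 to 0.90 here) surfaces in the same diff as a drop in the model-level score, which is what makes cross-layer regression diagnosis possible.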

II. Funding & Deals

Physis (逆矩阵, Chinese world-model startup) | Seed++ | $100M+ · world-model

Co-investors include Matrix Partners China, Guanghe Ventures, Wuyuan Capital, BAI Capital, and Zhongding Capital, with a strategic investment from Ant Group; existing investors Hillhouse Ventures and Peking University-affiliated Yanyuan Capital made oversized follow-on investments. The company was founded by Peking University entrepreneur Chen Boyuan, born in 2004, and is betting on "general world foundation models" (AI that understands and predicts how the physical world operates) as a cognitive engine for serious industrial, embodied, and physical-simulation applications. The funds will go primarily toward pre-training R&D and scaled training infrastructure, with a flagship model planned for release by the end of 2026; the founder says the window for this direction has narrowed to 18 months. Also on the same day, Alibaba's "HappyOyster" and AutoNavi's DreamX-World world-model products launched (see Industry section): the capital and product tracks of world models continue to accelerate in parallel.
Source: BAI Capital source (WeChat, CN)

Mifeng Technology (觅蜂科技, spun off from Zhiyuan Robotics) | Angel+ | Several Hundred Million RMB · adjacent

Led by Guofang Capital, with follow-on investment from Futeng Capital, Shanghai Electric Science Fund, and Yuanqi Innovation; existing investors Junpu Intelligence and DCP VGC again made oversized follow-on investments. The company was established roughly four months ago and has already closed two rounds (the earlier seed/angel round included Sequoia China, Baidu Ventures, and Yunfeng Fund). Mifeng is the independent physical AI data services platform spun off in February 2026 from Zhiyuan Robotics (Chinese humanoid robot maker). It occupies the standalone data-supply position that opened up once the closed loop of "robot collects data → trains model → improves robot" was broken apart, continuing Zhiyuan's playbook of spinning off core business units into independently funded subsidiaries that target sub-verticals. According to OFweek, cumulative embodied-AI funding in China from January to May has reached approximately 96.6 billion RMB.
Source: Robot Foresight source (WeChat, CN)

Jijia Vision (极佳视界, Chinese world-model startup) | Series B2 | 1 Billion RMB · world-model

Jijia Vision raises a 1 billion RMB Series B2 round, following a string of large financings within three months, and simultaneously launches its first home robot, "Shuguang S1." The company pursues a world-model / video-generation approach; landing the round and the product together marks a step in its extension from foundation models toward consumer home products.
Source: InfoQ source

Chengdu Humanoid Robot Innovation Center | New Round | 100M+ RMB · humanoid

Led by Orient Fortune Capital, with Guanghua Open Source and others following; Orient Fortune, an early investor, is doubling down. The round follows a recent strategic investment by a Shudao Group portfolio company and comes shortly after the center secured a major contract in embodied intelligence. The pattern of "scenario operator bets → capital follows and oversubscribes" is a typical trajectory for regional humanoid innovation centers, which anchor themselves to deployment scenarios in exchange for funding.
Source: Chengdu Release source (WeChat, CN)

Noitom Robotics (诺亦腾, Chinese motion-capture company) | Pre-A++ | Several Hundred Million RMB · adjacent

Oversubscribed by multiple leading industrial investors and market-oriented institutions; investors include the Beijing Artificial Intelligence Industry Investment Fund, the Shanghai Artificial Intelligence Industry Series Fund, Shenzhen Capital Group, and Jianfa Emerging Investment. Noitom's strength is motion capture, and it is positioning itself in the upstream embodied-data collection segment, part of the same thread as several other "no hardware, data/platform only" rounds closing today.
Source: Alpha Commune source (WeChat, CN)

III. Commercialization & Deployment

WeRide × Uber Launch Robotaxi Service in Zurich, Their Second European City · autonomy

Zurich has approved robotaxi operations, and WeRide (Chinese autonomous driving company) and Uber have announced a commercial robotaxi service in the city, their second European city in two weeks after Madrid. The two companies have been operating fully driverless or public robotaxi services in Abu Dhabi, Dubai, Riyadh, and other Middle Eastern cities since late 2024, and are now replicating that experience in developed Western European markets. More than ten financial news outlets covered the announcement the same day; of today's three robotaxi announcements, it is the only one that is actually live and operating.
Source: WeRide source (WeChat, CN)

Autonomique Semi-Humanoid Robot Enters Production Deployment at Tier-1 Automotive Supplier F&P · industrial

Canadian company Autonomique announces that its semi-humanoid robot and AI have advanced to production deployment at Tier-1 automotive parts supplier F&P Mfg. The company characterizes its focus as stable on-line productivity rather than acrobatic showpieces. It is notable as a real-world deployment at an automotive Tier-1, though production at a single site is a long way from scaled capacity.
Source: The Robot Report source

Sanctuary AI Extends Physical AI Strategy to Industrial Robots, Validates at Tier-1 Automotive Supplier · embodied

Sanctuary AI announces the extension of its physical AI capabilities from humanoids to industrial robots, and reports completing "production-ready" performance validation at a Tier-1 automotive supplier. Note that this is capability validation in a production-line environment, still a significant distance from mass-production installation and scaled delivery.
Source: Business Wire source

Industrial Embodied Inspection Robot CASIVIBOT Ships in Volume to Luoyang · industrial

Billed as China's "first" industrial embodied quality-inspection robot, CASIVIBOT has entered volume delivery and is deployed in Luoyang for industrial quality inspection. ⚠️ Vendor claim: the "first" designation and the volume figure are single-party assertions; the news can be read as a signal of embodied AI entering the inspection sub-vertical in China.
Source: Sohu source

Raymond × Third Wave Automation: Physical AI Scaled Across Forklift Fleets · industrial

The Raymond Corporation, a major forklift manufacturer, is partnering with Third Wave Automation to roll physical AI out across Raymond's forklift fleet at scale. Warehouse logistics is among the segments where embodied and autonomous technology delivers ROI fastest, and a leading forklift OEM adding autonomous capability at fleet scale is meaningfully closer to true scale-out than isolated pilot installations.
Source: Robotics & Automation News source

IV. Industry Developments

Faraday Future Unveils "Full-Form" Embodied Robot World, $1,990 Education Robot Enters Market · humanoid

Faraday Future (FFAI), controlled by Chinese entrepreneur Jia Yueting, rolls out a "full-form" EAI (embodied AI) robot world spanning six product series at once. It launches the all-new Futurist full-size humanoid and the FX Navi quadruped robot priced at $1,990, alongside a so-called "three-in-one" embodied robotics education ecosystem aimed at schools and home users. FX Navi has 12 joint motors and uses an iOS or Android smartphone as its compute platform, paired with visual programming and a STEM curriculum; the Futurist humanoid claims native support for NVIDIA whole-body motion control. Multiple Chinese media outlets reported that FF received approximately $70 million in robotics-related financing; FFAI shares rose intraday on the news. ⚠️ Vendor claims: product specifications and the education ecosystem are based on company announcements, and the share-price move reflects intraday trading.
Source: Business Wire source

Robotaxi Alliance Building: Stellantis × Wayve × Uber Team Up; Lucid × Nuro × Uber Target Houston · autonomy

On June 17, Stellantis, Wayve, and Uber announced an MoU to develop and deploy global L4 autonomous robotaxi services: Stellantis contributes L4-ready vehicle platforms (with embedded sensor suites and safety redundancy designed for high-utilization driverless operation), Wayve the AI driving software, and Uber its global mobility network. The goal is coverage across Europe, North America, and additional cities, building on Wayve and Uber's existing deployment plans for more than ten cities, including London and Tokyo. On the same day, Uber, Lucid, and Nuro named Houston as their next robotaxi city, targeting mid-2027. Together with the actual Zurich launch (see above), this points to the "OEM + autonomous software + mobility platform" three-way bundle emerging as the dominant organizational model for robotaxi scale-out.
Source: Stellantis source

Alibaba "HappyOyster" and AutoNavi DreamX-World Launch Same Day, World Models Ship as Products · world-model

Alibaba Cloud launches HappyOyster 1.0, an open-world model product that generates an environment and lets users interact with it in real time. It is claimed to produce interactive digital worlds from a single sentence, infer "action → feedback" causal chains, and keep characters and environment consistent over long horizons. On the same day, AutoNavi (Alibaba's mapping and navigation subsidiary) releases DreamX-World 1.0, positioned as a general-purpose, multimodal, interactive video world model that integrates camera navigation, long-range scene memory, and composable event control. Continuing the world-model momentum of the recent BAAI Conference, the focus today shifts from "definitional debates at conferences" to large platforms shipping world models as usable products. ⚠️ Vendor claims
Source: AutoNavi Tech et al. source (WeChat, CN)

Dobot to Launch Next-Generation Companion AI Humanoid, "First Listed Cobot Maker" Enters Home Market · humanoid

Dobot (越疆, Chinese collaborative robot maker) previews an upcoming next-generation companion AI humanoid robot that brings the "perception-reasoning-actuation" closed loop, previously validated in open commercial settings, into the home; the company positions it as defining a new standard for home embodied intelligence. As the self-described "first listed collaborative robot maker" pivots from industrial arms to consumer companion humanoids, it becomes another example of cobot manufacturers extending into the home; claims such as "AI can now understand the physical world" should be read as marketing language.
Source: Sina Finance source

Unitree Opens First Asia Store: Robot Dogs Outsell Humanoids, Diverging Retail Strategies with Zhiyuan · humanoid

Unitree (宇树, Chinese robot maker) opens its first Asia store on a flagship commercial street in Shanghai, using the prime location as a brand showcase and direct-sales channel, trading top-district foot traffic for consumer visibility. Zhiyuan Robotics (智元, Chinese humanoid robot maker), by contrast, has chosen a quieter location on Caobao Road in Shanghai, focused mainly on receiving enterprise clients, research teams, and potential partners. On-the-ground feedback indicates robot dogs are selling better than humanoids, and Unitree's biggest buyers remain institutional customers that want "one humanoid for display purposes." The two companies' contrasting site choices reflect divergent pre-IPO bets on consumer-brand versus enterprise-sales pathways.
Source: Kechuang Daily source (WeChat, CN)

Big-Tech Executive "Next Stop: Embodied AI" — Qwen Lead Lin Junyang Reportedly Enters the Space · embodied

Chinese media report that Lin Junyang, head of Alibaba's Qwen (Chinese large language model series) team, is joining an embodied intelligence startup valued at approximately 13.5 billion RMB, most likely to focus on the "brain" model layer rather than physical hardware manufacturing. ⚠️ Single-source report: the news has not been officially confirmed, and the valuation and positioning are media characterizations. Read it as one data point in the trend of big-tech executives moving into embodied AI, not a settled conclusion.
Source: Guanwang Finance source (WeChat, CN)

Hardware · Supply Chain

· Star Dynamics XHAND 1 PRO (星动纪元, Chinese dexterous hand maker): Launches a 21-DOF fully direct-drive dexterous hand; an independent side-palm degree of freedom improves little-finger opposition precision; maximum finger spread angle 135°, maximum envelope grip diameter over 160mm, capable of grasping large objects such as beer mugs and basketballs. Star Dynamics is an early commercial representative of fully direct-drive five-finger dexterous hands. source (WeChat, CN)

· Boya Intelligence "Gaoshan S1" (伯牙智能, Chinese dexterous hand startup): Founded by former Alibaba executive Liu Xin together with professors from USTC and Harbin Institute of Technology, the company has signed to base its headquarters in Suzhou AI Industrial Park and launched a new dexterous hand, with a lead investment from Suzhou state-owned capital. Dexterous hands account for approximately 14%–20% of a humanoid robot's total cost, making them one of its highest-value core components. source (WeChat, CN)

· Dexterous Hand Micro-Motor Localization in China: LinkHand 2.0 (灵心巧手, Chinese dexterous hand maker) now sources 60% of its micro-motors from Zhaowei Electromechanical (兆威机电, Chinese micro-motor maker), reducing per-unit cost by approximately 1,800 RMB and total hand cost by roughly 22%; miniature servo motors are becoming a new supply-chain bottleneck. source (WeChat, CN)

· Dexterous Hand Demand and Capacity: Industry sources indicate that demand for dexterous hands in China this year is up roughly 10× year-on-year, to approximately 200,000–300,000 units, with high-DOF variants (16 DOF and above) accounting for roughly 10%. LinkHand, which produces thousands of units per month, plans to deliver 50,000–100,000 units in 2026, already more than the approximately 18,000 humanoid robots shipped globally in all of 2025. source (WeChat, CN)
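A quick back-of-envelope check of the supply-chain figures above. Two assumptions are labeled in the comments: that the ~1,800 RMB saving is per hand and equals ~22% of that hand's cost, and that each humanoid carries two hands (needed to compare hand volumes with robot volumes like for like).

```python
# Assumption: the ~1,800 RMB saving is per hand and the ~22% is its share of that
# hand's total cost, so the implied pre-switch cost is saving / share.
saving_rmb = 1_800
saving_share = 0.22
cost_before = saving_rmb / saving_share   # ~8,182 RMB per hand
cost_after = cost_before - saving_rmb     # ~6,382 RMB per hand

# High-DOF (16 DOF and above) slice of this year's estimated Chinese demand.
demand_low, demand_high = 200_000, 300_000
high_dof_pct = 10
high_dof_low = demand_low * high_dof_pct // 100    # 20,000 hands
high_dof_high = demand_high * high_dof_pct // 100  # 30,000 hands

# Hands vs. robots: assume two hands per humanoid for a like-for-like comparison.
humanoids_2025 = 18_000
hands_needed_2025 = 2 * humanoids_2025  # 36,000 hands
linkhand_low, linkhand_high = 50_000, 100_000
exceeds_even_with_two_hands = linkhand_low > hands_needed_2025
```

Even counting two hands per robot, the low end of LinkHand's 2026 plan (50,000) exceeds the ~36,000 hands that 2025's global humanoid shipments would have required.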