Hello, AI lovers!
Welcome to the 30th edition of “This Week's AI Engineering”!
This week, Alibaba launches an insane new video generation model, Openai converts ChatGpt into an interactive tutor, and this Chinese open source model is crushing all benchmarks
We'll also consider some radar tools that can be used to charge your development workflow.
Alibaba's new video generation model is best
Alibaba has released WAN 2.2the world's first open source video generation model uses a mixed architecture to provide movie quality video generation with 27B parameters, but only 14B active per step, providing professional video creation with access to consumer hardware.
what's new
- Innovative Moe Architecture: The first open source video model using specialized experts – High No Expert for layout planning and low No Expert for detailed improvements, Apache 2.0 license optimizes performance while maintaining computational efficiency.
- Enhanced training foundation: Large data improvements on +65.6% images and +83.2% video, compared to WAN 2.1, incorporate curated aesthetic data with detailed labels for lighting, configuration, contrast and tone, achieving film quality output.
- Dual Model Strategy: 27B MOE premium version with expert switching based on signal-to-noise ratio in addition to the 5B density model (TI2V-5B) for consumer-friendly deployment.
Benchmark domination
Consumer hardware excellence:
- Generate 5-second 720p video in under 9 minutes with single RTX 4090
- Supports both text-to-video and inter-image generation at 720p/24fps
- Runs efficiently on consumer GPUs using optimized memory usage
Commercial model competition:
- Achieves “top performance among all open and closed source models”
- Excellent results of WAN Bench 2.0 compared to major commercial alternatives
- Advanced WAN2.2-Vae with 16x16x4 compression ratio for optimal quality efficiency balance
Real World Applications
- Unified framework deployment: Provides both academic research and industrial applications with seamless integration, enabling everything from creative content creation to technical video integration research.
- Advanced Technology Architecture: The total compression ratio reaches 4 x 32 x 32 with patching, providing efficient video processing while maintaining high visual fidelity in a variety of use cases.
Better than other models
- The advantages of open source: Unlike Runway or Pika Labs' proprietary video generation tool, WAN 2.2 offers full transparency and customization capabilities without any usage restrictions or ongoing subscription costs.
- Hardware Accessibility: Innovative efficiency allows professional-grade video generation in consumer hardware, allowing democratization of video creation compared to cloud-dependent alternatives.
- Commercial viability: The Apache 2.0 license eliminates legal concerns for commercial applications and is ideal for businesses that require professional video generation without vendor dependencies.
This release places WAN 2.2 as the definitive open source alternative to its unique video generation model with significant cost benefits and enterprise-ready features.
ChatGpt has become your private tutor
Openai has started learning mode with ChatGptinteractive learning capabilities designed to provide direct answers and revolutionize education that drives AI with Socrates questions and personalized scaffolding, rather than guiding students in stages.
what's new
- Socrates Learning Approach: Use interactive prompts, hints, and self-reflections instead of direct responses to promote active participation and develop metacognition through educational principles supported by research developed with teachers and scientists.
- Wide range of availability: Expand now for free, Pro and team users have ChatGPT EDU availability in a few weeks, with simple toggling features for various learning goals during conversation.
- Personalized Educational Support: Based on rating questions and chat history, it adapts to the user's skill level and provides scaffolding answers with information divided into digestible sections and key topic connections.
Performance improvements
- Student success indicators: Users are described as “24/7 all-nursing hours” that breaks down complex materials into clear explanations and has the effect of successfully supporting challenging concepts through sustainable, patient tutoring.
Advanced learning features:
- Knowledge checking with quiz and open-ended questions
- Personalized feedback based on individual progress
- Cognitive load management for optimal learning retention
- Cultivate curiosity through guided discoveries
Real-world impact
- Integrated Educational ResearchFuture development includes partnerships with Stanford's scale initiative for long-term research on AI learning outcomes, focusing on clearer visualizations of complex concepts and goal setting throughout the conversation.
- Target Optimization: Designed primarily for university students who conduct continuous wide range of educational research for K-12 applications, ensuring an age-appropriate educational approach.
The launch places ChatGpt as the leading AI education platform, combining advanced AI capabilities with proven educational research for transformative learning experiences.
Create apps simply by talking to Microsoft's latest tools
Microsoft's Github Spark It was launched as an AI-driven tool for creating and sharing “microapps” without writing or deploying code. Following UNIXPhilosophy, personalize your software as easily as customizing your development environment through natural language interactions.
what's new
- Three-component architecture: An NL-based editor with interactive preview and revision variations, a managed runtime environment with deployment-free hosting and persistent data storage, plus a PWA-enabled dashboard for spark management and sharing with controlled permissions.
- Flexibility of model selection: Choose from Claude Sonnet 3.5, GPT-4O, O1-PREVIEW, or O1-MINI for a variety of creative approaches. Click one to save auto history and recover all revisions seamlessly for seamless iterations.
- Joint development:Sharks can be shared with read-only or Read-Write Permissions, allowing users to provide favorites or remix sharing sparks, and “semantic view sources” through a revision history that shows the creator's thought process.
Benchmark Performance
Development speed revolution:
- Live app display when entering natural language description
- 3-6 versions generated for search per request
- Automatic deployment using PWA features on desktop/mobile
- Built-in UI components with customizable themes
Successful use cases:
- Children's Allowance Tracker with LLM-generated Congratulations
- Custom Hacker New Client with Comment Thread Overview
- Karaoke Night Tracker with Guest Status Management
- Educational map app with city descriptions
- The World of Animated Vehicles (created by the age of 6)
Technical implementation
- Advanced Runtime Features: Managed Key Value Store with Visual Data Editor, integrated model prompts via GitHub models, and design systems with themes that eliminate the complexity of traditional deployments.
What's better than your competitors
- The Philosophy of Creating Zero Costs: By allowing anyone to build personalized software tools via natural language, reducing the cost of creating apps to zero and making computers powerfully customizable.
- UNIX Philosophy Applications: Apps that are tailored to individual needs and do one thing as convenient as they are as necessary, focus on reducing barriers to complexity for niche, short-lived or personal tools.
- Semantic Development ExperienceUnlike traditional no-code platforms, Spark allows development through natural conversations with automatic variant generation, allowing non-developers to access programming.
This technical preview represents a fundamental shift towards natural language programming, placing GitHub Spark as the future of accessible software development.
Runway's new tools revolutionize video editing within context
Runway has released Alepha state-of-the-art in-context video model that enables comprehensive video editing via simple text prompts or reference images, providing professional-grade visual effects without traditional production requirements.
what's new
- Multitasking Visual Generation: Comprehensive video editing features including camera controls (reverse shots, low angles, next shot generation), style conversion (aesthetic transfer, environment change, playback), and object manipulation (add/remove/replace elements to the appropriate lighting and shadows).
- Professional Quality Control: Enables scene manipulation through text editing (appearance, green screen extraction) and natural language descriptions, while maintaining proper lighting, shadows, reflections and viewpoint consistency.
- Flexible output options: Export using a variety of background options, including green screen, transparency, and solid color, with accurate creative controls and reference image support for professional integrated workflows.
Advanced editing features:
- Motion transfer from one video to a new first frame image
- Changes in environment (season, time, weather conditions)
- Retexture and complete exchange of objects (a car from horse to drawing tank)
- Changing the color using a sample or description prompt
Real World Applications
- Industry Use Cases: Creative projects with film production coverage generation and visual effects, content creation and transformation, post-production lighting modifications and deletion of elements, and even scene creation.
- Cost-effective production: Eliminates the need for re-shooting due to lighting and timing issues, reduces costly practical effects and makeup requirements, and provides unlimited creative flexibility for post-production.
What's better than your competitors
- Source Fidelity Maintenance: Unlike destructive editing tools, ALEPH allows for widespread modifications through AI-powered processing, while maintaining the quality of the original video.
- Natural Language Control: All editing achieved through simple textual descriptions eliminates complex software learning curves and technical barriers for creative professionals.
- Professional integration: Seamless compatibility with existing post-production workflows that provide enterprise-grade functionality without infrastructure changes.
The release positions runway Aleph as a definitive AI-powered video editing solution, combining unprecedented creative control with specialized production standards.
Tools and releases you need to know
Wix Adi (Artificial Design Intelligence) changes web design by automatically creating customized websites based on user input. Ask a series of questions about the purpose, preferences and content of the website you are looking for, create a fully functional site using AI in minutes, and make web development accessible to anyone. The automated design process is tailored to your needs and provides easy content integration with customization options for further improvements.
Apply Pie An AI-powered platform that makes mobile app development more accessible through no-code development for iOS, Android and web applications. While users with no programming skills will be able to create apps using the drag-and-drop interface, the bread and butter feature is a chatbot builder with ChatGPT. The platform offers AI-powered features such as voice recognition, cross-platform compatibility, and marketplace integration for enhancements.
Applitools Use Visual AI to automate web and mobile application testing to display and function as intended on a variety of devices and browsers. Compare the visual aspects of your application with baseline images to identify discrepancies that traditional testing methods may miss, streamlining automated visual testing, comprehensive test reports, seamless CI/CD pipeline integration and quality assurance.
And that brings together this issue.”AI engineering this week.“
Thank you for tuning it! Share this newsletter with fellow AI enthusiasts and follow more weekly updates.
Until next time, happy buildings!
