Sora Launches, Easing Concerns for Dream



December 5, 2024

After ten long months of anticipation, the world finally witnessed the debut of Sora, OpenAI's newest video generation model. This release comes after the previous sneak preview in February, during which only select artists, actors, and directors were granted access to trial versions. The prolonged wait served to build excitement, and upon its official unveiling, Sora quickly overwhelmed servers with a flood of users eager to explore its capabilities.

The release version, named Sora Turbo, represents a significant leap from the February preview. It can generate high-definition videos at a resolution of 1080p, with a maximum duration of 20 seconds. That length is a notable advance over the 5 to 10 seconds typical of many domestic AI video generators. The longer the video, the greater the demand for coherence, reduced repetition, and seamless transitions, all of which hinge on the sophistication of the model and the quality of the training data.

One of the most exciting additions in this release is the editing functionality, which includes Remix, Re-cut, Storyboard, Loop, Blend, and Style presets. Previously, a significant limitation of AI-generated videos was the difficulty of making adjustments after the initial generation. Sora addresses this with a more versatile editing toolkit.

Let’s explore three of the standout new functions in more detail. The Remix feature allows users to replace or regenerate specific elements within a video. For instance, after generating a clip where a character pushes open a library door, users can swap that door for a French-style one, tailoring the content to their creative vision.

The Storyboard function is a powerful asset for video creators, allowing them to specify the content of each segment of the video frame by frame. For example, a user could dictate that the first 114 frames depict “a spaceship parked in a red background,” that frames 114 to 324 transition to “an astronaut standing in the interior of the spaceship,” and that frames 324 to 440 end with a close-up of the astronaut’s eyes, obscured by a knitted mask.

Additionally, the Blend feature combines two videos into one. In one example, Sora blended a video of falling snowflakes with another of blooming flowers. The resulting transition appears remarkably smooth, something other AI video tools have largely lacked.

Impressive as these features are, Sora is currently available only in select countries. Regions such as the UK and mainland China are still awaiting access.

Regarding pricing, Sora is available to ChatGPT Plus and Pro subscribers. Plus members can generate 50 videos per month at a resolution of 480p, while Pro members get unlimited generations at slower speeds.

In the months since Sora's announcement, domestic competitors have been racing to catch up. Now that Sora has launched, several domestic AI video generation tools have been tested against it to compare their performance.

On video duration, Sora clearly leads with its 20-second limit. Runway follows with videos of up to 10 seconds, whereas many domestic products typically offer 5 to 6 seconds. It’s worth mentioning that while Daydream can generate videos of up to 6 minutes, this is achieved through a multi-step process involving character creation followed by storyboard editing, rather than continuous text-based video generation.

On the pricing front, domestic tools often allow free trials, albeit with usage limits, whereas Sora requires payment from the start, beginning at $20 per month. Runway is slightly cheaper at $15 per month.

It is also necessary to examine specific functions and the output they produce. A comprehensive evaluation starts with basic capabilities: how well each product comprehends text prompts, how clearly characters are depicted in motion, and how accurately it handles multiple subjects in the frame.

The first prompt was designed to test a mid-shot scene: “In a sunset setting, two girls with long hair, one in a yellow dress and the other in a blue dress, both holding carrots, accompanied by three little rabbits that slowly run over to nibble the carrots, in a cinematic color treatment.” Surprisingly, Sora got the number of subjects wrong, producing only two rabbits.

Other tools, conversely, fared better, successfully capturing all intended elements despite stylistic variations.

The second prompt shifted to focus on a close-up: “Under lights, a Chinese girl with black curly hair wearing a white dress holds a bouquet of pink flowers, first looking down at them before slowly lifting her head and smiling, in a cinematic color treatment.” This time, most tools executed the request well, understanding the nuances of the character and her actions, yet some struggled with the specifics of ethnic representation, showcasing variability in the understanding of what constitutes a “Chinese girl.”

The evaluation also explored the evolution of AI capabilities over the past half-year. For example, DreamAI demonstrated an increase in character realism compared to five months ago.

In assessing advanced capabilities, particularly Sora's newly promoted features such as Remix, several industry professionals noted difficulties in execution. A harder test involving Remix revealed glitches: attempts to replace a rabbit with a puppy produced unintended results such as floating carrots. Although industry experts acknowledged the experimental nature of these tests, they suggested that the quality of the final result depends heavily on prompt quality and stylistic choices.

To summarize, while Sora’s introduction has generated excitement, its capabilities have not yet provoked widespread fear among domestic creators. Most evaluations suggest that Sora performs within expected parameters. Fang Jiarui, a financing executive at Shengshu Technology, noted that compared with its February demo, Sora’s latest performance shows no significant increase in realism. Moreover, feedback from practical tests reveals that Sora struggles with extended sequences featuring complex movements, leading to inconsistencies when simulating physical interactions.

Professionals had mixed reactions to the new features introduced with Sora. Wu Jieqian, CEO of Hanhua Technology, indicated that functions similar to Storyboard and Style presets are already present in other tools, and that the Blend feature first appeared in Luma. Jiang Shu, a senior researcher in the AI domain, said that some aspects of Sora's functionality appear to be exclusive and that its coherence in handling detail is commendable. The natural way Sora executes transitions, particularly when showing an object from multiple perspectives, indicates an advanced understanding of three-dimensional contexts.

The overall product experience is another highlight. Sora presents itself as an end-to-end solution for video creation, a step up from the text-only interface previously offered by ChatGPT. The new features signal OpenAI’s commitment to improving the user experience, with an emphasis on seamless integration into the creative process.
