021 Industry Research Team
Congyuan Liu
Senior Analyst
Introduction
Recently, OpenAI's chatbot model ChatGPT has been making waves, with Microsoft reportedly investing $10 billion at a valuation of about $29 billion. These developments point towards a new paradigm in artificial intelligence models known as "Generative Models."
Previously, Decision-Based AI models relied on analyzing, judging, and predicting based on existing data, with typical applications such as intelligent recommendations (short videos) and autonomous driving. In contrast, Generative AI places a stronger emphasis on learning, induction, and creative deduction, ultimately generating entirely new content. This paradigm shift represents a significant boost in productivity and creativity. It is already being applied to creative work in fields like marketing, design, architecture, and content creation, and it is starting to find applications in areas such as life sciences, healthcare, manufacturing, materials science, media, entertainment, automotive, and aerospace, bringing substantial productivity improvements across these sectors.
OpenAI and Microsoft
In 2019, Microsoft began its collaboration with OpenAI with a $1 billion investment, followed by a further investment in 2021. In 2023, Microsoft deepened its commitment with a reported additional $10 billion.
Microsoft's expanded partnership with OpenAI encompasses several key elements:
Supercomputing at Scale:
Microsoft is significantly ramping up its investments in cutting-edge supercomputing systems to accelerate OpenAI's groundbreaking strides in independent AI research. These resources will also bolster the development of Azure's AI infrastructure, facilitating global deployment of customized AI applications for customers.
New AI-Powered Experiences:
OpenAI models will be integrated into Microsoft's consumer and enterprise products, introducing novel digital experiences underpinned by OpenAI's technology. This includes Microsoft's Azure OpenAI services, empowering developers to build state-of-the-art AI applications by directly accessing OpenAI models.
Exclusive Cloud Provider:
Azure, as the exclusive cloud provider, will support all of OpenAI's workloads across research, products, and API services.
Furthermore, Microsoft plans to integrate ChatGPT into its search engine, Bing, to bolster its presence in the search engine market. ChatGPT's capabilities will also be introduced into Office for generating and responding to text-based content.
Since 2016, Microsoft has been on a mission to develop Azure into a world-class AI supercomputer. The partnership between Microsoft and OpenAI is driving the forefront of cloud supercomputing technology. In 2020, they unveiled their first Top-5 supercomputer and have since scaled up the deployment of multiple AI supercomputing systems. OpenAI now utilizes these infrastructures for training its groundbreaking models, which have been deployed in Azure to support projects like GitHub Copilot, DALL·E2, and ChatGPT.
Source: Microsoft, Semafor, The Information
AI Models
AI models can be broadly categorized into two main types: Discriminant/Analytical AI and Generative AI.
Discriminant/Analytical AI:
Discriminant AI involves learning the conditional probability distribution within data and making analyses, judgments, and predictions based on existing information. This category of AI is primarily employed in applications such as recommendation systems, risk assessment systems, and decision-making for autonomous vehicles and robots. It is a mature technology with widespread applications that significantly enhance non-creative work efficiency. Specific applications include facial recognition, precision ad targeting, financial user scoring, and intelligent assistance in driving.
Generative AI:
Generative AI learns the joint probability distribution within data and, rather than just analyzing existing information, focuses on induction and creation. It draws on historical data to create entirely new content, and it can also address discriminative problems. Generative AI has grown rapidly since 2014, and the pace has accelerated sharply in recent years. It finds applications in content creation, research, human-machine interaction, and various industrial sectors. Specific applications encompass text generation, image-to-text conversion, intelligent voiceovers for videos, intelligent poster generation, video special effects generation, code generation, voice-based human-machine interaction, and intelligent medical diagnostics.
Source: Learn OpenCV, Overseas Unicorns
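The distinction between the two model types can be illustrated with a minimal sketch. The weather/activity table below is a made-up toy distribution, not data from this report: a discriminative view only needs the conditional P(y | x), while a generative view works with the joint P(x, y) and can therefore draw entirely new samples.

```python
import random

# Hypothetical toy joint distribution P(x, y) over a feature x and a label y.
JOINT = {
    ("sunny", "walk"): 0.40,
    ("sunny", "stay"): 0.10,
    ("rainy", "walk"): 0.05,
    ("rainy", "stay"): 0.45,
}

def conditional(joint, x):
    """Discriminative view: P(y | x), obtained here by normalizing the joint."""
    marginal = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / marginal for (xi, y), p in joint.items() if xi == x}

def sample(joint, rng):
    """Generative view: draw a new (x, y) pair from the joint P(x, y)."""
    pairs = list(joint)
    weights = [joint[pair] for pair in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0]

if __name__ == "__main__":
    print(conditional(JOINT, "rainy"))       # judgment about existing input
    print(sample(JOINT, random.Random(0)))   # newly generated data point
```

Because the joint distribution contains the conditional, the same toy model can also answer the discriminative question, which mirrors the point that Generative AI can address discriminative problems.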
In 2016, artificial intelligence technology had a comprehensive breakthrough, and Discriminant AI began to see extensive applications, including recommendation systems, computer vision, natural language processing, and more. The global AI market expanded from around $60 billion in 2016 to nearly $300 billion in 2021. With the support of technologies like recommendation systems, computer vision, and natural language processing, companies like Amazon, ByteDance, SenseTime, and Tesla experienced rapid growth.
Source: Sullivan
Generative AI
Gartner has positioned Generative AI as the most promising artificial intelligence technology for business. According to its 2022 Hype Cycle for Artificial Intelligence, Generative AI is expected to reach production maturity within 2-5 years, offering tremendous development potential and application opportunities. By 2025, data generated by Generative AI is projected to account for 10% of all data, compared to less than 1% in 2021. Additionally, it is anticipated that 30% of outbound messages from large organizations will be generated by Generative AI, and 50% of drug discovery and development will involve Generative AI. Looking ahead to 2027, 30% of manufacturers are expected to employ Generative AI to enhance product development efficiency.
An article published on Sequoia Capital's official website on September 19, 2022, titled "Generative AI: A Creative New World," states that Generative AI has the potential to create trillions of dollars in economic value.
In a report titled "AI 2022: The Explosion," Coatue suggests that the exponential breakthroughs in AI have rapidly expanded its application scenarios. In October 2022, Stability AI secured $101 million in funding, reaching a valuation of $1 billion, with investors including Coatue, Lightspeed Venture Partners, and O'Shaughnessy Ventures. The company was founded by former British hedge fund manager Emad Mostaque in 2020. In the same month, Jasper raised $125 million at a valuation of $1.5 billion, with investors including Coatue, Bessemer Venture Partners, and IVP, among others. In 2019, OpenAI received a $1 billion investment from Microsoft, and by 2021, OpenAI's valuation had reached $20 billion.
Numerous Generative AI companies have made it into the "Intelligent Applications 40" (IA40) list jointly published by Madrona, Goldman Sachs, Microsoft, Amazon Web Services, and PitchBook in 2022. The IA40 enlisted over 50 venture capitalists from more than 40 top venture capital and investment firms to nominate and vote for the top companies shaping the future of intelligent applications. These companies have collectively raised over $16 billion since their inception, with more than $5 billion raised this year. Among these, 14 Generative AI-related companies, including Runway, Jasper, and Copy.ai, comprise 35% of the list.
Source: Various, including Gartner, Sequoia Capital, Coatue, and IA40 Report
Technological Advances - Accumulation and Enhancement of Early Architecture, Models, Data, and Computational Power
Architectural Improvements
The learning capability of deep neural networks is directly correlated with the size of models. However, as the model scale increases, training becomes progressively more challenging. To overcome this, structural improvements have been made. The introduction of the highly parallelized Transformer architecture, for instance, has resulted in a remarkable increase in the number of parameters in deep neural networks. This number has risen from mere thousands in the early days to hundreds of billions in current models.
Model Development
The introduction of models such as GPT-3, CLIP, Diffusion, and DALL·E2 has greatly elevated AI's proficiency in handling tasks related to natural language processing, cross-modal tasks, and generative problem-solving.
Data Abundance
The more high-quality training data available, the more effectively algorithms can learn from it. With the advent of the digital age, tools and software for generating data have become increasingly common. This has led to exponential growth in both the quantity and quality of data available for AI training.
Computational Power Enhancement
Large-scale deep learning models, with their extensive parameters and data requirements, necessitate a corresponding increase in computational power. The training compute required by today's large-scale models is 10 to 100 times greater than that of earlier deep learning models.
Source: Information derived from "Compute Trends Across Three Eras of Machine Learning" and Google Scholar.
Key Models
Variational Autoencoder (VAE)
Introduced in 2013 by Diederik P. Kingma and Max Welling, a Variational Autoencoder (VAE) is a model that converts high-dimensional input into a probabilistic distribution in latent space through an encoder. This distribution is then sampled, and the samples are decoded into newly generated results. VAEs find applications in tasks such as image generation and speech synthesis. However, the generated images tend to be somewhat blurry.
Source: freeCodeCamp, Google Scholar
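The sampling step described above can be sketched as follows. This is a minimal illustration, assuming the encoder has already produced a mean and a log-variance for each latent dimension (the numeric values in the usage note are made-up placeholders); it uses the reparameterization trick from the original VAE paper and also shows the KL term that keeps the latent distribution close to a standard normal. The encoder and decoder networks themselves are omitted.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))
```

For example, `reparameterize([0.3, -1.2], [0.0, -0.5], random.Random(0))` yields one latent sample that a decoder would turn into a generated output; drawing several samples from the same distribution gives several different outputs.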
Generative Adversarial Networks (GAN)
In 2014, Ian J. Goodfellow and his colleagues introduced Generative Adversarial Networks (GANs). This model consists of a generator and a discriminator. Taking image generation as an example, the generator takes random noise as input to create images. The discriminator's role is to determine whether an image is real or produced by the generator. Training alternates between the two: the generator is fixed while the discriminator is trained until it can accurately tell real images from fake ones, and the discriminator is then fixed while the generator is trained to fool it. This cycle continues, with the generator steadily improving, until the discriminator can no longer distinguish between real and generated images and a high-quality generator is obtained. GAN models are capable of generating images, 3D models, and even videos. However, they offer limited control over the output, which can lead to somewhat random results.
Source: freeCodeCamp, Google Scholar
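The opposing objectives of the two networks can be made concrete with a short sketch. It assumes the discriminator outputs a probability that an image is real; the function names are illustrative, and the generator loss shown is the non-saturating variant suggested in the original GAN paper rather than the only possible choice. The networks and the optimizer are left out.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real images should score near 1, generated ones near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants its fakes scored near 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)
```

In an alternating training loop, one step lowers `discriminator_loss` with the generator held fixed, and the next lowers `generator_loss` with the discriminator held fixed. When the discriminator outputs 0.5 for every image, it can no longer tell real from generated.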
Transformer
The Transformer, introduced by the Google team in 2017, utilizes a self-attention mechanism that assigns varying weights to different parts of input data based on their importance. It uses attention exclusively for feature extraction. The evolution of its network structure has led to an increase in the number of parameters and model depth, resulting in a significant improvement in the capabilities of generative AI technology. Its parallelization advantages enable training on larger datasets, contributing to the development of pre-trained models like GPT.
Source: Google Scholar
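The weighting mechanism can be sketched with scaled dot-product attention, the core operation of the Transformer. This is a simplified, single-head illustration in plain Python: the learned query, key, and value projections are omitted, so the caller passes the same token vectors for all three (an assumption made to keep the example short).

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # One weight per input position, reflecting its relevance to this query.
        weights = softmax([dot(q, k) / math.sqrt(d_k) for k in keys])
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights
```

Calling `attention(tokens, tokens, tokens)` on a list of token vectors gives self-attention. Each output row depends only on its own query and the shared keys and values, which is why all positions can be computed in parallel.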
Visual Transformer (ViT)
In 2020, the Google team introduced the Vision Transformer (ViT), which applies the Transformer architecture to image classification. ViT divides an input image into patches of 16x16 pixels and projects each patch into a fixed-length vector that is then fed into a Transformer. Subsequent operations are similar to the original Transformer. By incorporating human prior knowledge into network design, ViT-based models achieve faster convergence, lower computational cost, more diverse feature scales, and enhanced generalization capabilities. ViT excels at learning and encoding the knowledge embedded in data, making it a fundamental architectural choice in computer vision. Visual models like ViT give AI the ability to perceive and understand visual data, enhancing its perceptual capabilities.
Source: Google Scholar
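The patch-splitting step can be sketched as follows. The example assumes an image stored as nested lists of height x width x channels; the 224x224 RGB input mentioned in the usage note is a common choice assumed here, not a figure from this report. The learned linear projection that maps each flattened patch to a fixed-length vector is omitted.

```python
def patchify(image, patch_size=16):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    patch.extend(pixel)  # append this pixel's channel values
            patches.append(patch)
    return patches
```

For a 224x224 RGB image this produces 14 x 14 = 196 patches of 16 x 16 x 3 = 768 values each, and the resulting sequence of patches plays the same role as a sequence of word tokens in a text Transformer.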
GPT (Generative Pre-trained Transformer)
Introduced by OpenAI in 2018, the original GPT had 117 million parameters and was trained on a dataset of approximately 5GB of text. It leveraged massive amounts of unlabeled text for pre-training, equipping the model with the ability to understand and generate text even in scenarios with limited or zero labeled data. This development significantly enhanced the cognitive capabilities of generative AI models.
In 2020, GPT-3 was unveiled with 175 billion parameters and a pre-training dataset drawn from about 45 terabytes of raw text. Beyond its proficiency in common NLP tasks like natural language inference, sentence relationship assessment, question-answering, common-sense reasoning, and classification, GPT-3 demonstrated exceptional performance in more challenging tasks such as article writing, SQL statement generation, and JavaScript code generation. This accomplishment earned it a spot in MIT Technology Review's list of "10 Breakthrough Technologies" for 2021.
Source: OpenAI, Google Scholar
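The jump in scale between the two models can be checked with a back-of-the-envelope estimate. The sketch below uses a common approximation of about 12 x d_model^2 weights per Transformer layer and ignores embeddings, biases, and layer normalization. The layer counts and widths (12 layers of width 768 for the original GPT, 96 layers of width 12,288 for GPT-3) are taken from the published model descriptions and are assumptions relative to this report.

```python
def approx_transformer_params(num_layers, d_model):
    """Rough count of non-embedding weights: about 12 * d_model^2 per layer."""
    attention = 4 * d_model * d_model      # query, key, value, and output projections
    feed_forward = 8 * d_model * d_model   # two linear maps with a 4x hidden width
    return num_layers * (attention + feed_forward)

if __name__ == "__main__":
    print(approx_transformer_params(12, 768))     # original GPT, excluding embeddings
    print(approx_transformer_params(96, 12288))   # GPT-3, excluding embeddings
```

The estimate comes to roughly 85 million weights for the original GPT (the remainder of its 117 million parameters sits largely in the embedding tables) and roughly 174 billion for GPT-3, consistent with the 175 billion figure quoted above.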
CLIP (Contrastive Language-Image Pre-training)
Introduced by OpenAI in 2021, CLIP uses natural-language text as the supervision signal for learning visual tasks. Trained on a dataset of 400 million "text-image" pairs, CLIP can employ a Transformer to model sequences of image patches. It maps raw data from different modalities into a shared semantic space, enabling cross-modal understanding between different types of data. Because it can discern relationships between modalities, it serves as a bridge for translating and generating data across them: CLIP scores how well a piece of text matches an image, and when combined with generative models it supports producing textual descriptions from images or images from textual prompts. This significantly expands the scope of applications for generative AI technology and opens up new possibilities for Artificial General Intelligence (AGI).
Source: OpenAI, Google Scholar
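How a shared semantic space enables cross-modal matching can be sketched in a few lines. The example assumes the image and text encoders have already produced embedding vectors (the numbers in the usage note are made-up placeholders), and the temperature of 0.07 is an assumed value; in CLIP the temperature is a learned parameter.

```python
import math

def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def match_probabilities(image_emb, text_embs, temperature=0.07):
    """Softmax over cosine similarities between one image and several captions."""
    image_unit = normalize(image_emb)
    logits = [
        sum(a * b for a, b in zip(image_unit, normalize(text))) / temperature
        for text in text_embs
    ]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `match_probabilities([1.0, 0.1], [[0.9, 0.2], [-0.5, 1.0]])` assigns most of the probability to the first caption, whose embedding points in nearly the same direction as the image embedding. Contrastive pre-training pushes matching text-image pairs together and mismatched pairs apart in exactly this similarity space.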
Diffusion Model
The concept of diffusion models was initially introduced in 2015 in the paper "Deep Unsupervised Learning using Nonequilibrium Thermodynamics." In 2020, "Denoising Diffusion Probabilistic Models" (DDPM) was proposed for image generation. Diffusion models operate by progressively adding Gaussian noise to images during training to corrupt the training data. The model learns to reverse this noising process and then applies the learned denoising steps to synthesize new images from random inputs. The approach also has applications in generating molecular structures for drug molecules and protein molecules.
Source: OpenAI, Google Scholar
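The noising half of the process can be sketched directly. The example below assumes the linear noise schedule used in the DDPM paper (1,000 steps with beta rising from 0.0001 to 0.02) and treats an image as a flat list of pixel values; the learned denoising network that reverses the process is not shown.

```python
import math
import random

def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1)."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]
```

At early steps the result is almost the original image, while at the final step almost no signal remains and the sample is close to pure Gaussian noise. Generation runs the learned reverse process from such noise back to an image.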
DALL·E2
Introduced by OpenAI in April 2022, DALL·E2 builds upon CLIP to establish connections between text and images. By utilizing Diffusion, it generates images based on visual semantics, and it employs a prior model to map text semantics to corresponding visual semantics. Ultimately, DALL·E2 is capable of the following functions:
Generating images based on text descriptions.
Extending images beyond the canvas.
Editing images according to text descriptions, which enables the addition or removal of elements.
Creating variations of a given image while preserving its original style.
Source: OpenAI, Lei Fengwang, Google Scholar, Tencent Technology, CITIC Construction Investment
Current State of ChatGPT Development in China
Several companies in China are actively pursuing developments in the ChatGPT and AIGC (AI-Generated Content) domains. Baidu's motivation is clear: to defend its stronghold in the search business and establish a favorable position in the next-generation search engine market. Baidu's progress on a ChatGPT-style product is attributed to its significant investment in question-and-answer samples for its search engine operations, which ensures an ample volume of training samples. Other companies, including JD.com, Alibaba, and Pinduoduo, have also initiated efforts in the field of intelligent customer service.
TikTok (ByteDance) is gradually entering the AIGC domain and applying it within its ecosystem, shifting from a reliance on user-generated content towards AIGC. Some startups are also making their presence felt, such as AI Utopia, introduced by ListenAI, which features open dialogues similar to ChatGPT.
While many domestic companies are converging around the concepts of virtual agents and AIGC, there is currently no alternative to ChatGPT, and several technological bottlenecks persist. This is primarily due to the following four reasons:
Lack of foundational models domestically, leading to a lack of model iterations and accumulations. ChatGPT relies on InstructGPT, which boasts a 1:106 superiority over other models, including domestic ones.
Limited availability of real-world data within China. Except for Baidu, which has natural user search question-answer training samples, most other companies lack sufficient data.
Inadequate technological expertise within China. Various aspects of ChatGPT's development, such as data processing, cleaning, labeling, model training, and inference acceleration, pose technical challenges, significantly affecting the results. Additionally, even major domestic players have yet to find large-scale use cases for reinforcement learning frameworks.
The innovation landscape in China is still in its nascent stages. While the overall business environment is eager, the investment and returns will require some time to mature.