Contents:
- ChatGPT Chronicle
- How We Missed the GPT Feast
- Can GPT Large Language Models Achieve AGI?
- Series Topic Preview

ChatGPT Chronicle
Let’s lay out a timeline. ChatGPT is a conversational UI on top of a GPT-3.5 series model. The timeline runs up to today and is organized around the most representative papers, models, and APIs.
Before 2020
- June 2017, Google released the Transformer paper.
- June and July 2017, OpenAI published its work on reinforcement learning from human preferences and the PPO algorithm, both of which ChatGPT uses.
- June 2018, OpenAI released GPT-1.
- November 2018, Google released BERT, and the NLP field has mainly built on this framework to study downstream tasks ever since.
- February 2019, OpenAI released GPT-2. OpenAI gained confidence and has focused on GPT ever since.
2020
- Early in the year, Covid-19 broke out. China closed its doors.
- January, OpenAI published its scaling laws for language models (the idea: model capability is strongly tied to parameter count and data size), which gave OpenAI the confidence to scale up both data and parameters.
- May, GPT-3 paper released.
- June, GPT-3 API released.
- September, the papers on the key prototype algorithm behind ChatGPT were released.
- December, European institutions released open-source datasets for GPT-3 replication.
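The scaling-law entry in the list above can be made concrete with a small sketch. It assumes a pure power-law relationship between loss and parameter count, of the form loss = (n_c / n_params) ** alpha. The function name and the constants `n_c` and `alpha` are illustrative placeholders chosen for this example, not fitted values from OpenAI's paper.

```python
def power_law_loss(n_params: float, n_c: float = 1e14, alpha: float = 0.08) -> float:
    """Loss predicted by a pure power law in parameter count.

    n_c and alpha are illustrative placeholders (assumptions), not fitted constants.
    """
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return (n_c / n_params) ** alpha


if __name__ == "__main__":
    # Each 10x step in parameters lowers the predicted loss by the same factor.
    for n in (1e8, 1e9, 1e10, 1e11):
        print(f"{n:.0e} parameters -> predicted loss {power_law_loss(n):.3f}")
```

Under this form, every 10x increase in parameters multiplies the predicted loss by the same factor, 10 ** -alpha (about 0.83 with the placeholder alpha of 0.08). That steady, predictable gain is what makes scaling up look like a safe bet.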
2021
- July, OpenAI released the Copilot prototype algorithm.
- August, Codex API released.
- November, the GPT-3 API became publicly available, but not in China.
- China closed its doors.
2022
- January, the GPT-3.5 API (text-davinci-002) was released. The model had been trained on GitHub code, and its reasoning ability improved significantly (whether the code training caused the improvement is a hypothesis the academic community still needs to demonstrate). With the help of alignment techniques, its ability to follow human instructions also improved significantly, as did the usefulness and harmlessness of its output.
- March, the GPT-3.5 paper was released, making the alignment algorithm public.
- May, OpenAI Codex was in use by 70 applications, including Copilot from GitHub, which Microsoft had acquired.
- August, Stability AI open-sourced Stable Diffusion. The text-to-image quality was usable, the speed was practical, and the code was open source, which set off an explosion in image generation. For a time, in China, AIGC seemed to be synonymous with image generation.
- September, Sequoia Capital published the blog post “Generative AI: A Creative New World”.
- Chinese researchers and developers did not have access to OpenAI’s API, but everyone could try image generation. So the Chinese Internet seemed to pay attention only to image generation, and attention to GPT large language models declined further.
- Meanwhile, after nearly a year of API access and UI exploration, nearly a year of trial and error with Chain of Thought and other prompt engineering techniques, and the cost and latency reductions brought by model acceleration techniques (such as Flash Attention and fixed-point arithmetic), the potential of GPT-3.5 had been unlocked (it became better, faster, and cheaper). Products from text generation companies such as Copy.ai and Jasper were gradually maturing.
- November, OpenAI released a new GPT-3.5 API model (text-davinci-003).
- December 1, ChatGPT was released. Musk and other celebrities began talking about ChatGPT, setting off the English-language Internet.
- Early December, self-media (independent content creators) on the Chinese Internet gradually began to discuss ChatGPT, mainly by translating Twitter posts. Scholars on Zhihu began to reflect. A week later, attention declined, and for the next two months only AI-focused self-media kept ChatGPT as their main topic.
- China closed its doors.
2023
- January, Microsoft announced a multibillion-dollar investment in OpenAI and began adding GPT across its entire product suite.
- February, the Chinese New Year holiday ended. Microsoft and Google took turns making announcements, and AI was mentioned repeatedly during the Nasdaq earnings season. The Chinese Internet is familiar with Microsoft, so ChatGPT finally set off the Chinese Internet, and attention soared.
- China opened up.
It is worth noting that the three years China spent behind closed doors because of the epidemic were the same three years in which OpenAI’s GPT developed, grew, and became a product.
How did we miss the GPT feast?
Now that we have reviewed the history, why didn’t we (China, and especially its AI community) realize earlier that OpenAI’s technological breakthrough had already reached the application level?
To recognize this, a person needs to meet three conditions:
- Be able to read and understand papers from institutions such as OpenAI, DeepMind, and Google (representative group: researchers)
- Be able to use OpenAI’s API to explore the models in the papers (representative group: early adopters among researchers)
- Be attuned to Silicon Valley and keep watching what products people are building with OpenAI’s API (representative group: VCs)
In China, we roughly estimate that the first category is about 1 in 100,000 people, the second category is about 1 in 1,000 of the first, and the third category is about 1 in 1,000,000. Anyone missing even one of the three conditions could not realize how far OpenAI had come. Which team brought all three types of people together, and did they interact enough? Is there any single person who has all three attributes? To make matters worse, researchers were locked inside the country for three years and could not go abroad for academic conferences and exchanges. I suspect many did not even attend online conferences, and there is a lot that cannot be seen from papers alone.
Let’s dig deeper. The first category can be divided into NLP (Natural Language Processing) researchers and other AI researchers (in areas such as computer vision, speech recognition, and machine learning).
The Chinese NLP research community has basically applied language models (especially BERT, not GPT) to various downstream NLP tasks. In academia, that means chasing leaderboards and publishing papers. In industry, it means building customer service bots, writing bots, and role-playing bots. These research methods are completely different from the essence of GPT: scaling up and alignment. (Almost) no one has been studying large language models (LLMs) as a possible route to artificial general intelligence (AGI).
As for other AI researchers, such as those in computer vision, most still focus on images. Even when they use the Transformer, they use it to solve image problems such as autonomous driving and image generation. This was true even of Karpathy, the AI director of Tesla Autopilot, who resigned from Tesla in 2022 to become an independent researcher and devote himself to large language models.
Karpathy once said that over the past ten years he had been obsessed with whichever direction in AI was progressing fastest, and that he had long been very interested in language models, but that he overlooked the power of scaling up. That is, with a simple objective (predict the next word), a simple architecture (the Transformer), enough parameters, and enough data (web text), a language model can develop emergent abilities that are invisible at small scale. Like others, he once thought reinforcement learning was the path to AGI (he was probably referring to early OpenAI), but in the end he found large language models to be the most promising path. Before that, language model researchers had spent too much energy on specific tasks.
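To show how simple the next-word objective is, here is a minimal sketch. A count-based bigram model stands in for the Transformer purely for illustration (GPT is not a bigram model), and the function names and toy sentence are hypothetical. The objective itself, the average negative log-likelihood of each next word, is the same quantity a large language model is trained to minimize.

```python
import math
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_loss(counts, tokens, vocab_size):
    """Average negative log-likelihood of each next word (add-one smoothing)."""
    pairs = list(zip(tokens, tokens[1:]))
    total = 0.0
    for prev, nxt in pairs:
        seen = counts.get(prev, Counter())
        prob = (seen[nxt] + 1) / (sum(seen.values()) + vocab_size)
        total -= math.log(prob)
    return total / len(pairs)


if __name__ == "__main__":
    text = "the cat sat on the mat the cat ate".split()
    vocab_size = len(set(text))
    untrained = next_word_loss({}, text, vocab_size)
    trained = next_word_loss(train_bigram(text), text, vocab_size)
    print(f"loss before training: {untrained:.3f}, after: {trained:.3f}")
```

An untrained model assigns every word the same probability, so its loss equals the log of the vocabulary size. Anything the model learns about which word tends to come next shows up as a lower loss.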
Let’s talk about another important group in the AI field: the computer vision group. In the deep learning wave that began in 2012, computer vision has been the most widely applied and most commercially successful direction, and it has absorbed too much of AI researchers’ energy. From image classification, detection, and segmentation to recognition, from images to video, from high-level vision to low-level vision, we have competed our way to one new height after another on convolutional neural networks. The YOLO object detection framework alone has been iterated to the point where the original author gave up, and others have pushed it to v7. The most representative example is the moonshot of computer vision, autonomous driving, which requires almost every visual AI technology: imaging, recognition, synthesis, mapping, and planning. From the CNN era to the Transformer era, it has kept drawing more people in, yet to this day no full autonomous driving solution has converged. Musk defined the problem correctly: autonomous driving is a real-world AI problem. But Tesla’s solution is clearly not yet ready for full autonomous driving.
The insularity of the NLP circle, the CV circle standing on the outside, three years of closed doors due to the epidemic, and the lack of information on the Chinese Internet: together, these factors wrapped the entire Chinese-speaking world in an information cocoon. The advantages in AI algorithms, data, and applications that we thought we had accumulated over 10 years have now turned into a huge gap between China and the United States. And so far there has not even been an investigative news report digging into how this happened.
Another problem is that the Chinese Internet cannot provide enough high-quality training data. What counts as high-quality data? Examples include Wikipedia, high-quality active forums, professional news, academic papers, high-quality code, and books.
Let’s look at what GPT-3’s training data consists of. The dataset with the largest weight is OpenWebText (the open-source version), which is built by collecting URLs shared on the Reddit forum and then crawling their content. Common Crawl is an open archive of Internet data (about half English, about 5% Chinese). Other representative data includes Wikipedia, Books (openly available books), the Stack Exchange technical Q&A community, GitHub code, arXiv papers, the RealNews news archive, and PubMed medical data. As you can see, the share of data produced by the Chinese Internet is so low that it is negligible. This problem also plagues many people trying to train Chinese large models. In practice, though, ChatGPT’s ability to communicate in Chinese far exceeds that of specialized Chinese large language models, and the reason is the translation ability that GPT learned implicitly.
Without good Chinese data, we can only hitch a ride on the data of the global Internet. Producing the kind of high-quality data described above requires an open community, and we seem to have no solution for that.
Can GPT large language models achieve AGI?
An LLM based on GPT, relying on language alone, will probably not reach AGI; it may be only “an exit on the highway to AGI” (Yann LeCun). But LLMs are enough to turn Internet infrastructure upside down, because they have both Logic and Memory. Logic is reasoning ability, and Memory is the recall of high-frequency knowledge. Memory can clearly be divided into on-chip and off-chip: on-chip memory is limited, while off-chip memory is unlimited. Going forward, we only need to focus on pushing the Logic of LLMs to the extreme, offload most low-frequency Memory outside the model, and pair the model with search and other query technologies. That would let us rebuild the entire front end and back end of the Internet. We are far from having used up the dividends of the scaling law. What limits us is only Moore’s Law and the manufacturing capacity of integrated circuits, the price of energy, and the acquisition of data.
In terms of integrated circuits, system-level Moore’s Law as represented by chiplets is not enough. We need foundries that can scale up.
In terms of energy, solar and wind power plus energy storage can solve many problems. Even more exciting is nuclear fusion technology, represented by Helion, which has a chance of reducing the price of energy by an order of magnitude, and then by more.
In terms of data, current GPT models rely on Internet text, which will eventually be exhausted. That does not matter: the data of the real world is infinite.
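The idea of keeping Logic inside the model while offloading low-frequency Memory to an external store paired with search can be sketched roughly as follows. This is only an illustration under simple assumptions: the "search" is plain word overlap rather than a real search engine, the function names are hypothetical, and no actual model is called. The sketch stops at building the prompt that a model would then reason over.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())


def retrieve(query, documents, top_k=2):
    """Rank documents by word overlap with the query and keep the best top_k."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    # Drop documents that share no words with the query at all.
    return [doc for doc in ranked[:top_k] if query_words & tokenize(doc)]


def build_prompt(query, documents, top_k=2):
    """Place the retrieved off-chip facts in front of the question."""
    facts = retrieve(query, documents, top_k)
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Facts:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    # The external memory: facts the model does not need to memorize.
    memory = [
        "GPT-3 paper was released in May 2020",
        "Stable Diffusion was open-sourced in August 2022",
        "PPO is a reinforcement learning algorithm",
    ]
    print(build_prompt("When was the GPT-3 paper released", memory, top_k=1))
```

The design choice is that the external store can grow without retraining the model, while the model only has to reason over the handful of facts placed in front of it.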
Series Topic Preview
I’ll stop here for now.
Planned:
- The story of OpenAI
- AI Alignment
- AI and Capitalism
- AI and Education
- People in the AGI era
By Dr. Hong, February 8, 2023

