Date: 2023-05-05

Everyone is suddenly abuzz about AI. Since OpenAI released ChatGPT, the idea of artificial general intelligence seems like a plausible reality. Whether you believe AI will benefit the world or bring about disaster, it’s hard to deny how compelling ChatGPT/GPT-4 is when it comes to the quality of its answers and the range of tasks it can perform.

This has sparked an “AI arms race”, with the largest tech companies all desperately investing in their own AI capabilities after being beaten to the punch by OpenAI.

Permissionless data

ChatGPT was trained on a vast collection of written material found online, including a collection of some 7,000 unpublished books; WebText, a dataset built from millions of “high quality” outbound links from Reddit; CommonCrawl, a web archive containing petabytes of web data; Wikipedia entries; and more.