The 2024 Future of Memory and Storage (FMS) conference provided important insights into digital storage and memory applications and product development. For 2024, FMS changed its name (but not its initials) to better reflect a focus on all aspects of digital storage and memory technology, as well as how storage architectures and the storage hierarchy are changing to support new AI storage and memory workflows. This article looks at announcements by Kioxia and SK hynix as well as an announcement by Lam Research (made before the 2024 FMS) on technology for increasing the layer count in 3D NAND to 1,000 layers.
Kioxia gave a keynote talk that included Atsushi Inoue and Neville Ichhaporia (from Kioxia) as well as Rob Davis from NVIDIA (formerly Mellanox), following up on the scaling discussions in the Western Digital keynote at the 2023 FMS. Creating higher 3D stacks increases production costs, and the net density gain has been diminishing with additional flash layers. This is shown in the chart below, along with the reduction in capex spending for NAND flash production during the downturn that started in 2022 and continued through most of 2023.
Kioxia defined capex efficiency as the ratio of production GB output to capex spending. Capex efficiency declines when vertical scaling (adding layers) is pursued alone. Kioxia said that extra layers could be added with better capex efficiency if they were combined with lateral feature shrinking achieved through memory hole pitch reduction, staggered memory holes, write layer overhead shrinking and other approaches. Using this approach, Kioxia said that its Gen 8 BiCS flash product achieved higher memory density at lower layer counts than competitors by combining lateral shrinking with some vertical scaling (more layers), as shown below.
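The tradeoff can be illustrated with a simple sketch of that ratio. All of the node figures below are hypothetical, chosen only to show the shape of the argument, and are not Kioxia numbers.

```python
# Simplified sketch of the capex-efficiency ratio described above:
# gigabytes of NAND output per dollar of capital spending.
def capex_efficiency(gb_output: float, capex_spend_usd: float) -> float:
    return gb_output / capex_spend_usd

# Hypothetical illustration: adding layers alone raises cost faster than output,
# while combining more layers with lateral shrink raises output for the same spend.
vertical_only = capex_efficiency(gb_output=1.2e9, capex_spend_usd=1.5e9)
vertical_plus_lateral = capex_efficiency(gb_output=1.6e9, capex_spend_usd=1.5e9)
print(f"layers only: {vertical_only:.2f} GB/$, layers + lateral shrink: {vertical_plus_lateral:.2f} GB/$")
```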
The result of these changes at Kioxia was an 80% increase in interface performance, a 10% reduction in read latency, 30% better write power efficiency and 20% better write performance for Kioxia's Gen 8 compared to its Gen 6 3D NAND flash. In addition to its 1Tb TLC Gen 8 product die, Kioxia introduced a 2Tb QLC die, providing more than 23 Gb/mm² density. This QLC product can provide 4TB per 16-die stack and more than 128TB in an SSD. As shown in the slide below, Kioxia envisions highly capex-efficient Gen 9 and Gen 10 products using its lateral as well as vertical scaling approach.
Using advancing PCIe interfaces, Kioxia said that with a 2U top-of-rack switch in a 40U rack it could achieve up to 20TB/drive, up to 7.5PB/1U and up to 304PB/rack. Kioxia also said that with an embedded controller memory buffer (CMB) and a direct memory access controller (DMAC) in its SSDs, it could provide data integrity at scale (data scrubbing) without the overhead penalty of moving data between system DRAM and CPU cache. The company also talked about the roles of its XL-Flash CXL memory module and SSD storage in AI data processing, as shown below.
Rob Davis from NVIDIA talked about the changing storage and memory hierarchy for AI, particularly for Retrieval-Augmented Generation (RAG) AI. This is an AI framework that combines the strengths of traditional information retrieval systems (like databases) with the capabilities of generative large language models (LLMs). By combining the additional knowledge from traditional information retrieval systems with LLM language skills, more accurate and relevant information is available from the LLM.
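A minimal Python sketch of that RAG flow appears below: retrieve the stored passages most similar to the query, then hand them to a generative model along with the question. The embed function and the final LLM call are stand-ins for illustration, not any particular vendor's API.

```python
# Minimal RAG sketch: retrieval from a small document store plus prompt augmentation.
from typing import List

def embed(text: str) -> List[float]:
    # Stand-in embedding; a real system would call an embedding model here.
    return [float(ord(c) % 7) for c in text.lower()[:16]]

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, documents: List[str]) -> str:
    context = "\n".join(retrieve(query, documents))
    # A real system would send this augmented prompt to a generative LLM.
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "RAG pairs an information retrieval step with a generative language model.",
    "QLC NAND stores four bits per memory cell.",
    "HBM stacks DRAM die to reach very high bandwidth.",
]
print(build_prompt("What is retrieval-augmented generation?", docs))
```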
Rob talked about SSDs that could hold complex vector databases inside the SSD for augmented data retrieval. Kioxia AiSAQ (All-in-Storage ANNS, or approximate nearest neighbor search) is a research project that would require minimal DRAM while delivering better AI performance and accuracy compared to traditional approaches as well as conventional SSD-based approaches, as shown below.
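The general idea behind keeping the vector index on storage rather than in DRAM can be sketched as follows. This is a brute-force, memory-mapped stand-in meant only to show the bounded DRAM footprint, not Kioxia's AiSAQ algorithm; the dimensions and file name are arbitrary.

```python
# Illustrative only: nearest-neighbor search over an embedding table that stays
# on the SSD (memory-mapped) instead of being loaded into DRAM.
import numpy as np

DIM, COUNT, CHUNK = 128, 10_000, 1024
rng = np.random.default_rng(0)

# Write an example embedding table to disk once.
rng.standard_normal((COUNT, DIM), dtype=np.float32).tofile("embeddings.bin")

# Map the table from storage; only the pages touched by a query are brought into DRAM.
index = np.memmap("embeddings.bin", dtype=np.float32, mode="r", shape=(COUNT, DIM))

def top_k(query: np.ndarray, k: int = 5) -> np.ndarray:
    # Scan the on-storage table in chunks so the DRAM working set stays small.
    scores = np.empty(COUNT, dtype=np.float32)
    for start in range(0, COUNT, CHUNK):
        chunk = index[start:start + CHUNK]
        scores[start:start + len(chunk)] = chunk @ query
    return np.argsort(scores)[-k:][::-1]

print(top_k(rng.standard_normal(DIM, dtype=np.float32)))
```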
Unoh Kwon, VP and Head of HBM PI at SK hynix, talked about the company's DRAM solutions for AI memory. There is a surge in demand for ultra high-speed processing of big data to support various AI models, and this processing is driving the demand for high performance memory. New memory architectures such as CXL and domain-specific processors such as neural processing units (NPUs) are changing the way processing, including AI, is done.
High bandwidth memory (HBM) is widely used in AI processing due to its high performance. HBM3 runs at 0.8TB/s and HBM3E at 1.2TB/s per stack. HBM3E supports 12-high die stacks, while HBM4 will support up to 16-high die stacks using a hybrid bonding technique. The chart below shows SK hynix's HBM roadmap, which appears to include some built-in logic processes in HBM4.
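Those per-stack figures line up with a back-of-the-envelope check using the standard 1024-bit HBM interface and commonly cited per-pin data rates of roughly 6.4 Gb/s for HBM3 and up to 9.6 Gb/s for HBM3E. The pin rates here are assumptions drawn from the published standards, not from SK hynix's talk.

```python
# Per-stack HBM bandwidth = per-pin data rate x interface width, converted to TB/s.
def stack_bandwidth_tbps(pin_rate_gbps: float, bus_width_bits: int = 1024) -> float:
    return pin_rate_gbps * bus_width_bits / 8 / 1000

print(f"HBM3:  ~{stack_bandwidth_tbps(6.4):.2f} TB/s")   # ~0.82 TB/s
print(f"HBM3E: ~{stack_bandwidth_tbps(9.6):.2f} TB/s")   # ~1.23 TB/s
```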
SK hynix is working to provide optimal AI memory for each customer using custom IP and processes. In addition to HBM, SK hynix is working on processing-in-memory (PIM) solutions, which can provide performance improvements and greater energy efficiency by using the high bandwidth within the memory rather than moving data around for processing, as shown below. The company is also working on LPDDR5 for direct-connect memory with up to 50% higher bandwidth and 21% lower power. SK hynix is also providing products supporting scale-out memory using CXL as well as CXL-enabled memory pooling.
Chunsung Kim, VP, SSD PMO at SK hynix, spoke about the company's NAND solutions for generative AI. He pointed out that AI training data sets have been growing 3X per year since 2010 and that AI models are becoming increasingly powerful and diverse. The data workflow for AI training and subsequent inference involves many steps, and these different steps require different kinds of digital storage. AI also requires more data center resources: average processor active power has grown 5X since 2014, with a 3X increase in average rack power. SSDs are being used for cache memory in GPU applications, and client inference devices require faster SSDs.
Current GenAI-ready SSD solutions include a PCIe Gen5 enterprise SSD with good IOPS/W and a 61TB QLC enterprise SSD in various enterprise and data center form factors. For on-device AI, the company offers a PCIe Gen5 client SSD and a zoned UFS device for mobile applications, both offering higher performance and energy efficiency. The image below shows SK hynix's view of future AI memory solutions, including DRAM and SSD solutions, both revolutionary and evolutionary. The LPDDR AiM is targeted at higher-performance on-device AI in mobile applications. SK hynix will also be developing compute-in-memory as well as compute-in-storage solutions to improve AI performance and to lower energy consumption.
Although the announcement came before the 2024 FMS, Lam Research, a manufacturer of semiconductor production equipment and processes, announced its Cryo 3.0 cryogenic etch technology. This is the third generation of the company's cryogenic dielectric etch technology, which can be used for ultra-cold, high-power, confined-plasma reactive etching of high aspect ratio features (up to 10 microns deep) with critical dimension deviation of less than 0.1% from top to bottom.
Such high aspect ratio etching is needed for making 3D NAND flash with up to 1,000 layers. Lam says that, combined with its Vantex dielectric etch system, this technology can etch 2.5X faster with better wafer-to-wafer repeatability, a 40% reduction in energy consumption per wafer and up to a 90% reduction in emissions compared to conventional etch processes.
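For a sense of the aspect ratios involved, the quoted 10 micron etch depth against a typical 3D NAND channel-hole diameter of roughly 100 nm (an assumed, typical value, not a figure from Lam's announcement) works out to about 100:1:

```python
# Rough aspect-ratio arithmetic for the etch depths quoted above.
etch_depth_um = 10.0      # up to 10 microns, per the announcement
hole_diameter_um = 0.1    # assumed ~100 nm memory-hole diameter (typical, not from Lam)
print(f"aspect ratio ~ {etch_depth_um / hole_diameter_um:.0f}:1")
```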
Kioxia and SK hynix announced new storage and memory products and future roadmaps with a special focus on advanced AI training and inference. Lam Research announced advanced etching to support the creation of up to 1,000-layer 3D NAND flash.