October 13, 2004

At Wal-Mart, World's Largest Retail Data Warehouse Gets Even Larger

By Evan Schuman, eWEEK

It's only fitting that the largest retailer should have the world's largest database, but at more than half a petabyte, that's a lot of information, even for Wal-Mart.

The vendor supporting all those bytes of data, NCR's Teradata division, begged for extraordinary permission from the normally secretive Wal-Mart to announce this achievement Wednesday in order to make a point: It argues that its systems can scale without hiccups even at extreme sizes.

But Wal-Mart being Wal-Mart, it's not saying much. While confirming that it now has the world's largest data warehouse, and that it permitted its supplier to announce that, the company won't say anything beyond that it wanted "to acknowledge an important milestone," said Gus Whitcomb, Wal-Mart's director of corporate communications. He referred questions to Teradata, saying the announcement is Teradata's.

Beyond issuing a news release stating that Wal-Mart is "increasing its lead as the largest retail data warehouse in the world," Teradata gave no details on the warehouse's size or other specifics. The "more than 500 terabytes" figure came from a source who didn't want a name or a company linked to the figure.

The statement did, however, point out that this massive data warehouse is not solely a CRM system, but also serves as the base for Retail Link, the decision-support system shared between Wal-Mart and its suppliers. Retail Link allows suppliers to access large amounts of online, real-time, item-level data to help them improve operations.

Back at Teradata, officials are prohibited from discussing what they have done for Wal-Mart, but one vice president did take the opportunity to explain what the milestone means from an IT perspective.

"The issues we encounter at Wal-Mart are really not all that different from smaller retail data warehouses," said Rob Berman, vice president of Teradata's retail operations. He contrasted Wal-Mart's current data warehouse size with its earliest stage, when it was literally less than one-thousandth of its current size.

"When Wal-Mart started with a 320-GByte data warehouse, it used one database administrator [DBA]. Today, the number of DBAs is still fewer than five," Berman said.
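The "less than one-thousandth" comparison can be checked with quick arithmetic. The sketch below assumes decimal units (the article does not say whether the figures are decimal or binary) and uses 500 terabytes as a lower bound for the current size, since the only figure given is "more than 500 terabytes."

```python
# Rough check of the size comparison, under the assumptions stated above.
GB = 10**9   # decimal gigabyte (assumption)
TB = 10**12  # decimal terabyte (assumption)

initial_bytes = 320 * GB   # the "320-GByte" starting warehouse
current_bytes = 500 * TB   # lower bound: "more than 500 terabytes"

# Fraction of today's size that the original warehouse represents.
ratio = initial_bytes / current_bytes

# How many times larger the warehouse is now (at least).
growth_factor = current_bytes / initial_bytes

print(f"initial / current = {ratio:.5f}")
print(f"growth factor     = {growth_factor:.1f}x")
print("less than one-thousandth:", ratio < 1 / 1000)
```

By this estimate the data grew by a factor of more than 1,500, while the DBA headcount, by Berman's account, grew from one to fewer than five.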

Unlike a typical database that can get slower as it expands—and requires more time to complete backups and virus scans, for example—Berman argues that Teradata's approach sidesteps those growth issues. "Our system is nearly 100 percent linear-scalable. It's designed to scale without the management restrictions of other databases."

How so? "Every time we add a node, we add an equal amount of bandwidth," he said. "Every time we add a component of processing power, we add another component of bandwidth. We just grow the highway. Every time they grow in DASD [direct-access storage device], we add I/O bandwidth."
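Berman's highway analogy can be illustrated with a toy model. This is only a sketch of the general idea that bandwidth grows with node count; it is not Teradata's architecture, and the per-node storage and bandwidth figures and the function names are hypothetical values chosen for illustration.

```python
# Toy model: compare a system where each node brings its own bandwidth
# with one where all data flows through a single fixed-size pipe.
# All numbers below are hypothetical.

GB_PER_NODE = 2000          # storage each node holds (assumed)
GB_PER_SEC_PER_NODE = 1     # I/O bandwidth each node contributes (assumed)
SHARED_PIPE_GB_PER_SEC = 10 # fixed bandwidth of the contrasting design (assumed)


def scan_seconds_scale_out(nodes: int) -> float:
    """Full-scan time when every node reads its own share in parallel."""
    total_gb = nodes * GB_PER_NODE
    aggregate_bandwidth = nodes * GB_PER_SEC_PER_NODE
    return total_gb / aggregate_bandwidth


def scan_seconds_shared_pipe(nodes: int) -> float:
    """Full-scan time when the same data must pass through one fixed pipe."""
    total_gb = nodes * GB_PER_NODE
    return total_gb / SHARED_PIPE_GB_PER_SEC


for nodes in (10, 100, 250):
    print(
        f"{nodes:>4} nodes: "
        f"scale-out {scan_seconds_scale_out(nodes):>8.0f} s, "
        f"shared pipe {scan_seconds_shared_pipe(nodes):>8.0f} s"
    )
```

In the scale-out case the scan time stays flat as nodes and data are added together, because the node count cancels out of the ratio. In the fixed-pipe case the scan time grows in step with the data.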