2012-03-09

Enabling block compression in Hive's sequence files

If you're like me, you like Hive, and you like storing some of your data in sequence files. You also like compressing your data, so that your data is snappy and delicious.

If you're like me, you may have taken these recommendations from the Hive wiki for enabling block compression in your sequence files.

Alas, we found that block compression wasn't actually happening. Looking at the Hadoop 0.20.203.0 source, we saw that the logic associated with the "io.seqfile.compression.type" setting is marked as deprecated.

We found it necessary to use the newer "mapred.output.compression.type" setting instead.
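For the curious, here is roughly what that looks like in a Hive session. Treat it as a sketch rather than our exact script: the table names (events_raw, events_seq) are made up for illustration, and the codec line is an assumption that only works if your Hadoop build ships the Snappy codec, so swap in whichever codec your cluster actually has.

```sql
-- Ask Hive to compress the output of queries that write to tables.
SET hive.exec.compress.output=true;

-- The newer setting that worked for us, in place of io.seqfile.compression.type.
SET mapred.output.compression.type=BLOCK;

-- Assumption: Snappy is installed on the cluster. Substitute another codec
-- (for example org.apache.hadoop.io.compress.GzipCodec) if it is not.
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Hypothetical tables: rewrite existing data into a sequence file table.
CREATE TABLE events_seq (id BIGINT, payload STRING) STORED AS SEQUENCEFILE;
INSERT OVERWRITE TABLE events_seq SELECT id, payload FROM events_raw;
```

The settings apply per session, so they need to be set before the INSERT that writes the sequence files (or placed in your Hive configuration if you want them everywhere).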
