Randy Zwitch 1/28/2016

A Million Text Files And A Single Laptop

This article addresses a common data engineering problem: efficiently processing millions of small, similarly formatted text files whose combined size is too large for RAM, but which don't justify a big data framework. It demonstrates a solution that runs on a single modern laptop using GNU Parallel, stream processing, and command-line tools, with examples in R and Python for generating the test data.
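The core idea behind stream processing here is that only one record needs to be in memory at a time, no matter how many files there are. The sketch below illustrates that idea with the Python standard library; it is not the author's code. The `id,value` CSV schema and the function names `generate_files`, `stream_rows`, and `summarize` are hypothetical, and the single-process loop stands in for the per-file work that the article fans out across cores with GNU Parallel.

```python
import csv
import random
import tempfile
from pathlib import Path


def generate_files(directory: Path, n_files: int, rows_per_file: int, seed: int = 0) -> None:
    """Write n_files small CSV files that share one (hypothetical) format: id,value."""
    rng = random.Random(seed)
    for i in range(n_files):
        path = directory / f"file_{i:06d}.csv"
        with path.open("w", newline="") as fh:
            writer = csv.writer(fh)
            for j in range(rows_per_file):
                writer.writerow([f"{i}-{j}", rng.randint(0, 100)])


def stream_rows(directory: Path):
    """Yield one parsed row at a time, so memory use stays flat regardless of file count."""
    for path in sorted(directory.glob("*.csv")):
        with path.open(newline="") as fh:
            for record_id, value in csv.reader(fh):
                yield record_id, int(value)


def summarize(directory: Path) -> tuple[int, int]:
    """Count rows and sum the value column in a single streaming pass."""
    count = total = 0
    for _, value in stream_rows(directory):
        count += 1
        total += value
    return count, total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        generate_files(d, n_files=1000, rows_per_file=10)
        print(summarize(d))
```

Because `stream_rows` is a generator, the aggregation touches each file once and never builds a list of all rows. Parallelizing this would mean splitting the file list into independent chunks, summarizing each chunk separately, and adding the partial counts and totals together. That is the same shape of work a tool like GNU Parallel distributes across processes.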
