Yoel Zeldes • 10/28/2018

How to Engineer Your Way Out of Slow Models

This article details how Taboola engineers optimized a computationally expensive multi-modal CTR prediction model without sacrificing accuracy. It explains the architecture using Cassandra for caching embeddings, a gRPC service (EmbArk), Kafka for async message queuing, and a dedicated embedding service to reduce inference latency.
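As a rough illustration of the caching idea in the summary above, here is a minimal sketch of a cache-aside embedding lookup. It is not Taboola's implementation: an in-memory dict stands in for Cassandra, Python's `queue.Queue` stands in for Kafka, and the names `EmbeddingCache`, `embed_fn`, `get`, and `drain` are hypothetical. The behavior on a cache miss (enqueue the item and return nothing for now) is likewise an assumption about how such a system might keep slow embedding work off the request path.

```python
import queue
from typing import Callable, Dict, List, Optional

Embedding = List[float]


class EmbeddingCache:
    """Cache-aside lookup: serve cached embeddings, defer misses to a worker."""

    def __init__(self, embed_fn: Callable[[str], Embedding]) -> None:
        # In-memory dict: hypothetical stand-in for a Cassandra table.
        self._store: Dict[str, Embedding] = {}
        # Thread-safe queue: hypothetical stand-in for a Kafka topic.
        self._pending: "queue.Queue[str]" = queue.Queue()
        # The expensive embedding model, injected by the caller.
        self._embed_fn = embed_fn

    def get(self, item_id: str) -> Optional[Embedding]:
        """Return the cached embedding, or enqueue the item and return None."""
        cached = self._store.get(item_id)
        if cached is None:
            # Miss: schedule the slow computation instead of blocking the caller.
            self._pending.put(item_id)
        return cached

    def drain(self) -> int:
        """Embed every queued item and store the result; return items handled."""
        processed = 0
        while True:
            try:
                item_id = self._pending.get_nowait()
            except queue.Empty:
                return processed
            # Skip recomputation if an earlier queued request already filled it.
            if item_id not in self._store:
                self._store[item_id] = self._embed_fn(item_id)
            processed += 1
```

In this sketch the first `get` for an unseen item returns `None`, so the caller would need some fallback for that request; once a background worker has called `drain`, later lookups for the same item are served from the store without running the model.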


#caching #Neural Networks #Embedding