Jakub Orłowski, Krzysztof Antończak, Facundo Guerrero OLX ...€¦ · OLX Data Hub Jakub...

OLX Data HubJakub Orłowski, Krzysztof Antończak, Facundo Guerrero

Presto Summit 2019, New York City

Meet OLX, the biggest Web company you’ve

never heard of

Within classified ads, OLX Group is the largest global player

Present in 30 markets,

Leading position

in 27>300m

4Source: Company Information; Leading position refers to top 3 position based on MAUs as per SimilarWeb, Oct 2019; MAUs refers to Monthly Active Users

… with a strong local presence

+ 5,500 dedicated employees

+ 30 offices globally

Anatomy of a typical “BI Stack”

Typical Data Stack S3, Redshift, GitLab, Jenkins

- Tight coupling between compute nodes and storage

- Data is stored on the compute nodes

- Low usage of S3 (Spectrum adoption is slower than expected)

- Limited dependency management

- No scheduling standards (random low quality python scripts)

What are the problems we aim to solve?

- Complex cross-stack synchronisation mechanism

- “Reservoir” design discourages building on each other

- Use of multiple AWS regions makes sharing difficult and increase costs

- Separated ETL scheduling standards

Data Lake Shared Solutions

Divergent Solutions?

...and what if?

Data Lake

Data Hub

Data Lake

Multiple ExecutionEngines

Shared Solutions(Odyn)

Shared synchronisation system and code repository (and, hopefully, standards)

Shared support of multiple execution engines: Redshift, Athena, Presto, Spark

Use of Redshift will be an eng. choice and it’s expected to get lower

Shared storage in a single AWS region and same account

Shared Solutions

OLX Data Hub (“Odyn”) high level architecture overview

Storage

Operator

ODYNData Hub

Applications App 2 App 3 App ...

Config

Scheduler

Actual OLX Data Hub (“Odyn”) task configuration example

Migrating to PrestoWhy we decided to move out of the Redshift comfort zone

Typical data workflow of a “BI stack”

EXTRACT

TRANSFORM

“If you were entering Hadoop ecosystem 8-10 years ago, there was this mantra: bring compute to your

storage, tie them together; shipping data is so expensive.

That is no longer true. All modern architectures right now separate storage from compute. Grow your

data without limit, scale your compute power whenever you need.”

Kamil Bajda-Pawlikowski, Data Council NY, Nov 7-8, 2018

Introduced Athena for querying raw data

EXTRACT

TRANSFORM

Athena adoption failed :-(

● Query exhausted resources● The query timeout is 30 minutes● Generic raw data not so friendly for queries● CTaS usage increase

Looking for the best query execution engine for our needs

Introduced Presto for processing data

EXTRACT

TRANSFORM

Presto in production at OLX

● 30+ nodes in AWS (r5.8xlarge)● 20K+ queries daily● 100+ users in 20 teams over 5 countries● 1PB+ data on S3 (Parquet, ORC, JSON)

prestosql.io

OLX Data Platform

Presto InfrastructureWhere and how we run Presto

Where Presto is Running?

● Kubernetes cluster○ AWS EKS in Ireland○ Staging and Production○ Single Amazon availability zone

● We move Presto from EMR to Kubernetes (EKS) using a mix of spot and on-demand instances

● Store metrics in Prometheus and show them in Grafana

Sizes:● Production = 25 * r5.8xlarge● Staging = 16 * r4.2xlarge

Challenge

Presto has a static size for the cluster even where there is nothing to do, we need to have the workers nodes up

Presto “AutoScaling”

We developed our own “auto-scaling” solution for presto workers, allowing us to reduce the cost of the cluster when no queries are running on it

Next challenges

Presto still not 100% integrated in our current ecosystem.

● Cluster for analysts login using our Single Sign (OKTA) on system ● Use different IAM roles depending on user / catalog / table (GDPR).● Cost-Based Optimizer (using Hive Metastore)

joinolx.com

Jakub Orłowski, Krzysztof Antończak, Facundo Guerrero OLX ...€¦ · OLX Data Hub Jakub...

Documents

Krzysztof Jan Unidade de Áudio sem Fios Stelmachowski

Fotografie - Jakub Vašulkajv.fotografie.sweb.cz/Fotografie - Jakub Vašulka.pdf · fotografie F-1 Lucie Mrázová; ateliér Brno 2009 První fotografie se týká ateliérového fotografování

Alfareros de aquí o de allá: identidad estilística y ... · Revista Española de Antropología Americana 517 201, vol. 4, núm. 2, 1-Krzysztof Makowski y Gabriela Oré Alfareros

Foto: Instagram dominik 35 FAMILIA Y AMIGOS BALANCE ......2020/11/10 · entre el polonés Jan Krzysztof Duda y el estadunidense Fabiano Ca ruana, segundo clasificado en la lista

Historia cultural, historia de los semióforosse944350dd1b12d89.jimcontent.com/download/version/1473255764/... · HISTORIA CULTURAL, HISTORIA DE LOS SEMIÓFOROS Krzysztof Pomian 6

GRAN PREMIO INTERNACIONAL DE CASTILLA Y LEÓN · ADAM BILEWICZ / TOMASZ OLSZEWSKI 5 Selección Nacional de Polonia JAN SOUCEK / LUKAS NEPRAS 20'20''25 JAKUB SPICAR / RADEK SLOVE 6

European Cup Juniors Liberec 2013 · 2019-12-26 · SUKHYY, Davyd KHASHIEV, Islam BRIGIDA, Caio RUF, Daniel ALLANSSON, Victor JECMINEK, Jakub RINNE, Rasmus KIM, Yeongrae VOLEK, David

Comercio Electrónico Olx

Técnicas compositivas de la primera mitad del siglo XX- Influencia en las obras tempranas de Krzysztof Penderecki

Horizontes y cambios lingüísticos en la prehistoria de los ... · en la prehistoria de los Andes centrales Krzysztof Makowskia Resumen En el presente artículo comparo, desde la

Comercio electrónico a nivel de constumer to consumer (tratamiento legal de redes como mercado libre, kotear, olx, etc)

ANÁLISIS ESTRUCTURAL E INTERPRETATIVO DEL … · flauta y orquesta de Krzysztof Penderecki y su objeto es profundizar en el estudio de una de las ... Prokofiev, Schostakovich o

Seminario Mobile Innovus - api.ning.com · alarma del celular Escucha noticias en el radio Revisa la cotización del dólar Sube tres carros a OLX y revisa sus mensajes. ... también

· Web viewSacerdote polaco Krzysztof Charamsa, de 43 años, secretario adjunto de la Comisión Teológica Internacional, adscrita a la Congregación para la Doctrina de la Fe, profesor

REVISTA EL MUNDO DE LAS MARCASelmundodelasmarcas.com/wp-content/uploads/2019/11/... · El creador y arquitecto, Jakub Szczęsny, presentó el proyecto en el festival «Wola Art 2009,

KRZYSZTOF PENDERECKI Y SU PIEZA PARA VIOLONCHELO … · 2020. 10. 24. · marcha de la historia. Para conocer bien una composición de cualquier autor, en este caso Penderecki, debemos

venta de pianos - gradimi.com.mx · venta de piano en el salvador ... venta de piano en olx venta de piano en republica dominicana venta de piano fazioli pianoforte venta venta de

Presentación de la adolescencia (Celia,Víctor,Alberto, Ainhoa y Jakub)

1. LA COMISIÓN PERMANENTE APRUEBA SOLICITAR UN ... · Krzysztof Konaszewski) 12.00 – 12.20 Fotovoltáica y energía solar (Asociación de Energías Renovables) 12.20 – 12.40

La Revolucion Historiografica Francesa · PDF fileEstudios sobre historia cultural La sociedad hispano medieval. La ciudad ... Vovelle, Krzysztof Pomian, Roger Chartier y Jacques Revel,