musiclopedia ♫ ♪ ♪ ♫ discover the world of music

musiclopedia presentation


musiclopedia♯ ♮ ♭ ♬ ♫ ♪ ♩

♩ ♪ ♫ ♬ ♭ ♮ ♯

discover the world of music

motivation

demo

(artists,dates)

(1.4 TB, 240 million records)

pipeline

data sources

MusicBrainz

(artists,dates)

pipeline

Data Store & FrontEnd

Storage & Batch processing

data sources

MusicBrainz

(1.4 TB, 240 million records)

(artists,dates)

clusters

data sources

MusicBrainz

(1.4 TB, 240 million records)

HDFS datanode / Spark executor

HDFS datanode / Spark executor

HDFS datanode / Spark executor

HDFS namenode / Spark driver

Flask server

OrientDB master

OrientDB master

4 x m4.large (8 GB RAM ea. & 6 TB SSD total)

3 x m4.large (32 GB SSD total)

data flow

content

header

WARC/1.0
WARC-Type: conversion
WARC-Target-URI: http://www.biography.com/people/ella-fitzgerald-9296210
WARC-Date: 2014-08-02T09:52:13Z
WARC-Record-ID:
WARC-Refers-To:
WARC-Block-Digest: sha1:JROHLCS5SKMBR6XY46WXREW7RXM64EJC
Content-Type: text/plain
Content-Length: 6724

Ella Fitzgerald, known as the "First Lady of Song" and "Lady Ella," was an American jazz and song vocalist who interpreted much of the Great American Songbook...
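
The header/content split of a record like this one can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the project: the function name `parse_wet_record` is hypothetical, and the sample record is shortened to the fields that have values on the slide.

```python
def parse_wet_record(raw):
    """Split one WET record into (version, headers, body).

    Assumes a blank line separates the header block from the content,
    as in the record shown above.
    """
    raw = raw.replace("\r\n", "\n")          # WARC files use CRLF line endings
    head, _, body = raw.partition("\n\n")
    lines = head.splitlines()
    version = lines[0].strip()
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")   # split at the first colon only
        if sep:
            headers[name.strip()] = value.strip()
    return version, headers, body.strip()


# Shortened copy of the record on the slide.
sample = (
    "WARC/1.0\r\n"
    "WARC-Type: conversion\r\n"
    "WARC-Target-URI: http://www.biography.com/people/ella-fitzgerald-9296210\r\n"
    "WARC-Date: 2014-08-02T09:52:13Z\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 6724\r\n"
    "\r\n"
    'Ella Fitzgerald, known as the "First Lady of Song" ...\r\n'
)

version, headers, body = parse_wet_record(sample)
print(headers["WARC-Target-URI"])
```

Splitting at the first colon only keeps URL and timestamp values, which contain colons themselves, intact.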

data flow

www.biography.com/people/ella-fitzgerald-9296210, Ella Fitzgerald
www.oldies.com/product-view/47037M.html, Louis Armstrong
bojack.org/2007/06/knock_a_few_bucks_off.html, John Coltrane
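
One way to get from a record to (website, artist) pairs like these is to scan the page text for every name in the artist catalog. The sketch below assumes a plain case-insensitive substring match and a hypothetical helper `artist_mentions`; the actual matching rule used in the project is not shown on the slides.

```python
def artist_mentions(target_uri, text, artists):
    """Return (website, artist) pairs for every catalog name found in text.

    Assumption: a simple case-insensitive substring match stands in for
    whatever matching the real pipeline performs.
    """
    website = target_uri.split("://", 1)[-1]   # drop the scheme, as on the slide
    lowered = text.lower()
    return [(website, name) for name in artists if name.lower() in lowered]


catalog = ["Ella Fitzgerald", "Louis Armstrong", "John Coltrane"]
page_text = (
    'Ella Fitzgerald, known as the "First Lady of Song" and "Lady Ella," '
    "was an American jazz and song vocalist"
)
pairs = artist_mentions(
    "http://www.biography.com/people/ella-fitzgerald-9296210", page_text, catalog
)
print(pairs)
```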


data flow

challenges

- How to find the bands: Air, The Clash, Chicago?

~1.4 TB, 274M websites, 1000 artists

- Norah Jones vs Miles Davis?
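
The band-name problem is easy to reproduce. The sketch below compares a naive substring match with a case-sensitive whole-word match on a made-up sentence; both function names and the sentence are illustrative assumptions, not part of the project.

```python
import re


def naive_match(name, text):
    """Case-insensitive substring test: cheap, but matches inside other words."""
    return name.lower() in text.lower()


def whole_word_match(name, text):
    """Case-sensitive match that requires word boundaries around the name."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.search(pattern, text) is not None


# Made-up sentence that mentions none of the three bands.
text = "The airport in Chicago was closed after the clash between unions."

for band in ["Air", "The Clash", "Chicago"]:
    print(band, naive_match(band, text), whole_word_match(band, text))
```

The stricter match rejects "Air" and "The Clash" here, but "Chicago" still matches the city name, so string matching alone does not settle the question.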


about me

B.Sc. EE, Universidad de Chile

M.Mus. Music Technology, NYU

Artist catalog:
- MusicBrainz database (~1,000,000 entries)
→ Jazz subset (1,000 entries)

Artist relationship metric:
- CommonCrawl July 2015 log (~145 TB)
→ Uncompressed '.wet' files (~1.5 TB)

data specs

[Diagram: John Coltrane, Norah Jones, and Miles Davis, each linked to the websites (W1 to W10, W12, W13) that mention them]

        Miles  John  Norah  Total
Miles     5      2     2      9
John      2      4     1      7
Norah     2      1     9     12
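
A table of this shape could be tallied from (website, artist) pairs roughly as follows. The reading of the table is an assumption: off-diagonal cells count websites that mention both artists, diagonal cells count websites that mention only that artist, and the total is the row sum. The function name `link_table` and the sample URLs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def link_table(pairs):
    """Build a symmetric artist-by-artist link table from (website, artist) pairs."""
    artists_by_site = defaultdict(set)
    for site, artist in pairs:
        artists_by_site[site].add(artist)

    table = defaultdict(lambda: defaultdict(int))
    for artists in artists_by_site.values():
        if len(artists) == 1:
            (only,) = artists
            table[only][only] += 1            # site mentions a single artist
        for a, b in combinations(sorted(artists), 2):
            table[a][b] += 1                  # site mentions both a and b
            table[b][a] += 1
    return table


# Hypothetical input, not the data behind the slide.
pairs = [
    ("w1.example", "John"),
    ("w2.example", "Norah"),
    ("w5.example", "John"),
    ("w5.example", "Miles"),
    ("w7.example", "Miles"),
    ("w7.example", "Norah"),
]
table = link_table(pairs)
totals = {artist: sum(row.values()) for artist, row in table.items()}
print(dict(totals))
```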

model


Average links between any two artists “X” = (2+2+1)/3 = 1.667

Average links for a single artist “Y” = (9+7+12)/3 = 9.333

=> Average percentage “Z” = X/Y = 17.9 %
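
The same arithmetic can be checked in Python using the three-artist table. The dictionary layout and variable names are illustrative choices; the numbers are the ones from the table.

```python
from itertools import combinations

# Link counts from the three-artist example table.
links = {
    "Miles": {"Miles": 5, "John": 2, "Norah": 2},
    "John": {"Miles": 2, "John": 4, "Norah": 1},
    "Norah": {"Miles": 2, "John": 1, "Norah": 9},
}

# X: mean number of links over all distinct artist pairs, (2+2+1)/3.
pair_counts = [links[a][b] for a, b in combinations(links, 2)]
x = sum(pair_counts) / len(pair_counts)

# Y: mean row total per artist, (9+7+12)/3.
totals = [sum(row.values()) for row in links.values()]
y = sum(totals) / len(totals)

# Z: average share of an artist's links that go to one other artist.
z = x / y
print(f"X = {x:.3f}, Y = {y:.3f}, Z = {z:.1%}")
```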

bool areConnected(artist A, artist B) {
    aCountsInB = countLinks(A, B) / countLinks(B)
    bCountsInA = countLinks(A, B) / countLinks(A)

    if mean(aCountsInB, bCountsInA) > C * Z
        return true

    return false
}
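
A runnable Python version of this test, applied to the three-artist table, might look like the sketch below. Two assumptions are made: `countLinks(A, B)` is read as the table cell for A and B and `countLinks(A)` as the row total for A, and the constant C, whose value is not given on the slide, defaults to 1.0.

```python
# Link counts and row totals from the three-artist example table.
LINKS = {
    "Miles": {"Miles": 5, "John": 2, "Norah": 2},
    "John": {"Miles": 2, "John": 4, "Norah": 1},
    "Norah": {"Miles": 2, "John": 1, "Norah": 9},
}
TOTALS = {artist: sum(row.values()) for artist, row in LINKS.items()}
Z = (5 / 3) / (28 / 3)   # X / Y from the model slide, about 0.179


def are_connected(a, b, c=1.0):
    """Return True if the mean link share between a and b exceeds c * Z.

    c is a tuning constant; 1.0 is an assumed placeholder value.
    """
    shared = LINKS[a][b]
    a_counts_in_b = shared / TOTALS[b]
    b_counts_in_a = shared / TOTALS[a]
    return (a_counts_in_b + b_counts_in_a) / 2 > c * Z


for a, b in [("Miles", "John"), ("Miles", "Norah"), ("John", "Norah")]:
    print(a, b, are_connected(a, b))
```

With C = 1.0, Miles and John and Miles and Norah come out connected, while John and Norah do not; a larger C makes the test stricter.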