Network analysis of biomedical data

Artem Ryblov, Alexey Zaikin, Oleg Blyuss, John Timms

Lobachevsky State University of Nizhniy Novgorod

Institute for Women’s Health, UCL

19.05.15

Goal

Analyze biomedical data in order to predict diseases (pancreatic cancer, diseases of the digestive system) at various stages with the help of machine learning techniques

Content

• Data and disease

• Methods

• Results

• The biggest program in the field of women's cancer research

• Screening study: 200 out of 200 000 women – 100 cases/controls, 100 oncomarkers

• Data available up to 12 years prior (!) to diagnosis

• Data stored in biobank and available for later research

Pancreatic cancer

Biomedical data

…An oncomarker (tumor marker) is a biomarker found in the blood, urine, or body tissues that can be elevated in the presence of cancer. There are many different tumor markers, each indicative of a particular disease process, and they are used in oncology to help detect the presence of cancer.

93 markers

Logistic regression

ROC analysis (curve)
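The area under the ROC curve (AUC) quoted later in the slides can be read as the probability that a randomly chosen case scores higher than a randomly chosen control. The sketch below computes it directly from that definition. The function name `roc_auc` and the marker values are illustrative assumptions, not data or code from the study; the score could equally be a single marker level or a probability predicted by a logistic regression.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    scores: one number per subject (marker level or model output).
    labels: 1 = case, 0 = control. Tied pairs count as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical marker levels (made-up numbers, not study data).
marker = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1]
is_case = [1, 1, 1, 0, 0, 0, 0]
auc = roc_auc(marker, is_case)  # 11 of the 12 case/control pairs are ordered correctly
```

The pairwise loop is quadratic in the number of subjects, which is harmless for cohorts of this size.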

In search of a network oncomarker, or a multi-multi-marker assay

• Computer networks, e.g. WWW (the Internet)

• Functional networks, e.g. part of the genome, human brain

• What should we do if we have many variables for cases/controls?

Pancreatic cancer: 100 markers, 14 cases, 36 controls

[Figure: two nodes, marker i and marker j, connected by an edge that carries an edge weight]

Apply a threshold for edge weights!

Parenclitic network analysis

[Figure: thresholded networks for Cancer and Control; threshold 5.8; Cancer: 14, Control: 36]
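The slides do not state how the edge weight between marker i and marker j is computed. A minimal sketch, assuming the construction commonly used for parenclitic networks: fit a straight line to each marker pair using the controls only, then weight the edge for a given subject by how far that subject lies from the line, in units of the controls' residual standard deviation. The names `fit_line` and `parenclitic_edges` and all numbers are hypothetical; only the threshold value 5.8 comes from the slides.

```python
from itertools import combinations
from math import sqrt


def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; returns (a, b, residual_std)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = sqrt(sum(r * r for r in resid) / (n - 2))
    return a, b, sd


def parenclitic_edges(controls, subject, threshold):
    """Return {(i, j): weight} for marker pairs whose weight exceeds threshold.

    controls: list of control subjects, each a list of marker values.
    subject:  marker values of the subject whose network is built.
    ASSUMPTION: weight = |deviation from the control regression line| / residual std.
    """
    edges = {}
    for i, j in combinations(range(len(subject)), 2):
        xs = [c[i] for c in controls]
        ys = [c[j] for c in controls]
        a, b, sd = fit_line(xs, ys)
        weight = abs(subject[j] - (a + b * subject[i])) / sd
        if weight > threshold:
            edges[(i, j)] = weight
    return edges


# Made-up values for 3 markers in 5 controls (not study data).
controls = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.7],
    [3.0, 6.2, 0.4],
    [4.0, 7.8, 0.6],
    [5.0, 10.1, 0.5],
]
patient = [3.0, 12.0, 0.5]  # marker 1 is far too high for this level of marker 0
edges = parenclitic_edges(controls, patient, threshold=5.8)
```

Here only the pair (0, 1) survives the threshold, so this subject's network has a single edge. Regressing marker j on marker i is a simplification: the result depends on the direction of the regression, and a real analysis would need to handle constant markers and very small control groups.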

Topological indexes: 32 well-established indexes (centrality scores), mean/max/min degree of nodes, betweenness, closeness, PageRank, …
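Each subject's thresholded network is summarised by a handful of numbers that then serve as features. As a minimal sketch, two of the listed indexes (degree statistics and closeness) can be computed from an edge list as follows. The function names and the example network are illustrative assumptions; betweenness and PageRank are left out to keep the example short.

```python
from collections import deque


def degree_stats(n_nodes, edges):
    """Mean, max and min node degree of an undirected graph."""
    degree = [0] * n_nodes
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return sum(degree) / n_nodes, max(degree), min(degree)


def closeness(n_nodes, edges, source):
    """Closeness of `source`: reachable nodes / sum of shortest-path lengths.

    Distances come from a breadth-first search; an isolated node gets 0.0.
    """
    neighbours = {v: [] for v in range(n_nodes)}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbours[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical thresholded network on 5 markers (illustrative only).
edges = [(0, 1), (0, 2), (0, 3)]
mean_deg, max_deg, min_deg = degree_stats(5, edges)
hub_closeness = closeness(5, edges, 0)
leaf_closeness = closeness(5, edges, 1)
```

In this star-shaped example the hub (node 0) has the highest closeness, and the isolated node 4 pulls the minimum degree down to zero.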

Multivariate forward-backward selection in the logistic model, 11 indexes: 97% AUC vs. 95% AUC with logistic regression only

For proteomics data the improvement is more impressive: 87% AUC vs. 76% AUC with logistic regression only

Conclusion

• Important goal

• Interesting methods

• A lot of work to do

Summary

• Network analysis of oncomarkers is a route to early diagnosis of pancreatic cancer

• The parenclitic network approach can be used in multimarker analysis where the number of markers is large

• Open questions: What indices can we use? What data can we analyze?

• Extend the parenclitic network approach to categorical data (smoking, medication use, hormone therapy)

• Use cross-validation techniques
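For the cross-validation item, one possible starting point is a stratified split, so that every fold contains both cases and controls despite the small cohort. This is a minimal sketch under that assumption; `stratified_folds` is a hypothetical helper, and only the group sizes (14 cases, 36 controls) are taken from the slides.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# 14 cases and 36 controls, the group sizes quoted in the slides.
labels = [1] * 14 + [0] * 36
folds = stratified_folds(labels, k=5)
cases_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Each fold then serves once as the test set. To keep the AUC estimate honest, the control regressions, the edge-weight threshold and the index selection would all need to be refit on the training folds only.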

THANK YOU FOR YOUR ATTENTION!!!