BDA - Big Data Analytics
This course offers a non-technical introduction to the emerging and interdisciplinary area of data science. Students will be introduced to the development, fundamental tools, and impact of data science in a wide range of disciplines such as business, the sciences, and engineering. Fundamental data visualization techniques and basic concepts of machine learning will be applied through real-life data science projects. Students will also explore a general framework for ethical thinking and practice in data science, along with the current challenges, benefits, and potential harms and risks posed by developing data science models and technology.
An introductory course on programming languages and tools which are relevant to data analytics. Each language or tool is introduced as a separate module and incorporates applications in mathematics and statistics. Examples of included programming languages and tools are MATLAB, Python, R and SAS. Additional languages and tools may be covered based on current trends in data analytics. Students will complete hands-on programming assignments throughout the course.
An introductory course on machine learning. Machine learning is the science of discovering patterns and structure in data sets and making predictions from them. It lies at the interface of mathematics, statistics, and computer science. The course gives an elementary summary of modern machine learning tools. Topics include regression, classification, regularization, resampling methods, and unsupervised learning. Students enrolled are expected to have some ability to write computer programs and some knowledge of probability, statistics, and linear algebra.
The statistical perspective of data mining is emphasized for the majority of the course. Both applied aspects (programming, problem solving, and data analysis) and theoretical concepts (learning, understanding, and evaluating methodologies) of data mining will be covered. Topics include regularization and kernel smoothing methods, tree-based methods, and neural networks, with optional topics such as deep learning.
Topics considered include the solution of non-smooth optimization problems arising in data science, including unconstrained and constrained optimization problems, Lagrange multiplier methods, inequality constraints, Kuhn-Tucker conditions, and applications. Also considered are linear and nonlinear inverse problems; regularization of ill-posed problems, including singular value decomposition, Tikhonov regularization methods, and sparse regularization methods; inverse eigenvalue problems; and applications such as compressed sensing, image reconstruction, and machine learning.
This course introduces students to practical applications of big data analytics. Lectures include an overview of areas in business, engineering, and government that currently use big data analytics. Students will choose a project involving a real-world application to explore techniques learned in other coursework. The course involves written and oral presentations for students to improve communication and teamwork skills.
This course allows the student to pursue an in-depth exploration of a project initiated in BDA 450. The course involves written and oral presentations for students to improve communication and teamwork skills.
An introductory course on programming languages and tools which are relevant to data analytics. Each language or tool is introduced as a separate module and incorporates applications in mathematics and statistics. Examples of included programming languages and tools are MATLAB, Python, R and SAS. Additional languages and tools may be covered based on current trends in data analytics. Students will complete hands-on programming assignments throughout the course.
An introductory course on machine learning. Machine learning is the science of discovering patterns and structure in data sets and making predictions from them. It lies at the interface of mathematics, statistics, and computer science. The course gives an elementary summary of modern machine learning tools. Topics include regression, classification, regularization, resampling methods, and unsupervised learning. Students enrolled are expected to have some ability to write computer programs and some knowledge of probability, statistics, and linear algebra.
The statistical perspective of data mining is emphasized for the majority of the course. Both applied aspects (programming, problem solving, and data analysis) and theoretical concepts (learning, understanding, and evaluating methodologies) of data mining will be covered. Topics include regularization and kernel smoothing methods, tree-based methods, and neural networks, with optional topics such as deep learning.
Topics considered include the solution of non-smooth optimization problems arising in data science, including unconstrained and constrained optimization problems, Lagrange multiplier methods, inequality constraints, Kuhn-Tucker conditions, and applications. Also considered are linear and nonlinear inverse problems; regularization of ill-posed problems, including singular value decomposition, Tikhonov regularization methods, and sparse regularization methods; inverse eigenvalue problems; and applications such as compressed sensing, image reconstruction, and machine learning.
This course will introduce the mathematical foundations of machine learning theory and algorithms. Topics include statistical learning theory, kernel methods, and generative models. Some modern machine learning methods such as dictionary learning, deep learning, online learning, and reinforcement learning may also be included, time permitting. Students enrolled are expected to have some knowledge of probability, linear algebra, optimization, and analysis.
This course will introduce optimization methods for large-scale problems that exploit special structures, including convexity and sparsity. Topics include an introduction to convexity, gradient-related methods, dual methods, sparse optimization methods, and nonconvex optimization methods. Students enrolled are expected to have some knowledge of linear algebra, optimization, probability, and analysis.
Under the guidance of a faculty member in the Department of Mathematics and Statistics, the student will undertake a significant computational data analysis problem. A written report and/or public presentation of results will be required.
Introductory discussion of the central dogma of molecular biology, the concepts of transcription, translation, and gene regulation, and the need for high-throughput methods. Other topics covered are an introduction to R and Bioconductor, advanced microarray data analysis, NGS data analysis using edgeR in Bioconductor, network biology, sequence and pathway informatics, SNPs, GWAS, and informatics for genome variants.
Techniques for obtaining basic tail bounds and concentration inequalities, uniform laws of large numbers, Rademacher complexity of a set, covering and packing in metric spaces, and metric entropy. High-dimensional random matrices are also studied in a non-asymptotic framework, with a focus on the estimation of sparse and structured covariance matrices. Sparse linear regression models and principal component analysis in the unstructured and sparse settings will also be covered.
An introduction to the statistical analysis of sample curves or functions. Topics include smoothing, registration, functional principal component analysis, scalar-on-function regression, and functional response models. All these techniques will be applied using the statistical software R.
Various transform methods that map data to coefficients in certain discrete bases are studied. Transforms studied include the FFT, the DCT, wavelet transforms, and framelet transforms. Both the theory and the applications of these transforms are covered.
Techniques for obtaining basic tail bounds and concentration inequalities, uniform laws of large numbers, Rademacher complexity of a set, covering and packing in metric spaces, and metric entropy. High-dimensional random matrices are also studied in a non-asymptotic framework, with a focus on the estimation of sparse and structured covariance matrices. Sparse linear regression models and principal component analysis in the unstructured and sparse settings will also be covered.
An introduction to the statistical analysis of sample curves or functions. Topics include smoothing, registration, functional principal component analysis, scalar-on-function regression, and functional response models. All these techniques will be applied using the statistical software R.
Various transform methods that map data to coefficients in certain discrete bases are studied. Transforms studied include the FFT, the DCT, wavelet transforms, and framelet transforms. Both the theory and the applications of these transforms are covered.