Technology and Platform

DCAL has access to, and expertise in, the following software and platforms to facilitate research and training.

R: Statistical Language

An open-source statistical modeling language, R has traditionally been popular in the academic community and is fast catching on with professionals, so many data scientists are familiar with it. R has in-depth support for statistical models, machine learning, and visualization libraries. Of late, support from H2O, an open-source machine learning platform, has helped R scale for enterprise systems.

Python: High-Level Programming Language with Excellent Data Libraries

Python has robust libraries that support statistical modeling (SciPy and NumPy), data mining (Orange and Pattern), visualization (Matplotlib), and machine learning (scikit-learn, a library of machine learning techniques very useful to data scientists). Because Python is a general-purpose language, it is easier to integrate with other enterprise systems.

SAS: Data Mining Software Suite

SAS is a software suite for business intelligence and advanced analytics. In 2015, SAS topped the Gartner Magic Quadrant for advanced analytics platforms in terms of "ability to execute", owing to the breadth and quality of its predictive modeling and data mining techniques.

SPSS Modeler and SPSS Statistics

The Forrester Wave report from Forrester Research ranks IBM's advanced data analytics platform, SPSS, as the top offering in the advanced analytics category for its breadth of tools that deal with data loading, data preparation, and predictive modeling using statistical or machine learning techniques. IBM acquired SPSS Modeler and SPSS Statistics in 2009, and both have been adopted by many enterprise practitioners.

LINDO

LINDO™ linear, nonlinear, integer, stochastic and global programming solvers have been used by thousands of companies worldwide to maximize profit and minimize cost on decisions involving production planning, transportation, finance, portfolio allocation, capital budgeting, blending, scheduling, inventory, resource allocation and more.

IBM CPLEX

IBM ILOG CPLEX Optimizer provides flexible, high-performance mathematical programming solvers for linear programming, mixed integer programming, quadratic programming, and quadratically constrained programming problems. These include a distributed parallel algorithm for mixed integer programming to leverage multiple computers to solve difficult problems.
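To give a feel for the kind of problem these solvers handle, here is a minimal sketch of a two-variable linear program solved in plain Python. It does not use CPLEX or its API, and it is not how CPLEX works internally: it simply checks every intersection of two constraint boundaries and keeps the best feasible one, which is only practical for tiny, bounded problems. The constraint data, objective, and function names are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy problem: maximize 3x + 5y.
# Each constraint is (a, b, c), meaning a*x + b*y <= c.
CONSTRAINTS = [
    (1, 0, 4),    # x <= 4
    (0, 2, 12),   # 2y <= 12
    (3, 2, 18),   # 3x + 2y <= 18
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]
OBJECTIVE = (3, 5)


def intersect(c1, c2):
    """Return the point where two constraint boundaries meet, or None if parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    # Cramer's rule for the 2x2 system of boundary equations.
    x = (r1 * b2 - r2 * b1) / det
    y = (a1 * r2 - a2 * r1) / det
    return x, y


def feasible(point, constraints, tol=1e-9):
    """Check that a point satisfies every constraint, within a small tolerance."""
    x, y = point
    return all(a * x + b * y <= c + tol for a, b, c in constraints)


def solve_lp(objective, constraints):
    """Brute-force search over candidate vertices; assumes a bounded feasible region."""
    best_point, best_value = None, float("-inf")
    for c1, c2 in combinations(constraints, 2):
        point = intersect(c1, c2)
        if point is None or not feasible(point, constraints):
            continue
        value = objective[0] * point[0] + objective[1] * point[1]
        if value > best_value:
            best_point, best_value = point, value
    return best_point, best_value


if __name__ == "__main__":
    print(solve_lp(OBJECTIVE, CONSTRAINTS))
```

Production solvers replace this exhaustive search with algorithms that scale to very large numbers of variables and constraints.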

Hadoop and Spark

The name Hadoop has become synonymous with big data. It is an open-source software framework for the distributed storage and processing of very large datasets on computer clusters. Because data is replicated across the nodes of a cluster, you can scale your data up and down without having to worry about hardware failures. Hadoop provides massive amounts of storage for any kind of data, enormous processing power, and the ability to handle a very large number of concurrent tasks or jobs. Spark is an in-memory processing framework that can run on Hadoop and is used primarily for multistage and iterative algorithms, such as scalable machine learning algorithms.
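As a rough illustration of the multistage processing described above, the following sketch runs the map, shuffle, and reduce stages of a word count on a single machine. It uses only the Python standard library, not the Hadoop or Spark APIs, and all function names are hypothetical. On a real cluster, the framework would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Chain the three stages over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data on clusters"]
    print(word_count(sample))
```

Each stage depends only on the output of the previous one, which is what lets a cluster split the work across machines.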

DCAL has a 10-node Hadoop cluster with a total of 200 TB of storage, 100 CPUs, and 500 GB of RAM.
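For capacity planning, the cluster totals can be broken down with some simple arithmetic. The sketch below makes assumptions that the text does not state: that the 10 nodes are identical, that the 200 TB figure is raw disk capacity, and that HDFS runs with its default replication factor of 3. If any of these do not hold for the DCAL cluster, the results will differ.

```python
NODES = 10
TOTAL_STORAGE_TB = 200
TOTAL_CPUS = 100
TOTAL_RAM_GB = 500
REPLICATION_FACTOR = 3  # assumption: HDFS default, not stated for this cluster


def per_node(total, nodes=NODES):
    """Average share of a resource per node, assuming identical nodes."""
    return total / nodes


def usable_storage_tb(raw_tb, replication=REPLICATION_FACTOR):
    """Approximate unique data that fits when every block is stored `replication` times."""
    return raw_tb / replication


if __name__ == "__main__":
    print("Storage per node (TB):", per_node(TOTAL_STORAGE_TB))
    print("CPUs per node:", per_node(TOTAL_CPUS))
    print("RAM per node (GB):", per_node(TOTAL_RAM_GB))
    print("Usable storage (TB):", round(usable_storage_tb(TOTAL_STORAGE_TB), 1))
```

Under these assumptions each node contributes 20 TB of disk, 10 CPUs, and 50 GB of RAM, and roughly a third of the raw disk is available for unique data.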

Tableau

Tableau can help anyone see and understand their data. Connect to almost any database, drag and drop to create visualizations, and share with a click.