
Debaditya Roy

Available

Last updated: 09.10.2018

Machine learning engineer

Company: predict.io
Degree: Master of Science (ICT Innovation/Computer Science)
Hourly/daily rate: on request
Languages: English (business fluent) | Hindi (good)

Attachments

resume.pdf

Skills

Programming Languages: Python, Java, C, SQL.
Python Libraries: sklearn, pandas, numpy, matplotlib.
Big Data Frameworks: Apache Hadoop, Apache Spark, Apache Flink.
Machine Learning Frameworks: scikit-learn.
Deep Learning Frameworks: TensorFlow, Keras.
AWS Solutions: Lambda, Athena, S3, DynamoDB, EC2, CloudWatch, SageMaker, API Gateway.
Version Management: Git
Statistical/Machine Learning/Deep Learning Skills: Regression, Classification, Markov Models, Word Embeddings, Natural Language Processing.
Containerization: Docker.

Project History

Jan 2018 - Present: Machine Learning Engineer/Data Engineer at predict.io (Berlin)
Technology: AWS, MySQL, PostgreSQL, Python, Java, sklearn, keras.
Roles:
- Intent Prediction Service: Built an end-to-end intent prediction service based on users' location data. Raw location data collected through our Android/iOS SDK is converted into visits, which are fed to an intent prediction algorithm that predicts the purpose or intent of each visit, for example whether the user was shopping, at home, or at sports. The model was deployed for live predictions using AWS SageMaker, with a Lambda endpoint that calls the SageMaker endpoint to route predictions to our custom dashboard.
Machine Learning algorithms: XGBoost Classification, random forest classification, artificial neural network.
Programming Languages: Python.
Libraries: sklearn, tensorflow, keras, boto3, Microsoft LightGBM.
AWS Services: Lambda, S3, Sagemaker.
Frameworks: Serverless.

- Zone Clustering Service: Built a machine learning service for clustering user visit data into specific zones of interest, such as home or work. The zone clustering algorithm is a central component of our intent prediction algorithm. Used DBSCAN to cluster zones from the latitude and longitude of visit data. The service is deployed as an AWS Lambda function, which our dashboard calls to visualize the zones on a live map.
Machine Learning algorithm: DBSCAN.
Libraries: sklearn.
AWS Services: Lambda, S3.
Frameworks: Serverless.
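As an illustration of the clustering step described above, here is a minimal pure-Python sketch of the DBSCAN procedure. The project itself used sklearn's implementation; the visit coordinates and the eps/min_pts values below are invented for the example.

```python
from math import dist

def dbscan(points, eps=0.5, min_pts=3):
    """Label each point with a cluster id, or -1 for noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # provisionally noise
            continue
        cluster += 1                    # i is a core point: start a new zone
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:    # j is also a core point: expand
                queue.extend(more)
    return labels

# Two dense "zones" of visits and one isolated visit:
visits = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (20, 20)]
labels = dbscan(visits, eps=0.5, min_pts=3)
```

In the real service the points would be (latitude, longitude) pairs with a haversine-style distance rather than the Euclidean `math.dist` used here.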

- Next Location Prediction: Devised an LSTM-based model capable of predicting a user's next location. Visit data was captured per user and converted into a timeseries representation, which was fed into an LSTM-based model to predict the next step of the timeseries.
Algorithm: LSTM.
Libraries: tensorflow, keras.
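The timeseries conversion described above amounts to a sliding-window transform over the visit history; a minimal sketch, assuming visits are encoded as integer zone ids (the window size and zone encoding here are made up, and the real pipeline fed such windows to a Keras LSTM):

```python
def make_windows(series, window=3):
    """Split a timeseries into (input window, next value) training pairs."""
    pairs = []
    for i in range(len(series) - window):
        pairs.append((series[i:i + window], series[i + window]))
    return pairs

# Visit locations encoded as zone ids, e.g. 0 = home, 1 = work, 2 = gym:
visits = [0, 1, 0, 2, 0, 1]
pairs = make_windows(visits, window=3)
# Each pair is a 3-step history plus the zone that followed it.
```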

- Backend Services: Deployed pipelines in AWS using Lambda, Athena, S3, and other services for extracting, aggregating, and reporting data for our affiliates and partners. Created custom CloudWatch metrics for live monitoring.
Languages: Python.
AWS Services: Lambda, S3, DynamoDB, Redis, CloudWatch, Athena, API Gateway.

- Reimplementing Existing APIs: For speed and scalability we reimplemented our existing APIs in Go, achieving more than a 10x improvement in speed along with a cost reduction.
Languages: Go.

Feb 2017 - Nov 2017: Research Assistant at DFKI/Zalando (Berlin)
Worked in a research collaboration between DFKI and Zalando.
Technology: NoSQL, HBase, Cassandra, Redis, Java.
Roles:
- Benchmarking NoSQL databases: For storing cookie data, Zalando used HBase as its KV store. I was responsible for researching whether there were better options. Simulated a cookie data load and benchmarked the HBase, Cassandra, and Redis KV stores.
Databases: Redis, Cassandra, HBase.
Frameworks: YCSB.
- Optimizing the cookie-sync algorithm: The existing cookie-sync algorithm was overloading the system with cookies generated by real as well as fake users. Researched optimization possibilities for the existing algorithm and devised a solution based on a heavy-hitters algorithm to make it more efficient.
Languages: Java
Framework: Apache Spark
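One common way to realize the heavy-hitters idea mentioned above is the Misra-Gries summary, which finds frequent items in a stream using bounded memory. A minimal sketch, assuming cookies arrive as a stream of ids (the data below is invented; the actual Zalando solution may have used a different heavy-hitters variant):

```python
def misra_gries(stream, k):
    """One-pass summary keeping at most k-1 candidate heavy hitters.

    Any item occurring more than len(stream)/k times is guaranteed
    to survive as a candidate (its count is a lower-bound estimate).
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement every counter; drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Cookie id "a" dominates the stream and survives as a candidate:
cookies = ["a"] * 6 + ["b"] * 4 + ["c", "d", "e"]
candidates = misra_gries(cookies, k=3)
```

A second pass over the stream (or a sampled re-count) turns the candidate set into exact frequencies, which is what makes the approach cheap compared to counting every cookie id.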

April 2016 - August 2016: Research Assistant at INRIA (Rennes, France)
Technology: Apache Flink, Java, Raspberry Pi, edge computing.
Roles:
- Built a distributed optical character recognition application for video streams on Apache Flink, for deployment on a Raspberry Pi based edge computing infrastructure. The work resulted in a publication at the IEEE Mobile Cloud Conference 2017 in San Francisco, which I co-authored.

July 2012 - August 2015: Technology Consultant at PricewaterhouseCoopers (Kolkata, India)
Served in various roles, starting as a developer, then tech strategy advisor, then technical track lead.
Technology: SAP PI, SAP ABAP, Java, SAP Banking, SAP CRM, Integration, Middleware technology.
Roles:

- Integrating multiple technical systems running on Oracle, SAP, Java using SAP PI and custom Java applications.
Client industry: Retail.
Client name: Titan (Tata Group)

- SAP implementation and upgrade project for an e-commerce major.
Client industry: E-commerce.
Client name: Mjunction (Tata Group)

- Ported the banking and CRM solutions of a large Azerbaijani banking corporation to SAP Banking and SAP CRM. Additionally implemented a payments and plastic cards system as a custom Java solution. Led a group of 4 people on the technical track for the plastic card and payments interface.
Client industry: Banking.
Client name: Demir Bank

- Sole consultant engaged to provide advisory and tech-strategy services for a retail major. Designed a new POS integration module for the client to be used in their business.
Client industry: Retail
Client name: Dedcor

- Technical integration between ERP and non-ERP systems using SAP PI and custom Java solutions.
Client industry: Pharmaceuticals
Client name: Capsugel

- Worked as an SAP ABAP developer on the materials management and sales module implementation for a major shipping corporation.
Client industry: Shipping
Client name: GRSE

Other Major Projects:

- Master Thesis (NLP, Machine Learning, Data Integration)
I wrote my master thesis at TU Berlin, researching the detection of textual redundancies in datasets under Prof. Dr. Ziawasch Abedjan. The thesis sat at the intersection of data integration, machine learning, natural language processing, and deep learning. As the end result, I developed a framework that detects redundancies in a relational dataset. Framing the problem as a machine learning classification task, I started by defining measures of similarity between words to identify them as redundant or not: semantic similarity measures using word embedding techniques and neural networks, and syntactic measures using string-based metrics, which together formed the features of the classification matrix. The feature matrix was fed to different classifiers to classify the redundant words. The framework was further extended to detect redundant cells in a relational dataset, with a view to improving duplicate detection.
Languages: Python.
Libraries: gensim Word2Vec, GloVe, NLTK, sklearn, matplotlib, pandas, numpy.
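The two families of similarity features described above can be sketched in pure Python: a semantic feature as cosine similarity between word vectors, and a syntactic feature as a normalized Levenshtein similarity. The vectors and word pair below are toy examples; the thesis used gensim/GloVe embeddings, not these hand-made vectors.

```python
from math import sqrt

def cosine(u, v):
    """Semantic similarity between two word embedding vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den

def edit_similarity(s, t):
    """Syntactic similarity: 1 - normalized Levenshtein distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return 1 - prev[-1] / max(len(s), len(t))

# A candidate redundant pair gets one feature from each measure:
features = [cosine([1, 0, 1], [1, 0, 1]),           # identical embeddings
            edit_similarity("colour", "color")]      # spelling variants
```

Each candidate word pair yields such a feature vector, and the classifier then decides whether the pair is redundant.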

- Image Classification with an E-commerce Startup (Deep Learning, ANN, CNN)
We worked on image recognition of background items in an image. We used VGG16 to transfer the weights before building our own layers on top of it. The testing accuracy for the project came to around 0.74.
Languages: Python.
Libraries: tensorflow, keras.

- Big Data Analytics Project (Apache Spark, Apache Flink, Apache Hadoop)
Benchmarked the graph-processing systems of Apache Spark and Apache Flink on different data loads, from ten thousand edges up to a social network graph with 1 billion edges.
Languages: Java
Frameworks: Apache Spark, Apache Flink

- Regulatory Data Mining and Intelligent Reporting (Machine Learning, NLP)
Mined MIFIL regulatory documents to ease the work of compliance officers. Collected features to build a machine learning model for automatic reporting in case of a compliance issue.
Languages: Python
Libraries: sklearn

- Building a database engine (Java)
Built a database that supports SQL and Hadoop queries, with a built-in query optimization technique, as a 6-month academic project.
Languages: Java

Willingness to Travel

Available in: Germany
Availability: Immediately
Location: Berlin/Remote/Germany