My research interests include big data analysis and processing, computer networking and cybersecurity, cloud computing, and the Internet of Things. One major research effort is Medical Big Data Analysis, which looks for correlations between 8,000 diseases and the genome data (22 chromosomes) of 20,000 patients. Another is a Smart Farm based on the Internet of Things: a holistic solution that combines sensor devices, data transfer to the Cloud over the network, analysis and pattern recognition using machine learning in the Cloud, data visualization through Web and mobile applications, and a signaling system for end users.
- Smart Farm and Smart Home using Internet of Things
- Big Data Analysis with MapReduce (Spark/Hadoop): Medical Big Data Analysis
- Cybersecurity, Virtual Reality (VR) and Augmented Reality (AR) applications
- Machine Learning & Neural Networks: TensorFlow, CNN (Convolutional Neural Network)
- Big Data Transfer on High Speed Network (10Gbps), Networked Storage Systems, Software Defined Storage
- Data De-duplication (in both storage and network)
- Cloud computing, distributed systems, mobile computing
- Software Defined Network (OpenFlow, Open vSwitch, Controller)
- Network Application and Protocols, Wireless/Sensor Networks, Multicasting Protocols
Projects on Big Data Analysis
- Elastic High Speed Medical Big Data Analysis System to Discover Associations between Genetic Variants and Diseases, July 2021 - present
- Enhancing the existing medical big data analysis system with a 10 Gbps high-speed network on a Spark cluster and 10 TB of EHR and genome data. Developing a solution to analyze the 10 TB of data within a threshold time under limited memory (155 GB of RAM, not enough to load the data all at once). Enhancing the user interface with plots and tables in the Shiny application. Implementing dynamic user management, including authentication and storage of users' credential information. Exploring multiple approaches to dynamic resource allocation. This research project is supported by an RSCA grant.
- Medical Big Data Analysis Based on Spark Cluster, Jan 2017 - June 2021
- Developed a "Web Based Query Tool" system that analyzes the Electronic Health Records (EHR) and genome data of 20,000 patients. The goal of the system is to find correlations between diseases and genomes using PheWAS (Phenome-Wide Association Studies) and GWAS (Genome-Wide Association Studies). Deployed a high-performance distributed cluster (Spark cluster) in which data are processed on multiple servers in parallel to speed up analysis. For rapid development of the result visualizations, a Shiny application in R was used, with SparkR connecting R to the Spark cluster. This work was published at an IEEE conference in June 2021. (The presentation of this work, titled "Medical Big Data Analysis System to Discover Associations between Genetic Variants and Diseases", is available under the "publication" tab.)
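The memory constraint mentioned in the projects above (data too large to load at once) can be illustrated with a small batching sketch. This is only an illustration under stated assumptions, not the Spark-based system itself: the record layout `(patient_id, variant_id, disease_code)`, the function names, and the batch size are all hypothetical.

```python
from collections import Counter
from itertools import islice


def batched(records, batch_size):
    """Yield lists of at most batch_size records from any iterable."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def count_pairs(records, batch_size=1000):
    """Count (variant, disease) co-occurrences one batch at a time.

    Only one batch of records is held in memory at any moment; the
    running totals are the only state kept across batches.
    The (patient_id, variant_id, disease_code) layout is an assumption.
    """
    totals = Counter()
    for batch in batched(records, batch_size):
        totals.update((variant, disease) for _, variant, disease in batch)
    return totals
```

Because `count_pairs` accepts any iterable, the records could come from a generator that reads a file line by line, so the full dataset never needs to fit in RAM.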
Projects on Internet of Things (IoT)
- Efficient Internet of Things (IoT) Smart Farm System, on-going project
- Designing and developing an efficient Internet of Things (IoT) Smart Farm System that automates farming facilities and increases harvest productivity while reducing cost. Deploying many types of sensor devices on the farm, including humidity, temperature, and water sensors. High-quality images of vegetables and plants are captured by drone to detect diseases. Collected data are stored in several Clouds, including Google, IBM, and Amazon, and the image data are analyzed with TensorFlow to find disease patterns. Developing solutions to visualize the analyzed information and to actuate devices.
- Internet of Things (IoT) Smart House System, on-going project
- Designing and developing an efficient Smart House solution. Data are collected from various sensor devices, including current, gas, ultrasonic, water, vibration, temperature, and humidity sensors. The sensor devices are assembled with an Arduino Mega 2560 and an Arduino Yun. Collected data are preprocessed on the Arduino Mega 2560 and sent to the Cloud by the Arduino Yun, where they are stored and monitored. The data are visualized through mobile and Web applications.
Projects on Data Optimization on Storage and Network
- Efficient Big Data Transfer Service (BigDTS) through Networks, on-going project
- Designing and developing an efficient Big Data Transfer Service (BigDTS) that eliminates redundant transfers by not sending duplicate data from server to client. BigDTS reduces traffic on Internet Service Provider (ISP) networks, and it increases throughput and decreases delay for end users. BigDTS consists of a centralized controller, virtual routers (called middle-boxes), and client/server programs. Implemented in Java.
- Software Defined In-Network De-duplication, Jul 2013 - Sep 2015
- Addressed the excessive redundancy that arises when duplicate data traverse the same routers (or switches) on the way to multiple destinations. Developed a paradigm that removes redundancy in networks (and ultimately in storage) by applying de-duplication techniques on the client, server, and network sides. Implemented in C++ and Java.
- Mobile De-duplication, Jul 2013 - Feb 2016
- Addressed the large volume of image and video files generated on mobile devices and the heavy redundancy among them, which stems from users keeping duplicate copies and from application characteristics (such as taking multiple similar pictures of an object). Developed a lightweight image de-duplication scheme for mobile devices that considers the security of separated chunks. Written in C++.
- Structure Aware File and Email De-duplication for Cloud-based Storage Systems (SAFE), Jul 2012 - Jun 2013
- Discovered that a slight change to a Microsoft Office or PDF document produces a binary file that differs completely from the original. Explored the formats of Office and PDF documents and proposed a way to de-duplicate such documents in files and emails for Cloud-based storage systems like Dropbox. Designed and developed scalable de-duplication systems that significantly decrease storage space and index overhead. Implemented the experiments in C++ on a Cloud storage system.
- Email De-duplication File System on Email Servers (HEDS), Sep 2010 - May 2012
- Proposed a hybrid scheme that adaptively performs de-duplication at either file-level or chunk-level granularity. Designed and implemented the hybrid email de-duplication system and evaluated it with real email datasets. The evaluation showed that it achieves a high data reduction rate while reducing CPU and memory overhead. Implemented in C++ with Sendmail and Filesystem in Userspace (FUSE).
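The idea shared by the de-duplication projects above, storing each unique piece of data once and choosing between file-level and chunk-level granularity, can be sketched in a few lines. This is a toy in-memory illustration, not the HEDS or SAFE implementation: the class name, the fixed chunk size, and the size cutoff used to pick the granularity are assumptions made for the example.

```python
import hashlib


class HybridDedupStore:
    """Toy store that de-duplicates at file or chunk granularity."""

    def __init__(self, chunk_size=4096, file_level_max=8192):
        self.chunk_size = chunk_size          # assumed fixed chunk size (bytes)
        self.file_level_max = file_level_max  # assumed cutoff for file-level mode
        self.blocks = {}                      # digest -> bytes, stored once
        self.recipes = {}                     # name -> ordered list of digests

    def _put(self, data):
        """Store data under its SHA-256 digest unless already present."""
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)
        return digest

    def write(self, name, data):
        if len(data) <= self.file_level_max:
            # Small object: one hash over the whole file (file-level).
            self.recipes[name] = [self._put(data)]
        else:
            # Large object: hash each fixed-size chunk (chunk-level).
            self.recipes[name] = [
                self._put(data[i:i + self.chunk_size])
                for i in range(0, len(data), self.chunk_size)
            ]

    def read(self, name):
        """Reassemble an object from its stored blocks."""
        return b"".join(self.blocks[d] for d in self.recipes[name])

    def stored_bytes(self):
        """Bytes actually kept after de-duplication."""
        return sum(len(b) for b in self.blocks.values())
```

File-level mode keeps one index entry per object, while chunk-level mode finds partial overlap at the cost of more index entries; a hybrid scheme trades between the two. Fixed-size chunking is used here only for brevity.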
Projects on Multicast Routing Protocols for Wireless Sensor Networks
- Branch-based Efficient and Adaptive Multicast for Wireless Sensor Networks (BEAM), Sep 2012 - Dec 2015
- Designed and implemented algorithms for efficient data forwarding and membership establishment in branch-based multicast routing protocols. Implemented in C++ and bash shell script. Evaluated with NS2 and MATLAB.
- Adaptive Path Control Protocol for Efficient Branch-based Multicast Routing in Wireless Sensor Networks (APCP), Sep 2012 - Nov 2014
- Proposed a new method to measure the efficiency of a multicast path. Designed and implemented the algorithms for the method in C++. Evaluated with NS2 and MATLAB.
- Energy-Efficient Adaptive Geo-Source Multicast Routing for Wireless Sensor Networks (EAGER), May 2012 - Sep 2012
- Designed and implemented an energy-efficient and scalable multicast routing protocol. Optimized location-based and source-based multicast in terms of energy, packet overhead, and computational overhead. Developed an encoding mechanism to optimize the packet header. Written in C++ and bash shell script. Evaluated with NS2 and MATLAB.
- Multicast Routing with Branch Information Nodes for Wireless Sensor Networks (MR.BIN), Jan 2009 - May 2010
- Investigated the various overhead issues of existing WSN multicast protocols. Implemented a hybrid approach that combines geographic unicast routing and state-based multicast routing. Evaluated the optimal tradeoff among intermediate-node state overhead, packet header size, computation time, and energy consumption and balance. Implemented in C++. Simulated with NS2 and analyzed with scripting languages including bash shell script, TCL, and AWK.
- Adaptive Geo-Source Multicast Routing for Wireless Sensor Networks (AGSMR), Sep 2008 - May 2009
- Identified the scalability issue of previous location-based stateless multicast protocols in Wireless Sensor Networks. Designed and implemented a tree construction algorithm with LCRS (Left Child Right Sibling), an algorithm to find the common source routing path, and an algorithm to select branch geographic information. Implemented in C++ on Linux. Simulated with NS2.
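For readers unfamiliar with the LCRS (Left Child Right Sibling) representation mentioned above, the sketch below shows the general data structure: each node keeps only two links regardless of how many children it has. It is a generic illustration, not the AGSMR code (which was written in C++), and the class, method, and node labels are hypothetical.

```python
class LCRSNode:
    """Tree node in left-child/right-sibling form (two links per node)."""

    def __init__(self, label):
        self.label = label
        self.left_child = None     # first child of this node
        self.right_sibling = None  # next child of this node's parent

    def add_child(self, child):
        """Append child after this node's existing children."""
        if self.left_child is None:
            self.left_child = child
        else:
            node = self.left_child
            while node.right_sibling is not None:
                node = node.right_sibling
            node.right_sibling = child
        return child

    def children(self):
        """Iterate over children by following the sibling chain."""
        node = self.left_child
        while node is not None:
            yield node
            node = node.right_sibling


def preorder(root):
    """Return labels in pre-order (a node before its children)."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node.label)
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(list(node.children())))
    return order
```

A multicast tree with a source `S`, branches `A` and `B`, and destinations `C` and `D` under `A` would be built with `add_child` calls and walked with `preorder`, which visits `S`, `A`, `C`, `D`, `B` in that order.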