Jiashu Wu 吴嘉澍


Hola😊! I am David Jiashu Wu (吴嘉澍). I received my Ph.D.🎓 degree from the University of Chinese Academy of Sciences, advised by Prof. Yang Wang. I received my M.Eng. degree from the University of Melbourne in 2020 and my B.S. degree from the University of Sydney in 2018. Prior to transferring to the University of Sydney in 2016, I was an undergraduate at Beijing Institute of Technology.

My research interest is applying domain adaptation and deep learning to network intrusion detection.

Updates

🎉 [Apr 2024] I completed my Ph.D. defence.
🎉 [Jan 2024] The paper Open Set Dandelion Network for IoT Intrusion Detection has been accepted by ACM Transactions on Internet Technology (CCF B, SCI Q1, IF=5.3)
🎉 [Oct 2023] I was awarded the National Scholarship, Ministry of Education, P.R.China.
🎉 [Jun 2023] I was awarded the President's Scholarship, Chinese Academy of Sciences.
🎉 [Jun 2023] Patents CN202210026373.0 and CN202210973887.7 have been granted.
🎉 [Mar 2023] The paper Adaptive Bi-Recommendation and Self-Improving Network for Heterogeneous Domain Adaptation-Assisted IoT Intrusion Detection has been accepted by IEEE Internet of Things Journal (SCI Q1, IF=10.6)
🎉 [Feb 2023] The paper Cost-Efficient Sharing Algorithms for DNN Model Serving in Mobile Edge Networks has been accepted by IEEE Transactions on Services Computing (CCF A, SCI Q1, IF=11.0)
🎉 [Jan 2023] The paper Heterogeneous Domain Adaptation for IoT Intrusion Detection: A Geometric Graph Alignment Approach has been accepted by IEEE Internet of Things Journal (SCI Q1, IF=10.6)
🎉 [Dec 2022] The patent CN202011438337.2 has been granted.
🎉 [Oct 2022] The paper Joint Semantic Transfer Network for IoT Intrusion Detection has been accepted by IEEE Internet of Things Journal (SCI Q1, IF=10.6)
🎉 [Jul 2022] The paper PackCache: An Online Cost-driven Data Caching Algorithm in the Cloud has been accepted by IEEE Transactions on Computers (CCF-A)

Education

	Ph.D. in Computer Science, 2021 - 2024, University of Chinese Academy of Sciences (中国科学院大学), Beijing, China. Supervisor: Prof. Yang Wang
	M.Eng. in Artificial Intelligence, 2019 - 2020 (Average mark=88.1, top 1%, achieved First Class Honours grades in all subjects), University of Melbourne (墨尔本大学), Parkville, Victoria, Australia. Supervisor: Prof. Rui Zhang
	B.S. in Computer Science & Financial Mathematics and Statistics, 2016 - 2018 (Average mark=86.4, top 1%), University of Sydney (悉尼大学), Camperdown, New South Wales, Australia. Supervisor: Prof. Simon Poon
	B.Eng. in Software Engineering, 2015 - 2016 (Left in 2016 to join USYD), Beijing Institute of Technology (北京理工大学), Beijing, China

Publication

(* Equal Contribution, ✉ Corresponding Author, IF/S: Impact Factor/Score, WOS: Web of Science, PDF & BibTex are downloadable)

IoT Intrusion Detection via Domain Adaptation and its Efficiency Improvement

✅ CCF-A & SCI Q1: 8 papers

Open Set Dandelion Network for IoT Intrusion Detection
Jiashu Wu, Hao Dai, Kenneth B. Kent, Jerome Yen, Chengzhong Xu, Yang Wang✉
ACM Transactions on Internet Technology (ACM TOIT), 2024
SCI Q1, CCF Class B, IF: 5.3
[ Abstract ] [ PDF ] [ Link ] [ BibTex ] [ 10.1145/3639822 ] [ SCI (Online Soon) ]

As Internet of Things devices become widely used in the real world, it is crucial to protect them from malicious intrusions. However, the data scarcity of IoT limits the applicability of traditional intrusion detection methods, which are highly data-dependent. To address this, in this paper we propose the Open-Set Dandelion Network (OSDN) based on unsupervised heterogeneous domain adaptation in an open-set manner. The OSDN model performs intrusion knowledge transfer from the knowledge-rich source network intrusion domain to facilitate more accurate intrusion detection for the data-scarce target IoT intrusion domain. Under the open-set setting, it can also detect newly-emerged target domain intrusions that are not observed in the source domain. To achieve this, the OSDN model forms the source domain into a dandelion-like feature space in which each intrusion category is compactly grouped and different intrusion categories are separated, i.e., simultaneously emphasising inter-category separability and intra-category compactness. The dandelion-based target membership mechanism then forms the target dandelion. Then, the dandelion angular separation mechanism achieves better inter-category separability, and the dandelion embedding alignment mechanism further aligns both dandelions in a finer manner. To promote intra-category compactness, the discriminating sampled dandelion mechanism is used. Assisted by the intrusion classifier trained using both known and generated unknown intrusion knowledge, a semantic dandelion correction mechanism emphasises easily-confused categories and guides better inter-category separability. Holistically, these mechanisms form the OSDN model that effectively performs intrusion knowledge transfer to benefit IoT intrusion detection. Comprehensive experiments on several intrusion datasets verify the effectiveness of the OSDN model, outperforming three state-of-the-art baseline methods by 16.9%. The contribution of each OSDN constituting component, the stability and the efficiency of the OSDN model are also verified.
Adaptive Bi-Recommendation and Self-Improving Network for Heterogeneous Domain Adaptation-Assisted IoT Intrusion Detection
Jiashu Wu, Yang Wang✉, Hao Dai, Chengzhong Xu, Kenneth B. Kent
IEEE Internet of Things Journal (IEEE IoT Journal), 2023
SCI Q1, Tsinghua Class B, IF: 10.6
[ Abstract ] [ PDF ] [ Link ] [ BibTex ] [ 10.1109/JIOT.2023.3262458 ] [ WOS:001037986000009 ]

As Internet of Things devices become prevalent, using intrusion detection to protect IoT from malicious intrusions is of vital importance. However, the data scarcity of IoT hinders the effectiveness of traditional intrusion detection methods. To tackle this issue, in this paper, we propose the Adaptive Bi-Recommendation and Self-Improving Network (ABRSI) based on unsupervised heterogeneous domain adaptation (HDA). The ABRSI transfers enriched intrusion knowledge from a data-rich network intrusion source domain to facilitate effective intrusion detection for data-scarce IoT target domains. The ABRSI achieves fine-grained intrusion knowledge transfer via adaptive bi-recommendation matching. Matching the bi-recommendation interests of two recommender systems and the alignment of intrusion categories in the shared feature space form a mutual-benefit loop. Besides, the ABRSI uses a self-improving mechanism, autonomously improving the intrusion knowledge transfer in four ways. A hard pseudo label voting mechanism jointly considers recommender system decision and label relationship information to promote more accurate hard pseudo label assignment. To promote diversity and target data participation during intrusion knowledge transfer, target instances that fail to be assigned a hard pseudo label are assigned a probabilistic soft pseudo label, forming a hybrid pseudo-labelling strategy. Meanwhile, the ABRSI also makes soft pseudo-labels globally diverse and individually certain. Finally, an error knowledge learning mechanism is utilised to adversarially exploit factors that cause detection ambiguity and learns through both current and previous error knowledge, preventing error knowledge forgetfulness. Holistically, these mechanisms form the ABRSI model that boosts IoT intrusion detection accuracy via HDA-assisted intrusion knowledge transfer. Comprehensive experiments on several intrusion datasets demonstrate the state-of-the-art performance of the ABRSI method, outperforming its counterparts by 9.2%, and also verify the effectiveness of ABRSI constituting components and ABRSI's overall efficiency.
Cost-Efficient Sharing Algorithms for DNN Model Serving in Mobile Edge Networks
Hao Dai, Jiashu Wu, Yang Wang✉, Jerome Yen, Yong Zhang, Chengzhong Xu
IEEE Transactions on Services Computing (IEEE TSC), 2023
CCF-A, SCI Q1, IF: 11.0
[ Abstract ] [ PDF ] [ Link ] [ BibTex ] [ 10.1109/TSC.2023.3247049 ] [ WOS:001045785600016 ]

With the fast growth of mobile edge computing (MEC), the deep neural network (DNN) has gained more opportunities in application to various mobile services. Given the tremendous number of learning parameters and large model size, the DNN model is often trained in a cloud center and then dispatched to end devices for inference via the edge network. Therefore, maximizing the cost-efficiency of learned model dispatch in the edge network would be a critical problem for the model serving in various application contexts. To reach this goal, in this paper we focus mainly on reducing the total model dispatch cost in the edge network while maintaining the efficiency of the model inference. We first study this problem in its off-line form as a baseline where a sequence of requests can be pre-defined in advance and exploit dynamic programming techniques to obtain a fast optimal algorithm under a semi-homogeneous cost model. Then, we design and implement a 2.5-competitive algorithm for its online case with a provable lower bound of 2 for any deterministic online algorithm. We verify our results through careful algorithmic analysis and validate their actual performance via a trace-based study based on a public open international mobile network dataset.
Heterogeneous Domain Adaptation for IoT Intrusion Detection: A Geometric Graph Alignment Approach
Jiashu Wu, Hao Dai, Yang Wang✉, Kejiang Ye, Chengzhong Xu
IEEE Internet of Things Journal (IEEE IoT Journal), 2023
SCI Q1, Tsinghua Class B, IF: 10.6
[ Abstract ] [ PDF ] [ Link ] [ BibTex ] [ 10.1109/JIOT.2023.3239872 ] [ WOS:001000701600047 ]

Data scarcity hinders the usability of data-dependent algorithms when tackling IoT intrusion detection (IID). To address this, we utilise the data-rich network intrusion detection (NID) domain to facilitate more accurate intrusion detection for IID domains. In this paper, a Geometric Graph Alignment (GGA) approach is leveraged to mask the geometric heterogeneities between domains for better intrusion knowledge transfer. Specifically, each intrusion domain is formulated as a graph where vertices and edges represent intrusion categories and category-wise interrelationships, respectively. The overall shape is preserved via a confused discriminator incapable of identifying adjacency matrices between different intrusion domain graphs. A rotation avoidance mechanism and a centre point matching mechanism are used to avoid graph misalignment due to rotation and symmetry, respectively. Besides, category-wise semantic knowledge is transferred to act as vertex-level alignment. To exploit the target data, a pseudo-label election mechanism that jointly considers network prediction, geometric property and neighbourhood information is used to produce fine-grained pseudo-label assignment. Upon aligning the intrusion graphs geometrically from different granularities, the transferred intrusion knowledge can boost IID performance. Comprehensive experiments on several intrusion datasets demonstrate state-of-the-art performance of the GGA approach and validate the usefulness of GGA constituting components.
Joint Semantic Transfer Network for IoT Intrusion Detection
Jiashu Wu, Yang Wang✉, Binhui Xie, Shuang Li, Hao Dai, Kejiang Ye, Chengzhong Xu
IEEE Internet of Things Journal (IEEE IoT Journal), 2022
SCI Q1, Tsinghua Class B, IF: 10.6
[ Abstract ] [ PDF ] [ Link ] [ BibTex ] [ 10.1109/JIOT.2022.3218339 ] [ WOS:000935007100053 ]

In this paper, we propose a Joint Semantic Transfer Network (JSTN) towards effective intrusion detection for the large-scale, scarcely labelled IoT domain. As a multi-source heterogeneous domain adaptation (MS-HDA) method, the JSTN integrates a knowledge-rich network intrusion (NI) domain and another small-scale IoT intrusion (II) domain as source domains, and preserves intrinsic semantic properties to assist target II domain intrusion detection. The JSTN jointly transfers the following three semantics to learn a domain-invariant and discriminative feature representation. The scenario semantic endows source NI and II domain with characteristics from each other to ease the knowledge transfer process via a confused domain discriminator and categorical distribution knowledge preservation. It also reduces the source-target discrepancy to make the shared feature space domain-invariant. Meanwhile, the weighted implicit semantic transfer boosts discriminability via a fine-grained knowledge preservation, which transfers the source categorical distribution to the target domain. The source-target divergence guides the importance weighting during knowledge preservation to reflect the degree of knowledge learning. Additionally, the hierarchical explicit semantic alignment performs centroid-level and representative-level alignment with the help of a geometric similarity-aware pseudo-label refiner, which exploits the value of unlabelled target II domain and explicitly aligns feature representations from a global and local perspective in a concentrated manner. Comprehensive experiments on various tasks verify the superiority of the JSTN against state-of-the-art comparing methods, achieving an average accuracy boost of 10.3%. The statistical soundness of each constituting component and the computational efficiency are also verified.
PackCache: An Online Cost-driven Data Caching Algorithm in the Cloud
Jiashu Wu, Hao Dai, Yang Wang✉, Yong Zhang, Dong Huang, Chengzhong Xu
IEEE Transactions on Computers (IEEE TC), 2022
CCF-A, SCI Q2, IF: 3.7
[ Abstract ] [ PDF ] [ Link ] [ BibTex ] [ 10.1109/TC.2022.3191969 ] [ WOS:000947356900023 ]

In this paper, we study a data caching problem in the cloud environment, where multiple frequently co-utilised data items could be packed as a single item being transferred to serve a sequence of data requests dynamically with reduced cost. To this end, we propose an online algorithm with respect to a homogeneous cost model, called PackCache, that can leverage the FP-Tree technique to mine those frequently co-utilised data items for packing whereby the incoming requests could be cost-effectively served online by exploiting the concept of anticipatory caching. We show the algorithm is 2α-competitive, reaching the lower bound of the competitive ratio for any deterministic online algorithm on the studied caching problem, and also time and space efficient to serve the requests. Finally, we evaluate the performance of the algorithm via experimental studies to show its actual cost-effectiveness and scalability.
Towards Scalable and Efficient Deep-RL in Edge Computing: A Game-based Partition Approach
Hao Dai, Jiashu Wu, Yang Wang✉, Chengzhong Xu
Journal of Parallel and Distributed Computing (JPDC), 2022
SCI Q1, CCF-B, IF: 4.5
[ Abstract ] [ PDF ] [ Link ] [ BibTex ] [ 10.1016/j.jpdc.2022.06.006 ] [ WOS:000826304000005 ]

Currently, most edge-based Deep Reinforcement Learning (Deep-RL) applications have been deployed in the edge network; however, mainstream studies still give inadequate consideration to its limited compute and bandwidth resources. In this paper, we investigate the near on-policy action taking in a distributed Deep-RL architecture, and propose a "hybrid near on-policy" Deep-RL framework, called Coknight, by leveraging a game-theory based DNN partition approach. We first formulate the partition problem into a variant of the knapsack problem in the device-edge setting, and then transform it into a potential game with a formal proof. Finally, we show the problem is NP-complete whereby an efficient distributed algorithm based on the potential game theory is developed from the device perspective to achieve fast and dynamic partitioning. Coknight not only significantly improves the resource efficiency of the Deep-RL but also allows the inference to enforce the scalability of the actor policy. We prototype the framework with extensive experiments to validate our findings. The experimental results show that with the premise of a rapid convergence guarantee, Coknight, compared with Seed-RL, can reduce GPU utilization by 30% while providing large-scale scalability.
Simultaneous Semantic Alignment Network for Heterogeneous Domain Adaptation
Shuang Li, Binhui Xie, Jiashu Wu, Ying Zhao, Chi Harold Liu✉, Zhengming Ding
Proceedings of the 28th ACM International Conference on Multimedia (ACM MM), 2020
CCF-A, EI, IS: 12.9
[ Abstract ] [ PDF ] [ Code ] [ Link ] [ BibTex ] [ 10.1145/3394171.3413995 ] [ WOS:000810735003101 ]

Heterogeneous domain adaptation (HDA) transfers knowledge across source and target domains that present heterogeneities, e.g., distinct domain distributions and differences in feature type or dimension. Most previous HDA methods tackle this problem through learning a domain-invariant feature subspace to reduce the discrepancy between domains. However, the intrinsic semantic properties contained in data, which are also indispensable to achieving promising adaptability, are under-explored in such alignment strategies. In this paper, we propose a Simultaneous Semantic Alignment Network (SSAN) to simultaneously exploit correlations among categories and align the centroids for each category across domains. In particular, we propose an implicit semantic correlation loss to transfer the correlation knowledge of source categorical prediction distributions to the target domain. Meanwhile, by leveraging target pseudo-labels, a robust triplet-centroid alignment mechanism is explicitly applied to align feature representations for each category. Notably, a pseudo-label refinement procedure with geometric similarity involved is introduced to enhance the target pseudo-label assignment accuracy. Comprehensive experiments on various HDA tasks across text-to-image, image-to-image and text-to-text successfully validate the superiority of our SSAN against state-of-the-art HDA methods. The code is publicly available at https://github.com/BIT-DA/SSAN.

Miscellaneous Topics

✅ CCF-A & SCI Q1: 2 papers ✅ CCF-B & SCI Q2: 3 papers ✅ CCF-C: 2 papers

Miscellaneous - Edge-cloud Task and Resource Allocation

Neighborhood-oriented Decentralized Learning Communication in Multi-Agent System
Hao Dai, Jiashu Wu, Andre Brinkmann, Yang Wang✉
Proceedings of the 32nd International Conference on Artificial Neural Networks (ICANN), 2023
CCF-C, EI
[ Abstract ] [ PDF ] [ Link (Online Soon) ] [ BibTex (Online Soon) ] [ DoI (Online Soon) ]

Since the partial observation issue is one of the crucial obstacles in multi-agent systems (MAS), a so-called "Centralized Training Decentralized Execution (CTDE)" paradigm has been widely studied by virtue of its integration of the global observations in the course of the training process. The traditional CTDE paradigm suffers from local-only observation during the execution phase, so numerous efforts have been made to study the communication efficiency among agents to promote cognitive consistency and better cooperation. However, the vast majority of approaches still take effect in a centralized manner, which lets the agents communicate with each other in a broadcast way. As a consequence, this centralized broadcast-based training process is infeasible when we apply it to a more complex scenario. To address this issue, we propose a neighborhood-based learning communication approach in this paper to enable the agents to perform the training and execution in a decentralized fashion based on the messages of their neighbor nodes. In particular, we design a novel encoder network whereby a two-step decision model is proposed to improve the performance of this decentralized training. To evaluate the method, we further implement a prototype and carry out a number of simulation-based experiments to demonstrate the effectiveness of our proposed method in multi-agent cooperation, which, compared with the selected existing multi-agent methods, achieves the best rewards and drastically reduces the training data transmission.
Multi-Scenario Bimetric-Balanced IoT Resource Allocation: An Evolutionary Approach
Jiashu Wu, Hao Dai, Yang Wang✉, Zhiying Tu
Proceedings of the 24th IEEE International Conference on High Performance Computing and Communications (IEEE HPCC), 2022
CCF-C, EI
[ Abstract ] [ PDF ] [ Link ] [ BibTex ] [ 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00086 ]

In this paper, we allocate IoT devices as resources for smart services with time-constrained resource requirements. The allocation method, named BRAD, can work under multiple resource scenarios with diverse resource richnesses, availabilities and costs, such as the intelligent healthcare system deployed by Harbin Institute of Technology (HIT-IHC). The allocation aims for bimetric-balancing under the multi-scenario case, i.e., the profit and cost associated with service satisfaction are jointly optimised and balanced wisely. Besides, we abstract IoT devices as digital objects (DO) to make them easier to interact with during resource allocation. Considering that the problem is NP-Hard and the optimisation objective is not differentiable, we utilise the Grey Wolf Optimisation (GWO) algorithm as the model optimiser. Specifically, we tackle the deficiencies of GWO and significantly improve its performance by introducing three new mechanisms to form the BRAD-GWA algorithm. Comprehensive experiments are conducted on realistic HIT-IHC IoT testbeds and several algorithms are compared, including the allocation method originally used by the HIT-IHC system, to verify the effectiveness of the BRAD-GWA. The BRAD-GWA achieves a 3.14 times and 29.6% objective reduction compared with the HIT-IHC and the original GWO algorithm, respectively.
PECCO: A Profit and Cost-oriented Computation Offloading Scheme in Edge-Cloud Environment with Improved Moth-flame Optimisation
Jiashu Wu, Hao Dai, Yang Wang✉, Shigen Shen, Chengzhong Xu
Concurrency and Computation: Practice and Experience (CCPE), 2022
SCI Q2, CCF-C, IF: 2.0
[ Abstract ] [ PDF ] [ Link ] [ BibTex ] [ 10.1002/cpe.7163 ] [ WOS:000823064300001 ]

With the fast-growing quantity of data generated by smart devices and the exponential surge of processing demand in the Internet of Things (IoT) era, the resource-rich cloud centres have been utilised to tackle these challenges. To relieve the burden on cloud centres, edge-cloud computation offloading becomes a promising solution, since shortening the distance between the data source and the computation by offloading computation tasks from the cloud to edge devices can improve performance and Quality of Service (QoS). Several optimisation models of edge-cloud computation offloading have been proposed that take computation costs and heterogeneous communication costs into account. However, several important factors are not jointly considered, such as heterogeneities of tasks, load balancing among nodes and the profit yielded by computation tasks, which motivates the profit and cost-oriented computation offloading optimisation model PECCO proposed in this paper. Considering that the model is hard in nature and the optimisation objective is not differentiable, we propose an improved Moth-flame optimiser PECCO-MFI which addresses some deficiencies of the original Moth-flame Optimiser and integrate it under the edge-cloud environment. Comprehensive experiments are conducted to verify the superior performance of the proposed method when optimising the proposed task offloading model under the edge-cloud environment.

Miscellaneous - Data Storage

How does SSD Cluster Perform for Distributed File Systems: An Empirical Study
Jiashu Wu, Yang Wang✉, Jinpeng Wang, Hekang Wang, Taorui Lin
Concurrency and Computation: Practice and Experience (CCPE), 2023
SCI Q2, CCF-C, IF: 2.0
[ Abstract ] [ PDF ] [ Link ] [ BibTex ] [ 10.1002/cpe.7709 ] [ WOS:000961344200001 ]

As the capacity of Solid-State Drives (SSDs) is constantly being optimised and boosted with gradually reduced cost, the SSD cluster is now widely deployed as part of the hybrid storage system in various scenarios such as cloud computing and big data processing. However, despite its rapid developments, the performance of the SSD cluster remains largely under-investigated, resulting in sub-optimal deployments in practice. To address this issue, in this paper we conduct extensive empirical studies for a comprehensive understanding of the SSD cluster in diverse settings. To this end, we configure a real SSD cluster and gather the generated trace data based on some often-used benchmarks, then adopt analytical methods to analyse the performance of the SSD cluster with different configurations. In particular, regression models are built to provide better performance predictability under broader configurations, and the correlations between influential factors and performance metrics with respect to different numbers of nodes are investigated, which reveal the high scalability of the SSD cluster. Additionally, the cluster's network bandwidth is inspected to explain the performance bottleneck. Finally, the knowledge gained is summarised to benefit the SSD cluster deployment in practice.
A Self-contained and Self-explanatory DNA Storage System
Min Li*, Jiashu Wu*, Junbiao Dai, Qingshan Jiang, Qiang Qu, Xiaoluo Huang, Yang Wang✉
Nature Scientific Reports (NSR), 2021
SCI Q1, IF: 5.0
[ Abstract ] [ PDF ] [ Link ] [ BibTex ] [ 10.1038/s41598-021-97570-3 ] [ WOS:000694868000044 ]

Current research on DNA storage usually focuses on the improvement of storage density by developing effective encoding and decoding schemes while lacking consideration of the uncertainty in ultra-long-term data storage and retention. Consequently, the current DNA storage systems are often not self-contained, implying that they have to resort to external tools for the restoration of the stored DNA data. This may result in a high risk of data loss since the required tools might not be available due to the high uncertainty in the far future. To address this issue, we propose in this paper a self-contained DNA storage system that can make its stored data self-explanatory without relying on any external tool. To this end, we design a specific DNA file format whereby a separate storage scheme is developed to reduce the data redundancy while an effective indexing is designed for random read operations to the stored data file. We verified through experimental data that the proposed self-contained and self-explanatory method can not only get rid of the reliance on external tools for data restoration but also minimise the data redundancy brought about when the amount of data to be stored reaches a certain scale.
MIX-RS: A Multi-indexing System based on HDFS for Remote Sensing Data Storage
Jiashu Wu, Jingpan Xiong, Hao Dai, Yang Wang✉, Chengzhong Xu
Tsinghua Science and Technology (TST), 2021
SCI Q1, IF: 6.6
[ Abstract ] [ PDF ] [ Link ] [ BibTex ] [ 10.26599/TST.2021.9010082 ] [ WOS:000814631700004 ]

A large volume of remote sensing (RS) data has been generated with the deployment of satellite technologies. The data facilitates research in ecological monitoring, land management and desertification, etc. The characteristics of RS data (e.g., enormous volume, large single-file size and demanding requirement of fault tolerance) make the Hadoop Distributed File System (HDFS) an ideal choice for RS data storage as it is efficient, scalable and equipped with a data replication mechanism for failure resilience. To use RS data, one of the most important techniques is geospatial indexing. However, the large data volume makes geospatial indices time-consuming to construct and leverage. Considering that most modern geospatial data centres are equipped with HDFS-based big data processing infrastructures, deploying multiple geospatial indices becomes natural to optimise the efficacy. Moreover, because of the reliability introduced by high-quality hardware and the infrequently modified property of the RS data, the use of multi-indexing will not cause large overhead. Therefore, we design a framework called Multi-IndeXing-RS (MIX-RS) that unifies the multi-indexing mechanism on top of the HDFS with data replication enabled for both fault tolerance and geospatial indexing efficiency. Given the fault tolerance provided by the HDFS, RS data is structurally stored inside for faster geospatial indexing. Additionally, multi-indexing enhances efficiency. The proposed technique naturally sits on top of the HDFS to form a holistic framework without incurring severe overhead or sophisticated system implementation efforts. The MIX-RS framework is implemented and evaluated using real remote sensing data provided by the Chinese Academy of Sciences, demonstrating excellent geospatial indexing performance.

Miscellaneous - Data Processing

Towards Fast Theta-join: A Prefiltering and Amalgamated Partitioning Approach
Jiashu Wu, Yang Wang✉, Xiaopeng Fan, Kejiang Ye, Chengzhong Xu
Concurrency and Computation: Practice and Experience (CCPE), 2021
SCI Q2, CCF-C, IF: 2.0
[ Abstract ] [ PDF ] [ Link ] [ BibTex ] [ 10.1002/cpe.6996 ] [ WOS:000779988000001 ]

As one of the most useful online processing techniques, the theta-join operation has been utilized by many applications to fully excavate the relationships between data streams in various scenarios. As such, constant research effort has been devoted to optimizing its performance in the distributed environment, which is typically characterized by reducing the number of Cartesian products as much as possible. In this article, we design and implement a novel fast theta-join algorithm, called Prefap, by developing two distinct techniques (prefiltering and amalgamated partitioning) based on the state-of-the-art FastThetaJoin algorithm to optimize the efficiency of the theta-join operation. Firstly, we develop a prefiltering strategy that is applied before data streams are partitioned to reduce the amount of data involved and enable a more fine-grained partitioning. Secondly, to avoid the data streams being partitioned in a coarse-grained isolated manner and improve the quality of the partition-level filtering, we introduce an amalgamated partitioning mechanism that can amalgamate the partitioning boundaries of two data streams to assist a fine-grained partitioning. With the integration of these two techniques into the existing FastThetaJoin algorithm, we design and implement a new framework to achieve a decreased number of Cartesian products and a higher theta-join efficiency. By comparing with existing algorithms, FastThetaJoin in particular, we evaluate the performance of Prefap on both synthetic and real data streams from two-way to multiway theta-join to demonstrate its superiority.

Thesis

[PhD Thesis] Research on IoT Intrusion Detection via Domain Adaptation Approach
PhD Thesis at University of Chinese Academy of Sciences (En route)
[Master Thesis] Learning to Rank with Small Set of Ground Truth Data
Master Thesis at University of Melbourne, 2020
[ Abstract ] [ PDF ] [ Link ] [ BibTex ] [ 10.48550/arXiv.2207.01188 ]

Over the past decades, researchers have put considerable effort into investigating ranking techniques used to rank query results retrieved during information retrieval, or to rank the recommended products in recommender systems. In this project, we aim to investigate searching, ranking, as well as recommendation techniques to help realize a university academia searching platform. Unlike the usual information retrieval scenarios where lots of ground truth ranking data is present, in our case, we have only limited ground truth knowledge regarding the academia ranking. For instance, given some search queries, we only know a few researchers who are highly relevant and thus should be ranked at the top, and for some other search queries, we have no knowledge at all about which researcher should be ranked at the top. The limited amount of ground truth data makes some of the conventional ranking techniques and evaluation metrics infeasible, and this is a huge challenge we faced during this project. This project enhances the user's academia searching experience to a large extent. It helps to achieve an academic searching platform which includes researchers, publications and fields of study information, which will be beneficial not only to the university faculties but also to students' research experiences.

Paper under review: 2 papers

Granted Patent

An AI model transferring method, system, terminal and storage medium
一种人工智能模型传输方法、系统、终端以及存储介质
2023, China Patent, CN202111439591.9
[ Link ]
A dynamic resource partitioning method
一种动态资源分区方法
2023, China Patent, CN202011384022.4
[ Link ]
An IoT resource allocation method, system, terminal and storage medium
一种物联网资源分配方法、系统、终端以及存储介质
2023, China Patent, CN202210026373.0
[ Link ]
Network communication classification method, device and storage medium
通讯检测方法、设备及存储介质
2023, China Patent, CN202210973887.7
[ Link ]
A lock-free distributed deadlock avoidance method and apparatus, computer device and readable storage medium
lock-free的分布式死锁避免方法及装置、计算机设备及可读存储介质
2022, China Patent, CN202011438337.2
[ Link ]
A Breakage and Impact-aware Intelligent Shared Bike Helmet
一种具有破损与撞击感应功能的共享单车头盔
2022, China Patent, CN202221977872.X
[ Link ]
An optimisation method, system, terminal and storage medium for θ-join of data streams
数据流θ连接优化方法、系统、终端以及存储介质
2022, China Patent, CN202110331197.7
[ Link ]
A resource allocation method, apparatus and electronic device for online scenarios
在线场景的资源分配方法、装置及电子设备
2022, China Patent, CN202011428352.9
[ Link ]
An optimisation method, system, terminal and storage medium for data stream joining
一种数据流连接优化方法、系统、终端以及存储介质
2021, China Patent, CN202011435327.3
[ Link ]

Patents under substantive examination: 10+ CN patents, 8 PCT patents

Selected patents under examination

A training method, mechanism, device and storage medium for IoT intrusion detection
物联网入侵检测模型训练方法、装置、设备及存储介质
2022, China Patent, CN202210931364.6
A transferring method, system, terminal and storage medium for AI model under cloud environment
云计算缓存人工智能模型迁移方法、系统、终端以及介质
2021, China Patent, CN202111524148.1
A computation offloading method, system, terminal and storage medium for edge-cloud tasks
一种边云计算任务卸载方法、系统、终端以及存储介质
2021, China Patent, CN202111513744.X
An optimisation method, system, terminal and storage medium for Moth-flame Optimiser
一种飞蛾扑火算法的优化方法、系统、终端以及存储介质
2021, China Patent, CN202111487405.9
A serving method, system, terminal and storage medium for AI model requests
人工智能模型请求响应机制优化方法、系统、终端及介质
2021, China Patent, CN202111487405.9, PCT/CN2021/138017
A retrieval model training method, system, terminal and storage medium
搜索模型训练方法、装置、终端设备及存储介质
2020, China Patent, CN202011403845.7, PCT/CN2020/140016

Award

Outstanding Graduate of Beijing (北京市优秀毕业生), Beijing Education Commission, 2024
Outstanding Graduate of the University of Chinese Academy of Sciences (中国科学院大学优秀毕业生), University of Chinese Academy of Sciences, 2024
National Scholarship (国家奖学金-博士), Ministry of Education, P.R.China, 2023
President's Scholarship of the Chinese Academy of Sciences (中国科学院院长奖学金优秀奖, 350 out of 62,000 students), Chinese Academy of Sciences, 2023
Pacemaker for Outstanding Student (中国科学院大学三好学生标兵), University of Chinese Academy of Sciences, 2023
Outstanding Student (中国科学院大学三好学生), University of Chinese Academy of Sciences, 2022 [ Certificate ]
President's Scholarship of Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences (中国科学院深圳先进院院长奖学金优秀奖), University of Chinese Academy of Sciences, 2022
Graduate with Distinction (all subjects in First Class Honours), University of Melbourne, 2020
Dean's Honours List, University of Melbourne, 2019 [ Certificate ]
Dean's List of Excellence in Academic Performance, University of Sydney, 2018 [ Certificate ]
Dean's List of Excellence in Academic Performance, University of Sydney, 2017 [ Certificate ]

Curriculum Vitae

Please feel free to download my CV
[ Jiashu Wu's CV, PhD @ University of Chinese Academy of Sciences (English Version) ]
[ 吴嘉澍简历中国科学院大学博士（中文版） ]

Selected Projects

Research on IoT Intrusion Detection via Domain Adaptation Approach, PhD Research Topic & Internship at Beijing Institute of Technology

I propose to utilise domain adaptation (DA) techniques to tackle IoT intrusion detection, with several algorithms targeting scenarios with diverse data scarcity. The project outcomes include 5 CCF/Tsinghua A/B papers and 4 patents.
I propose several domain adaptation algorithms from diverse perspectives, including improving self-supervision efficacy, probabilistic correlation semantic alignment, geometric alignment, etc. These algorithms can tackle domain heterogeneities, under-transfer and negative transfer issues under diverse data scarcities.
The proposed algorithms are comprehensively verified from several perspectives, including accuracy, ablation studies, efficiency, etc. These algorithms improve the intrusion detection accuracy by 4-17%.
Key skills involved include research work, Python programming, feature engineering, DL algorithm design, algorithm evaluation and academic paper writing.

PackCache: An Online Cost-driven Data Caching Algorithm in the Cloud

I propose a cost-driven online data caching algorithm named PackCache. The online caching scenario is more challenging than the offline scenario. The algorithm serves online data requests either individually or in a packed manner based on previous data request patterns. The algorithm also maintains and deletes data caches via an anticipatory caching mechanism.
Quantitatively, the PackCache algorithm reduces the data request serving cost by 5-11%. Theoretically, I prove the competitive ratio of the PackCache algorithm, which reaches the lower bound of the competitive ratio for any deterministic online algorithm on the studied caching problem.
Key skills include online algorithm design, theoretical analysis, Python programming and academic paper writing. The outcomes of this project include 2 CCF-A papers and 3 patents.

MIX-RS: A Multi-indexing System based on HDFS for Remote Sensing Data Storage

I design a multi-indexing remote sensing data storage system based on HDFS. The system constructs a multi-geo-indexing mechanism based on the data replication characteristic of HDFS in order to improve the querying efficiency and the geospatial indexing for different geographical querying patterns.
The multi-indexing mechanism reduces the indexing and querying time by 60%. Besides, the MIX-RS system is immune to data loss and enjoys excellent scalability.
Key skills include data storage system and indexing mechanism design and academic paper writing. The outcomes of this project include 1 Tsinghua-B paper.

Internship Experience

Software Engineer @ Melbourne eResearch Group, Mar 2020 - Jul 2020, Melbourne Australia

Developed a meeting speaker diarisation Android app for UniMelb research purposes. Key roles included Java programming, code management using Git, UI design using material design and the interaction with Google ML Speech API.

Student Intern @ Lab of Intelligent and Network Computing, Beijing Institute of Technology, Nov 2019 - Feb 2020, Beijing China

Tackled the heterogeneous domain adaptation problem by proposing a novel simultaneous semantic alignment network, which involves techniques such as knowledge distillation, semantic alignment and prototypical networks. Key roles included algorithm design, Python (PyTorch) programming and academic paper writing. The paper has been published in ACM MM'2020.

Research and Development Engineer @ University of Melbourne, Jul 2019 - Nov 2019, Melbourne Australia

Developed an academia searching platform to tackle challenges such as a limited amount of ground truth ranking data, implicit query searching, etc. Key roles included Python programming, cleansing and processing of NLP data (20 million+ entries) and recommender system algorithm design.

Dalyell Scholar Program @ University of Sydney, Mar 2018 - Jul 2018, Sydney Australia

Conducted research on Parkinson's disease detection via drawing patterns and developed an Android app that collects users' drawing trace data under several different drawing patterns. Key roles included Java programming and database management.

Teaching Assistant

Cloud Computing and Big Data, TA, Graduate Course, Fall Semester 2021, University of Chinese Academy of Sciences
Data Structure, Mentor, Undergraduate Course, Semester 2 2017, University of Sydney [ Certificate ]
Java Programming, Mentor, Undergraduate Course, Semester 1 2017, University of Sydney [ Certificate ]
Website Design, Mentor, Undergraduate Course, Semester 1 2017, University of Sydney [ Certificate ]

Assortment

Android

I developed several Android apps during my time in Australia, including MeetingTracker, a meeting participation analysis app developed for the Melbourne eResearch Group, and NeuroGraph, a drawing pattern collector to facilitate Parkinson's disease diagnosis. These apps were made available on several app markets including Google Play, Huawei App Store, Samsung Galaxy Store, etc.

Voyage 🌏
I enjoy travelling around the world, and have been to many countries in 5 continents:

Asia: China (Beijing (my hometown), Tianjin, Shanghai, Hebei, Henan, Shandong, Inner Mongolia, Shaanxi, Liaoning, Jilin, Zhejiang, Jiangsu, Fujian, Jiangxi, Guangdong, Guangxi, Yunnan, Guizhou, Sichuan, Hainan, Hong Kong, Macau, Taiwan); Japan (Osaka, Kyoto, Aichi, Toyama, Gifu, Ishikawa); Indonesia; Vietnam; Qatar; Singapore
North America: United States (CA, CO, DE, DC, ID, IL, MD, MT, NV, NJ, NY, PA, UT, WY); Canada
Europe: Netherlands; Spain; Germany; Portugal; France; Switzerland; Liechtenstein; Austria; Italy; Belgium
Oceania: Australia (QLD, NSW, VIC, TAS, WA, NT); New Zealand
Africa: Tunisia