Dr Xiaoyu Sun

ANU College of Engineering, Computing and Cybernetics

Areas of expertise

  • Software And Application Security 460406
  • Software Engineering 4612
  • Automated Software Engineering 461201
  • Programming Languages 461204
  • Software Quality, Processes And Metrics 461207
  • Software Testing, Verification And Validation 461208
  • Empirical Software Engineering 461202

Biography

Xiaoyu is a Lecturer in Software Engineering at the Australian National University. She previously obtained her PhD at Monash University under the supervision of Li Li and John Grundy. Her research interests lie mainly in Mobile Software Engineering (in particular, mobile security and quality assurance) and Intelligent Software Engineering (SE4AI and AI4SE). Her work applies static code analysis, dynamic program testing, and natural language processing techniques to strengthen the security and reliability of software systems. Her current research projects include developing tools to detect Android defects such as compatibility issues and privacy leaks. Xiaoyu's research has been published in top-tier conferences and journals including ICSE, ASE, TOSEM, ISSRE, MSR, and IST. She has also established extensive collaborations with industry, including ByteDance and Alibaba.

Researcher's projects

  1. Whole-Program Analysis of Android Apps.

  2. Automated LLM-based Compatibility Issues Detector For Android Applications.

The Android Application Programming Interface (API) provides the building blocks that app developers use to harness the functionality of Android devices, including interacting with services and accessing hardware. The API evolves rapidly to meet new requirements for security, performance, and advanced features, forcing developers into a race to keep their apps up to date. Given the size of the API and the lack of automated alerts about important changes, Android apps suffer from API-related compatibility issues. These issues can manifest as runtime crashes, resulting in a poor user experience. To mitigate this impact, various approaches have been proposed to detect such compatibility issues automatically.

However, these approaches focus only on signature-induced compatibility issues (i.e., a given API does not exist in certain Android versions), leaving other types of compatibility issues unaddressed. One such type stems from semantic changes to APIs, which are non-trivial to pinpoint because it is not yet possible to automatically comprehend the full semantics of code.

The advent of large language models (LLMs) such as GPT-4 offers a potential way to address this issue by understanding the semantics of bytecode. In this research, we will first conduct a preliminary study to investigate the capability of LLMs to construct API implementations. We will then use the most suitable LLM to extract the semantics of each Android system API across all API levels (i.e., from API level 1 to 32). Finally, we will propose an automated tool that detects compatibility issues based on how these semantics evolve across API levels.
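To make the distinction between signature-induced and semantic compatibility issues concrete, here is a minimal, hypothetical sketch of the kind of signature-based check that existing detectors perform. It is not the tool proposed in this project. It assumes each API's lifetime is already known as an [added, removed) range of API levels and that the app supports levels `min_sdk` to `target_sdk`. `ApiLifetime`, `signature_issues`, the `guarded` set (a stand-in for calls protected by a `Build.VERSION.SDK_INT` check) and all API names and levels are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ApiLifetime:
    """API levels in which a method signature exists: [added, removed)."""
    added: int                      # first API level that contains the signature
    removed: Optional[int] = None   # first API level without it; None = still present

    def exists_in(self, level: int) -> bool:
        return level >= self.added and (self.removed is None or level < self.removed)


def signature_issues(
    used_apis: Set[str],
    lifetimes: Dict[str, ApiLifetime],
    min_sdk: int,
    target_sdk: int,
    guarded: FrozenSet[str] = frozenset(),
) -> List[Tuple[str, List[int]]]:
    """Flag each used API whose signature is missing at some level in [min_sdk, target_sdk].

    `guarded` is a hypothetical stand-in for call sites already protected by an
    SDK_INT check; a real detector would infer this from the app's control flow.
    """
    issues = []
    for api in sorted(used_apis):
        if api in guarded or api not in lifetimes:
            continue
        missing = [level for level in range(min_sdk, target_sdk + 1)
                   if not lifetimes[api].exists_in(level)]
        if missing:
            issues.append((api, missing))
    return issues


# Illustrative data: hypothetical API names and levels, not real Android history.
lifetimes = {
    "Example.newApi()": ApiLifetime(added=26),
    "Example.oldApi()": ApiLifetime(added=1, removed=29),
    "Example.stableApi()": ApiLifetime(added=1),
}
used = {"Example.newApi()", "Example.oldApi()", "Example.stableApi()"}
for api, missing in signature_issues(used, lifetimes, min_sdk=24, target_sdk=30):
    print(api, "missing at API levels", missing)
# Example.newApi() missing at API levels [24, 25]
# Example.oldApi() missing at API levels [29, 30]
```

Because this check only asks whether a signature exists at each level, an API whose signature stays the same while its behaviour changes passes silently. That is the semantic gap the LLM-based approach aims to close.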

  3. Privacy Risks of AI Models in Android Applications

AI is everywhere. Thousands of new AI apps launch every day. We're embedding AI into our browsers, email inboxes, and document management systems. We're giving it the autonomy to act on our behalf as a virtual assistant. We're sharing our personal and company information with it. All of this creates new cybersecurity risks and amplifies traditional attacks:

  • Data breaches and identity theft.
  • AI-induced vulnerabilities. Generative AI models may produce flawed content. For example, complex code-suggestion models may generate vulnerable code that is difficult for developers to identify, which can harm the mobile ecosystem.
  • Data leaks that expose confidential information.
  • Malicious use of AI models (e.g., misinformation and deepfakes).

Current student projects

  1. The Discrepancy Between Privacy Policies and API Usage Through Code Evolution.

A privacy policy serves as a legal contract between a mobile app provider and its users, defining how personal information will be handled, and thus plays a key role in maintaining user trust and confidence. In addition, mobile apps rely on various third-party libraries (TPLs) to function normally. These TPLs are built on lower-level API calls that use device permissions.

Such device permission usage and the corresponding data collection should be clearly disclosed in the privacy policy. As TPLs update and evolve, some API usages are added and others are removed, but privacy policies may not be updated accordingly.

We first plan to conduct an empirical study of the status quo of this issue on the Google Play store. The study comprises two major parts: (a) privacy policy analysis and (b) source-code-based TPL analysis.
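A minimal sketch of how the two parts of the study could meet is shown below. It is an illustration under stated assumptions, not the project's method. It assumes part (a) has already reduced the policy to a set of declared data types, and part (b) has already produced the set of permission-protected API calls in each TPL version. The hard part, extracting those sets from policy text and bytecode, is out of scope here. `API_TO_DATA`, `policy_gaps` and the data-type labels are hypothetical; the method names are real Android APIs, but their mapping to data types is only illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mapping from permission-protected Android API calls to the
# category of personal data they expose (illustrative, hand-written).
API_TO_DATA: Dict[str, str] = {
    "LocationManager.getLastKnownLocation": "location",
    "TelephonyManager.getDeviceId": "device ID",
    "AccountManager.getAccounts": "accounts",
}


def policy_gaps(
    tpl_history: List[Tuple[str, Set[str]]],
    declared: Set[str],
) -> List[dict]:
    """Walk TPL versions in release order. For each version, report the APIs
    added/removed since the previous version and any data types collected by
    its API calls that the privacy policy does not declare."""
    report = []
    previous: Set[str] = set()
    for version, apis in tpl_history:
        collected = {API_TO_DATA[a] for a in apis if a in API_TO_DATA}
        report.append({
            "version": version,
            "added": sorted(apis - previous),
            "removed": sorted(previous - apis),
            "undeclared": sorted(collected - declared),
        })
        previous = apis
    return report


# Illustrative history of one hypothetical TPL, and a policy that declares only location.
history = [
    ("1.0", {"LocationManager.getLastKnownLocation"}),
    ("2.0", {"LocationManager.getLastKnownLocation", "TelephonyManager.getDeviceId"}),
]
for row in policy_gaps(history, declared={"location"}):
    print(row["version"], "added:", row["added"], "undeclared:", row["undeclared"])
# 1.0 added: ['LocationManager.getLastKnownLocation'] undeclared: []
# 2.0 added: ['TelephonyManager.getDeviceId'] undeclared: ['device ID']
```

Tracking added and removed APIs per version, rather than only the latest snapshot, is what ties a mis-claim to the library update that introduced it.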

As a follow-up project, based on the findings of this study, we aim to propose a framework (based on machine learning, deep learning, or LLMs) that automatically detects such privacy policy mis-claiming issues.

Updated: 08 July 2024 / Responsible Officer: Director (Research Services Division) / Page Contact: Researchers