Dr Artem Lenskiy

FELLOW (LEVEL C)
College of Engineering & Computer Science
T: +6140 555 5553

Areas of expertise

  • Pattern Recognition and Data Mining 080109
  • Signal Processing 090609
  • Computer Vision 080104
  • Image Processing 080106
  • Data Encryption 080402
  • Programming Languages 080308
  • Dynamical Systems in Applications 010204
  • Stochastic Analysis and Modelling 010406
  • Neurosciences 1109

Biography

Dr Artem Lenskiy received his PhD in Electrical Engineering from the University of Ulsan, South Korea, in 2010. He received his BS in Computer Engineering and MSc in Signal Processing from Novosibirsk State Technical University, Russia.

He is an experienced educator and researcher whose expertise spans a unique combination of research fields, including machine learning, medical data and signal analysis, and various branches of applied mathematics. In 2011, he completed an internship at the RIKEN Brain Science Institute in Japan, where he worked on EEG signal analysis. From 2011 to 2019, he was an Assistant Professor in the Department of Information and Communication Engineering at the Korea University of Technology and Education (Koreatech), South Korea. He has contributed to fields such as physiological signal processing, computer vision, recommender systems, machine learning and homomorphic encryption.

Artem joined the ANU College of Engineering and Computer Science as a Research Fellow in 2019. He is a senior research scientist in the big data program of the ANU OHIOH Grand Challenge, created in partnership with Australian Capital Territory Health. As part of this program, Dr Lenskiy analyses data from patients living with multiple sclerosis and diabetes. Artem has published 4 book chapters and 21 peer-reviewed papers in international journals and conference proceedings, and his research funding totals KRW 200,000,000 and AUD 65,000. He has also been involved in a number of contracts with the Australian Government Department of Industry, Science, Energy and Resources and the Australian Government Department of Health.

Dr Lenskiy has designed and taught over 15 university courses. He has supervised 11 graduate students and is currently supervising 6 graduate students and 7 undergraduates.

 

Teaching experience 

  1. COMP1100: Programming as Problem Solving (2020)
  2. Probability theory and random processes (2011–2018)
  3. Introduction to programming (2011–2018)
  4. C programming (2011–2018)
  5. Data communication (2012–2016)
  6. Computer networks (2012–2017)
  7. Computer architecture (2011)
  8. Digital systems (2011)
  9. Linear models for time-series analysis (2017–2018)
  10. Matrix theory and applications (2017–2018)
  11. Differential equations (2018)
  12. Real analysis (2019)
  13. Introduction to the mathematical foundations of machine learning (2019)

 

 


Available student projects

If you are interested in one of the projects listed below, give this assessment test a try and email me your solutions. I don't expect perfect solutions; rather, I am looking to assess your analytical and critical thinking skills through the effort you put in. This will be the primary selection criterion. Once a student is accepted, the project will move from "Available projects" to "Current student projects".

 


(1) End-to-end paper recommender system with pretrained language models and neural nearest neighbors (Theme: Intelligence, Area: Machine learning):

The project aims to employ Bidirectional Encoder Representations from Transformers (BERT) [1], or another pre-trained language model, to obtain a vector embedding of each paper summary. These embeddings are then used to search a database for relevant papers by summary. The search is based on the neural nearest neighbour algorithm [2], which in turn allows further fine-tuning to find the most relevant items. The model is trained in an unsupervised way on cited papers, under the assumption that cited papers are relevant to the paper that cites them. It is also important to keep in mind the curse of dimensionality, which implies that "it is not possible to quickly reject candidates by using the difference in one coordinate as a lower bound for a distance based on all the dimensions". As part of this project, a comparison between neural nearest-neighbour search and a classic one should be performed.

[1] J. Devlin, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

[2] Xiang Wu et al, Multiscale Quantization for Fast Similarity Search
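As a minimal sketch of the retrieval step, the snippet below uses random vectors as stand-ins for BERT summary embeddings and brute-force cosine similarity as the classic baseline that the neural nearest-neighbour method would be compared against. All names and sizes here are illustrative, not part of the project specification:

```python
import numpy as np

# Toy stand-in for BERT: in the real project each summary would be embedded
# with a pre-trained language model; here we use random 768-dimensional
# vectors purely to illustrate the retrieval step.
rng = np.random.default_rng(0)
db_embeddings = rng.normal(size=(1000, 768))              # hypothetical paper database
query = db_embeddings[42] + 0.01 * rng.normal(size=768)   # near-duplicate of paper 42

def cosine_top_k(query, database, k=5):
    """Brute-force cosine-similarity search (the classic baseline)."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q
    return np.argsort(-scores)[:k]

top = cosine_top_k(query, db_embeddings)
print(top[0])  # the slightly perturbed paper 42 should rank first
```

The neural method of [2] would replace this exhaustive scan with a learned quantized index; the comparison asked for in the project is between the two on recall and query time.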

 

(2) Quantifying privacy leaks via recommender systems (Theme: Intelligence, Area: Machine learning): reconstructing users' demographic profiles and other user information by analysing users' preferences (e.g. items purchased, music listened to, movies watched) with two goals in mind: (1) looking for privacy leaks and possible cross-links between databases, and (2) improving recommendations by taking reconstructed demographics into account.
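To make the privacy-leak idea concrete, here is a toy example on synthetic data: a hidden binary demographic attribute shifts users' item preferences, and even a simple nearest-centroid rule recovers it well above chance. All variables are invented for illustration; the project would use real interaction logs and stronger models:

```python
import numpy as np

# Synthetic preference matrix: users in hidden group 1 systematically
# prefer certain items (item_bias), mimicking a demographic signal.
rng = np.random.default_rng(0)
n_users, n_items = 400, 50
attr = rng.integers(0, 2, n_users)              # hidden demographic attribute
item_bias = rng.normal(0, 1, n_items)           # items favoured by group 1
prefs = rng.normal(0, 1, (n_users, n_items)) + attr[:, None] * item_bias

# Nearest-centroid "attacker": learn group centroids on 300 users,
# then infer the attribute for the remaining 100.
train, test = slice(0, 300), slice(300, None)
c0 = prefs[train][attr[train] == 0].mean(0)
c1 = prefs[train][attr[train] == 1].mean(0)
pred = (np.linalg.norm(prefs[test] - c1, axis=1)
        < np.linalg.norm(prefs[test] - c0, axis=1)).astype(int)
acc = (pred == attr[test]).mean()
print(round(acc, 2))  # well above the 0.5 chance level
```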

 

(3) Building on neural ODEs for modelling the cardiovascular system: ODEs are regarded as an effective way to describe dynamic processes in natural or man-made systems. A classic example is the Hodgkin–Huxley model of neural excitation, which models active ionic mechanisms in excitable tissues. ODEs applied to physiological processes have a long history, although the models are usually quite simplistic, especially when it comes to physiological systems and the cardiovascular system in particular. The high complexity of the cardiovascular system makes it hard to model with ODEs, especially given its many unobservable parameters. In fact, most models simplify the cardiovascular structure to varying degrees; for example, T. G. Myers et al. designed a four-compartment model of the cardiovascular system based on Ottesen's ODE-based approach [1]. To overcome the unobservability and complexity of the cardiovascular system, neural ODEs are proposed as a replacement for standard ODEs. According to Tian Qi Chen et al. [2], neural ODEs are better suited to grasping the dynamics of complex systems. The core step of the proposed project is to improve accuracy in modelling healthy EKGs and EKGs of patients with heart disease.

[1] T. G. Myers, V. Ribas Ripoll, A. Sáez de Tejada Cuenca, S. L. Mitchell and M. J. McGuinness, Modelling the cardiovascular system for assessing the blood pressure curve. Mathematics-in-Industry Case Studies, 2017.

[2] T. Q. Chen, Y. Rubanova, J. Bettencourt and D. Duvenaud, Neural Ordinary Differential Equations.
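A minimal sketch of the two modelling styles, assuming a fixed-step Euler solver and a damped oscillator as a toy stand-in for a compartmental cardiovascular model. The "neural" right-hand side uses untrained random weights purely to show where a trainable network would plug in; a real neural ODE would fit those weights so trajectories match observed EKG data:

```python
import numpy as np

def euler_integrate(f, x0, t0, t1, dt=1e-3):
    """Fixed-step Euler solver; real work would use an adaptive solver."""
    x, t = np.asarray(x0, dtype=float), t0
    while t < t1:
        x = x + dt * f(t, x)
        t += dt
    return x

# Hand-written mechanistic RHS: a damped oscillator as a toy stand-in
# for a compartmental cardiovascular model.
def mechanistic_rhs(t, x):
    pos, vel = x
    return np.array([vel, -pos - 0.1 * vel])

# "Neural" RHS: a tiny random-weight MLP; in a neural ODE these weights
# would be trained from data instead of being hand-derived.
rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(8, 2)), rng.normal(size=(2, 8))
def neural_rhs(t, x):
    return W2 @ np.tanh(W1 @ x)

x_T = euler_integrate(mechanistic_rhs, [1.0, 0.0], 0.0, 1.0)
x_N = euler_integrate(neural_rhs, [0.1, 0.0], 0.0, 0.5)
print(x_T.shape, x_N.shape)
```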

 

(4) Feature space representation using oscillatory functions (Theme: Theory, Area: Machine learning): Solutions of partial differential equations that describe many physical phenomena, such as heat dissipation or a vibrating drum, are oscillatory; even an approximate solution of the Schrödinger equation is oscillatory. String theory represents particles as vibrating strings that twist and turn, and their frequencies define their physical properties. Could this oscillatory nature of our universe, and in particular some oscillatory functions (e.g. harmonic functions that emerge as solutions of the heat equation, or the Hermite, Chebyshev and Legendre polynomials that are solutions of the equations of the same names), be used as a basis to compactly represent the feature space? Expanding the feature space into a superposition of these polynomials is possible for lower-dimensional spaces [1], but quickly becomes infeasible because of the curse of dimensionality [2]. Consider the MNIST digit image dataset: every image is represented by a 784-dimensional vector (28x28 greyscale images), the data is extremely sparse, and the vectors reside close to each other, so this space can be adequately represented by as few as 2 dimensions. Usually, such compact feature-space representations are discovered by finding a low-dimensional manifold, often with a corresponding probability function, using some form of autoencoder (e.g. a variational autoencoder). The goal of this project is to compactly estimate a low-dimensional representation's probability density function using Hermite, Chebyshev or Legendre polynomials, or any other basis. The authors of [2] explain the good performance of neural networks by the symmetry, locality, compositionality, and polynomial log-probability of our universe; this could be a good starting point for connecting the oscillatory nature of the universe with those properties. A motivating example of a low-dimensional feature-space representation being used to solve complex tasks is discussed in [3].

 

[1] K. Pawelec, A. Lenskiy, A Moment-Based Classification Algorithm and Its Comparison to the K-Nearest Neighbours for 2-Dimensional Data

[2] H. W. Lin, M. Tegmark, D. Rolnick, Why does deep and cheap learning work so well?

[3] D. Ha, J. Schmidhuber, World Models
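As a one-dimensional illustration of the idea, assuming NumPy's Chebyshev utilities, the snippet below expands a Gaussian-like function in Chebyshev polynomials. In the project the same expansion would target a learned low-dimensional density rather than raw pixel space:

```python
import numpy as np

# 1-D illustration: expand a density-like function in Chebyshev
# polynomials. A degree-12 expansion already captures the Gaussian bump
# accurately because the coefficients decay rapidly for smooth functions.
x = np.linspace(-1, 1, 400)
target = np.exp(-8 * x**2)            # stand-in for an estimated density
coeffs = np.polynomial.chebyshev.chebfit(x, target, deg=12)
approx = np.polynomial.chebyshev.chebval(x, coeffs)
max_err = np.abs(target - approx).max()
print(max_err < 0.05)  # compact 13-coefficient representation suffices
```

The curse of dimensionality shows up immediately here: a tensor-product expansion of the same degree in d dimensions needs 13^d coefficients, which is why the project targets a low-dimensional latent space first.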

 

(5) Control theory (Theme: Theory, Area: Machine learning): Controlling the spread of COVID-19 is a pressing problem. To answer control questions such as (1) what percentage of people should be released from isolation and (2) when and for how long, a model of the spread must be constructed, followed by a controller. The epidemic model should rely on real-time transport data and consider multiple compartments (e.g. a minimal SEIR model consists of susceptible, exposed, infected, and recovered compartments). Every geographical community is modelled separately and then aggregated by taking transportation flows into account. Once the model is built, a controller will be designed to answer the aforementioned questions.
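A minimal sketch of the compartmental part, with illustrative (not calibrated) incubation and recovery rates; `beta` is the contact rate a controller would modulate through isolation policy:

```python
import numpy as np

def seir_step(state, beta, sigma=1/5.2, gamma=1/10, dt=0.1):
    """One Euler step of the SEIR model. sigma and gamma are illustrative
    incubation/recovery rates; beta is the control input (contact rate)
    a controller would adjust via isolation policy."""
    S, E, I, R = state
    N = S + E + I + R
    dS = -beta * S * I / N
    dE = beta * S * I / N - sigma * E
    dI = sigma * E - gamma * I
    dR = gamma * I
    return state + dt * np.array([dS, dE, dI, dR])

state = np.array([9990.0, 0.0, 10.0, 0.0])   # S, E, I, R
for _ in range(1000):                        # 100 days at dt = 0.1
    state = seir_step(state, beta=0.3)
print(round(state.sum()))                    # population is conserved: 10000
```

The project's model would run one such system per community and couple them through transport flows before a controller is designed on top.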

 

(6) Computational physiology and machine learning (Theme: Intelligence, Area: Machine learning): Developing a phone app capable of estimating physiological rhythms could be beneficial in a number of medical projects, from diagnosing diabetes to assessing general wellbeing. However, the goal of this project is more entertaining than practical: to develop an app that matches people (similar to Tinder) according to their nervous system activity as well as other estimated parameters. Specifically, these parameters will encapsulate users' biological age (as distinct from chronological age), physiological stress, height, weight, etc. Note that these parameters are estimated by the algorithms and not provided by the user.

 

(7) Combinatorics and number theory (Theme: Theory, Area: Algorithms): Counting integer partitions is a long-standing problem in number theory. There is a fascinating movie devoted to a famous Indian mathematical genius, https://en.wikipedia.org/wiki/The_Man_Who_Knew_Infinity_(film), who worked on the problem of counting integer partitions. This video introduces the problem quite well: https://www.youtube.com/watch?v=NjCIq58rZ8I. Recently a solution to this problem was introduced by Ken Ono; however, the formula is cumbersome to use. This project aims at tackling the problem from a geometric perspective, in the hope of obtaining either a closed-form formula or a "good" approximation that is easy to compute.
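For reference, exact partition counts p(n) can be computed efficiently with Euler's pentagonal-number recurrence, which any proposed closed form or approximation could be checked against:

```python
def partition_counts(n_max):
    """p(0..n_max) via Euler's pentagonal-number recurrence:
    p(n) = sum_k (-1)^(k+1) [p(n - k(3k-1)/2) + p(n - k(3k+1)/2)]."""
    p = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        total, k = 0, 1
        while True:
            g1 = k * (3 * k - 1) // 2        # generalized pentagonal numbers
            g2 = k * (3 * k + 1) // 2
            if g1 > n:
                break
            sign = -1 if k % 2 == 0 else 1
            total += sign * p[n - g1]
            if g2 <= n:
                total += sign * p[n - g2]
            k += 1
        p[n] = total
    return p

p = partition_counts(100)
print(p[5], p[100])   # 7 190569292 (the known values)
```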

 

(8) Risk management (Theme: Theory, Area: Optimization and Planning): Assessing mutual dependence across financial assets is a crucial problem in risk management. According to modern portfolio theory, in order to minimise the probability of major portfolio drawdowns, assets with little correlation are included in the portfolio. This theory earned Harry Markowitz a Nobel Prize in economics. His model is based solely on assets' correlation. By definition, correlation is a measure of linear relationship and as such is incapable of identifying nonlinear dependence. The idea of this project is to extend Markowitz's theory by proposing an asset selection method that accounts for possible nonlinear relationships between assets. One interesting and unorthodox approach to measuring nonlinear dependence is described here: https://arxiv.org/pdf/1610.09519.pdf
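One established measure that does detect nonlinear dependence is Székely's distance correlation. The sketch below implements its (biased) sample version in NumPy, chosen here as an illustrative alternative to the paper's approach, and contrasts it with Pearson correlation on a purely nonlinear relationship:

```python
import numpy as np

def distance_correlation(x, y):
    """Székely's sample distance correlation: zero iff x and y are
    independent (in the population), unlike Pearson correlation,
    which only captures linear dependence."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])           # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()   # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = x**2                                          # fully dependent, but nonlinearly
print(round(abs(np.corrcoef(x, y)[0, 1]), 2))     # Pearson: near zero
print(round(distance_correlation(x, y), 2))       # distance correlation: clearly positive
```

Replacing the correlation matrix in the Markowitz objective with such a dependence measure is one concrete way the proposed extension could start.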

 

 

(9) Quantitative finance (Theme: Information, Area: Signal processing): The goal of this project is to predict a company's fundamental parameters (market cap, price-to-earnings ratio, etc.; see https://finance.yahoo.com/quote/AMD/key-statistics?p=AMD&.tsrc=fin-srch) from its stock price dynamics. A number of approaches could potentially be applied: the simplest is to train a machine learning model (e.g. a CNN), while a more sophisticated approach is to construct a feature space with meaningful and interpretable features. Presumably, companies (represented by vectors in the space) with similar fundamental characteristics will reside close to each other, so even simple machine learning models could grasp such a representation and predict fundamental parameters. The project is based on the hypothesis (among others) that large companies' stock price dynamics are more efficient than those of small ones. Market efficiency postulates that (1) the distribution of returns tends to be Gaussian, which is associated with market liquidity (a higher number of investors results in Gaussianity of returns by the central limit theorem), and (2) price dynamics do not exhibit long-range dependence, as all information known to one actor is known to everyone and hence instantaneously priced into the market. For smaller companies, on the other hand, the market is inefficient. These inefficiencies could be measured and used to deduce fundamental characteristics. The figure below demonstrates that features extracted from stock price dynamics contain sufficient information to separate companies into small/large market caps (there are about 50 companies; red circles are companies misclassified by a simple linear discriminant analysis).
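A sketch of two candidate inefficiency features, excess kurtosis of log-returns (non-Gaussianity) and lag-1 autocorrelation of absolute returns (a crude dependence proxy), checked on a synthetic Gaussian random walk that should look "efficient". The feature choice here is illustrative, not the project's fixed feature set:

```python
import numpy as np

def efficiency_features(prices):
    """Two inefficiency proxies from a price series: excess kurtosis of
    log-returns and lag-1 autocorrelation of absolute returns.
    Both should be near zero for an efficient (Gaussian, memoryless) market."""
    r = np.diff(np.log(prices))
    kurt = ((r - r.mean())**4).mean() / r.var()**2 - 3
    a = np.abs(r)
    ac1 = np.corrcoef(a[:-1], a[1:])[0, 1]
    return kurt, ac1

# Sanity check on a synthetic geometric random walk.
rng = np.random.default_rng(0)
prices = np.exp(np.cumsum(rng.normal(0, 0.01, 5000)))
kurt, ac1 = efficiency_features(prices)
print(round(kurt, 2), round(ac1, 2))   # both near zero for this "efficient" series
```

On real small-cap series one would expect both features to move away from zero, which is exactly the signal the project proposes to exploit.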

(10) Neuroscience and data analysis (Theme: Information, Area: Computer vision): The project aims at diagnosing neurological disorders (multiple sclerosis, Alzheimer's disease, etc.) by analysing videos recorded by a webcam [1]. A patient is given a cognitive test (an IQ test, or the Paced Auditory Serial Addition Test (PASAT)), and while the test is being taken, a video of the patient is recorded. Using methods of computer vision and signal processing, physiological signals such as (1) heart rate variability, (2) body movement (e.g. micro-movements or tremor), (3) eye blinks and possibly (4) eye movements are extracted. This information is then used to train a model that classifies a patient into a healthy or a neurological-disorder category.

[1] B. Yan, Resting and Postexercise Heart Rate Detection From Fingertip and Facial Photoplethysmography Using a Smartphone Camera: A Validation Study

 

(11) Predicting deterministic chaos (Theme: Information, Area: Signal processing): It is known that N-body gravitational systems uniformly spread orbits over a plane. In this project, the modelling of the N-body problem is performed on a sphere rather than on a plane (or, in general, in a space with zero curvature). The system is highly sensitive to initial conditions and hence exhibits chaotic behaviour. It has recently been shown that it is possible to identify governing equations from data generated by nonlinear dynamical systems [1]. Another approach to learning mathematical expressions from data is based on the grammar variational autoencoder [2]. A more recent approach employs graph neural networks to discover analytic expressions of physical phenomena [3]. Echo state networks are a class of models well suited to modelling chaotic systems, especially those that exhibit strange attractors [4]. The goal of this project is to apply some of these models to crack the famous N-body problem, i.e. to identify the N-body gravitational system and generate trajectories of the bodies' future positions on the sphere.

 

 

[1] S. Brunton, Discovering governing equations from data by sparse identification of nonlinear dynamical systems

[2] M. Kusner, Grammar Variational Autoencoder

[3] M. Cranmer, Discovering Symbolic Models from Deep Learning with Inductive Biases

[4] J. Pathak, Using Machine Learning to Replicate Chaotic Attractors and Calculate Lyapunov Exponents from Data
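As a small illustration of the echo-state-network approach [4], the sketch below trains a ridge-regression readout on a fixed random reservoir to do one-step prediction of the chaotic logistic map. All hyperparameters (reservoir size, spectral radius, input scaling) are illustrative stand-ins for the N-body setting:

```python
import numpy as np

rng = np.random.default_rng(0)

# Chaotic training signal: the logistic map at r = 3.9.
u = np.empty(2000)
u[0] = 0.5
for t in range(1999):
    u[t + 1] = 3.9 * u[t] * (1 - u[t])

# Minimal echo state network: fixed random reservoir, trained linear readout.
n_res = 200
W_in = rng.uniform(-0.5, 0.5, n_res)
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.9

states = np.zeros((len(u), n_res))
for t in range(len(u) - 1):
    states[t + 1] = np.tanh(W @ states[t] + W_in * u[t])

# Ridge-regression readout: predict u[t] from the state that has seen u[:t].
X, y = states[200:], u[200:]                      # discard warm-up transient
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
rmse = np.sqrt(np.mean((X @ W_out - y) ** 2))
print(rmse < 0.1)   # far below the ~0.3 std of the signal itself
```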


Current student projects

(a) Homomorphically Encrypted Time Series Analysis: The aim of this project is to develop a framework for analysing homomorphically encrypted time series (HETSA). Namely, we will implement a set of building blocks for time series analysis and showcase them in the following applications. First, to demonstrate real-time processing capabilities, we will implement a set of trading strategies that make trading decisions on encrypted market data. Second, to demonstrate the developed signal analysis tools, we will analyse the encrypted heart rate variability of patients with diabetes, which will in turn support clinical judgement and decision-making. Third, to show versatility, we will implement a set of general time-series analysis tools in the homomorphic encryption context. Furthermore, we will develop a bootstrapping technique that provides the means for real-time data processing. We believe that focusing on these areas, where issues surrounding privacy play a crucial role, will not only showcase the potential of the proposed framework but also ignite further research and development. See GitHub for the project description. Preliminary results were accepted by SECRYPT and are available in the paper.

 

(b) Fractal dimension-based feature descriptors: Despite the tremendous popularity of SIFT and SURF image descriptors, they lack a theoretical foundation proving the invariance of the descriptors under various image or scene transformations. Some examples of scene transformations are alterations in viewing angle, changes in scale (zoom in/out) and variations in illumination. Fractal geometry, and functional analysis in general, define a space-filling measure coined the fractal dimension - a generalization of the Euclidean dimension. It has been proved that under bi-Lipschitz mappings (which constitute a wide class of continuous and nonlinear mappings) a transformed object/set does not change its fractal dimension. Hence, this property provides a foundation for developing image feature descriptors that are robust under projective and, as stated above, nonlinear transformations. The goal of this project is to investigate the applicability of the fractal dimension as an image descriptor, test its properties and apply it to an object recognition task.
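A minimal sketch of the box-counting estimator of fractal dimension for 2-D point sets, sanity-checked on a filled square (dimension about 2) and a line segment (about 1). Box sizes and sample counts are illustrative; an image descriptor would apply the same estimator to edge or level-set pixels:

```python
import numpy as np

def box_counting_dimension(points, sizes=(1/2, 1/4, 1/8, 1/16, 1/32)):
    """Estimate the box-counting (fractal) dimension of a 2-D point set
    in the unit square by regressing log N(eps) on log(1/eps)."""
    counts = []
    for eps in sizes:
        occupied = np.unique(np.floor(points / eps), axis=0)  # occupied boxes
        counts.append(len(occupied))
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
    return slope

# Sanity checks: a filled square has dimension ~2, a line segment ~1.
rng = np.random.default_rng(0)
square = rng.uniform(0, 1, size=(20000, 2))
t = rng.uniform(0, 1, 20000)
line = np.column_stack([t, t])
d_square = box_counting_dimension(square)
d_line = box_counting_dimension(line)
print(round(d_square, 1), round(d_line, 1))
```

The bi-Lipschitz invariance mentioned above is what makes this slope a plausible descriptor: the estimate is (in theory) unchanged by a wide class of nonlinear deformations of the point set.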

 

(c) Exploring novel methods of financial data visualization: The famous quote "data is the new oil" captures a truth about data, but unlike oil, data is not finite and is doubling approximately every two years. This in turn results in larger practical datasets that make analysis hard, not only because huge computational resources are required but also because of a simple lack of intuition about the internal structure of large arrays of data. An important aspect of data analysis is data visualization. Visualization provides a new perspective on data, either by providing a summary in a comprehensible way or by representing the data in a manner that highlights latent relationships between variables. The financial sector generates enormous amounts of data daily, and hence data visualization is crucial in financial decision making. Some examples of data in financial technical analysis are order books, price quotes, last trades and traded volumes, whereas fundamental analysis operates with companies' market capitalization, book-to-market and earnings-to-price ratios. In this project, novel methods of data visualization for technical and fundamental analysis are developed. Machine learning algorithms such as topological data analysis, self-organizing maps, canonical correlation analysis and t-distributed stochastic neighbor embedding are considered for data representation and further visualisation. The developed methods of financial data visualisation will aid financial decision making.

 

(d) Automated Topic Segmentation: Throughout the last 30 years, a number of different methods have been proposed for automated topic segmentation. These methods roughly fall into one of three categories: lexical cohesion methods, topic-model methods, and neural-network/supervised methods. The goal of this project is to merge these methods and potentially propose a novel approach for automated topic segmentation of transcribed spontaneous interviews for enhanced linguistic analysis.
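As a toy illustration of the lexical-cohesion family, the sketch below computes TextTiling-style similarities between adjacent sentence windows using bag-of-words vectors; the lowest-similarity valley suggests a topic boundary. The sentences and window size are invented for illustration:

```python
import numpy as np

def window_similarities(sentences, window=2):
    """TextTiling-style sketch: cosine similarity between adjacent
    sentence windows; low-similarity valleys suggest topic boundaries."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = np.zeros((len(sentences), len(vocab)))
    for i, s in enumerate(sentences):
        for w in s.lower().split():
            vecs[i, index[w]] += 1
    sims = []
    for i in range(window, len(sentences) - window + 1):
        a = vecs[i - window:i].sum(0)     # window before the candidate gap
        b = vecs[i:i + window].sum(0)     # window after the candidate gap
        sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return np.array(sims)

# Two topics: cats (sentences 0-2) and markets (sentences 3-5).
docs = ["cats purr softly", "my cat sleeps", "cats like fish",
        "stocks fell today", "the market dropped", "stocks are volatile"]
sims = window_similarities(docs)
print(int(np.argmin(sims)))   # the valley at the cats/markets boundary
```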

 

(e) Geriatric patient data analysis: Geriatric patients suffering from hip fractures are at a high risk of mortality. While statistical methods have identified links between death and demographic features such as age, several major gaps remain. Firstly, there exist few systematic, scalable ways to identify high-risk patients upon hospital admission. Secondly, there is limited understanding of which comorbidities contribute to the risk of death in the first place. This project supplies the missing link by using probabilistic machine learning methods. To pinpoint vulnerable patients, the project aims to reconstruct the probability density function, predicting mortality as well as other adverse medical outcomes. Additionally, the project attempts to find causal relationships by uncovering conditional distributions. This will elucidate which feature combinations contribute the most to death.

(f) Counterfactual explanations with causal relationships: The project aims to develop algorithms for drawing reasonable counterfactual explanations for machine learning models. A counterfactual answers causal questions of the form 'If X had not occurred, what would Y be?' This requires a hypothetical reality that contradicts the observed data (for example, a world without smoking). Counterfactuals can help us understand how a model works and have wide applications in domains such as medicine and finance. For instance, when a machine learning system is used to evaluate whether a bank should lend money, declined borrowers may want to know under what circumstances they would be approved, so that they can try to satisfy those requirements. Some factors may have causal relationships, and a recommendation to change their values might have unacceptable effects on other variables. Hence, methods of causal inference and constrained optimisation are explored to overcome these issues.

Past student projects

  • "Effects of leave of absence on the academic performance of undergraduate students in Korea", 2018.
  • "Echo state networks", 2017.
  • "Characterizing the mental workload of subjects using blink rate variability dynamics", 2017.
  • "Probabilistic approach to item genre prediction by taking into account users' perception", 2017.
  • "Moment-based classification in comparison to kNN", 2016.
  • "Educational mobile robot platform", 2015.
  • "Perception system for autonomous vehicles", 2014.



Updated:  29 November 2021 / Responsible Officer:  Director (Research Services Division) / Page Contact:  Researchers