Our model combines three trained multiclass SVMs, one for each of the three kernels. Each multiclass classifier works as a one-vs-rest classifier, so for each kernel we are actually training 5 SVMs: one classifier separates benign apps from the rest of the apps, another separates adware from the rest of the apps, and so on. The ensemble is a max-voting model: it picks the most common prediction across the 3 kernels as the final prediction. This helps mask the weaknesses of certain kernels and reinforces the strengths of others.
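The max-voting step can be sketched in a few lines of Python. This is only an illustration: the function name `max_vote`, the label strings, and the example predictions are all made up, and the tie-breaking rule (prefer the earliest kernel in the list) is an assumption, since the description above does not say how ties are resolved.

```python
from collections import Counter

def max_vote(predictions_per_kernel):
    """Return the most common label for each app across kernels.

    predictions_per_kernel: list of equal-length label lists, one per kernel.
    Ties go to the earliest kernel in the list (an assumption, not something
    stated in the write-up).
    """
    final = []
    for votes in zip(*predictions_per_kernel):
        counts = Counter(votes)
        best = max(counts.values())
        # Pick the first vote, in kernel order, that reaches the top count
        final.append(next(v for v in votes if counts[v] == best))
    return final

# Hypothetical predictions from the three kernel SVMs for four apps
pred_k1 = ["benign", "adware", "trojan", "ransomware"]
pred_k2 = ["benign", "trojan", "trojan", "adware"]
pred_k3 = ["adware", "adware", "benign", "trojan"]

final = max_vote([pred_k1, pred_k2, pred_k3])
print(final)  # ['benign', 'adware', 'trojan', 'ransomware']
```

The last app shows the tie case: all three kernels disagree, so the sketch falls back to the first kernel's prediction.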
Matrix A (apps × APIs) - encodes which APIs appear in a given app. This is similar to a bag-of-words model.
Matrix B (APIs × APIs) - encodes which APIs occur in the same code block.
Matrix P (APIs × APIs) - encodes which APIs are invoked by the same package. In smali files, API calls follow the convention package->API.
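As a rough illustration of the three matrices, the sketch below builds them from a tiny hand-written input. The app names, package names, and API names are hypothetical, the input layout (a list of code blocks per app) is an assumption, and so is the choice to mark every API as related to itself on the diagonal of B and P.

```python
import numpy as np

# Hypothetical toy input: each app is a list of code blocks, and each code
# block is a list of API calls written as "package->API".
apps = {
    "app_1": [["Lpkg/a;->open", "Lpkg/a;->read"], ["Lpkg/b;->send"]],
    "app_2": [["Lpkg/a;->read", "Lpkg/b;->send"]],
}

# Assign a column index to every distinct API call
apis = sorted({call for blocks in apps.values() for block in blocks for call in block})
idx = {call: i for i, call in enumerate(apis)}
n_apps, n_apis = len(apps), len(apis)

A = np.zeros((n_apps, n_apis), dtype=int)  # apps x APIs: API appears in app
B = np.zeros((n_apis, n_apis), dtype=int)  # APIs x APIs: same code block
P = np.zeros((n_apis, n_apis), dtype=int)  # APIs x APIs: same package

for row, blocks in enumerate(apps.values()):
    for block in blocks:
        ids = [idx[call] for call in block]
        A[row, ids] = 1            # mark every API seen in this app
        B[np.ix_(ids, ids)] = 1    # mark every pair of APIs in this block

# The package is the part before "->" in the smali convention
packages = [call.split("->")[0] for call in apis]
for i in range(n_apis):
    for j in range(n_apis):
        P[i, j] = int(packages[i] == packages[j])

print(A)
print(B)
print(P)
```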
AA^T: Gives the number of APIs that two apps have in common.
ABA^T: Gives the number of API pairs, one API from each app, that coexist in the same code block.
APA^T: Gives the number of API pairs, one API from each app, that share the same package.
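The three kernels above are plain matrix products, so they can be computed directly with NumPy. The matrices here are small hypothetical examples (2 apps, 3 APIs), not values from our dataset, and the variable names `K_A`, `K_B`, and `K_P` are chosen only for this sketch.

```python
import numpy as np

# Hypothetical toy matrices: 2 apps, 3 APIs
A = np.array([[1, 1, 1],
              [0, 1, 1]])   # apps x APIs
B = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])   # APIs x APIs, same code block
P = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])   # APIs x APIs, same package

K_A = A @ A.T       # AA^T:  shared APIs between each pair of apps
K_B = A @ B @ A.T   # ABA^T: API pairs linked through a common code block
K_P = A @ P @ A.T   # APA^T: API pairs linked through a common package

print(K_A)  # [[3 2], [2 2]]
print(K_B)  # [[7 5], [5 4]]
print(K_P)  # [[5 3], [3 2]]
```

Each result is a symmetric apps × apps matrix. One possible way to use such a matrix, assuming scikit-learn is the SVM library, is to pass it to `SVC(kernel="precomputed")`.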
Our dataset consists of 300 benign apps and 300 apps of each malware category. To get a good and varied representation of benign applications, our benign dataset is made up of apps from 6 different categories.
Malware that floods an app or webpage with advertisements, which can prevent users from accessing the app or site as originally intended.
Malware that is disguised as safe, legitimate software and stealthily runs malicious code on a user's computer.
Malware that gains remote access to a system or device by using vulnerabilities to bypass security measures.
Malware that demands a ransom from a user by denying them access to their computer until the demands (usually in the form of payment) have been met.
We would like to include more kernels in our multi-kernel learning model. Since our model performs best on ransomware and adware and worst on trojans, we especially want to add a kernel whose metapath makes better use of the features of trojan applications, to improve the classification of trojans.
Another possible future direction for our project is using our ranking algorithm to screen new apps before they are published on the Google Play Store. That way, apps that contain too many APIs of certain types would be flagged for a more rigorous background check.