In this paper, we design two discrimination models to classify network traffic. The first classifies network traffic as encrypted or non-encrypted, and the second identifies the application that generated the traffic. Both models share two significant components: a feature extractor and a classifier. A deep autoencoder is designed to extract features that support accurate classification of previously unseen network flows, and the proposed autoencoder is validated by minimizing the MSE measure. The trained autoencoder is then frozen, and the learned model is transferred to examine four distinct classifiers: Logistic Regression (LR), Random Forest (RF), Decision Tree (DT), and Support Vector Machine (SVM). Each classifier is attached to the deep autoencoder to perform both tasks (encrypted vs. non-encrypted traffic classification and application identification). The following subsections introduce the training and testing dataset, discuss the preprocessing, and present the proposed deep autoencoder with the different classifiers. The top view of the proposed model is shown in Fig. 2.
3-1. Dataset
This research uses the "ISCX VPN-nonVPN" network traffic dataset for training and testing. The Canadian Institute for Cybersecurity collected this dataset and published it at the University of New Brunswick (UNB) [20]. The total size of the dataset is 28 GB. It contains traffic captured both in regular sessions and in sessions over VPN and covers 14 traffic categories: VoIP, VPN-VoIP, P2P, VPN-P2P, etc. The applications are divided into seven classes each for VPN and regular network traffic. 70 percent of the samples in this dataset are used for training, and 30 percent for the testing phase.
The application distribution is presented in Table 2. It shows the dataset has 30092 VPN traffic samples and 29613 regular (non-VPN) traffic samples.
Table 2
Applications distribution in ISCX-VPN-NONVPN

| VPN Applications | #No of Samples | Non-VPN Applications | #No of Samples |
|---|---|---|---|
| Browsing | 9999 | Browsing | 10000 |
| Chat | 2839 | Chat | 2505 |
| FTP | 4704 | FTP | 3975 |
| E-mail | 2444 | E-mail | 1364 |
| P2P | 3415 | P2P | 4000 |
| Streaming | 1115 | Streaming | 1284 |
| VoIP | 5576 | VoIP | 6485 |
| Total VPN Samples | 30092 | Total Non-VPN Samples | 29613 |
The application distribution with VPN and Non-VPN clustering is shown in Fig. 3. This distribution shows that the dataset is imbalanced. Classes with more samples are more likely to be predicted, so the imbalance must be addressed so that every class has the same likelihood of being classified.
The model uses the labelled training portion (70 percent of the dataset) to extract patterns that classify network traffic into VPN and Non-VPN, in addition to identifying the application. The remaining samples are used for testing; because the extracted model has never seen them before, they measure the performance of the proposed model when facing new network traffic.
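The 70/30 split can be reproduced with scikit-learn's train_test_split. This is a sketch on dummy data; stratifying by label, which the paper does not state explicitly, is our assumption and keeps the per-class proportions equal in both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 79))         # dummy stand-in: 79 flow features per sample
y = rng.integers(0, 7, size=1000)  # dummy labels for 7 application classes

# 70 % of the samples for training, 30 % held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)
```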
3-2. Preprocessing
The preprocessing phase is significant in designing a classification model. The steps taken are summarized here:
- Data quality assessment checks the quality of the features and the sample data.
- Data transformation: the raw data is collected in PCAP format, so the CICFlowMeter tool is used to convert the Pcap files to CSV format [28].
- Non-numerical labels are converted to numerical classes using one-hot encoding to prepare the dataset for training and testing.
- Unnecessary features are omitted. For example, timestamps, source and destination IPs, and flow IDs are removed from both the training and testing parts of the dataset.
- The class-weighting method is used to balance the imbalanced dataset. The weight of the samples belonging to each class is calculated as below:
weight_c = N / (C × n_c)

where N is the total number of samples, C is the number of classes, and n_c is the number of samples in class c.
This weighting balances the classes. In an imbalanced dataset, the class with more samples is more likely to be predicted, whereas with balanced weights each class has an equal chance of being classified.
- Normalization scales the value of each feature into the range [0, 1].
These actions have been taken to preprocess the dataset to prepare the training and test sets.
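As a concrete illustration of the weighting and normalization steps above, here is a minimal sketch; the class counts are taken from Table 2, and the helper names are ours, not part of the described pipeline:

```python
import numpy as np

# Class weights w_c = N / (C * n_c) for the VPN / Non-VPN task,
# using the sample counts from Table 2.
counts = {"VPN": 30092, "Non-VPN": 29613}
n_total = sum(counts.values())                     # 59705 flows in total
weights = {c: n_total / (len(counts) * n) for c, n in counts.items()}
# The smaller class (Non-VPN) receives the slightly larger weight.

# Min-max normalization: scale every feature column into [0, 1].
def min_max(X):
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)         # guard constant columns
    return (X - lo) / span

X = np.array([[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]])
Xn = min_max(X)                                    # values now lie in [0, 1]
```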
3-3. The proposed Deep Autoencoder
The autoencoder is a model that extracts features for classification [27]. A multilayer neural network with input, hidden, and output layers can be used for this extraction. Adding hidden layers deepens the autoencoder, but a deeper network is not always beneficial and can sometimes decrease classification accuracy [29]; well-chosen hidden layers, however, improve feature selection and extraction and thus classification accuracy. The deep neural network is therefore designed as an autoencoder. To evaluate it, the encoder and decoder are connected so that the encoder model can be assessed; their composition is shown in Fig. 2. A better autoencoder minimizes the difference between the values of the input and output layers. The reduced set of nodes that carries the compressed features is called the code in Fig. 4.
The following procedure is used to train the AE network. First, the TensorFlow modules are imported to build the model. Neural networks generally consist of three kinds of layers: an input layer, hidden layer(s), and an output layer, and the model is built on the same basis. The input layer, whose neurons only receive the inputs and transmit them to the other layers, is created first; its number of neurons should equal the number of features in the dataset, so the input nodes are set to 79. The Dense class is used to implement the layers of the model. The activation function used in the hidden layers, for both the forward and backward passes, is ReLU, while the output layer uses the sigmoid activation function. The model is configured with the compile() function in Python, and the proposed model uses the Adam optimizer [30] to update the weights. The strategy pseudocode used to design the deep autoencoder is demonstrated in Algorithm 1. The algorithm sets the input and output nodes based on the dataset features and classes: there are 79 input nodes, and the output nodes are specified by the number of classes used for classification. Algorithm 1 also states how many hidden layers, with how many nodes each, are added to the encoder model.
Algorithm 1: The deep autoencoder model evaluation pseudocode

Input: input layer, output layer
Step 1 → Create the model with the input layer and output layer
Step 2 → Set i = 0
Step 3 → Add a hidden layer and set i = i + 1
Step 4 → Set the added hidden layer's nodes to (input layer nodes − (i × 5))
Step 5 → Encoder = connect all nodes to make a mesh neural network
Step 6 → Decoder = invert the Encoder to create the Decoder
Step 7 → Combined model = combine the Encoder and Decoder
Step 8 → Evaluate the combined model and save the MSE between the input and output in a list
Step 9 → Continue steps 3 to 8 while the number of nodes in the hidden layer is more than the nodes in the output layer; otherwise, stop the loop and exit
Output: the list of MSE values evaluated for each combined Encoder and Decoder
The model is an encoder that should decrease the number of features. A hidden layer is added between the input and output layers, and the encoder then needs to be evaluated: the reversed encoder is attached as the decoder, producing a model whose input and output should be identical, which makes it possible to evaluate the model by the difference between input and output values. Each added hidden layer has five neurons fewer than the previous layer, and this procedure continues until the number of hidden-layer neurons falls below the size of the output layer. The model is a full mesh; hence, every neuron is connected to all neurons in the next and previous layers.
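The layer-shrinking loop of Algorithm 1 can be sketched as a plain function that enumerates the candidate code sizes. The MSE evaluation of each encoder/decoder pair would require actually training it, so only the size schedule is shown, and the function name is ours:

```python
def candidate_code_sizes(n_input, n_output, step=5):
    """Per Algorithm 1: shrink the hidden layer by `step` nodes at a
    time, keeping every width that is still larger than the output
    layer. Each width defines one encoder/decoder pair whose
    reconstruction MSE would then be measured."""
    sizes, i = [], 1
    while n_input - i * step > n_output:
        sizes.append(n_input - i * step)
        i += 1
    return sizes

# 79 input features, 7 application classes at the output:
# widths run from 74 down to 9 in steps of 5.
schedule = candidate_code_sizes(79, 7)
```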
The model evaluation is done with the mean square error (MSE) as a loss function which should be minimized. Fig.4. and Fig.5. are the output of the training phase with the proposed model. The layers and the number of nodes in each layer, encoder, and decoder, are shown in Fig.5. and Fig.6., respectively.
The input layer has 79 nodes, equal to the number of features, and the minimum MSE is obtained when the code layer has ten nodes; the features are therefore reduced to ten, which means the classifier must have a ten-node input layer.
The minimum MSE between output and input belongs to the hidden layer with ten neurons. Fig.7 shows the combination of encoder and decoder used to evaluate and achieve the minimum MSE in this research.
3-4. The Classifier Layer
The classifier is a layer added to the autoencoder, so it must be able to classify the input network traffic using only the ten features output by the autoencoder. Given the proposed deep autoencoder, an accurate classifier can improve classification accuracy on new network flows. To find the best classifier, the deep autoencoder was trained for 100 iterations; with all weights and biases trained, a transfer-learning approach is used to freeze the model. The best classifier is selected among four well-known classification algorithms: Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR). Each classifier receives the ten features and classifies the network traffic. The model must handle both classification between VPN and Non-VPN and identification among the applications.
- Application Identification: the proposed model is shown in Fig. 8. This model is a layer that classifies network traffic, using the ten features given by the proposed autoencoder, into the seven classes shown in Table 2. It is trained separately for VPN and Non-VPN traffic.
- Encrypted Traffic Classification: the proposed model is shown in Fig. 9. This model classifies the network traffic, using the ten features from the proposed deep autoencoder, as VPN or Non-VPN.
After adding the classifier, there are two different models for two distinct responsibilities: VPN and Non-VPN classification and application identification.
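The transfer step can be sketched as follows: freeze the trained encoder, map each flow to its ten-feature code, and fit a conventional classifier on the codes. Because the trained network itself is not reproduced here, a fixed random projection stands in for the frozen encoder, and the data are synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for the frozen encoder: fixed weights mapping 79 -> 10.
W = rng.standard_normal((79, 10))
def encode(X):
    return np.maximum(X @ W, 0.0)       # frozen projection + ReLU code

# Synthetic flows and a toy binary VPN / Non-VPN label.
X = rng.random((400, 79))
y = (X[:, 0] > 0.5).astype(int)

codes = encode(X)                       # shape (400, 10): ten features per flow
clf = LogisticRegression(max_iter=1000).fit(codes, y)
acc = clf.score(codes, y)               # training accuracy on the codes
```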
3-5. Hyperparameter Tuning
The hyperparameters for the model are tuned in Python using sklearn.model_selection. The results of the GridSearchCV function for each classifier, chosen to maximize accuracy, are shown in Table 3.
Table 3
Hyperparameter values

| Classifier | Tuned parameters & values |
|---|---|
| Logistic Regression (LR) | C=0.01, penalty='l2', solver='liblinear' |
| SVM | C=0.01, gamma='scale', kernel='rbf' |
| Decision Tree (DT) | max_depth=6 |
| Random Forest (RF) | max_features='log2', n_estimators=1000 |
The training phase is done with a cross-validation technique to prevent overfitting. The k-fold method with k=10 has been used to train the model.
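The grid search with 10-fold cross-validation can be sketched as follows for the LR row of Table 3. The data are toy stand-ins, and any candidate values beyond those reported in the table are our assumption:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.random((200, 10))                  # ten encoded features per flow
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # toy binary label

# Candidate values around those reported for LR in Table 3.
grid = {"C": [0.01, 0.1, 1.0], "penalty": ["l2"], "solver": ["liblinear"]}

# k-fold cross-validation with k=10, as in the paper.
search = GridSearchCV(LogisticRegression(), grid, cv=10, scoring="accuracy")
search.fit(X, y)
best = search.best_params_
```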
3-6. Model Deployment
The proposed model has been deployed in the controller using the SDN architecture. The Ryu controller is used, and the model is implemented in Ryu with Python 3. The trained model deployed in the controller classifies the encrypted network traffic and identifies the applications.