Traditionally, network traffic is analyzed by capturing packets on the network and computing statistics over them. As network bandwidth and traffic volume increase, however, more computing resources are needed to complete the analysis on time. NetFlow technology can reduce the resources required for traffic analysis, but as bandwidth keeps growing, the time needed for the analysis may still increase dramatically. When resources and time are limited, applying sampling to NetFlow generation can reduce both. Because NetFlow data are now often used to generate various statistical reports, it is necessary to understand whether sampling affects the statistical results before applying it to NetFlow generation. In this paper, 28 days of NetFlow data obtained from the Taiwan Academic Network were studied, and the differences in the IP address list and the top talkers at different sampling rates were examined. The results show that sampling NetFlow does affect the retention rates of IP addresses and top talkers in ranking lists, and the higher the sampling rate, the greater the impact.
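The notion of a retention rate can be illustrated with a minimal sketch. It is not the paper's method, which the abstract does not detail. It assumes deterministic 1-in-N packet sampling, ranks top talkers by total bytes per source address, and defines the retention rate as the fraction of entries in the unsampled list that still appear in the sampled list. The function names, the record layout (source IP, bytes), and the toy data are all hypothetical.

```python
from collections import Counter


def sample_packets(packets, n):
    """Assumed deterministic 1-in-n sampling: keep every n-th packet."""
    return packets[::n]


def top_talkers(packets, k):
    """Return the k source IPs with the largest total byte counts."""
    totals = Counter()
    for src_ip, size in packets:
        totals[src_ip] += size
    return [ip for ip, _ in totals.most_common(k)]


def retention_rate(full_list, sampled_list):
    """Fraction of entries in full_list that also appear in sampled_list."""
    full = set(full_list)
    if not full:
        return 1.0
    return len(full & set(sampled_list)) / len(full)


# Hypothetical toy traffic: (source IP, packet size in bytes).
packets = (
    [("10.0.0.1", 100)] * 6
    + [("10.0.0.2", 50)] * 3
    + [("10.0.0.3", 10)]
)

for n in (1, 4):
    sampled = sample_packets(packets, n)
    ip_retention = retention_rate(
        [ip for ip, _ in packets], [ip for ip, _ in sampled]
    )
    talker_retention = retention_rate(
        top_talkers(packets, 2), top_talkers(sampled, 2)
    )
    print(f"1-in-{n}: IP retention={ip_retention:.2f}, "
          f"top-2 retention={talker_retention:.2f}")
```

In this toy data the low-volume source drops out of the IP address list under 1-in-4 sampling while the two heaviest sources remain. Real traffic and other sampling schemes may behave differently.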
Date:
2021-03
Relation:
Journal of Internet Technology. 2021 Mar;22(2):457-463.