U.S. Pat. No. 8,556,727
METHOD AND SYSTEM FOR DETECTING BOT SCUM IN MASSIVE MULTIPLAYER ONLINE ROLE PLAYING GAME
AssigneeSNU R&DB Foundation
Issue DateJanuary 2, 2013
Illustrative Figure
Abstract
Provided are a method and system for detecting a bot in a Massive Multiplayer Online Role Playing Game (MMORPG) online and in real time. By analyzing a communication pattern between a client and a server, the bot is detected based on parameters such as data length, inter-arrival time and data length auto correlation. Tests using respective parameters are combined to construct a global decision scheme, and thus more accurate detection results can be obtained. An integrated anti-bot defense system can be built by combining other tests such as a Turing test.
Description
DETAILED DESCRIPTION OF EMBODIMENTS Hereinafter, exemplary embodiments will be described in detail with reference to the accompanying drawings. Throughout the drawings and the detailed description, unless otherwise described, the same drawing reference numerals will be understood to refer to the same elements, features, and structures. The relative size and depiction of these elements may be exaggerated for clarity, illustration, and convenience. The following detailed description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. Accordingly, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be suggested to those of ordinary skill in the art. Also, descriptions of well-known functions and constructions may be omitted for increased clarity and conciseness. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Hereinafter, exemplary embodiments will be described in detail with reference to the accompanying drawings. According to an exemplary embodiment, a real-time detection method for ground layer character pre-selection in response to a traffic pattern of bot. Thus, the bot may be detected only by short traces samples and simple calculation. Further, a scheme for an anti-bot defense system can be provided, using the real-time detection method. In a field related to bots, anti-bots countermeasure is as important as ...
DETAILED DESCRIPTION OF EMBODIMENTS
Hereinafter, exemplary embodiments will be described in detail with reference to the accompanying drawings. Throughout the drawings and the detailed description, unless otherwise described, the same drawing reference numerals will be understood to refer to the same elements, features, and structures. The relative size and depiction of these elements may be exaggerated for clarity, illustration, and convenience. The following detailed description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. Accordingly, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be suggested to those of ordinary skill in the art. Also, descriptions of well-known functions and constructions may be omitted for increased clarity and conciseness. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Hereinafter, exemplary embodiments will be described in detail with reference to the accompanying drawings.
According to an exemplary embodiment, a real-time detection method for ground layer character pre-selection in response to a traffic pattern of bot. Thus, the bot may be detected only by short traces samples and simple calculation. Further, a scheme for an anti-bot defense system can be provided, using the real-time detection method.
In a field related to bots, anti-bots countermeasure is as important as bot detection. While many good schemes have been proposed to detect the bot, almost nothing is dealing with some really dissuasive punishment policy. Nowadays, temporary and definitive ban are the available solutions but are somehow irreversible, making multiple checks and parsimony necessary to avoid catastrophic wrong sentences. Thus, countermeasures are still rarely used, encouraging bot users to have a try with limited risks.
Thus, in order to achieve an effective restriction to a bot, it requires a ground layer detection method that investigates exhaustively and on-the-fly every player's traces to yield a pre-selection of “good candidates”.
According to an analysis of client-server communication pattern of bot and human being, it is found that the bots' case has distinctive patterns in regard to packet timing and size, which are different from the case of human being.
Hereinafter, in regard to inter-arrival time, data length, and data length autocorrelation, respectively, differences between the bot and the human activities are described.
1. Inter-Arrival Times
For each packet, inter-arrival time is the time duration elapsed since last packet arrival.FIGS. 1A and 1Billustrate inter-arrival times of chronologically ordered packets within one trace:FIG. 1Aillustrates a case of human being, andFIG. 1Billustrates a case of bot. As shown inFIGS. 1A and 1B, while human has a random values distribution, on small sample of consecutive packets, bot shows either regular and fast packet arrivals pattern or high and sharp peaks resulting in periods of inactivity. Herein, the periods of inactivity is probably occurred due to path finding or character's death. Then, we will consider as bot trace a sample containing only low inter-arrival time values or a sharp peak
2. Data Lengths
A data length of a packet means the size of the corresponding segment's payload, i.e., a length of Transport Service Data Unit (TSDU).FIGS. 2A and 2Billustrate data lengths of chronologically ordered packets within one trace:FIG. 2Aillustrates a case of human being, andFIG. 2Billustrates a case of bot. As shown inFIGS. 2A and 2B, a bot sends almost no big packets having a value over a threshold, unlike human being. This could be used as a test's criterion. However, anti-bot defense system may have better performance by checking the data lengths distribution more accurately with a second threshold which described later.
3. Data Length Auto Correlation
In a given data stream, a packet size is auto-correlated. Human behavior is generally expected to be more random and unpredictable than bot's. As a result, data length autocorrelation is used for another decision test as a discriminative feature between bots and human beings.FIG. 3is a graph illustrating auto correlation profiles for various traces. Human traces show that data length autocorrelation function is maintained within a uniformed range, which is significantly far from 0; but bot traces describe that data legnth auto correlation function has an oscillating shape. Accordingly, since there is shape difference of the data length auto correlation function between human being and bots, the difference can be used as a discriminative feature between bots and humans. Narrowing the scope on the first autocorrelation range, bots' data length autocorrelation is negative, while humans' one is positive. This may be considered as a threshold.
Hereinafter, a test designed for discriminating human beings and bots based on above described three valuables is depicted.
A. Data Length Auto Correlation Test
By using the standard definition of autocorrelation, it is possible to perform on-the-fly calculation on short jumping windows.FIG. 4illustrates the arrangement of each window. Thus, autocorrelation is calculated using the sampling series (m1, var1) and its shifted counterpart (shifted series) (m2, var2). Herein, shift is 1. From this, autocorrelation is expressed as Equation (1) below.
m12-m1m2sqrt((m11-(m1)2)(m22-(m2)2))(1)
Herein, m1=mean(X); m2=mean(Y); m11=mean(X2); m22=mean(Y2); m12=mean(XY), which variables are denoted respectively by X and Y.
Pseudo-code 1 describes the data length autocorrelation test. In detail, the Pseudo-code 1 calculates autocorrelation values on consecutive jumping windows, compares them with a decision threshold, and finally performs a majority vote among the consecutive decisions. Its input parameters are the sampling series size acn, the number of “voters” acv, and the decision threshold acthr.
[Pseudo-code 1: DATA LENGTH AUTOCORRELATION]vote_counter; bot_vote ← 0m1;m2;m11;m22;m12 ← 0value_counter ← 0while new packets arrive dox ← waitNewV alue( ) {value stored at packet arrival}if value_counter ≧ 1 thenm1 ← m1 + x, m11 ← m11 + x * x,m12 ← m12 + x * pvend ifm2 ← m2 + x, m22 ← m22 + x * x,pv ← x, {storage of shifted value}value_counter ← value_counter + 1if value_counter = acn + 1 thenm2 ← m2 − x, m2 ← m2 − x * x, {correction}m1 ← m1/n, m2 ← m2/n, m11 ← m11/n,m22 ← m22/n, m12 ← m12/n,d ← sqrt((m11 − (m1)2)(m22 − (m2)2)),aux ← (m12 − m1m2)/dif aux acv=2 thenreturn BOTelsereturn HUMANend ifvote_counter; bot_vote ← 0m1;m2;m11;m22;m12 ← 0value_counter ← 0end ifend ifend while
B. Inter-Arrival Time Test
This test is twofold: counting, within a window having a length of ‘win’ parameter, the number of values above a threshold itthr1; and, among these values, calculating the ratio of values over a second threshold itthr2. According toFIG. 5, this is equivalent to derive this is equivalent to derive [set2] and [set3]/[set2]. The first test checks regularity; and the second test checks the presence of peak. After several possibilities are considered, checking the concentration of very high values out of the “regularity zone” is found as the most efficient technique. That is, if [set2] is strictly smaller than a threshold itrt1and if [set3]/[set2] is strictly greater than another threshold itrt2, presence of peak is confirmed. If one of these properties is found for the sample window, the test returns BOT. The corresponding online algorithm is described in the following Pseudo-code 2.
[Pseudo-code 2: INTERARRIVAL TIMES]index; set2_counter; set3_counter ← 0while new packets arrive dox ← waitNewV alue( ) {value stored at packet arrival}if x > itthr1 thenset2_counter ← set2_counter + 1end ifif x > itthr2 thenset3_counter ← set3_counter + 1end ifindex ← index + 1if index = win thenif set2_counter itrt2 thenreturn BOTelsereturn HUMANend ifend ifindex; set2_counter; set3_counter ← 0end ifend while
C. Data Lengths Test
This test checks two properties of the sample window having a length of ‘win’ parameter: one checks that the number of values over a threshold dlthr1is strictly smaller than a quantity dlrt1(regularity); and the other checks that the number of values over another threshold dlthr2is strictly smaller than another quantity dlrt2(short length). If the sample window exhibits both regularity and short length properties, the test returns BOT. The corresponding online algorithm is described in the following Pseudo-code 3.
[Pseudo-code 3: DATA LENGTHS]index; set2_counter; set3_counter ← 0while new packets arrive dox ← waitNewV alue( ) {value stored at packet arrival}if x > dlthr1 thenset2_counter ← set2_counter + 1end ifif x > dlthr2 thenset3_counter ← set3_counter + 1end ifindex ← index + 1if index = win thenif set2_counter < itrt1 thenif set3_counter < itrt2 thenreturn BOTelsereturn HUMANend ifelsereturn HUMANend ifindex; set2_counter; set3_counter ← 0end ifend while
D. Global Decision Scheme
In global decision scheme, the notations introduced in the previous subsections are used. The global decision scheme's operation is quite simple: at each packet arrival, it updates each of the different tests until they all have got enough data to return a decision. For example, amounts of data having a length of ‘win’ parameter may have been collected. For synchronization, parameters controlling sample window's size is expressed as Equation (2) below.
win=acv(acn+1) (2)
Once all previous tests returned their results, the global scheme uses the decision tree shown inFIG. 6to classify the collected trace. It keeps returning real-time decisions while the player is connected to the game server. Its parameters are the sample window's size win (reused by INTERARRIVAL TIMES and DATA LENGTHS tests) as well as those introduced in previous subsections for each specific test. The corresponding online algorithm is described in the following Pseudo-code 4.
[Pseudo-code 4: Global decision scheme]index ← 0DATA LENGTHS AUTOCORRELATION ← reset( )INTERARRIV AL TIMES ← reset( )DATA LENGTHS ← reset( )while new packets arrive dox ← waitNewV alue( ) {value stored at packet arrival}DATA LENGTHS AUTOCORRELATION ← update(x)INTERARRIV AL TIMES ← update(x)DATA LENGTHS ← update(x)index ← index + 1if index = win thenif DATA LENGTHS AUTOCORRELATION = BOT thenreturn BOTelseif INTERARRIV AL TIMES = HUMAN thenreturn HUMANelsereturn DATA LENGTHSend ifend ifindex ← 0DATA LENGTHS AUTOCORRELATION ← reset( )INTERARRIV AL TIMES ← reset( )DATA LENGTHS ← reset( )end ifend while
This method shows linear time complexity and constant space complexity according to sample window's size. In the Pseudo-code 4, the previously defined algorithms are called by the global decision scheme. The reset( ) instruction means that local variables are reset to their initial state (e.g., counters are set to 0), the update(x) instruction means that the value x is forwarded to the summoned algorithm, which then updates its own status (i.e. local variables). The name of an algorithm also refers to its output.
Performance of the method for bot detection according to an exemplary embodiment is evaluated by using standard performance metrics. Herein, three items having accuracy, false alarm rate, and detection time are checked: the accuracy means a ratio of well classified traces; the false alarm rate means a ratio of wrongly classified traces among those classified as bot; and the detection time means a trace length used to return decision. The detection time can be converted into actual time according to packets arrival speed.
Through experiments, individual tests as well as the global detection scheme is optimized. Accordingly, the parameters' values used for the final tests are described in the following Table I.
TABLE IGlobal schemeAutoInter-Param-Correlationarrival timesData lengthseterValueParameterValueParameterValueParameterValuewin100acn19itthr12 sdlthr159 Bacv5itthr26 sdlthr250 Bacthr−0.15itrt11dlrt11itrt20.3dlrt27
Using a 100 packets long trace, i.e., detection time between 18 s and 1 min 40 s, an exemplary embodiment yields accuracy of about 86.06% and false alarm rate of about 7.74%.
According to an exemplary embodiment, through the global detection scheme using the rationales that bots send less information than human and inter-arrival times-data lengths duality, an overall accuracy can be improved 10% better than that of any other individual test.
Otherwise, it is understood that collecting data lengths at client or server side makes no difference for the obtained values. Concerning inter-arrival times, there might be some difference; however, large thresholds are used for detection (2 sec. and 6 sec.), which are not likely to be much affected by small fluctuations. So, the method according to an exemplary embodiment may be deployed at a server side.
Based on an exemplary embodiment, an integrated defense strategy system can be developed. That is, the method according to an exemplary embodiment could be used in combination with more accurate and costly tests, as a ground layer pre-selection tool.
FIG. 7is a block diagram illustrating an integrated defense strategy system including an online bot detection unit according to an exemplary embodiment.
As illustrated inFIG. 7, the integrated defense strategy system according to the exemplary embodiment includes an online detection unit710detecting a bot by using an online detection method described above and a Turing test unit720. In the Turing test unit720, a test result is updated whenever a test is performed in operation730. The test result is combined with a result of the online detection unit710by a combination unit740. The combined result is delivered to an event generating unit750. Accordingly, a control event760to the bot may be occurred.
As described above, in an exemplary embodiment, a method for online bot detection based on traffic pattern analysis exhibiting low time and space complexity is provided. Using the detection method, a ground layer character pre-selection may be performed. The method for bot detection according to an exemplary embodiment can be performed with high accuracy by small amounts of calculation; and thus, online real-time bot detection may be achieved. Further, if combined with other test method, the bot detection method according to an exemplary embodiment could be a component of a bigger anti-bot defense system.
The invention can also be embodied as computer readable codes on a computer-readable storage medium. The computer-readable storage medium is any data storage device that can store data which can be thereafter read by a computer system. Examples of the computer-readable storage medium include ROMs, RAMs, CD-ROMs, DVDs, magnetic tapes, floppy disks, registers, buffers, optical data storage devices, and carrier waves (such as data transmission through the Internet). The computer-readable storage medium can also be distributed over network coupled computer systems so that the computer readable codes are stored and executed in a distributed fashion. Also, functional programs, codes, and code segments for accomplishing the present invention can be easily construed by programmers skilled in the art to which the present invention pertains.
A number of exemplary embodiments have been described above. Nevertheless, it will be understood that various modifications may be made. For example, suitable results may be achieved if the described techniques are performed in a different order and/or if components in a described system, architecture, device, or circuit are combined in a different manner and/or replaced or supplemented by other components or their equivalents. Accordingly, other implementations are within the scope of the following claims.
Claims
- A method for detecting a bot in a Massive Multiplayer Online Role Playing Game (MMORPG), the method comprising: counting the number of packets having an inter-arrival time over a first threshold (itthr 1 ) among consecutive packets inputted from a user within a window having a predetermined size;counting the number of packets having the inter-arrival time over a second threshold (itthr 2 ) among the consecutive packets;and determining the user as a bot when the number of packets having the inter-arrival time over the first threshold (itthr 1 ) is less than a third threshold (itrt 1 ), or a ratio of the number of packets having the inter-arrival time over the second threshold (itthr 2 ) and the number of packets having the inter-arrival time over the first threshold (itthr 1 ) is more than a fourth threshold (itrt 2 ).
- A method for detecting a bot in a Massive Multiplayer Online Role Playing Game (MMORPG), the method comprising: counting the number of packets having a data length over a first threshold (dlthr 1 ) among consecutive packets inputted from a user within a window having a predetermined size;counting the number of packets having the data length over a second threshold (dlthr 2 ) among the consecutive packets;and determining the user as a bot when the number of packets having the data length over the first threshold (dlthr 1 ) is less than a third threshold (dlrt 1 ), and the number of packets having the data length over the second threshold (dlthr 2 ) is less than a fourth threshold (dlrt 2 ).
Disclaimer: Data collected from the USPTO and may be malformed, incomplete, and/or otherwise inaccurate.