In-depth analysis of data loss prevention (DLP) technology

When data security comes up in an enterprise, people usually think of documents, but few pay attention to what those documents actually contain. Data management has also tended to be one-size-fits-all: typically everything is encrypted and everything is placed under permission control, with no distinction by how important a document is. As organizations produce more and more documents, and as security incidents keep occurring, attention has shifted to the data itself. Data is divided into structured and unstructured data, and the focus is increasingly on document content: what sensitive information a document holds and what it is used for. Different types of documents, with different contents, call for different management and storage.

Most existing approaches to data control rely on strong control: outright isolation or blanket encryption. In practice this creates a great deal of unnecessary friction in how data is produced, used, and shared. People need more flexible ways to handle data, and this is why intelligent data security control emerged: it lets enterprise administrators control data according to its importance.

Core capabilities of DLP

What is DLP? DLP stands for Data Leakage (Loss) Prevention. Its core capability is content identification, and from identification it extends to preventing and controlling the movement of data. Content identification techniques include keywords, regular expressions, document fingerprints, exact data matching (database fingerprints), and support vector machines, and each of these can be combined into composite detection capabilities.

DLP must also provide protection, covering both the network and the endpoint. Network protection consists mainly of auditing and control. Endpoint protection adds, on top of auditing and control, traditional host control, encryption, and permission control.

Overall, DLP is a composite system whose ultimate goal is intelligent discovery, intelligent encryption, intelligent control, and intelligent auditing, forming a complete solution against data leakage.

Components of DLP

The following figure illustrates the physical deployment of DLP and where the different product types reside within the organization. "Network DLP" products are always placed in the DMZ, while the other products reside in the enterprise LAN or data center. With the exception of "endpoint DLP", all products are server-based.

General DLP detection techniques

To prevent data loss, every type of confidential data must be detected accurately, wherever it is stored, copied, or transmitted. Without accurate detection, a data security system produces many false positives (legitimate messages or files flagged as violations) and false negatives (messages or files that violate policy but go undetected). False positives consume the time and resources needed to investigate and handle genuinely serious incidents. False negatives hide security gaps and lead to data loss, with potential financial losses, legal risk, and reputational damage. Accurate detection is therefore essential. To achieve the highest accuracy, DLP uses three basic detection techniques and three advanced detection techniques.

Basic detection techniques

Basic detection usually takes three forms: regular expression detection (for identifiers), keyword and key-phrase detection, and document attribute detection. These methods use conventional techniques to search and match content. Regular expressions and keywords are the most common, and they detect explicit sensitive content. Document attribute detection checks properties of a document such as its type, size, and name. File type detection is based on the actual file format rather than simply the file extension, so it can identify the true type even when the extension has been changed. It currently supports more than 100 standard file types, and documents in special formats can be identified through custom signatures.
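
The three basic techniques can be sketched in a few lines. This is a minimal illustration, assuming hypothetical rule names (`cn_mobile`, `cn_id_card`), a made-up keyword list, and a small table of well-known file signatures; real products ship far larger rule sets. File type is decided from the leading bytes of the content, so renaming the file does not change the result.

```python
import re

# Hypothetical regex rules: mainland-China mobile numbers and 18-character ID numbers.
REGEX_RULES = {
    "cn_mobile": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "cn_id_card": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
}
# Hypothetical keyword list; matched case-insensitively.
KEYWORDS = ["confidential", "internal only"]

# A few well-known file signatures (magic bytes) checked against the file header.
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip-container"),                  # also DOCX/XLSX/PPTX
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),     # legacy DOC/XLS/PPT
    (b"\x89PNG\r\n\x1a\n", "png"),
]

def scan_text(text):
    """Return (rule, matched_text) pairs from regex and keyword rules."""
    hits = []
    for name, pattern in REGEX_RULES.items():
        for m in pattern.finditer(text):
            hits.append((name, m.group()))
    lowered = text.lower()
    for kw in KEYWORDS:
        if kw in lowered:
            hits.append(("keyword", kw))
    return hits

def detect_file_type(header: bytes) -> str:
    """Identify the real format from content bytes, ignoring the file name."""
    for signature, kind in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return kind
    return "unknown"
```

A PDF renamed to `report.txt` still starts with `%PDF-`, so `detect_file_type` reports `"pdf"` regardless of the extension.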

Advanced detection techniques

Advanced detection uses three methods: exact data matching (EDM), document fingerprint matching (IDM), and support vector machine classification (SVM). EDM protects data that is typically in a structured format, such as customer or employee database records. IDM and SVM protect unstructured data, such as Microsoft Word or PowerPoint documents. For all three methods, the enterprise first identifies its sensitive data, and DLP then extracts its features for precise, continuous detection. Feature extraction involves DLP accessing and retrieving the text and data, which are normalized and protected with irreversible hashing.

DLP detection is based on the actual confidential content, not on the file itself. As a result, DLP detects not only retrieved copies or derivatives of sensitive data, but also sensitive data in a file format different from the one whose features were extracted. For example, if the features of a confidential Microsoft Word document have been extracted, DLP will accurately detect the same content when it is sent in the body of an email.

Exact data matching (EDM)

Exact data matching (EDM) protects customer and employee data and other structured data typically stored in databases. For example, a customer might write a policy that uses EDM to look for "Name", "ID", "Bank Account", or "Phone Number" in a message and map them to a record in the customer database. EDM can detect based on any subset of the data columns in a record; that is, it can fire when n of the m fields of a specific record are present. It can trigger on a specified combination of fields (a "value group") or on specified data types; for example, a policy can accept the combination of name and ID number while not accepting the combination of name and mobile phone number.
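
A minimal sketch of EDM-style matching follows, assuming an in-memory index in place of a real fingerprint store. Each cell is normalized and hashed with SHA-256, and the index remembers which (row, column) each hash came from. A message matches only if all columns of some allowed value group are found for the same row. The records, the second ID number, and the helper names are hypothetical examples, and the word-pair candidate generation is a simplification of real tokenization.

```python
import hashlib
import re

def normalize(value):
    # Drop whitespace and case so "Zhang San" and "zhangsan" hash identically.
    return re.sub(r"\s+", "", value).lower()

def digest(value):
    # Only irreversible hashes of cell values are kept in the index.
    return hashlib.sha256(normalize(value).encode("utf-8")).hexdigest()

def build_index(records, columns):
    """Map each hashed cell to the set of (row_id, column) pairs it came from."""
    index = {}
    for row_id, record in enumerate(records):
        for col in columns:
            index.setdefault(digest(record[col]), set()).add((row_id, col))
    return index

def candidates(text):
    # Single words plus adjacent word pairs, so two-word names can match.
    words = re.findall(r"\w+", text)
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]

def edm_match(text, index, value_groups):
    """Return ids of rows for which at least one allowed value group is fully present."""
    seen = {}  # row_id -> columns of that row found in the text
    for cand in candidates(text):
        for row_id, col in index.get(digest(cand), ()):
            seen.setdefault(row_id, set()).add(col)
    return sorted(row_id for row_id, cols in seen.items()
                  if any(set(group) <= cols for group in value_groups))

# Hypothetical sample records.
RECORDS = [
    {"name": "Zhang San", "id": "110001198107011533", "phone": "13333333333"},
    {"name": "Li Si", "id": "220002199001011234", "phone": "13900000000"},
]
INDEX = build_index(RECORDS, ["name", "id", "phone"])
GROUPS = [("name", "id")]  # accept name + ID, but not name + phone

print(edm_match("Zhang San, ID 110001198107011533", INDEX, GROUPS))  # [0]
print(edm_match("Zhang San, phone 13333333333", INDEX, GROUPS))      # []
print(edm_match("Li Si, ID 110001198107011533", INDEX, GROUPS))      # []
```

The third call returns nothing because "Li Si" and the ID number belong to different rows, which is the per-record binding the next paragraph describes.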

Because a separate hash is stored for each cell, only data that maps to a single row (record) can trigger a detection policy that looks for a combination of data. For example, with a policy requiring "Name + mobile phone number + ID", the combination "Zhang San" + "13333333333" + "110001198107011533" triggers the policy, but even if "Li Si" is in the same database, "Li Si" + "13333333333" + "110001198107011533" does not. EDM also supports proximity logic to reduce potential false positives. For free-form text processed during detection, all of the data from a single record must appear within a configurable number of words to count as a match. For example, under the default setting, the words "Zhang San", "13333333333", and "110001198107011533" in the body of an email must all fall within the selected range. For text containing tabular data (e.g., an Excel spreadsheet), all of the data from a single record must be located on the same row of the table text to count as a match, which further reduces false positives.
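
The two proximity rules above can be sketched as follows. This assumes a simple word tokenizer and a hypothetical `max_words` window parameter; the actual default range is product-specific and not stated here. Free text passes only if one occurrence of every value fits inside the window, and tabular text passes only if one table row holds every value.

```python
import re
from itertools import product

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def spans(words, value):
    """(start, end) word positions of every occurrence of a possibly multi-word value."""
    target = tokenize(value)
    n = len(target)
    return [(i, i + n - 1) for i in range(len(words) - n + 1) if words[i:i + n] == target]

def within_proximity(text, values, max_words):
    """True if one occurrence of every value fits inside a window of max_words words."""
    words = tokenize(text)
    occurrences = [spans(words, v) for v in values]
    if not all(occurrences):
        return False  # some value is missing entirely
    # Brute force over occurrence combinations; fine for a short illustration.
    for combo in product(*occurrences):
        start = min(s for s, _ in combo)
        end = max(e for _, e in combo)
        if end - start + 1 <= max_words:
            return True
    return False

def on_same_table_row(rows, values):
    """Tabular text: all values from one record must sit on the same table row."""
    targets = {" ".join(tokenize(v)) for v in values}
    return any(targets <= {" ".join(tokenize(cell)) for cell in row} for row in rows)
```

In "Customer Zhang San, phone 13333333333, ID 110001198107011533." the three values span six words, so the record matches with a 10-word window but not with a 5-word one.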

Document fingerprint matching (IDM)

Document fingerprint matching (IDM) protects unstructured data stored in documents, such as Microsoft Word files, financial and M&A documents, and other sensitive or proprietary information. IDM creates a fingerprint of each protected document so that it can detect the document itself, portions of the original, drafts, and different versions.

IDM first learns from, and is trained on, the sensitive files. It applies semantic analysis to the sensitive content and builds a fingerprint model of each sensitive document. It then fingerprints the document or content under inspection with the same method and compares the resulting fingerprint with the trained fingerprints. A preset similarity threshold decides whether the inspected document counts as a sensitive document. This approach gives IDM high accuracy and good scalability.
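
The train-then-compare flow can be sketched with word shingling, a common lexical fingerprinting technique used here in place of the semantic analysis the text mentions. Each document becomes a set of hashed 3-word shingles; the similarity score is the share of the candidate's shingles found in a trained fingerprint, which suits excerpts of the original. The shingle size `k=3`, the 0.5 threshold, and the sample texts are assumptions.

```python
import hashlib
import re

def shingles(text, k=3):
    """Overlapping k-word sequences of the normalized text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def fingerprint(text, k=3):
    # Keep only truncated hashes of the shingles, never the original text.
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in shingles(text, k)}

def containment(candidate_fp, trained_fp):
    """Share of the candidate's shingles that also occur in the trained fingerprint."""
    if not candidate_fp:
        return 0.0
    return len(candidate_fp & trained_fp) / len(candidate_fp)

def is_sensitive(candidate_text, trained_fps, threshold=0.5):
    fp = fingerprint(candidate_text)
    return any(containment(fp, t) >= threshold for t in trained_fps)

# Training: fingerprint a (made-up) sensitive document.
original = ("The quarterly revenue forecast for project Falcon is 12 million "
            "and the launch date is moved to March")
TRAINED = [fingerprint(original)]
```

An excerpt such as "revenue forecast for project Falcon is 12 million" scores 1.0 against the trained fingerprint, while unrelated text scores 0.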

Support vector machine (SVM) classification

Support vector machines were proposed by Vapnik et al. in 1995. As statistical learning theory matured, SVMs attracted researchers in many fields and came into wide use within a short time. SVM is grounded in the VC-dimension theory and the structural risk minimization principle of statistical learning theory: using the information in a limited sample, it seeks the best trade-off between model complexity and learning ability in order to achieve the best generalization. The basic idea of SVM is to map the training data nonlinearly into a higher-dimensional feature space (a Hilbert space) and, in that space, find a hyperplane that maximizes the margin separating positive from negative examples. SVM effectively addressed problems of traditional neural networks such as choosing the network structure, local minima, and overfitting. It shows distinct advantages with small samples, nonlinear problems, and high-dimensional data, and it is widely used in pattern recognition and data mining.
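
The maximum-margin idea can be written out in its standard hard-margin form, where $\phi$ is the nonlinear feature map, $y_i \in \{-1, +1\}$ are the class labels, $\alpha_i \ge 0$ are the Lagrange multipliers of the dual problem, and $K$ is the kernel that computes inner products in the feature space. Practical implementations usually use the soft-margin variant with slack variables so that non-separable data can be handled.

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{s.t.} \quad y_i\left(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\right) \ge 1,
\qquad i = 1,\dots,n

\text{margin} = \frac{2}{\lVert \mathbf{w} \rVert},
\qquad
f(\mathbf{x}) = \operatorname{sign}\!\Big(\sum_{i=1}^{n} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\Big),
\qquad
K(\mathbf{x}_i, \mathbf{x}) = \phi(\mathbf{x}_i)^{\top}\phi(\mathbf{x})
```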

SVM suits documents whose features are subtle or hard to capture with rules, such as financial reports and source code. In practice, documents are first grouped by content into classes, and each class's document collection defines what that class means. After SVM comparison, DLP determines which class an inspected document belongs to and applies that class's permissions and policies. In addition, because of how SVM works, documents on endpoints or servers can be classified automatically according to these class definitions.
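
A minimal sketch of the classification step follows. It trains a linear SVM (no kernel) by full-batch subgradient descent on the regularized hinge loss, using bag-of-words counts as features. The two classes, the tiny training corpus, and the hyperparameters `lam`, `epochs`, and `lr` are hypothetical; a real deployment would use a proper SVM library, richer features, and far more training documents.

```python
import re
from collections import Counter

def features(text):
    """Bag-of-words term counts."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))

def dot(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())

def train_linear_svm(docs, labels, lam=0.01, epochs=50, lr=0.1):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))) by subgradient descent."""
    xs = [features(d) for d in docs]
    n = len(xs)
    w, b = {}, 0.0
    for _ in range(epochs):
        grad_w = {t: lam * v for t, v in w.items()}  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(xs, labels):
            if y * (dot(w, x) + b) < 1:  # margin violated: hinge term is active
                for t, v in x.items():
                    grad_w[t] = grad_w.get(t, 0.0) - y * v / n
                grad_b -= y / n
        for t, g in grad_w.items():
            w[t] = w.get(t, 0.0) - lr * g
        b -= lr * grad_b
    return w, b

def classify(w, b, text):
    return 1 if dot(w, features(text)) + b >= 0 else -1

# +1 = financial report, -1 = source code (hypothetical training set).
DOCS = [
    "quarterly revenue profit balance sheet cash flow",
    "annual report net income revenue growth assets",
    "profit margin cash dividend balance liabilities",
    "def return import class self print",
    "for loop while return int main include",
    "import numpy def class return self",
]
LABELS = [1, 1, 1, -1, -1, -1]
W, B = train_linear_svm(DOCS, LABELS)
```

Once a document is assigned to a class, the policy attached to that class (for example, "financial reports may not leave the finance department") is what DLP enforces.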

The difference between IDM and SVM comparison is that IDM compares the fingerprint of the file under inspection against the fingerprint of each protected file, one by one, whereas SVM converts the file under inspection into a vector and places it into the vector space of a trained class.

DLP control and encryption technologies

Device filtering technology

Device filter driver programming enables security control over any endpoint device (USB ports, printers, optical drives, floppy drives, infrared, Bluetooth, network cards, etc.). It automatically identifies hardware information and user IDs, and distinguishes storage from non-storage devices and authorized from unauthorized devices.
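
Real enforcement happens inside a kernel-mode filter driver, which Python cannot show; the sketch below illustrates only the policy decision such a driver might consult once it has identified a device. The `Device` fields, the blocked device classes, the approved serial number, and the read-only user list are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Device:
    kind: str        # e.g. "usb_storage", "printer", "bluetooth"
    vendor_id: str
    product_id: str
    serial: str
    is_storage: bool

# Hypothetical policy: some device classes are blocked outright,
# and storage devices must be individually authorized by serial number.
BLOCKED_KINDS = {"bluetooth", "infrared"}
AUTHORIZED_SERIALS = {"SN-APPROVED-001"}

def decide(device: Device, user_id: str, read_only_users=frozenset()) -> str:
    """Return "allow", "read_only", or "block" for a newly attached device."""
    if device.kind in BLOCKED_KINDS:
        return "block"
    if device.is_storage:
        if device.serial not in AUTHORIZED_SERIALS:
            return "block"  # unauthorized storage device
        return "read_only" if user_id in read_only_users else "allow"
    return "allow"  # authorized class of non-storage device, e.g. a printer
```

For example, an approved USB stick is allowed for most users but mounted read-only for a user listed in `read_only_users`, while an unknown USB stick is blocked.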

File-level intelligent dynamic encryption and decryption

File-level filter driver programming intercepts file system read and write requests in real time in order to track files dynamically and encrypt or decrypt them transparently. Its main advantages are: encryption and decryption are dynamic and transparent and do not change users' working habits; the performance overhead is small and efficiency is high; the format and state of the original files are unchanged; and deployment and day-to-day internal use are convenient.
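
The transparent behavior can be simulated in user space. This is a sketch under several assumptions: a dictionary stands in for the disk, a hypothetical trusted-process list decides who receives plaintext, and the cipher is a toy SHA-256 counter-mode keystream used for illustration only, not production cryptography. Every write is encrypted regardless of the application (mandatory encryption), and ciphertext has the same length as the plaintext because the per-file nonce is kept in a side table here.

```python
import hashlib
import os

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stream cipher: SHA-256(key || nonce || counter). Illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))

class TransparentCryptoFS:
    """Simulates a filter layer sitting between applications and storage."""

    def __init__(self, key: bytes, trusted_apps):
        self._key = key
        self._trusted = set(trusted_apps)  # hypothetical trusted-process policy
        self._disk = {}  # path -> ciphertext, i.e. what actually reaches storage
        self._meta = {}  # path -> nonce (kept outside the file in this sketch)

    def write(self, app: str, path: str, plaintext: bytes) -> None:
        # Mandatory encryption: every write is encrypted, whatever the app.
        nonce = os.urandom(16)
        self._meta[path] = nonce
        self._disk[path] = xor(plaintext, keystream(self._key, nonce, len(plaintext)))

    def read(self, app: str, path: str) -> bytes:
        ciphertext = self._disk[path]
        if app not in self._trusted:
            return ciphertext  # untrusted processes only ever see ciphertext
        nonce = self._meta[path]
        return xor(ciphertext, keystream(self._key, nonce, len(ciphertext)))

    def raw(self, path: str) -> bytes:
        return self._disk[path]
```

A word processor on the trusted list reads and writes as usual without noticing the layer, while a file copied out by an untrusted program such as a mail client stays encrypted.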

Its notable features are mandatory encryption, transparency, confidentiality, application independence, and flexible extensibility. The technology has gone through three stages: single-cache filter drivers, dual-cache filter drivers, and virtual file system technology (Layerfsd). Today, most kernel-level encryption vendors in the commercial market use single-cache filter driver technology, a few have progressed to dual-cache filter driver technology, and only a very small number have developed virtual file system technology (Layerfsd) into a product.

Network-level intelligent dynamic encryption and decryption

Network filter driver programming, commonly referred to as NDIS and TDI technology, filters and controls data at the network transport protocol and application protocol levels. This type of technology is currently used mainly in firewalls, VPNs, and network access control.
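
NDIS and TDI are Windows kernel interfaces, so the sketch below shows only the kind of decision logic a network filter evaluates for each packet: first-match rules over the transport protocol and port, optionally combined with a pattern on the application payload. The `Packet` and `Rule` types, the blocked Telnet port, and the "ID numbers over SMTP" rule are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Packet:
    protocol: str   # "tcp" or "udp"
    dst_port: int
    payload: bytes  # application-layer data

@dataclass
class Rule:
    action: str                              # "allow" or "block"
    protocol: Optional[str] = None           # None means any protocol
    dst_port: Optional[int] = None           # None means any port
    payload_pattern: Optional[bytes] = None  # regex applied to the payload

def matches(rule: Rule, pkt: Packet) -> bool:
    if rule.protocol is not None and rule.protocol != pkt.protocol:
        return False
    if rule.dst_port is not None and rule.dst_port != pkt.dst_port:
        return False
    if rule.payload_pattern is not None and not re.search(rule.payload_pattern, pkt.payload):
        return False
    return True

def filter_packet(rules, pkt: Packet, default: str = "allow") -> str:
    """First matching rule wins; otherwise fall back to the default action."""
    for rule in rules:
        if matches(rule, pkt):
            return rule.action
    return default

RULES = [
    Rule("block", protocol="tcp", dst_port=23),  # block Telnet entirely
    # Block SMTP traffic whose payload contains an 18-character ID number.
    Rule("block", protocol="tcp", dst_port=25,
         payload_pattern=rb"(?<!\d)\d{17}[\dXx](?!\d)"),
]
```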

Disk-level intelligent dynamic encryption and decryption

Disk-level filter driver programming, also known as FDE (Full Disk Encryption), works at the lowest layers of the operating system and encrypts all data on the hard disk, including the operating system files.

Because encryption is applied to physical sectors, all data saved on the hard disk can be encrypted. Unlike file encryption, disk encryption covers any data on the disk, including the operating system itself. Unauthorized users not only cannot read the contents of files on the disk, they cannot even see the names of the files saved on it. File-level encryption generally still exposes the names and usage times of encrypted files, and some content may even be recovered from temporary files and swap files. Disk encryption, by contrast, keeps all data on the hard disk in an encrypted state, so someone who obtains the encrypted drive cannot extract any information: within an encrypted partition there is no notion of a file at all, let alone file names or contents.

To preserve users' normal workflow and computer habits, encryption and decryption are performed dynamically. A data encryption and decryption layer is installed between the operating system and the disk. Without any user intervention, it automatically encrypts data written to the disk and decrypts data read from it, so users are not aware of it at all while using the computer.
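
A sketch of the layer between the operating system and the disk follows, assuming 512-byte sectors and a toy per-sector keystream derived with HMAC-SHA256 from the key and the sector number. Because the sector number feeds the keystream, identical plaintext in different sectors produces different ciphertext, and file system metadata such as file names is encrypted along with everything else. This XOR keystream is for illustration only and is insecure if a sector is rewritten; real full-disk encryption uses tweakable block-cipher modes such as AES-XTS.

```python
import hashlib
import hmac

SECTOR_SIZE = 512

def sector_keystream(key: bytes, sector_no: int) -> bytes:
    # Toy per-sector keystream: HMAC-SHA256(key, sector_no || block_index).
    out = bytearray()
    block = 0
    while len(out) < SECTOR_SIZE:
        msg = sector_no.to_bytes(8, "big") + block.to_bytes(4, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        block += 1
    return bytes(out[:SECTOR_SIZE])

class EncryptedDisk:
    """Sits between the OS and a raw block device; every sector is stored encrypted."""

    def __init__(self, key: bytes, num_sectors: int):
        self._key = key
        self._sectors = [bytes(SECTOR_SIZE)] * num_sectors  # raw contents of the device

    def write_sector(self, n: int, data: bytes) -> None:
        if len(data) != SECTOR_SIZE:
            raise ValueError("data must be exactly one sector")
        ks = sector_keystream(self._key, n)
        self._sectors[n] = bytes(a ^ b for a, b in zip(data, ks))

    def read_sector(self, n: int) -> bytes:
        ks = sector_keystream(self._key, n)
        return bytes(a ^ b for a, b in zip(self._sectors[n], ks))

    def raw_sector(self, n: int) -> bytes:
        # What someone who removes the drive would see.
        return self._sectors[n]
```

The operating system calls `write_sector` and `read_sector` as if talking to a plain disk, while `raw_sector` shows that the stored bytes never contain plaintext.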

Evolution of DLP products

Cage-type DLP products

Products at this stage are characterized mainly by device control, using logical isolation to build secure isolation containers.

Starting in 2000, foreign security management products flooded into China. They began as conceptual guidance and gradually became products; well-known vendors included Symantec and LANDesk, and from 2005 to 2008 their share of the Chinese market reached 80%. After 2008, as domestic products matured and entered the market, foreign endpoint management products were gradually replaced by domestic ones. Although the market is already saturated, these strong-control endpoint management products still account for nearly 40 million yuan.

Shackle-type DLP products

Products at this stage focus mainly on strong document management, providing defense in depth at the level of the content source: classification, grading, encryption, authorization, and management of data documents.

Unlike endpoint management, data encryption and permission control products shifted the focus from devices to specific data files, offering finer granularity and better confidentiality. Since 2007 many strong vendors have appeared in this market. Because national regulations require encryption products to obtain commercial cryptography certification before they can be used in China, foreign products cannot be sold there, and encryption and permission products in China have a market of about 1 billion yuan per year. Every industry has data protection needs. Although the market is competitive, users still worry that their data will be held hostage by the encryption system. Nevertheless, these products are now mature and very stable.

Supervision-type DLP products

Supervision-type products perform active auditing, using precise keywords to audit data operations such as creating, modifying, transmitting, storing, and deleting documents.

Auditing is divided into network behavior auditing and endpoint behavior auditing. Network behavior auditing effectively monitors employees' network access during working hours, while endpoint behavior auditing targets operations on critical data files more precisely. Audit products coexist with and complement other network and endpoint products, and still hold a high market share. However, as network and endpoint products keep improving, standalone behavior audit products have struggled to survive, and customers increasingly favor diversified products.

Smart DLP products

Smart products pursue intelligent control, identification, discovery, and management, providing general-purpose control capabilities.

To control data more comprehensively, endpoint management products and encryption/permission products have been combined in many solutions. However, because they are all control-oriented, they have inherent limitations and cannot cope with more complex data environments. With data leaks occurring around the world, attention to the importance of data has shifted to its content. This gave rise to content-aware DLP products, which identify the sensitivity of data by its content and classify and grade data by its content; this intelligent management approach also brings convenience and flexibility.

Since 2013, China has vigorously promoted the development and adoption of domestic DLP products, and a trend has formed in the financial and telecom operator industries. Domestic products, however, are still at an early stage, and immature, unstable products have hindered the development of domestic DLP. Many endpoint, encryption, and audit vendors have begun to transform themselves, but there are no more than three genuine DLP products.