China Developer Network (中国开发网): Forum: Programmer Emotions CBD (程序员情感CBD): Post 653061
haitao
An INI file with curly braces?
http://code.google.com/apis/protocolbuffers/docs/overview.html

Developer Guide
Welcome to the developer documentation for protocol buffers – a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

This documentation is aimed at Java, C++, or Python developers who want to use protocol buffers in their applications. This overview introduces protocol buffers and tells you what you need to do to get started – you can then go on to follow the tutorials or delve deeper into protocol buffer encoding. API reference documentation is also provided for all three languages, as well as language and style guides for writing .proto files.

What are protocol buffers?
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.

How do they work?
You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Here's a very basic example of a .proto file that defines a message containing information about a person:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}
As you can see, the message format is simple – each message type has one or more uniquely numbered fields, and each field has a name and a value type, where value types can be numbers (integer or floating-point), booleans, strings, raw bytes, or even (as in the example above) other protocol buffer message types, allowing you to structure your data hierarchically. You can specify optional fields, required fields, and repeated fields. You can find more information about writing .proto files in the Protocol Buffer Language Guide.

Once you've defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes. These provide simple accessors for each field (like name() and set_name()) as well as methods to serialize/parse the whole structure to/from raw bytes – so, for instance, if your chosen language is C++, running the compiler on the above example will generate a class called Person. You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages. You might then write some code like this:

Person person;
person.set_name("John Doe");
person.set_id(1234);
person.set_email("jdoe@example.com");
fstream output("myfile", ios::out | ios::binary);
person.SerializeToOstream(&output);

Then, later on, you could read your message back in:

fstream input("myfile", ios::in | ios::binary);
Person person;
person.ParseFromIstream(&input);
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

You can add new fields to your message formats without breaking backwards-compatibility; old binaries simply ignore the new field when parsing. So if you have a communications protocol that uses protocol buffers as its data format, you can extend your protocol without having to worry about breaking existing code.

You'll find a complete reference for using generated protocol buffer code in the API Reference section, and you can find out more about how protocol buffer messages are encoded in Protocol Buffer Encoding.

Why not just use XML?
Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

are simpler
are 3 to 10 times smaller
are 20 to 100 times faster
are less ambiguous
generate data access classes that are easier to use programmatically
For example, let's say you want to model a person with a name and an email. In XML, you need to do:

<person>
  <name>John Doe</name>
  <email>jdoe@example.com</email>
</person>

while the corresponding protocol buffer message (in protocol buffer text format) is:

person {
  name: "John Doe"
  email: "jdoe@example.com"
}

In binary format, this message would be 28 bytes long and take around 100-200 nanoseconds to parse. The XML version is at least 69 bytes (if you remove whitespace) and would take around 5,000-10,000 nanoseconds to parse.

Also, manipulating a protocol buffer is much easier:

cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

Whereas with XML you would have to do something like:

cout << "Name: " << person.getElementsByTagName("name")->item(0)->innerText() << endl;
cout << "E-mail: " << person.getElementsByTagName("email")->item(0)->innerText() << endl;

However, protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).

Sounds like the solution for me! How do I get started?
Download the package – this contains the complete source code for the Java, Python, and C++ protocol buffer compilers, as well as the classes you need for I/O and testing. To build and install your compiler, follow the instructions in the README.

Once you're all set, try following the tutorial for your chosen language – this will step you through creating a simple application that uses protocol buffers.

A bit of history
Protocol buffers were initially developed at Google to deal with an index server request/response protocol. Prior to protocol buffers, there was a format for requests and responses that used hand marshalling/unmarshalling of requests and responses, and that supported a number of versions of the protocol. This resulted in some very ugly code, like:

if (version == 3) {
  ...
} else if (version > 4) {
  if (version == 5) {
    ...
  }
  ...
}

Explicitly formatted protocols also complicated the rollout of new protocol versions, because developers had to make sure that all servers between the originator of the request and the actual server handling the request understood the new protocol before they could flip a switch to start using the new protocol.

Protocol buffers were designed to solve many of these problems:

New fields could be easily introduced, and intermediate servers that didn't need to inspect the data could simply parse it and pass through the data without needing to know about all the fields.
Formats were more self-describing, and could be dealt with from a variety of languages (C++, Java, etc.)
However, users still needed to hand-write their own parsing code.

As the system evolved, it acquired a number of other features and uses:

Automatically-generated serialization and deserialization code avoided the need for hand parsing.
In addition to being used for short-lived RPC (Remote Procedure Call) requests, people started to use protocol buffers as a handy self-describing format for storing data persistently (for example, in Bigtable).
Server RPC interfaces started to be declared as part of protocol files, with the protocol compiler generating stub classes that users could override with actual implementations of the server's interface.
Protocol buffers are now Google's lingua franca for data – at time of writing, there are 48,162 different message types defined in the Google code tree across 12,183 .proto files. They're used both in RPC systems and for persistent storage of data in a variety of storage systems.
Open Source at Google
News about Google's Open Source projects and programs
Protocol Buffers: Google's Data Interchange Format


http://google-opensource.blogspot.com/2008/07/protocol-buffers-googles-data.html


Monday, July 7, 2008 at 3:01 PM
By Kenton Varda, Software Engineering Team

At Google, our mission is organizing all of the world's information. We use literally thousands of different data formats to represent networked messages between servers, index records in repositories, geospatial datasets, and more. Most of these formats are structured, not flat. This raises an important question: How do we encode it all?

XML? No, that wouldn't work. As nice as XML is, it isn't going to be efficient enough for this scale. When all of your machines and network links are running at capacity, XML is an extremely expensive proposition. Not to mention, writing code to work with the DOM tree can sometimes become unwieldy.

Do we just write the raw bytes of our in-memory data structures to the wire? No, that's not going to work either. When we roll out a new version of a server, it almost always has to start out talking to older servers. New servers need to be able to read the data produced by old servers, and vice versa, even if individual fields have been added or removed. When data on disk is involved, this is even more important. Also, some of our code is written in Java or Python, so we need a portable solution.

Do we write hand-coded parsing and serialization routines for each data structure? Well, we used to. Needless to say, that didn't last long. When you have tens of thousands of different structures in your code base that need their own serialization formats, you simply cannot write them all by hand.

Instead, we developed Protocol Buffers. Protocol Buffers allow you to define simple data structures in a special definition language, then compile them to produce classes to represent those structures in the language of your choice. These classes come complete with heavily-optimized code to parse and serialize your message in an extremely compact format. Best of all, the classes are easy to use: each field has simple "get" and "set" methods, and once you're ready, serializing the whole thing to – or parsing it from – a byte array or an I/O stream just takes a single method call.

OK, I know what you're thinking: "Yet another IDL?" Yes, you could call it that. But, IDLs in general have earned a reputation for being hopelessly complicated. On the other hand, one of Protocol Buffers' major design goals is simplicity. By sticking to a simple lists-and-records model that solves the majority of problems and resisting the desire to chase diminishing returns, we believe we have created something that is powerful without being bloated. And, yes, it is very fast – at least an order of magnitude faster than XML.

And now, we're making Protocol Buffers available to the Open Source community. We have seen how effective a solution they can be to certain tasks, and wanted more people to be able to take advantage of and build on this work. Take a look at the documentation, download the code and let us know what you think.

Labels: protocol buffers


4 comments:
nico said...
Well... first, it sounds good!

July 7, 2008 4:20 PM
Helder Suzuki said...
This is awesome news!!

July 7, 2008 4:57 PM
David said...
This does sound good. It's great to see that Google has released this into the open, which is helping open source projects instead of keeping it as a trade secret, which would be in their best interest. Now we won't have to duplicate the work.

July 7, 2008 5:18 PM
azwar akbar said...
wow.. that's very good, I never think before about this, very nice..

July 7, 2008 7:02 PM
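My blog: http://szhaitao.blog.hexun.com & http://www.hoolee.com/user/haitao
-- All of the above is just general talk --
Endless experts come rolling in; boundless blunders keep surfacing. Out in the jianghu (once you've stepped into it), how can you avoid taking a blade? (It's bound to happen.)
Online conversations breed ambiguity; you think you understood the other person, but did you really?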
