<?xml version="1.0" encoding="GBK" ?>
<rss version="2.0" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:dcterms="http://purl.org/dc/terms/">
 <channel>
  	  <title><![CDATA[kaineci's Blog]]></title>
	  <link>http://kaineci.blog.163.com</link>
	  <description><![CDATA[Machine learning, information retrieval, data mining, history, music, movies, sleep]]></description>
	  <language>zh-CN</language>
	  <pubDate>Thu, 3 Jul 2008 15:40:34 +0800</pubDate>
	  <lastBuildDate>Thu, 3 Jul 2008 15:40:34 +0800</lastBuildDate>
	  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
	  <generator><![CDATA[NetEase Space]]></generator>
	  <managingEditor><![CDATA[kaineci]]></managingEditor>
	  <webMaster><![CDATA[kaineci]]></webMaster>
		  <ttl>120</ttl>
	  <image>
	  	<title><![CDATA[kaineci's Blog]]></title>
	  	<url>http://ava.blog.163.com/photo/LGD6P3bfPBOzCcxRJo2iLQ==/577305177233717722.jpg</url>
	  	<link>http://kaineci.blog.163.com</link>
	  </image>
  <item>
  	<title><![CDATA[ego]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/2689633020084655251665</link>
    <description><![CDATA[<div>In psychodynamics, the Id, Ego, and Super-Ego are the divisions of the
psyche according to psychoanalyst Sigmund Freud's "structural theory."
In 1923, Freud introduced new terms to describe the division between
the conscious and unconscious: 'id,' 'ego,' and 'super-ego.' He thought
these terms offered a more compelling description of the dynamic
relations between the conscious and the unconscious. The “id” (fully
unconscious) contains the drives and those things repressed by
consciousness; the “ego” (mostly conscious) deals with external
reality; and the “super ego” (partly conscious) is the conscience or
the internal moral judge (The Freud Exhibit: L.O.C.).<br>
<br>
<br>
In Freud's theory, the ego mediates among the id, the super-ego and the
external world. Its task is to find a balance between primitive drives,
morals, and reality while satisfying the id and superego. Its main
concern is the individual's safety, and it allows some of the id's
desires to be expressed, but only when the consequences of these actions
are marginal. Ego defense mechanisms are often used by the ego when id
behaviour conflicts with reality and either society's morals, norms,
and taboos or the individual's expectations as a result of the
internalization of these morals, norms, and taboos.<br>
<br>
The word ego is taken directly from Latin where it is the nominative of
the first person singular personal pronoun and is translated as "I
myself" to express emphasis. Ego is the English translation for Freud's
German term "Das Ich."<br>
<br>
In modern-day society, ego has many meanings. It could mean one’s
self-esteem; an inflated sense of self-worth; or in philosophical
terms, one’s self. However, according to the psychologist Sigmund
Freud, the ego is the part of the mind that contains
consciousness. Originally, Freud used the word ego to mean
a sense of self; however, he later revised it to mean a set of psychic
functions such as judgement, tolerance, reality-testing, control,
planning, defense, synthesis of information, intellectual functioning,
and memory.<br>
<br>
In a diagram of the Structural and Topographical Models of Mind, the
ego is depicted as half in the conscious, with a quarter in
the preconscious and the remaining quarter in the unconscious.<br>
<br>
The ego is the mediator between the id and the superego, trying to
ensure that the needs of both are met. It is
said to operate on a reality principle, meaning that it deals with the id
and the superego by allowing them to express their desires, drives and
morals in realistic and socially appropriate ways. It is said that the
ego stands for reason and caution, and develops with age. Sigmund Freud
used an analogy that likens the ego and the id to a rider and a horse: the
ego is the rider and the id is the horse. The horse provides
the energy and the means of obtaining the energy and information needed,
while the rider ultimately controls the direction it wants to go.
However, under unfavorable conditions, the horse sometimes makes its
own decisions over the rocky terrain.<br>
<br>
When the ego is personified, it is like a slave to three harsh masters:
the id, the super-ego and the external world. It has to do its best to
suit all three, and thus constantly feels hemmed in by the danger of
causing discontent on the two other sides. It is said, however, that the ego
seems to be more loyal to the id, preferring to gloss over the finer
details of reality to minimize conflicts while pretending to have a
regard for reality. But the super-ego is constantly watching every one
of the ego's moves and punishes it with feelings of guilt, anxiety, and
inferiority. To overcome this, the ego employs defense
mechanisms. Denial, displacement, intellectualization, fantasy,
compensation, projection, rationalization, reaction formation,
regression, repression and sublimation were the defense mechanisms
Freud identified. However, his daughter Anna Freud clarified and
identified the concepts of: undoing, suppression, dissociation,
idealization, identification, introjection, inversion, somatization,
splitting and substitution.</div>]]></description>
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/2689633020084655251665</comments>
    <slash:comments>1</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/2689633020084655251665</guid>
    <pubDate>Tue, 6 May 2008 17:52:51 +0800</pubDate>
    <dcterms:modified>2008-05-06T17:52:51+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[The Relentless Cult of Novelty]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/26896330200831812316136</link>
    <description><![CDATA[<div><br>&nbsp;From Wang Lixiong's essay collection 《自由人心路》 (A Free Man's Inner Journey)<br>
&nbsp;&nbsp;<br>
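    Last year, when the renowned Russian writer Aleksandr Solzhenitsyn received the Medal of Honor for Literature from the National Arts Club in the United States, he gave a speech titled "The Relentless Cult of Novelty and How It Wrecked the Century." In it he attacked the various "futurisms," "avant-gardisms," and "postmodernisms" that have appeared in the literature and art of Russia and the whole world since the start of this century, movements "characterized by impatient clamor" and caught up in "self-staged self-deception." He held that the foundation on which those movements rest is "the ceaseless pursuit of novelty," and he made no attempt to hide his distaste. In his words: "This notion that art needs no beauty or purity, only innovation, innovation, and more innovation, conceals a persistent and long-standing aim: to destroy, overthrow, and mock all ethical and moral principles and tear them out by the roots. There is no God, there is no truth, the universe is chaos, everything is relative." In essence these movements are "a deep-rooted hostility toward all inner and spiritual life," so that "negating everything, negating every ideal, is regarded as an act of courage" and "destruction has become the highest creed of this arrogant doctrine." Solzhenitsyn likewise rejected the literary and artistic output of those "infatuated with novelty." Apart from "the incessant self-praise of that crowd of impatient innovators," he saw no "creation of any real value": "the renewal of form has become a goal in itself and grows ever emptier... crude technique and obscure meaning fuse so thoroughly that the result is utterly unintelligible," and the works "mostly dwell on the individual's subtle impressions of his surroundings while showing complete indifference to society's wounds and ills... they ignore the higher meaning of life and treat concepts, and culture itself, with a relativist attitude." His verdict on this twentieth-century cultural phenomenon was twofold. On one hand, it has steadily degraded contemporary literature and art: "the fewer limits a person imposes on his own work, the less hope that work has of artistic success. Without a sense of responsibility and an inner organizing force, a work's structure, meaning, and even artistic value flatten out until they vanish altogether." On the other hand, this "chaotic, hurried, and tedious 'novelty'" issues "a shrill curse on every traditional way of life, an all-out declaration of war on all religious and ethical norms, and a loud call to utterly destroy and trample the whole of existing cultural tradition," so that the entire world is "struggling in a spiritual illness," there has been "an extremely dangerous spiritual fall of all humanity," "lofty spiritual and moral ideals keep declining and disintegrating, and the spiritual pillars of life grow blurred," leading to "a reverse evolution in which humanity regresses toward the animal." (All quoted passages are Solzhenitsyn's words as translated into Chinese by Wang Zhaoyang, taken from the inaugural issue of the journal 《倾向》.)<br>
<br>
    Solzhenitsyn's speech was aimed at literature and art. Whether his criticism goes too far can be set aside for now; I believe the problem he raised reaches far beyond literature and art and exposes a sickness that afflicts humanity as a whole today. Starting from that point, I want to borrow his thought-provoking title and carry the discussion further.<br>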
&nbsp;&nbsp;<br>
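    ◎ What are the "spiritual pillars of life"?<br>
<br>
    What sets humans apart from other species is that we live in two worlds at once, the material and the spiritual, while other species live only in the material one. The spiritual world belongs to humans alone and is therefore where human nature resides.<br>
<br>
    One phenomenon helps us see what the spiritual world does for a person: some people have everything they could want materially yet are not happy, while others live in far greater poverty yet are content. What decides whether they are happy? Evidently not the material world. Material wealth alone leaves humans no different in kind from animals, only in means and degree, and it is not enough to make life good. In the end, a person's condition depends on whether the spiritual world is satisfied. Of course this cannot be fully detached from material conditions: when food, warmth, and safety are still under threat, few people can put the spiritual world first. As the folk saying goes, "with an empty belly, who cares about face?" But once basic subsistence and safety are secured, our sense of how life is going comes mainly from the spiritual world.<br>
<br>
    Spiritual life has no objective carrier of the kind the stomach or the reproductive organs provide for physical life, so neither the makeup of the spiritual world nor its satisfaction has anything objective or tangible to attach to. It can only develop in an invisible, intangible void. It is not a mere "mirror image" of the objective world; it has to be organized anew, with an order of its own and a structure different from that of the material world, and it must generate goals and pursuits that transcend the body and belong entirely to itself, along with the mechanisms to pursue them, until the spiritual world becomes an independent subject. By what is the spiritual world organized, what order does it follow, what goals does it set, how does it restrain and govern the demands of physical life? The foundation and core of all this, as I see it, is the meaning of life and the judgment of value, which is what Solzhenitsyn calls the "spiritual pillars of life." Without meaning and value to hold it together, a spirit drifting in the "void" can only disperse (the word "emptiness," so often used to describe a state of mind, conveys the feeling vividly). No spiritual world could arise or endure, and humanity would remain in, or fall back to, the animal state of a purely material world. That is why, from the moment the light of the spirit first lit the human mind until today, humanity's greatest and most persistent effort in the spiritual world has been the search for meaning and value, a search "high and low" along a road that is "long and far," in Qu Yuan's words.<br>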
&nbsp;&nbsp;<br>
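    ◎ Balance is the good life<br>
<br>
    There are objective, quantifiable standards for judging how well a person lives in the material world: calories, wages, the Engel coefficient, living space, and so on. These can be met through production and development in the material world and pushed steadily higher. The state of a person's life in the spiritual world, however, has no objective standard and does not depend on growth. It depends on an entirely intangible factor: balance.<br>
<br>
    What is balance? If, in a person's spiritual world, the system of meaning and value is stable, the goals are clear, convictions are firm, knowledge and action are in close accord, and there is no dilemma, split, or confusion that shakes the self, no destructive conflict, no suffocating despair, and no emptiness of unbearable lightness, then I would call that balance. The opposite is imbalance.<br>
<br>
    Balance does not require that meaning be grand or values lofty. For a simple farmer, a life of "thirty mu of land, an ox, a wife, children, and a warm kang" may carry enough meaning and value to keep him in balance, happy and content, while an emperor who rules the world may suffer in spiritual imbalance. Size and rank do not matter; what matters is balance.<br>
<br>
    But the emperor cannot retreat into a farmer's life to obtain a farmer's balance. Differences in social role, education, and surroundings mean that people must find balance through different meanings and values, which are not interchangeable. The diversity of social roles means that many kinds of value and meaning must be on offer to support different states of balance. At the same time, society is an interactive structure in which roles are interlinked; balance can rarely be achieved by one person alone and depends largely on mutual support among individuals and between the individual and society. So from either the individual's or society's point of view, the meanings and values on which balance rests need to form a structured system. In depth, that system reaches back to ultimate meaning, letting the spirit rise to the level of religious consciousness and transcend finite time and space. In breadth, it covers the values and meanings needed by every social role and establishes the ethical principles and moral standards that coordinate society as a whole. When it has both depth and breadth, everything from "cultivating the self" to "ordering the family" to "governing the state" to "bringing peace to the world" fits inside one all-embracing structure.<br>
<br>
    Such a system is bound to be vast, and people are neither expected nor required to grasp it whole. Each social role need only hold on to the meanings and values that sustain it. Yet that is no reason to give up on the system's completeness. Only a complete system lets groups with different social backgrounds, education, and living conditions each find their place within an ordered framework and reach balance as a whole. Local balance can be attained only on the basis of overall balance in society's spiritual structure, and overall balance in turn is truly achieved only when local balances complement one another.<br>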
&nbsp;&nbsp;<br>
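    ◎ The value of "conservatism"<br>
<br>
    A system of meaning and value takes centuries or even millennia of evolution to form. It needs generation after generation of hard thinking passed on like a relay, and an even longer process of transmission, teaching, and adjustment, until it merges into the bloodstream of a civilization and becomes the collective unconscious of a whole people. In this process development is of course indispensable, but inheritance matters even more. The spiritual systems of all the world's great civilizations have been handed down for at least a thousand years. Respect for tradition and fondness for the old have been prized as virtues throughout history largely because they help keep the human spiritual world and human society balanced and stable.<br>
<br>
    If this respect for and inheritance of tradition is called "conservatism," then I think "conservatism" deserves not the reflexive contempt our contemporaries show it but especially careful cherishing. Only with inheritance can there be development. Development that inherits proceeds step by step, and change that inherits begets one "Way" from another, so that change itself can reach the best state of balance: dynamic balance. Even judged by the standard of "progress," how could today be advanced without the "backwardness" of those who came before? Because of them, at the very least, we need not start groping again from the darkness of ignorance; we have a base and a reference from which to climb. In this sense humanity should always look on the past with gratitude for the legacy it has received, rather than treating its forebears as enemies and objects of scorn.<br>
<br>
    At the same time, a healthy conservatism should not cling to convention. It should keep improving whatever in tradition no longer fits, through balanced movement and change, face the future, encourage free inquiry, and open up wide room for later generations to develop. Regrettably, historical conservatism has often failed to keep this balance and has leaned too far toward rigidity and dogma, becoming an obstacle to social development and a force that stifles the free spirit. When that kind of rigid conservatism held sway, fierce attacks on tradition and conservatism had positive value. Today, however, the tilt has on the whole reversed. Rigid conservatism often still appears in places, more extreme, foolish, and unbalanced than ever, but as the other end without which balance is impossible, conservatism has shrunk and dissolved across the great scale of human civilization, while the tide of "innovation" keeps rising, each wave higher than the last. Out of this imbalance, what Solzhenitsyn calls the "calamity of the century" is disturbingly taking shape.<br>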
&nbsp;&nbsp;<br>
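    ◎ Change: the idol of our century<br>
<br>
    Standing at the century's end and looking back on humanity's twentieth century, one sees nothing but surging waves and a world turned upside down, dazzling, alarming, and deeply affecting. This is especially true of China: from the late-Qing New Policies to the 1911 Revolution, from the May Fourth Movement to the Northern Expedition, from warlord conflict to the nationwide war against Japan, from the Communist takeover from the Nationalists to socialist transformation, from the Anti-Rightist Campaign to the Four Cleanups to the Cultural Revolution, and on to today's tide of commercialization. It is hard to imagine people of any other era living through so many changes. "Revolution" is the century's most glorious word, "new" its most seductive, the slogan of "progress" rings to the skies, "reform" has become holy writ, anything "advanced" is envied, and "modernization" is longed for most of all. Alongside these run a string of words like "smash," "overthrow," "eliminate," and "break with," which form the main theme of our century from beginning to end.<br>
<br>
    Today most of the endless political myths and ideologies have faded, yet the pursuit of change and novelty has not stopped. With the growth of commercial society it has spread more widely and more shallowly. The whole of society has thrown itself into a nonstop race to keep up with the latest trend. Being different is a badge of honor and loving the new while loathing the old is a philosophy. Commercial society needs no guns; "fashion" lets it impose the broadest "dictatorship" of all. Whatever is new is better. Women change clothes; men change women; products are superseded before they are used; celebrities rise and fall in turn; tradition has become a pejorative; talk of morality is an antique; theories and art are judged by whether they are new or old; language is "out of date" as soon as everyone understands it (that is, once it is old); even people themselves strut merely for being new (young) and lose heart merely for being old...<br>
<br>
    All of humanity has caught this disease: treating the disparagement of the past as the engine of progress, taking pride in hostility to predecessors and tradition, regarding standing still or even slowing slightly as shameful and backward, pursuing progress for the sake of progress and change for the sake of change, without knowing or asking what progress and change are for. It is as if, so long as we keep breaking with the past, change, change, change, innovate, innovate, and innovate again, we will win success, happiness, and a bright and dependable future.<br>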
&nbsp;&nbsp;<br>
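    ◎ The "calamity of the century"<br>
<br>
    Balance and change are not opposed; a dead balance is incompatible with the essentially dynamic nature of both physical and spiritual life. In that sense change is a necessary condition of balance. But when change is too frequent and too violent, imbalance is inevitable. So it is in nature: the dinosaurs died out when their environment changed abruptly, and the earth's ecological balance has been upset by the excessive pace of human industrialization. The same holds for human society and the human spiritual world.<br>
<br>
    When we keep using today to negate yesterday, sneer at the "old," and worry anxiously about being eliminated by the "new," can we really find happiness in an endless pursuit of change? The ship of life may look as if it is racing among a hundred others under full sail, but beneath it there is only weary drifting with the current; it has in fact lost the rudder and anchor with which to steer and find peace. Constant change leaves people bewildered, tired, unsure of themselves, and ever more restless. They spend their time and energy chasing, adapting, and living for others, and in the end they are still swamped by the jeering "next wave" and thrown out of the game as exhausted "antiques."<br>
<br>
    A life that takes "newness" as its standard of value leads to a tragic ending with no solution: however much everything else is renewed, a person's own "oldness," aging and finally death, can be neither escaped nor renewed. With "newness" as the standard, life can only be a downhill course in which the biggest apples are eaten first, the absurdity that existentialism has pondered without ever resolving.<br>
<br>
    The habit of negation and destruction has wrecked people's faith, turned every truth into nothing, and made the world a "text" open to any reading. What we learned in youth we are told in middle age was all wrong, and what we did in middle age we find in old age was all foolish. Our elders, and we ourselves, have lived through this. Even what we take to be "correct" today may be dismissed tomorrow as utterly worthless. A life without the support of truth leaves, as its final verdict, nothing but meaningless futility.<br>
<br>
    Humanity's material world grows richer amid rapid change while its spiritual world grows poorer. The old has been negated and discarded, yet the new cannot take root or grow. The two worlds are increasingly split and out of balance, and humanity has collectively sunk into an emptiness where no meaning can be found. That is the root reason why material life keeps getting richer while spiritual life grows ever more troubled. Emptiness drives people to create more change to fill it, trying to use "novelty" to kill the boredom and "flatness" that emptiness breeds. Like drug use, this craving for change falls into a vicious circle: the cycles shorten, "novelties" turn over faster, and the traditional system of meaning and value crumbles into ever smaller pieces. On ground that is more and more fragmented, any attempt to build a new system has less and less to stand on.<br>
<br>
    Without the staircase that a system provides, ultimate meaning and value cannot exist on their own, and what people pursue in life stays at surface values such as success, money, and status. Such pursuits can keep a person very busy but cannot carry meaning. Those who succeed, or who have money or status, gain not so much happiness and fulfillment as fatigue, weariness, and a never-satisfied rush onward.<br>
<br>
    People cannot live without pursuing something. Without a pursuit that has deeper meaning, only surface values remain: success, wealth, and power. Once everyone pursues these together, widespread strife between people is certain, because surface values can be won only by outdoing others and show themselves only against the foil of others. This is especially so in today's world, where population density is rising and economic integration is deepening, where people interact more and depend on each other more, while the resources that can supply success, wealth, and power grow scarcer. The collapse of the system of meaning and value is bound to produce excessive desire and general disorder in human relations, with more friction, conflict, and hostility. Accumulated, concentrated, and passed on step by step, these become political, economic, and social crises and finally throw the whole of society into turmoil and out of balance.<br>
<br>
    In the end, the fundamental crisis of human society today is the disintegration of the system of meaning and value. Yet we look straight past it. The political, economic, and social crises that fill our field of vision and keep us scrambling are merely the "branches" that have grown above ground from this buried "root."<br>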
&nbsp;&nbsp;<br>
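    ◎ The primeval age of the spirit<br>
<br>
    Today, in East and West alike, the edifice of the spirit is collapsing. Men and women are sinking into spiritual crises from which they cannot free themselves, and mental illness is spreading at an unprecedented rate. People increasingly treat one another as enemies; morality is decaying into coldness and cruelty. Religious faith is wilting, spiritual ideals are shattered, truth and conviction are being eaten away by relativism, and the meaning of life grows ever emptier. Our knowledge of the objective world is broader and deeper than ever, while our own inner world grows dimmer by the day and has become a stranger to us. We mistake information for knowledge and knowledge for wisdom. The world has lost its giants of thought; only dwarf pedants and misshapen specialists are left scurrying about like marmots, and semi-literate pop singers serve as teachers of philosophy. Law, a cocoon of our own spinning, has replaced moral conscience, and police armed to the teeth have become the only guarantee of peace. We sense that a crisis exists, yet so far we have gone no further than what Solzhenitsyn describes: "racing against this crisis on a wooden horse framed with cleverness and wit." Whether that horse carries scheming intrigue, the self-soothing of "going with your feelings," or the shining images distilled from ideological propaganda, it is bound to fall apart in a race against an opponent it cannot match.<br>
<br>
    There is no longer any road through this wasteland. Lured by "novelty," we have rushed too far, and the system of meaning and value that sustained humanity for thousands of years has broken off behind us, on the far side of a chasm that cannot be recrossed. We can only go forward, exploring unknown territory in search of a way out of the wasteland. Some say that humanity today is dull and ordinary, and that a society ruled by commerce and law will produce no more heroes. But humanity's spiritual world presents an entirely different scene. There it is dark and savage, lashed by wind and rain, with fierce beasts baring their teeth. It faces once more the mythic age when heaven and earth were first parted, and it awaits towering heroes like Nüwa, Danko, Prometheus, and Kuafu to hack through the thorns, defeat the beasts, tame the floods, and seek the light. There the battle will leave the field strewn with corpses and rivers running with blood!<br>
<br>
    How could the road to ultimate concern ever be ordinary?!</div>]]></description>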
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/26896330200831812316136</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/26896330200831812316136</guid>
    <pubDate>Fri, 18 Apr 2008 13:23:16 +0800</pubDate>
    <dcterms:modified>2008-04-18T13:23:16+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[disk as the new RAM]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/268963302008382590661</link>
    <description><![CDATA[<div>Northeastern Professor Gene Cooperman recently gave a curious Google engEdu tech talk, "<a href="http://www.youtube.com/watch?v=WQw7c-PliB4">Disk-Based Parallel Computation, Rubik's Cube, and Checkpointing</a>".<br><br>Gene's
starting point is that "disk is the new RAM" and the "disks of a
cluster can serve as if they were RAM" because the aggregate bandwidth to 50
disks is 5 GB/second, the same as the bandwidth to RAM.<br><br>The talk just
gets more fun from there, with Gene claiming that "a compute cluster
with 32 quad core nodes, each with 500G of local disk, is a good
approximation of ... a single computer with 10 terabytes of RAM and 200
CPU cores."<br><br>The premise is, of course, outlandish. The obvious
objection is that the latency characteristics of 10T of RAM are
totally different from those of 32 500G disks.<br><br>But
as long as we can batch the reads and writes to the disk, this
difference does not matter. Gene gives a few classes of algorithms --
breadth first state-space search, some algorithms that involve millions
of accesses to hash tables, some types of pointer chasing -- that they
have found amenable to the model.<br><br>This has parallels to
MapReduce and the changes we need to make to algorithms so that they work
well in a MapReduce framework, as one Googler pointed out during the
Q&amp;A.</div>]]></description>
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/268963302008382590661</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/268963302008382590661</guid>
    <pubDate>Tue, 8 Apr 2008 14:59:00 +0800</pubDate>
    <dcterms:modified>2008-04-08T14:59:00+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[Memory Pool]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/26896330200823161120591</link>
    <description><![CDATA[<div><P style="TEXT-INDENT: 2em">
</P><P style="TEXT-INDENT: 2em"><A >&nbsp;A Sample Memory Pool Implementation</A></P>
<P style="TEXT-INDENT: 2em"><A >Internal Structure</A></P>
<P style="TEXT-INDENT: 2em">The memory pool class MemoryPool is declared as follows:</P>
<P style="TEXT-INDENT: 2em">
<TABLE cellSpacing=0 cellPadding=0 width="100%" border=0>
<TBODY>
<TR>
<TD>
<P></P>
<PRE>class MemoryPool
{
private:
    MemoryBlock* pBlock;     // head of the memory block list
    USHORT       nUnitSize;  // size of one allocation unit, in bytes
    USHORT       nInitSize;  // number of units in the first block
    USHORT       nGrowSize;  // number of units in each later block
public:
    MemoryPool( USHORT nUnitSize, USHORT nInitSize = 1024, USHORT nGrowSize = 256 );
    ~MemoryPool();
    void* Alloc();
    void  Free( void* p );
};</PRE></TD></TR></TBODY></TABLE></P>
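<P style="TEXT-INDENT: 2em">MemoryBlock is the header structure attached to the front of each memory block that actually serves allocation requests. It records usage information for the block it belongs to:</P>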
<P style="TEXT-INDENT: 2em">
<TABLE cellSpacing=0 cellPadding=0 width="100%" border=0>
<TBODY>
<TR>
<TD>
<P></P>
<PRE>struct MemoryBlock
{
    USHORT       nSize;         // total size of the allocation units, in bytes
    USHORT       nFree;         // number of free units
    USHORT       nFirst;        // index of the next unit to hand out
    USHORT       nDummyAlign1;  // padding
    MemoryBlock* pNext;         // next block in the pool's list
    char         aData[1];      // first byte of the unit area

    static void* operator new(size_t, USHORT nTypes, USHORT nUnitSize)
    {
        return ::operator new(sizeof(MemoryBlock) + nTypes * nUnitSize);
    }
    static void operator delete(void *p, size_t)
    {
        ::operator delete (p);
    }

    MemoryBlock (USHORT nTypes = 1, USHORT nUnitSize = 0);
    ~MemoryBlock() {}
};</PRE></TD></TR></TBODY></TABLE></P>
<P style="TEXT-INDENT: 2em">The figure below shows the data structure of this memory pool.</P>
<P style="TEXT-INDENT: 2em"><A >&nbsp;Data structure of the memory pool</A></P>
<P style="TEXT-INDENT: 2em"><IMG src="http://www.ibm.com/developerworks/cn/linux/l-cn-ppp/images6/6_2.gif" border=0> </P>
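<P style="TEXT-INDENT: 2em"><A >&nbsp;Overall Mechanism</A></P>
<P style="TEXT-INDENT: 2em">The pool works as follows.</P>
<P style="TEXT-INDENT: 2em">(1) At run time a MemoryPool may hold several memory blocks for serving allocation requests. Each block is a fairly large contiguous region obtained from the process heap, made up of one MemoryBlock structure followed by a number of allocation units. The blocks form a linked list whose head is MemoryPool's pBlock; from any block, the pNext member of its MemoryBlock header leads to the next block in the list.</P>
<P style="TEXT-INDENT: 2em">(2) Each memory block has two parts: a MemoryBlock structure and a number of allocation units. The units have a fixed size (MemoryPool's nUnitSize). The MemoryBlock structure keeps no information about units that have been allocated; it tracks only the free ones. Two of its members matter most: nFree, the number of free units left in the block, and nFirst, the index of the next unit to be allocated. The first two bytes of every free unit (one USHORT value) hold the index of the free unit that follows it, so all the free units in a MemoryBlock are chained together through those two bytes.</P>
<P style="TEXT-INDENT: 2em">(3) When a new request arrives, MemoryPool walks the MemoryBlock list from pBlock until it finds a block that still has free units (it checks whether the header's nFree is greater than 0). If it finds one, it reads the header's nFirst, which is the index of the first free unit available in that block, and uses it to locate the start of that unit: because all units are the same size, the start of any unit is its index multiplied by the unit size, taken as an offset. That address is the memory that will satisfy the request. Before returning it, the pool first copies the value in the first two bytes at that address (the index of the next free unit) into the header's nFirst, so that the next request will be served by that unit, and decrements the header's nFree by 1. Only then does it return the unit's start address to the caller.</P>
<P style="TEXT-INDENT: 2em">(4) If no existing block has a free unit (which happens on the very first request, and whenever every unit in every block has been allocated), MemoryPool requests a new block from the process heap. The block consists of a MemoryBlock structure immediately followed by n allocation units, where n is MemoryPool's nInitSize or nGrowSize. The pool does not hand out a unit straight away; it first initializes the block. Initialization sets the header's nSize to the total size of all the units (not including the MemoryBlock structure itself); nFree to n-1 (n-1 rather than n, because the block was requested precisely to serve a pending allocation and one unit is about to be handed out, so starting at n-1 saves decrementing it afterwards); and nFirst to 1 (nFirst is the index of the next free unit to allocate, and it is 1 for the same reason nFree is n-1: unit 0 is handed out immediately, so starting at 1 means nFirst need not be touched again). The MemoryBlock constructor also has a more important job: chaining together all the free units after unit 0. As described above, the first two bytes of each free unit store the index of the next free unit, and because every unit has the same size, a unit can be located by using the product of its index and the unit size (MemoryPool's nUnitSize) as an offset. The only remaining question is where the offset is measured from. The answer is the MemoryBlock member aData. Because aData (declared as char aData[1]) is actually part of the MemoryBlock structure, nominally its last byte, that last byte of the header is in effect used as part of the first unit handed out. Since the whole block consists of the MemoryBlock structure plus a whole number of units, the last byte of the block is wasted; in the figure it is the small dark-shaded piece at the end of each of the two blocks. Once the start of the unit area is fixed, chaining the free units is easy: starting at aData, step forward nUnitSize bytes at a time and write into the first two bytes of each unit the index of the free unit that follows it. Since all units are free at first, that index is just the unit's own index plus 1, the index of the unit physically next to it. After initialization, the start address of the block's first unit is returned, and as we have seen, that address is aData.</P>
<P style="TEXT-INDENT: 2em">(5) When an allocated unit has to be reclaimed because of a delete, it is returned not to the process heap but to the MemoryPool, which is given the unit's start address. The pool walks its block list and checks whether that address falls inside the address range of some block. If it falls in none of them, the unit does not belong to this MemoryPool. If it falls in a block, the pool pushes the reclaimed unit onto the head of the free-unit list maintained by that block's MemoryBlock and increments nFree by 1. After that, with efficient use of resources and the performance of later operations in mind, the pool goes on to check the block: if every unit in it is now free, the block is removed from the MemoryPool and returned whole to the process heap. If the block still has units in use, it cannot be returned to the heap; but since a unit has just come back to it, the block has a free unit for the next allocation, so it is moved to the head of the pool's block list. The next time a request arrives and MemoryPool walks the list looking for a free unit, it will find this block on the first try, because the block is known to have a free unit, which reduces the number of blocks MemoryPool has to visit.</P>
<P style="TEXT-INDENT: 2em">To sum up, each memory pool (MemoryPool) maintains a singly linked list of memory blocks. Each block consists of a header structure (MemoryBlock), which holds the block's bookkeeping, and a number of allocation units. The header in turn maintains a "linked list" of all the free units in the block. That list is linked not by pointers to the next free unit but by the index of the next free unit, stored in the first two bytes of each free unit. In addition, the first unit starts not at the first address after the MemoryBlock structure but at aData, the last byte inside it (or possibly not the very last, once alignment padding is taken into account), so the units are in effect shifted forward by one byte. And because the space after the MemoryBlock structure is exactly a whole multiple of the unit size, the shift carries through and the last byte of the block goes unused. One reason for this layout is portability: alignment rules differ between platforms, so sizeof(MemoryBlock) may be larger than the sum of its members' sizes, with a few padding bytes at the end. Using aData as the start of the first unit works on any platform regardless of its alignment rules.</P>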
<P style="TEXT-INDENT: 2em"><A >&nbsp;Implementation Details</A></P>
<P style="TEXT-INDENT: 2em">With that overview in mind, let us look closely at the implementation.</P>
<P style="TEXT-INDENT: 2em">(1) The MemoryPool constructor:</P>
<P style="TEXT-INDENT: 2em">
<TABLE cellSpacing=0 cellPadding=0 width="100%" border=0>
<TBODY>
<TR>
<TD>
<P></P>
<PRE>MemoryPool::MemoryPool( USHORT _nUnitSize, USHORT _nInitSize, USHORT _nGrowSize )
{
    pBlock    = NULL;          // ①
    nInitSize = _nInitSize;    // ②
    nGrowSize = _nGrowSize;    // ③

    if ( _nUnitSize &gt; 4 )
        nUnitSize = (_nUnitSize + (MEMPOOL_ALIGNMENT-1)) &amp; ~(MEMPOOL_ALIGNMENT-1); // ④
    else if ( _nUnitSize &lt;= 2 )
        nUnitSize = 2;         // ⑤
    else
        nUnitSize = 4;
}</PRE></TD></TR></TBODY></TABLE></P>
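<P style="TEXT-INDENT: 2em">As ① shows, a MemoryPool does not create any memory block when it is constructed; the block list starts out empty.</P>
<P style="TEXT-INDENT: 2em">② and ③ set the number of units in the first block created and the number of units in each block created afterwards. Both values are passed as constructor arguments and stay fixed for the lifetime of the MemoryPool object.</P>
<P style="TEXT-INDENT: 2em">The rest of the code sets nUnitSize from the _nUnitSize argument, with two further considerations. As noted earlier, a free unit uses its first two bytes to store the index of the next free unit, so every unit must be at least two bytes; that is the reason for the assignment at ⑤. At ④, a _nUnitSize greater than 4 bytes is rounded up to the smallest multiple of MEMPOOL_ALIGNMENT that is not less than _nUnitSize (this requires MEMPOOL_ALIGNMENT to be a power of 2). For example, with _nUnitSize equal to 11, nUnitSize is 16 when MEMPOOL_ALIGNMENT is 8, 12 when it is 4, and 12 when it is 2, and so on.</P>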
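<P style="TEXT-INDENT: 2em">The size adjustment can be checked in isolation. The sketch below is not part of the original code: roundUnitSize is a hypothetical free function that reproduces the constructor's three branches, with the alignment passed as a parameter in place of the MEMPOOL_ALIGNMENT macro and assumed to be a power of two.</P>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper that mirrors the constructor's size adjustment.
// `alignment` stands in for MEMPOOL_ALIGNMENT and must be a power of two.
inline std::size_t roundUnitSize(std::size_t requested, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (requested > 4) {
        // Add alignment-1, then clear the low bits: rounds up to a multiple.
        return (requested + (alignment - 1)) & ~(alignment - 1);
    }
    // Small requests are bumped to 2 or 4 bytes so a free unit can
    // always hold the two-byte index of the next free unit.
    return requested <= 2 ? 2 : 4;
}
```

<P style="TEXT-INDENT: 2em">Under these assumptions, roundUnitSize(11, 8) gives 16 and roundUnitSize(11, 4) gives 12, matching the figures quoted above; requests of 1 or 2 bytes give 2, and requests of 3 or 4 bytes give 4.</P>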
<P style="TEXT-INDENT: 2em">(2) When a request is made to MemoryPool:</P>
<P style="TEXT-INDENT: 2em">
<TABLE cellSpacing=0 cellPadding=0 width="100%" border=0>
<TBODY>
<TR>
<TD>
<P></P>
<PRE>void* MemoryPool::Alloc()
{
    if ( !pBlock )                                   // ①
    {
        ……
    }

    MemoryBlock* pMyBlock = pBlock;
    while (pMyBlock &amp;&amp; !pMyBlock-&gt;nFree )    // ②
        pMyBlock = pMyBlock-&gt;pNext;

    if ( pMyBlock )                                  // ③
    {
        char* pFree = pMyBlock-&gt;aData+(pMyBlock-&gt;nFirst*nUnitSize);
        pMyBlock-&gt;nFirst = *((USHORT*)pFree);
        pMyBlock-&gt;nFree--;
        return (void*)pFree;
    }
    else                                             // ④
    {
        if ( !nGrowSize )
            return NULL;

        pMyBlock = new(nGrowSize, nUnitSize) MemoryBlock(nGrowSize, nUnitSize);
        if ( !pMyBlock )
            return NULL;

        pMyBlock-&gt;pNext = pBlock;
        pBlock = pMyBlock;

        return (void*)(pMyBlock-&gt;aData);
    }
}</PRE></TD></TR></TBODY></TABLE></P>
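<P style="TEXT-INDENT: 2em">MemoryPool serves a request in four main steps.</P>
<P style="TEXT-INDENT: 2em">At ①, the pool checks whether its block list is empty. If it is, this is the first allocation request, so the pool obtains a block of nInitSize units from the process heap and initializes it (mainly the MemoryBlock members and the initial free-unit list; the code is analyzed in detail below). If the block is obtained and initialized successfully, its first unit is returned to the caller. The first unit starts at the last byte inside the MemoryBlock structure.</P>
<P style="TEXT-INDENT: 2em">② applies when the pool already has blocks (the list is not empty): it walks the list looking for a block that still has free units.</P>
<P style="TEXT-INDENT: 2em">At ③, if such a block was found, the pool locates the free unit that is currently available in it. The lookup starts from aData, the last byte inside the MemoryBlock structure, and steps in multiples of MemoryPool's nUnitSize. Having found the unit, the pool must update the header's nFree (one fewer free unit than before) and the block's free-unit list. In the block that was found, pMyBlock-&gt;nFirst is the head of the free-unit list, and the index of the free unit after it is stored in the first two bytes of the unit that pMyBlock-&gt;nFirst designates (the unit just located). The pool reads those two bytes and assigns the value to pMyBlock-&gt;nFirst; this is the new head of the block's free-unit list, that is, the index of the unit that will be handed out next (if nFree is greater than zero). With the bookkeeping updated, the address of the located unit can be returned to the caller. Note that because this unit is now allocated, and a block keeps no record of allocated units, the information in its first two bytes is no longer needed. Seen another way, once the unit has been returned, the pool neither knows nor needs to know what the caller does with it. The contents of the unit mean nothing to the caller when it is returned, so the caller will almost certainly overwrite them, including the first two bytes. Each unit therefore carries no extra bookkeeping for the sake of linking; the first two bytes inside the unit are used directly. Once the unit is allocated, the caller can use those two bytes too; while the unit is free, they hold bookkeeping, namely the index of the next free unit. This is a good example of using memory efficiently.</P>
<P style="TEXT-INDENT: 2em">④ covers the case in which the walk at ② found no block with a free unit, so a new block must be requested from the process heap. Because this is not the first block, it contains nGrowSize units rather than nInitSize. As at ①, the new block is initialized first; it is then inserted at the head of MemoryPool's block list, and its first unit is returned to the caller. The new block is placed at the head because it still has many free units (unless nGrowSize is 1, which is unlikely, since the point of a memory pool is to obtain one large chunk from the process heap to serve many later requests), and putting it first shortens the walk at ② the next time a request arrives.</P>
<P style="TEXT-INDENT: 2em">The figures below illustrate MemoryPool::Alloc. The first shows the internal state of a MemoryPool at some moment.</P>
<P style="TEXT-INDENT: 2em"><A >Internal state of a MemoryPool at some moment</A></P>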
<P style="TEXT-INDENT: 2em"><IMG src="http://www.ibm.com/developerworks/cn/linux/l-cn-ppp/images6/6_3.gif" border=0> </P>
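<P style="TEXT-INDENT: 2em">Because the block list is not empty, it is walked; and because the first block has free units, the allocation is made from it. nFirst is m, so pBlock-&gt;aData+(pBlock-&gt;nFirst*nUnitSize) locates the start of free unit m (call it pFree). Before pFree is returned, the block's bookkeeping must be updated. First nFree is decremented by 1. Then the value in the first two bytes at pFree is read (note that the value k shown in the figure is not held in a single byte; it is the USHORT formed by that byte together with the one after it). The value is k, so pBlock's nFirst is set to k. Finally pFree is returned. The figure below shows the MemoryPool at this point.</P>
<P style="TEXT-INDENT: 2em"><A >&nbsp;Structure of the MemoryPool</A></P>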
<P style="TEXT-INDENT: 2em"><IMG src="http://www.ibm.com/developerworks/cn/linux/l-cn-ppp/images6/6_4.gif" border=0> </P>
<P style="TEXT-INDENT: 2em">As the figure shows, what was the first available unit (index m) is now marked as allocated, and pBlock's nFirst now holds k, the index of the free unit that followed m.</P>
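<P style="TEXT-INDENT: 2em">The pop step can be tried on a stand-alone model of a single block. This is a simplified sketch under assumptions, not the article's MemoryBlock: BlockSketch is a hypothetical type that keeps its units in a std::vector, starts with every unit free (nFirst is 0 and nFree is the unit count, instead of pre-allocating unit 0), and uses memcpy rather than a USHORT* cast so that it does not depend on alignment.</P>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified stand-in for one memory block (not the article's MemoryBlock):
// a byte buffer cut into fixed-size units, plus nFree and nFirst.
struct BlockSketch {
    std::vector<unsigned char> data;  // plays the role of aData
    std::size_t unitSize;             // bytes per unit
    std::uint16_t nFree;              // number of free units
    std::uint16_t nFirst;             // index of the head of the free list

    BlockSketch(std::uint16_t units, std::size_t size)
        : data(units * size), unitSize(size), nFree(units), nFirst(0) {
        assert(size >= sizeof(std::uint16_t));
        // Unit i records i + 1 as the index of the next free unit.
        for (std::uint16_t i = 0; i < units; ++i) {
            const std::uint16_t next = static_cast<std::uint16_t>(i + 1);
            std::memcpy(&data[i * unitSize], &next, sizeof next);
        }
    }

    // Pop the head of the free list; returns nullptr if no unit is free.
    void* alloc() {
        if (nFree == 0) return nullptr;
        unsigned char* unit = &data[nFirst * unitSize];
        std::memcpy(&nFirst, unit, sizeof nFirst);  // stored index becomes the new head
        --nFree;
        return unit;
    }
};
```

<P style="TEXT-INDENT: 2em">Each call to alloc() reads the index stored in the unit it is about to hand out and makes that index the new head, so no separate list nodes are ever allocated.</P>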
<P style="TEXT-INDENT: 2em">(3) When MemoryPool reclaims memory:</P>
<P style="TEXT-INDENT: 2em">
<TABLE cellSpacing=0 cellPadding=0 width="100%" border=0>
<TBODY>
<TR>
<TD>
<P></P>
<PRE>void MemoryPool::Free( void* pFree )
{
    ……

    MemoryBlock* pMyBlock = pBlock;

    while ( ((ULONG)pMyBlock-&gt;aData &gt; (ULONG)pFree) ||
            ((ULONG)pFree &gt;= ((ULONG)pMyBlock-&gt;aData + pMyBlock-&gt;nSize)) )   // ①
    {
        ……
    }

    pMyBlock-&gt;nFree++;                                   // ②
    *((USHORT*)pFree) = pMyBlock-&gt;nFirst;               // ③
    // The index is measured from the owning block's aData, not the list head's.
    pMyBlock-&gt;nFirst = (USHORT)(((ULONG)pFree-(ULONG)(pMyBlock-&gt;aData)) / nUnitSize); // ④

    if (pMyBlock-&gt;nFree*nUnitSize == pMyBlock-&gt;nSize )  // ⑤
    {
        ……
    }
    else
    {
        ……
    }
}</PRE></TD></TR></TBODY></TABLE></P>
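<P style="TEXT-INDENT: 2em">As described earlier, reclaiming a unit may return a whole block to the process heap, or it may move the block that owns the unit to the head of the pool's block list. Both operations modify the list, so the pool needs to know which block precedes the owning block in the list.</P>
<P style="TEXT-INDENT: 2em">① walks the pool's block list to determine which block's address range contains the unit being reclaimed (pFree), by comparing pointer values.</P>
<P style="TEXT-INDENT: 2em">On reaching ②, pMyBlock is the block that contains the unit pFree points to. (The code should also handle the case where pMyBlock is NULL, meaning that pFree does not belong to this pool and cannot be returned to it; readers can add that check themselves.) nFree is then incremented by 1 to record that the block has one more free unit.</P>
<P style="TEXT-INDENT: 2em">③ updates the block's free-unit list: it stores, in the first two bytes of the unit being reclaimed, the index of what was until now the block's first available free unit.</P>
<P style="TEXT-INDENT: 2em">④ sets pMyBlock's nFirst to the index of the unit being reclaimed. The index is the difference between the unit's start address and pMyBlock's aData, divided by the step (nUnitSize).</P>
<P style="TEXT-INDENT: 2em">In effect, steps ③ and ④ are what actually reclaim the unit. Note that together they make the reclaimed unit the next one the block will hand out, that is, they put it at the head of the free-unit list. Its memory address does not change. In fact a unit's address never changes, whether it is allocated or free; what changes is only its state (allocated or free) and, while free, its position in the free-unit list.</P>
<P style="TEXT-INDENT: 2em">⑤ checks, once the unit has been reclaimed, whether every unit in the owning block is now free and whether that block is at the head of the block list. If so, the whole block is returned to the process heap and the block list is updated accordingly.</P>
<P style="TEXT-INDENT: 2em">Note that to decide whether every unit in a block is free, the code does not walk the units; it tests whether nFree multiplied by nUnitSize equals nSize. nSize is the total size of the block's units, not including the MemoryBlock header. This shows what nSize is for: a quick test of whether all units in a block are free, computed from nFree and nUnitSize alone, with no need to walk and count the free units.</P>
<P style="TEXT-INDENT: 2em">Note also that nFree cannot simply be compared with nInitSize or nGrowSize to decide whether all units in a block are free. The first block allocated (with nInitSize units) may have been moved further down the list, and may even have been returned to the process heap at some point when all its units were free. So when reclaiming a unit, the pool cannot tell whether a given block holds nInitSize or nGrowSize units, and therefore cannot use such a comparison.</P>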
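<P style="TEXT-INDENT: 2em">The lookup at ① and the check at ⑤ can be sketched as two small helpers. The names BlockHeader, findOwner, and allUnitsFree are hypothetical, and the header is reduced to the fields the lookup needs. Unlike the loop above, findOwner returns nullptr when the pointer belongs to no block, which is the missing check mentioned in the text. Addresses are compared as std::uintptr_t rather than ULONG, since ULONG may be narrower than a pointer on some 64-bit platforms.</P>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified block header: only the fields the lookup needs.
struct BlockHeader {
    BlockHeader* pNext;    // next block in the pool's list
    unsigned char* units;  // start of the unit area (aData in the article)
    std::size_t nSize;     // total bytes occupied by the units
};

// Walk the block list and return the block whose unit area contains p,
// or nullptr if p does not belong to any block of this pool.
inline BlockHeader* findOwner(BlockHeader* head, const void* p) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    for (BlockHeader* b = head; b != nullptr; b = b->pNext) {
        const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(b->units);
        if (addr >= begin && addr < begin + b->nSize) return b;
    }
    return nullptr;
}

// True when every unit in the block is free, without walking the free list.
inline bool allUnitsFree(std::size_t nFree, std::size_t unitSize, std::size_t nSize) {
    assert(unitSize != 0 && nSize % unitSize == 0);
    return nFree * unitSize == nSize;
}
```

<P style="TEXT-INDENT: 2em">A Free implementation built on these helpers would return early when findOwner yields nullptr instead of dereferencing a null block pointer.</P>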
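<P style="TEXT-INDENT: 2em">Taking the pool state after the allocation above as an example, suppose the last unit of the second block now has to be reclaimed (it is allocated; say its index is m and pFree points to it), as shown in the figure.</P>
<P style="TEXT-INDENT: 2em">It is easy to see that nFirst changes from 0 to m: the next unit handed out from this block will be unit m, not unit 0. (The unit allocated first is the one reclaimed most recently. In this respect the process resembles a stack, last in, first out, except that here "in" means "reclaimed" and "out" means "allocated".) Correspondingly, m's "next free unit" is recorded as 0, which was the block's "next unit to be handed out"; this too shows that the most recently reclaimed unit has been inserted at the head of the block's free-unit list. And of course nFree is incremented by 1.</P>
<P style="TEXT-INDENT: 2em"><A >Pool state after the allocation</A></P>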
<P style="TEXT-INDENT: 2em"><IMG src="http://www.ibm.com/developerworks/cn/linux/l-cn-ppp/images6/6_5.gif" border=0> </P>
<P style="TEXT-INDENT: 2em">Before execution reaches ⑥, the state is as shown in the figure.</P>
<P style="TEXT-INDENT: 2em"><A >Figure: Pool state before execution reaches ⑥</A></P>
<P style="TEXT-INDENT: 2em"><IMG src="http://www.ibm.com/developerworks/cn/linux/l-cn-ppp/images6/6_6.gif" border=0> </P>
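<P style="TEXT-INDENT: 2em">Note that although pFree has been "reclaimed," it still points to unit m. During reclamation the unit's first two bytes were overwritten, but the rest of its contents are unchanged. And from the point of view of the process as a whole, unit m is still "valid" memory, because it was returned only to the pool, not to the process heap, so the program can still access it through pFree. Doing so is very dangerous: the first two bytes have already been overwritten, and the pool may hand the unit out again very soon. Any access through pFree after reclamation is therefore an error: a read will see wrong data and a write may corrupt data elsewhere in the program. Great care is needed.</P>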
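<P style="TEXT-INDENT: 2em">A minimal sketch of the push performed at ③ and ④, under assumptions: BlockState and releaseUnit are hypothetical names, the block is reduced to a raw unit buffer plus nFree and nFirst, and memcpy replaces the USHORT* cast. It shows that only the first two bytes of the reclaimed unit are touched and that the unit becomes the new head of the free list.</P>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical bookkeeping for one block; `units` plays the role of aData.
struct BlockState {
    unsigned char* units;   // start of the unit area
    std::size_t unitSize;   // bytes per unit
    std::uint16_t nFree;    // number of free units
    std::uint16_t nFirst;   // index of the head of the free list
};

// Push the unit at p onto the head of the block's free list.
// p must be the start address of a unit inside this block.
inline void releaseUnit(BlockState& b, void* p) {
    unsigned char* unit = static_cast<unsigned char*>(p);
    const std::size_t offset = static_cast<std::size_t>(unit - b.units);
    assert(offset % b.unitSize == 0);
    // The freed unit now remembers the old head...
    std::memcpy(unit, &b.nFirst, sizeof b.nFirst);
    // ...and becomes the new head itself.
    b.nFirst = static_cast<std::uint16_t>(offset / b.unitSize);
    ++b.nFree;
}
```

<P style="TEXT-INDENT: 2em">For example, if a block's only free unit is unit 2 (nFirst is 2) and unit 0 is released, nFirst becomes 0 and unit 0's first two bytes hold 2, so unit 0 is the next one handed out.</P>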
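<P style="TEXT-INDENT: 2em">Next the pool examines how the block is being used and where it sits in the block list. Suppose the rest of the block, shown by the ellipsis "……", still contains allocated units, that is, nFree multiplied by nUnitSize does not equal nSize. Because the block is not at the head of the list, it must be moved there, as shown in the figure.</P>
<P style="TEXT-INDENT: 2em"><A >Figure: MemoryBlock moved as a result of reclamation</A></P>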
<P style="TEXT-INDENT: 2em"><IMG src="http://www.ibm.com/developerworks/cn/linux/l-cn-ppp/images6/6_7.gif" border=0> </P>
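<P style="TEXT-INDENT: 2em">Suppose instead that the rest of the block, shown by the ellipsis "……", consists entirely of free units, that is, nFree multiplied by nUnitSize equals nSize. Because the block is not at the head of the list, the whole block is returned to the process heap at this point; the figure shows the pool's structure afterwards.</P>
<P style="TEXT-INDENT: 2em"><A >Figure: Structure of the pool after the block is returned</A></P>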
<P style="TEXT-INDENT: 2em"><IMG src="http://www.ibm.com/developerworks/cn/linux/l-cn-ppp/images6/6_8.gif" border=0> </P>
<P style="TEXT-INDENT: 2em">A memory block is initialized right after it is obtained, mainly to build its initial free-unit list. The detailed code follows:</P>
<P style="TEXT-INDENT: 2em">
<TABLE cellSpacing=0 cellPadding=0 width="100%" border=0>
<TBODY>
<TR>
<TD>
<P></P>
<PRE>MemoryBlock::MemoryBlock (USHORT nTypes, USHORT nUnitSize)
    : nSize  (nTypes * nUnitSize),
      nFree  (nTypes - 1),                        // ④
      nFirst (1),                                 // ⑤
      pNext  (0)
{
    char * pData = aData;                         // ①
    for (USHORT i = 1; i &lt; nTypes; i++)        // ②
    {
        *reinterpret_cast&lt;USHORT*&gt;(pData) = i; // ③
        pData += nUnitSize;
    }
}</PRE></TD></TR></TBODY></TABLE></P>
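<P style="TEXT-INDENT: 2em">Here we can see that at ① the initial value of pData is aData, that is, unit 0. But the loop at ② starts i at 1, and inside the loop, at ③, the first two bytes at pData are set to i. So the first two bytes of unit 0 hold 1, those of unit 1 hold 2, and so on until the first two bytes of unit (nTypes-2) hold (nTypes-1). This means that when the block is initialized its free-unit list starts at unit 0 and links each unit to the next, until the second-to-last unit points to the last one.</P>
<P style="TEXT-INDENT: 2em">Note also that in the initializer list nFree is initialized to nTypes-1 (not nTypes) and nFirst to 1 (not 0). This is because the first unit, unit 0, is allocated immediately after construction completes. Note too that the first two bytes of the last unit are not set initially, because that unit initially has no next free unit within this block. As the example above shows, however, once the last unit has been allocated and returned, its first two bytes will be set.</P>
<P style="TEXT-INDENT: 2em">The figure shows the state of a memory block after initialization.</P>
<P style="TEXT-INDENT: 2em"><A >Figure: state of a memory block after initialization</A></P>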
<P style="TEXT-INDENT: 2em"><IMG src="http://www.ibm.com/developerworks/cn/linux/l-cn-ppp/images6/6_9.gif" border=0> </P>
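<P style="TEXT-INDENT: 2em">To check the link values described above, the constructor loop can be replayed on a plain buffer. This is a sketch under assumptions: buildInitialLinks and nextFreeOf are hypothetical helpers, and std::memcpy is used in place of reinterpret_cast so that the two-byte stores do not depend on alignment.</P>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

typedef unsigned short USHORT;

// Writes the initial links the same way the constructor loop does:
// unit 0 stores 1, unit 1 stores 2, ..., unit (nTypes - 2) stores (nTypes - 1).
// The last unit is left untouched (here: zero-filled by the vector).
std::vector<char> buildInitialLinks(USHORT nTypes, USHORT nUnitSize) {
    assert(nUnitSize >= sizeof(USHORT));
    std::vector<char> data(static_cast<std::size_t>(nTypes) * nUnitSize, 0);
    char* pData = data.data();
    for (USHORT i = 1; i < nTypes; ++i) {
        std::memcpy(pData, &i, sizeof i);  // first two bytes of the unit
        pData += nUnitSize;
    }
    return data;
}

// Reads the "next free unit" index stored in the first two bytes of a unit.
USHORT nextFreeOf(const std::vector<char>& data, USHORT unit, USHORT nUnitSize) {
    USHORT next;
    std::memcpy(&next, &data[static_cast<std::size_t>(unit) * nUnitSize],
                sizeof next);
    return next;
}
```

<P style="TEXT-INDENT: 2em">For a block of 5 units, nextFreeOf returns 1, 2, 3 and 4 for units 0 to 3, which matches the chain from unit 0 to the last unit.</P>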
<P style="TEXT-INDENT: 2em">When the memory pool is destroyed, all of its memory blocks must be returned to the process heap:</P>
<P style="TEXT-INDENT: 2em">
<TABLE cellSpacing=0 cellPadding=0 width="100%" border=0>
<TBODY>
<TR>
<TD>
<P></P>
<P style="TEXT-INDENT: 2em">MemoryPool::~MemoryPool(){ MemoryBlock* pMyBlock = pBlock; while ( pMyBlock ) { …… }}</P></TD></TR></TBODY></TABLE></P>
<P style="TEXT-INDENT: 2em"><A >&nbsp;使用方法</A></P>
<P style="TEXT-INDENT: 2em">从上面的分析可以看到，该内存池主要有两个对外接口函数，即Alloc和Free。Alloc返回所申请的分配单元（固定大小内存），Free则回收传入的指针代表的分配单元的内存给内存池。分配的信息则通过MemoryPool的构造函数指定，包括分配单元大小、内存池第1次申请的内存块中所含分配单元的个数，以及内存池后续申请的内存块所含分配单元的个数等。</P>
<P style="TEXT-INDENT: 2em">综上所述，当需要提高某些关键类对象的申请／回收效率时，可以考虑将该类所有生成对象所需的空间都从某个这样的内存池中开辟。在销毁对象时，只需要返回给该内存池。"一个类的所有对象都分配在同一个内存池对象中"这一需求很自然的设计方法就是为这样的类声明一个静态内存池对象，同时为了让其所有对象都从这个内存池中开辟内存，而不是缺省的从进程堆中获得，需要为该类重载一个new运算符。因为相应地，回收也是面向内存池，而不是进程的缺省堆，还需要重载一个delete运算符。在new运算符中用内存池的Alloc函数满足所有该类对象的内存请求，而销毁某对象则可以通过在delete运算符中调用内存池的Free完成。</P>
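<P style="TEXT-INDENT: 2em">The class-level wiring described above might look like the sketch below. Everything here is an assumption made for illustration: SimplePool is a hypothetical stand-in that only caches returned units on a free list (it is not the block-based MemoryPool of this article), Particle is an invented example class, and the static pool is a function-local static rather than a static data member, which sidesteps initialization-order questions.</P>

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for the article's MemoryPool: it keeps returned units
// on a free list and only goes to the process heap when that list is empty.
class SimplePool {
public:
    explicit SimplePool(std::size_t unitSize) : unitSize_(unitSize), live_(0) {}
    ~SimplePool() {
        for (std::size_t i = 0; i < free_.size(); ++i) ::operator delete(free_[i]);
    }
    void* Alloc() {
        void* p;
        if (free_.empty()) {
            p = ::operator new(unitSize_);  // fall back to the process heap
        } else {
            p = free_.back();               // reuse the most recently freed unit
            free_.pop_back();
        }
        ++live_;
        return p;
    }
    void Free(void* p) {
        free_.push_back(p);                 // keep the unit for later reuse
        --live_;
    }
    std::size_t live() const { return live_; }
    std::size_t cached() const { return free_.size(); }

private:
    std::size_t unitSize_;
    std::size_t live_;
    std::vector<void*> free_;
};

// Example class whose objects all come from one shared pool.
class Particle {
public:
    Particle() : x(0.0), y(0.0) {}

    static SimplePool& pool() {
        static SimplePool instance(sizeof(Particle));
        return instance;
    }
    static void* operator new(std::size_t size) {
        assert(size == sizeof(Particle));   // the pool serves one size only
        return pool().Alloc();
    }
    static void operator delete(void* p) {
        if (p) pool().Free(p);
    }

    double x, y;
};
```

<P style="TEXT-INDENT: 2em">With this in place, "new Particle()" and "delete" compile unchanged at the call sites; only the class decides where its memory comes from. Like the pool in this article, the sketch has no locking, so it assumes all Particle objects are created and destroyed on one thread.</P>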
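<P style="TEXT-INDENT: 2em"><A >Performance comparison</A></P>
<P style="TEXT-INDENT: 2em">To measure the effect of the memory pool, a very small test program was used: with the pool it took 297 ms, and without it 625 ms, so running time dropped by 52.48%. The speedup can be attributed to a few points. First, apart from the occasional allocation or release that causes a memory block to be obtained from or returned to the process heap, the great majority of allocations and releases are handled by the pool within blocks it has already obtained, without going to the process heap directly, and going to the process heap directly is expensive. Second, this is a pool for a single-threaded environment: its Alloc and Free operations carry no thread protection. So if class A uses the pool, all creation and destruction of class A objects must happen in the same thread. But if class A uses one pool and class B uses another, the thread that uses class A need not be the thread that uses class B.</P>
<P style="TEXT-INDENT: 2em">Because the memory pool places objects of the same type in adjacent regions of memory, and programs often traverse objects of the same type, page faults during execution should be correspondingly fewer, although this can generally only be verified in a real, complex application. Memory allocation and release have a great effect on an application's overall performance and are often its bottleneck. The usual way to remove such a bottleneck is to provide a memory pool suited to how memory is actually used. A memory pool improves performance mainly because it can exploit certain "properties" of the application's actual memory usage: for example, some allocations and releases are certain to happen in one thread, or objects of one type are created and destroyed far more often than objects of other types. Memory pools can be tailored to these special usage scenarios, eliminating the operations in the system's default memory mechanism that are unnecessary for the scenario and so improving the application's overall performance.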
</P><P style="TEXT-INDENT: 2em">&nbsp;</P></div>]]></description>
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/26896330200823161120591</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/26896330200823161120591</guid>
    <pubDate>Mon, 31 Mar 2008 18:11:20 +0800</pubDate>
    <dcterms:modified>2008-03-31T18:11:20+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[How to Evaluate a Clustering Search Engine]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/268963302008232414129</link>
    <description><![CDATA[<div>&nbsp;By Raul Valdes-Perez<br><p>Many enterprise search vendors have announced that clustering of
search results is now part of their product and user experience. The
most recent case is Google (<a href="http://www.google.com/press/annc/gsa_new_features_0107.html">press center</a>, <a href="http://googleenterprise.blogspot.com/2007/01/new-year-new-features.html">blog post</a>, <a href="http://www.cmswatch.com/Trends/838-Blogosphere-responds-to-Google%27s-appliance-upgrade">blogosphere reaction</a>). Microsoft researchers have also <a href="http://rwsm.directtaps.net/">experimented</a> with clustering, without these experiments finding their way yet into Microsoft’s products.</p>
<p>By definition, a clustering engine analyzes the top (say 200-500)
search results from a query and displays the main themes, typically as
folders that may consist of subfolders.</p>
<p>The spread of clustering engines is gratifying, since Vivisimo was
founded on a breakthrough clustering algorithm, has been refining the
approach and educating and selling into the search market since 2001,
and has evolved into a complete enterprise search provider.</p>
<p>Just as with search results, or as with any other designed product,
judging the quality of a clustering engine requires some skill. Before
judging quality, let’s first explore clustering’s <strong>end user value, </strong>that is, how it enhances <a href="http://searchdoneright.com/2007/03/lets-modernize-our-concept-of-knowledge-worker/">knowledge worker productivity</a>.</p>
<p>Clustering enhances end user productivity in at least three ways<strong>:</strong></p>
<ol><li>At a glance, users gain an <strong>easy</strong> <strong>overview</strong> of the main distinct themes that are present in the top search results.</li><li>By clicking on clusters that satisfy their needs or interests, users can <strong>quickly arrive at search results that are valuable but low ranked</strong>,
say, #73 or even #429 in the results list, and so would never be
noticed otherwise. The user’s visibility into the content is greatly
enhanced.</li><li>After arriving at a cluster or sub-cluster, <strong>related results are placed together</strong> (“clustered”) rather than scattered throughout the ranked list. This speeds up finding related content or the best content.</li></ol>
<p>In short, clustering lets users overview, find, discover, and
compare information more productively. How does clustering quality
enhance or detract from this productivity gain?</p>
<ol><li>To grasp an overview of the main themes, the cluster labels should
be concise and natural-looking. Also, the clusters shouldn’t overlap
too much in their contents, otherwise the user will be overloaded with
too many clusters expressing overly related themes. If the clusters
don’t overlap too much, so that on average a search result appears in
only 1.2 to 1.5 clusters, then the main <strong>distinct</strong>
themes will be shown, rather than similar/duplicate themes on
overlapping content. Also, the cluster labels shouldn’t be artificially
limited to labels that contain the query word, or labels that have two
or more words in them, or some other artifact of an inferior clustering
approach. Finally, the underlying search engine snippets (aka excerpts
or dynamic summaries) should be full enough so that clustering has
enough input text to work with.</li><li>To arrive at <strong>low-ranked but valuable</strong> results, the
clustering engine should be fast enough so that 200-500 results or more
can be clustered within an acceptable response time. If user
authentication is an issue (note discussion <a href="http://searchdoneright.com/2007/02/whats-wrong-with-googles-enterprise-search-security-part-1/">here</a>),
then the response time should include the time for the search engine to
verify that the user can view these documents. Also, each cluster label
should accurately express the cluster’s contents; otherwise the user wastes time
on <a href="http://chass.colostate-pueblo.edu/magazine/2005/images/snogee.jpg">wild goose chases. </a></li><li>In order for similar results to be placed together in the same clusters, the clustering software should possess the <strong>linguistic knowledge</strong> needed to correctly handle cases like the following:
<ul><li>sort out the meanings in <em>middle ages, middle aged, </em>and<em> medieval</em>, and in <em>news release, new release, </em>and<em> press release</em>.</li><li>realize that <em>king</em> and <em>kingship</em> are very related, unlike <em>gun</em> and <em>gunship.</em></li><li>realize that <em>unfearful</em> and <em>fearless</em> are synonymous, but not <em>unhelpful</em> and <em>helpless.</em></li><li>plus many thousands of other linguistic relationships that take
time and background knowledge to learn, whether by humans in school or
by computers.</li></ul>
</li></ol>
<p>There are endless other subtleties, but enough: what’s the bottom
line? Here are some questions to ask about the quality of a clustering
search engine:</p>
<ol><li>Are the cluster folders determined by analyzing the top search
results? If not, then no overview of the major themes is being given.
Instead, the “folders” are probably based on query logs.</li><li>Does clicking on a cluster cause a new search to be done? If so, <strong>it’s not clustering</strong> but something else, likely query refinement, which leads to a <a href="http://www2.cio.com/ask%5Cexpert/2005/questions/question2070.html">discontinuous user experience</a>.</li><li>Is the clustering engine able to handle 200-500 results or even more?</li><li>Are the cluster names concise and natural-looking, and do they
correctly handle numbers, punctuation, diacritics, foreign words, etc.?</li><li>Is there evidence that the clustering engine possesses considerable
linguistic knowledge? And in other languages besides English, if
needed? For example, are many (30-40% or more) of the search results
left unclustered into an Other category? This suggests a deficiency in
detecting related meanings.</li></ol>
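<p>Two of the figures mentioned above, the average number of clusters a result appears in (1.2 to 1.5) and the share of results left unclustered (30-40% or more being a warning sign), are easy to compute once cluster membership is known. The sketch below assumes a hypothetical representation in which each cluster is a set of integer result IDs; it is not tied to any particular clustering engine.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Each cluster is the set of result IDs it contains (hypothetical layout).
typedef std::vector<std::set<int> > Clusters;

// Collects every result ID that appears in at least one cluster.
std::set<int> clusteredResults(const Clusters& clusters) {
    std::set<int> distinct;
    for (std::size_t i = 0; i < clusters.size(); ++i) {
        distinct.insert(clusters[i].begin(), clusters[i].end());
    }
    return distinct;
}

// Average number of clusters that a clustered result appears in.
// A value near 1.0 means little overlap between clusters.
double averageMembership(const Clusters& clusters) {
    std::size_t memberships = 0;
    for (std::size_t i = 0; i < clusters.size(); ++i) {
        memberships += clusters[i].size();
    }
    std::size_t distinct = clusteredResults(clusters).size();
    if (distinct == 0) return 0.0;
    return static_cast<double>(memberships) / static_cast<double>(distinct);
}

// Fraction of all analyzed results that landed in no cluster ("Other").
double unclusteredFraction(const Clusters& clusters, std::size_t totalResults) {
    if (totalResults == 0) return 0.0;
    std::size_t distinct = clusteredResults(clusters).size();
    assert(distinct <= totalResults);
    return 1.0 - static_cast<double>(distinct) / static_cast<double>(totalResults);
}
```

<p>For three clusters {1, 2, 3}, {3, 4} and {5} drawn from 10 results, the average membership is 6 / 5 = 1.2 and half of the results are unclustered.</p>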
<p>Here are some queries to try:</p>
<ol><li>bird flu vaccine</li><li>shi’ite sunni rivalry</li><li>9/11 hijackers</li><li>C++</li><li>“Raúl González Blanco”</li><li>bill gates</li></ol><br></div>]]></description>
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/268963302008232414129</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/268963302008232414129</guid>
    <pubDate>Mon, 3 Mar 2008 14:41:04 +0800</pubDate>
    <dcterms:modified>2008-03-03T14:41:04+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[YouTube Scalability]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/268963302008126060208</link>
    <description><![CDATA[<div><p ><br></p>
<p >YouTube uses Apache for FastCGI serving.  (I wonder if things would have been easier for them had they chosen <a  href="http://nginx.net/">nginx</a>, which is apparently wonderful for FastCGI and less problematic than Lighttpd)</p>
<p >YouTube is coded mostly in Python. Why? “Development speed critical”.</p>
<p >They use <a  href="http://psyco.sourceforge.net/">psyco</a>, a specializing just-in-time compiler for Python, and also C extensions, for performance-critical work.</p>
<p >They use <a  href="http://www.lighttpd.net/">Lighttpd</a> for serving the video itself, for a big improvement over Apache.</p>
<p >Each video is hosted by a “mini cluster”, which is a set of
machines with the same content. This is a simple way to provide headroom
(slack), so that a machine can be taken down for maintenance (or can
fail) without affecting users. It also provides a form of backup.</p>
<p >The most popular videos are on a CDN (Content Distribution
Network): they use external CDNs as well as Google’s CDN. Requests to
their own machines are therefore tail-heavy (in the “Long Tail” sense),
because the head goes to the CDN instead.</p>
<p >Because of the tail-heavy load, random disk seeks are especially important (perhaps more important than caching?).</p>
<p >YouTube uses simple, cheap, commodity hardware. The more
expensive the hardware, the more expensive everything else gets
(support, etc.). Maintenance is mostly done with rsync, SSH, and other
simple, common tools.<br>
The fun is not over: Cuong showed a recent email titled “3 days of
video storage left”. There is constant work to keep up with the growth.</p>
<p >Thumbnails turn out to be surprisingly hard to serve
efficiently. Because there are, on average, 4 thumbnails per video and many
thumbnails per page, the overall number of thumbnails served per second is
enormous. They use a separate group of machines to serve thumbnails,
with extensive caching and OS tuning specific to this load.</p>
<p >YouTube was bitten by a “too many files in one dir” limit: at
one point they could accept no more uploads (!!) because of this. The
first fix was the usual one: split the files across many directories,
and switch to another file system better suited for many small files.</p>
<p >Cuong joked about “The Windows approach of scaling: restart everything”</p>
<p >Lighttpd turned out to be poor for serving the thumbnails,
because its main loop becomes a bottleneck when loading files from disk; they
addressed this by modifying Lighttpd to add worker threads to read from
disk. This was good but still not good enough, with one thumbnail per
file, because the enormous number of files was terribly slow to work
with (imagine tarring up many millions of files).</p>
<p >Their new solution for thumbnails is to use Google’s <a  href="http://labs.google.com/papers/bigtable.html">BigTable</a>,
which provides high performance for a large number of rows, fault
tolerance, caching, etc. This is a nice (and rare?) example of actual
synergy in an acquisition.</p>
<p >YouTube uses MySQL to store metadata. Early on they hit a
Linux kernel issue in which the page cache was prioritized over app
data: the kernel swapped out the app data, totally overwhelming the system.
They recovered from this by removing the swap partition (while live!).
This worked.</p>
<p >YouTube uses <a  href="http://www.danga.com/memcached/">Memcached</a>.</p>
<p >To scale out the database, they first used MySQL
replication. Like everyone else who goes down this path, they
eventually reached a point where replicating the writes to all the DBs
used up all the capacity of the slaves. They also hit an issue with
threading and replication, which they worked around with a very clever
“cache primer thread” working a second or so ahead of the replication
thread, prefetching the data it would need.</p>
<p >As the replicate-one-DB approach faltered, they resorted
to various desperate measures, such as splitting the video watching
into a separate set of replicas, intentionally allowing the
non-video-serving parts of YouTube to perform badly so as to focus on
serving videos.</p>
<p >Their initial MySQL DB server configuration had 10 disks
in a RAID10. This does not work very well, because the DB/OS can’t take
advantage of the multiple disks in parallel. They moved to a set of
RAID1s, appended together. In my experience, this is better, but still
not great. An approach that usually works even better is to
intentionally split different data onto different RAIDs: for example,
a RAID for the OS / application, a RAID for the DB logs, one or more
RAIDs for the DB tables (use “tablespaces” to get your #1 busiest table
on separate spindles from your #2 busiest table), one or more RAIDs for
indexes, etc. Big-iron Oracle installations sometimes take this approach
to extremes; the same thing can be done with free DBs on free OSs also.</p>
<p >In spite of all this effort, they reached a point where
replication of one large DB was no longer able to keep up. Like
everyone else, they figured out that the solution was database partitioning
into “shards”. This spreads reads and writes across many different
databases (on different servers) that are not all running each other’s
writes. The result is a large performance boost, better cache locality,
etc. YouTube reduced their total DB hardware by 30% in the process.</p>
<p >It is important to divide users across shards by a
controllable lookup mechanism, not only by a hash of the
username/ID/whatever, so that you can rebalance shards incrementally.</p>
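<p >A controllable lookup of this kind can be sketched as a small directory that is consulted before any hash. This is only an illustration under assumptions: ShardDirectory, shardFor and moveUser are hypothetical names, the table is held in memory, and nothing here reflects how YouTube actually implemented it.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Maps users to shards. Explicit placements take priority, so individual
// users can be moved one at a time; everyone else falls back to a hash.
class ShardDirectory {
public:
    explicit ShardDirectory(std::size_t shardCount) : shardCount_(shardCount) {
        assert(shardCount > 0);
    }

    std::size_t shardFor(const std::string& user) const {
        std::map<std::string, std::size_t>::const_iterator it = placement_.find(user);
        if (it != placement_.end()) return it->second;        // explicit placement
        return std::hash<std::string>()(user) % shardCount_;  // default placement
    }

    // Records that a user now lives on the given shard (after copying the data).
    void moveUser(const std::string& user, std::size_t shard) {
        assert(shard < shardCount_);
        placement_[user] = shard;
    }

private:
    std::size_t shardCount_;
    std::map<std::string, std::size_t> placement_;
};
```

<p >With a pure hash, changing the number of shards remaps most users at once; with the directory, a hot shard can be relieved by calling moveUser for a few heavy users while everyone else stays where they are.</p>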
<p >An interesting DMCA issue: YouTube complies with takedown
requests; but sometimes the videos are cached way out on the “edge” of
the network (their caches, and other people’s caches), so it’s hard to
get a video to disappear globally right away. This sometimes angers
content owners.</p>
<p >Early on, YouTube leased their hardware.</p></div>]]></description>
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/268963302008126060208</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/268963302008126060208</guid>
    <pubDate>Tue, 26 Feb 2008 12:06:00 +0800</pubDate>
    <dcterms:modified>2008-02-26T12:06:00+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[Java memory leaks: be careful with inner classes]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/2689633020080291326113</link>
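    <description><![CDATA[<div>While testing a hot-swap framework I found a memory leak. One source turned out to be a non-static inner class inside one of the classes. Java stores a reference to the enclosing instance in every instance of a non-static inner class. So unless that reference is explicitly cleared, dropping only the reference to the outer object is not enough: as long as an inner-class instance is still reachable from somewhere else, the outer object stays reachable through it and the JVM will not collect that memory.<br>I recommend JRockit; it is really handy.<br></div>]]></description>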
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/2689633020080291326113</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/2689633020080291326113</guid>
    <pubDate>Tue, 29 Jan 2008 13:32:06 +0800</pubDate>
    <dcterms:modified>2008-01-29T13:32:06+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[ Delicious Integrated Into Yahoo Search Results]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/2689633020080222257611</link>
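    <description><![CDATA[<div>Link-based ranking made Google. But changes in how web pages link to each other, together with the flattening and diversification of the web, will inevitably reduce the effectiveness of PageRank. Search engines will gradually give more weight to ranking based on user data.<br>Delicious results being embedded in Yahoo search is the natural next step.<br></div>]]></description>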
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/2689633020080222257611</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/2689633020080222257611</guid>
    <pubDate>Tue, 22 Jan 2008 02:02:57 +0800</pubDate>
    <dcterms:modified>2008-01-22T02:02:57+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[Twine: The First Mainstream Semantic Web App?]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/2689633020080108177721</link>
    <description><![CDATA[<div><a href="http://www.readwriteweb.com/archives/twine_first_mainstream_semantic_web_app.php" target="_blank">http://www.readwriteweb.com/archives/twine_first_mainstream_semantic_web_app.php</a><br><br>On Friday <a href="http://www.radarnetworks.com/">Radar Networks</a> is announcing a new Semantic Web application called <a href="http://www.twine.com/">Twine</a>.
Founder Nova Spivack showed me a demo today of the new app, which he
described as a "knowledge networking" application. It has aspects of
social networking, wikis, blogging, knowledge management systems - but
its defining feature is that it's built with Semantic Web technologies.
Spivack told me that Twine aims to bring a usable and scalable
interface to the long-promised dream of the Semantic Web.
<p>Spivack went as far as to claim that Twine will be "the first
mainstream Semantic Web application" - and it's certainly fair to say
that we've heard lots of theory about the Semantic Web ever since Tim
Berners-Lee defined it, but as yet there have been very few large scale
success stories (if any). Will Twine finally be the Semantic Web app
that breaks through? Let's find out more...</p>
<p>First some background: Nova Spivack has an illustrious history in
the Semantic Web and AI business, having worked for both AI legend Ray
Kurzweil and tech guru Danny Hillis (Thinking Machines). The genesis
for Twine, said Spivack, came from an R&amp;D project about 5 years
ago, which turned into a research project, then a Series A round with
Microsoft co-founder Paul Allen in 2006. As of now the Twine team is 30
people working from San Francisco -- and they're finally ready to
unveil their new mainstream Semantic Web product.</p>
<h2>What is Twine?</h2>
<p>The aim of Twine is to enable people to share knowledge and
information. At first glance it is very much like Wikipedia, but there
is a whole lot more smarts to the system. Spivack described it to me as
"knowledge networking"- i.e. it aims to connect people with each other
"for a purpose". It's not based around socializing, but to share and
organize information you're interested in. Using Twine, you can add
content via wiki functionality (there are many post types), you can
email content into the system, and "collect" something (as an object,
e.g. a book object). The screenshots below show some of this in action --
note that the product itself isn't available just yet, as it's in
private testing.</p>



<div  >
<p>Other features of Twine include: RSS feeds to track all kinds of
things (topics, events, search, etc); commenting and viewing related
things, sharing tags, and more. Also, and <a href="http://marc.blogs.it/">Marc Canter</a> will like this, Twine users will be able to import <strong><em>and export</em></strong> their own data. Nova said that Twine will be an open platform - there will be a SPARQL API and a REST API.</p>
<h2>Semantic Graph</h2>
<p>Where Twine is differentiated from the likes of Wikipedia is that
its underlying data structure is entirely Semantic Web. Spivack told me
that the following Semantic Web technologies are being used: RDF, OWL,
SPARQL, XSL. Also he said that they plan to use <a href="http://www.internetnews.com/dev-news/article.php/3699101">GRDDL</a>
in the near future. Spivack had an interesting term for what Twine is
doing with Semantic Web technologies, riffing off the Facebook Social
Graph. Spivack is calling Twine a "Semantic Graph", which he says will
map relationships to both people and topics. So Twine's Semantic Graph
actually integrates the Social Graph. Spivack said that his company has
patents pending on this.</p>
<h2>Who will use it?</h2>
<p>So who is Twine aimed at? Spivack said that it's aimed at
professionals and teams. Also he said content providers are expressing
interest, because their data can be turned into Semantic Web data and
re-used. </p>
<p>As for the business model, it will be advertising and also
subscriptions (for higher capabilities). The advertising part won't be
in the first release and Twine hasn't yet decided how to run that -
e.g. they may use a single ad network provider, or (as Facebook is
considering) create their own ad network.</p>
<h2>Conclusion</h2>
<p>Overall, while the app isn't ready yet for the public, I was
impressed with what I saw in Nova's demo. The proof will be in the
pudding regarding whether this will be the first mainstream Semantic
Web app - i.e. how much uptake it gets and whether there will be good
use cases for all this semantic data. But it certainly looked like a
usable and slick system - and one I'm looking forward to playing with.</p>
<h2>Screenshots</h2>
<p>Click each picture to see full-length screenshots.</p>
<p><a href="http://www.readwriteweb.com/images/twine/TwineSummary_large.jpg"><img src="http://www.readwriteweb.com/images/twine/TwineSummary.jpg"></a></p>
<p><a href="http://www.readwriteweb.com/images/twine/TwineHome_large.jpg"><img src="http://www.readwriteweb.com/images/twine/TwineHome.jpg"></a></p>
<p><a href="http://www.readwriteweb.com/images/twine/myHome_large.jpg"><img src="http://www.readwriteweb.com/images/twine/myHome.jpg"></a></p>
<p><a href="http://www.readwriteweb.com/images/twine/itemsList_large.jpg"><img src="http://www.readwriteweb.com/images/twine/itemsList.jpg"></a></p>
<p><a href="http://www.readwriteweb.com/images/twine/Item_large.jpg"><img src="http://www.readwriteweb.com/images/twine/Item.jpg"></a></p>
</div><br><br></div>]]></description>
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/2689633020080108177721</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/2689633020080108177721</guid>
    <pubDate>Thu, 10 Jan 2008 20:17:07 +0800</pubDate>
    <dcterms:modified>2008-01-10T20:17:07+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[Which “Apple” Do You Want? The Era of Semantic Intelligent Search Arrives]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/268963302008010426700</link>
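    <description><![CDATA[<div><div>Qingnian Cankao (Youth Reference)</div><div>&nbsp;</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
When people type “apple” into a search box, the flood of assorted information that comes back is dizzying: “Apple demonstration orchard”, “Seven benefits of eating apples”, “Major personnel changes at Apple China”, “Apple bicycle shop”... Only a small part of everything returned is what you want. If you are a gadget enthusiast, you probably expect to see Apple phones and Apple computers; if you are a fruit wholesaler, you will be interested in the “Apple demonstration orchard”; and a woman looking for beauty tips will read “Seven benefits of eating apples”.</div>
<div><strong>Which <span>“apple” do you want?</span></strong></div>
<div>Today's web search can only hand users a “big information package” mixed with all kinds of noise and useless information, and users have to pick out the useful parts by hand. With the arrival of a new piece of software, Twine, this could change completely.</div>
<div>Radar Networks, a company based in San Francisco, USA, is developing new software that promises to greatly improve search efficiency.</div>
<div>All people have to do is log in to the Twine website and “dump” their everyday material into the software, then walk away. In the blink of an eye Twine tags everything electronically and files it by category, including web pages you have browsed, email you have sent and received, long novels and abstruse documents.</div>
<div>Twine stores items by specific subjects such as people, places, organizations and companies. When a user needs to find something, it can quickly retrieve it by category. When you search for “apple”, it infers from your saved material that you have recently been planning to buy an iPhone made by Apple, so it “cleverly” lists the phone's price and supplier information.</div>
<div><strong><span>The “Semantic Web” has unlimited potential</span></strong></div>
<div>Behind this software is a major new trend in the development of the Internet: the Semantic Web.</div>
<div>The Semantic Web attaches intelligent tags to all kinds of information and then links the information together through those tags; information about web users themselves is tagged too and connected to the virtual world. When a gadget enthusiast types “apple” into a search box, the computer “understands” that he does not need unrelated information such as “Apple demonstration orchard” or “Seven benefits of eating apples”. This is what Twine does: it acts as an “electronic nanny” for personal material and performs “personalized” search on top of it.</div>
<div>Twine is not the first Semantic Web product or tool. For years, many companies have used database software that automatically classifies and retrieves information. Blogging, currently the hottest feature on the web, also uses the Semantic Web principle: people add a few tags to their posts so that the main content of a blog can be found in a database.</div>
<div>Professor Clay Shirky of New York University's Interactive Telecommunications Program looks ahead: “The potential of the Semantic Web is unlimited. On the surface it is just a fashionable movement to tag information for easier retrieval, but in fact this movement will bring an earth-shaking revolution to machine intelligence.”</div>
<div><strong>Machines summarize articles for you</strong></div>
<div><strong>According to Spivack, founder and CEO of Radar Networks, Twine was built to the draft Semantic Web standards established by the international World Wide Web Consortium (W3C). This means that Twine conforms to a specification, and for that reason it can interoperate and share information with other Semantic Web applications, which greatly extends Twine's search scope.</strong></div>
<div>In addition, <strong><span style="color: red;">Twine</span><span style="color: red;"> also uses advanced machine learning and natural language processing programs to understand meaning, which takes its understanding well beyond systems that search only by manually added tags. Spivack explained that natural language analysis helps the system quickly “understand” ambiguous phrases: from context it can judge whether J.P. Morgan is the name of a person or of a company. More striking still, given a piece of text, Twine can use machine learning to look for matching information in vocabularies such as Wikipedia, and work out the topic of the text or even summarize its main idea.</span></strong> Spivack said confidently: “We (our software) will understand a piece of text in entirely new ways.”</div>
<div><strong>We are not exaggerating</strong></div>
<div>Spivack said that people have studied “artificial intelligence” and “human language processing” for decades, and that today these results are being applied to the Semantic Web, turning it into an intelligent web that can “make sense of” natural human language.</div>
<div>However, not everyone is optimistic about Twine's prospects. Tony Shaw is the head of “Semantic World”. He believes it is too early to say whether Twine can win customers. Technical feasibility alone does not mean success; consumers' expectations of advanced technology also have to be raised, and people have to be told that “we are not exaggerating”.</div>
<div>Spivack said the software will be tested by more users over the next few months. Twine may open fully in the summer of 2008. Twine will also build a development platform that lets programmers develop programs based on it, such as visualization software, so that users can search information from different perspectives. “But first we have to start with the basics,” Spivack said.</div>
<div>(Technology Review, USA)</div></div>]]></description>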
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/268963302008010426700</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/268963302008010426700</guid>
    <pubDate>Thu, 10 Jan 2008 16:02:06 +0800</pubDate>
    <dcterms:modified>2008-01-10T16:02:06+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[What Online Games Can Teach Online Communities]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/2689633020080101551460</link>
    <description><![CDATA[<div><div style="text-indent: 21.75pt;">Quoted from 思践语丝:<br>Let's look at a few interesting comparisons.</div>
<div style="margin: 0cm 0cm 0pt 18pt; text-indent: -18pt;"><span>1.<span style="font-family: 'Times New Roman'; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"><font size="3">&nbsp; <font face="Arial">MSN's peak daily concurrent users: 500,000. World of Warcraft's peak daily concurrent users: 5 million.</font></font></span></span></div>
<div style="margin: 0cm 0cm 0pt 18pt; text-indent: -18pt;"><span><span style="font-family: 'Times New Roman'; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"></span></span><span>2.<span style="font-family: 'Times New Roman'; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"><font size="3">&nbsp; </font></span></span>For a fairly good service on a typical portal site, each user contributes about 20 clicks a day. In an online game each user contributes more than 20 clicks a minute, and member users play for more than 1 hour a day on average. If one click in a game counts as one PV, then a game's PV per user is more than 60 times that of the site (20 clicks a minute for 60 minutes is 1,200 clicks a day, against 20).</div>
<div style="margin: 0cm 0cm 0pt 18pt; text-indent: -18pt;"><span>3、<span style="font-family: 'Times New Roman'; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"><font size="3">&nbsp; </font></span></span>中国虚拟游戏装备的年交易量不小于游戏点卡的2倍，约40亿元。</div>
<div>网络游戏取得的效果和用户黏度令人羡慕，但这里并不是要来说明网络社区必须按照网络游戏那样搞。其实网络游戏也是网络社区的一种类型，只不过这
个社区群体比较特殊一点，但是它的一些成功是有其内在规律的，而这些规律对于其他类型尤其以文字图片浏览为主的网络社区也是具有很强的借鉴指导意义的。下
面将以对比的方式来找出这些规律来。</div>
<div style="margin: 0cm 0cm 0pt 21pt; text-indent: -21pt;"><strong><span>一、</span>凭什么让用户主动贡献CLICK</strong></div>
<div>网络社区：70%来自内容标题吸引、20%来自用户个性资料，主要的CLICK行为包括：浏览、发表、评论、投票、收藏、编辑等。MYSPACE等以个人为中心的社区平台个性资料编辑的CLICK量有所增加，但是所占比例也不超过30%</div>
<div>网络游戏：70%来自游戏规则和游戏情节本身，20%来自会员互动，10%来自个人设定</div>
<div>吸引用户CLICK的形式社区和游戏虽然有很大不同。但是驱动用户持续CLICK的内在动力是可以相通的。有人说游戏画面漂亮，网络社区不可能
做成游戏那样拟真化的效果，这是事实，但是早期的网络游戏也是纯文本的（MUD游戏），那个时候它的PV/UV也是大大超过网络社区的。其实内在的核心驱
动要素就是我们要说的第一条社区化平台原则——</div>
<div><strong>价值快速反馈原则</strong></div>
<div>游戏中，每一次CLICK的价值反馈基本上都是即时的，我挥刀，就见血，有没有砍倒敌人画面上直接反馈出来，砍掉一半血需要再砍5刀，那就再点
5下鼠标。而网络社区里的CLICK的价值反馈除了浏览行为都是延时的，而单一的浏览行为无法形成CLICK的价值快速反馈，这篇文章对我有没有触动需要
花时间看完才能做出判断，作出了判断后是否要把这种判断反馈到网站平台上并不一定，而且事实上只有少数人有反馈。而网站社区平台渴求和需要网友作出的内容
贡献行为（发帖、回复、投票等）无法向游戏那样给出明确的价值驱动。游戏在此胜出的就在于提供了一个用户必须通过CLICK来获取系统即时响应后的感官刺
激，在此过程中用户获得快感。</div>
<div>那么网络社区平台能从这里面得到什么样的借鉴呢？那就是你希望用户做出的主动行为，需要有一个快速价值反馈的机制。这个机制就体现在网站和网友
的互动机制的设计上。具体的体现有：点击行为的快速反馈机制，回复留言提醒功能，文章被推荐或回复被引用的快速提醒功能。对于发帖的人，他自身贡献的PV
往往要高于其他浏览者，因为他时刻会上来看看他的帖子有什么新的回复。因此，缩短这个反馈周期会大大提升单个用户每天贡献的CLICK。</div>
<div style="margin: 0cm 0cm 0pt 21pt; text-indent: -21pt;"><strong><span>二、</span>凭什么让用户持续地贡献CLICK</strong></div>
<div>网络社区：社区CLICK的核心贡献者我们称之为“话语领袖”，他们乐于分享，迫切关注其他人对他们的反馈。这种情感的积累往往成为各网络社区
的粘性的核心驱动力。第二点来自于群体氛围的积累，当一个话语领袖在一个社区平台聚合了一帮共同话语沟通群体后，那么他对这个群体就有了话语的依赖性。</div>
<div>网络游戏：也是来自于三方面，第一是游戏本身的发展路线需要他持续地贡献出CLICK，第二是网友互动形成的创造性结果的悬念促使他不断地得到
新鲜的刺激，第三是所有的CLICK对于网友所追求的游戏目标都具有贡献，杀100个怪物升一级，每升一级会得到更多的能力和装备，以增加自身在游戏国度
里的竞争力。</div>
<div>让用户持续地贡献CLICK游戏充分体现了一个原则——</div>
<div><strong>进阶原则</strong></div>
<div>所谓进阶原则就是阶段性的MILESTONE，就是升级，每升一级会得到更多的价值反馈，并且具有积累效应。</div>
<div>映射到网络社区的构建里，就是等级制度，以及等级和功能权限的对应关系。有些社区也有等级，但是不同等级除了在等级描述上有不同，并没有在功能
权限甚至荣誉上有如何差别，那么用户的持续CLICK的积极性也就会下降，这个规律在游戏里也一样，设计的不好的游戏不同等级的进阶效果不明显，那么用户
对升级的需求也就会随之下降。</div>
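<div style="margin: 0cm 0cm 0pt 21pt; text-indent: -21pt;"><strong><span>3. </span>What can you give me? Do I need what you give me? What must I do to get it? Are my effort and my reward in balance?</strong></div>
<div>Every internet user has thought about these questions, and game companies have studied them in depth, but community operators often have not thought them through.</div>
<div>Online communities: they offer compelling original content, chances to make friends, and a platform to show yourself off. You first have to post and reply actively and take part in forum activities; after accumulating some standing you can become a moderator. A moderator gets permissions other users lack, and this process usually takes several months. Along the way you keep earning recognition, building your network, and enjoying the satisfaction of self-expression. Balancing effort against reward is left to you.</div>
<div>Online games: fresh, stimulating visuals; the fulfilment of self-realization in a virtual story; becoming strong and experiencing what it is like to be powerful in a virtual world; winning other people's support; and the sense of achievement from completing tasks as a team.</div>
<div>Here we can see another difference, and this core difference leads to the third principle:</div>
<div><strong>The clear development path principle</strong></div>
<div>A player who picks a particular kind of game can quickly figure out the development path within it, and gets a steady sense of achievement while doing so. In online communities, ordinary users are unclear about their development path, and the path and staged rewards that most communities design for them are vague too. This turns community participation into a casual leisure pastime with much weaker purpose, and without purpose the pull to act is greatly weakened.</div>
<div>So how can online communities improve here? A few points. First, the value drivers must be prominent. For a web platform this value includes the site's inherent media effect, that is, exposure or the chance to become famous, as well as various material incentives; the earlier QQ coins and Baidu's Baidu coins both laid the groundwork for offering more such incentives. Second, exploit the site's own resource advantages to get an integration effect. Take Sina Blog: its early celebrity-heavy strategy led many blog sites to criticize it for violating the grassroots principle of WEB2.0, yet in practice both celebrities and grassroots users embraced Sina Blog, and traffic and activity kept rising. The reason is that it offered a development path from grassroots to celebrity, and the value at every step of that path made full use of Sina's strengths as a media outlet. For example, ACOSTA, originally a grassroots blogger, became famous through it and then, thanks to Sina's media influence, quickly signed with a well-known fashion magazine as a spokesperson.</div>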
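<div style="margin: 0cm 0cm 0pt 21pt; text-indent: -21pt;"><strong><span>4. </span>How interaction among users arises</strong></div>
<div>Online communities: when it comes to user interaction, many people now invoke concepts such as MYSPACE and SNS, yet many domestic sites that imitated MYSPACE never took off. User interaction in online communities still depends on finding users' intrinsic motivation to interact. Interaction between users and the platform was covered above; as for interaction among users, communities that do it well mostly achieve it through the following: shared interests, value created by the group, private communication in small groups, voyeurism and herd mentality, and individuals' need to manage their social resources.</div>
<div>Online games: unlike online communities, the rules of online games are designed so that cooperation is a necessity rather than a bonus, as with CS clans and World of Warcraft guilds. The value produced by group interaction is also direct and hugely attractive.</div>
<div>From this we can derive a principle of user interaction:</div>
<div><strong>The 1+1&gt;10 principle</strong></div>
<div>That is, the value produced by interaction among users must far exceed that of interaction between users and the platform, and the platform must actively embody and showcase this. The liveliness of QQ groups and the value that Taobao merchant alliances deliver to their members are good examples.</div>
<div>Possible improvements include: fully showcasing the group's value, regular offline themed events, and small-group interaction features on the site (one-to-one features and small-group communication features).</div>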
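<div><strong>5. Equality for all, or social fairness?</strong></div>
<div>Class is a feature of society. Although online communities advocate equality for all, truly forming a cohesive community requires moving from flat management to mature hierarchical management.</div>
<div>Online communities: the simplest hierarchy is administrator, moderator, user. But this hierarchy is still too simple and remains far from the stratification of real society.</div>
<div>Online games: thanks to their high realism and players' investment, today's online games have formed highly developed and complex hierarchies. Each tier not only has very specific duties and resources, but the tiers are also tied together by a whole set of interests that lets every role maximize its own benefit while contributing to the team. And this is the fifth principle for online communities:</div>
<div><strong>The stratification principle</strong></div>
<div>Many communities have a blind spot about this principle. Under the impact of the WEB2.0 wave, they tend to believe that a utopia of equals can form on the internet, but this runs against the natural laws of society. Wherever there are people there is politics; the right to speak can be redistributed, but audience attention is always a resource that many compete for and only a few ultimately win.</div>
<div>&nbsp;</div>
<div>What online games can teach online communities is only a starting point because, as I said earlier, online games are themselves a form of online community. Their underlying rules are the same: good games and good communities follow the basic laws of social relations, while bad games and communities have mostly violated those laws in some respect.</div></div>]]></description>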
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/2689633020080101551460</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/2689633020080101551460</guid>
    <pubDate>Thu, 10 Jan 2008 13:05:51 +0800</pubDate>
    <dcterms:modified>2008-01-10T13:05:51+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[Google's computing power]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/2689633020080952419129</link>
    <description><![CDATA[<div>Reposted:<br><p>Google currently processes over 20 petabytes of data per day through an average of 100,000 MapReduce
jobs spread across its massive computing clusters. The average
MapReduce job ran across approximately 400 machines in September 2007,
crunching approximately 11,000 machine years in a single month. These
are just some of the facts about the search giant's computational
processing infrastructure revealed in an <acronym title="Association for Computing Machinery">ACM</acronym> paper by Google Fellows Jeffrey Dean and Sanjay Ghemawat.</p>

<p>Twenty petabytes (20,000 terabytes) per <strong>day</strong> is a
tremendous amount of data processing and a key contributor to Google's
continued market dominance. Competing search storage and processing
systems at Microsoft (Dryad) and Yahoo! (Hadoop) are still playing catch-up to Google's suite of <acronym title="Google File System">GFS</acronym>, MapReduce, and BigTable.</p>

<table style="width: 90%;">
<caption>MapReduce statistics for different months</caption>
<thead>
<tr><th><br></th><th scope="col"><abbr title="2004-08">Aug. 2004</abbr></th><th scope="col"><abbr title="2006-03">Mar. 2006</abbr></th><th scope="col"><abbr title="2007-09">Sep. 2007</abbr></th></tr>
</thead>
<tbody>
<tr><td scope="row">Number of jobs (1000s)</td><td style="text-align: right;">29</td><td style="text-align: right;">171</td><td style="text-align: right;">2,217</td></tr>
<tr><td scope="row"><abbr title="Average">Avg.</abbr> completion time (<abbr title="seconds">secs</abbr>)</td><td style="text-align: right;">634</td><td style="text-align: right;">874</td><td style="text-align: right;">395</td></tr>
<tr><td scope="row">Machine years used</td><td style="text-align: right;">217</td><td style="text-align: right;">2,002</td><td style="text-align: right;">11,081</td></tr>
<tr><td scope="row" style="border-top: thin solid rgb(0, 0, 0);"><code>map</code> input data (<abbr title="terabyte">TB</abbr>)</td><td style="border-top: thin solid rgb(0, 0, 0); text-align: right;">3,288</td><td style="border-top: thin solid rgb(0, 0, 0); text-align: right;">52,254</td><td style="border-top: thin solid rgb(0, 0, 0); text-align: right;">403,152</td></tr>
<tr><td scope="row"><code>map</code> output data (<abbr>TB</abbr>)</td><td style="text-align: right;">758</td><td style="text-align: right;">6,743</td><td style="text-align: right;">34,774</td></tr>
<tr><td scope="row"><code>reduce</code> output data (<abbr>TB</abbr>)</td><td style="text-align: right;">193</td><td style="text-align: right;">2,970</td><td style="text-align: right;">14,018</td></tr>
<tr><td scope="row"><abbr>Avg.</abbr> machines per job</td><td style="text-align: right;">157</td><td style="text-align: right;">268</td><td style="text-align: right;">394</td></tr>
<tr><td scope="row" colspan="4" style="border-top: thin solid rgb(0, 0, 0); border-bottom: thin solid rgb(0, 0, 0);">Unique implementations</td></tr>
<tr><td scope="row"><code>map</code></td><td style="text-align: right;">395</td><td style="text-align: right;">1,958</td><td style="text-align: right;">4,083</td></tr>
<tr><td scope="row"><code>reduce</code></td><td style="text-align: right;">269</td><td style="text-align: right;">1,208</td><td style="text-align: right;">2,418</td></tr>
</tbody>
</table>

<p>Google processes its data on a standard machine cluster node consisting of two 2 <abbr title="gigahertz">GHz</abbr> Intel Xeon processors with Hyper-Threading enabled, 4 <abbr title="gigabytes">GB</abbr> of memory, two 160 <abbr>GB</abbr> <acronym title="Integrated Drive Electronics">IDE</acronym> hard drives and a gigabit Ethernet link. A machine of this type costs approximately $2,400 through providers such as Penguin Computing or Dell, or approximately $900 a month through a managed hosting provider such as Verio (for startup comparisons).</p>

<p>The average MapReduce job runs across a $1 million hardware cluster,
not including bandwidth fees, datacenter costs, or staffing.</p>
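<p>As a rough sanity check on that figure, and assuming it is simply the September 2007 average of 394 machines per job multiplied by the purchase price of roughly $2,400 per machine (the source does not spell out its derivation), the arithmetic can be sketched in Java. The class and method names are hypothetical.</p>

```java
// Hypothetical helper: not from the source article, only a sanity check
// of the "$1 million hardware cluster" figure under the stated assumption.
class ClusterCostEstimate {
    // Hardware cost = number of machines x purchase price per machine (USD).
    static long hardwareCostUsd(int machines, int pricePerMachineUsd) {
        return (long) machines * pricePerMachineUsd;
    }

    public static void main(String[] args) {
        // 394 = average machines per job in Sep. 2007 (from the table above).
        System.out.println(hardwareCostUsd(394, 2400)); // prints 945600
    }
}
```

<p>Under that assumption the product is $945,600, which is consistent with the rounded $1 million figure.</p>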

<h3>Summary</h3>

<p>The January 2008 MapReduce paper provides new insights into the hardware
and software Google uses to process tens of petabytes of data
per day. Google converted its search indexing systems to the MapReduce
system in 2003, and currently processes over 20 terabytes of raw web
data. These are fascinating large-scale processing figures that make your
head spin and make you appreciate the years of distributed computing
fine-tuning applied to today's large problems.</p><br></div>]]></description>
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/2689633020080952419129</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/2689633020080952419129</guid>
    <pubDate>Wed, 9 Jan 2008 17:24:19 +0800</pubDate>
    <dcterms:modified>2008-01-09T17:24:19+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[Thoughts on a "Win in China" project: specialized e-commerce search]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/2689633020080942315299</link>
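    <description><![CDATA[<div>Reportedly http://www.5maiya.com/ is that site. I took a look.<br>It seems to offer two kinds of search: one focused on products, the other on articles and information.<br>The latter could really be made much more fine-grained. Right now it only classifies by title into promotions, shopping guides, and boutique shopping.<br>In fact there are already plenty of shopping search engines; price comparison and fine-grained mining services keep appearing.<br>I could not see what this site does differently from the others.<br><br>The act of shopping generally splits into an early, a middle, and a late stage.<br>The early stage is deciding to buy: reading descriptions and reviews. The selling point is analysis of user review data.<br>The middle stage is the purchase itself. The selling point is convenience, safety, and speed, with seamless links to the early and late stages.<br>The late stage is repairs and reviews, which can drive social interaction (video games can feed a gaming community, for example) and also accumulates data for the early stage.<br>In China today ordinary people do not much trust sellers, or even advertising, and positive information about products is easy to get. The web should be the best place to find negative information. So maintaining, searching, and analyzing negative reviews should be the focus.<br>The problem, though, is that deciding whether a review is negative is itself a hard NLP problem. It also raises legal issues, and could give rise to paid detractors hired to post bad reviews. So collecting and analyzing negative reviews from the web is hard technically, hard as a product, and hard to operate.<br>The only way out is to cultivate your own community and grow the data yourself: three buttons (good, neutral, bad) to accumulate user data. On this point Koubei (口碑网) has a first-mover advantage. A system qihoo is building seems somewhat similar?<br><br></div>]]></description>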
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/2689633020080942315299</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/2689633020080942315299</guid>
    <pubDate>Wed, 9 Jan 2008 16:23:15 +0800</pubDate>
    <dcterms:modified>2008-01-09T16:23:15+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[A good article by keso]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/26896330200809159600</link>
    <description><![CDATA[<div><div><a href="http://blog.donews.com/keso/archive/2007/12/30/1241838.aspx">Rambling: The System VS You</a></div><br><div><p>Time magazine named <a href="http://www.time.com/time/magazine/article/0,9171,1569514,00.html" target="_blank" title="You" mce_href="http://www.time.com/time/magazine/article/0,9171,1569514,00.html">You</a> its Person of the Year for 2006. If it were up to me to pick this year's person of the year for the Chinese internet, I would give it to "the system". Not because it is strong, but because it is fragile.</p>
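<p>The term "the system" comes from <a href="http://kesocn.googlepages.com/gaandjou%27r%27me%27y" target="_blank" title="一篇报道" mce_href="http://kesocn.googlepages.com/gaandjou'r'me'y">a report</a> in Southern Weekend (《南方周末》) on December 20. It is an outstanding piece of reporting: through the eyes of one online game player, it shows us the powerful, omnipresent, irresistible "system" that exists inside ZT Online (《征途》).</p>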
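<p>In the report, the protagonist Lü Yang "was astonished to find that the word '系统' (system) could no longer be displayed and had become **; he tried 'GM', and again got **; he tried '史玉柱' (Shi Yuzhu), and this time it was ***". Familiar, isn't it? As users of the Chinese internet, we seem to live inside the same system as Lü Yang, even though we do not play ZT Online. Large numbers of excellent foreign websites are blocked, thousands upon thousands of websites have their servers unplugged in batches, and more and more Chinese words inexplicably become sensitive or illegal words that cannot be found even on the search engine that "knows Chinese better"... Some say Shi Yuzhu succeeded because he understands the national conditions better, and I believe it. Shi Yuzhu's system is far smaller, but the two seem to come from exactly the same design.</p>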
<p>Something even more interesting happened after the report was published. Through Douban's <a href="http://www.douban.com/group/topic/2360884/" target="_blank" title="南方周末小组" mce_href="http://www.douban.com/group/topic/2360884/">Southern Weekend group</a> I learned that the report had mysteriously <a href="http://www.nanfangdaily.com.cn/southnews/zmzg/200712200863.asp" target="_blank" title="消失了" mce_href="http://www.nanfangdaily.com.cn/southnews/zmzg/200712200863.asp">disappeared</a> from Southern Weekend's own website. All the reposts on other news sites disappeared at the same time.</p>
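<p>Shi Yuzhu perhaps thought that once a report is deleted from a website, it never existed. A few years ago that approach did work, which is why the main task of most companies' so-called online PR was getting websites to publish and delete articles. This time, though, Shi Yuzhu clearly miscalculated. The attempt at control provoked a strong backlash, and this time the one swinging the sledgehammer was not just <a href="http://www.youtube.com/watch?v=z9PQ16KVntQ" target="_blank" title="一个女孩儿" mce_href="http://www.youtube.com/watch?v=z9PQ16KVntQ">one girl</a> but thousands upon thousands of You. After the article was deleted it was reposted and spread even more widely, and drew more people's attention.</p>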
<p>As I write this, Google finds <a href="http://www.google.com/search?hl=en&amp;rls=zh-cn&amp;q=%22%E5%8D%97%E6%96%B9%E5%91%A8%E6%9C%AB%22+%22%E7%B3%BB%E7%BB%9F%22+%22%E5%90%95%E6%B4%8B%22" target="_blank" title="5300篇" mce_href="http://www.google.com/search?hl=en&amp;rls=zh-cn&amp;q=%22%E5%8D%97%E6%96%B9%E5%91%A8%E6%9C%AB%22+%22%E7%B3%BB%E7%BB%9F%22+%22%E5%90%95%E6%B4%8B%22">5310</a> reposts of the report, and Baidu finds <a href="http://www.baidu.com/s?wd=%22%C4%CF%B7%BD%D6%DC%C4%A9%22+%22%CF%B5%CD%B3%22+%22%C2%C0%D1%F3%22&amp;cl=3" target="_blank" title="1850篇" mce_href="http://www.baidu.com/s?wd=%22%C4%CF%B7%BD%D6%DC%C4%A9%22+%22%CF%B5%CD%B3%22+%22%C2%C0%D1%F3%22&amp;cl=3">1850</a>
reposts (to keep the results precise, I used the three keywords "南方周末", "系统", and "吕洋", each wrapped in half-width quotation marks). This is the power of You. Compared with the system it is weak and inconspicuous, but once they act together, Big Brother starts to get nervous. The stronger the system acts, the more fragile it shows itself to be. From the Xiamen PX project, to the Shanxi slave-labor brick kilns, to the "Zhou tiger" affair, this power made the Chinese internet of 2007 shine.</p>
<p><span style="color: rgb(255, 0, 0);">So there will surely be people who </span><a style="color: rgb(255, 0, 0);" href="http://www.donews.com/Content/200712/7b2ccb47c0c14a59a1c19c2c79ccfb30.shtm" target="_blank" title="指桑骂槐地丑化、妖魔化" mce_href="http://www.donews.com/Content/200712/7b2ccb47c0c14a59a1c19c2c79ccfb30.shtm">vilify and demonize, by innuendo,</a><span style="color: rgb(255, 0, 0);"> the power of the grassroots and try to bring it under the system's control. There will also surely be people who, in an official capacity, lawfully </span><a style="color: rgb(255, 0, 0);" href="http://www.chinanews.com.cn/it/kong/news/2007/12-29/1119451.shtml" target="_blank" title="没收" mce_href="http://www.chinanews.com.cn/it/kong/news/2007/12-29/1119451.shtml">confiscate</a><span style="color: rgb(255, 0, 0);"> those rights that have not yet been brought under control, and restore the system's original order.</span> But I say that the rights the internet has given You cannot be confiscated or revoked, and time will prove how ridiculous and fragile the system is.</p>
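<p>2008 is almost here. I do not expect it to be better or worse than 2007; it is just the year after 2007 and the year before 2009, nothing more.</p><br><p>I wonder what the Central Propaganda Department would think if it read this article?<br></p><p><a href="http://blog.donews.com/keso/archive/2007/12/30/1241838.aspx" target="_blank">http://blog.donews.com/keso/archive/2007/12/30/1241838.aspx</a></p><p><br></p></div></div>]]></description>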
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/26896330200809159600</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/26896330200809159600</guid>
    <pubDate>Wed, 9 Jan 2008 13:05:09 +0800</pubDate>
    <dcterms:modified>2008-01-09T13:05:09+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[google software ]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/2689633020080332730563</link>
    <description><![CDATA[<div><div >Base Technologies: Internal Development</div>

<div ><b>Google relies on internally developed software.</b></div><br>
<br>
<table border="0" cellpadding="1" cellspacing="0" width="100%">
<tbody>
<tr valign="top">
<td  align="left">
<p>Google primarily relies on its own internally developed software for
data and network management and has a reputation for being skeptical of
"not invented here" technologies, so relatively few vendors can claim
it as a customer.</p>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td bgcolor="#333333">
<table border="0" cellpadding="4" cellspacing="1" width="100%">
<tbody>
<tr bgcolor="#f7f7e9">
<td  valign="top"><b>APPLICATION</b> </td>
<td  valign="top"><b>PRODUCT</b></td>
<td  valign="top"><b>SUPPLIER</b></td></tr>
<tr bgcolor="#e0e0e0">
<td  valign="top">Distributed file system </td>
<td  valign="top">Google File System</td>
<td  valign="top">Google proprietary</td></tr>
<tr bgcolor="#f7f7e9">
<td  valign="top">Distributed scheduling </td>
<td  valign="top">Global Work Queue</td>
<td  valign="top">Google proprietary</td></tr>
<tr bgcolor="#e0e0e0">
<td  valign="top">Very large database management systems </td>
<td  valign="top">BigTable, Berkeley DB</td>
<td  valign="top">Google proprietary, Sleepycat Software/Oracle</td></tr>
<tr bgcolor="#f7f7e9">
<td  valign="top">Server operating system </td>
<td  valign="top">Red Hat Linux (with kernel-level modifications by Google)</td>
<td  valign="top">Red Hat, Google</td></tr>
<tr bgcolor="#e0e0e0">
<td  valign="top">Web protocol accelerator </td>
<td  valign="top">NetScaler Application Delivery</td>
<td  valign="top">Citrix Systems</td></tr>
<tr bgcolor="#f7f7e9">
<td  valign="top">Web content translation </td>
<td  valign="top">Rosette Language Analyzers for Chinese, Japanese and Korean (used in combination with Google proprietary translation technology)</td>
<td  valign="top">Basis Technology</td></tr>
<tr bgcolor="#e0e0e0">
<td  valign="top">File conversion and content extraction </td>
<td  valign="top">Outside In </td>
<td  valign="top">Stellent</td></tr></tbody></table></td></tr></tbody></table><i>Google's
primary programming languages include C/C++, Java and Python. Guido van
Rossum, Python's creator, went to work for Google at the end of 2005.
The company has also created Sawzall, a special-purpose distributed
computing job preparation language.</i><br></td></tr></tbody></table>
<div >Courting the Enterprise</div>
<div ><span >By</span>&nbsp;<a  href="http://www.baselinemag.com/author_bio/0,1541,a=1059,00.asp">David F. Carr</a></div>
<div ><b>Can Google move beyond its ad business?</b></div><br>
<br>
<table border="0" cellpadding="1" cellspacing="0" width="100%">
<tbody>
<tr valign="top">
<td  align="left">
<p>While there is no doubt about the power Google.com commands among
advertisers and Webmasters, Google the enterprise vendor is another
thing entirely.</p></td></tr></tbody></table>
<p>Since 99% of Google's $6 billion in revenue continues to come from
advertising, the Google Enterprise division represents a tiny part of
the overall business. This is the group that wants to sell you a little
bit of Google in a box (the Search Appliance product line) that embeds a
variation of the search engine software that powers
Google.com in a yellow server box or blue blade server that enterprise
customers can plug into their own data centers.</p>
<p><b>The appliance product line includes the entry-level Google Mini</b>,
which starts at $1,995 for a model capable of indexing 100,000
documents. It is often used to provide the search capability for public
Web sites, although it is also used internally by small enterprises or
departments of larger ones. Beyond that level, the Search Appliance
line includes the Google Appliance GB-1001, which can handle up to a
million documents; and the GB-5005 and GB-8008, which, when delivered
in the form of multiple servers in a rack and working together as a
Google File System cluster, can handle many millions of documents. "It
really is like a little Google data center in a box," says Matt
Glotzbach, head of products for Google Enterprise.</p>
<p>Because the technology is delivered in the form of an appliance,
customers aren't supposed to crack open the case and tinker with the
technology inside, and they aren't provided with root access to the
server operating environment.</p>
<p>Instead, Google provides a set of Web-based administration
screens, as well as application programming interfaces for modifying
the appliance's behavior. The most significant development on that
front is <b>the introduction of the OneBox application programming interface</b>,
which allows data drawn from other systems that otherwise wouldn't be
indexed by the search appliance to be displayed at the top of the
search results.</p>
<p>The enterprise version of OneBox is modeled after the feature of
Google's public Web site that inserts links to data from weather
reports, phone listings or maps into search results when the search
engine recognizes a pattern associated with an address, a city name or
a phone number in the keywords entered by the user. Similarly, the
appliance can be programmed to recognize patterns associated with
purchase order numbers or common business queries, and insert links to
related data. For example, a search for quarterly sales data could go
beyond searching intranet Web content and pop up a link to a more
structured data source, such as a Cognos financial analysis application.</p>
<p>Dave Girouard, vice president and general manager of the Google
Enterprise business unit, says Google has been beefing up its
capabilities to address enterprise requirements, making the appliances
easier to install, use and manage. </p>
<p><b>However, Google still needs to do a better job of addressing enterprise requirements</b>,
particularly in terms of support, according to Gartner's Whit Andrews,
an authority on the evolution of search technology. "It has taken
Google a while to recognize that it needs to do business with the
enterprise in a different way from how it does business with the
advertiser," he says. </p>
<p>Enterprises that have bought the appliances often give positive
reports on the value Google delivers for the money and on the ease of
setup and administration, Andrews says. But support is another story,
according to Andrews, who has talked to customers who say Google's
response to problems with the appliances is too often along the lines
of, "Yeah, we know about that, we'll get back to you."</p>
<p>Google says it is investing in improved support. Andrews agrees that the support is better, but says it's still not enough.</p>
<p>However, Google can point to many happy customers. Brown Rudnick
Berlack Israels, an international law firm based in Boston, got the
Google Mini it purchased up and running in about an hour, says Keith
Schultz, who manages the firm's Web sites. "We've really had no
problems with it at all," Schultz says.</p>
<p>&nbsp;</p>
<div >Red Hat: Still Savvy</div>
<div ><span >By</span>&nbsp;<span >Brian P. Watson</span></div>
<div ><b>Red Hat sells and supports open-source Linux software, which Google has been using to build and run its operating system.</b></div>

<p>Forging ahead with the same business model for more than 12 years
might seem old hat to some in the constantly changing world of
information technology, but business customers say Red Hat wears it
well. </p></div>]]></description>
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/2689633020080332730563</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/2689633020080332730563</guid>
    <pubDate>Thu, 3 Jan 2008 15:27:30 +0800</pubDate>
    <dcterms:modified>2008-01-03T15:27:30+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[A breakthrough for video sites]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/268963302008028302783</link>
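    <description><![CDATA[<div>It still comes down to advertising, or producing films, or releasing albums.<br>On films and albums: there is a great deal of talent hidden among grassroots users, but they lack the means to get a film made. Could video sites connect upstream and downstream, with the entertainment talent hidden in the grassroots on one side and the many small and mid-sized production companies on the other, and build a bridge between them?<br>Mainly about advertising: today's ads are basically unrelated to the video; at most they are matched by the video's category and tags. The greatest potential here is really content-based advertising. For instance, in Lust, Caution there is a close-up of xx's underwear. If you could extract such key frames and run underwear ads against them, haha, how could that not be a hit? This takes fairly sophisticated technology, though. I think msra could do it.<br></div>]]></description>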
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/268963302008028302783</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/268963302008028302783</guid>
    <pubDate>Wed, 2 Jan 2008 20:30:27 +0800</pubDate>
    <dcterms:modified>2008-01-02T20:30:27+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[Managing volatility]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/26896330200711245158232</link>
    <description><![CDATA[<div><p><a href="http://www.ibm.com/developerworks/java/library/j-jtp06197.html?S_TACT=105AGX52&amp;S_CMP=cn-a-j#author">Brian Goetz</a> (<a href="mailto:brian.goetz@sun.com?subject=Managing%20volatility">brian.goetz@sun.com</a>), Senior Staff Engineer, Sun Microsystems<br></p><p> 19 Jun  2007</p><blockquote>The Java&#8482; language contains two intrinsic
    synchronization mechanisms: synchronized blocks (and methods) and
    volatile variables. Both are provided for the purpose of rendering
    code thread-safe. Volatile variables are the weaker (but sometimes
    simpler or less expensive) of the two -- but also easier to use
    incorrectly. In this installment of <i>Java theory and practice</i>,
    Brian Goetz explores some patterns for using volatile variables correctly
    and offers some warnings about the limits of their
    applicability.</blockquote>

            <p>
Volatile variables in the Java language can be thought of as
"<code>synchronized</code> lite"; they require less coding to use than <code>synchronized</code>
blocks and often have less runtime overhead, but they can only be used
to do a subset of the things that <code>synchronized</code> can. This article presents some patterns for using volatile variables effectively --
and some warnings about when not to use them.
</p>
            <p>
Locks offer two primary features: <i>mutual exclusion</i> and
<i>visibility</i>. Mutual exclusion means that only one thread at a
time may hold a given lock, and this property can be used to implement
protocols for coordinating access to shared data such that only one
thread at a time will be using the shared data. Visibility is more
subtle and has to do with ensuring that changes made to shared data
prior to releasing a lock are made visible to another thread that
subsequently acquires that lock -- without the visibility guarantees
provided by synchronization, threads could see stale or inconsistent
values for shared variables, which could cause a host of serious
problems.
</p>
            <p><a><span>Volatile variables</span></a></p>

            <p>
Volatile variables share the visibility features of <code>synchronized</code>,
but
none of the atomicity features. This means that threads will
automatically see the most up-to-date value for volatile variables.
They can be used to provide thread safety, but only in a very
restricted set of
cases: those that do not impose constraints between multiple
variables or between a variable's current value and its future
values. So volatile alone is not strong enough to
implement a counter, a mutex, or any class that has invariants that
relate multiple variables (such as "start &lt;= end"). </p>
            <p>
You might prefer to use volatile variables instead of locks for one of two
principal reasons: simplicity or scalability. Some idioms are easier
to code and read when they use volatile variables instead of
locks. In addition, volatile variables (unlike locks) cannot cause a
thread to block, so they are less likely to cause scalability
problems. In situations where reads greatly outnumber writes, volatile
variables may also provide a performance advantage over locking.
</p>
            <p><a><span>Conditions for correct use of volatile</span></a></p>
            <p>
You can use volatile variables instead of locks only under a
restricted set of circumstances. Both of the following criteria must
be met for volatile variables to provide the desired thread-safety:

</p><ul><li>Writes to the variable do not depend on its current value.</li><li>The variable does not participate in invariants with other variables.</li></ul>
            
            <p>
Basically, these conditions state that the set of valid values that
can be written to a volatile variable is independent of any other
program state, including the variable's current state.
</p>

            <p>
The first condition disqualifies volatile variables from being used as
thread-safe counters. While the increment operation (<code>x++</code>)
may look like a single operation, it is really a compound
read-modify-write sequence of operations that must execute atomically
-- and volatile does not provide the necessary atomicity. Correct
operation would require that the value of <code>x</code> stay
unchanged for the duration of the operation, which cannot be achieved
using volatile variables. (However, if you can arrange that the value
is only ever written from a single thread, then you can ignore the first condition.)
</p>

            <p>
Most programming situations will fall afoul of either the first or
second condition, making volatile variables a less commonly applicable approach to
achieving thread-safety than <code>synchronized</code>. Listing 1 shows a
non-thread-safe number range class. It contains an invariant -- that
the lower bound is always less than or equal to the upper bound.
</p>
            <br><a><b>Listing 1. Non-thread-safe number range class</b></a><br><table border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td><pre>                <br>@NotThreadSafe <br>public class NumberRange {<br>    private int lower, upper;<br><br>    public int getLower() { return lower; }<br>    public int getUpper() { return upper; }<br><br>    public void setLower(int value) { <br>        if (value &gt; upper) <br>            throw new IllegalArgumentException(...);<br>        lower = value;<br>    }<br><br>    public void setUpper(int value) { <br>        if (value &lt; lower) <br>            throw new IllegalArgumentException(...);<br>        upper = value;<br>    }<br>}<br></pre></td></tr></tbody></table><br>

            <p>
Because the state variables of the range are constrained in this
manner, making the <code>lower</code> and <code>upper</code> fields
volatile would not be sufficient to make the class thread-safe;
synchronization would still be needed. Otherwise, with some unlucky
timing, two threads executing <code>setLower</code> and
<code>setUpper</code> with inconsistent values could leave the range
in an inconsistent state. For example, if the initial state is
<code>(0, 5)</code>, and thread A calls <code>setLower(4)</code> at
the same time that thread B calls <code>setUpper(3)</code>, and the
operations are interleaved just wrong, both could pass the checks that
are supposed to protect the invariant and end up with the range
holding <code>(4, 3)</code> -- an invalid value. We need to make the
<code>setLower()</code> and <code>setUpper()</code> operations atomic
with respect to other operations on the range -- and making the fields
volatile can't do this for us.
</p>
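<p>One way to get that atomicity, sketched here under the assumption that simple intrinsic locking is acceptable, is to synchronize every accessor on the same lock so that each check-then-write runs as a unit. The class name <code>SynchronizedNumberRange</code> and the exception messages are illustrative and do not come from the article, whose Listing 1 elides the exception arguments.</p>

```java
// Hypothetical thread-safe variant of Listing 1; the class name and the
// exception messages are illustrative, not from the original article.
class SynchronizedNumberRange {
    private int lower, upper;   // both guarded by the lock on "this"

    public synchronized int getLower() { return lower; }
    public synchronized int getUpper() { return upper; }

    // The check and the write happen under one lock, so no other
    // thread can change "upper" between them.
    public synchronized void setLower(int value) {
        if (value > upper)
            throw new IllegalArgumentException("lower " + value + " > upper " + upper);
        lower = value;
    }

    public synchronized void setUpper(int value) {
        if (value < lower)
            throw new IllegalArgumentException("upper " + value + " < lower " + lower);
        upper = value;
    }
}
```

<p>With this version, the interleaving described above can no longer produce <code>(4, 3)</code>: whichever setter runs second sees the other's write and rejects its own argument.</p>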
            <p><a><span>Performance considerations</span></a></p>
            <p>
The primary motivation for using volatile variables is simplicity: In
some situations, using a volatile variable is just simpler than using
the corresponding locking. A secondary motivation for using volatile
variables is performance: In some situations, volatile variables may
be a better-performing synchronization mechanism than locking.
</p>

            <p>
It is exceedingly difficult to make accurate, general statements of
the form "X is always faster than Y," especially when it comes to
intrinsic JVM operations. (For example, the VM may be able to remove
locking entirely in some situations, which makes it hard to talk about
the relative cost of <code>volatile</code>
vs. <code>synchronized</code> in the abstract.) That said, on most
current processor architectures, volatile reads are cheap -- nearly as
cheap as nonvolatile reads. Volatile writes are considerably more
expensive than nonvolatile writes because of the memory fencing
required to guarantee visibility but still generally cheaper than lock acquisition.
</p>
            <p>
Unlike locking, volatile operations will never block, so volatiles
offer some scalability advantages over locking in the cases where they
can be used safely. In cases where reads greatly outnumber writes,
volatile variables can often reduce the performance cost of
synchronization compared to locking.
</p>
            <p><a><span>Patterns for using volatile correctly</span></a></p>
            <p>
Many concurrency experts tend to guide users away from using volatile
variables at all, because they are harder to use correctly than
locks. However, some well-defined patterns exist, which, if you follow
them carefully, can be used safely in a wide variety of
situations. Always keep in mind the rules about the limits of where
volatile can be used -- only use volatile for state that is truly
independent of everything else in your program -- and this should keep
you from trying to extend these patterns into dangerous territory.
</p>
            <p><a><span>Pattern #1: status flags</span></a></p>
            <p>
Perhaps the canonical use of volatile variables is simple boolean
status flags, indicating that an important one-time life-cycle event
has happened, such as initialization has completed or shutdown has
been requested.
</p>
            <p>
Many applications include a control construct of the form, "While
we're not ready to shut down, do more work," as shown in Listing 2:
</p>
            <br><a><b>Listing 2. Using a volatile variable as a status flag</b></a><br><table border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td><pre>                <br>volatile boolean shutdownRequested;<br><br>...<br><br>public void shutdown() { shutdownRequested = true; }<br><br>public void doWork() { <br>    while (!shutdownRequested) { <br>        // do stuff<br>    }<br>}<br></pre></td></tr></tbody></table><br>
            <p>
It is likely that the <code>shutdown()</code> method is going to be
called from somewhere outside the loop -- in another thread -- and as
such, some form of synchronization is required to ensure the proper
visibility of the <code>shutdownRequested</code> variable. (It might be called from
a JMX listener, an action listener in the GUI event thread, through
RMI, through a Web service, and so on.) However, coding the loop with
<code>synchronized</code> blocks would be much more cumbersome than
coding it with a volatile status flag as in Listing 2. Because
volatile simplifies the coding, and the status flag does not depend on
any other state in the program, this is a good use for volatile.
</p>
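            <p>
Listing 2 is only a fragment, so a self-contained version may make the
behavior easier to try out. The sketch below keeps the
<code>shutdownRequested</code>, <code>shutdown()</code>, and
<code>doWork()</code> names from Listing 2; the class name
<code>ShutdownFlagDemo</code> and the <code>runAndStop()</code> helper
are hypothetical and exist only for illustration.
</p>

```java
// Minimal sketch of the status-flag pattern; ShutdownFlagDemo and
// runAndStop() are hypothetical names, not part of Listing 2.
class ShutdownFlagDemo {
    private volatile boolean shutdownRequested;

    public void shutdown() { shutdownRequested = true; }

    public void doWork() {
        while (!shutdownRequested) {
            Thread.yield();             // stand-in for real work
        }
    }

    // Runs doWork() in a background thread, requests shutdown from the
    // calling thread, and reports whether the worker stopped in time.
    static boolean runAndStop(long timeoutMillis) {
        ShutdownFlagDemo demo = new ShutdownFlagDemo();
        Thread worker = new Thread(demo::doWork);
        worker.setDaemon(true);         // never keeps the JVM alive
        worker.start();
        demo.shutdown();                // volatile write, visible to the worker
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + runAndStop(5000));
    }
}
```

            <p>
Because the flag is volatile, the write in <code>shutdown()</code> is
guaranteed to become visible to the loop in <code>doWork()</code>
without any locking on either side.
</p>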

            <p>
One common characteristic of status flags of this type is that there
is typically only one state transition; the
<code>shutdownRequested</code> flag goes from <code>false</code> to
<code>true</code> and then the program shuts down. This pattern can
be extended to state flags that can change back and forth, but only if
it is acceptable for a transition cycle (from <code>false</code> to
<code>true</code> to <code>false</code>) to go undetected. Otherwise,
some sort of atomic state transition mechanism is needed, such as
atomic variables.
</p>
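            <p>
As one possible illustration of an atomic state transition mechanism,
the sketch below counts transitions with an
<code>AtomicInteger</code>, so that an observer can tell that a full
cycle happened even though the flag is back at its starting value. The
<code>CountedFlag</code> class and its method names are hypothetical;
this is one approach among several, not the only way to do it.
</p>

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a boolean flag whose transitions cannot go unnoticed.
class CountedFlag {
    // Even count = false, odd count = true; every toggle increments atomically.
    private final AtomicInteger transitions = new AtomicInteger(0);

    // Flips the flag and returns its new value.
    public boolean toggle() {
        return (transitions.incrementAndGet() & 1) == 1;
    }

    public boolean isSet() {
        return (transitions.get() & 1) == 1;
    }

    // An observer can remember this number and compare it later to
    // detect transitions that happened in between.
    public int transitionCount() {
        return transitions.get();
    }
}
```

            <p>
A plain volatile boolean would read <code>false</code> both before and
after a false-true-false cycle; here the transition count differs by
two, so the cycle is detectable.
</p>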
            <p><a><span>Pattern #2: one-time safe publication</span></a></p>
            <p>
The visibility failures that are possible in the absence of
synchronization can get even trickier to reason about when writing to
object references instead of primitive values. In the absence of
synchronization, it is possible to see an up-to-date value for an
object reference that was written by another thread and still see
stale values for that object's state. (This hazard is the root of the
problem with the infamous double-checked-locking idiom, where an
object reference is read without synchronization, and the risk is that
you could see an up-to-date reference but still observe a partially
constructed object through that reference.)
</p>
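            <p>
For reference, here is a minimal sketch of the double-checked-locking
idiom with the reference declared volatile, which closes the hazard
described above under the Java 5 and later memory model. The
<code>LazyResourceHolder</code> and <code>Resource</code> names are
hypothetical and used only for illustration.
</p>

```java
// Hypothetical sketch of double-checked locking made safe by volatile.
class LazyResourceHolder {
    static class Resource {
        final String name;
        Resource(String name) { this.name = name; }
    }

    // Without volatile, a reader could see a non-null reference to a
    // partially constructed Resource.
    private volatile Resource resource;

    public Resource getResource() {
        Resource r = resource;              // single volatile read on the fast path
        if (r == null) {
            synchronized (this) {
                r = resource;               // re-check under the lock
                if (r == null) {
                    r = new Resource("loaded");
                    resource = r;           // volatile write publishes the object
                }
            }
        }
        return r;
    }
}
```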
            <p>
One technique for safely publishing an object is to make the object
reference volatile. Listing 3 shows an example where during startup, a
background thread loads some data from a database. Other code,
when it might be able to make use of this data, checks to see if it
has been published before trying to use it.
</p>
            <br><a><b>Listing 3. Using a volatile variable for safe one-time publication</b></a><br><table border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td><pre>                <br>public class BackgroundFloobleLoader {<br>    public volatile Flooble theFlooble;<br><br>    public void initInBackground() {<br>        // do lots of stuff<br>        theFlooble = new Flooble();  // this is the only write to theFlooble<br>    }<br>}<br><br>public class SomeOtherClass {<br>    public void doWork() {<br>        while (true) { <br>            // do some stuff...<br>            // use the Flooble, but only if it is ready<br>            if (floobleLoader.theFlooble != null) <br>                doSomething(floobleLoader.theFlooble);<br>        }<br>    }<br>}<br></pre></td></tr></tbody></table><br>
            <p>
Without the <code>theFlooble</code> reference being volatile, the code
in <code>doWork()</code> would be at risk of seeing a partially constructed
<code>Flooble</code> as it dereferences the <code>theFlooble</code>
reference.
</p>
            <p>
A key requirement for this pattern is that the object being published
must either be thread-safe or effectively immutable (effectively
immutable means that its state is never modified after its
publication). The volatile reference guarantees the visibility of
the object in its as-published form, but if the state of the object is
going to change after publication, then additional synchronization is
required.
</p>
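            <p>
Listing 3 leaves out the <code>Flooble</code> class and the
<code>floobleLoader</code> field, so here is a self-contained sketch of
the same pattern. It assumes an immutable <code>Flooble</code> holding
a single string; the <code>SafePublicationDemo</code> wrapper, the
<code>getFlooble()</code> accessor, and the <code>awaitData()</code>
helper are hypothetical additions.
</p>

```java
// Hypothetical, self-contained variant of Listing 3.
class SafePublicationDemo {
    // Immutable: the field is final and never changes after construction.
    static final class Flooble {
        private final String data;
        Flooble(String data) { this.data = data; }
        String data() { return data; }
    }

    static class BackgroundFloobleLoader {
        private volatile Flooble theFlooble;

        // Starts a daemon thread that performs the only write to theFlooble.
        void initInBackground() {
            Thread loader = new Thread(() -> theFlooble = new Flooble("loaded"));
            loader.setDaemon(true);
            loader.start();
        }

        Flooble getFlooble() { return theFlooble; }
    }

    // Polls until the Flooble is published or the timeout expires.
    // Returns the published data, or null on timeout.
    static String awaitData(BackgroundFloobleLoader loader, long timeoutMillis) {
        long start = System.nanoTime();
        long timeoutNanos = timeoutMillis * 1_000_000L;
        while (System.nanoTime() - start < timeoutNanos) {
            Flooble f = loader.getFlooble();    // one volatile read, then use the local
            if (f != null) {
                return f.data();
            }
            Thread.yield();
        }
        return null;
    }
}
```

            <p>
Reading the volatile reference once into a local variable, as
<code>awaitData()</code> does, keeps the null check and the use
operating on the same object.
</p>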
            <p><a><span>Pattern #3: independent observations</span></a></p>
            <p>
Another simple pattern for safely using volatile is when observations
are periodically "published" for consumption within the program. For
example, say there is an environmental sensor that senses the current
temperature. A background thread might read this sensor every few
seconds and update a volatile variable containing the current
temperature. Then, other threads can read this variable knowing that
they will always see the most up-to-date value.
</p>
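            <p>
A minimal sketch of the temperature example might look like the
following. The <code>TemperatureMonitor</code> class, its method names,
and the use of a <code>DoubleSupplier</code> to stand in for the sensor
are all assumptions made for illustration.
</p>

```java
import java.util.function.DoubleSupplier;

// Hypothetical sketch of publishing independent observations.
class TemperatureMonitor {
    // Reads and writes of a volatile double are atomic, and each write
    // is visible to subsequent reads in other threads.
    private volatile double currentTemperature = Double.NaN;   // NaN = no reading yet

    // Called by the background thread each time it polls the sensor.
    void poll(DoubleSupplier sensor) {
        currentTemperature = sensor.getAsDouble();
    }

    // Called by any thread that wants the latest published reading.
    double currentTemperature() {
        return currentTemperature;
    }
}
```

            <p>
Each reading replaces the previous one outright and does not depend on
it, which is what makes a volatile variable sufficient here.
</p>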

            <p>
Another application for this pattern is gathering statistics about the
program. Listing 4 shows how an authentication mechanism might
remember the name of the last user to have logged on. The <code>lastUser</code>
reference will be repeatedly used to publish a value for consumption
by the rest of the program.
</p>
            <br><a><b>Listing 4. Using a volatile variable for multiple publications of independent observations</b></a><br><table border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td><pre>                <br>public class UserManager {<br>    public volatile String lastUser;<br><br>    public boolean authenticate(String user, String password) {<br>        boolean valid = passwordIsValid(user, password);<br>        if (valid) {<br>            User u = new User();<br>            activeUsers.add(u);<br>            lastUser = user;<br>        }<br>        return valid;<br>    }<br>} <br></pre></td></tr></tbody></table><br>
            <p>
This pattern is an extension of the previous one; a value is being
published for use elsewhere within the program, but instead of
publication being a one-time event, it is a series of independent
events. This pattern requires that the value being published be
effectively immutable -- that its state not change after
publication. Code consuming the value should be aware that it might
change at any time.
</p>
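            <p>
One way for consuming code to cope with a value that might change at
any time is to read the volatile variable once into a local variable
and work only with that copy. The sketch below assumes a trimmed-down
<code>UserManager</code> with just the <code>lastUser</code> field; the
<code>LastUserReport</code> class and <code>describe()</code> method
are hypothetical.
</p>

```java
// Hypothetical consumer of the lastUser value from Listing 4.
class LastUserReport {
    // Trimmed-down stand-in for the UserManager in Listing 4.
    static class UserManager {
        volatile String lastUser;
    }

    static String describe(UserManager manager) {
        // Read the volatile once; the field may change between two reads,
        // but the local copy cannot.
        String user = manager.lastUser;
        if (user == null) {
            return "no logins yet";
        }
        return "last login: " + user + " (" + user.length() + " chars)";
    }
}
```

            <p>
Had <code>describe()</code> read <code>manager.lastUser</code> twice,
the two reads could have returned different users.
</p>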
            <p><a><span>Pattern #4: the "volatile bean" pattern</span></a></p>

            <p>
The volatile bean pattern is applicable in frameworks that use
JavaBeans as "glorified structs." In the volatile bean pattern, a
JavaBean is used as a container for a group of independent properties
with getters and/or setters. The rationale for the volatile bean
pattern is that many frameworks provide containers for mutable data
holders (for instance, <code>HttpSession</code>), but the objects
placed in those containers must be thread safe.
</p>

            <p>
In the volatile bean pattern, all the data members of the JavaBean are
volatile, and the getters and setters must be trivial -- they must
contain no logic other than getting or setting the appropriate
property. Further, for data members that are object references, the
referred-to objects must be effectively immutable. (This prohibits
having array-valued properties, as when an array reference is declared
<code>volatile</code>, only the reference, not the elements
themselves, has volatile semantics.) As with any volatile variable,
there may be no invariants or constraints involving the properties of
the JavaBean. An example of a JavaBean obeying the volatile bean
pattern is shown in Listing 5:
</p>
            <br><a><b>Listing 5. A Person object obeying the volatile bean pattern</b></a><br><table border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td><pre>                <br>@ThreadSafe<br>public class Person {<br>    private volatile String firstName;<br>    private volatile String lastName;<br>    private volatile int age;<br><br>    public String getFirstName() { return firstName; }<br>    public String getLastName() { return lastName; }<br>    public int getAge() { return age; }<br><br>    public void setFirstName(String firstName) { <br>        this.firstName = firstName;<br>    }<br><br>    public void setLastName(String lastName) { <br>        this.lastName = lastName;<br>    }<br><br>    public void setAge(int age) { <br>        this.age = age;<br>    }<br>}<br></pre></td></tr></tbody></table><br>
            <p><a><span>Advanced patterns for volatile</span></a></p>
            <p>
The patterns in the previous section cover most of the basic cases
where the use of volatile is sensible and straightforward. This section looks at a more advanced pattern where volatile might
offer a performance or scalability benefit.
</p>
            <p>
The more advanced patterns for using volatile can be extremely
fragile. It is critical that your assumptions be carefully documented
and these patterns strongly encapsulated because very small changes
can break your code! Also, given that the primary motivation for the
more advanced volatile use cases is performance, be sure that you
actually have a demonstrated need for the purported performance gain
before you start applying them. These patterns are trade-offs that give
up readability or maintainability in exchange for a possible performance boost -- if
you don't need the performance boost (or can't prove you need it
through a rigorous measurement program), then it is probably a bad
trade because you're giving up something of value and getting
something of lesser value in return.
</p>
            <p><a><span>Pattern #5: The cheap read-write lock trick</span></a></p>

            <p>
By now, it should be well-known that volatile is not strong enough to
implement a counter. Because <code>++x</code> is really shorthand for
three operations (read, add, store), with some unlucky timing it is
possible for updates to be lost if multiple threads try to increment
a volatile counter at once.
</p>
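            <p>
To make the unlucky timing concrete, the sketch below replays one bad
interleaving by hand in a single thread: both "threads" read the
counter before either one stores its result. The
<code>LostUpdateDemo</code> class is hypothetical, and the interleaving
is simulated rather than produced by real threads so that the outcome
is repeatable.
</p>

```java
// Hypothetical sketch: a hand-simulated interleaving of two ++x operations.
class LostUpdateDemo {
    static volatile int x;

    // Returns the counter value after two "increments" that overlap.
    static int interleavedIncrements() {
        x = 0;
        int a = x;      // thread A: read  (sees 0)
        int b = x;      // thread B: read  (sees 0)
        x = a + 1;      // thread A: add and store (x becomes 1)
        x = b + 1;      // thread B: add and store (x becomes 1 again)
        return x;       // one of the two updates has been lost
    }
}
```

            <p>
Every individual read and write above is visible immediately, as
volatile promises, yet the final value is 1 rather than 2 because the
read-add-store sequence as a whole is not atomic.
</p>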

            <p>
However, if reads greatly outnumber modifications, you can combine
intrinsic locking and volatile variables to reduce the cost on the
common code path. Listing 6 shows a thread-safe counter that uses
<code>synchronized</code> to ensure that the increment operation is
atomic and uses <code>volatile</code> to guarantee the visibility of
the current result. If updates are infrequent, this approach may
perform better as the overhead on the read path is only a volatile
read, which is generally cheaper than an uncontended lock acquisition.
</p>
            <br><a><b>Listing 6. Combining volatile and synchronized to form a "cheap read-write lock"</b></a><br><br>
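            <p>
The listing body is not present at this point, so the following is a
minimal sketch of the kind of counter the surrounding text describes,
not the original listing. The <code>CheapReadWriteCounter</code> class
name and the <code>incrementConcurrently()</code> helper are
hypothetical.
</p>

```java
// Hypothetical sketch of the "cheap read-write lock" counter.
class CheapReadWriteCounter {
    // All mutative operations must hold the 'this' lock;
    // volatile only provides visibility for lock-free reads.
    private volatile int value;

    // Read path: a single volatile read, no lock acquisition.
    public int getValue() {
        return value;
    }

    // Write path: the lock makes the read-add-store sequence atomic.
    public synchronized int increment() {
        return ++value;
    }

    // Increments one shared counter from several threads and returns the total.
    static int incrementConcurrently(int threads, int perThread) {
        CheapReadWriteCounter counter = new CheapReadWriteCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.getValue();
    }
}
```

            <p>
If <code>increment()</code> were not synchronized, the total returned
by <code>incrementConcurrently()</code> could fall short of
<code>threads * perThread</code>.
</p>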
            <p>
The reason this technique is called the "cheap read-write lock" is
that you are using different synchronization mechanisms for reads and
writes. Because the writes in this case violate the first condition for
using volatile (the new value depends on the current value), you cannot
use volatile to safely implement the counter -- you must use
locking. However, you can use volatile to ensure the <i>visibility</i>
of the current value when reading, so you use locking for all mutative
operations and volatile for read-only operations. Where locks only
allow one thread to access a value at once, volatile reads allow more
than one, so when you use volatile to guard the read code path, you get
a higher degree of sharing than you would were you to use locking for
all code paths -- just like a read-write lock. However, bear in mind
the fragility of this pattern: With two competing synchronization
mechanisms, this can get very tricky if you branch out beyond the most
basic application of this pattern.
</p>
            <p><a><span>Summary</span></a></p>

            <p>
Volatile variables are a simpler -- but weaker -- form of
synchronization than locking, and in some cases they offer better
performance or scalability than intrinsic locking. If you follow the
conditions for using volatile safely -- that the variable is truly
independent of both other variables and its own prior values -- you can
sometimes simplify code by using <code>volatile</code> instead of
<code>synchronized</code>. However, code using <code>volatile</code>
is often more fragile than code using locking. The patterns offered
here cover the most common cases where <code>volatile</code> is a
sensible alternative to <code>synchronized</code>. Following these
patterns -- taking care not to push them beyond their limits -- should
help you safely cover the majority of cases where volatile variables
are a win.
</p></div>]]></description>
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/26896330200711245158232</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/26896330200711245158232</guid>
    <pubDate>Mon, 24 Dec 2007 17:15:08 +0800</pubDate>
    <dcterms:modified>2007-12-24T17:15:50+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[How can a latecomer search engine catch up?]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/26896330200711214419326</link>
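    <description><![CDATA[<div>1. Be overwhelmingly superior in technology.<br>2. Innovate in approach.<br>3. Promote effectively.<br>Promotion should start from what users do: writing, looking things up, reading, and using software.<br>Writing, of course, means tools such as input methods; typical example: sogou. For looking things up, you can build a dictionary; typical example: yodao. For reading, you can rely on helper software, or register a right-click menu entry, for example through an agreement with adobe reader. For using, you can start with software that users already like, even small shareware, with no plug-in to install. For example: the foxmail right-click menu, the eclipse help, searches for code examples and the like. Or bundle search directly with the error codes of some programs. For using, you can also integrate search into media players, editors, download tools, reading tools, and so on. Typical examples: google xunlei, qihoo 360. For instance, write a small plug-in for an mp3 player or wplayer that, based on the song currently playing, finds the lyrics, finds gossip news about the singer, and recommends songs.<br>You do not have to develop all of these tools yourself; you can look for small shareware programs. This is a blue ocean. Profits can be shared with the small software makers. In short, either you are good enough that users choose you, or you get users to use you without noticing.<br></div>]]></description>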
	    <author><![CDATA[kaineci]]></author>
	    <comments>http://kaineci.blog.163.com/blog/static/26896330200711214419326</comments>
    <slash:comments>0</slash:comments>
    <guid isPermaLink="true">http://kaineci.blog.163.com/blog/static/26896330200711214419326</guid>
    <pubDate>Fri, 21 Dec 2007 16:41:09 +0800</pubDate>
    <dcterms:modified>2008-01-02T20:20:32+08:00</dcterms:modified>
  </item>    
  <item>
  	<title><![CDATA[harddisk]]></title>	
    <link>http://kaineci.blog.163.com/blog/static/268963302007112071036383</link>
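    <description><![CDATA[<div>Technology is advancing so fast that before many people have worked out what is good about SATA, SATA II has already arrived. What exactly are the differences among traditional IDE, mainstream SATA, and cutting-edge SATA II hard disks? And how do these kinds of disks compare in price and other respects? Many readers probably want to know.<br><br>Before looking closely at the new standards, it is worth reviewing the existing technology. For a long time, progress in hard disk technology has focused on two things: transfer speed and capacity. Practically since we first got to know computers, everyone has been using Ultra ATA. This long-serving interface technology looks dated in several respects and needs improvement:<br>As everyone knows, the data cable is too wide, awkward to install, seriously impedes airflow inside the case, and hampers case cooling,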