Article information

Title: Analysis of string representations for a modern programming language.
Author: Naugler, David
Journal: Transactions of the Missouri Academy of Science
Print ISSN: 0544-540X
Publication year: 2005
Issue: January
Language: English
Publisher: Missouri Academy of Science
Abstract: If you are designing a language that will provide only one primitive string type that subsumes character and supports Unicode, what is the best internal representation? Should it be mutable or immutable? Which encoding should it use: UTF-8, UTF-16, UTF-32, some hybrid, or multiple encodings? Should the length be encoded as part of the string, and if so, how? Should the string support list-like head/tail recursive algorithms? Should strings be interned (stored in a global hash table) to save space and provide constant-time equality checks? If so, how should the hashing work? In general, should strings be viewed as suitable data structures for most common text operations, or are they opaque containers that must be converted to some other type (list, vector, deque, etc.) for processing? No perfect solution exists, but I analyze the alternatives and justify the string representation I use for my programming language Rune.
Keywords: Programming languages; Software engineering
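
The abstract describes interning as storing strings in a global hash table so that equal strings share storage and equality reduces to an identity check. A minimal Python sketch of that idea follows. The `InternTable` class and its `intern` method are hypothetical names chosen for illustration; they are not taken from the paper and say nothing about how Rune implements interning.

```python
class InternTable:
    """Hypothetical intern table: one canonical object per distinct string value."""

    def __init__(self):
        # Maps string contents to the canonical string object (assumed design).
        self._table = {}

    def intern(self, text: str) -> str:
        # setdefault returns the stored object when an equal string is already
        # present; otherwise it stores `text` and returns it.
        return self._table.setdefault(text, text)


table = InternTable()

# Build two equal strings at run time so they start as separate objects.
a = table.intern("".join(["ru", "ne"]))
b = table.intern("".join(["r", "une"]))

# After interning, both names refer to the same object, so equality can be
# decided by comparing identities instead of comparing characters.
same_object = a is b
```

The trade-off the abstract alludes to is visible here: every new string pays for a hash and a table lookup when it is created, in exchange for shared storage and cheap comparisons afterwards.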

* Shade, E. Computer Science Department, Southwest Missouri State University.