[Java] String, StringBuffer & StringBuilder

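In Java, strings can be manipulated through String, StringBuffer, and StringBuilder. The three offer similar functionality, but their performance differs from one scenario to another in real-world development, and each has use cases it suits best. This article reads the source code of all three, examines how they differ in data organization and implementation, considers the consequences of those design choices, and identifies the scenarios each one fits.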

Source Code Analysis

The String Class

Main Structure

The String class is structured as follows:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    private final byte[] value;
    private final byte coder;
    private int hash; // Default to 0
    private static final long serialVersionUID = -6849794470754667710L;
    static final boolean COMPACT_STRINGS;
    static {
        COMPACT_STRINGS = true;
    }
    private static final ObjectStreamField[] serialPersistentFields =
        new ObjectStreamField[0];

    ...

}

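The String class is declared final, so it cannot be subclassed.

A String stores its content in byte[] value, so a String is essentially a byte array (in the JDK 9+ source shown here; earlier versions used a char[]). The coder field records whether those bytes are encoded as Latin-1 or UTF-16. The final modifier on the field only prevents the reference from being reassigned. String is immutable because the array is also private, is never exposed, and is never modified by any method after construction.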
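To make the immutability point concrete, here is a minimal sketch. The class and method names are invented for this illustration and are not part of the JDK. It calls methods that look as if they modify a String and shows that the original object is left unchanged while a new object is returned.

```java
class ImmutabilityDemo {
    // Calls "modifying" methods and discards their results.
    // The argument comes back unchanged because String has no mutators.
    static String afterModification(String original) {
        original.concat(" world");   // returns a new String; result ignored
        original.toUpperCase();      // same here
        return original;
    }

    public static void main(String[] args) {
        String s = "hello";
        String t = s.concat(" world");   // a new String object
        System.out.println(s);           // hello
        System.out.println(t);           // hello world
        System.out.println(afterModification(s)); // hello
    }
}
```

Any method that seems to change a String therefore has to be used through its return value.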

Constructors

String provides many constructors. A few are analyzed briefly below:

  1. No-argument constructor, which creates an empty string
public String() {
    this.value = "".value;
    this.coder = "".coder;
}
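  2. Copy constructor taking a String. Note that this is not a deep copy: the new object shares the original's byte array, which is safe because the array is never modified.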
public String(String original) {
    this.value = original.value;
    this.coder = original.coder;
    this.hash = original.hash;
}
  3. Construct a new String from a character array
public String(char value[]) {
    this(value, 0, value.length, null);
}
  4. Construct a new String from a sub-range of an existing array

The array may hold chars, code points, or bytes in some encoding.

public String(char value[], int offset, int count) {
    this(value, offset, count, rangeCheck(value, offset, count));
}

public String(int[] codePoints, int offset, int count) {
    checkBoundsOffCount(offset, count, codePoints.length);
    if (count == 0) {
        this.value = "".value;
        this.coder = "".coder;
        return;
    }
    if (COMPACT_STRINGS) {
        byte[] val = StringLatin1.toBytes(codePoints, offset, count);
        if (val != null) {
            this.coder = LATIN1;
            this.value = val;
            return;
        }
    }
    this.coder = UTF16;
    this.value = StringUTF16.toBytes(codePoints, offset, count);
}

public String(byte bytes[], int offset, int length, String charsetName)
        throws UnsupportedEncodingException {
    if (charsetName == null)
        throw new NullPointerException("charsetName");
    checkBoundsOffCount(offset, length, bytes.length);
    StringCoding.Result ret =
        StringCoding.decode(charsetName, bytes, offset, length);
    this.value = ret.value;
    this.coder = ret.coder;
}

public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBoundsOffCount(offset, length, bytes.length);
    StringCoding.Result ret =
        StringCoding.decode(charset, bytes, offset, length);
    this.value = ret.value;
    this.coder = ret.coder;
}

public String(byte bytes[], String charsetName)
        throws UnsupportedEncodingException {
    this(bytes, 0, bytes.length, charsetName);
}

public String(byte bytes[], Charset charset) {
    this(bytes, 0, bytes.length, charset);
}

public String(byte bytes[], int offset, int length) {
    checkBoundsOffCount(offset, length, bytes.length);
    StringCoding.Result ret = StringCoding.decode(bytes, offset, length);
    this.value = ret.value;
    this.coder = ret.coder;
}
  5. Construct a new String from a byte array, a StringBuffer, or a StringBuilder
public String(byte[] bytes) {
    this(bytes, 0, bytes.length);
}

public String(StringBuffer buffer) {
    this(buffer.toString());
}

public String(StringBuilder builder) {
    this(builder, null);
}
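The constructors above can be exercised with a short sketch like the following. The class and method names are hypothetical and exist only for this example. It shows that the copy constructor yields a distinct object with equal content, that the char[] constructor copies its input, and how the sub-range and byte[] plus Charset variants behave.

```java
import java.nio.charset.StandardCharsets;

class StringConstructorDemo {
    // new String(String) yields a distinct object with equal content.
    static boolean copyIsDistinctButEqual(String original) {
        String copy = new String(original);
        return copy != original && copy.equals(original);
    }

    // The char[] constructor copies the array, so later changes to the
    // array are not visible through the String.
    static String fromCharsThenMutate() {
        char[] chars = {'a', 'b', 'c'};
        String s = new String(chars);
        chars[0] = 'x';
        return s;
    }

    // Sub-range constructor: offset 1, count 3.
    static String subRange() {
        return new String(new char[] {'h', 'e', 'l', 'l', 'o'}, 1, 3);
    }

    // Round trip through UTF-8 bytes with an explicit Charset.
    static String fromUtf8Bytes() {
        byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(copyIsDistinctButEqual("abc")); // true
        System.out.println(fromCharsThenMutate());         // abc
        System.out.println(subRange());                    // ell
        System.out.println(fromUtf8Bytes().length());      // 5
    }
}
```

Passing the Charset explicitly avoids depending on the platform default encoding, which the byte[]-only constructor uses.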

Common Methods

  1. Length
public int length() {
    return value.length >> coder();
}
  2. Check whether the string is empty
public boolean isEmpty() {
    return value.length == 0;
}
  3. Get the character at a given index
public char charAt(int index) {
    if (isLatin1()) {
        return StringLatin1.charAt(value, index);
    } else {
        return StringUTF16.charAt(value, index);
    }
}
  4. Test two strings for equality
public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String aString = (String)anObject;
        if (coder() == aString.coder()) {
            return isLatin1() ? StringLatin1.equals(value, aString.value)
                              : StringUTF16.equals(value, aString.value);
        }
    }
    return false;
}
  5. Compare two strings lexicographically
public int compareTo(String anotherString) {
    byte v1[] = value;
    byte v2[] = anotherString.value;
    if (coder() == anotherString.coder()) {
        return isLatin1() ? StringLatin1.compareTo(v1, v2)
                          : StringUTF16.compareTo(v1, v2);
    }
    return isLatin1() ? StringLatin1.compareToUTF16(v1, v2)
                      : StringUTF16.compareToLatin1(v1, v2);
}
  6. Test for a prefix or suffix
public boolean startsWith(String prefix, int toffset) {
    // Note: toffset might be near -1>>>1.
    if (toffset < 0 || toffset > length() - prefix.length()) {
        return false;
    }
    byte ta[] = value;
    byte pa[] = prefix.value;
    int po = 0;
    int pc = pa.length;
    if (coder() == prefix.coder()) {
        int to = isLatin1() ? toffset : toffset << 1;
        while (po < pc) {
            if (ta[to++] != pa[po++]) {
                return false;
            }
        }
    } else {
        if (isLatin1()) {  // && pcoder == UTF16
            return false;
        }
        // coder == UTF16 && pcoder == LATIN1)
        while (po < pc) {
            if (StringUTF16.getChar(ta, toffset++) != (pa[po++] & 0xff)) {
                return false;
            }
        }
    }
    return true;
}

public boolean startsWith(String prefix) {
    return startsWith(prefix, 0);
}

public boolean endsWith(String suffix) {
    return startsWith(suffix, length() - suffix.length());
}
  7. Find the index of a character or substring
public int indexOf(int ch) {
    return indexOf(ch, 0);
}

public int indexOf(String str) {
    if (coder() == str.coder()) {
        return isLatin1() ? StringLatin1.indexOf(value, str.value)
                          : StringUTF16.indexOf(value, str.value);
    }
    if (coder() == LATIN1) {  // str.coder == UTF16
        return -1;
    }
    return StringUTF16.indexOfLatin1(value, str.value);
}

public int indexOf(String str, int fromIndex) {
    return indexOf(value, coder(), length(), str, fromIndex);
}

static int indexOf(byte[] src, byte srcCoder, int srcCount,
                   String tgtStr, int fromIndex) {
    byte[] tgt = tgtStr.value;
    byte tgtCoder = tgtStr.coder();
    int tgtCount = tgtStr.length();

    if (fromIndex >= srcCount) {
        return (tgtCount == 0 ? srcCount : -1);
    }
    if (fromIndex < 0) {
        fromIndex = 0;
    }
    if (tgtCount == 0) {
        return fromIndex;
    }
    if (tgtCount > srcCount) {
        return -1;
    }
    if (srcCoder == tgtCoder) {
        return srcCoder == LATIN1
            ? StringLatin1.indexOf(src, srcCount, tgt, tgtCount, fromIndex)
            : StringUTF16.indexOf(src, srcCount, tgt, tgtCount, fromIndex);
    }
    if (srcCoder == LATIN1) {    //  && tgtCoder == UTF16
        return -1;
    }
    // srcCoder == UTF16 && tgtCoder == LATIN1) {
    return StringUTF16.indexOfLatin1(src, srcCount, tgt, tgtCount, fromIndex);
}
  8. Extract a substring
public String substring(int beginIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    int subLen = length() - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    if (beginIndex == 0) {
        return this;
    }
    return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
                      : StringUTF16.newString(value, beginIndex, subLen);
}

public String substring(int beginIndex, int endIndex) {
    int length = length();
    checkBoundsBeginEnd(beginIndex, endIndex, length);
    int subLen = endIndex - beginIndex;
    if (beginIndex == 0 && endIndex == length) {
        return this;
    }
    return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
                      : StringUTF16.newString(value, beginIndex, subLen);
}
  9. Replace specified content in a String. Because String is immutable, this "replacement" actually builds and returns a new string rather than changing the original.
public String replace(char oldChar, char newChar) {
    if (oldChar != newChar) {
        String ret = isLatin1() ? StringLatin1.replace(value, oldChar, newChar)
                                : StringUTF16.replace(value, oldChar, newChar);
        if (ret != null) {
            return ret;
        }
    }
    return this;
}
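A few of the behaviors visible in the listings above can be checked with a small sketch. The class and method names are made up for this example. The identity checks (substring returning this) reflect the implementation shown above and are implementation details rather than documented guarantees.

```java
class StringMethodsDemo {
    // length() counts UTF-16 code units: one supplementary character
    // (U+1F600, written here as a surrogate pair) has length 2.
    static int surrogatePairLength() {
        return "\uD83D\uDE00".length();
    }

    // Empty target with fromIndex past the end: the helper returns srcCount.
    static int emptyTargetPastEnd() {
        return "hello".indexOf("", 10);
    }

    // In the implementation listed above, a whole-range substring returns this.
    static boolean wholeRangeSubstringIsSameObject(String s) {
        return s.substring(0) == s && s.substring(0, s.length()) == s;
    }

    // replace() builds a new String and leaves the receiver untouched.
    static String[] replaceKeepsOriginal() {
        String s = "hello";
        String r = s.replace('l', 'L');
        return new String[] {s, r};
    }

    public static void main(String[] args) {
        System.out.println(surrogatePairLength());                     // 2
        System.out.println(emptyTargetPastEnd());                      // 5
        System.out.println("hello".indexOf("l", -3));                  // 2
        System.out.println("hello".startsWith("ll", 2));               // true
        System.out.println("hello".startsWith("he", -1));              // false
        System.out.println("hello".endsWith("llo"));                   // true
        System.out.println("apple".compareTo("banana") < 0);           // true
        System.out.println(wholeRangeSubstringIsSameObject("hello"));  // true
        System.out.println(String.join(" ", replaceKeepsOriginal()));  // hello heLLo
    }
}
```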

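The AbstractStringBuilder Class

The source shows that StringBuilder and StringBuffer both extend the parent class AbstractStringBuilder, so we analyze the parent's structure and main functionality first.

Main Structure

Like String, it stores the string content in a byte array. It adds a count field that records how many characters are actually in use, since the array's capacity may be larger.

The other difference from String is that AbstractStringBuilder is mutable: neither value nor count is final.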

abstract class AbstractStringBuilder implements Appendable, CharSequence {
    byte[] value;
    byte coder;
    int count;

    ...

}

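Constructors

The no-argument constructor starts from an empty array. The parameterized constructor takes a capacity (in characters) and allocates a backing array of that size.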

AbstractStringBuilder() {
    value = EMPTYVALUE;
}

AbstractStringBuilder(int capacity) {
    if (COMPACT_STRINGS) {
        value = new byte[capacity];
        coder = LATIN1;
    } else {
        value = StringUTF16.newBytesFor(capacity);
        coder = UTF16;
    }
}

Common Methods

  1. Compare two builders
int compareTo(AbstractStringBuilder another) {
    if (this == another) {
        return 0;
    }

    byte val1[] = value;
    byte val2[] = another.value;
    int count1 = this.count;
    int count2 = another.count;

    if (coder == another.coder) {
        return isLatin1() ? StringLatin1.compareTo(val1, val2, count1, count2)
                          : StringUTF16.compareTo(val1, val2, count1, count2);
    }
    return isLatin1() ? StringLatin1.compareToUTF16(val1, val2, count1, count2)
                      : StringUTF16.compareToLatin1(val1, val2, count1, count2);
}
  2. Get the length. Since count records the length, the method simply returns it.
public int length() {
    return count;
}
  3. Because the string is mutable, the source provides methods for managing its capacity and length
// Ensures the capacity is at least the given minimum
public void ensureCapacity(int minimumCapacity) {
    if (minimumCapacity > 0) {
        ensureCapacityInternal(minimumCapacity);
    }
}
// Computes the enlarged capacity: twice the old capacity plus 2,
// or minCapacity if that is larger
private int newCapacity(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = value.length >> coder;
    int newCapacity = (oldCapacity << 1) + 2;
    if (newCapacity - minCapacity < 0) {
        newCapacity = minCapacity;
    }
    int SAFE_BOUND = MAX_ARRAY_SIZE >> coder;
    return (newCapacity <= 0 || SAFE_BOUND - newCapacity < 0)
        ? hugeCapacity(minCapacity)
        : newCapacity;
}
// Shrinks the backing array to the space actually in use
public void trimToSize() {
    int length = count << coder;
    if (length < value.length) {
        value = Arrays.copyOf(value, length);
    }
}
// Sets the length: truncates when shrinking, pads with '\0' when growing
public void setLength(int newLength) {
    if (newLength < 0) {
        throw new StringIndexOutOfBoundsException(newLength);
    }
    ensureCapacityInternal(newLength);
    if (count < newLength) {
        if (isLatin1()) {
            StringLatin1.fillNull(value, count, newLength);
        } else {
            StringUTF16.fillNull(value, count, newLength);
        }
    }
    count = newLength;
}
  4. Return the character at a given index
public char charAt(int index) {
    checkIndex(index, count);
    if (isLatin1()) {
        return (char)(value[index] & 0xff);
    }
    return StringUTF16.charAt(value, index);
}
  5. Change the character at a given index
public void setCharAt(int index, char ch) {
    checkIndex(index, count);
    if (isLatin1() && StringLatin1.canEncode(ch)) {
        value[index] = (byte)ch;
    } else {
        if (isLatin1()) {
            inflate();
        }
        StringUTF16.putCharSB(value, index, ch);
    }
}
  6. Append a string to the end of the current content
public AbstractStringBuilder append(String str) {
    if (str == null) {
        return appendNull();
    }
    int len = str.length();
    ensureCapacityInternal(count + len);
    putStringAt(count, str);
    count += len;
    return this;
}

// Overloads for other argument types (signatures only)
public AbstractStringBuilder append(Object obj)
public AbstractStringBuilder append(StringBuffer sb)
AbstractStringBuilder append(AbstractStringBuilder asb)
public AbstractStringBuilder append(CharSequence s)
public AbstractStringBuilder append(CharSequence s, int start, int end)
public AbstractStringBuilder append(char[] str)
public AbstractStringBuilder append(char str[], int offset, int len)
public AbstractStringBuilder append(boolean b)
public AbstractStringBuilder append(char c)
public AbstractStringBuilder append(int i)
public AbstractStringBuilder append(long l)
public AbstractStringBuilder append(float f)
public AbstractStringBuilder append(double d)
  7. Replace. Unlike String.replace, this does not return a new string; it modifies the existing content in place and returns the same builder.
public AbstractStringBuilder replace(int start, int end, String str) {
    int count = this.count;
    if (end > count) {
        end = count;
    }
    checkRangeSIOOBE(start, end, count);
    int len = str.length();
    int newCount = count + len - (end - start);
    ensureCapacityInternal(newCount);
    shift(end, newCount - count);
    this.count = newCount;
    putStringAt(start, str);
    return this;
}
  8. Insert / delete (only insert is listed here)
public AbstractStringBuilder insert(int index, char[] str, int offset,
                                    int len)
{
    checkOffset(index, count);
    checkRangeSIOOBE(offset, offset + len, str.length);
    ensureCapacityInternal(count + len);
    shift(index, len);
    count += len;
    putCharsAt(index, str, offset, offset + len);
    return this;
}
// Other overloads (signatures only)
public AbstractStringBuilder insert(int offset, Object obj)
public AbstractStringBuilder insert(int offset, String str)
public AbstractStringBuilder insert(int offset, char[] str)
public AbstractStringBuilder insert(int dstOffset, CharSequence s)
public AbstractStringBuilder insert(int dstOffset, CharSequence s, int start, int end)
public AbstractStringBuilder insert(int offset, boolean b)
public AbstractStringBuilder insert(int offset, char c)
public AbstractStringBuilder insert(int offset, int i)
public AbstractStringBuilder insert(int offset, long l)
public AbstractStringBuilder insert(int offset, float f)
public AbstractStringBuilder insert(int offset, double d)
  9. Return a substring
public String substring(int start) {
    return substring(start, count);
}

public String substring(int start, int end) {
    checkRangeSIOOBE(start, end, count);
    if (isLatin1()) {
        return StringLatin1.newString(value, start, end - start);
    }
    return StringUTF16.newString(value, start, end - start);
}
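The in-place behavior of these methods can be observed through StringBuilder, the concrete subclass. The following is a minimal sketch with hypothetical class and method names. It applies several edits to one builder and checks that replace returns the builder itself rather than a copy.

```java
class BuilderMutationDemo {
    // Applies a series of in-place edits and returns the final content.
    static String editInPlace() {
        StringBuilder sb = new StringBuilder("hello world");
        StringBuilder same = sb.replace(0, 5, "HELLO"); // "HELLO world"
        sb.insert(5, ",");                              // "HELLO, world"
        sb.setCharAt(7, 'W');                           // "HELLO, World"
        sb.setLength(5);                                // "HELLO"
        // replace() returned the builder itself, not a copy.
        return same == sb ? sb.toString() : "unexpected copy";
    }

    // Growing the length pads the new positions with '\0' characters.
    static char paddingChar() {
        StringBuilder sb = new StringBuilder("ab");
        sb.setLength(4);
        return sb.charAt(3);
    }

    // substring() hands back an independent, immutable String.
    static String snapshotSurvivesEdit() {
        StringBuilder sb = new StringBuilder("hello");
        String snapshot = sb.substring(1, 4); // "ell"
        sb.setCharAt(1, 'a');                 // builder is now "hallo"
        return snapshot;
    }

    public static void main(String[] args) {
        System.out.println(editInPlace());           // HELLO
        System.out.println((int) paddingChar());     // 0
        System.out.println(snapshotSurvivesEdit());  // ell
    }
}
```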

The StringBuilder Class

Main Structure

StringBuilder extends AbstractStringBuilder and, unlike String, is mutable. Its main structure is as follows:

public final class StringBuilder
    extends AbstractStringBuilder
    implements java.io.Serializable, Comparable<StringBuilder>, CharSequence
{
    static final long serialVersionUID = 4383685877147921099L;
    ...
}

Constructors

Most StringBuilder constructors simply delegate to the parent class.

public StringBuilder() {
    super(16);
}

public StringBuilder(int capacity) {
    super(capacity);
}

public StringBuilder(String str) {
    super(str.length() + 16);
    append(str);
}

public StringBuilder(CharSequence seq) {
    this(seq.length() + 16);
    append(seq);
}
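As a rough check of the initial capacities set by these constructors, and of the "double plus 2" growth seen in newCapacity earlier, consider the sketch below. The class and method names are invented for this example. The exact growth step is an implementation detail of the JDK source discussed here, so treat the printed values as what that source implies rather than a guarantee.

```java
class BuilderCapacityDemo {
    // Capacity chosen by the no-argument constructor.
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Capacity chosen when the builder is seeded with a String.
    static int capacityFor(String s) {
        return new StringBuilder(s).capacity();
    }

    // Capacity after appending n single characters to a default builder.
    static int capacityAfterAppending(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());          // 16
        System.out.println(capacityFor("hello"));       // 21 (5 + 16)
        System.out.println(capacityAfterAppending(16)); // 16, still fits
        System.out.println(capacityAfterAppending(17)); // 34 = (16 << 1) + 2
    }
}
```

When the final size is known in advance, passing it to StringBuilder(int capacity) avoids the intermediate array copies.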

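Common Methods

Most StringBuilder methods (append, insert, and so on) delegate to the parent implementation, so they are short and easy to follow. Their code is not repeated here.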
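Because these methods return the builder itself, calls can be chained. A short sketch with a hypothetical class name:

```java
class ChainingDemo {
    // Builds "[id=42, ok=true]" with chained calls on a single builder.
    static String build() {
        return new StringBuilder()
                .append("id=").append(42)
                .append(", ok=").append(true)
                .insert(0, '[')
                .append(']')
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(build()); // [id=42, ok=true]
    }
}
```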

The StringBuffer Class

Main Structure

Like StringBuilder, StringBuffer extends AbstractStringBuilder and is mutable.

public final class StringBuffer
    extends AbstractStringBuilder
    implements java.io.Serializable, Comparable<StringBuffer>, CharSequence
{
    private transient String toStringCache;
    static final long serialVersionUID = 3388685877147921107L;
    ...
}

Constructors

StringBuffer's constructors are almost identical in form to StringBuilder's; they mostly delegate to the parent class.

public StringBuffer() {
    super(16);
}

public StringBuffer(int capacity) {
    super(capacity);
}

public StringBuffer(String str) {
    super(str.length() + 16);
    append(str);
}

public StringBuffer(CharSequence seq) {
    this(seq.length() + 16);
    append(seq);
}

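Common Methods

StringBuffer's main methods closely mirror StringBuilder's. The difference is that StringBuffer's public methods are declared synchronized, so only one thread at a time can execute a method on a given StringBuffer instance, which makes each individual call thread-safe. A sequence of several calls is still not atomic unless the caller adds its own locking.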
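The effect of synchronized can be sketched as follows. The class and method names are hypothetical. Several threads append to one shared StringBuffer, and because each append is synchronized on the buffer, no append is lost. Sharing a StringBuilder the same way carries no such guarantee.

```java
class SyncAppendDemo {
    // Appends from several threads into one shared StringBuffer and
    // returns the final length.
    static int concurrentAppendLength(int threads, int appendsPerThread) {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < appendsPerThread; i++) {
                    shared.append('x'); // synchronized on the buffer
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(concurrentAppendLength(4, 10_000)); // 40000
    }
}
```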

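Comparing String, StringBuffer, and StringBuilder

Having read the source, we can summarize the main differences as follows:

String is immutable: its value is fixed once constructed. String literals and interned strings live in the string constant pool, while strings created at run time (for example with new) are ordinary heap objects.

StringBuilder is mutable: its value can be changed through the methods it defines. It is allocated on the heap.

StringBuffer is mutable, and its methods can be executed by only one thread at a time, which makes it thread-safe. It is allocated on the heap.

None of the three can be subclassed.

Because they differ in structure and behavior, they also suit different scenarios.

Every time a String's value is "changed" (by concatenation, replacement, and so on), a new string object is created and the variable is pointed at it; the old string becomes eligible for garbage collection once nothing references it. Operations on StringBuffer and StringBuilder, by contrast, modify the object's own contents. As a result, String is noticeably slower than the other two classes when a string is modified repeatedly, for example inside a loop.

String therefore suits cases with few strings and few modifications, while StringBuffer and StringBuilder suit heavier string manipulation. Of the two, StringBuffer is thread-safe and fits code where a buffer is shared between threads, whereas StringBuilder avoids the synchronization overhead and is more efficient in single-threaded code.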
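The two styles can be compared side by side in a sketch like this one. The class and method names are invented for illustration. Both methods produce the same text, but the first creates a new String on every iteration while the second keeps appending to one buffer. No timings are given here, since they depend on the JVM and the input size.

```java
class ConcatDemo {
    // Repeated concatenation: each += creates a new String object.
    static String joinWithPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // Single mutable buffer: append() modifies the builder in place.
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithPlus(5));    // 01234
        System.out.println(joinWithBuilder(5)); // 01234
    }
}
```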

Exercise

String s1 = "Welcome to Java";
String s2 = new String("Welcome to Java");
String s3 = "Welcome to Java";
System.out.println("s1 == s2 is " + (s1 == s2));
System.out.println("s1 == s3 is " + (s1 == s3));

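Q: Why does s1 == s2 evaluate to false while s1 == s3 evaluates to true?

A: The == operator compares references, not contents. The literal "Welcome to Java" is stored once in the string constant pool, and s1 refers to that pooled object. Creating s2 with new allocates a separate String object on the heap, distinct from the pooled one, so s1 == s2 is false. s3 is assigned the same literal and therefore refers to the same pooled object as s1, so s1 == s3 is true.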
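As a follow-up sketch (the class and method names are hypothetical), equals compares contents, and intern() returns the pooled instance, so the comparison results can be laid out as:

```java
class InternDemo {
    // Returns the four comparison results in order.
    static boolean[] compare() {
        String s1 = "Welcome to Java";
        String s2 = new String("Welcome to Java");
        String s3 = "Welcome to Java";
        return new boolean[] {
            s1 == s2,          // false: s2 is a separate heap object
            s1 == s3,          // true: same pooled literal
            s1.equals(s2),     // true: same contents
            s1 == s2.intern()  // true: intern() returns the pooled object
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(compare()));
        // [false, true, true, true]
    }
}
```

For content comparison, equals is the method to use; == only answers whether two variables refer to the same object.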