Android Baidu Text To Speech

Text To Speech (TTS) converts text into human-like speech. Baidu TTS is a free TTS SDK; in this post we use it to build a TTS app for Android.

Step 1: Create Application

First, register an account with Baidu.
After that, log in to the Baidu Voice Developer Platform and create an application.

Choose speech technology.

Choose create application.

Fill in the application information.

The package name must match the package name of your app.

Step 2: Download SDK and library

Click the download SDK button on the left-hand side, then choose speech synthesis and download the SDK for Android.

Unzip the SDK, then copy the Baidu-TTS-Android-2.3.5.20180713_6101c2a/app/src/main/jniLibs/armeabi folder into your project's jniLibs folder.

Then copy Baidu-TTS-Android-2.3.5.20180713_6101c2a/app/libs/com.baidu.tts_2.3.2.jar into your project's libs folder.

Step 3: Import the JAR

In build.gradle, add the following:

build.gradle
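dependencies {
    // ... other dependencies ...
    // `compile` is deprecated since Android Gradle Plugin 3.0; use `implementation`
    implementation files('libs/com.baidu.tts_2.3.2.jar')
}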

Step 4: Init TTS

Initialize SpeechSynthesizer:

private SpeechSynthesizer mSpeechSynthesizer;

mSpeechSynthesizer = SpeechSynthesizer.getInstance();
mSpeechSynthesizer.setContext(this);

Set up the TTS listener:

import com.baidu.tts.client.SpeechError;
import com.baidu.tts.client.SpeechSynthesizer;
import com.baidu.tts.client.SpeechSynthesizerListener;

public class MainActivity extends AppCompatActivity implements SpeechSynthesizerListener {

    private void initTTS() {
        // ... SpeechSynthesizer setup from above ...
        mSpeechSynthesizer.setSpeechSynthesizerListener(this);
    }

    @Override
    public void onSynthesizeStart(String utteranceId) { }

    @Override
    public void onSynthesizeDataArrived(String utteranceId, byte[] data, int progress) { }

    @Override
    public void onSynthesizeFinish(String utteranceId) { }

    @Override
    public void onSpeechStart(String utteranceId) { }

    @Override
    public void onSpeechProgressChanged(String utteranceId, int progress) { }

    @Override
    public void onSpeechFinish(String utteranceId) { }

    @Override
    public void onError(String utteranceId, SpeechError speechError) { }
}

Step 5: Set AppId, AppKey and AppSecretKey

After you create the application on the Baidu website, you will get its AppId, AppKey and AppSecretKey.

int result = mSpeechSynthesizer.setAppId(appId);
result = mSpeechSynthesizer.setApiKey(appKey, secretKey);
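The snippet above assumes appId, appKey and secretKey are already defined somewhere. One option is to keep them in a .properties file outside version control rather than in source. The sketch below shows that idea in plain Java; the class name TtsCredentials and the key names are hypothetical, and on Android the Reader could come from an asset opened with getAssets().open(...) wrapped in an InputStreamReader. Note that this only keeps the keys out of the repository, not out of the APK.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical holder for the three values from the Baidu console. */
final class TtsCredentials {
    final String appId;
    final String appKey;
    final String secretKey;

    private TtsCredentials(String appId, String appKey, String secretKey) {
        this.appId = appId;
        this.appKey = appKey;
        this.secretKey = secretKey;
    }

    /** Parses appId, appKey and secretKey from .properties text; fails fast if one is missing. */
    static TtsCredentials load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new TtsCredentials(
                require(props, "appId"),
                require(props, "appKey"),
                require(props, "secretKey"));
    }

    /** Returns the trimmed value, or throws if the key is absent or blank. */
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing TTS credential: " + key);
        }
        return value.trim();
    }
}
```

The values then feed the same calls as before, e.g. mSpeechSynthesizer.setAppId(creds.appId) and mSpeechSynthesizer.setApiKey(creds.appKey, creds.secretKey). Failing fast on a missing key gives a clearer error than a later authorization failure.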

Step 6: Verify Authorization and Download the Authorization File

TtsMode.ONLINE: pure online synthesis; the authorization file is downloaded automatically.
TtsMode.MIX: mixed offline/online synthesis, with online preferred.

private boolean checkAuth() {
    // ttsMode: the mode you will pass to initTts(), e.g. TtsMode.MIX
    AuthInfo authInfo = mSpeechSynthesizer.auth(ttsMode);
    if (!authInfo.isSuccess()) {
        String errorMsg = authInfo.getTtsError().getDetailMessage();
        Log.e(TAG, "checkAuth failed: " + errorMsg);
        return false;
    } else {
        Log.i(TAG, "checkAuth success!");
        return true;
    }
}

Step 7: Import TTS Model

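Copy the contents of Baidu-TTS-Android-2.3.5.20180713_6101c2a/app/src/main/assets into your_project/src/main/assets.

Before initializing TTS, copy the model files from assets to device storage. Despite the method name copyModelFileToSD(), the code below writes them to the app's internal files directory (getFilesDir()), not to the SD card.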

// Imports needed by this snippet:
// import java.io.File;
// import java.io.FileOutputStream;
// import java.io.IOException;
// import java.io.InputStream;

private String MODEL_FILENAME;
private String TEXT_FILENAME;
private static final String SPEECH_FEMALE_MODEL_NAME = "bd_etts_common_speech_f7_mand_eng_high_am-mix_v3.0.0_20170512.dat";
private static final String TEXT_MODEL_NAME = "bd_etts_text.dat";

@Override
protected void onResume() {
    super.onResume();
    copyModelFileToSD();
    initTTS();
}

private void copyModelFileToSD() {
    // Despite the name, this uses the app's internal files directory, not the SD card.
    String folder = getFilesDir().getAbsolutePath();
    // Each path now carries the name of the model it holds
    TEXT_FILENAME = folder + "/" + TEXT_MODEL_NAME;
    MODEL_FILENAME = folder + "/" + SPEECH_FEMALE_MODEL_NAME;

    copyAssetIfMissing(TEXT_MODEL_NAME, new File(TEXT_FILENAME));
    copyAssetIfMissing(SPEECH_FEMALE_MODEL_NAME, new File(MODEL_FILENAME));
}

private void copyAssetIfMissing(String assetName, File target) {
    if (target.exists()) {
        return; // copied on an earlier run
    }
    // try-with-resources closes both streams, even if the copy fails
    try (InputStream is = getAssets().open(assetName);
         FileOutputStream fos = new FileOutputStream(target)) {
        copyFile(is, fos);
    } catch (IOException e) {
        Log.e(TAG, "Error copying " + assetName + ": " + e);
        target.delete(); // remove the partial file so the next run retries
    }
}

private void copyFile(InputStream is, FileOutputStream fos) throws IOException {
    byte[] buffer = new byte[2048];
    int byteCount;
    while ((byteCount = is.read(buffer)) != -1) {
        fos.write(buffer, 0, byteCount);
    }
    fos.flush();
}
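A model file that exists but was only partly written would be skipped on every later run by a plain exists() check. One way to avoid that is to copy into a temporary file and rename it only once the copy has finished. The sketch below shows the technique with java.nio.file, which on Android needs API level 26 or higher; the class name ModelFileCopier and the ".tmp" suffix are hypothetical choices, not part of the Baidu SDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: copy a stream to a file only if the file is not already there. */
final class ModelFileCopier {
    /**
     * Copies source to target unless target already exists. Data goes to a
     * ".tmp" sibling first and is renamed afterwards, so an interrupted copy
     * leaves no truncated file at the final path. Returns true if it copied.
     */
    static boolean copyIfMissing(InputStream source, Path target) throws IOException {
        try (InputStream in = source) { // always close the caller's stream
            if (Files.exists(target)) {
                return false;
            }
            Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            return true;
        }
    }
}
```

In the activity, the call would look like ModelFileCopier.copyIfMissing(getAssets().open(TEXT_MODEL_NAME), new File(TEXT_FILENAME).toPath()).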

After the authorization check passes, set up the parameters for the speech model.

private void setupParam() {
    // Offline model files copied from assets in copyModelFileToSD()
    mSpeechSynthesizer.setParam(SpeechSynthesizer.PARAM_TTS_TEXT_MODEL_FILE, TEXT_FILENAME);
    mSpeechSynthesizer.setParam(SpeechSynthesizer.PARAM_TTS_SPEECH_MODEL_FILE, MODEL_FILENAME);

    // Speaker: 0 female, 1 male, 2 special male, 3 emotional male, 4 child
    mSpeechSynthesizer.setParam(SpeechSynthesizer.PARAM_SPEAKER, "0");
    // Volume: 0 ~ 9
    mSpeechSynthesizer.setParam(SpeechSynthesizer.PARAM_VOLUME, "9");
    // Speed: 0 ~ 9
    mSpeechSynthesizer.setParam(SpeechSynthesizer.PARAM_SPEED, "4");
    // Pitch: 0 ~ 9
    mSpeechSynthesizer.setParam(SpeechSynthesizer.PARAM_PITCH, "4");
    // Online/offline switching strategy in MIX mode
    mSpeechSynthesizer.setParam(SpeechSynthesizer.PARAM_MIX_MODE, SpeechSynthesizer.MIX_MODE_DEFAULT);
}
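setParam takes every value as a String, so an out-of-range number such as a volume of 12 compiles without complaint. A small validation step can catch that early. The sketch below uses the ranges from the comments above (0 to 9 for volume, speed and pitch; 0 to 4 for the speaker); the class TtsLevels and its methods are hypothetical.

```java
/** Hypothetical helper: range-check TTS levels before they become String params. */
final class TtsLevels {
    /** Volume, speed and pitch are documented above as 0 ~ 9. */
    static String level(String name, int value) {
        if (value < 0 || value > 9) {
            throw new IllegalArgumentException(name + " must be in 0..9, got " + value);
        }
        return Integer.toString(value);
    }

    /** Speaker ids listed above: 0 female, 1 male, 2 special male, 3 emotional male, 4 child. */
    static String speaker(int id) {
        if (id < 0 || id > 4) {
            throw new IllegalArgumentException("speaker must be in 0..4, got " + id);
        }
        return Integer.toString(id);
    }
}
```

Usage inside setupParam() would be, for example, mSpeechSynthesizer.setParam(SpeechSynthesizer.PARAM_VOLUME, TtsLevels.level("volume", 9)).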

MIX_MODE_DEFAULT: on Wi-Fi, synthesis runs online; otherwise it runs offline. If an online request takes more than 6 seconds, the SDK switches to offline synthesis automatically.

MIX_MODE_HIGH_SPEED_SYNTHESIZE_WIFI: the same as MIX_MODE_DEFAULT, but an online request that takes more than 1.2 seconds switches to offline synthesis automatically.

MIX_MODE_HIGH_SPEED_NETWORK: online synthesis is used on 3G/4G as well as Wi-Fi; an online request that takes more than 1.2 seconds switches to offline synthesis automatically.
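To make the three strategies easier to compare, here is a plain-Java model of the rules as described above: which networks start with an online request, and how long that request may take before the SDK falls back. This is only an illustration; the enum MixModeSketch and its Network states are hypothetical, and in the real SDK the decision is made internally once PARAM_MIX_MODE is set.

```java
/** Hypothetical model of the MIX_MODE_* rules described above; this is not SDK code. */
enum MixModeSketch {
    DEFAULT(6000, false),
    HIGH_SPEED_SYNTHESIZE_WIFI(1200, false),
    HIGH_SPEED_NETWORK(1200, true);

    /** Simplified network states for the illustration. */
    enum Network { WIFI, MOBILE_3G_4G, OTHER }

    final long onlineTimeoutMs;   // online request budget before falling back
    final boolean onlineOnMobile; // whether 3G/4G also starts with an online request

    MixModeSketch(long onlineTimeoutMs, boolean onlineOnMobile) {
        this.onlineTimeoutMs = onlineTimeoutMs;
        this.onlineOnMobile = onlineOnMobile;
    }

    /** True if a request on this network starts online. */
    boolean tryOnlineFirst(Network network) {
        return network == Network.WIFI
                || (onlineOnMobile && network == Network.MOBILE_3G_4G);
    }

    /** True if an online request that has run for elapsedMs should fall back. */
    boolean shouldFallBack(long elapsedMs) {
        return elapsedMs > onlineTimeoutMs;
    }
}
```

Reading the table this way shows the trade-off: the two high-speed modes give up on slow online requests sooner, and only MIX_MODE_HIGH_SPEED_NETWORK also tries online synthesis over mobile data.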

The complete initialization flow is as follows:

private void initTTS() {
    mSpeechSynthesizer = SpeechSynthesizer.getInstance();
    mSpeechSynthesizer.setContext(this);
    mSpeechSynthesizer.setSpeechSynthesizerListener(this);

    int result = mSpeechSynthesizer.setAppId(appId);
    result = mSpeechSynthesizer.setApiKey(appKey, secretKey);
    // checkAuth() uses ttsMode; keep it the same as the mode passed to initTts() below
    if (!checkAuth()) {
        return;
    }

    setupParam();
    result = mSpeechSynthesizer.loadModel(TEXT_FILENAME, MODEL_FILENAME);
    result = mSpeechSynthesizer.initTts(TtsMode.MIX);
    if (result != 0) {
        Log.e(TAG, "init failed, error code: " + result);
    } else {
        Log.i(TAG, "init success");
    }
}
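Because initTTS() is called from onResume(), it runs again every time the activity returns to the foreground. A run-once guard avoids repeating the setup. The sketch below is plain Java; the class InitOnce is hypothetical, and you would call reset() wherever you release the synthesizer (for example in onDestroy) so a later start initializes it again.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical run-once guard for expensive setup such as initTTS(). */
final class InitOnce {
    private final AtomicBoolean done = new AtomicBoolean(false);

    /** Runs the action only the first time; returns true if it ran on this call. */
    boolean run(Runnable action) {
        if (!done.compareAndSet(false, true)) {
            return false; // already initialized
        }
        try {
            action.run();
            return true;
        } catch (RuntimeException e) {
            done.set(false); // allow a retry if the setup threw
            throw e;
        }
    }

    /** Call after releasing the resource so the next run() initializes again. */
    void reset() {
        done.set(false);
    }
}
```

In the activity, onResume() would then call initOnce.run(() -> { copyModelFileToSD(); initTTS(); }). Note that initTTS() returns normally when checkAuth() fails, so the guard treats that case as done; call reset() there if you want a retry on the next resume.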

Step 8: Speech

To synthesize and play text immediately, use the speak API.
The synthesize API only synthesizes the text; the audio data arrives through the onSynthesizeDataArrived callback, and playing it is up to you.

String text = "test baidu TTS";
mSpeechSynthesizer.speak(text);
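If you call synthesize() instead of speak(), the audio comes back in chunks through the listener rather than being played for you. A minimal sketch, assuming (as the listener signature above suggests) that each onSynthesizeDataArrived call carries one chunk for the utterance id in its first argument and that onSynthesizeFinish marks the end. The class SynthesisCollector is hypothetical; it stores raw bytes because the audio format is not covered here, and it synchronizes because the callbacks may not arrive on the UI thread.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical collector: forward onSynthesizeDataArrived / onSynthesizeFinish here. */
final class SynthesisCollector {
    /** Receives the complete audio for one utterance. */
    interface AudioReady {
        void accept(String utteranceId, byte[] audio);
    }

    private final Map<String, ByteArrayOutputStream> pending = new HashMap<>();
    private final AudioReady onComplete;

    SynthesisCollector(AudioReady onComplete) {
        this.onComplete = onComplete;
    }

    /** Call from onSynthesizeDataArrived(utteranceId, data, progress). */
    synchronized void onData(String utteranceId, byte[] chunk) {
        ByteArrayOutputStream buf = pending.get(utteranceId);
        if (buf == null) {
            buf = new ByteArrayOutputStream();
            pending.put(utteranceId, buf);
        }
        buf.write(chunk, 0, chunk.length);
    }

    /** Call from onSynthesizeFinish(utteranceId); hands over everything received. */
    void onFinish(String utteranceId) {
        ByteArrayOutputStream buf;
        synchronized (this) {
            buf = pending.remove(utteranceId);
        }
        onComplete.accept(utteranceId, buf == null ? new byte[0] : buf.toByteArray());
    }
}
```

The two listener methods in MainActivity would then just forward to collector.onData(utteranceId, data) and collector.onFinish(utteranceId).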
Author

Nick Lin

Posted on

2019-04-12

Updated on

2023-01-18
