Turian, J.; Ratinov, L.; and Bengio, Y. 2010. Word representations: a simple and general method for semi-supervised learning[C]. In ACL2010.
 Melamud O, Levy O, Dagan I, et al. A simple word embedding model for lexical substitution[C].Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing. 2015: 1-7.
 Palangi H, Deng L, Shen Y, et al. Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2016, 24(4): 694-707.
 Yoshua Bengio, Réjean Ducharme, and Pascal Vincent. A neural probabilistic language model[C]. In NIPS2001.
 Yoshua Bengio, Réjean Ducharme, et al.. A Neural Probabilistic Language Model[J]. The Journal of Machine Learning Research, 3:1137–1155, 2003.
 Tomas Mikolov, Ilya Sutskever, et al. Distributed representations of words and phrases and their compositionality[C]. In NIPS2013.
 Matthew E. Peters, Mark Neumann, et al. Deep contextualized word representations[C]. In NAACL2018.
 Gottlob Frege. On Sense and Reference. Function - Term - Meaning, 1892.
 Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. A convolutional neural network for modelling sentences[C]. In ACL2014.
 Yoon Kim. Convolutional neural networks for sentence classification[C]. In EMNLP2014.
 Richard Socher, Eric H Huang, et al.. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection[C]. In NIPS2011.
 Richard Socher, Alex Perelygin, et al. Recursive deep models for semantic compositionality over a sentiment treebank[C]. In EMNLP13.
 Ilya Sutskever, Oriol Vinyals, Quoc V. Le, Sequence to Sequence Learning with Neural Networks[C] In NIPS2014.
 Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. Neural Machine Translation by Jointly Learning to Align and Translate[C]. In ICLR2015.
 Ashish Vaswani, Noam Shazeer, et al. Attention is All You Need[C]. In NIPS2017.
 Alec Radford, Karthik Narasimhan, et al. Improving Language Understanding by Generative Pre-Training.
 Jacob Devlin, Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805, 2018.
 Hanqing Tao, Shiwei Tong, et al. A Radical-aware Attention-based Model for Chinese Text Classification[C], In AAAI2019.
 Kun Zhang, Guangyi Lv, et al. Image-Enhanced Multi-Level Sentence Representation Net for Natural Language Inference[C], In ICDM2018.
 Guangyi Lv, Tong Xu, et al. Reading the Videos: Temporal Labeling for Crowdsourced Time-Sync Videos based on Semantic Embedding[C], In AAAI2016.