...ictionary=DEFAULT_DICT) creates a new custom tokenizer, so you can use different dictionaries at the same time. jieba.dt is the default tokenizer, and all global segmentation functions are mapped to it.

Code example

    # encoding=utf-8
    import jieba

    jieba.enable_paddle()  # enable paddle mode (supported since version 0.40, not in earlier versions)
    strs = ["我来到北京清华大学", "乒乓球拍卖完了", "中国科学技术大学"]
    for s in strs:
        seg_list = jieba.cut(s, use_paddle=True)  # use paddle mode
        print("Paddle Mode: " + '/'.join(list(seg_list)))

    seg_list = jieba.cut("我来到北京清华大学", cut_all=True)
    print("Full Mode: " + "/ ".join(seg_list))  # full mode

    seg_list = jieba.cut("我来到北京清华大学", cut_all=False)
    print("Default Mode: " + "/ ".join(seg_list))  # accurate mode

    seg_list = jieba.cut("他来到了网易杭研大厦")  # accurate mode is the default
    print(", ".join(seg_list))

    seg_list = jieba.cut_for_search("小明硕士毕业于中国科学院计算所，后在日本京都大学深造")  # search engine mode
    print(", ".join(seg_list))

Output:

    [Full Mode]: 我/ 来到/ 北京/ 清华/ 清华大学/ 华大/ 大学
    [Accurate Mode]: 我/ 来到/ 北京/ 清华大学
    [New-word recognition]: 他, 来到, 了, 网易, 杭研, 大厦
    [Search Engine Mode]: 小明, 硕士, 毕业, 于, 中国, 科学, 学院, 科学院, 中国科学院, 计算, 计算所, 后, 在, 日本, 京都, 大学, 日本京都大学, 深造

(Here "杭研" is not in the dictionary, but the Viterbi algorithm still recognizes it.)

Adding a custom dictionary

Loading a dictionary
Developers can specify their own custom dictionary to cover words that are missing from the jieba vocabulary. Although jieba can recognize new words on its own, adding them yourself guarantees higher accuracy.
Usage: jieba.load_userdict(file_name), where file_name is a file-like object or the path of the custom dictionary.
The dictionary format is the same as dict.txt: one word per line, and each line has three parts separated by spaces, in this fixed order: the word, its frequency (optional), and its part of speech (optional). If file_name is a path or a file opened in binary mode, the file must be UTF-8 encoded.
When the frequency is omitted, jieba uses an automatically computed frequency that guarantees the word can be segmented out.
For example:

    创新办 3 i
    云计算 5
    凱特琳 nz
    台中

Changing the tmp_dir and cache_file attributes of a tokenizer (jieba.dt by default) lets you set the directory and the file name of the cache file, which is useful on restricted file systems.
Example:
Custom dictionary: https://github.com/fxsjy/jieba/blob/master/test/userdict.txt
Usage example: https://github.com/fxsjy/jieba/blob/master/test/test_userdict.py
Before: 李小福 / 是 / 创新 / 办 / 主任 / 也 / 是 / 云 / 计算 / 方面 / 的 / 专家 /
After loading the custom dictionary: 李小福 / 是 / 创新办 / 主任 / 也 / 是 / 云计算 / 方面 / 的 / 专家 /

Adjusting the dictionary
Use add_word(word, freq=None, tag=None) and del_word(word) to modify the dictionary dynamically in a program.
Use suggest_freq(segment, tune=True) to adjust the frequency of a single word so that it can (or cannot) be segmented out.
Note: automatically computed frequencies may not take effect when HMM new-word discovery is enabled.
Code example:

    >>> print('/'.join(jieba.cut('如果放到post中将出错。', HMM=False)))
    如果/放到/post/中将/出错/。
    >>> jieba.suggest_freq(('中', '将'), True)
    494
    >>> print('/'.join(jieba.cut('如果放到post中将出错。', HMM=False)))
    如果/放到/post/中/将/出错/。
    >>> print('/'.join(jieba.cut('「台中」正确应该不会被切开', HMM=False)))
    「/台/中/」/正确/应该/不会/被/切开
    >>> jieba.suggest_freq('台中', True)
    69
    >>> print('/'.join(jieba.cut('「台中」正确应该不会被切开', HMM=False)))
    「/台中/」/正确/应该/不会/被/切开

"Improving ambiguity correction with a user-defined dictionary": https://github.com/fxsjy/jieba/issues/14

Keyword extraction

Keyword extraction based on TF-IDF

    import jieba.analyse

jieba.analyse.extract_tags(sentence, topK=20, withWeight=False, allowPOS=())
sentence: the text to extract keywords from
topK: how many keywords with the highest TF/IDF weights to return; defaults to 20
withWeight: whether to also return each keyword's weight; defaults to False
allowPOS: only include words with the given parts of speech; empty by default, meaning no filtering
jieba.analyse.TFIDF(idf_path=None) creates a new TFIDF instance; idf_path is the IDF frequency file.
Code example (keyword extraction): https://github.com/fxsjy/jieba/blob/master/test/extract_tags.py

The inverse document frequency (IDF) corpus used for keyword extraction can be switched to a custom corpus.
Usage: jieba.analyse.set_idf_path(file_name), where file_name is the path of the custom corpus
Custom corpus example: https://github.com/fxsjy/jieba/blob/master/extra_dict/idf.txt.big
Usage example: https://github.com/fxsjy/jieba/blob/master/test/extract_tags_idfpath.py

The stop-word corpus used for keyword extraction can also be switched to a custom corpus.
Usage: jieba.analyse.set_stop_words(file_name), where file_name is the path of the custom corpus
Custom corpus example: https://github.com/fxsjy/jieba/blob/master/extra_dict/stop_words.txt
Usage example: https://github.com/fxsjy/jieba/blob/master/test/extract_tags_stop_words.py

Returning keyword weights together with the keywords, usage example: https://github.com/fxsjy/jieba/blob/master/test/extract_tags_with_weight.py

Keyword extraction based on TextRank
jieba.analyse.textrank(sentence, topK=20, withWeight=False, allowPOS=('ns', 'n', 'vn', 'v')) can be used directly with the same interface; note that it filters by part of speech by default.
jieba.analyse.TextRank() creates a new custom TextRank instance.
Paper: TextRank: Bringing Order into Texts
Basic idea:
1. Segment the text from which keywords are to be extracted.
2. Within a fixed-size window (5 by default, adjustable through the span attribute), use the co-occurrence relations between words to build a graph.
3. Compute the PageRank of the nodes in the graph. Note that the graph is undirected and weighted.
Usage example: see test/demo.py

Part-of-speech tagging
jieba.posseg.POSTokenizer(tokenizer=None) creates a new custom tokenizer; the tokenizer parameter specifies the jieba.Tokenizer used internally. jieba.posseg.dt is the default POS-tagging tokenizer.
It tags the part of speech of each word after the sentence is segmented, using notation compatible with ictclas.
Besides jieba's default segmentation mode, POS tagging is also available in paddle mode. Paddle mode is loaded lazily: enable_paddle() installs paddlepaddle-tiny and imports the related code.
Usage example:

    >>> import jieba
    >>> import jieba.posseg as pseg
    >>> words = pseg.cut("我爱北京天安门")  # jieba default mode
    >>> jieba.enable_paddle()  # enable paddle mode (supported since version 0.40, not in earlier versions)
    >>> words = pseg.cut("我爱北京天安门", use_paddle=True)  # paddle mode
    >>> for word, flag in words:
    ...     print('%s %s' % (word, flag))
    ...
    我 r
    爱 v
    北京 ns
    天安门 ns

Paddle-mode POS tags
The paddle-mode tag set for parts of speech and proper-name categories is listed below: 24 part-of-speech tags (lowercase) and 4 proper-name category tags (uppercase).

    Tag   Meaning               Tag   Meaning               Tag   Meaning               Tag   Meaning
    n     common noun           f     locative noun         s     place noun            t     time
    nr    person name           ns    place name            nt    organization name     nw    work title
    nz    other proper noun     v     common verb           vd    adverbial verb        vn    nominal verb
    a     adjective             ad    adverbial adjective   an    nominal adjective     d     adverb
    m     numeral               q     measure word          r     pronoun               p     preposition
    c     conjunction           u     auxiliary word        xc    other function word   w     punctuation
    PER   person name           LOC   place name            ORG   organization name     TIME  time

Parallel segmentation
Principle: split the target text into lines, distribute the lines among several Python processes that segment them in parallel, then merge the results. This gives a considerable speed-up.
It is based on Python's built-in multiprocessing module and currently does not support Windows.
Usage:
jieba.enable_parallel(4): enable parallel mode; the argument is the number of processes
jieba.disable_parallel(): disable parallel mode
Example: https://github.com/fxsjy/jieba/blob/master/test/parallel/test_file.py
Result: on a 4-core 3.4 GHz Linux machine, accurate-mode segmentation of the complete works of Jin Yong ran at 1 MB/s, 3.3 times the speed of the single-process version.
Note: parallel segmentation only supports the default tokenizers jieba.dt and jieba.posseg.dt.

Tokenize: return the start and end positions of words in the original text
Note: the input must be unicode.
Default mode

    result = jieba.tokenize(u'永和服装饰品有限公司')
    for tk in result:
        print("word %s\t\t start: %d \t\t end:%d" % (tk[0], tk[1], tk[2]))

    word 永和 start: 0 end:2
    word 服装 start: 2 end:4
    word 饰品 start: 4 end:6
    word 有限公司 start: 6 end:10

Search mode

    result = jieba.tokenize(u'永和服装饰品有限公司', mode='search')
    for tk in result:
        print("word %s\t\t start: %d \t\t end:%d" % (tk[0], tk[1], tk[2]))

    word 永和 start: 0 end:2
    word 服装 start: 2 end:4
    word 饰品 start: 4 end:6
    word 有限 start: 6 end:8
    word 公司 start: 8 end:10
    word 有限公司 start: 6 end:10

ChineseAnalyzer for the Whoosh search engine
Import: from jieba.analyse import ChineseAnalyzer
Usage example: https://github.com/fxsjy/jieba/blob/master/test/test_whoosh.py

Command-line segmentation
Usage example: python -m jieba news.txt > cut_result.txt
Usage: python -m jieba [options] filename
If no file name is given, standard input is used. Output of the --help option:

    $> python -m jieba --help
    Jieba command line interface.

    positional arguments:
      filename              input file

    optional arguments:
      -h, --help            show this help message and exit
      -d [DELIM], --delimiter [DELIM]
                            use DELIM instead of ' / ' for word delimiter; or a
                            space if it is used without DELIM
      -p [DELIM], --pos [DELIM]
                            enable POS tagging; if DELIM is specified, use DELIM
                            instead of '_' for POS delimiter
      -D DICT, --dict DICT  use DICT as dictionary
      -u USER_DICT, --user-dict USER_DICT
                            use USER_DICT together with the default dictionary or
                            DICT (if specified)
      -a, --cut-all         full pattern cutting (ignored with POS tagging)
      -n, --no-hmm          don't use the Hidden Markov Model
      -q, --quiet           don't print loading messages to stderr
      -V, --version         show program's version number and exit

    If no filename specified, use STDIN instead.

Lazy loading
jieba loads lazily: neither import jieba nor jieba.Tokenizer() loads the dictionary immediately; the dictionary is loaded and the prefix dictionary is built only when they are first needed. If you prefer, you can initialize jieba manually:

    import jieba
    jieba.initialize()  # manual initialization (optional)

Versions before 0.28 could not set the path of the main dictionary. With lazy loading, you can change it:

    jieba.set_dictionary('data/dict.txt.big')

Example: https://github.com/fxsjy/jieba/blob/master/test/test_change_dictpath.py

Other dictionaries
A dictionary file with a smaller memory footprint: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.small
A dictionary file with better support for Traditional Chinese: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.big
Download the dictionary you need and overwrite jieba/dict.txt with it, or call jieba.set_dictionary('data/dict.txt.big').

Implementations in other languages
jieba for Java. Author: piaolingxue. URL: https://github.com/huaban/jieba-analysis
jieba for C++. Author: yanyiwu. URL: https://github.com/yanyiwu/cppjieba
jieba for Rust. Authors: messense, MnO2. URL: https://github.com/messense/jieba-rs
jieba for Node.js. Author: yanyiwu. URL: https://github.com/yanyiwu/nodejieba
jieba for Erlang. Author: falood. URL: https://github.com/falood/exjieba
jieba for R. Author: qinwf. URL: https://github.com/qinwf/jiebaR
jieba for iOS. Author: yanyiwu. URL: https://github.com/yanyiwu/iosjieba
jieba for PHP. Author: fukuball. URL: https://github.com/fukuball/jieba-php
jieba for .NET (C#). Author: anderscui. URL: https://github.com/anderscui/jieba.NET/
jieba for Go. Author: wangbin. URL: https://github.com/wangbin/jiebago; Author: yanyiwu. URL: https://github.com/yanyiwu/gojieba
jieba for Android. Author: Dongliang.W. URL: https://github.com/452896915/jieba-android

Related links
https://github.com/baidu/lac Baidu lexical analysis of Chinese (segmentation, POS tagging, and named entities)
https://github.com/baidu/AnyQ Baidu FAQ question-answering system
https://github.com/baidu/Senta Baidu sentiment analysis system

System integration
Solr: https://github.com/sing1ee/jieba-solr

Segmentation speed
1.5 MB / Second in Full Mode
400 KB / Second in Default Mode
Test environment: Intel® Core™ i7-2600 CPU @ 3.4GHz; 《围城》.txt

FAQ
1. How was the model data generated?
See https://github.com/fxsjy/jieba/issues/7
2. Why is "台中" always cut into "台 中"? (and similar cases)
Because P(台中) < P(台) × P(中): "台中" has too low a frequency, so its probability of forming a word is low.
Solution: force a higher frequency with jieba.add_word('台中') or jieba.suggest_freq('台中', True)
3. "今天天气 不错" should be cut into "今天 天气 不错"? (and similar cases)
Solution: force a lower frequency with jieba.suggest_freq(('今天', '天气'), True), or delete the word outright with jieba.del_word('今天天气')
4. Words that are not in the dictionary are produced and the result is poor?
Solution: turn off new-word discovery:
jieba.cut('丰田太省了', HMM=False)
jieba.cut('我们中出了一个叛徒', HMM=False)
More questions: https://github.com/fxsjy/jieba/issues?sort=updated&state=closed

Changelog
https://github.com/fxsjy/jieba/blob/master/Changelog

jieba
"Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best Python Chinese word segmentation module.

Features
Supports three segmentation modes:
Accurate Mode attempts to cut the sentence into the most accurate segmentation, which is suitable for text analysis.
Full Mode gets all the possible words from the sentence. Fast but not accurate.
Search Engine Mode, based on Accurate Mode, attempts to cut long words into several short words, which can raise the recall rate. Suitable for search engines.
Supports Traditional Chinese
Supports customized dictionaries
MIT License

Online demo
http://jiebademo.ap01.aws.af.cm/ (Powered by Appfog)

Usage
Fully automatic installation: easy_install jieba or pip install jieba
Semi-automatic installation: download http://pypi.python.org/pypi/jieba/ , extract it, and run python setup.py install.
Manual installation: place the jieba directory in the current directory or in the Python site-packages directory.
Then use import jieba.

Algorithm
Based on a prefix dictionary structure to achieve efficient word graph scanning.
Build a directed acyclic graph (DAG) of all possible word combinations.
Use dynamic programming to find the most probable combination based on word frequencies.
For unknown words, an HMM-based model is used with the Viterbi algorithm.

Main Functions

Cut
The jieba.cut function accepts three input parameters: the string to be cut; cut_all, which controls the cut mode; and HMM, which controls whether the Hidden Markov Model is used.
jieba.cut_for_search accepts two parameters: the string to be cut and whether to use the Hidden Markov Model. It cuts the sentence into short words suitable for search engines.
The input string can be a unicode/str object, or a str/bytes object encoded in UTF-8 or GBK. Using GBK is not recommended, because it may be unexpectedly decoded as UTF-8.
jieba.cut and jieba.cut_for_search return a generator; iterate over it with a for loop to get the segmentation result (in unicode).
jieba.lcut and jieba.lcut_for_search return a list.
jieba.Tokenizer(dictionary=DEFAULT_DICT) creates a new customized Tokenizer, which lets you use different dictionaries at the same time. jieba.dt is the default Tokenizer, to which almost all global functions are mapped.
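The three algorithm steps listed above (prefix dictionary, DAG of candidate words, dynamic programming over word frequencies) can be illustrated with a toy sketch. The code below assumes a tiny, made-up frequency table in place of dict.txt, scores a path by the sum of log word probabilities, and leaves out the HMM/Viterbi step for unknown words entirely. The function names are hypothetical; this is an illustration of the idea, not jieba's actual implementation.

```python
import math

# Hypothetical toy dictionary: word -> frequency (jieba itself loads dict.txt).
FREQ = {"我": 30, "来到": 20, "北京": 25, "清华": 10, "清华大学": 15,
        "华大": 5, "大学": 20, "来": 10, "到": 10}


def build_prefix_dict(freq):
    """Map every word to its count and every proper prefix of a word to 0."""
    pfdict = {}
    for word, count in freq.items():
        pfdict[word] = count
        for i in range(1, len(word)):
            pfdict.setdefault(word[:i], 0)
    return pfdict


def build_dag(sentence, pfdict):
    """dag[i] lists every end index j such that sentence[i:j+1] is a known word."""
    n = len(sentence)
    dag = {}
    for i in range(n):
        ends, j, frag = [], i, sentence[i]
        while j < n and frag in pfdict:  # stop as soon as frag is not even a prefix
            if pfdict[frag] > 0:
                ends.append(j)
            j += 1
            frag = sentence[i:j + 1]
        dag[i] = ends or [i]  # fall back to a single character
    return dag


def best_route(sentence, dag, pfdict):
    """Right-to-left DP: route[i] = (best log-probability of sentence[i:], end of first word)."""
    n = len(sentence)
    log_total = math.log(sum(pfdict.values()))
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(pfdict.get(sentence[i:j + 1]) or 1) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    return route


def cut(sentence, freq=FREQ):
    pfdict = build_prefix_dict(freq)
    route = best_route(sentence, build_dag(sentence, pfdict), pfdict)
    words, i = [], 0
    while i < len(sentence):
        j = route[i][1] + 1
        words.append(sentence[i:j])
        i = j
    return words


print("/".join(cut("我来到北京清华大学")))  # expected: 我/来到/北京/清华大学
```

Passing a table such as {"台": 50, "中": 100, "台中": 1} makes cut("台中") return ["台", "中"], while raising the frequency of "台中" to 100 makes it stay whole. That is the P(台中) < P(台) × P(中) situation described in the FAQ, and it is why raising a word's frequency keeps it from being split.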
Code example: segmentation

    # encoding=utf-8
    import jieba

    seg_list = jieba.cut("我来到北京清华大学", cut_all=True)
    print("Full Mode: " + "/ ".join(seg_list))  # full mode

    seg_list = jieba.cut("我来到北京清华大学", cut_all=False)
    print("Default Mode: " + "/ ".join(seg_list))  # default (accurate) mode

    seg_list = jieba.cut("他来到了网易杭研大厦")
    print(", ".join(seg_list))

    seg_list = jieba.cut_for_search("小明硕士毕业于中国科学院计算所，后在日本京都大学深造")  # search engine mode
    print(", ".join(seg_list))

Output:

    [Full Mode]: 我/ 来到/ 北京/ 清华/ 清华大学/ 华大/ 大学
    [Accurate Mode]: 我/ 来到/ 北京/ 清华大学
    [Unknown Words Recognition]: 他, 来到, 了, 网易, 杭研, 大厦
    [Search Engine Mode]: 小明, 硕士, 毕业, 于, 中国, 科学, 学院, 科学院, 中国科学院, 计算, 计算所, 后, 在, 日本, 京都, 大学, 日本京都大学, 深造

(In this case, "杭研" is not in the dictionary, but is identified by the Viterbi algorithm.)

Add a custom dictionary

Load dictionary
Developers can specify their own custom dictionary to be used alongside the jieba default dictionary. Jieba is able to identify new words, but adding your own new words ensures higher accuracy.
Usage: jieba.load_userdict(file_name), where file_name is a file-like object or the path of the custom dictionary.
The dictionary format is the same as that of dict.txt: one word per line, and each line is divided into three parts separated by a space: word, word frequency, POS tag. If file_name is a path or a file opened in binary mode, the dictionary must be UTF-8 encoded.
The word frequency and the POS tag can each be omitted. If the word frequency is omitted, a suitable value is filled in.
For example:

    创新办 3 i
    云计算 5
    凱特琳 nz
    台中

Change a Tokenizer's tmp_dir and cache_file to specify the path of the cache file, for use on a restricted file system.
Example:

    云计算 5
    李小福 2
    创新办 3

[Before]: 李小福 / 是 / 创新 / 办 / 主任 / 也 / 是 / 云 / 计算 / 方面 / 的 / 专家 /
[After]: 李小福 / 是 / 创新办 / 主任 / 也 / 是 / 云计算 / 方面 / 的 / 专家 /

Modify dictionary
Use add_word(word, freq=None, tag=None) and del_word(word) to modify the dictionary dynamically in programs.
Use suggest_freq(segment, tune=True) to adjust the frequency of a single word so that it can (or cannot) be segmented.
Note that HMM may affect the final result.
Example:

    >>> print('/'.join(jieba.cut('如果放到post中将出错。', HMM=False)))
    如果/放到/post/中将/出错/。
    >>> jieba.suggest_freq(('中', '将'), True)
    494
    >>> print('/'.join(jieba.cut('如果放到post中将出错。', HMM=False)))
    如果/放到/post/中/将/出错/。
    >>> print('/'.join(jieba.cut('「台中」正确应该不会被切开', HMM=False)))
    「/台/中/」/正确/应该/不会/被/切开
    >>> jieba.suggest_freq('台中', True)
    69
    >>> print('/'.join(jieba.cut('「台中」正确应该不会被切开', HMM=False)))
    「/台中/」/正确/应该/不会/被/切开

Keyword Extraction

    import jieba.analyse

jieba.analyse.extract_tags(sentence, topK=20, withWeight=False, allowPOS=())
sentence: the text to extract keywords from
topK: how many keywords with the highest TF/IDF weights to return. The default value is 20.
withWeight: whether to return the TF/IDF weights along with the keywords. The default value is False.
allowPOS: only include words whose POS tags are listed. Empty means no filtering.
jieba.analyse.TFIDF(idf_path=None) creates a new TFIDF instance; idf_path specifies the IDF file path.
Example (keyword extraction): https://github.com/fxsjy/jieba/blob/master/test/extract_tags.py
Developers can specify their own custom IDF corpus for jieba keyword extraction.
Usage: jieba.analyse.set_idf_path(file_name), where file_name is the path of the custom corpus
Custom corpus sample: https://github.com/fxsjy/jieba/blob/master/extra_dict/idf.txt.big
Sample code: https://github.com/fxsjy/jieba/blob/master/test/extract_tags_idfpath.py
Developers can specify their own custom stop-words corpus for jieba keyword extraction.
Usage: jieba.analyse.set_stop_words(file_name), where file_name is the path of the custom corpus
Custom corpus sample: https://github.com/fxsjy/jieba/blob/master/extra_dict/stop_words.txt
Sample code: https://github.com/fxsjy/jieba/blob/master/test/extract_tags_stop_words.py
There is also a TextRank implementation available.
Use: jieba.analyse.textrank(sentence, topK=20, withWeight=False, allowPOS=('ns', 'n', 'vn', 'v'))
Note that it filters by POS by default.
jieba.analyse.TextRank() creates a new TextRank instance.
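To make the two ranking schemes above concrete, here is a small pure-Python sketch that works on a list of already-segmented words. The IDF table is made up. Unseen words get the median IDF, words shorter than two characters are skipped, and graph edges come from a co-occurrence window of span words. These details loosely follow the description above but are assumptions of this sketch, and the function names are hypothetical, not jieba.analyse's code.

```python
from collections import Counter, defaultdict

# Hypothetical IDF table; jieba ships a much larger one (idf.txt).
IDF = {"数据": 4.0, "分析": 5.0, "的": 0.5, "工具": 6.0}


def tfidf_keywords(words, idf=IDF, top_k=20, with_weight=False, stop_words=()):
    """Score each word by term frequency x IDF; unseen words use the median IDF."""
    values = sorted(idf.values())
    median_idf = values[len(values) // 2]
    kept = [w for w in words if len(w.strip()) > 1 and w not in stop_words]
    counts = Counter(kept)
    total = sum(counts.values())
    scores = {w: c / total * idf.get(w, median_idf) for w, c in counts.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return ranked if with_weight else [w for w, _ in ranked]


def textrank_keywords(words, span=5, top_k=20, d=0.85, iterations=10):
    """Build an undirected weighted co-occurrence graph, then rank nodes with PageRank."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        # Connect w to every word that follows it within the window.
        for j in range(i + 1, min(i + span, len(words))):
            if words[j] != w:
                graph[w][words[j]] += 1.0
                graph[words[j]][w] += 1.0
    if not graph:
        return []
    out_weight = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    score = {w: 1.0 / len(graph) for w in graph}
    for _ in range(iterations):
        score = {
            w: (1 - d) + d * sum(graph[u][w] / out_weight[u] * score[u] for u in graph[w])
            for w in graph
        }
    ranked = sorted(score.items(), key=lambda kv: kv[1], reverse=True)
    return [w for w, _ in ranked[:top_k]]


words = ["数据", "分析", "数据", "工具", "的", "数据"]
print(tfidf_keywords(words, top_k=2))
print(textrank_keywords(words, span=2))
```

TF-IDF rewards words that are frequent in this text but rare in the corpus, while TextRank needs no corpus at all and instead rewards words that co-occur with many other words; that is why the two can disagree on short texts.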
Part of Speech Tagging
jieba.posseg.POSTokenizer(tokenizer=None) creates a new customized Tokenizer; tokenizer specifies the jieba.Tokenizer to use internally. jieba.posseg.dt is the default POSTokenizer.
It tags the POS of each word after segmentation, using labels compatible with ictclas.
Example:

    >>> import jieba.posseg as pseg
    >>> words = pseg.cut("我爱北京天安门")
    >>> for w in words:
    ...     print('%s %s' % (w.word, w.flag))
    ...
    我 r
    爱 v
    北京 ns
    天安门 ns

Parallel Processing
Principle: split the target text by line, assign the lines to multiple Python processes, and then merge the results, which is considerably faster.
Based on Python's multiprocessing module. Windows is not currently supported.
Usage:
jieba.enable_parallel(4): enable parallel processing; the parameter is the number of processes.
jieba.disable_parallel(): disable parallel processing.
Example: https://github.com/fxsjy/jieba/blob/master/test/parallel/test_file.py
Result: on a four-core 3.4 GHz Linux machine, accurate segmentation of the Complete Works of Jin Yong reached 1 MB/s, 3.3 times the speed of the single-process version.
Note that parallel processing supports only the default tokenizers, jieba.dt and jieba.posseg.dt.
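The split-by-line, process-in-parallel, merge-in-order principle above can be sketched with the standard multiprocessing module. The whitespace-splitting toy_cut is a hypothetical stand-in for a real segmenter, and the sketch explicitly asks for the "fork" start method. That choice is an assumption that matches the Unix-only note above: fork is not available on Windows.

```python
import multiprocessing


def toy_cut(line):
    """Hypothetical stand-in for a real segmenter: split the line on whitespace."""
    return line.split()


def parallel_cut(text, cut_line=toy_cut, processes=4):
    """Split text into lines, segment the lines in worker processes, merge in order."""
    lines = text.splitlines()
    # "fork" is Unix-only, in line with the "Windows is not supported" note above.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes) as pool:
        per_line = pool.map(cut_line, lines)  # map() keeps the input order
    return [word for words in per_line for word in words]


if __name__ == "__main__":
    print(parallel_cut("我 来到 北京\n清华 大学", processes=2))
```

Because Pool.map returns results in input order, merging is just concatenation. The speed-up comes from spreading lines across CPU cores, so it only pays off for large inputs where the per-process start-up cost is small by comparison.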
Tokenize: return words with positions
The input must be unicode.
Default mode

    result = jieba.tokenize(u'永和服装饰品有限公司')
    for tk in result:
        print("word %s\t\t start: %d \t\t end:%d" % (tk[0], tk[1], tk[2]))

    word 永和 start: 0 end:2
    word 服装 start: 2 end:4
    word 饰品 start: 4 end:6
    word 有限公司 start: 6 end:10

Search mode

    result = jieba.tokenize(u'永和服装饰品有限公司', mode='search')
    for tk in result:
        print("word %s\t\t start: %d \t\t end:%d" % (tk[0], tk[1], tk[2]))

    word 永和 start: 0 end:2
    word 服装 start: 2 end:4
    word 饰品 start: 4 end:6
    word 有限 start: 6 end:8
    word 公司 start: 8 end:10
    word 有限公司 start: 6 end:10

ChineseAnalyzer for Whoosh

    from jieba.analyse import ChineseAnalyzer

Example: https://github.com/fxsjy/jieba/blob/master/test/test_whoosh.py

Command Line Interface

    $> python -m jieba --help
    Jieba command line interface.

    positional arguments:
      filename              input file

    optional arguments:
      -h, --help            show this help message and exit
      -d [DELIM], --delimiter [DELIM]
                            use DELIM instead of ' / ' for word delimiter; or a
                            space if it is used without DELIM
      -p [DELIM], --pos [DELIM]
                            enable POS tagging; if DELIM is specified, use DELIM
                            instead of '_' for POS delimiter
      -D DICT, --dict DICT  use DICT as dictionary
      -u USER_DICT, --user-dict USER_DICT
                            use USER_DICT together with the default dictionary or
                            DICT (if specified)
      -a, --cut-all         full pattern cutting (ignored with POS tagging)
      -n, --no-hmm          don't use the Hidden Markov Model
      -q, --quiet           don't print loading messages to stderr
      -V, --version         show program's version number and exit

    If no filename specified, use STDIN instead.

Initialization
By default, Jieba doesn't build the prefix dictionary until it is needed. Building it takes 1-3 seconds, and it is not rebuilt afterwards.
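The lazy initialization described above (build the prefix dictionary on first use, then reuse it) follows a common pattern that can be sketched as below. LazyTokenizer, is_word, and build_count are hypothetical names for illustration, not jieba internals. The lock is an assumption of this sketch: it keeps two threads from building the dictionary at the same time.

```python
import threading


class LazyTokenizer:
    """Hypothetical sketch: build the prefix dictionary on first use, exactly once."""

    def __init__(self, words):
        self._words = words
        self._prefix_dict = None
        self._lock = threading.Lock()
        self.build_count = 0  # lets callers observe how often the expensive step ran

    def initialize(self):
        # Cheap check first; re-check under the lock so only one thread builds.
        if self._prefix_dict is not None:
            return
        with self._lock:
            if self._prefix_dict is None:
                pfdict = {}
                for w in self._words:
                    for i in range(1, len(w)):
                        pfdict.setdefault(w[:i], 0)  # prefixes are marked, not words
                    pfdict[w] = 1
                self.build_count += 1
                self._prefix_dict = pfdict

    def is_word(self, s):
        self.initialize()  # implicit, on-demand initialization
        return self._prefix_dict.get(s, 0) > 0


tok = LazyTokenizer(["清华", "清华大学"])
print(tok.build_count)        # 0: nothing is built at construction time
print(tok.is_word("清华大学"))  # True: the first lookup triggers the build
print(tok.build_count)        # 1
```

Calling initialize() up front, as with jieba.initialize(), simply moves the one-time cost to a moment you choose instead of the first segmentation call.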
If you want to initialize Jieba manually, you can call:

    import jieba
    jieba.initialize()  # (optional)

You can also specify the dictionary (not supported before version 0.28):

    jieba.set_dictionary('data/dict.txt.big')

Using Other Dictionaries
You can use your own dictionary with Jieba, and there are also two dictionaries ready for download:
A smaller dictionary for a smaller memory footprint: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.small
A bigger dictionary with better support for Traditional Chinese (繁體): https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.big
By default, an in-between dictionary called dict.txt, included in the distribution, is used.
In either case, download the file you want and then call jieba.set_dictionary('data/dict.txt.big'), or simply replace the existing dict.txt.

Segmentation speed
1.5 MB / Second in Full Mode
400 KB / Second in Default Mode
Test Env: Intel® Core™ i7-2600 CPU @ 3.4GHz; 《围城》.txt

This article is a repost. Original link: https://blog.csdn.net/yegeli/article/details/107246661. It was submitted by an Internet user; the views expressed are the author's own and do not represent the position of this site. As an information platform, this site only provides article-reposting services, does not own the article, and is not responsible for the authenticity, accuracy, or legality of its content. If you find that this article infringes any rights, violates laws or regulations, or misstates facts, please contact us promptly and we will verify it and remove the content as soon as possible.

2023-12-02 10:38:37


Reposted

Reposted articles

[Repost] Front-End Trio Series: Bootstrap (Bootstrap Basics, Bootstrap Layout)

...="btn btn-default">Submit</button></form>

6.2 Inline forms
Add the .form-inline class to a <form> element to left-align its contents and render the controls as inline-block elements. This only applies when the viewport is at least 768px wide; on narrower viewports the form collapses.

6.3 Horizontal forms
Add the .form-horizontal class to a form and combine it with Bootstrap's predefined grid classes to lay out labels and groups of controls side by side. This changes the behavior of .form-group so that it acts as a grid row, so there is no need to add an extra .row.

    <form class="form-horizontal">
      <div class="form-group">
        <label for="inputEmail3" class="col-sm-2 control-label">Email</label>
        <div class="col-sm-10">
          <input type="email" class="form-control" id="inputEmail3" placeholder="Email">
        </div>
      </div>
      <div class="form-group">
        <label for="inputPassword3" class="col-sm-2 control-label">Password</label>
        <div class="col-sm-10">
          <input type="password" class="form-control" id="inputPassword3" placeholder="Password">
        </div>
      </div>
      <div class="form-group">
        <div class="col-sm-offset-2 col-sm-10">
          <div class="checkbox">
            <label><input type="checkbox"> Remember me</label>
          </div>
        </div>
      </div>
      <div class="form-group">
        <div class="col-sm-offset-2 col-sm-10">
          <button type="submit" class="btn btn-default">Sign in</button>
        </div>
      </div>
    </form>

6.4 Form controls
Inputs: most common form controls and text inputs are supported, including all HTML5 input types: text, password, datetime, datetime-local, date, month, time, week, number, email, url, search, tel, and color. Inputs are styled correctly only if their type attribute is set properly.
Textarea: a form control for multi-line text. Change the rows attribute as needed.
Checkboxes and radios, default styling:

    <div class="checkbox">
      <label>
        <input type="checkbox" value="">
        Option one is this and that&mdash;be sure to include why it's great
      </label>
    </div>
    <div class="checkbox disabled">
      <label>
        <input type="checkbox" value="" disabled>
        Option two is disabled
      </label>
    </div>
    <div class="radio">
      <label>
        <input type="radio" name="optionsRadios" id="optionsRadios1" value="option1" checked>
        Option one is this and that&mdash;be sure to include why it's great
      </label>
    </div>
    <div class="radio">
      <label>
        <input type="radio" name="optionsRadios" id="optionsRadios2" value="option2">
        Option two can be something else and selecting it will deselect option one
      </label>
    </div>
    <div class="radio disabled">
      <label>
        <input type="radio" name="optionsRadios" id="optionsRadios3" value="option3" disabled>
        Option three is disabled
      </label>
    </div>

Inline checkboxes and radios:

    <label class="checkbox-inline"><input type="checkbox" id="inlineCheckbox1" value="option1"> 1</label>
    <label class="checkbox-inline"><input type="checkbox" id="inlineCheckbox2" value="option2"> 2</label>
    <label class="checkbox-inline"><input type="checkbox" id="inlineCheckbox3" value="option3"> 3</label>
    <label class="radio-inline"><input type="radio" name="inlineRadioOptions" id="inlineRadio1" value="option1"> 1</label>
    <label class="radio-inline"><input type="radio" name="inlineRadioOptions" id="inlineRadio2" value="option2"> 2</label>
    <label class="radio-inline"><input type="radio" name="inlineRadioOptions" id="inlineRadio3" value="option3"> 3</label>

Checkboxes and radios without label text:

    <div class="checkbox">
      <label>
        <input type="checkbox" id="blankCheckbox" value="option1" aria-label="...">
      </label>
    </div>
    <div class="radio">
      <label>
        <input type="radio" name="blankRadio" id="blankRadio1" value="option1" aria-label="...">
      </label>
    </div>

Select lists:

    <select class="form-control">
      <option>1</option>
      <option>2</option>
      <option>3</option>
      <option>4</option>
      <option>5</option>
    </select>

Static controls: to place a line of plain text next to a label inside a form, add the .form-control-static class to a <p> element.

    <form class="form-horizontal">
      <div class="form-group">
        <label class="col-sm-2 control-label">Email</label>
        <div class="col-sm-10">
          <p class="form-control-static">email@example.com</p>
        </div>
      </div>
      <div class="form-group">
        <label for="inputPassword" class="col-sm-2 control-label">Password</label>
        <div class="col-sm-10">
          <input type="password" class="form-control" id="inputPassword" placeholder="Password">
        </div>
      </div>
    </form>

Help text:

    <label class="sr-only" for="inputHelpBlock">Input with help text</label>
    <input type="text" id="inputHelpBlock" class="form-control" aria-describedby="helpBlock">
    ...
    <span id="helpBlock" class="help-block">A block of help text that breaks onto a new line and may extend beyond one line.</span>

Validation states
Bootstrap provides styles for the error, warning, and success validation states of form controls. To use them, add .has-warning, .has-error, or .has-success to the control's parent element. Any .control-label, .form-control, and .help-block inside that element will receive the validation styles.

    <div class="form-group has-success">
      <label class="control-label" for="inputSuccess1">Input with success</label>
      <input type="text" class="form-control" id="inputSuccess1" aria-describedby="helpBlock2">
      <span id="helpBlock2" class="help-block">A block of help text that breaks onto a new line and may extend beyond one line.</span>
    </div>
    <div class="form-group has-warning">
      <label class="control-label" for="inputWarning1">Input with warning</label>
      <input type="text" class="form-control" id="inputWarning1">
    </div>
    <div class="form-group has-error">
      <label class="control-label" for="inputError1">Input with error</label>
      <input type="text" class="form-control" id="inputError1">
    </div>
    <div class="has-success">
      <div class="checkbox">
        <label><input type="checkbox" id="checkboxSuccess" value="option1"> Checkbox with success</label>
      </div>
    </div>
    <div class="has-warning">
      <div class="checkbox">
        <label><input type="checkbox" id="checkboxWarning" value="option1"> Checkbox with warning</label>
      </div>
    </div>
    <div class="has-error">
      <div class="checkbox">
        <label><input type="checkbox" id="checkboxError" value="option1"> Checkbox with error</label>
      </div>
    </div>

Adding optional icons
You can also add feedback icons to inputs according to their validation state. Just add the .has-feedback class and the right icon.

    <div class="form-group has-success has-feedback">
      <label class="control-label" for="inputSuccess2">Input with success</label>
      <input type="text" class="form-control" id="inputSuccess2" aria-describedby="inputSuccess2Status">
      <span class="glyphicon glyphicon-ok form-control-feedback" aria-hidden="true"></span>
      <span id="inputSuccess2Status" class="sr-only">(success)</span>
    </div>

Control sizing
Set the height of a control with classes such as .input-lg, and set its width with grid classes such as .col-lg-*.
Height sizing: create taller or shorter form controls that match the button sizes.

    <input class="form-control input-lg" type="text" placeholder=".input-lg">
    <input class="form-control" type="text" placeholder="Default input">
    <input class="form-control input-sm" type="text" placeholder=".input-sm">

    <select class="form-control input-lg">...</select>
    <select class="form-control">...</select>
    <select class="form-control input-sm">...</select>

Horizontal form group sizes: add .form-group-lg or .form-group-sm to quickly size the labels and form controls inside a .form-horizontal form.

    <form class="form-horizontal">
      <div class="form-group form-group-lg">
        <label class="col-sm-2 control-label" for="formGroupInputLarge">Large label</label>
        <div class="col-sm-10">
          <input class="form-control" type="text" id="formGroupInputLarge" placeholder="Large input">
        </div>
      </div>
      <div class="form-group form-group-sm">
        <label class="col-sm-2 control-label" for="formGroupInputSmall">Small label</label>
        <div class="col-sm-10">
          <input class="form-control" type="text" id="formGroupInputSmall" placeholder="Small input">
        </div>
      </div>
    </form>

7 Buttons

7.1 Elements that can be used as buttons
Add a button class to an <a>, <button>, or <input> element to use Bootstrap's button styles.

    <a class="btn btn-default" href="" role="button">Link</a>
    <button class="btn btn-default" type="submit">Button</button>
    <input class="btn btn-default" type="button" value="Input">
    <input class="btn btn-default" type="submit" value="Submit">

7.2 Predefined styles

    <button type="button" class="btn btn-default">(default style) Default</button>
    <button type="button" class="btn btn-primary">(preferred option) Primary</button>
    <button type="button" class="btn btn-success">(success) Success</button>
    <button type="button" class="btn btn-info">(general information) Info</button>
    <button type="button" class="btn btn-warning">(warning) Warning</button>
    <button type="button" class="btn btn-danger">(danger) Danger</button>
    <button type="button" class="btn btn-link">(link) Link</button>

7.3 Sizes
Need larger or smaller buttons? Use .btn-lg, .btn-sm, or .btn-xs. Add .btn-block to stretch a button to 100% of its parent's width; the button also becomes a block-level element.

7.4 Active state
Add the .active class.

7.5 Disabled state
Add the disabled attribute to a <button> element to make it appear disabled. For buttons built from <a> elements, add the .disabled class instead.

8 Images

8.1 Responsive images
In Bootstrap 3, adding the .img-responsive class makes an image responsive. It sets max-width: 100%;, height: auto;, and display: block; so that the image scales nicely within its parent element. To center an image that uses .img-responsive horizontally, use the .center-block class, not .text-center.

    <img src="..." class="img-responsive" alt="Responsive image">

8.2 Image shapes

    <img src="..." alt="..." class="img-rounded">
    <img src="..." alt="..." class="img-circle">
    <img src="..." alt="..." class="img-thumbnail">

9 Helper classes

9.1 Text colors

    <p class="text-muted">...</p>
    <p class="text-primary">...</p>
    <p class="text-success">...</p>
    <p class="text-info">...</p>
    <p class="text-warning">...</p>
    <p class="text-danger">...</p>

9.2 Background colors

    <p class="bg-primary">...</p>
    <p class="bg-success">...</p>
    <p class="bg-info">...</p>
    <p class="bg-warning">...</p>
    <p class="bg-danger">...</p>

9.3 Caret

    <span class="caret"></span>

9.4 Floats

    <div class="pull-left">...</div>
    <div class="pull-right">...</div>

9.5 Centering content blocks

    <div class="center-block">...</div>

9.6 Clearfix
Add the .clearfix class to the parent element to clear floats easily.

    <div class="clearfix">...</div>

9.7 Showing and hiding content

    <div class="show">...</div>
    <div class="hidden">...</div>

9.10 Image replacement
Use the .text-hide class or the corresponding mixin to replace an element's text content with a background image.

    <h1 class="text-hide">Custom heading</h1>

10 Responsive utilities

10.1 Showing and hiding content per viewport
.visible-xs-* .visible-sm-* .visible-md-* .visible-lg-*
.hidden-xs .hidden-sm .hidden-md .hidden-lg
.visible-*-block .visible-*-inline .visible-*-inline-block

10.2 Print classes
.visible-print-block
.visible-print-inline
.visible-print-inline-block
.hidden-print (hidden when printing)

This article is a repost. Original link: https://blog.csdn.net/m0_67155975/article/details/123351126.

2023-10-18 14:41:25


Reposted

Java

Difference between protected and friendly (default) access in Java

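...protected, default, and private. Of these, protected and default are known as the protected and default access modifiers.

    public class Animal {
        public String name;
        protected int age;
        String color;
        private String type;
    }

In the code above, the Animal class defines four fields: the public name, the protected age, the default (package-private) color, and the private type. The protected age and the default color are the two access levels this article is about.

Protected access modifier
A protected member can be accessed only from classes in the same package or from subclasses; other classes cannot access it. Protected fields and methods are inherited by subclasses and can be used inside them.

    public class Dog extends Animal {
        public void bark() {
            System.out.println("Woof woof woof");
            System.out.println("My name is " + name);
            System.out.println("I am " + age + " years old");
            System.out.println("My fur color is " + color);
            // System.out.println("My type is " + type); // compile error: private members of the superclass are not accessible
        }
    }

In the code above, Dog extends Animal, so it can access Animal's protected age field and its public name field. It can also access the default color field, but only because Dog is assumed to be in the same package as Animal; a subclass in a different package could not.

Default access modifier
Default access is what a member gets when no access modifier is written at all. Default fields and methods can be accessed from classes in the same package, but not from classes in other packages.

    package com.example;

    public class Cat {
        public void meow() {
            Animal animal = new Animal();
            System.out.println("Meow meow meow");
            System.out.println("My name is " + animal.name);
            System.out.println("I am " + animal.age + " years old"); // OK: protected members are also visible within the same package
            System.out.println("My fur color is " + animal.color);
            // System.out.println("My type is " + animal.type); // compile error: private members are not accessible
        }
    }

In the code above, Cat is in the same package as Animal, so it can access Animal's public and default fields, and also its protected field, because protected access includes package access. Only the private field is off limits. In short, protected access is default (package) access plus access from subclasses in other packages.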

2023-05-18 18:06:08


键盘勇士

Java

Difference between modules and class modules in Java

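...herModule
        exports com.example.mymodule; // exports the package com.example.mymodule
    }

    // Example of a class
    public class MyClass {
        private int x;

        public void printX() {
            System.out.println(x);
        }
    }

Overall, modules and classes are clearly different in Java. A module is a named group of related packages (and the classes and interfaces they contain); it can declare dependencies on other modules and choose which of its packages it exports. A class, by contrast, is a single type. It can use other classes, but it has no module-level way to declare dependencies or to control which code is exported.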

2023-01-11 20:51:19


代码侠

ClickHouse

ClickHouse auto-increment column errors: insert problems in data-analysis scenarios and a default-value solution

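...ue UInt32 DEFAULT 0,
        name String
    ) ENGINE = MergeTree()
    ORDER BY id;

In this example, the default value of the value column is set to 0, so we no longer need to specify it manually when inserting data.

2. Insert complete rows
Another way to avoid this error is to supply a value for every column when inserting data. For example:

    INSERT INTO test (id, value, name) VALUES (1, 0, 'test');

In this example we supply a value for the value column, so ClickHouse does not raise an error.

IV. Summary
The analysis above shows that the "auto-increment column error" is actually caused by inserting rows without complete information. The key to fixing it is to fill in an explicit value for every column or, for a column that is meant to increment automatically, to give it a default initial value. Following these rules reliably avoids the error.

V. Recommendations
When using ClickHouse for data analysis, always keep your data consistent and complete. This not only gets rid of the "auto-increment column" error for good, but also improves your productivity and the quality of your analysis.

VI. Closing remarks
ClickHouse is a powerful real-time analytics tool, but you will run into all sorts of problems when using it. Once you understand what lies behind these small problems and learn a few tricks for solving them, you can use ClickHouse more fluently and get your data analysis done well.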
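The column DEFAULT idea from the first solution can be tried locally without a ClickHouse server. The sketch below swaps in SQLite through Python's built-in sqlite3 module, so it only demonstrates the generic SQL DEFAULT semantics (an omitted column receives its declared default) and says nothing about ClickHouse-specific behavior or error messages. The table mirrors the article's test table; demo_defaults is a hypothetical helper name.

```python
import sqlite3


def demo_defaults():
    """Show that a column with DEFAULT 0 is filled in when an INSERT omits it (SQLite, not ClickHouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER, value INTEGER DEFAULT 0, name TEXT)")
    conn.execute("INSERT INTO test (id, name) VALUES (1, 'test')")                # value omitted
    conn.execute("INSERT INTO test (id, value, name) VALUES (2, 5, 'full')")    # every column supplied
    rows = conn.execute("SELECT id, value, name FROM test ORDER BY id").fetchall()
    conn.close()
    return rows


print(demo_defaults())  # [(1, 0, 'test'), (2, 5, 'full')]
```

Both rows end up complete: the first gets value 0 from the column default, and the second uses the value supplied explicitly, which matches the two approaches the article recommends.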

2023-07-20 08:25:08


林中小径-t

Docker

Docker cannot download images (Docker on Synology cannot download images)

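...inx
    Using default tag: latest
    Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Because this error message says very little, I first checked my network access and confirmed that the connection was stable. Next I suspected the firewall and turned it off, but the problem persisted. I tried many other things, such as changing Docker's DNS settings and clearing Docker's cache, with no effect. In the end I found that the cause was a configuration item in my Docker environment: registry-mirrors.

    $ docker info
    ...
    Registry Mirrors:
     https://...:/
     https://...:/
    ...

The problem was that registry-mirrors pointed to a wrong registry mirror address, so images could not be pulled. In my Docker environment, the registry-mirrors setting lives in /etc/docker/daemon.json. When I opened this file, I found that the mirror address had been set to a wrong value. After correcting it, I ran docker pull again and successfully pulled the image I needed. (Changes to daemon.json take effect only after the Docker daemon is restarted, for example with sudo systemctl restart docker.)

    $ sudo vim /etc/docker/daemon.json
    {
      "registry-mirrors": ["https://registry.docker-cn.com"]
    }

All in all, this was a strange problem, because I had not changed any Docker configuration myself, yet it still happened. If you run into something similar, first check whether the registry mirror address is correct, and then check Docker's other configuration items.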
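Since the fix above came down to a bad value in daemon.json, a quick syntax check of that file can save some guesswork. The Python sketch below is a hypothetical helper, not a Docker tool: it parses the JSON text and flags registry-mirrors entries that are not http/https URLs with a host name. It can only catch malformed entries; it cannot tell whether a well-formed mirror is actually reachable or still in service.

```python
import json
from urllib.parse import urlparse


def check_registry_mirrors(config_text):
    """Return a list of problems found in the "registry-mirrors" entry of daemon.json text."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as exc:
        return [f"daemon.json is not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["daemon.json must contain a JSON object"]
    mirrors = config.get("registry-mirrors", [])
    if not isinstance(mirrors, list):
        return ['"registry-mirrors" must be a JSON array of URLs']
    problems = []
    for mirror in mirrors:
        parsed = urlparse(mirror) if isinstance(mirror, str) else None
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.hostname:
            problems.append(f"malformed mirror URL: {mirror!r}")
    return problems


print(check_registry_mirrors('{"registry-mirrors": ["registry.docker-cn.com"]}'))
```

On a real host you would pass it the file contents, e.g. check_registry_mirrors(open("/etc/docker/daemon.json").read()), and compare the result with the "Registry Mirrors" section of docker info.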

2023-04-18 10:38:27


算法侠

Linux

Setting module import paths for Python files on Linux and managing sys.path in practice

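...setting: in the terminal, run export PYTHONPATH=$PYTHONPATH:/path/to/your/module, where "/path/to/your/module" is the module path you want to add. This applies to the current shell session and the programs started from it.
2. Setting it in code: add the line sys.path.append('/path/to/your/module') to your Python source file, then run your program.

IV. Example: module import paths for Python files
Next, I will demonstrate module import paths with a concrete example. Suppose there is a Python module named my_module.py in the /home/user directory, and we want to import it from another Python file. First, we must make sure the Python interpreter can find this module.
1. Set the environment variable: in the terminal, run export PYTHONPATH=$PYTHONPATH:/home/user so that the Python interpreter also searches /home/user for my_module.py.
2. Import the module: in the other Python file, import my_module.py like this:

    import my_module

If this code runs without any error, Python has successfully found and imported my_module.py.

V. Conclusion
On Linux, it is essential to understand what the module import path for Python files is and how to set it. Knowing this concept and applying it flexibly is like unlocking a new skill: it makes managing Python projects easier and keeps them well organized. That is my understanding and practice of Python module import paths on Linux, and I hope it helps. If you run into problems in practice, feel free to ask. Let's learn and improve together!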
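The sys.path.append approach from the article can be packaged into a small, runnable helper. In the sketch below, import_from_directory is a hypothetical name, and a temporary directory stands in for /home/user so the example works on any machine; importlib.import_module and importlib.invalidate_caches are standard-library calls.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_directory(module_name, directory):
    """Make `directory` importable through sys.path, then import `module_name` from it."""
    directory = str(Path(directory).resolve())
    if directory not in sys.path:
        sys.path.append(directory)  # same effect as the sys.path.append(...) line above
    importlib.invalidate_caches()  # the directory may have gained files since startup
    return importlib.import_module(module_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for /home/user/my_module.py from the example above.
        Path(tmp, "my_module.py").write_text("GREETING = 'hello from my_module'\n")
        my_module = import_from_directory("my_module", tmp)
        print(my_module.GREETING)  # hello from my_module
```

Directories listed in PYTHONPATH are added to sys.path when the interpreter starts, so both methods in the article end up changing the same list; printing sys.path is a quick way to check which directories Python will search.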

2023-03-09 18:38:16

107

时光倒流_t

MySQL

How to Create a Table in MySQL (6)

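...ne=InnoDB default charset=utf8mb4 comment='table comment'; The code above is the basic syntax for creating a table in MySQL; below we explain what each line means. create table table_name: creates a new table named "table_name". id int unsigned auto_increment primary key comment 'primary key id': defines an auto-incrementing, non-negative integer primary key column named id; auto_increment makes its values increase automatically, primary key makes it the table's primary key, and comment attaches a note to the column. column1 column_type comment 'column 1', column2 column_type comment 'column 2', column3 column_type comment 'column 3': defines three columns described as "column 1", "column 2" and "column 3", each with its own comment. engine=InnoDB: stores the table with the InnoDB storage engine. default charset=utf8mb4: sets the table's default character set to utf8mb4, a Unicode character set. comment='table comment': adds a comment to the table. That is a detailed explanation of how to create a table in MySQL; following this statement structure, you can create a new table. In real applications, adapt the table design and the column types to your actual requirements.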
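Because the excerpt starts mid-statement, it may help to see the whole statement assembled. The sketch below is a hypothetical helper, not part of MySQL. It generates a CREATE TABLE statement with the same shape as the one explained above: an auto-increment primary key, commented columns, InnoDB, a Unicode charset and a table comment. It assumes trusted input and does no escaping.

```python
def build_create_table(table, columns, table_comment, charset="utf8mb4"):
    """Assemble a MySQL CREATE TABLE statement shaped like the one above.

    `columns` is a list of (name, sql_type, comment) tuples. Names and
    comments are assumed to be trusted: nothing is escaped or validated.
    """
    definitions = ["id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY COMMENT 'primary key id'"]
    for name, sql_type, comment in columns:
        definitions.append(f"{name} {sql_type} COMMENT '{comment}'")
    body = ",\n  ".join(definitions)
    return (
        f"CREATE TABLE {table} (\n  {body}\n) "
        f"ENGINE=InnoDB DEFAULT CHARSET={charset} COMMENT='{table_comment}';"
    )


print(build_create_table(
    "user_info",
    [("name", "VARCHAR(50)", "user name"), ("age", "TINYINT UNSIGNED", "age")],
    "user table",
))
```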

2023-10-30 22:22:20

117

码农

Reprinted Articles

[Reprint] @CrossOrigin Enabling CORS

...ed=false, defaultValue="World") String name) {System.out.println("==== in greeting ====");return new Greeting(counter.incrementAndGet(), String.format(template, name));} This @CrossOrigin annotation enables cross-origin requests only for this specific method. By default, it allows all origins, all headers, the HTTP methods specified in the @RequestMapping annotation, and a maxAge of 30 minutes. You can customize this behavior by specifying the value of one of the annotation attributes: origins, methods, allowedHeaders, exposedHeaders, allowCredentials or maxAge. In this example, we only allow http://localhost:8080 to send cross-origin requests. This article is a reprint. Original link: https://blog.csdn.net/qq_38765404/article/details/78777934. The article was contributed by an internet user; the views expressed are the author's own and do not represent the position of this site. As an information platform, this site only provides a reprinting service, does not own the article, and is not responsible for the truthfulness, accuracy, or legality of its content. If you find that this article infringes rights, is illegal or non-compliant, or misstates facts, please contact us and we will verify and remove the content promptly.
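The core decision that @CrossOrigin configures is whether a request's Origin header is on the allowlist. The framework-free sketch below models only that step. It is not Spring's implementation. The allowlist mirrors the example's http://localhost:8080, and methods, headers, credentials and preflight max-age are deliberately left out.

```python
ALLOWED_ORIGINS = {"http://localhost:8080"}  # hypothetical allowlist matching the example


def cors_headers(origin, allowed_origins=ALLOWED_ORIGINS):
    """Return CORS response headers for a request's Origin header, or {} to refuse.

    Simplified model: it only decides Access-Control-Allow-Origin.
    """
    if origin is None:
        return {}  # no Origin header: not a cross-origin browser request
    if "*" in allowed_origins:
        return {"Access-Control-Allow-Origin": "*"}
    if origin in allowed_origins:
        # Echo the specific origin and tell caches the response varies by Origin.
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {}


print(cors_headers("http://localhost:8080"))
print(cors_headers("http://other.example"))  # {}
```

When the header is missing from the response, the browser blocks the page's script from reading it. The server itself still processes the request.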

2023-11-11 12:31:12

330

Reprint

JQuery

How to write a jQuery plugin

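...xtend({}, defaults, options); //loop over each element this.each(function(){ //implement the plugin's behavior //... }); //return the current jQuery object so calls can be chained return this; } //default options var defaults = { option1: value1, option2: value2, //... } The code extends jQuery's collection methods through $.fn, with myPlugin as the plugin name. The options parameter receives the user's settings and merges them with the defaults. this refers to the currently selected elements; each iterates over them so the behavior is applied to every element. Finally, the current jQuery object is returned so calls can be chained. This is a minimal skeleton that you can modify and extend as needed. Below is an example that toggles bold text when an element is clicked: //declare the plugin $.fn.bold = function(){ //loop over each element this.each(function(){ //wrap the current element var $this = $(this); //attach a click handler $this.on('click', function(){ //check the current state var weight = $this.css('font-weight'); if(weight === 'bold' || parseInt(weight, 10) >= 700){ $this.css('font-weight', 'normal'); }else{ $this.css('font-weight', 'bold'); } }); }); //return the current jQuery object so calls can be chained return this; } The code declares a plugin named bold and uses on to attach a click handler to each element. The handler checks whether the element is currently bold and toggles the state. The check accepts both "bold" and numeric weights of 700 or more, because jQuery's css() returns the computed style, which most browsers report as a number such as "700". Finally, the jQuery object is returned so calls can be chained.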
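The skeleton rests on two ideas. It merges user options over defaults without mutating the defaults, and it returns the collection so calls can chain. The sketch below is a Python analogy, not jQuery code. A hypothetical Collection class stands in for a jQuery object, and plain dicts stand in for DOM elements.

```python
DEFAULTS = {"option1": "value1", "option2": "value2"}


class Collection:
    """Minimal stand-in for a jQuery collection (hypothetical, for illustration)."""

    def __init__(self, elements):
        self.elements = list(elements)

    def each(self, fn):
        for element in self.elements:
            fn(element)
        return self  # returning the collection is what enables chaining


def my_plugin(collection, options=None):
    # Like $.extend({}, defaults, options): build a new dict where user options
    # override defaults, leaving DEFAULTS itself unchanged (a shallow merge).
    settings = {**DEFAULTS, **(options or {})}
    return collection.each(lambda element: element.update(settings))


items = [{}, {}]
result = my_plugin(Collection(items), {"option2": "custom"}).each(print)
```

The empty {} target in $.extend({}, defaults, options) plays the same role as building a fresh dict here. Without it, the shared defaults object would be overwritten by the first caller's options.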

2023-12-24 23:53:36

419

程序媛

Reprinted Articles

[Reprint] Fixing the MySQL error SQLSTATE[HY000]: General error: 1364 Field 'xxxxx' doesn't have a default value

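...'t have a default value. It later turned out that a default setting in the configuration file was the culprit. The fix lies in MySQL's settings: if you have a my.ini file, edit it; otherwise edit my.cnf (usually /etc/my.cnf). The author is on CentOS 7.6: open the MySQL configuration file, find the [mysqld] section, and look for sql-mode=STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION. Cause: MySQL is running in strict mode, so an INSERT that omits a NOT NULL column without a default value is rejected instead of receiving an implicit default. Fix: change the sql-mode setting. Yours may differ from mine; just find the sql-mode line, delete it, and replace it with: sql-mode=NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION (on MySQL 8.0 and later, leave out NO_AUTO_CREATE_USER: that mode was removed in 8.0, and the server will not start if it is present). Then restart the database: service mysqld restart. Problem solved. More tutorials: www.zcxsmart.com This article is a reprint. Original link: https://blog.csdn.net/LizmWintac/article/details/126901852.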
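When the existing sql-mode line has more entries than the one shown, it is easy to drop the wrong one by hand. The sketch below is a hypothetical helper, not a MySQL tool. It computes the relaxed value to paste into my.cnf by removing the strict modes, and on MySQL 8.0+ it also drops NO_AUTO_CREATE_USER, since that version no longer recognizes it.

```python
STRICT_MODES = {"STRICT_TRANS_TABLES", "STRICT_ALL_TABLES"}


def relax_sql_mode(sql_mode, mysql_major=8):
    """Return sql_mode with the strict modes removed.

    On MySQL 8.0+ NO_AUTO_CREATE_USER is dropped too, because that version
    no longer recognizes it.
    """
    modes = [m.strip().upper() for m in sql_mode.split(",") if m.strip()]
    kept = [m for m in modes if m not in STRICT_MODES]
    if mysql_major >= 8:
        kept = [m for m in kept if m != "NO_AUTO_CREATE_USER"]
    return ",".join(kept)


print(relax_sql_mode("STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION"))
# NO_ENGINE_SUBSTITUTION
```

Disabling strict mode trades an error for a silent implicit default. Giving the column an explicit DEFAULT, or always supplying it in INSERT, avoids the error while keeping strict checking on.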

2023-12-02 23:16:25

289

Reprint

Apache Solr

Solr JVM tuning in practice: optimizing heap memory, garbage collector and thread-pool parameters to reduce memory usage

...olr.in.sh export JAVA_HOME=/path/to/java export SOLR_HOME=/path/to/solr export SOLR_JAVA_MEM="-Xms4g -Xmx8g" 2. Tuning the garbage collector. The garbage collector reclaims memory that a Java program no longer uses. In Solr, you can enable the Concurrent Mark Sweep (CMS) collector by adding -XX:+UseConcMarkSweepGC to the GC_TUNE variable in solr.in.sh. CMS does most of its marking concurrently with the application, which shortens (but does not eliminate) garbage-collection pauses. Note that CMS was deprecated in JDK 9 and removed in JDK 14; on newer JDKs, use G1 (-XX:+UseG1GC) instead. bash solr.in.sh export JAVA_HOME=/path/to/java export SOLR_HOME=/path/to/solr export GC_TUNE="-XX:+UseConcMarkSweepGC" 3. Tuning thread-pool parameters. A thread pool is the mechanism a Java program uses to manage and schedule threads. The setting shown here, however, is the number of parallel garbage-collection threads: if you want garbage collection to finish faster, open solr.in.sh and set -XX:ParallelGCThreads. With more GC threads working in parallel on a multi-core machine, collections complete sooner and Solr runs more smoothly. bash solr.in.sh export JAVA_HOME=/path/to/java export SOLR_HOME=/path/to/solr export GC_TUNE="-XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=4" 4. Other JVM parameters. Beyond the parameters above, other JVM options can further tune Solr's performance. For example, -XX:MaxTenuringThreshold sets how many young-generation collections an object must survive before it is promoted to the old generation. Choosing it well keeps short-lived objects from being promoted, which lowers the frequency of costly old-generation collections and keeps the program running smoothly. bash solr.in.sh export JAVA_HOME=/path/to/java export SOLR_HOME=/path/to/solr export GC_TUNE="-XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=4 -XX:MaxTenuringThreshold=8" V. Conclusion. The JVM tuning techniques above can effectively reduce Solr's memory footprint and improve its efficiency and performance. Keep in mind, however, that different workloads call for different optimizations, so adjust your strategy flexibly to your actual situation.
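A small helper can generate the two solr.in.sh lines and catch an obvious mistake: an initial heap larger than the maximum, which the JVM refuses at startup. This is a hypothetical sketch, not part of Solr. It assumes the SOLR_JAVA_MEM and GC_TUNE variables read by Solr's bin/solr launcher, and it accepts sizes with an optional k, m or g suffix.

```python
import re

_SIZE = re.compile(r"^(\d+)([kmg]?)$", re.IGNORECASE)
_MULTIPLIER = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def to_bytes(size):
    """Convert a JVM heap size such as '4g' or '512m' to bytes."""
    match = _SIZE.match(size)
    if not match:
        raise ValueError(f"unrecognized heap size: {size!r}")
    number, unit = match.groups()
    return int(number) * _MULTIPLIER[unit.lower()]


def solr_in_sh_lines(xms, xmx, gc_flags):
    """Build SOLR_JAVA_MEM and GC_TUNE lines, rejecting -Xms larger than -Xmx."""
    if to_bytes(xms) > to_bytes(xmx):
        raise ValueError("-Xms must not be larger than -Xmx")
    flags = " ".join(gc_flags)
    return [f'SOLR_JAVA_MEM="-Xms{xms} -Xmx{xmx}"', f'GC_TUNE="{flags}"']


for line in solr_in_sh_lines(
    "4g", "8g",
    ["-XX:+UseConcMarkSweepGC", "-XX:ParallelGCThreads=4", "-XX:MaxTenuringThreshold=8"],
):
    print(line)
```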

2023-01-02 12:22:14

468

飞鸟与鱼-t

Reprinted Articles

[Reprint] Nationwide address SQL data file for China (down to district/county level)

... NOT NULL DEFAULT CURRENT_TIMESTAMP,last_modified_date datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,display_order int(11) DEFAULT NULL,name varchar(100) COLLATE utf8_unicode_ci NOT NULL,pid bigint(20) DEFAULT NULL,PRIMARY KEY (id),KEY FK_Reference_02 (pid),CONSTRAINT com_area_ibfk_1 FOREIGN KEY (pid) REFERENCES com_area (id)) ENGINE=InnoDB AUTO_INCREMENT=3924 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; -- ---------------------------- -- Records of com_area -- ---------------------------- INSERT INTO com_area VALUES ('1', '2016-10-29 08:07:39', '2016-10-29 08:07:39', '0', '1', null);INSERT INTO com_area VALUES ('2', '2016-10-29 08:07:44', '2016-10-29 08:07:44', '110000', '北京市', '1');INSERT INTO com_area VALUES ('3', '2016-10-29 08:07:44', '2016-10-29 08:07:44', '110101', '东城区', '2');...... Download: http://download.csdn.net/detail/wangfei0904306/9748322 This article is a reprint. Original link: https://blog.csdn.net/wangfei0904306/article/details/54895475.
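The pid column turns com_area into an adjacency list, so a district's full path (province, then district) comes from walking pid links upward. The sketch below uses Python's built-in sqlite3 and a recursive CTE on a simplified copy of the table, holding only the three sample records shown. The reduced schema is an assumption for illustration, and MySQL 8.0+ supports the same WITH RECURSIVE form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for com_area: only the columns needed for the hierarchy.
conn.execute(
    "CREATE TABLE com_area (id INTEGER PRIMARY KEY, display_order TEXT, "
    "name TEXT NOT NULL, pid INTEGER REFERENCES com_area(id))"
)
conn.executemany(
    "INSERT INTO com_area VALUES (?, ?, ?, ?)",
    [(1, "0", "1", None), (2, "110000", "北京市", 1), (3, "110101", "东城区", 2)],
)


def area_path(conn, area_id):
    """Walk pid links upward and return names from the top level down (root excluded)."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, name, pid, depth) AS (
            SELECT id, name, pid, 0 FROM com_area WHERE id = ?
            UNION ALL
            SELECT a.id, a.name, a.pid, c.depth + 1
            FROM com_area a JOIN chain c ON a.id = c.pid
        )
        SELECT name FROM chain WHERE pid IS NOT NULL ORDER BY depth DESC
        """,
        (area_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(area_path(conn, 3))  # ['北京市', '东城区']
```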

2023-06-30 09:11:08

Reprint

Impala

Efficient Data Import & Export with Impala: Leveraging CSV Files, HDFS Compression, and Partitioning for Enhanced SQL Query Processing in Big Data Scenarios

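This article explains how to import and export data efficiently with Impala. It first shows how to load CSV and other file formats into Impala tables with SQL commands, and how to export data from an Impala table to a CSV file. To improve efficiency, it proposes two practical techniques: compressing large files for transfer through HDFS, which reduces network bandwidth requirements, and using Impala's partitioning to split large tables, which speeds up import and export. These techniques aim to help users get better SQL query performance from Impala in big-data scenarios and manage data movement more efficiently.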
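Impala, like Hive, stores each partition of a partitioned table in its own HDFS directory named column=value. The sketch below prepares local CSV files in that layout so that they could be uploaded and attached as partitions. The function name, the dt column and the data.csv file name are hypothetical. The files have no header row, because Impala text tables do not skip headers by default.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitioned_csv(rows, partition_key, root):
    """Write one headerless CSV per partition under root/<key>=<value>/data.csv."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)

    written = []
    for value, group in sorted(groups.items()):
        part_dir = Path(root) / f"{partition_key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # The partition column is encoded in the directory name, not in the file.
        fields = [k for k in group[0] if k != partition_key]
        out = part_dir / "data.csv"
        with out.open("w", newline="") as fh:
            csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore").writerows(group)
        written.append(out)
    return written


rows = [
    {"dt": "2023-06-30", "id": 1, "event": "a"},
    {"dt": "2023-07-01", "id": 2, "event": "b"},
    {"dt": "2023-06-30", "id": 3, "event": "c"},
]
for path in write_partitioned_csv(rows, "dt", tempfile.mkdtemp()):
    print(path.parent.name, path.read_text().splitlines())
```

After a directory such as dt=2023-06-30 is copied into the table's HDFS location, a statement along the lines of ALTER TABLE t ADD PARTITION (dt='2023-06-30') registers it. Adjust the table and column names to your schema.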

2023-10-21 15:37:24

511

梦幻星空-t

Groovy

How to use closures as function return values in Groovy: a detailed example

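...; a - b } default: return { a, b -> a * b } // default to multiplication } } def add = getOperation('add') def subtract = getOperation('subtract') def multiply = getOperation('multiply') // 'multiply' has no case of its own, so it falls through to the default println(add(5, 3)) // output: 8 println(subtract(5, 3)) // output: 2 println(multiply(5, 3)) // output: 15 In this example, we define a getOperation function that returns a different closure depending on the operation type passed in. Callers can pick the operation dynamically by name instead of writing their own if-else chains, which keeps the code concise and easy to extend. 4. Summary and reflections. These examples should give you a basic understanding of how to use closures as return values in Groovy. Closures are a powerful tool: they encapsulate logic and let us organize code more flexibly. That said, overusing them makes code messy, hard for others to read, and confusing even for your future self. So when you reach for a closure, make sure it genuinely improves the code rather than getting in the way. I hope this has been helpful! If you have any questions or want to learn more about Groovy, feel free to leave a comment. Let's explore more of the fun of programming together! --- This article uses concrete examples and a conversational tone to help readers understand and apply closures as return values in Groovy, with the aim of making learning more lively and engaging.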
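For readers comparing languages, here is the same pattern in Python. A factory returns a function chosen by name, and a second helper shows that the returned function captures a variable from its enclosing scope. The names get_operation and make_scaler are illustrative and do not come from the article.

```python
def get_operation(op_type):
    """Return a two-argument function chosen by name; unknown names multiply."""
    if op_type == "add":
        return lambda a, b: a + b
    if op_type == "subtract":
        return lambda a, b: a - b
    return lambda a, b: a * b  # default case, as in the Groovy switch


def make_scaler(factor):
    # The returned function keeps a reference to `factor` after make_scaler
    # returns, just as a Groovy closure captures variables from its scope.
    return lambda x: x * factor


add, subtract, multiply = (get_operation(op) for op in ("add", "subtract", "multiply"))
print(add(5, 3), subtract(5, 3), multiply(5, 3))  # 8 2 15
triple = make_scaler(3)
print(triple(4))  # 12
```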

2024-12-16 15:43:22

148

人生如戏

Kibana

Resolving data display issues in Kibana: from the Elasticsearch data source and configuration to field type matching and missing-value handling

....extend({ defaults: { type: 'chart', title: 'Events Over Time' }, init: function(params) { this.valueField = params.value_field || 'value'; this.timeField = params.time_field || 'time'; }, render: function() { return {renderChart(this.data)} ; }, data: function() { var events = this.state.events; return [{ key: 'data', values: events.map(function(event) { return [new Date(event[this.timeField]), event[this.valueField]]; }, this) }]; } }); 2. Problem: data is displayed incorrectly. Solution: check the Kibana configuration to make sure the time field is set correctly, ...
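The data() method above turns raw events into [time, value] pairs. The Python sketch below performs the same transformation and adds the missing-value handling the title mentions. Events that lack the time or value field, or that carry a non-numeric value, are skipped and counted. The field names default to time and value as in the plugin. Treating the time field as epoch milliseconds is an assumption, chosen because that is what JavaScript's new Date(number) expects.

```python
from datetime import datetime, timezone


def to_series(events, time_field="time", value_field="value"):
    """Convert event dicts to time-sorted (datetime, value) pairs.

    Returns (points, skipped): events missing a field or carrying a
    non-numeric value are skipped and counted rather than plotted.
    """
    points, skipped = [], 0
    for event in events:
        ts, value = event.get(time_field), event.get(value_field)
        if ts is None or isinstance(value, bool) or not isinstance(value, (int, float)):
            skipped += 1
            continue
        # Assumes epoch milliseconds, as JavaScript's new Date(number) does.
        points.append((datetime.fromtimestamp(ts / 1000, tz=timezone.utc), value))
    points.sort(key=lambda point: point[0])
    return points, skipped


events = [
    {"time": 2000, "value": 5},
    {"time": 1000, "value": 3},
    {"time": 3000},                  # missing value
    {"time": 4000, "value": "n/a"},  # non-numeric value
]
points, skipped = to_series(events)
print([v for _, v in points], skipped)  # [3, 5] 2
```

Reporting the skipped count makes gaps visible. Silently plotting such events would show misleading zeros or break the chart.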

2023-06-30 08:50:55

317

半夏微凉-t

Kafka

SASL authentication and authorization in Kafka: configuration parameters, secure connections and protecting resources in practice

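...rizer"], "default_acls": [ { "host": "", "operation": "[\"DescribeTopics\",\"CreateTopics\"]", "permission_type": "Allow", "principal": "User:Alice" }, { "host": "", "operation": "[\"DescribeGroups\",\"ListConsumer\",\"DescribeConsumer\"]", "permission_type": "Deny", "principal": "User:Bob" } ] } In this example, Alice is allowed to describe and create topics, while Bob is denied the group and consumer operations listed for him (DescribeGroups, ListConsumer and DescribeConsumer). VI. Conclusion. SASL authentication and authorization are important means of protecting Kafka resources: SASL verifies who a client is, and the ACLs decide what that client may do. Configured correctly, they reliably block unauthorized access and operations. In practice, choose and configure SASL's parameters and options to fit your requirements and environment. VII. Wrap-up. I hope this article helps you better understand how to protect Kafka resources with SASL authentication and authorization. If you have any questions, feel free to leave a comment, and let's keep exploring Kafka together!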
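The authorization rule behind an ACL list like the one above can be modeled in a few lines. This is a simplified sketch, not Kafka's implementation. It assumes that deny rules take precedence over allow rules and that an operation with no matching rule is refused, which roughly matches Kafka's behavior when allow.everyone.if.no.acl.found is false. Host and resource matching are left out, and the operation names are simply the strings from the example configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    principal: str
    operation: str
    permission: str  # "Allow" or "Deny"


def is_authorized(acls, principal, operation):
    """Deny beats allow; with no matching rule the request is refused."""
    matching = [a for a in acls if a.principal == principal and a.operation == operation]
    if any(a.permission == "Deny" for a in matching):
        return False
    return any(a.permission == "Allow" for a in matching)


ACLS = [
    Acl("User:Alice", "DescribeTopics", "Allow"),
    Acl("User:Alice", "CreateTopics", "Allow"),
    Acl("User:Bob", "DescribeGroups", "Deny"),
]
print(is_authorized(ACLS, "User:Alice", "CreateTopics"))  # True
print(is_authorized(ACLS, "User:Bob", "DescribeGroups"))  # False
print(is_authorized(ACLS, "User:Bob", "CreateTopics"))    # False (no rule)
```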

2023-09-20 20:50:41

482

追梦人-t

DorisDB

User and role privilege management in DorisDB: from setting SELECT and INSERT privileges to password encryption for data security

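... DATABASE default TO user1; SET ROLE NONE; The commands above first create a role named admin, then create a user named user1 and assign it to the admin role. Finally, we grant user1 ownership of the default database. To view roles, use the following command: sql SHOW ROLES; To view all the privileges a role holds, use: sql SHOW GRANTS FOR ROLE admin; 3. Privilege management. In DorisDB, the GRANT and REVOKE statements manage and control user privileges. For example, to revoke user1's SELECT privilege on my_table: sql REVOKE SELECT ON TABLE my_table FROM user1; Likewise, GRANT gives a user new privileges. For example, to grant user1 the INSERT privilege on my_table: sql GRANT INSERT ON TABLE my_table TO user1; 4. Security settings. Besides managing user privileges, DorisDB also requires attention to security settings. For example, the ENCRYPTED PASSWORD option stores user passwords in encrypted form, which adds a protective layer and significantly improves security. We can also use firewalls and similar tools to restrict access to DorisDB. Overall, DorisDB provides a robust user privilege management system that helps us manage and protect data effectively. I hope this article helps!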
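The grant, revoke and role statements above follow a common role-based model: a user holds privileges directly or inherits them through roles. The toy model below is written in Python and is not DorisDB's implementation. The privilege, object, user and role names come from the examples, and the method names are hypothetical.

```python
from collections import defaultdict


class PrivilegeStore:
    """Toy role-based privilege model (not DorisDB's implementation)."""

    def __init__(self):
        self.user_privs = defaultdict(set)  # user -> {(privilege, object)}
        self.role_privs = defaultdict(set)  # role -> {(privilege, object)}
        self.user_roles = defaultdict(set)  # user -> {role}

    def grant(self, privilege, obj, user):
        self.user_privs[user].add((privilege, obj))

    def revoke(self, privilege, obj, user):
        self.user_privs[user].discard((privilege, obj))

    def grant_to_role(self, privilege, obj, role):
        self.role_privs[role].add((privilege, obj))

    def assign_role(self, user, role):
        self.user_roles[user].add(role)

    def can(self, user, privilege, obj):
        """Direct grants and grants inherited through roles both count."""
        if (privilege, obj) in self.user_privs[user]:
            return True
        return any((privilege, obj) in self.role_privs[r] for r in self.user_roles[user])


store = PrivilegeStore()
store.grant("SELECT", "my_table", "user1")
store.revoke("SELECT", "my_table", "user1")  # REVOKE SELECT ON TABLE my_table FROM user1
store.grant("INSERT", "my_table", "user1")   # GRANT INSERT ON TABLE my_table TO user1
print(store.can("user1", "SELECT", "my_table"), store.can("user1", "INSERT", "my_table"))
```

In this model, revoking a direct grant does not remove the same privilege if it also arrives through a role. That is worth checking with SHOW GRANTS when a REVOKE seems to have no effect.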

2024-01-22 13:14:46

454

春暖花开-t

Saiku

Styles lost when exporting Saiku reports to Excel: cause analysis, CSS-class and dynamic JavaScript loading solutions, and a VBA macro fix

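...u ships with a feature called "Export to Excel" that conveniently exports a report as an Excel file. During this process, Saiku takes care of all the style-related details itself, so we don't have to worry about losing styles. Here is a code example of exporting a report with Saiku: javascript saiku.model.exportToXLSX(); This function exports the current report directly to an Excel file named "report.xlsx" that contains all the data and styles. 3.2 Method 2: Editing the Excel file manually. If we must do the export through Excel, we can try editing the Excel file manually so that it contains the correct style information. The following simple example shows a VBA macro that post-processes the exported sheet: vba Sub FixStyle() ' Walk every used cell on the active sheet Dim rng As Range Dim cell As Range Set rng = ActiveSheet.UsedRange For Each cell In rng If cell.Font.Bold Then cell.Font.Bold = False End If If cell.Font.Italic Then cell.Font.Italic = False End If ' Add other style rules... Next cell End Sub This macro iterates over all used cells in the sheet and resets bold and italic formatting to normal. It is only a skeleton: add rules for the styles you actually need to restore at the "Add other style rules" comment. IV. Conclusion. Overall, Saiku reports lose their style settings when exported to Excel mainly because Excel does not support dynamically loaded CSS classes. There are two ways around this: use Saiku's own export feature, or edit the Excel file manually. Each approach has its pros and cons, and the choice depends on our needs and circumstances.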

2023-10-07 10:17:51

繁华落尽-t

c++

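The role of CMakeLists.txt in the CMake build system: configuring source compilation, managing dependencies, and building static libraries and dynamic linking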

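...set(CMAKE_EXPORT_COMPILE_COMMANDS ON) file(GLOB_RECURSE SOURCES "*.cpp") add_library(mylib STATIC ${SOURCES}) This code compiles all .cpp files found recursively into a static library named mylib (libmylib.a on Unix-like systems). 2. Specifying compiler options. CMakeLists.txt can also specify compiler options such as optimization and warning levels. For example, to turn on a broad set of compiler warnings, add the following line to CMakeLists.txt: set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra") This enables GCC's and Clang's common warning groups when compiling C++ code (despite its name, -Wall does not turn on every warning). 3. Defining dependencies. Beyond these basic features, CMakeLists.txt can also declare a project's dependencies. Suppose library B depends on library A. We can state that relationship in CMakeLists.txt so that B is built and linked against A: find_package(A REQUIRED) target_link_libraries(B PRIVATE A::A) This code locates the package A and links target B against its imported target A::A. IV. Summary. In short, CMakeLists.txt is a powerful tool for managing and building C++ projects. Once you understand it thoroughly and use it flexibly, your C++ projects will build more smoothly, stably and reliably.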
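Declarations such as "B links A" form a dependency graph. CMake must respect that graph when it orders builds, and a topological sort is the standard way to order such a graph. The sketch below applies Python's standard graphlib (3.9+) to a hypothetical target graph, in which the app target is an illustrative addition. It illustrates only the ordering idea. CMake's own generators do considerably more.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical targets: each key lists the targets it links against,
# so "B": {"A"} mirrors target_link_libraries(B PRIVATE A::A).
deps = {"app": {"B"}, "B": {"A"}, "A": set()}

build_order = list(TopologicalSorter(deps).static_order())
print(build_order)  # ['A', 'B', 'app']

# A circular dependency cannot be ordered and is reported as an error.
cycle_found = False
try:
    list(TopologicalSorter({"A": {"B"}, "B": {"A"}}).static_order())
except CycleError:
    cycle_found = True
print("cycle detected:", cycle_found)  # cycle detected: True
```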

2024-01-03 23:32:17

429

灵动之光_t

Go-Spring

Troubleshooting and fixing invalid SQL syntax in the Go-Spring framework: using the GORM ORM, precompiled SQL and log-based debugging

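...gorm.Get("default") user := User{Username: "test", Password: "password"} db.Create(&user) // assumes the table schema is correct; if the generated SQL were malformed, an "Invalid syntax" error would be returned } 3. Common invalid-syntax problems in SQL queries and their fixes 3.1 Unclosed single quotes. When writing SQL that contains string literals, single quotes are an easy place to make mistakes. For example: sql SELECT * FROM users WHERE username = 'test; Because the single quote is never closed, this statement triggers an "Invalid syntax" error. The corrected version is: sql SELECT * FROM users WHERE username = 'test'; 3.2 Missing quotes or operators. Suppose we build the following query in Go-Spring: go db.Where("username = test").Find(&users) This code produces a broken query because the string value test is not quoted, so the database treats it as a column name rather than a string. The correct way is to pass the value through a placeholder: go db.Where("username = ?", "test").Find(&users) 4. Debugging and preventing invalid SQL syntax in Go-Spring 4.1 Using precompiled SQL. Through an integrated ORM such as GORM, Go-Spring can use prepared statements, which reduces problems caused by hand-built SQL. For example, with GORM v2: go tx := db.Session(&gorm.Session{PrepareStmt: true}) tx.Where("username = ?", "test").Find(&users) 4.2 Logging and review. Turning on SQL logging (for example with GORM's db.Debug()) shows the SQL that is actually executed, so syntax errors can be spotted and fixed quickly. 5. Closing. Faced with the seemingly tricky "Invalid syntax in SQL query" problem, understanding its causes and knowing how to troubleshoot it is essential. With Go-Spring, a capable ORM and careful coding habits, you can stop this kind of problem before it starts and work with the database efficiently. The next time you hit such an issue, I hope you can pinpoint it quickly and handle it calmly, with the satisfaction of solving an interesting puzzle!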
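The placeholder fix in 3.2 is not specific to GORM. The sketch below reproduces both failure modes with Python's built-in sqlite3 as a stand-in database. This is an assumption for illustration, and exact error messages differ between databases and drivers. An unquoted value is parsed as a column name, an unclosed quote fails to tokenize, and a ? placeholder binds the value safely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("test", "password"))

# 3.2: the unquoted value is parsed as a column name.
missing_column_error = None
try:
    conn.execute("SELECT * FROM users WHERE username = test").fetchall()
except sqlite3.OperationalError as exc:
    missing_column_error = str(exc)  # e.g. "no such column: test"

# 3.1: an unclosed quote cannot even be tokenized.
unclosed_error = None
try:
    conn.execute("SELECT * FROM users WHERE username = 'test").fetchall()
except sqlite3.OperationalError as exc:
    unclosed_error = str(exc)

# The fix: a placeholder, with the value bound by the driver.
rows = conn.execute("SELECT * FROM users WHERE username = ?", ("test",)).fetchall()
print(missing_column_error, unclosed_error, rows, sep="\n")
```

Because the driver binds the value separately, no quoting is needed and no value can break the statement's syntax. The same property also blocks SQL injection.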

2023-07-20 11:25:54

454

时光倒流

Impala

Concurrent query performance in practice: Impala's SQL compatibility and resource-utilization optimization in distributed database systems

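...database='default') # create a cursor cur = conn.cursor() # run the queries for i in range(10): cur.execute("SELECT * FROM my_table LIMIT 10") # close the connection cur.close() conn.close() We can run this script several times with different numbers of query threads and compute the average query time to evaluate concurrent query performance. Note that the loop above issues its ten queries one after another on a single cursor; to actually measure concurrency, run several threads, each with its own connection and cursor. 4. Concurrent query performance in real applications. In real applications we face challenges such as query results having to meet a certain precision, or balancing performance against resources. In such cases, we need a deep understanding of concurrent query performance. For example, in the Python code above, if we want queries to run faster, we can try increasing the number of query threads to improve overall performance. But if we push for speed while ignoring resource consumption, we may run into queries that hang or exhaust memory. 5. Summary. We can evaluate Impala's concurrent query performance from both theory and practice. In practice, Impala handles multiple simultaneous queries well, mainly because parallel processing was a design goal from the start. However, we should tune the concurrency-related settings to the actual situation to get the best performance and the highest resource utilization. Overall, Impala's concurrent query capability is genuinely strong and well worth putting to use in day-to-day work.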
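A concurrent measurement needs several workers running at once. The sketch below uses concurrent.futures to time a batch of queries at different thread counts. run_query is a hypothetical stand-in that sleeps instead of contacting Impala, so the numbers only demonstrate the measurement mechanics. In a real test, each call would open its own connection (for example with impyla's impala.dbapi.connect, as in the excerpt) and execute the query.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(_):
    """Stand-in for one Impala query (hypothetical); returns its latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # simulated query latency
    return time.perf_counter() - start


def measure(num_threads, num_queries=20):
    """Run num_queries across num_threads workers; return (wall seconds, mean latency)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        latencies = list(pool.map(run_query, range(num_queries)))
    wall = time.perf_counter() - start
    return wall, statistics.mean(latencies)


for threads in (1, 4):
    wall, mean_latency = measure(threads)
    print(f"{threads} threads: wall {wall:.3f}s, mean latency {mean_latency:.3f}s")
```

Comparing total wall time with per-query latency as the thread count grows shows where extra concurrency stops helping. On a real cluster, per-query latency eventually rises as the queries compete for resources.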

2023-08-25 17:00:28

807

烟雨江南-t

Knowledge

When putting this into practice, please proceed carefully according to your actual situation.

A random Linux command to learn:

pgrep process_name - find the IDs of processes whose names match process_name.