Encoding, Decoding, and How to Fix Garbled Text

[Figure: encoding and decoding]

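Character sets and encodings:

Name	Description
ASCII	American Standard Code for Information Interchange. Uses 7 bits of one byte, which gives 128 code points (0 to 127).
ISO8859-1	The Latin code table for Western European languages, also called Latin-1. It uses all 8 bits of one byte. ASCII covers only the basic English letters and leaves half of the 256 byte values unused, so ISO8859-1 builds on ASCII and adds 96 letters and symbols in the range 0xA0-0xFF for Latin-script languages that use diacritics, such as German and French. It is still a single-byte encoding, just more complete than ASCII.
GB2312	The Chinese national character set for Simplified Chinese. English letters take one byte and Chinese characters take two bytes.
GBK	An extension of GB2312 that adds many more Chinese characters and symbols. English letters still take one byte and Chinese characters two.
Unicode	The international standard that assigns a unique number (code point) to the characters of nearly all writing systems. It does not fix a byte length by itself; UTF-8 and UTF-16 are encodings of Unicode. Java's char type and String API are based on UTF-16 code units.
UTF-8	Variable length. English letters take one byte, most Chinese characters take three bytes, and a single character takes at most four bytes.
UTF-16	English letters and common Chinese characters take two bytes; characters outside the Basic Multilingual Plane take four bytes.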

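Note: Unicode is not itself a byte-level encoding. It is a standard that defines the character set and its code points; UTF-8, UTF-16 and similar encodings define how those code points are stored as bytes.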
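To make the byte counts in the table concrete, here is a minimal sketch that encodes a few sample characters with several charsets and prints how many bytes each one takes. The class name EncodedLengths and the helper byteLength are invented for this illustration, and the GBK line assumes the JVM ships the GBK charset (standard JDK builds normally do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengths {

    // Number of bytes the given text occupies in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String latin = "a";
        String chinese = "\u4E2D";       // U+4E2D, a common Chinese character
        String emoji = "\uD83D\uDE00";   // U+1F600, outside the Basic Multilingual Plane

        // Assumption: the GBK charset is available in this JVM.
        Charset gbk = Charset.forName("GBK");

        System.out.println("'a'     in US-ASCII: " + byteLength(latin, StandardCharsets.US_ASCII));
        System.out.println("U+4E2D  in GBK:      " + byteLength(chinese, gbk));
        System.out.println("U+4E2D  in UTF-8:    " + byteLength(chinese, StandardCharsets.UTF_8));
        System.out.println("U+4E2D  in UTF-16BE: " + byteLength(chinese, StandardCharsets.UTF_16BE));
        System.out.println("U+1F600 in UTF-8:    " + byteLength(emoji, StandardCharsets.UTF_8));
        System.out.println("U+1F600 in UTF-16BE: " + byteLength(emoji, StandardCharsets.UTF_16BE));
    }
}
```

The last two lines show why "two bytes per character" does not hold for all of Unicode: a character outside the Basic Multilingual Plane needs four bytes in both UTF-8 and UTF-16.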

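I. Encoding

Encoding: turning human-readable characters into numeric code values (bytes).

String ---> byte array
The getBytes() method of the String class performs the encoding and converts a string into its byte representation. The method accepts a charset. If none is specified, it uses the JVM's default charset, which on older Java versions follows the operating system's setting (since Java 18 the default is UTF-8 unless configured otherwise).

Note: on Windows systems in mainland China the default encoding is usually GBK. In a Java program, System.getProperty("file.encoding") returns the current default encoding.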
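As a small sketch of the point above, the following class prints the default charset and then encodes a string with an explicitly named charset, so the result does not depend on the machine it runs on. The class name EncodeDemo and the helper encodeUtf8 are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodeDemo {

    // Encode with an explicit charset instead of relying on the platform default.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Both values depend on the JVM version and the operating system settings.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        String text = "\u4E2D\u56FD"; // U+4E2D U+56FD, the word for "China"
        System.out.println(Arrays.toString(encodeUtf8(text)));
        // prints [-28, -72, -83, -27, -101, -67]
    }
}
```

Using a StandardCharsets constant also avoids the checked UnsupportedEncodingException that getBytes(String charsetName) declares.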

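II. Decoding

Decoding: looking up the characters that correspond to the code values.
byte array ---> String
This is done with the String constructors:
String(byte[] bytes) uses the default charset
String(byte[] bytes, String charsetName) uses the specified charset

Note: decode with the same charset that was used to encode; otherwise garbled text is very likely. Compatible charsets are the exception: for example, text that contains only ASCII characters decodes correctly as ISO8859-1, GBK or UTF-8.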
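The String constructors silently substitute a replacement character when bytes do not fit the charset, which hides the mismatch. One way to detect it, sketched below, is a strict CharsetDecoder that reports errors instead of replacing. The class StrictDecode and the method decodeOrNull are names made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecode {

    // Returns the decoded text, or null if the bytes are not valid in the given charset.
    static String decodeOrNull(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u56FD";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        // Bytes taken from the GBK example in this article.
        byte[] gbkBytes = {(byte) 0xB4, (byte) 0xF3, (byte) 0xBC, (byte) 0xD2, (byte) 0xBA, (byte) 0xC3};

        // Same charset for encoding and decoding: the text comes back unchanged.
        System.out.println(original.equals(decodeOrNull(utf8Bytes, StandardCharsets.UTF_8)));
        // GBK bytes are not valid UTF-8, so the strict decoder rejects them.
        System.out.println(decodeOrNull(gbkBytes, StandardCharsets.UTF_8) == null);
    }
}
```

A strict decoder cannot catch every mismatch: a charset such as ISO8859-1 accepts any byte sequence, so decoding with it never fails even when the result is garbled.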

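import java.util.Arrays;

public class Demo7 {

    public static void main(String[] args) throws Exception {
        /*
        String str = "中国";
        // With an explicit charset, getBytes does not depend on the platform default.
        // In UTF-8 each of these Chinese characters takes three bytes (two bytes in GBK).
        byte[] buf = str.getBytes("utf-8"); // encode
        System.out.println("Array elements: " + Arrays.toString(buf));

        // Decode with the same charset. Decoding with a different one (for example gbk) produces garbled text.
        str = new String(buf, "utf-8");
        System.out.println("Decoded string: " + str);
        */

        /*
        // "a中国" encodes to [-2, -1, 0, 97, 78, 45, 86, -3]
        String str = "中国"; // [-2, -1, 78, 45, 86, -3]
        byte[] buf = str.getBytes("unicode"); // the charset name "unicode" actually selects UTF-16
        System.out.println("Array contents: " + Arrays.toString(buf));
        // -2 and -1 (0xFE 0xFF) are the byte order mark that UTF-16 puts in front
        */

        String str = "大家好";
        // The original relied on the platform default (GBK on the author's machine). The charset is named
        // here so the result is the same everywhere. The source file must still be compiled with the
        // encoding it was saved in.
        byte[] buf = str.getBytes("gbk");
        System.out.println("Byte array: " + Arrays.toString(buf)); // [-76, -13, -68, -46, -70, -61]

        str = new String(buf, "iso8859-1"); // garbled: ´ó¼ÒºÃ
        // ISO8859-1 maps every byte value to a character, so no byte is lost here.

        // Recover: turn the garbled characters back into the original bytes, then decode them as gbk.
        byte[] buf2 = str.getBytes("iso8859-1");
        str = new String(buf2, "gbk");
        System.out.println(str); // 大家好
    }
}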
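The two leading bytes -2 and -1 mentioned in the demo are the UTF-16 byte order mark. The sketch below, with an invented class name Utf16Bom, compares Java's UTF-16 charset with the variants that fix the byte order and therefore write no mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf16Bom {

    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "\u4E2D"; // U+4E2D

        // UTF-16: Java's encoder writes the byte order mark FE FF, then big-endian data.
        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_16)));   // [-2, -1, 78, 45]
        // UTF-16BE and UTF-16LE state the byte order in the name, so no mark is written.
        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_16BE))); // [78, 45]
        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_16LE))); // [45, 78]
    }
}
```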

Note: always use one and the same charset for encoding and decoding; otherwise garbled text appears very easily.

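Not all garbled text can be recovered:
The recovery above works because ISO8859-1 assigns a character to every byte value, so encoding the garbled string back to ISO8859-1 returns exactly the original bytes. Some other charsets have byte values or byte sequences with no corresponding character. Decoding with such a charset replaces those bytes with a substitute character, and the original data is lost.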
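The difference can be checked directly. The sketch below decodes the same bytes with a wrong charset, encodes the garbled string back with that charset, and compares the result with the original bytes. RoundTrip and survivesRoundTrip are names invented for this example; the byte values are the GBK bytes from the demo above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTrip {

    // Decode with the wrong charset, encode back with it, and check whether the bytes survived.
    static boolean survivesRoundTrip(byte[] original, Charset wrongCharset) {
        String garbled = new String(original, wrongCharset);
        byte[] restored = garbled.getBytes(wrongCharset);
        return Arrays.equals(original, restored);
    }

    public static void main(String[] args) {
        byte[] gbkBytes = {(byte) 0xB4, (byte) 0xF3, (byte) 0xBC, (byte) 0xD2, (byte) 0xBA, (byte) 0xC3};

        // ISO8859-1 has a character for every byte value, so nothing is lost.
        System.out.println(survivesRoundTrip(gbkBytes, StandardCharsets.ISO_8859_1)); // true
        // These bytes are malformed UTF-8; the decoder substitutes U+FFFD and the bytes are gone.
        System.out.println(survivesRoundTrip(gbkBytes, StandardCharsets.UTF_8));      // false
    }
}
```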

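III. Fixing garbled text

Garbled Chinese text is caused by inconsistent encodings. As long as the front end (pages declare UTF-8 in a meta tag), the back end (request parameter parsing and the connection to the database) and the database itself (its storage encoding) all use the same encoding, garbled text generally does not occur.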

1. Check that the pages use UTF-8

① JSP page

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

② HTML page

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

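2. Add the Unicode parameters to the database connection URL

① Without a properties file (for example, configured in hibernate.cfg.xml):

<property name="hibernate.connection.url">jdbc:mysql://localhost:3306/database_name?useUnicode=true&amp;characterEncoding=UTF-8</property>

② With a properties file

jdbcUrl=jdbc:mysql://localhost:3306/database_name?useUnicode=true&characterEncoding=UTF-8

Note: when the connection settings live in a properties file, join the parameters with a plain &. Do not write &amp; there: that escape is only needed inside XML, and a properties file would pass it through literally.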
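The reason is that a properties file is not XML, so nothing unescapes &amp; when the file is read. The sketch below loads two one-line property texts with java.util.Properties and shows the value each produces. The class JdbcUrlFromProperties, the method loadJdbcUrl and the database name mydb are placeholders made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class JdbcUrlFromProperties {

    // Parses properties text and returns the value stored under the key "jdbcUrl".
    static String loadJdbcUrl(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("jdbcUrl");
    }

    public static void main(String[] args) {
        String good = "jdbcUrl=jdbc:mysql://localhost:3306/mydb?useUnicode=true&characterEncoding=UTF-8";
        String bad = "jdbcUrl=jdbc:mysql://localhost:3306/mydb?useUnicode=true&amp;characterEncoding=UTF-8";

        // The plain & is kept as the parameter separator.
        System.out.println(loadJdbcUrl(good));
        // The XML escape is kept literally, so the second parameter name becomes "amp;characterEncoding".
        System.out.println(loadJdbcUrl(bad));
    }
}
```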

3. Set the i18n encoding in struts.xml

<constant name="struts.i18n.encoding" value="UTF-8" />

4. Use a global filter for garbled Chinese request parameters

GlobalFilter class:

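import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

/**
 * Global filter for Chinese text.
 * Fixes garbled Chinese in GET and POST request parameters, so servlets no longer have to
 * convert the parameters themselves.
 */
public class GlobalFilter implements Filter {

    @Override
    public void destroy() {
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) request;
        // Alternative for POST parameters only (left disabled; MyRequest converts all parameters):
        // request.setCharacterEncoding("UTF-8");
        req = new MyRequest(req);
        chain.doFilter(req, response);
    }

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
    }
}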

MyRequest:

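import java.io.UnsupportedEncodingException;
import java.util.Map;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;

/**
 * Wraps HttpServletRequest (decorator pattern) to fix garbled Chinese returned by getParameter.
 * This assumes the container decoded the parameter bytes as ISO-8859-1. If the container
 * already decodes them as UTF-8, this conversion would corrupt them instead.
 */
class MyRequest extends HttpServletRequestWrapper {

    private HttpServletRequest req;

    // True while getParameterMap has not been called yet. Without this flag, calling
    // getParameter or a similar method twice in the same servlet would convert the
    // parameters twice, and the second result would be garbled.
    private boolean flag = true;

    public MyRequest(HttpServletRequest request) {
        super(request);
        req = request;
    }

    @Override
    public String getParameter(String name) {
        String[] values = getParameterMap().get(name);
        // A missing parameter yields null instead of a NullPointerException.
        return (values == null || values.length == 0) ? null : values[0];
    }

    @Override
    public String[] getParameterValues(String name) {
        return getParameterMap().get(name);
    }

    @Override
    public Map<String, String[]> getParameterMap() {
        Map<String, String[]> map = req.getParameterMap();
        if (flag) {
            for (Map.Entry<String, String[]> entry : map.entrySet()) {
                String[] value = entry.getValue();
                for (int i = 0; i < value.length; i++) {
                    try {
                        // Recover the raw bytes, then decode them as UTF-8.
                        value[i] = new String(value[i].getBytes("iso-8859-1"), "UTF-8");
                    } catch (UnsupportedEncodingException e) {
                        e.printStackTrace();
                    }
                }
            }
            flag = false;
        }
        return map;
    }
}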
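The conversion inside getParameterMap and the reason for the flag can be tried without a servlet container. The sketch below simulates a container that decoded UTF-8 bytes as ISO-8859-1, applies the same conversion once, and then applies it a second time. RedecodeDemo and redecode are names invented for this illustration.

```java
import java.nio.charset.StandardCharsets;

class RedecodeDemo {

    // The same conversion as in MyRequest: recover the raw bytes, then decode them as UTF-8.
    static String redecode(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u56FD"; // U+4E2D U+56FD

        // Simulate a container that read the UTF-8 request bytes as ISO-8859-1.
        String misdecoded = new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        String once = redecode(misdecoded);
        String twice = redecode(once);

        System.out.println(once.equals(original));  // true: one conversion restores the text
        System.out.println(twice.equals(original)); // false: a second conversion destroys it
    }
}
```

The second conversion fails because the restored Chinese characters have no ISO-8859-1 representation, so getBytes replaces each of them with a question mark.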

web.xml:

<?xml version="1.0" encoding="UTF-8"?>
<web-app version="3.0" xmlns="http://java.sun.com/xml/ns/javaee"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd">

    <!-- filters must be declared before servlets -->
    <filter>
        <filter-name>GlobalFilter</filter-name>
        <filter-class>com.java.filter.GlobalFilter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>GlobalFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

    <servlet>
        ...
    </servlet>
    <servlet-mapping>
        ...
    </servlet-mapping>
</web-app>